Federated pull requests #7

Open
opened 2022-07-12 17:59:13 +00:00 by xy · 9 comments
Owner

For federated pull requests, this requires federated forking (use a Create activity and ForgeFed's Create flow) and patches over ActivityPub (https://codeberg.org/ForgeFed/ForgeFed/issues/88).

To make the implementation easier, I think we should try to reuse most of the existing code in Gitea for handling pull requests. (This is because we are retrofitting federation onto an existing forge, so we have to reuse the already present database representations and code for processing various actions.) To do that, we can mirror the remote fork to the original instance and when the original instance receives a Patch activity, it will create the pull request from the mirror of the remote fork to the original repo. The mirror doesn't necessarily have to be using Gitea's preexisting mirror functionality. We can also use ForgeFed to do mirroring.

I propose that the flow for federated PRs will look like this:

  1. Alice has a repository hello-world on alice.com.

  2. Bobert forks the repository to his instance at bobert.com. His instance creates the forked repository (maybe using the Gitea migrations feature) and sends a Create activity back to alice.com informing it that Alice's repo has been forked. alice.com creates a mirror of Bobert's forked repo. (Alternatively, Bobert's repo can be a push-mirror to the copy on alice.com)

  3. Bobert pushes some commits to his fork. The commits are mirrored to alice.com.

  4. Bobert creates a pull request. His instance sends a Patch activity instructing Alice's instance on how to construct the pull request using the mirror of Bobert's fork on alice.com.
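
As a rough illustration of step 2 in the flow above, the Create activity and its handling on alice.com might look something like the following. This is only a sketch under stated assumptions: the `forkedFrom` property name, the helper names `fork_created_activity` and `handle_inbox`, and the in-memory `known_forks` mapping are all hypothetical and are not taken from ForgeFed or Gitea.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def fork_created_activity(fork_url, owner_url, upstream_url):
    """Build a Create activity telling the upstream instance about a fork."""
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": owner_url,
        "to": [upstream_url],
        "object": {
            "type": "Repository",
            "id": fork_url,
            # Hypothetical property name, used here only for illustration.
            "forkedFrom": upstream_url,
        },
    }


def handle_inbox(activity, known_forks):
    """On the upstream instance: record the fork when a Create arrives."""
    obj = activity.get("object", {})
    if activity.get("type") == "Create" and obj.get("type") == "Repository":
        known_forks.setdefault(obj["forkedFrom"], []).append(obj["id"])
    return known_forks


activity = fork_created_activity(
    "https://bobert.com/bobert/hello-world",
    "https://bobert.com/bobert",
    "https://alice.com/alice/hello-world",
)
forks = handle_inbox(activity, {})
print(json.dumps(forks, indent=2))
```

Whether alice.com also mirrors the fork's contents at this point is a separate decision; the sketch only records that the fork exists.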

The one thing that's currently missing is that Alice should also be able to push commits to Bobert's fork, but this requires federated collaborators, which I haven't figured out how to do yet.


I guess I'm missing something, but I don't quite see why it has to be so complicated:

> this requires federated forking (use a Create activity and ForgeFed's Create flow

Why is federated forking (via ActivityPub) needed? Actually why is it needed *at all*, not just as a prerequisite for federated PRs?
I would have thought a fork simply consists of the other instance doing `git clone https://gitea.com/Ta180m/gitea` - what else needs to happen?
I guess maybe it's to tell the origin repo that the clone exists?

I would have thought that *any* clone of the git repo could potentially be able to send a PR - is there a reason that would be impossible or undesirable?

> and patches over ActivityPub (https://codeberg.org/ForgeFed/ForgeFed/issues/88)

I've read the linked issue too, but I'm still puzzled why patches would be necessary for a PR between two Git repos. Surely the PR could just reference the target and origin URIs and the instance receiving the PR could do a `git pull` from the origin?

What am I missing? (Sorry to be the one asking all the dumb questions…)


Hmm, thinking more, maybe the answer to both questions is that it's to support inter-VCS clones and PRs – so the origin and target needn't necessarily both be Git, for example?
I guess that would be kind of cool if it's possible.
Or maybe I'm on the wrong track...

Collaborator

> and sends a Create activity back to alice.com informing it that Alice's repo has been forked. alice.com creates a mirror of Bobert's forked repo. (Alternatively, Bobert's repo can be a push-mirror to the copy on alice.com)

This should be avoided as much as possible. If I just click on fork, fork the repo to Bobert's instance, and never touch it again, Alice's instance shouldn't store that fork. You likely want to do this only once Bobert's instance wants to interact (for example, by creating a pull request), but even then we might use [AGit](https://git-repo.info/en/2020/03/agit-flow-and-git-repo/) to avoid creating a new repository for each federated instance interacting with Alice's repository.

Author
Owner

> Why is federated forking (via ActivityPub) needed? Actually why is it needed *at all*, not just as a prerequisite for federated PRs?
> I would have thought a fork simply consists of the other instance doing `git clone https://gitea.com/Ta180m/gitea` - what else needs to happen?
> I guess maybe it's to tell the origin repo that the clone exists?

The purpose of the Create activity is to inform the original repo that it's been forked so the new fork can be listed as one of the forks of that repo.

> I would have thought that *any* clone of the git repo could potentially be able to send a PR - is there a reason that would be impossible or undesirable?

Well yes, any Git forge supporting ForgeFed will be able to send a PR over ForgeFed, but not any clone in general, because to send a PR over ForgeFed, your forge has to actually support ForgeFed.

> I've read the linked issue too, but I'm still puzzled why patches would be necessary for a PR between two Git repos. Surely the PR could just reference the target and origin URIs and the instance receiving the PR could do a `git pull` from the origin?

We need some sort of ActivityStreams object to notify the original instance that there's a PR incoming, so it seemed natural to use a ForgeFed patch. For Gitea-to-Gitea federation, it doesn't make much sense to store the actual changes in this object, so it'll probably only contain the target and origin URLs as you said.

> What am I missing? (Sorry to be the one asking all the dumb questions…)

No worries, these are great questions!

> Hmm, thinking more, maybe the answer to both questions is that it's to support inter-VCS clones and PRs – so the origin and target needn't necessarily both be Git, for example?
> I guess that would be kind of cool if it's possible.
> Or maybe I'm on the wrong track...

Inter-VCS clones and PRs sound really cool, but I don't think they'd work well in practice since different VCSes have significant differences. It would also be quite the nightmare to implement!

> This should be avoided as much as possible. If I just click on fork, fork the repo to Bobert's instance, and never touch it again, Alice's instance shouldn't store that fork. You likely want to do this only once Bobert's instance wants to interact (for example, by creating a pull request), but even then we might use AGit to avoid creating a new repository for each federated instance interacting with Alice's repository.

Good thinking. Instead, when the original instance receives the Create activity, we can use it only to list the fork as one of the forks of the repo (and maybe create a special repository that contains no actual content). I haven't looked into the AGit flow yet, but that could be a good way to reduce the disk space needed for storing remote forks when they submit PRs.

Collaborator

> Good thinking. Instead, when the original instance receives the Create activity, we can use it only to list the fork as one of the forks of the repo (and maybe create a special repository that contains no actual content). I haven't looked into the AGit flow yet, but that could be a good way to reduce the disk space needed for storing remote forks when they submit PRs.

Ah yeah, it's maybe still a smart idea to send the Create activity so the repository can list it as a fork. FYI, AGit isn't really well known, and Gitea is the only forge that supports it (since [Gitea 1.16](https://github.com/go-gitea/gitea/pull/14295)), so it might be a bit rough and need some polishing to fit our use case, but it's a really nice solution for avoiding disk space problems.


> The purpose of the Create activity is to inform the original repo that it's been forked so the new fork can be listed as one of the forks of that repo.

That makes sense. I don't know if it needs to be a strict requirement in order to be able to send a pull request, but maybe.

> Well yes, any Git forge supporting ForgeFed will be able to send a PR over ForgeFed, but not any clone in general, because to send a PR over ForgeFed, your forge has to actually support ForgeFed.

I was inexact with my wording, but that's exactly what I meant. Good to know.

> We need some sort of ActivityStreams object to notify the original instance that there's a PR incoming, so it seemed natural to use a ForgeFed patch.

I'm still not familiar enough with the specs to know if it's "natural", so I'll take you at your word – though on the surface it sounds odd to send a `patch` activity without a patch attached.
What other use cases does the `patch` activity have? Are there instances when a standalone patch would be sent via ForgeFed/ActivityPub, outside a pull request?

> For Gitea-to-Gitea federation, it doesn't make much sense to store the actual changes in this object, so it'll probably only contain the target and origin URLs as you said.

I assume that would be the case with Git-to-Git federation in general – and indeed, more generally, whenever the same DVCS is used on both sides? Or is that not so?

> Inter-VCS clones and PRs sound really cool, but I don't think they'd work well in practice since different VCSes have significant differences. It would also be quite the nightmare to implement!

Well, that's why I was assuming that in that instance a simple patch/diff would be sent instead of a literal *pull* request. But maybe it's not even that simple. I haven't thought about it at all.

> If I just click on fork, fork the repo to Bobert's instance, and never touch it again, Alice's instance shouldn't store that fork. You likely want to do this only once Bobert's instance wants to interact (for example, by creating a pull request), but even then we might use AGit to avoid creating a new repository for each federated instance interacting with Alice's repository.

Even at the point of a PR being sent, is it actually necessary to duplicate the remote repo locally? Since it's a clone of the local repo anyway, can't we just add it as a remote?
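
To make the "just add it as a remote" idea concrete, here is a minimal sketch of the two git invocations the receiving instance might run. It only builds and prints the command lines and does not execute them. The helper name `fetch_fork_commands`, the remote name, the example URL, and the choice of `refs/pull/<index>/head` as the destination ref are assumptions for illustration, not a description of what Gitea does here.

```python
import shlex


def fetch_fork_commands(remote_name, fork_url, branch, pr_index):
    """Return git argv lists that fetch one branch of a remote fork.

    The branch is fetched into a dedicated ref, so no second repository
    has to be created on the receiving instance.
    """
    # Assumed destination ref; any ref outside refs/heads/ would do.
    refspec = f"+refs/heads/{branch}:refs/pull/{pr_index}/head"
    return [
        ["git", "remote", "add", remote_name, fork_url],
        ["git", "fetch", remote_name, refspec],
    ]


commands = fetch_fork_commands(
    "bobert", "https://bobert.com/bobert/hello-world.git", "fix-typo", 7
)
for argv in commands:
    print(shlex.join(argv))
```

Because the fork shares history with the original repo, such a fetch should only need to transfer the objects that are new in the fork. Input validation for the branch name and URL is left out of the sketch.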

Author
Owner

> I'm still not familiar enough with the specs to know if it's "natural", so I'll take you at your word – though on the surface it sounds odd to send a `patch` activity without a patch attached.
> What other use cases does the `patch` activity have? Are there instances when a standalone patch would be sent via ForgeFed/ActivityPub, outside a pull request?

I think my wording there is a bit misleading. By patch activity, I meant the proposal in [ForgeFed #88](https://codeberg.org/ForgeFed/ForgeFed/issues/88), which is actually a Ticket object with an extra attachment. Since pull requests in Gitea are just issues with associated code, it seemed natural in that sense to use the Ticket object with extra information about the origin and target branches.
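
A Ticket carrying only origin and target information, with no diff, could be sketched roughly as below. The shape of the `attachment` (its `origin` and `target` keys) and the helper names are hypothetical and are not taken from the ForgeFed #88 proposal. A real object would also need the ForgeFed JSON-LD context, which is omitted here.

```python
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def pull_request_ticket(author, title, target_repo, target_branch,
                        origin_repo, origin_branch):
    """Build a Ticket that describes a PR by reference, without a diff."""
    return {
        "@context": AS_CONTEXT,
        "type": "Ticket",
        "attributedTo": author,
        "summary": title,
        "context": target_repo,
        # Hypothetical attachment layout, for illustration only.
        "attachment": {
            "type": "Object",
            "origin": {"repository": origin_repo, "branch": origin_branch},
            "target": {"repository": target_repo, "branch": target_branch},
        },
    }


def parse_pull_request(ticket):
    """On the receiving side: extract what is needed to build the PR."""
    if ticket.get("type") != "Ticket" or "attachment" not in ticket:
        raise ValueError("not a pull request ticket")
    att = ticket["attachment"]
    return (att["origin"]["repository"],
            att["origin"]["branch"],
            att["target"]["branch"])


ticket = pull_request_ticket(
    "https://bobert.com/bobert",
    "Fix typo in README",
    "https://alice.com/alice/hello-world",
    "main",
    "https://bobert.com/bobert/hello-world",
    "fix-typo",
)
print(parse_pull_request(ticket))
```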

> I assume that would be the case with Git-to-Git federation in general – and indeed, more generally, whenever the same DVCS is used on both sides? Or is that not so?

That's right. The only case where you would need to send in the actual changes is if you're using a generic AP server (for instance, pebbles in https://codeberg.org/ForgeFed/ForgeFed/src/branch/main/doc/EXAMPLE_WORKFLOWS.md), but I think this will be a very rare use case and we probably won't support it initially in Gitea.

> Even at the point of a PR being sent, is it actually necessary to duplicate the remote repo locally? Since it's a clone of the local repo anyway, can't we just add it as a remote?

One way to deal with this is to use AGit as @gusted suggested.


What happens if a server's offline?

Author
Owner

> What happens if a server's offline?

With the current code, if one server is offline, the two servers will get out of sync.

There are a few ways to fix this. The easiest solution is this: let's say you want to open a PR on a remote instance, but that remote instance is down. Then your instance tries to make the PR and fails, so you see that no PR is created on your side either. Then both servers can be consistent and in sync.

The other solution is to keep an outgoing queue of activities. If you make the PR and it fails, your instance keeps it in a queue and tries again after a certain amount of time.

For all of these solutions, to ensure consistency, creating a PR to an offline server should visibly fail. The question now is whether we want to keep retrying a set number of times or just give up immediately.
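
The outgoing-queue option with a bounded number of retries could be sketched as follows. The class `OutgoingQueue`, the `send` callback, and the `max_attempts` default are assumptions for illustration. A real implementation would persist the queue and wait between attempts instead of retrying back to back.

```python
from collections import deque


class OutgoingQueue:
    """Queue of activities that retries delivery a set number of times."""

    def __init__(self, send, max_attempts=3):
        self.send = send                # callable that delivers one activity
        self.max_attempts = max_attempts
        self.pending = deque()          # (activity, attempts so far)
        self.failed = []                # activities we gave up on

    def enqueue(self, activity):
        self.pending.append((activity, 0))

    def flush(self):
        """Try every pending activity once; requeue or give up on failure."""
        for _ in range(len(self.pending)):
            activity, attempts = self.pending.popleft()
            try:
                self.send(activity)
            except ConnectionError:
                attempts += 1
                if attempts >= self.max_attempts:
                    self.failed.append(activity)   # surface this to the user
                else:
                    self.pending.append((activity, attempts))


# Demo: the remote instance is down for the first two delivery attempts.
calls = {"n": 0}
delivered = []


def flaky_send(activity):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("remote instance is down")
    delivered.append(activity)


queue = OutgoingQueue(flaky_send, max_attempts=3)
queue.enqueue({"type": "Create", "summary": "open PR"})
for _ in range(3):
    queue.flush()
print(len(delivered), len(queue.failed))
```

Setting `max_attempts=1` gives the "give up immediately" behaviour, so the same structure covers both choices raised above.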
