The way to go (Overview) #1
Why this issue?
I'm opening this issue to allow a discussion of how to go forward with the go-fed library. This is a beginner-friendly issue and explains Gitea's possible usage of the Go-Fed library.
Pre-information
Gitea users want federation, and the developers will be implementing it in Gitea.
For the end user, federation means that you can do collaborative tasks (creating pull requests, opening issues, etc.) regardless of which Gitea instance you use. For example, a user can create an issue on Gitea instance B while their account is managed by Gitea instance A. For the end user, doing this feels natural and is no different from opening an issue on their own Gitea instance.
While these tasks seem natural to the end user at first glance, behind the scenes the two Gitea instances have to communicate with each other in order to pull this trick off. Gitea will use an existing protocol to talk between Gitea instances: ActivityPub, which is used by many other well-known pieces of software that have federation built in. Mastodon is one of the most well-known pieces of software that implements it.
Go-fed
The ActivityPub protocol has many capabilities and offers a wide range of communication "types" that can be utilized in order to receive specific information. Go-Fed is a Go-based library that implements the ActivityPub protocol. It's a nice and neat library that helps you write ActivityPub-based requests without requiring you to know the monster details of the protocol itself. Existing PRs are already built on this library.
There's a strong consensus to continue using this library as part of the federation. Building a library that does the handling of the ActivityPub protocol from scratch would be error-prone to minor details (which go-fed avoids due to automated code generation) and would also be time-consuming to write all the boilerplate code for it. Go-fed is the perfect fit for this situation.
The problem
Unfortunately, while Go-fed seems to be a charming prince on a white horse, it has a major downside in terms of binary size. Go-fed adds a measurable amount of size to the current Gitea binary. The Gitea developers aren't willing to accept this binary cost without exploring possible solutions first. One solution was proposed to mitigate this problem: only include federation features under a build tag, which would mean that those who don't wish to use federation on their Gitea instance wouldn't be impacted by the bigger binary. However, this is a workaround rather than a solution to the fundamental problem that go-fed introduces.
Technical problem
Congrats! You've come so far; now comes the boring part!
How are we going to tackle this problem? I personally see three possible outcomes:
Before we continue, I must reveal a secret, one that may already have been bothering the careful reader.
The facts
The current main commit without bindata+SQLite+debug symbols is 68MB (70545408 bytes); with debug symbols it is 90MB (93890774 bytes).
The latest commit of the initial go-fed PR without bindata+SQLite+debug symbols is 91MB (95174656 bytes); with debug symbols it is 123MB (128240771 bytes).
Without bindata+SQLite+debug symbols:
The absolute difference is ~24.6MB (24629248 bytes)
The relative difference is ~34.9%
Without bindata+SQLite (debug symbols included):
The absolute difference is ~34.3MB (34349997 bytes)
The relative difference is ~36.6%
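The differences above can be recomputed from the quoted byte counts. The following is only a small sketch of that arithmetic; the function name sizeDelta is made up for illustration and is not part of Gitea or go-fed.

```go
package main

import "fmt"

// sizeDelta returns the absolute growth in bytes and the relative growth in
// percent when a binary grows from base to withDep bytes.
// (Hypothetical helper, used only to redo the arithmetic from the text.)
func sizeDelta(base, withDep int64) (abs int64, relPct float64) {
	abs = withDep - base
	relPct = float64(abs) / float64(base) * 100
	return abs, relPct
}

func main() {
	// Byte counts as quoted in the issue.
	abs, rel := sizeDelta(70545408, 95174656)
	fmt.Printf("without debug symbols: +%d bytes (%.1f%%)\n", abs, rel)

	abs, rel = sizeDelta(93890774, 128240771)
	fmt.Printf("with debug symbols:    +%d bytes (%.1f%%)\n", abs, rel)
}
```

The relative figure is the growth divided by the size of the main-branch binary, which is why the build with debug symbols shows a slightly higher percentage.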
Why does go-fed add so many bytes?!
The go-fed/activity module implements all the structs that you can send and receive over the ActivityPub protocol. It also implements helper functions for these structs. For each of these structs, the Go compiler has to generate machine code, which comes to around 70-250KB per struct. Because the protocol offers a wide range of types and go-fed implements all of them, this quickly adds up to a measurably large binary.
To see the big cost eaters:
go build
(builds without SQLite+bindata but includes debug symbols)
go tool nm -size -sort size gitea | grep activity | tac
(Linux-only). This uses Go's nm tool, which lists the symbols in the Gitea binary, to print each symbol's size and sort by it. grep keeps only the symbols that go-fed/activity adds, and tac reverses the output into ascending order, as go tool nm sorts in descending order by default.
But hey! Go is an awesome language and will surely dead-code this if you don't use the structs, right? sweat chuckle
Smart observation! But unfortunately, that isn't the case with the current implementation of go-fed/activity: it unconditionally initializes all structs with a shared value. As a result, Go treats the code as used and includes it in the resulting binary, even though Gitea isn't using it. So we cannot rely on Go being smart here and let dead-code elimination do its work (detecting this would require advanced control-flow analysis).
Why go-fed/activity initializes this shared value for all structs is rather a mystery to me, but it seems to be a required side effect of the code generation, as well as of how the current codebase is generated to implement all the helper functions for the types.
The other two outcomes!
Now that you know this very secretive secret about why go-fed is actually adding this binary cost, we can safely discuss the other two outcomes.
Remove all types that we don't use and only include the ones that we will be using in Gitea. This partially solves the fundamental problem of go-fed's binary size; by only including the necessary types, we're effectively doing the dead-code elimination ourselves. This removes a good amount of code and results in a smaller binary than we currently have. There is decent, but not overwhelming, agreement to continue along this road. It is a good solution that is available in the short term, because we want to merge the federation features into Gitea and really start the work on federation. With this outcome, however, we set a trap for our future selves: as we add these types back into the module bit by bit, the binary will grow measurably again. So in the long term, we will need to revisit this problem and come up with a solution that is compatible with all the code that we then have in the Gitea codebase. In short, this outcome is good in the short term and allows development to continue, but in the long term we trap ourselves.
Patch the code generation of go-fed/activity to produce a codebase that doesn't create a large binary and allows loading types conditionally. This is the perfect solution! But we can only realistically consider this in a perfect world where we can postpone the development of the federation. This will require time to explore the code generation of go-fed/activity and come up with an architecture that allows us to load the structs conditionally (a.k.a. make it so Go can dead-code it). This is a long-term investment into federation, but it does solve the fundamental problem of the huge binary size. This is a risky outcome, as it's possible that the current architecture is the only and best one that can be generated by code.
Conclusion
Now that we know the three possible outcomes that we can choose from, we need to give this some careful thought and hopefully come up with the best solution for this situation. Personally, I cannot even decide what would be the best realistic approach to this problem with these possible outcomes. However, we should make some educated guesses and assumptions here and there in order to make a collaborative effort to choose a solution.
I hope we can conclude that this is not a logical or an ideological problem, but a very technical one that we can only solve by using each other's strengths and knowledge.
If you read through all of this, you deserve a cookie! 🍪
I spent some quality time with go-fed/activity and its asttool. I've tried to do some magic (outcome 2/3) and concluded that it's near impossible to fix this.
I've also realized why dead-code won't work: even if I removed func init()... from all packages, every struct is in some way connected to the others, so the dead-code elimination is very minimal, if anything. This is due to the "flexibility" and the range of input that each type can receive, so a lot of structs/interfaces are included because they're used in some way or another.
While trying to at least optimize some of these interfaces, I found that the shared manager requires defining, for each type, the interfaces that it needs to handle, such that someone can "dependency inject" into this and define their own type. Removing this shared manager and making a package out of it leads to import cycles, so that's also out of the way.
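To make the mechanism concrete, here is a minimal sketch of how a shared registry that references every type keeps all of them reachable, so the linker cannot drop the unused ones. This is not go-fed's actual code; the types Note, Place and Travel, the registry variable, and newByName are hypothetical stand-ins for the generated vocabulary and its shared manager.

```go
package main

import "fmt"

// Three hypothetical vocabulary types. Assume the application only ever
// needs Note.
type Note struct{ Content string }
type Place struct{ Latitude, Longitude float64 }
type Travel struct{ Origin, Target string }

// registry plays the role of a shared manager: it maps a type name to a
// constructor for every type in the vocabulary.
var registry = map[string]func() any{}

func init() {
	// These closures reference every type, so all of them (and whatever
	// they reference in turn) count as reachable when the binary is linked.
	registry["Note"] = func() any { return &Note{} }
	registry["Place"] = func() any { return &Place{} }
	registry["Travel"] = func() any { return &Travel{} }
}

// newByName looks up a constructor at run time, so it cannot be known
// statically which of the registered types will actually be needed.
func newByName(name string) (any, bool) {
	ctor, ok := registry[name]
	if !ok {
		return nil, false
	}
	return ctor(), true
}

func main() {
	v, _ := newByName("Note")
	fmt.Printf("%T, %d types registered\n", v, len(registry))
}
```

Even though main only asks for a Note here, Place and Travel stay in the binary because the registry refers to them. With a handful of small types this costs nothing noticeable; the problem described above appears when the same pattern covers the whole vocabulary and the types also reference each other.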
I was pretty optimistic about fixing this, but it seems the latter two outcomes won't be possible in this codebase. Doing the second outcome manually isn't really recommended, because of the mentioned "every type is linked to each other one way or another". It's possible, but my educated guess is that you will still end up with a lot of binary size.
If we still want to reduce binary size, we would need to hand-craft an ActivityPub implementation. This way we can ensure we don't create a web of types linked to each other, and we can optimize for Gitea's use case. Go-fed's current codebase with the awesome asttool isn't the right fit for this. No machine would be able to generate our fit, only humans can.
Feel free to try it yourself, but while the current codebase architecture is weird and doesn't make any sense logically, it works and is factually correct to the specification. It's machine-generated code, so don't expect any logical optimization going on.
For now, I advise using build tags for the federation feature and recommend never checking the binary size.
This is an impressive and thorough analysis. It is not good news but it is very solid, and it looks like there is no way to use go-fed while satisfying the size constraint of Gitea. Using a build tag would be a good strategy to allow federation development to move on and mature. However, it also means that more and more code dependent on go-fed will be created with no hope of being distributed as part of the Gitea releases in the foreseeable future.
There is a third option: not using go-fed and instead creating / parsing the ActivityPub JSON in a way that conforms to the specifications. It may be easier thanks to the introduction of generics in Go 1.18: a flexibility that the author of go-fed did not benefit from at the time.
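As a rough idea of what hand-written ActivityPub JSON could look like, here is a minimal sketch of a "Person" actor built with plain structs and encoding/json. The choice of fields, the NewPerson helper, and the example URL layout are assumptions for illustration only; a real implementation would need more properties and validation against the specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a hand-written subset of an ActivityPub actor. Only a few
// properties are modelled; the selection is an assumption for illustration.
type Person struct {
	Context           string `json:"@context"`
	Type              string `json:"type"`
	ID                string `json:"id"`
	PreferredUsername string `json:"preferredUsername"`
	Inbox             string `json:"inbox"`
	Outbox            string `json:"outbox"`
}

// NewPerson fills in the fixed fields and derives inbox/outbox from the
// actor ID. The URL layout is hypothetical.
func NewPerson(id, username string) Person {
	return Person{
		Context:           "https://www.w3.org/ns/activitystreams",
		Type:              "Person",
		ID:                id,
		PreferredUsername: username,
		Inbox:             id + "/inbox",
		Outbox:            id + "/outbox",
	}
}

// JSON serializes the actor with the standard library encoder.
func (p Person) JSON() ([]byte, error) {
	return json.Marshal(p)
}

func main() {
	p := NewPerson("https://a.example/u/alice", "alice")
	out, err := p.JSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The trade-off is the one discussed in this thread: the compiler only checks that the struct is well-formed Go, not that the resulting document is a valid actor, so correctness depends on the author knowing the protocol.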
@Ta180m experimental implementation for inbox/outbox has a few examples of the level of understanding required to use go-fed and the verbosity of the code involved. For instance to create a "Person" response one needs to know:
The focus of go-fed is to avoid mistakes that can be caught at compile time. But there are many ways to craft an incorrect response that won't be caught, which is the main reason why a good understanding of the ActivityPub protocol is required.
If the size of the binary was not a blocker I would be inclined to use go-fed even if just for that benefit. More safeguards are better. But given that the size is a blocker, not using go-fed seems like the better option.
Just to note here: it's the architecture that go-fed is using that cannot satisfy the size constraint. It doesn't allow any dead-code removal, because every type will always be used and is currently being generated with a lot of duplicated code due to machine code generation. But yeah, the current state of go-fed won't fit for Gitea.
I do like this idea, as it would allow creating a library that's made with Gitea in mind. Gitea doesn't utilize the full range of ActivityPub and thus only has to consider certain types. Also, human-written code will always be better at optimizing (size/performance wise).
ActivityStreams (also referred to as 'types' in my previous messages) is flexible in its types, which go-fed does take into account, because it's built as a library that can be used by anyone. But Gitea obviously doesn't need to implement a type for GEO events, so Gitea can just implement those types that we need.
So far, our usage in the two Gitea PRs seems to be small enough to still create our own library and be able to switch to a new API.
We want to decide this before we merge the first PR.
There are two manually implemented ActivityPub codebases that can be observed in Python to verify it is not overly complex to start from scratch:
The bookwyrm example is particularly interesting because ActivityPub is used for something different from microblogging.
Since those libraries are BSD-3 licensed, can we just copy some necessary packages from them?
I believe that's what @Gusted had in mind when they wrote:
as well as the followup comment.
This is what CJ (the go-fed maintainer) had to say about the binary size: https://discourse.gitea.io/t/notes-from-cory-regarding-the-go-fed-binary-footprint/4936
I'm open to the possibility of not using go-fed but I still think it is a very useful library. My personal preference for moving forward would be to use a build tag, but this results in a whole bunch of problems. However, there are two details that make this option slightly less painful:
Most people host Gitea using Docker. This means that the Docker image maintainers can easily use Docker tags to create one image with federation and one without.
The gitea binary with go-fed compresses down to almost the same size as gitea without go-fed, so it would not use much more bandwidth to distribute it.
This idea is still far from ideal and might be hurting us long-term, but it's definitely the easiest way to move forward.
Also, what are the consequences of going with a build tag and "ignoring" the increased binary size? As I mentioned above, the compressed binary is almost the same size (Another example of this is how Owncast's compressed releases didn't get much bigger after adding go-fed), so bandwidth requirements should only be a little bit higher. I also analyzed Gitea with go-fed's memory usage versus main branch Gitea and it seems to only be about 9% higher (245MB vs 225MB). For a small device like a Raspberry Pi, 20MB more RAM usage is moderately significant but perhaps still tolerable?
On the other hand, I'm also open to not using go-fed at all and implementing ActivityPub from scratch with generics, but that would also require somewhat more work.
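One place where generics might help a from-scratch implementation: ActivityStreams allows many properties to hold either a single value or an array of values. The sketch below handles that with one generic type instead of one hand-written wrapper per property type. OneOrMany, Activity and parseActivity are hypothetical names, and the Activity struct is a deliberately tiny assumption, not a complete model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OneOrMany decodes a JSON property that may be a single value or an array
// of values, which ActivityStreams permits for many properties.
type OneOrMany[T any] []T

// UnmarshalJSON accepts either `x` or `[x, y, ...]`.
func (o *OneOrMany[T]) UnmarshalJSON(data []byte) error {
	// First try the array form. Decoding into []T (not OneOrMany[T])
	// avoids calling this method recursively.
	var many []T
	if err := json.Unmarshal(data, &many); err == nil {
		*o = many
		return nil
	}
	// Fall back to a single value and wrap it in a one-element slice.
	var one T
	if err := json.Unmarshal(data, &one); err != nil {
		return err
	}
	*o = OneOrMany[T]{one}
	return nil
}

// Activity is a hypothetical, minimal activity envelope.
type Activity struct {
	Type string            `json:"type"`
	To   OneOrMany[string] `json:"to"`
}

// parseActivity decodes raw JSON into an Activity.
func parseActivity(raw string) (Activity, error) {
	var a Activity
	err := json.Unmarshal([]byte(raw), &a)
	return a, err
}

func main() {
	inputs := []string{
		`{"type":"Create","to":"https://a.example/u/alice"}`,
		`{"type":"Create","to":["https://a.example/u/alice","https://b.example/u/bob"]}`,
	}
	for _, raw := range inputs {
		a, err := parseActivity(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(a.Type, len(a.To))
	}
}
```

The same OneOrMany type can be reused for properties holding structs rather than strings, which is the kind of duplication that previously had to be generated or written by hand for each type.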
IMHO:
The learning curve for generics is not negligible, indeed.
https://github.com/go-ap is another ActivityPub implementation in Go that is actively maintained and has a small binary size. See the discussion on the mailing list for more information.