Add Gitea Actions act runner #596
No reviewers
Labels
No Label
has
backport
in progress
invalid
kind
breaking
kind
bug
kind
build
kind
dependency
kind
deployment
kind
docs
kind
enhancement
kind
feature
kind
lint
kind
proposal
kind
question
kind
refactor
kind
security
kind
testing
kind
translation
kind
ui
need
backport
priority
critical
priority
low
priority
maybe
priority
medium
reviewed
duplicate
reviewed
invalid
reviewed
wontfix
skip-changelog
status
blocked
status
needs-feedback
status
needs-reviews
status
wip
upstream
gitea
upstream
other
No Milestone
No Assignees
6 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: gitea/helm-chart#596
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "dementhorr/helm-chart:gitea-actions"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Description of the change
This adds support for Gitea actions that uses act runner. It allows for automatic act runner deployment.
Benefits
Users no longer need to deploy Gitea act runners manually.
Applicable issues
Closes #459.
Additional information
Following the discussion on #459, I agree with monester comments.
As specified in this comment I have saved the runner token in a k8s secret.
For this, it was required to create the secret using the Gitea CLI command
gitea actions generate-runner-token
and then to use the kubectl command with a service account to create the secret. This has been implement as a Job. The other way of generating the token can be seen in the discussion (using initContainers and changing the env variable to point to the file where the token is), but in order to create the k8s secret one needs to use a job with another service account. This chart uses the same way of storing a k8s secret that was generated with a CLI command.As specified by monester, the act-runner must be a statefulset that has persistence volumes since the .runner is different for each pod and it can't be deregistered.
Also, the upstream problems from this comment still stand:
On second thought, the lifecycle.postStart.exec.command can be removed since you can install it inside the Gitea Action:
@dementhorr Thanks for contributing!
Both me and @justusbunsi are quite busy during the start of the year. I'll try to add my review within the next days so we can make some progress/get an initial discussion going.
Yet what would definitely be a requirement in getting this merged (besides all details TBD) is the addition of unit tests (see the existing ones).
Thanks again for starting here! The effort is much appreciated.
I hope my review is not too overwhelming and you brought some resource for discussion. Some points still need discussion. We can also do so in the Gitea discord to not bloat this PR too much - feel free to ping me there.
I would suggest adding
templates/act_runner
and putting all new templates there instead oftemplates/gitea/actions/
.Open discussion points
Disclaimer: I haven't done a deep dive into act runner yet (and have no time right now), so everything I propose/state here relies on other, similar projects (GitHub self hosted runners, other charts which server external runners)
Do we need a statefulset? We moved from a statefuleset to a deployment (+RWX) in the chart some time ago for easier replica scaling.
Given that there is no need for an internal leader selection in the act runner, a statefulset might not be needed or even make sense. This of course only applies if the persistent config is shared between them using a RWX. Other "runner" charts all make use of deployments, see e.g. https://github.com/actions/actions-runner-controller/blob/master/charts/gha-runner-scale-set-controller/templates/deployment.yaml.
Persistence: is it even needed? Or could the act runner also just live with env vars set in the deployment resource and the k8s secret for connectivity? Is it needed for logs or are they stored in the persistent dir of Gitea itself? If not, maybe we can configure it do so? Does the runner have an integrated logrotate config?
Persistence
Act runner should have it's own persistence mapping. This allows for more flexibility and less conflicts with the gitea persistence one. I wonder if it would even work this way since they would share the same PV/PVC right now AFAICS.
Images
We have the policy to use semver tags everywhere instead of latest. The repo uses renovate and all semver tags are updated in a bundled way in a specified interval. This makes it easier to control updates and inspect what could have broken something (yes, even if it's only a busybox image :))
Tests
For the tests, we should add some "meaningful" ones, i.e. we don't need to check that the configmaps/statefulsets are just existing (this is usually not an issue) but that they also "do" what they are supposed to do. I.e. if certain mappings result in certain resources/values which are created dynamically. Specifically, checking for actions of non-default values and their validity.
@ -0,0 +15,4 @@
create_token() {
echo "Waiting for new token to be generated..."
begin=$(date +%s)
end=$((begin + 300)) # 5 minutes
I think this should be way shorter. If a token doesn't exist or cannot be created for some reason, it won't resolve itself anyhow. I think 10-15 seconds would also suffice?
@ -0,0 +34,4 @@
fi
if ! create_token; then
echo "Timed out waiting for a token to appear."
This reminds me about the "good old days" waiting for Pokemon XY to appear 😛
Suggestion:
"Checking for an existing act runner token in secret $SECRET_NAME timed out after $time".
@ -0,0 +7,4 @@
labels:
{{- include "gitea.labels" . | nindent 4 }}
data:
config.yaml: |
The act runner config should be configurable in
values.yml
.@ -0,0 +6,4 @@
name: {{ include "gitea.fullname" . }}-scripts
labels:
{{- include "gitea.labels" . | nindent 4 }}
annotations:
Scheduling it as a post-install hook makes sense. Why did you comment it out here?
I tried it but it didn't work. I had the chicken-egg problem, in order for the deployment to be completed it had to have the secret, but to have the secret, the deployment should have completed, so it was a deadlock.
@ -0,0 +28,4 @@
spec:
containers:
- name: actions-token-create
image: "{{ .Values.actions.job.tokenImage.repository }}:{{ .Values.actions.job.tokenImage.tag | default "latest-rootless" }}"
Instead of
latest-rootless
we should use a semver tag here and versionize it viarenovate
.Changed the tag, but didn't modify renovate config.
@ -0,0 +41,4 @@
sleep 5
done
echo "Generating token..."
echo "Generating act_runner token via 'gitea actions generate-runner-token' and storing in kubernetes secret..."
@ -0,0 +53,4 @@
subPath: {{ .Values.persistence.subPath }}
{{- end }}
- name: actions-token-upload
image: "{{ .Values.actions.job.publishImage.repository }}:{{ .Values.actions.job.publishImage.tag | default "latest" }}"
Instead of
latest
we should use a semver tag here and versionize it viarenovate
.Changed the tag, but didn't modify renovate config.
@ -0,0 +62,4 @@
- sh
- -c
- |
printf "Checking rights to update secret... "
printf "Checking rights to update kubernetes act_runner secret... "
@ -0,0 +8,4 @@
annotations:
# helm.sh/hook: post-install
# helm.sh/hook-delete-policy: never
argocd.argoproj.io/hook: Skip
What are those argocd hooks needed for?
@ -0,0 +27,4 @@
spec:
initContainers:
- name: init-gitea
image: busybox:latest
Instead of
latest
we should use a semver tag here and versionize it viarenovate
.Changed the tag, but didn't modify renovate config.
@ -0,0 +32,4 @@
- sh
- -c
- |
while ! nc -z gitea-http 3000; do
The port could differ and should dynamically use the respective port of the main gitea deployment.
@ -0,0 +37,4 @@
done
containers:
- name: act-runner
image: "{{ .Values.actions.statefulset.actRunnerImage.repository }}:{{ .Values.actions.statefulset.actRunnerImage.tag | default "latest" }}"
Instead of
latest
we should use a semver tag here and versionize it viarenovate
.Changed the tag, but didn't modify renovate config.
@ -0,0 +52,4 @@
secretKeyRef:
name: {{ $secretName }}
key: token
- name: GITEA_INSTANCE_URL
This should be configurable/exposed in
values.yml
.Used the already defined variables from
values.yml
.@ -0,0 +54,4 @@
key: token
- name: GITEA_INSTANCE_URL
value: http://gitea-http:3000
- name: GITEA_RUNNER_LABELS
This should be configurable/exposed in
values.yml
.@ -0,0 +67,4 @@
- mountPath: /data
name: data-act-runner
- name: dind
image: "{{ .Values.actions.statefulset.dindImage.repository }}:{{ .Values.actions.statefulset.dindImage.tag | default "24.0.7-dind" }}"
We can expose the tag in
values.yml
directly and let the version be maintained byrenovate
.Changed the tag, but didn't modify renovate config.
@ -0,0 +5,4 @@
templates:
- templates/gitea/actions/config-act-runner.yaml
tests:
- it: renders a deployment
I haven't seen a way to unregister an act runner, only to register it. This is a problem since they are registered in the Runners Management UI. If we don't store the the
.runner
config file in a persistent volume, each time a new runner will start it will have another name. So that means there will be very many runners. Right now (when using StatefulSet with persistence) I get:If I scale the StatefulSet to 3 replicas, I get:
With a deployment, each time I scale, I would get another act-runner instance, so if I scale up and down 5 times from 1 to 5 replicas, I would have 1 Idle
gitea-act-runner-XXXXXXX
and 20 Offlinegitea-act-runner-XXXXXX
as I didn't find a way to really delete (unregister) act runners.So the real reason for the StatefulSet is not for a leader, but to be able to store the
.runner
file. If this doesn't matter (the clutter), we can easily switch to a deployment and just ignore the offline runners.The act runners' logs are stored in the gitea PVC, so we only need the PVC for the
.runner
file. If we switch to deployments, we won't need any PVCs.Persistence for the Gitea dir is needed in order to store the Gitea Actions Token as a Kubernetes Secret. Right now, the token is stored in the the persistent Gitea dir
/data/actions/token
and the Job mounts the volume such that kubectl can patch the secret. Without persistency, the Job can't access the token.Regarding renovate, I don't exactly know how to use and test it. I have changed the tags from latest to a semver tag, configurable in
values.yaml
.@dementhorr I don't have enough time right now to get back to this, it's on the list though!
Thanks for your patience!
Deployment vs. Statefulset
You can also attach a PV to a deployment. That's how the chart currently works. The
deployment
resource doesn't necessarily mean it's stateless. The resource kind also has an effect on how k8s treats the pods WRT to updating/replacement. So deployment + RWX works quite well, what doesn't work so well is deployment + RWO or statefulset + RWX.Statefulsets vs. deployments is a bigger topic which we already discussed before making the switch. Yet this doesn't mean we are bound to only deployments now, it always a new decision to make.
If we really need distinct pods for each runner with a persistent config then I guess statefulset + RWO (with a very small size) would be the best. What do you think?
One point which is still missing is affinity + nodeSelector. These are needed to control the placement of the agents, most commonly to force a placement on
arm64
oramd64
nodes. Would you mind adding these to the statefulset definition?Dind
Do we really need it? Some months ago we removed a similar approach in the Woodpecker helm chart as dind was not needed anymore. I need to check again on the details but maybe we could also omit it here.
I have added affinity, tolerations and nodeSelector for the statefulset and for the job.
I don't think that using a deployment with a RWX would work since the Gitea act runner would create a
.runner
file in the current working directory and as soon as another pod would start it will overwrite that file. Maybe there is a way to create a new dir for each runner and change the dir withworkingDir
to match the hostname, but I think it would make the chart more complicated.In my opinion we should either stick with the StatefulSet with a very small volume to accomodate the
.runner
file, or switch to Deployment without any volumes, and maybe in the future Gitea will do some garbage collection to delete inactive runners.Regarding Dind, I don't think the act runners work the same way in Gitea as in Woodpecker. From my understanding, each step in Woodpecker would spawn a new container (inside a pod) that would run in the cluster (so it's like a First Citizen approach). For the act runner it seems that all the steps run in the same container and that's why we have two containers per runner (1 agent, 1 dind) (so there isn't any transparency as with Woodpecker k8s backend). I may be wrong, though.
@ -980,6 +980,41 @@ To comply with the Gitea helm chart definition of the digest parameter, a "custo
| `signing.privateKey` | Inline private gpg key for signed Gitea actions | `""` |
| `signing.existingSecret` | Use an existing secret to store the value of `signing.privateKey` | `""` |
### GiteaActions
Gitea Actions
@ -983,0 +984,4 @@
| Name | Description | Value |
| ----------------------------------------------- | --------------------------------------------------------------------------- | ------------------ |
| `actions.statefulset.enabled` | Create an act runner StatefulSet. | `false` |
I'd suggest using
actions.enabled
as a global flag for enabling actions.@ -983,0 +997,4 @@
| `actions.statefulset.actRunnerImage.tag` | The Gitea act runner tag | `0.2.6` |
| `actions.statefulset.actRunnerImage.pullPolicy` | The Gitea act runner pullPolicy | `IfNotPresent` |
| `actions.statefulset.dindImage.repository` | The Docker-in-Docker image | `docker` |
| `actions.statefulset.dindImage.tag` | The Docker-in-Docker image tag | `24.0.7-dind` |
let's bump to 25.0.2
@ -47,1 +47,4 @@
{{/*
Create a default worker name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
I'd leave this comment out as it's too k8s-internal-specific and might cause more confusion than it possibly helps. We also don't have stdout comments in other places and I haven't seen any other chart doing this.
@ -0,0 +1,11 @@
{{- if and (and .Values.actions.job.enabled .Values.persistence.enabled) .Values.persistence.mount }}
Would the non-creation of this part still lead to a functional actions?
If not, I would suggest to only condition essential parts on
actions.enabled
.In the current chart format, you can use the
act-runner
without the job and the persistence volumes if you manually generate thetoken
(via Web UI or running a shell command into the pod) and save it to a Secret. After that you can specifyactions.existingSecret
andactions.existingSecretKey
.@ -334,0 +380,4 @@
config: ""
runnerLabels: ""
actRunnerImage:
I'd remove the
Image
part from the name. Same fordind
.Yes, this would not make sense. I also didn't mean it that way, i.e. using RWX across all agents. I just explained the general "use" or deployment/statefulset and their possible combinations. E.g. for Gitea itself deployment + RWX makes a lot of sense as each deployment should share the same underlying files but can still act as an independent deployment resource.
I agree. Given that we don't now if the GC will ever happen, let's go with a STS and a small RWO volume.
Dind: In WP the
buildx
plugin is used which spawns the docker engine itself in the image, so no additional sidecar/service is needed. With GA, one needs to bring the docker engine yourself. So yes, in this case adind
sidecar is needed. So it's not about pod vs. container (btw in WP currently each step is it's own distinct pod but that will likely change) but about "who brings the docker engine". An alternative might actually bepodman
which could be included in the base image for the runner and then wouldn't require an additional service to boot up first. But this is another topic.Moving forward: I assume you have tested this branch already and are likely also using it since some time? I think we are not too far from a merge, though I would definitely want to have a review from @justusbunsi too. He usually sees things I don't see 😄
I just found a setting in the
Site Administration
>Runners
where one can edit current runners (including deleting inactive runners). I haven't found a downside to having inactive runners (just that they clutter the UI), so there is a way to delete them for estetic purposes.I also have change the code from StatefulSet > Deployment. I can revert the commit if you prefer the StatefulSet.
They also clutter the DB at some point. When running in HA and frequent pod movements, the number can get quite large quickly.
I think a statefulset with a small RWO volume would be my preference right now.
@dementhorr Thanks so far!
Open points:
Review comments
README documentation
Review from Justus
@dementhorr Did you already test this on a cluster of yours?
I think it would also be good if we would store a dev-focused tech-documentation alongside the templates, i.e. how all components play together, especially all the jobs and scripts.
Hello! I am interested in testing this Actions runners addition and would be happy to share/help solve bugs that may crop up. Thanks to @dementhorr and @pat-s for your contributions to this feature - deploying runners easily and in an automated fashion would be great!
I have added documentation for the Job. I don't know exactly where to place it.
I have tested the chart on my cluster and it works fine.
@dementhorr Great, thanks for the feedback! I'll try to finish everything up in the next weeks. The plan is to include it into the release which ships v1.22 (1.22 is currently in feature freeze).
Given that it is disabled by default, we can see which issues occur and fix them in subsequent releases.
@justusbunsi Your review is still welcome. If you can't make it in time, we can also address your comments post-merge. Given the amount of work and discussion that went into this already, I think it would be good to release this to users with the next version.
Apologies for not responding earlier. Even with being pinged twice. I am not sure if I have enough time until Friday but would love to give my review the upcoming weekend. Afaik this should still be within the feature freeze time slot of Gitea 1.22. Does this work for everyone?
Hi everyone. First of all, thank you @dementhorr for contributing to this Chart in general, and to propose Gitea Actions in particular. 👍 I appreciate every minute you already have put into it. And @pat-s for reviewing so far.
Its much more complex than first thought, so I'll split my review into 2 parts:
chart user perspective
(this very review below)technical perspective
(will follow in the next days)@ -980,6 +980,41 @@ To comply with the Gitea helm chart definition of the digest parameter, a "custo
| `signing.privateKey` | Inline private gpg key for signed Gitea actions | `""` |
| `signing.existingSecret` | Use an existing secret to store the value of `signing.privateKey` | `""` |
### Gitea Actions
This section misses a Table-of-content reference at the top of this file.
@ -0,0 +2,4 @@
In order to use the Gitea Actions act-runner you must:
- set the following environment variables to `deployment.env` (modify LOCAL_ROOT_URL if you used a different service name):
Obsolete documentation when considering https://gitea.com/gitea/helm-chart/pulls/596/files#issuecomment-811808
@ -0,0 +13,4 @@
value: http://gitea-http:3000
```
- enable persistence (used for automatic deployment to be able to store the token in a place accessible for the Job)
Obsolete documentation when considering https://gitea.com/gitea/helm-chart/pulls/596/files#issuecomment-811808
@ -0,0 +17,4 @@
log:
level: debug
cache:
enabled: false
Let's move the default config.yaml definition into
values.yaml
. That way it should be more obvious what content it should contain. And we eliminate the if-else here. 🙂@ -0,0 +56,4 @@
- name: GITEA_INSTANCE_URL
value: "http://{{ include "gitea.fullname" . }}-http:{{ .Values.service.http.port }}"
- name: GITEA_RUNNER_LABELS
value: "{{ .Values.actions.statefulset.runnerLabels | default "ubuntu-latest" }}"
Let's move the default value into
values.yaml
. Eliminates the default usage here. 🙂As an alternative thought: I've noticed that when running
act_runner generate-config
, you'll get a yaml structure where you can define a list of labels for the runner viarunner.labels
. Was there a reason for providing the labels as environment variable instead through the config.yaml? If not - and given the fact that it works - I suggest to moveubuntu-latest
into the config.yaml and drop this environment variable and its values.yaml spec entirely.@ -0,0 +80,4 @@
- name: DOCKER_CERT_PATH
value: /certs/server
securityContext:
# allowPrivilegeEscalation: true
Is this comment a left-over from testing?
yes
@ -333,1 +333,4 @@
# Configure Gitea Actions
# - must enable persistence if the job is enabled
# - must define deployment.env.GITEA__ACTIONS__ENABLED and GITEA__SERVER__LOCAL_ROOT_URL
As mentioned for
GITEA__ACTIONS__ENABLED
in https://gitea.com/gitea/helm-chart/pulls/596/files#issuecomment-811807, these preconditions should be part of the templating logic.actions.enabled: true
and at the same timepersistence.enabled: false
.GITEA__SERVER__LOCAL_ROOT_URL
what is also dynamically built intemplates/gitea/act_runner/statefulset.yaml
forGITEA_INSTANCE_URL
? We can define that environment variable for the user via helpers.tpl, too.The chart must throw an error when
actions.provisioning.enabled: true
andpersistence.enabled: false
. Otherwiseactions.enabled: true
withpersistence.enabled: false
is valid ifactions.existingSecret
andactions.existingSecretKey
is defined. I have modified thereadme-actions-dev.md
file.I wasn't able to modify the
GITEA_INSTANCE_URL
using the.Values.gitea.config.actions
variable from_helpers.tpl
.My initial comment was misleading - just realized that now. I didn't mean we should set
GITEA_INSTANCE_URL
via_helpers.tpl
. This value is not known to theapp.ini
. ButGITEA__SERVER__LOCAL_ROOT_URL
is. And since the value for both environment variables are identical, there is no technical reason to define one but let the user define the other. 🙂Sorry for that confusion.
@ -334,0 +343,4 @@
## @param actions.statefulset.nodeSelector NodeSelector for the statefulset
## @param actions.statefulset.tolerations Tolerations for the statefulset
## @param actions.statefulset.affinity Affinity for the statefulset
## @param actions.statefulset.config Act runner custom configuration.
Suggestion:
Or, combined with https://gitea.com/gitea/helm-chart/pulls/596/files#issuecomment-811801 (to prevent broken README generation 😆):
@ -334,0 +344,4 @@
## @param actions.statefulset.tolerations Tolerations for the statefulset
## @param actions.statefulset.affinity Affinity for the statefulset
## @param actions.statefulset.config Act runner custom configuration.
## @param actions.statefulset.runnerLabels Act runner labels.
Suggestion:
@ -334,0 +367,4 @@
## @param actions.existingSecret Secret that contains the token
## @param actions.existingSecretKey Secret key
actions:
enabled: false
The current approach of only documenting what a chart user has to do manually is not ideal. Setting
actions.enabled: true
should automatically configure the instance to have enable Gitea Actions. Means, it should either definegitea.config.actions.enabled
as provided inline values while templating, or inject the environment variableGITEA__ACTIONS__ENABLED=true
to the app.ini processing.To keep the amount of injected envs as low as possible, I would prefer we do it via inline values. There already is a method for that in helpers.tpl defining other default values. Here, we also can detect conflicting configurations (e.g.
actions.enabled: true
and user-defined inlinegitea.config.actions.enabled: false
) and fail the templating process with an appropriate error message.Making that part of the chart logic reduces usage errors.
@ -334,0 +377,4 @@
affinity: {}
config: ""
runnerLabels: ""
Since
config
andrunnerLabels
are only used within theactRunner
container, we should move those values settings intoactions.statefulset.actRunner
to make that clear. It also gives a better understanding for administrators.@ -334,0 +390,4 @@
pullPolicy: IfNotPresent
job:
enabled: false
Since
job.enabled: true
basically means to automatically connect the runner to Gitea, I suggest renaming the object fromjob
toprovisioning
orautoConnect
. That makes it easier for users to understand its purpose. The fact its a Kubernetes Job is not important for users.@ -321,6 +345,15 @@ https
{{- if not .Values.gitea.config.indexer.ISSUE_INDEXER_TYPE -}}
{{- $_ := set .Values.gitea.config.indexer "ISSUE_INDEXER_TYPE" "db" -}}
{{- end -}}
{{- if not .Values.gitea.config.actions.GITEA__ACTIONS__ENABLED -}}
The whole actions related
app.ini
block is rendered when runninghelm template my-release .
, although it is disabled by default. Looks like this check is missing. And a test that covers this.Any help needed on this? I’d be happy to contribute
We haven't heard from @dementhorr for two months now. A lot of time was already put in. The current state is a bit unclear.
@vjm If you want to continue, maybe use a separate PR based on this one to continue. This would allow @dementhorr to eventually continue here.
Hey 👋
What requirements are left to be achieved in order to merge it ?
From the comments I can see:
actions
whenactions.enabled: false
. @justusbunsi you seem to have suggested to test that, right ? I'm not familiar withunittest
, does it means testing thehelpers.yaml
file output ?If it cannot, it is required to ship this chart ?
Is moving from statefulset to deployments a requirement ?
@pat-s , @justusbunsi -- I have rebased in #666 based on this pull request. Happy to make changes if they are needed, but are we good to merge #666 in current state and make revisions as needed?
Closed in favor of #666
Pull request closed