replicaCount fails on other values than 1 #604

Closed
opened 2024-01-20 16:12:27 +00:00 by mrdunski · 15 comments
Is this line intended? https://gitea.com/gitea/helm-chart/blame/branch/main/templates/_helpers.tpl#L8
Author

I'm not sure about bleve a few lines below, but I think it could work correctly when there is a RWX PV. Updates are synchronized via a queue and run on a single instance.

Member

Multiple replicas with individual RWO PVs won't work, as there will be write conflicts between different sessions. There is no central management within Gitea which would handle/queue those parallel writes, so conflicts or failures can occur. This is why we require a single RWX volume when using multiple replicas.
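
For illustration, roughly the kind of values excerpt this expects when running multiple replicas (a sketch only; the storage class name is a placeholder):

```yaml
replicaCount: 3
persistence:
  enabled: true
  accessModes:
    - ReadWriteMany                  # one shared RWX volume for all replicas
  storageClass: my-rwx-storage-class # placeholder; must support ReadWriteMany
```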

This is hard to test on your own as you would need to mock almost parallel write requests from multiple sessions to trigger the issue.

(Yet I am not fully sure I understand your second comment. -> What "could work if there is a RWX PVC"?)

pat-s added the kind/question label 2024-01-22 09:42:47 +00:00
Member

Closing as likely stale, feel free to continue commenting if needed.

pat-s closed this issue 2024-02-11 11:55:39 +00:00
Author

Probably I'm wrong. Please explain.

Lines 9-23 seem to be dead code. I might have misunderstood their purpose, but I don't think you can actually trigger those lines.

If I'm wrong, what values.yaml would trigger the error messages from lines 9-23?

My suggestion is to remove the dead code or refactor the misplaced failure (I can prepare that change, but I'm not sure if I'm missing something).

Regarding the second comment: I meant that Gitea uses jobs and queues to manage indexing, and it is designed to handle multiple instances. As far as I know this is well-tested behavior and I have used it in the past without errors (although maybe I was just lucky and there are some known bugs).

mrdunski reopened this issue 2024-02-11 14:44:06 +00:00
Author

Or let me put it another way: this line means that if replicas is greater than 1, the chart fails with an irrelevant message.

What is the point of the rest of the checks if this chart is designed to work with a single instance only?

Member

To answer the initial question: yep, looks like dead code.

Member

> If I'm wrong, what values.yaml would trigger the error messages from lines 9-23?

`replicaCount: 2` should enter lines 9-23, and then all the other conditions apply.
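
For example, roughly this kind of values file (an illustrative sketch, not verified against the current chart) should hit those checks:

```yaml
# hypothetical minimal excerpt that the assertions are meant to reject
replicaCount: 2
persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce   # >1 replica without RWX should trip the RWX assertion
```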

L8 (https://gitea.com/gitea/helm-chart/src/commit/829bca241d4a829db411124741f53ffbb83431bc/templates/_helpers.tpl#L8) is a mistake and should be removed, yes:

`{{- fail "When using multiple replicas, a RWX file system is required" -}}`

EDIT: I just realized that none of the assertions in `_helpers.tpl` are effective. Only the ones in `config.yaml` are. They were duplicated anyhow. I did a cleanup and added tests in the linked PR.

Given that I am running Gitea in HA and not encountering the failure, something seems off, yes.
I tested all of the assertions in a DEV instance back when I added the code, though, so they work *in general*. I will add some tests.
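
For reference, a rough sketch of what such a test could look like with the helm-unittest plugin (the file path, target template, and exact error wording are assumptions, not the actual test from the PR):

```yaml
# tests/multiple-replicas_test.yaml (hypothetical path)
suite: multiple replicas require RWX
templates:
  - templates/gitea/config.yaml   # assumed location of the effective assertion
tests:
  - it: fails when replicaCount > 1 without a RWX access mode
    set:
      replicaCount: 2
      persistence:
        enabled: true
        accessModes:
          - ReadWriteOnce
    asserts:
      - failedTemplate:
          errorMessage: When using multiple replicas, a RWX file system is required
```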

> Regarding the second comment: I meant that Gitea uses jobs and queues to manage indexing, and it is designed to handle multiple instances. As far as I know this is well-tested behavior and I have used it in the past without errors (although maybe I was just lucky and there are some known bugs).

As far as we/I know, Bleve does not "simply" work in multi-instance mode, which is why there has been an effort to push for another alternative besides elastic, which does work in such scenarios. I cannot fully prove it though. It surely would be great if bleve would *definitely* work in HA.
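
For context, this is the kind of indexer configuration meant by the elastic route (an illustrative excerpt; host, credentials, and the exact connection string are placeholders):

```yaml
gitea:
  config:
    indexer:
      ISSUE_INDEXER_TYPE: elasticsearch
      ISSUE_INDEXER_CONN_STR: http://elastic:changeme@elasticsearch:9200 # placeholder
```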

> What is the point of the rest of the checks if this chart is designed to work with a single instance only?

I don't understand that comment. The chart is clearly aiming for HA, see the README etc. L8 is just a mistake.

Author

> I don't understand that comment. The chart is clearly aiming for HA, see the README etc. L8 is just a mistake.

I wasn't sure if the line I pointed to was intentional, and I wasn't sure how to understand your previous comment (I was pointing at the line that forces this chart to work with a single instance, and you were saying this is correct, so I wasn't sure why).

> As far as we/I know, Bleve does not "simply" work in multi-instance mode (...)

I think I was misunderstood. I'm not saying bleve can handle multiple instances. As far as I know, Gitea uses [queues](https://docs.gitea.com/administration/config-cheat-sheet#queue-queue-and-queue) and does not update bleve on multiple instances.
You do need queues that can handle multiple instances (like redis), though.
Is there any documentation that states this is not thread safe and that those assertions are actually needed?
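
For context, this is the kind of queue configuration I mean (an illustrative excerpt; the Redis address is a placeholder):

```yaml
gitea:
  config:
    queue:
      TYPE: redis
      CONN_STR: redis://gitea-redis-headless:6379/0 # placeholder; any shared Redis reachable by all replicas
```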

Author

> EDIT: I just realized that none of the assertions in `_helpers.tpl` are effective. Only the ones in `config.yaml` are. They were duplicated anyhow. I did a cleanup and added tests in the linked PR.

Classic "works on my machine"? 😆

It fails for me. That's why I created this issue.

Otherwise dead code should probably be removed.

Member

> I wasn't sure if the line I pointed to was intentional, and I wasn't sure how to understand your previous comment (I was pointing at the line that forces this chart to work with a single instance, and you were saying this is correct, so I wasn't sure why).

I think this was a two-sided misunderstanding.
As the README prominently states, the chart provides (and aims for) an HA setup. And that setup works and has been in good use for many months (also on other clusters I am managing).

> I think I was misunderstood. I'm not saying bleve can handle multiple instances. As far as I know, Gitea uses queues and does not update bleve on multiple instances.
> You do need queues that can handle multiple instances (like redis), though.
> Is there any documentation that states this is not thread safe and that those assertions are actually needed?

There were some initial discussions in issues of this repo and in Gitea core where people stated that `bleve` won't work in HA, which is why all the elastic & friends efforts were pushed. I don't have the links right now though.

> Classic "works on my machine"? 😆
> It fails for me. That's why I created this issue.

Hmm, strange. I couldn't trigger the failure when creating the unit test, and if it had been effective, nobody would ever have been able to use more than one replica, since the failure was not conditioned on anything (it was likely a copy/paste mistake).

So you're saying `helm install` fails for you with `replicaCount: 2` and the error points to the line in `_helpers.tpl`?

Author

I am using ArgoCD, so it is not a direct helm install.
It was failing when I was creating the issue.

I will set up a test environment, try to reproduce it once again, and compare it with plain Helm. I will come back with the results.

Member

Hmm, interesting: this would mean that the code in `_helpers.tpl` is not being ignored after all.
Wondering what the difference is for Argo compared to `helm install` or the unittest plugin we're using, since I couldn't trigger it there and I also did not hit it in my own k8s install, which uses multiple replicas.

Let's see what you can find out!

Author

So far I have had no luck reproducing exactly the same issue.

Here is what ArgoCD says when I try to set replicaCount to 2:

```
Unable to save changes: application spec for gitea is invalid: InvalidSpecError: Unable to generate manifests in : rpc error: code = Unknown desc = `helm template . --name-template gitea --namespace gitea --kube-version 1.28 --values /tmp/4ea110cb-aad8-4d37-bc5d-ab7121688ff4 <api versions removed> --include-crds` failed exit status 1: Error: template: gitea/templates/gitea/deployment.yaml:29:28: executing "gitea/templates/gitea/deployment.yaml" at <include (print $.Template.BasePath "/gitea/config.yaml") .>: error calling include: template: gitea/templates/gitea/config.yaml:28:40: executing "gitea/templates/gitea/config.yaml" at <"cron.GIT_GC_REPOS">: wrong type for value; expected map[string]interface {}; got string Use --debug flag to render out invalid YAML
```

I am able to reproduce the same result with:

```shell
helm repo add gitea-charts https://dl.gitea.io/charts/
helm template gitea gitea-charts/gitea -f ./gitea.yaml
```

```yaml
# ./gitea.yaml
gitea:
  podAnnotations:
    prometheus.io/scrape: 'true'
ingress:
  annotations:
    kubernetes.io/ingress.class: nginx
    kubernetes.io/tls-acme: 'true'
    nginx.ingress.kubernetes.io/proxy-body-size: 10000m
  enabled: true
  hosts:
    - host: gitea.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts:
        - gitea.example.com
      secretName: gitea-tls
persistence:
  accessModes:
    - ReadWriteMany
  enabled: true
  size: 12Gi
  storageClass: gitea-gitea-shared-storage
replicaCount: 2
resources:
  limits:
    cpu: 1000m
    memory: 1024M
  requests:
    cpu: 476m
    memory: 226M
service:
  ssh:
    nodePort: 30222
    type: NodePort
```
Author

Same result for:

```yaml
persistence:
  accessModes:
    - ReadWriteMany
  enabled: true
replicaCount: 2
```

This is gitea-10.1.1.

EDIT:
It seems like this is a different issue. I should probably create a separate ticket for it.
Member

The error might come from `Values.gitea.config.cron.GIT_GC_REPOS` not being defined, in which case Helm cannot check it.

In #611, specifically in https://gitea.com/gitea/helm-chart/src/commit/250214aa9ef910a22598cb60f4cc3f4f86e338f6/templates/gitea/config.yaml#L28-L34:

```
{{- if .Values.gitea.config.cron -}}
{{- if .Values.gitea.config.cron.GIT_GC_REPOS -}}
{{- if eq .Values.gitea.config.cron.GIT_GC_REPOS.ENABLED true -}}
{{ fail "Invoking the garbage collector via CRON is not yet supported when running with multiple replicas. Please set 'cron.GIT_GC_REPOS.enabled = false'." }}
{{- end }}
{{- end }}
{{- end }}
```
I've updated the existence check for the value and I think this should solve the issue you're seeing.
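
To make it concrete, with the updated check only a values excerpt roughly like this (illustrative, assuming multiple replicas are configured) should still trigger that failure:

```yaml
gitea:
  config:
    cron:
      GIT_GC_REPOS:
        ENABLED: true   # explicitly enabling the GC cron is what the check rejects
```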

Can you run the checks on the PR? Otherwise we merge and you can check on the next release.

pat-s closed this issue 2024-02-23 07:27:48 +00:00