[Breaking] HA-support via Deployment #437

Merged
pat-s merged 75 commits from deployment into main 2023-07-17 19:09:46 +00:00
Member

Changes

A big shoutout to @luhahn for all his work in #205 which served as the base for this PR.

Documentation

  • After thinking for some time about it, I still prefer the distinct option (as started in #350), i.e. having a standalone "HA" doc under docs/ha-setup.md to not have a very long README (which is already quite long).
    Most of the information below should go into it with more details and explanations behind all of the individual components.

Chart deps

  • ~~Adds meilisearch as a chart dependency for a HA-ready issue indexer. Only works with >= Gitea 1.20~~
  • ~~Adds redis as a chart dependency for a HA-ready session and queue store.~~
  • Adds redis-cluster as a chart dependency for a HA-ready session and queue store (alternative to redis). Only works with >= Gitea 1.19.2.
  • Removes memcached in favor of redis-cluster
  • Adds postgresql-ha as the default DB dependency in place of postgres

Adds smart HA chart logic

The goal is to set smart config values that result in a HA-ready Gitea deployment if replicaCount > 1.

  • If replicaCount > 1,
    • gitea.config.session.PROVIDER is automatically set to redis-cluster
    • gitea.config.indexer.REPO_INDEXER_ENABLED is automatically set to false unless the indexer type is elasticsearch or meilisearch
    • redis-cluster is used for [queue], [cache] and [session], whether running in HA mode or not (see the template sketch below this list)
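To illustrate the idea (not the exact implementation in this PR), such a guard could live in a named helper template roughly like the sketch below. The helper name is made up here, and it assumes the session and indexer sections already exist in the merged values:

```
{{- /* illustrative sketch: force HA-safe settings when more than one replica is requested */ -}}
{{- define "gitea.ha-defaults" -}}
{{- if gt (int .Values.replicaCount) 1 -}}
  {{- /* sessions must live in a shared store instead of on the local filesystem */ -}}
  {{- $_ := set .Values.gitea.config.session "PROVIDER" "redis-cluster" -}}
  {{- /* the default bleve repo indexer writes to local disk and is not HA-safe */ -}}
  {{- if not (has (.Values.gitea.config.indexer.REPO_INDEXER_TYPE | default "bleve") (list "elasticsearch" "meilisearch")) -}}
    {{- $_ := set .Values.gitea.config.indexer "REPO_INDEXER_ENABLED" false -}}
  {{- end -}}
{{- end -}}
{{- end -}}
```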

Configuration of external instances of meilisearch and minio is documented in a new markdown doc.

Deployment vs Statefulset

Given all the discussions about this lately (#428), I think we could use both.
In the end, we do not have the requirement for a sequential pod scale up/scale down as it would happen in statefulsets.
On the other hand, we do not have actual stateless pods as we are attaching a RWX to the deployment.
Yet I think because we do not have a leader-election requirement, spawning the pods as a deployment makes "Rolling Updates" easier and also signals users that there is no "leader election" logic and each pod can just be "destroyed" at any time without causing interruption.

Hence I think we should be able to switch from a statefulset to a deployment, even in the single-replica case.

This change also brought up a templating/linting issue: the definition of .Values.gitea.config.server.SSH_LISTEN_PORT in ssh-svc.yaml just "luckily" worked so far due to naming-related lint processing. Due to the change from "statefulset" to "deployment", the processing queue changed and caused a failure complaining about config.server.SSH_LISTEN_PORT not being defined yet.
The only way I could see to fix this was to "properly" define the value in values.yaml instead of conditionally defining it in helpers.tpl. Maybe there's a better way?
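For reference, the statically defined defaults in values.yaml then look roughly like this (matching the values shown in the review excerpt further down; the inline comments are mine):

```yml
gitea:
  config:
    server:
      SSH_PORT: 22          # SSH port advertised for the rootful image
      SSH_LISTEN_PORT: 2222 # port the rootless image actually listens on
```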

Chart PVC Creation

I've adapted the automated PVC creation from another chart to be able to provide the storageClassName as I couldn't get dynamic provisioning for EFS going with the current implementation.
In addition, the naming and approach within the Gitea chart for PV creation are a bit unusual and aligning them might be beneficial.

This is a semi-unrelated change which will be breaking for existing users, but this PR already includes a lot of breaking changes, so including another one might not make it much worse...

  • New persistence.mount: whether to mount an existing PVC (via persistence.existingClaim)
  • New persistence.create: whether to create a new PVC (see the sketch below for how the two combine)
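A minimal sketch of how the two new keys could be combined when reusing a pre-provisioned RWX claim; the key names follow the description above, and the claim name is a placeholder:

```yml
persistence:
  enabled: true
  create: false                        # do not let the chart create a new PVC
  mount: true                          # mount the claim referenced below
  existingClaim: gitea-shared-storage  # placeholder: name of the pre-provisioned RWX PVC
  accessModes:
    - ReadWriteMany
```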

Testing

As this PR does a lot of things, we need proper testing.
The helm chart can be installed from the Git branch via helm-git as follows:

helm repo add gitea-charts git+https://gitea.com/gitea/helm-chart@/?ref=deployment
helm install gitea gitea-charts/gitea --version 0.0.0

It is highly recommended to test the chart in a dedicated namespace.

I've tested this myself with both redis and redis-cluster and it seemed to work fine.
I just did some basic operations though and we should do more niche testing before merging.

Exemplary values.yaml for testing (only needs a valid RWX storage class):

values.yaml
image:
  tag: "dev"
  pullPolicy: "Always"
  rootless: true

replicaCount: 2

persistence:
  enabled: true
  accessModes:
    - ReadWriteMany
  storageClass: FIXME

redis-cluster:
  enabled: false
  global:
    redis:
      password: gitea

gitea:
  config:
    indexer:
      ISSUE_INDEXER_ENABLED: true
      REPO_INDEXER_ENABLED: false

Preferred setup

The preferred HA setup with respect to performance and stability might currently be as follows:

  • Repos: RWX (e.g. EFS or Azurefiles NFS)
  • Issue indexer: Meilisearch (HA)
  • Session and cache: Redis Cluster (HA)
  • Attachments/Avatars: Minio (HA)

This will result in a ~ 10-pod HA setup overall.
All pods have very low resource requests.

fix #98

pat-s added the kind/breaking and status/needs-reviews labels 2023-04-14 09:24:44 +00:00
pat-s added 27 commits 2023-04-14 09:24:45 +00:00
add deployment
Some checks failed
continuous-integration/drone/push Build is failing
715ab4531c
pvc
Some checks failed
continuous-integration/drone/push Build is failing
62bec269ed
readd deployment:
Some checks failed
continuous-integration/drone/push Build is failing
4e9a19ff8b
add strategy block
Some checks failed
continuous-integration/drone/push Build is failing
61afcdf9d9
pvc
Some checks failed
continuous-integration/drone/push Build is failing
0cf80a9a2c
volumeClaim in deployment
Some checks failed
continuous-integration/drone/push Build is failing
ac83772720
add redis clsuter
Some checks failed
continuous-integration/drone/push Build is failing
314bf48755
add redis support
Some checks failed
continuous-integration/drone/push Build is failing
62917fc77e
add redis
Some checks failed
continuous-integration/drone/push Build is failing
fd02f8c7b1
use redis 6
Some checks failed
continuous-integration/drone/push Build is failing
69bf773407
use redis instead of redis-cluster
Some checks failed
continuous-integration/drone/push Build is failing
a8e71dd442
account for bleve repo indexers for multiple replicas
Some checks failed
continuous-integration/drone/push Build is failing
339c3f725f
add meilisearch
Some checks failed
continuous-integration/drone/push Build is failing
6f687ce6dc
docs
Some checks failed
continuous-integration/drone/push Build is failing
2c69a18c3d
meilisearch dns
Some checks failed
continuous-integration/drone/push Build is failing
17e0251c50
dynamic ISSUE_INDEXER_TYPE
Some checks failed
continuous-integration/drone/push Build is failing
efc8fe23d0
add minio
Some checks failed
continuous-integration/drone/push Build is failing
cad2d85cc7
meilisearch pvc rwx
Some checks failed
continuous-integration/drone/push Build is failing
23ca79ab6a
add redis-cluster support
Some checks failed
continuous-integration/drone/push Build is failing
9c8fab280c
use explicit get commands
Some checks failed
continuous-integration/drone/push Build is failing
1c7ea75151
remove redis.port and redis.host
Some checks failed
continuous-integration/drone/push Build is failing
6898dea6ea
docs
Some checks failed
continuous-integration/drone/push Build is failing
4213486988
Merge branch 'main' into deployment
Some checks failed
continuous-integration/drone/push Build is failing
2c00cc331b
solve lint issues
Some checks failed
continuous-integration/drone/push Build is failing
1b48c14963
fix unittests
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2bf762f04e
pat-s added 1 commit 2023-04-15 09:30:10 +00:00
pat-s added 1 commit 2023-04-15 09:31:30 +00:00
Merge branch 'main' into deployment
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 47s
723291f4cd
pat-s force-pushed deployment from cd47456198 to 29cdb678c1 2023-04-15 10:19:18 +00:00 Compare
pat-s force-pushed deployment from 29cdb678c1 to b816ff2abc 2023-04-15 11:29:09 +00:00 Compare
pat-s force-pushed deployment from b816ff2abc to dab8a53fb9 2023-04-15 11:39:51 +00:00 Compare
Contributor

Your sample values seem wrong. Indexer settings below minio persistence?

Author
Member

~~I don't understand what you mean. Can you be more specific or comment directly in the diff?~~

Ah got it now! Indeed, a c/p mistake.

pat-s force-pushed deployment from dab8a53fb9 to 55486c8459 2023-04-15 17:56:38 +00:00 Compare
pat-s force-pushed deployment from 55486c8459 to b83e5030a4 2023-04-15 18:01:36 +00:00 Compare
pat-s force-pushed deployment from b83e5030a4 to 558e204840 2023-04-15 18:08:15 +00:00 Compare
pat-s force-pushed deployment from 558e204840 to 5a5c3ad720 2023-04-15 18:10:26 +00:00 Compare
pat-s added 1 commit 2023-04-15 20:52:34 +00:00
docs
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 43s
ab7eb4bd1f
Member

We could drop the memcached dependency and use redis by default. I mean, this PR adds both redis and redis-cluster as dependencies. For non-HA setups, the redis dependency can be used where memcached is used. What's your opinion, @pat-s?

Member

> We could drop the memcached dependency and use redis by default. I mean, this PR adds both redis and redis-cluster as dependency. For non-HA redis dependency can be used where memcached is used. What's your opinion, @pat-s?

Dropping memcached is a good idea; redis can do everything memcached is currently doing. And we would have to manage fewer dependencies. (Especially since this introduces redis + redis-ha as dependencies)

Author
Member

> We could drop the memcached dependency and use redis by default. I mean, this PR adds both redis and redis-cluster as dependency. For non-HA redis dependency can be used where memcached is used. What's your opinion, @pat-s?

Yeah, I guess so. Also, having fewer decisions to make makes things easier for users.
The only "downside" is that the default deployment would then have 5-6 pods instead of 2 (gitea + memcached).

> Especially since this introduces redis + redis-ha as dependency)

I am not 100% sure, but atm I think both are HA-ready - the second option, i.e. redis-cluster, is just a different way of how the redis components/workers talk to each other and are set up in the first place. I'm only mentioning it because you labeled the second one as redis-ha, thereby implying that the other one would not be HA-ready 🙂 (but please correct me if I'm wrong!)

We should probably use the "normal" redis as the default as redis-cluster was not functional until lately and just got fixed in Gitea HEAD. I did a backport but this means one needs at least 1.19.2 - and to use 1.18 or older, one would need to use redis and not redis-cluster.

pat-s added 1 commit 2023-04-18 20:10:45 +00:00
remove memcached
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 42s
b460f99fe5
Member

> I am not 100% sure but atm I think both are HA-ready - the second option, i.e. redis-cluster, is just a different way of how the redis components/workers talk to each other and are set up in the first place. Just because you labeled the second one as redis-ha and by this implying that the other one would not be HA-ready 🙂 (but please correct me if I'm wrong!)

Oops. I meant redis-cluster instead of redis-ha. Both redis variants can have multiple replicas. But when it comes to true high availability, I experienced downtime with the non-cluster redis when updating the deployment and replacing the pods. With the cluster setup I did not experience that. That's why I drew the thin line between redis and redis-cluster in my previous comment.

For non-HA Gitea the common redis would work fine. Both dependencies have their use cases.

The only "downside" is that the default deployment would then have 5-6 pods by default instead of 2 (gitea + memcached).

Non-HA would have 3: 1xGitea, 1xPostgres, 1xRedis
HA would have: Gitea, Postgres, Redis(-cluster), MinIO. All replicated by at least 2. MinIO must have at least 3 or 4 to be stable. Not sure about the recommended replicas for the others.

Is that what you mean by 5-6 Pods?

Author
Member

> I experienced a downtime with the non-cluster redis when updating the deployment and replacing the pods. With the cluster setup I did not experienced that.

Ah, that's interesting! I will double-check that, as I had it in my mind that I had no disconnects with either of them. But if this is the case, we could also think of only keeping redis-cluster, backporting the Gitea-core support to 1.18, and keeping the dependency list "small"?

EDIT: just tested again and did not face any disconnects when using redis with 1 master and 2 replicas while destroying/deleting one of the two Gitea replicas. Was your setup different in some way?

> Is that what you mean by 5-6 Pods?

I left minio out here as it's optional when a RWX holds the storage (probably more performant but not mandatory). It's certainly recommended, but that's why I left it at 5-6 (Gitea + redis). In addition, I forgot to count a PG-HA deployment as I always work with an external Postgres.
So I'd see the following pod counts:

  • All: Gitea HA (2) + PG HA (2) + Redis Cluster (4) + Minio (4) + Meilisearch (2) = 14

  • Minimum: Gitea HA (2) + PG HA (2) + Redis Cluster (4) + Meilisearch (2) = 10

Member

> EDIT: just tested again and did not face any disconnects when using redis with 1 master and 2 replicas while destroying/deleting one of the two Gitea replicas. Was your setup different in some way?

I had disconnects to Redis when actually upgrading Redis, not replacing Gitea pods. IIRC, this was not the case when using redis-cluster. Although, I could not test Gitea itself due to missing cluster support back then. So I tested it otherwise. But maybe I did something wrong.

> So I'd see the following pod counts:
>
>   • All: Gitea HA (2) + PG HA (2) + Redis Cluster (4) + Minio (4) + Meilisearch (2) = 14
>   • Minimum: Gitea HA (2) + PG HA (2) + Redis Cluster (4) + Meilisearch (2) = 10

It would be great if we kept the simplicity, even though Gitea is HA-capable. Consider a new user, ready to get Gitea up and running in a Kubernetes cluster, trying to get their code self-hosted. What you need for that:

  • Gitea (1x)
  • Database (1x)
  • Redis (1x)

Everything else is optional, IMO. Especially repository code search.

Therefore, we should keep both redis dependencies, IMO. Or is it possible to "misuse" redis-cluster as a single-instance? In that case we could drop the redis dependency.

Additionally I don't think we should add MinIO as a dependency. If I see it right, attachments and avatars would work within the RWX storage that is shared across the Deployment replicas. If someone wants to use MinIO for that, they should deploy it outside of this Helm Chart. MinIO itself is a massive setup - for a truly HA setup, you have to run at least 4 replicas in distributed mode (the absolute minimum requirement for leader election). The non-default production values set for that would be ~130 lines (no blank lines). This includes separate buckets for avatars and attachments. That's probably nothing we want to have in a Gitea Helm Chart that should be much more lightweight than those of unnamed alternatives 😉. Currently the proposed setup does not look like it's simpler.

Author
Member

> I had disconnects to Redis when actually upgrading Redis, not replacing Gitea pods. IIRC, this was not the case when using redis-cluster.

Ah! Did you have redis running in HA with multiple replicas? I just tried it and the problem seems to be the redis-master-0 pod. One can destroy the replicas without issues but once the master pod goes down, Gitea has an issue.

There is experimental support for multiple masters in the chart (https://github.com/bitnami/charts/blob/main/bitnami/redis/README.md#multiple-masters-experimental).

Overall I guess it's just easier to go with redis-cluster only then and remove redis -> fewer choices and less work for both HA and non-HA.

> Everything else is optional, IMO. Especially repository code search.

Definitely agree. The question is only: do we want to provide a skeleton in the chart to "easily" add these additional components or do we "just" provide documentation on how this can be done?

> Or is it possible to "misuse" redis-cluster as a single-instance? In that case we could drop the redis dependency.

I guess we could just go with redis-cluster even for single-instance. Yes it requires some additional pods but users can then also easily go HA if they want to. And don't need to switch between redis and redis-cluster.

> Additionally I don't think we should add MinIO as a dependency. If I see it right, attachments and avatars would work within the RWX storage that is shared across the Deployment replicas.

I see your point. I am fine with not adding it and only documenting it as an alternative to the RWX storage.

> for a truly HA setup, you have to run at least 4 replicas in distributed mode (absolute minimal requirement for leader election).

That's true, yes.

> The non-default production values set for that would be ~130 lines (no blank lines). This includes different buckets for avatars and attachments each.

Not sure if you always need distinct buckets for each and in my test install right now I don't have 130 config lines but again, I am fine with leaving minio out and just documenting it.

> That's probably nothing we want to have in a Gitea Helm Chart that should be much lightweight than those of unnamed alternatives 😉.

I think this is also a question of how you look at it: "lightweight" in terms of defaults and/or resources but rich in terms of expansion options - or lightweight in terms of general capabilities and chart dependencies.

Overall I think we might make our lives easier by not adding too many dependencies to the chart and just documenting which services one needs to provision in addition to go fully HA. And only adding redis-cluster as a required dependency, which would then also directly work for HA. Sounds like we all agree on this? If so, I would strip the PR down to only redis-cluster and just document all other optional HA deps.

(for testing purposes, it was still good/nice to be able to bootstrap a HA setup with all deps in this PR.)

Member

> Ah! Did you have redis running in HA with multiple replicas? I just tried it and the problems seems to be the redis-master-0 pod. One can destroy the replicas without issues but once the master pod goes down, Gitea has an issue.

I tested both master/replica and sentinel approach. In both cases the k8s service did not properly switch from the downscaled leader to a replica or sentinel, causing the disconnect.

> I guess we could just go with redis-cluster even for single-instance. Yes it requires some additional pods but users can then also easily go HA if they want to. And don't need to switch between redis and redis-cluster.

Sounds great. Let's go this way. 👍

>> Additionally I don't think we should add MinIO as a dependency....
>
> I see your point. I am fine with not adding it and only documenting it as an alternative to the RWX storage.

+1 for documenting MinIO as an alternative.

> Overall I think we might make our life easier with not adding too many dependencies into the chart but just documenting which services one needs to provision in addition to fully go HA. And only adding redis-cluster as a required dependency which would also directly work for HA then. Sounds like we all agree on this? If so, I would strip the PR down to only redis-cluster and just documenting all other optional HA deps.

A yes from my side. Stripping it down to redis-cluster with a simple templating skeleton to easily toggle HA and scale the necessary components properly will be a huge improvement for the Chart.

> The question is only: do we want provide a skeleton in the chart to "easily" add these additional components or do we "just" provide documentation on how this can be done?

Such a large-scale skeleton would require the other dependencies, but would be neat in the long run. Maybe we should start with documenting them first. As soon as we have more stability in the chart (e.g. tests, automated e2e deploys, values integrity validation, renovate automation), it will become much easier to add new dependencies and handle such complexity.

> (for testing purposes, it was still good/nice to be able to bootstrap a HA setup with all deps in this PR.)

💯. I appreciate the time you already spent on this PR. It's awesome to see Gitea being a solid competitor to the well-known big players.
In other Charts I've seen example values for various use cases. Those, paired with documentation of the MinIO and Meilisearch options, may help users find the right setup for their needs. But such example values are nothing that needs to be done in a first step.

pat-s force-pushed deployment from 7530dce879 to f5de5ceb53 2023-04-22 07:27:53 +00:00 Compare
pat-s added 2 commits 2023-04-22 07:56:14 +00:00
remove meilisearch chart dep
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 33s
a01473ff79
pat-s added 2 commits 2023-04-22 08:05:29 +00:00
docs
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 31s
5eb64a9726
pat-s added 1 commit 2023-04-22 08:10:12 +00:00
update redis-cluster
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 30s
6bace91af7
pat-s force-pushed deployment from cc7ad8cfda to c6f7f003cf 2023-04-22 09:09:48 +00:00 Compare
pat-s force-pushed deployment from c6f7f003cf to 64639934e9 2023-04-22 09:17:42 +00:00 Compare
pat-s force-pushed deployment from 64639934e9 to f640b5242f 2023-04-22 09:23:10 +00:00 Compare
pat-s force-pushed deployment from f640b5242f to a4c8ef39df 2023-04-22 09:28:46 +00:00 Compare
pat-s force-pushed deployment from a4c8ef39df to 441a85db36 2023-04-22 09:39:49 +00:00 Compare
pat-s force-pushed deployment from 441a85db36 to aeb5feb0cc 2023-04-22 17:28:33 +00:00 Compare
pat-s force-pushed deployment from aeb5feb0cc to 3dd793b337 2023-04-22 17:42:40 +00:00 Compare
pat-s force-pushed deployment from 3dd793b337 to 2e4a5092e4 2023-04-22 17:45:40 +00:00 Compare
pat-s force-pushed deployment from 2e4a5092e4 to d3254d4a62 2023-04-22 18:31:18 +00:00 Compare
pat-s force-pushed deployment from d3254d4a62 to 61144467f1 2023-04-22 18:34:54 +00:00 Compare
pat-s force-pushed deployment from 61144467f1 to d5a1f91860 2023-04-22 18:39:59 +00:00 Compare
pat-s force-pushed deployment from d5a1f91860 to ee196d2c1a 2023-04-22 18:42:30 +00:00 Compare
pat-s force-pushed deployment from ee196d2c1a to 0fa4c34def 2023-04-22 18:46:24 +00:00 Compare
pat-s force-pushed deployment from 0fa4c34def to a5c15ee421 2023-04-22 19:48:17 +00:00 Compare
pat-s force-pushed deployment from a5c15ee421 to 8727684f00 2023-04-22 19:50:34 +00:00 Compare
pat-s force-pushed deployment from 8727684f00 to a5f4437e8e 2023-04-22 19:52:47 +00:00 Compare
pat-s force-pushed deployment from a5f4437e8e to 364936df4e 2023-04-22 20:45:36 +00:00 Compare
pat-s added 1 commit 2023-04-22 21:08:20 +00:00
downgrade redis-cluster
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 30s
9af0281c94
pat-s added 2 commits 2023-05-01 19:57:32 +00:00
remove REPO_INDEXER_ENABLED hardcode in favor of assertion
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 38s
6451ddf766
pat-s added 1 commit 2023-05-01 21:28:58 +00:00
enable redis-cluster
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 36s
3f1775416a
pat-s added 1 commit 2023-05-02 13:22:10 +00:00
redis-cluster 8.4.4
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 34s
41d60f111d
pat-s added 1 commit 2023-05-02 13:32:05 +00:00
update assertion
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 34s
133e3625ca
pat-s added 2 commits 2023-05-02 13:45:24 +00:00
Merge branch 'main' into deployment
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 34s
69a83c6aa6
pat-s reviewed 2023-05-02 13:48:03 +00:00
@@ -3,2 +3,4 @@
Expand the name of the chart.
*/}}
{{- /* multiple replicas assertions */ -}}
Author
Member

@justusbunsi I've found a neat (but hacky) solution to apply some assertions. Unfortunately helm does not provide a native way to check for certain incompatibilities of settings - or at least I couldn't find one.

I've tested these and they work as expected. I think in the long term these checks can save quite a few users from faulty HA deployments, as they specifically look for settings which are incompatible in a HA setup.

What do you think?
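For illustration only, such an assertion can be expressed with Helm's fail function. The concrete check shown here (rejecting the bleve repo indexer with more than one replica) is just an example of the pattern, not the exact rule set added in this PR:

```
{{- if gt (int .Values.replicaCount) 1 -}}
  {{- /* sketch: assumes the indexer section exists in the merged values */ -}}
  {{- if eq (.Values.gitea.config.indexer.REPO_INDEXER_TYPE | default "bleve") "bleve" -}}
    {{- fail "replicaCount > 1 is not compatible with the 'bleve' repo indexer. Use 'elasticsearch'/'meilisearch' or disable the repo indexer." -}}
  {{- end -}}
{{- end -}}
```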

pat-s reviewed 2023-05-02 13:51:45 +00:00
values.yaml Outdated
@@ -351,0 +358,4 @@
# RUN_MODE: dev
server:
SSH_PORT: 22 # rootful image
SSH_LISTEN_PORT: 2222 # rootless image
Author
Member

@justusbunsi I've exposed these settings in values.yaml as these are their actual defaults. These are applied conditionally in helpers.tpl (https://gitea.com/gitea/helm-chart/src/branch/main/templates/_helpers.tpl#L262) without clear exposure to the user.

With these settings being defined, we could also remove the condition in helpers.tpl?

pat-s added 1 commit 2023-05-02 13:55:34 +00:00
values
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 37s
ca7d588f12
pat-s added 1 commit 2023-05-02 14:09:53 +00:00
update ha-doc
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 37s
d5dfb1ea61
Author
Member

@justusbunsi @luhahn I wonder if we also should swap postgres for postgres-ha, similar as we are switching to redis-cluster instead of redis.

This would make a functional HA deployment easier right from the start for new users as they would not need to migrate from a single-instance PG deployment to a HA-ready PG one (and only move storage).

The downside is that it would be a hard break for all current users as they would be forced to migrate their DB. We could also have both side by side but then we are again entering "maintenance hell" to some degree.

This PR is breaking in so many ways (cache dependency, PVC creation, statefulset -> deployment) that introducing another breaking change would arguably not be too disruptive.

What are your thoughts?


Overall I am pretty much done (aside from the PG discussion above) and everything works "fine" so far in all of my tests. The biggest challenge will most likely be to provide migration instructions to users and also test this ourselves.

pat-s added 1 commit 2023-05-13 14:39:31 +00:00
pat-s force-pushed deployment from 57af71c7c6 to b67f90f559 2023-05-13 14:44:11 +00:00 Compare
pat-s force-pushed deployment from b67f90f559 to c1072b3b5c 2023-05-13 14:46:35 +00:00 Compare
pat-s force-pushed deployment from c1072b3b5c to 7e162e0865 2023-05-13 14:52:48 +00:00 Compare
pat-s force-pushed deployment from 7e162e0865 to e19c90f8dd 2023-05-13 14:54:31 +00:00 Compare
pat-s force-pushed deployment from 507a821700 to b526fee2be 2023-05-13 15:05:20 +00:00 Compare
pat-s force-pushed deployment from b526fee2be to 7c34edfbe5 2023-05-13 15:08:19 +00:00 Compare
pat-s force-pushed deployment from 7c34edfbe5 to b14f5c363a 2023-05-13 15:22:11 +00:00 Compare
pat-s added 1 commit 2023-05-13 16:22:21 +00:00
pat-s force-pushed deployment from 8adaee9e44 to a9f3c9afb0 2023-05-13 16:51:12 +00:00 Compare
pat-s force-pushed deployment from a9f3c9afb0 to ae1f980f44 2023-05-13 16:53:47 +00:00 Compare
pat-s force-pushed deployment from ae1f980f44 to 59e4cda7d5 2023-05-13 17:33:28 +00:00 Compare
pat-s force-pushed deployment from 59e4cda7d5 to 547974ac16 2023-05-13 17:36:12 +00:00 Compare
pat-s force-pushed deployment from 547974ac16 to 600d47b44d 2023-05-13 17:39:23 +00:00 Compare
pat-s added 3 commits 2023-05-13 19:51:28 +00:00
add postgres-ha dep
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 35s
9eca0177f7
Author
Member

@justusbunsi

I made a brave move and switched our production instance to this branch (ofc after having tested in dev before) and I can report that everything seems to work well so far.

I am running with the following setup now

  • AWS EKS 1.26
  • 3 replicas
  • AWS RDS Postgres 14.x
  • AWS Elasticache Redis 7.x
  • AWS EFS RWX
  • Issue indexer: 'db'
  • Repo indexer: disabled

The transition was not that complicated; I only had to back up and move /data to the new RWX PVC.

This PR also adds topologySpreadConstraints, which help distribute the replicas across nodes (see the example below).
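For reference, such a constraint typically looks like the following; the exact values key and the pod labels used by the chart are assumptions on my side:

```yml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname   # spread replicas across nodes
    whenUnsatisfiable: ScheduleAnyway     # prefer spreading, but still schedule if impossible
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: gitea     # assumption: label applied to the Gitea pods
```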

Site operations are slower now due to EFS compared to the former RWO but that was expected. And is nothing Gitea can do anything about.

I have added a note about the required PV transition. Users will need to specify existingClaim, as otherwise the chart will (probably) create a new PVC due to the new resource.

Last, I'd like to point out again that I added a few assertions in config.yaml which apply some checks during startup which should prevent erroneous configurations. AFAICS there is no other, more elegant way to do this in charts at the moment.

Currently there are no blockers for merging this. Meilisearch will be supported with > 1.20 but 'db' for the ISSUE_INDEXER works.

The only open question is whether we should switch from postgres to postgres-HA as the former might cause issues when going HA. And given that we have already opted for redis-cluster by default, I think we should also switch to postgres-HA.
For backward compatibility I think we need to keep the normal postgres chart as a dependency for a bit so that users do not have to make a forced DB transition.
My suggestion would be to have postgres-ha as the default and remove the deprecated postgres dependency (which would then be off by default) at some point in the future; see the values sketch below.
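A sketch of how that could look in values.yaml, assuming both dependencies are toggled via their usual enabled flags:

```yml
postgresql-ha:
  enabled: true    # proposed new default: HA-ready Postgres
postgresql:
  enabled: false   # deprecated single-instance dependency, kept temporarily for backward compatibility
```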

pat-s added 1 commit 2023-05-13 19:58:33 +00:00
minor
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 35s
688d06ad0b
pat-s changed title from WIP: [Breaking] HA-support via `Deployment` to [Breaking] HA-support via `Deployment` 2023-05-13 19:58:42 +00:00
pat-s added 1 commit 2023-05-13 21:28:29 +00:00
add podDisruptionBudget
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 35s
5dd5f009af
Owner

Ref: https://github.com/go-gitea/gitea/issues/13791

There are still some problems when running in HA mode.

Author
Member

Good to know but without checking how it goes in a running instance, we won't know if they are "real" issues or just "would be good to have/clean up" 🙂

The good news is that the first day(s) have been without issues in my org - haven't checked on the CRON things yet though.

pat-s added 2 commits 2023-06-17 20:06:00 +00:00
fail if garbage collector is enabled
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 39s
0516725030
pat-s added 1 commit 2023-06-17 20:12:54 +00:00
Merge branch 'main' into deployment
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 28s
b7e99778e3
pat-s added 1 commit 2023-06-17 20:16:02 +00:00
statefulset -> deployment
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 34s
87b69a54ba
pat-s added 1 commit 2023-07-12 07:22:17 +00:00
Merge branch 'main' into deployment
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 34s
69d73063ca
pat-s added 1 commit 2023-07-12 07:36:45 +00:00
fix GIT_GC_REPOS assertion
Some checks failed
check-and-test / check-and-test (pull_request) Failing after 34s
aa1cce0761
pat-s force-pushed deployment from de41932107 to 1f63c02706 2023-07-12 08:13:54 +00:00 Compare
pat-s force-pushed deployment from 1f63c02706 to 9d1c3f2aaf 2023-07-12 08:33:27 +00:00 Compare
pat-s force-pushed deployment from 9d1c3f2aaf to 47980bd845 2023-07-12 08:46:05 +00:00 Compare
pat-s force-pushed deployment from 01d97cc557 to fdc2adda14 2023-07-12 16:27:55 +00:00 Compare
pat-s force-pushed deployment from 9c1c3db7ca to ada1cbf082 2023-07-12 17:24:20 +00:00 Compare
pat-s force-pushed deployment from 1450afa3cc to 9ca6b0f8c0 2023-07-14 20:46:15 +00:00 Compare
pat-s added 4 commits 2023-07-17 06:15:03 +00:00
update README
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 37s
75cfbcf7ae
Author
Member

Merging after confirmation from within Discord.

techknowlogick approved these changes 2023-07-17 06:24:25 +00:00
lunny approved these changes 2023-07-17 13:07:31 +00:00
pat-s added 1 commit 2023-07-17 19:03:34 +00:00
readme
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 35s
d54d5d6331
pat-s force-pushed deployment from 70e6a9273a to 22cffe5fa8 2023-07-17 19:07:48 +00:00 Compare
pat-s merged commit 8e27bb9bae into main 2023-07-17 19:09:46 +00:00
pat-s deleted branch deployment 2023-07-17 19:09:47 +00:00