fix subchart dns macros #359

Closed
jouve wants to merge 1 commits from jouve/helm-chart:dns into main
Contributor

Description of the change

inspired from https://github.com/bitnami/charts/blob/master/bitnami/common/templates/_names.tpl#L35, add a macro to calculate subchart fullname and use it to calculate dns

trim all metadata.name to not go above 63 chars

Benefits

fix for #345

Possible drawbacks

Applicable issues

Additional information

⚠ BREAKING

Checklist

  • Parameters are documented in the values.yaml and added to the README.md using readme-generator-for-helm
  • Breaking changes are documented in the README.md
<!-- Before you open the request please review the following guidelines and tips to help it be more easily integrated: - Describe the scope of your change - i.e. what the change does. - Describe any known limitations with your change. - Please run any tests or examples that can exercise your modified code. Thank you for contributing! We will try to review, test and integrate the change as soon as we can. --> ### Description of the change <!-- Describe the scope of your change - i.e. what the change does. --> inspired from https://github.com/bitnami/charts/blob/master/bitnami/common/templates/_names.tpl#L35, add a macro to calculate subchart fullname and use it to calculate dns trim all metadata.name to not go above 63 chars ### Benefits <!-- What benefits will be realized by the code change? --> fix for https://gitea.com/gitea/helm-chart/issues/345 ### Possible drawbacks <!-- Describe any known limitations with your change --> ### Applicable issues <!-- Enter any applicable Issues here (You can reference an issue using #). Please remove this section if there is no referenced issue. --> - fixes #345 ### Additional information <!-- If there's anything else that's important and relevant to your pull request, mention that information here. Please remove this section if it remains empty. --> ### ⚠ BREAKING <!-- If there's a breaking change, please shortly describe in which way users are affected and how they can mitigate it. If there are no breakings, please remove this section. --> ### Checklist <!-- [Place an '[X]' (no spaces) in all applicable fields. Please remove unrelated fields.] --> - [x] Parameters are documented in the `values.yaml` and added to the `README.md` using [readme-generator-for-helm](https://github.com/bitnami-labs/readme-generator-for-helm) - [x] Breaking changes are documented in the `README.md`
jouve force-pushed dns from 3c365f2545 to 86f0b1abcd 2022-09-10 15:30:54 +00:00 Compare
jouve force-pushed dns from 86f0b1abcd to 923385674a 2022-09-10 15:34:14 +00:00 Compare
jouve force-pushed dns from 923385674a to dc3f7aa714 2023-03-21 23:29:49 +00:00 Compare
pat-s reviewed 2023-03-23 21:50:41 +00:00
pat-s left a comment
Member

I don't mind this change given the issues of too long namespace names etc.

Yet I wonder why three different lenghts were chosen here (49, 58, 59). Should we agree on one and maybe use a round number (50?)?

I don't mind this change given the issues of too long namespace names etc. Yet I wonder why three different lenghts were chosen here (49, 58, 59). Should we agree on one and maybe use a round number (50?)?
@ -109,3 +122,3 @@
{{- define "gitea.default_domain" -}}
{{- printf "%s-gitea.%s.svc.%s" (include "gitea.fullname" .) .Release.Namespace .Values.clusterDomain | trunc 63 | trimSuffix "-" -}}
{{- printf "%s-http.%s.svc.%s" (include "gitea.fullname" . | trunc 58) .Release.Namespace .Values.clusterDomain -}}
Member

Is there a strong reason for the naming change here? Otherwise I'd probably keep it as is?

Is there a strong reason for the naming change here? Otherwise I'd probably keep it as is?
pat-s added the
kind
enhancement
label 2023-03-23 21:51:10 +00:00
Member

Yet I wonder why three different lenghts were chosen here (49, 58, 59). Should we agree on one and maybe use a round number (50?)?

Ah sorry, I'm dumb 🤦 The different values should add up to 63 with the additional characters in the respective sections.

@jouve Can you resolve the current conflicts? I guess we could merge this then, I would also ask @justusbunsi for a review then.

> Yet I wonder why three different lenghts were chosen here (49, 58, 59). Should we agree on one and maybe use a round number (50?)? Ah sorry, I'm dumb 🤦 The different values should add up to 63 with the additional characters in the respective sections. @jouve Can you resolve the current conflicts? I guess we could merge this then, I would also ask @justusbunsi for a review then.
Member

Reviewing these changes once more I noticed that this should be considered a breaking change due to potential name changes in RBAC related matters and cluster cross connectivity (changing service name), requiring cluster adjustments outside of the Helm Chart. First coming in mind: backup cronjob accessing the secrets or another application using the services.
Please adjust the PR description accordingly.

As there are different trimming limits for a reason, I also vote for explicit unit tests here to detect changes of those values and ensure correct templating behavior.

Reviewing these changes once more I noticed that this should be considered a breaking change due to potential name changes in RBAC related matters and cluster cross connectivity (changing service name), requiring cluster adjustments outside of the Helm Chart. First coming in mind: backup cronjob accessing the secrets or another application using the services. Please adjust the PR description accordingly. As there are different trimming limits for a reason, I also vote for explicit unit tests here to detect changes of those values and ensure correct templating behavior.
jouve force-pushed dns from dc3f7aa714 to 471222b1d1 2023-04-02 22:29:22 +00:00 Compare
jouve force-pushed dns from 471222b1d1 to 788aa805b2 2023-04-02 22:32:53 +00:00 Compare
Author
Contributor

I removed change to the secret as it does not have the 63 length limit.

As for the services, I would say the change is backward compatible:

  • all services length was <=63 chars, nothing changes
  • any services length was >63 chars is impossible as it would have been rejected by the apiserver:
    The Service "abcdefghijklmno-abcdefghijklmno-abcdefghijklmno-abcdefghijklmnop" is invalid: metadata.name: Invalid value: "abcdefghijklmno-abcdefghijklmno-abcdefghijklmno-abcdefghijklmnop": must be no more than 63 characters
    
I removed change to the secret as it does not have the 63 length limit. As for the services, I would say the change is backward compatible: * all services length was <=63 chars, nothing changes * any services length was >63 chars is impossible as it would have been rejected by the apiserver: ``` The Service "abcdefghijklmno-abcdefghijklmno-abcdefghijklmno-abcdefghijklmnop" is invalid: metadata.name: Invalid value: "abcdefghijklmno-abcdefghijklmno-abcdefghijklmno-abcdefghijklmnop": must be no more than 63 characters ```
jouve force-pushed dns from 788aa805b2 to 0b3d7dc836 2023-04-02 22:40:55 +00:00 Compare
jouve force-pushed dns from 0b3d7dc836 to d14a17a605 2023-04-02 22:41:24 +00:00 Compare
pat-s added the
status
needs-reviews
label 2023-04-07 12:23:38 +00:00
pat-s reviewed 2023-04-15 21:01:41 +00:00
@ -101,3 +114,3 @@
{{- define "gitea.default_domain" -}}
{{- printf "%s-gitea.%s.svc.%s" (include "gitea.fullname" .) .Release.Namespace .Values.clusterDomain | trunc 63 | trimSuffix "-" -}}
{{- printf "%s-http.%s.svc.%s" (include "gitea.fullname" . | trunc 58) .Release.Namespace .Values.clusterDomain -}}
Member

@jouve Can you explain why you changed the name from gitea to http here?

It probably won't cause any issues but I would still like to understand it 🙂

@jouve Can you explain why you changed the name from `gitea` to `http` here? It probably won't cause any issues but I would still like to understand it 🙂
Author
Contributor

there is no {{ include "gitea.fullname" }}-gitea service defined in the chart, while -http exists:

helm install -n gitea gitea gitea/gitea :

k -n gitea get svc
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
gitea-http     ClusterIP   None            <none>        3000/TCP   49d
gitea-ssh      ClusterIP   None            <none>        22/TCP     49d
curl -fsSL http://gitea-gitea.gitea.svc:3000
curl: (6) Could not resolve host: gitea-gitea.gitea.svc

curl -fsSL http://gitea-http.gitea.svc:3000 | head -n2
<!DOCTYPE html>
<html lang="en-US" class="theme-auto">
...
there is no `{{ include "gitea.fullname" }}-gitea` service defined in the chart, while -http exists: `helm install -n gitea gitea gitea/gitea` : ``` k -n gitea get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE gitea-http ClusterIP None <none> 3000/TCP 49d gitea-ssh ClusterIP None <none> 22/TCP 49d ``` ``` curl -fsSL http://gitea-gitea.gitea.svc:3000 curl: (6) Could not resolve host: gitea-gitea.gitea.svc curl -fsSL http://gitea-http.gitea.svc:3000 | head -n2 <!DOCTYPE html> <html lang="en-US" class="theme-auto"> ... ```
Member

Ah good catch! This looks like an uncaught bug then. Thanks for the reply!

Ah good catch! This looks like an uncaught bug then. Thanks for the reply!
Member

Awesome catch.

Awesome catch.
pat-s approved these changes 2023-04-19 07:55:58 +00:00
pat-s left a comment
Member

LGTM, thanks @jouve!
I agree it won't cause a breaking change in it's current form.

Would be good to get a second review from @justusbunsi on it!

LGTM, thanks @jouve! I agree it won't cause a breaking change in it's current form. Would be good to get a second review from @justusbunsi on it!
justusbunsi requested changes 2023-04-19 15:40:03 +00:00
justusbunsi left a comment
Member

Reject to prevent merge. Still testing.

Reject to prevent merge. Still testing.
@ -94,2 +106,3 @@
{{- define "postgresql.dns" -}}
{{- printf "%s-postgresql.%s.svc.%s:%g" .Release.Name .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.service.ports.postgresql -}}
{{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "postgresql" "chartValues" .Values.postgresql "context" .)) .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.servicePort -}}
Member

The proposed changes do not match the values specification for postgresql. Correct would be:

- {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "postgresql" "chartValues" .Values.postgresql "context" .)) .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.servicePort -}}
+ {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "postgresql" "chartValues" .Values.postgresql "context" .)) .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.service.ports.postgresql -}}
The proposed changes do not match the values specification for postgresql. Correct would be: ```diff - {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "postgresql" "chartValues" .Values.postgresql "context" .)) .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.servicePort -}} + {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "postgresql" "chartValues" .Values.postgresql "context" .)) .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.service.ports.postgresql -}} ```
@ -97,3 +110,3 @@
{{- define "memcached.dns" -}}
{{- printf "%s-memcached.%s.svc.%s:%g" .Release.Name .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached | trunc 63 | trimSuffix "-" -}}
{{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.port -}}
Member

The proposed changes do not match the values specification for memcached. Correct would be:

- {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.port -}}
+ {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached -}}
The proposed changes do not match the values specification for memcached. Correct would be: ```diff - {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.port -}} + {{- printf "%s.%s.svc.%s:%g" (include "gitea.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached -}} ```
Member

Follow-up to my review. I've played a bit more with it.

From what I see in my tests, we don't need to truncate the app.ini settings in the first place. I was able to create a working release with FQDNs in app.ini that exceed the current limits by a lot:

helm upgrade -i -n some-really-long-namespace-with-lots-of-dashed-parts --create-namespace long-gitea-release-name ./ creates a memcached host with 104 characters (port excluded): long-gitea-release-name-memcached.some-really-long-namespace-with-lots-of-dashed-parts.svc.cluster.local:11211.

According to the Domain Name RFC, the size limit for FQDN is 255. With Helm limiting its release name to 53 characters and the Kubernetes internal label/name restriction of 63 characters, there is a much higher chance that such release will fail at a much earlier stage than seeing it fail due to a too long FQDN inside app.ini (if even possible with the mentioned restrictions). It seems much more important to ensure that the resource names and labels do not exceed their limits.

Especially when Kubernetes automatically adds suffixes to names.

When I stretch the limit of release and namespace names, I the release does not work, as I am getting errors from and Postgres StatefulSet.

helm upgrade -i --create-namespace --namespace some-really-long-namespace-with-lots-of-dashed-parts-in-it release-name-covering-so-much-more-components ./

This generates a Gitea StatefulSet that creates a pod with label controller-revision-hash=release-name-covering-so-much-more-components-gitea-f858b9499 which is 62 characters long. This is something we have under control by not allowing a very long release name. So far so good.
What we don't have under control is what the Postgres dependency does. In my case it tries to render a pod with the similar label controller-revision-hash=release-name-covering-so-much-more-components-postgresql-7c8cc97779, which is longer than 63 characters. This leads to a release that will never be ready-to-use.

I tested if tools like kubeval (or its successor kubeconform) would detect such cases. Unfortunately, this is Kubernetes internal naming and not detectable. We would need to reject name lengths that would potentially create names/labels longer than 63 characters.

Or a chart dependency trims on its own - incorrectly.

helm upgrade -i --create-namespace --namespace some-really-long-namespace-with-lots-of-dashed-parts-in-it release-name-covering-so-much-more-components-n-stuff ./ does not render the templates correctly. The Postgres chart trucates the two service names in such way that it leaves two identical service names.

Leading to

Error: 1 error occurred:
        * services "release-name-covering-so-much-more-components-n-stuff-postgresq" already exists

So my suggestion is to drop the trunc within gitea.default_domain (namespace limit restriction by Kubernetes catches that first) and memcached.dns helper functions while keeping the fix within gitea.default_domain. That is a valid fix. Good catch.

Those changes would be three lines and definitely no breaking change. Apologies to @jouve for not mentioning this earlier. I actually did not knew it behaves like it does before testing your Pull Request and trying to break Kubernetes. 😃

Follow-up to my review. I've played a bit more with it. From what I see in my tests, we don't need to truncate the `app.ini` settings in the first place. I was able to create a working release with FQDNs in `app.ini` that exceed the current limits by a lot: `helm upgrade -i -n some-really-long-namespace-with-lots-of-dashed-parts --create-namespace long-gitea-release-name ./` creates a memcached host with 104 characters (port excluded): `long-gitea-release-name-memcached.some-really-long-namespace-with-lots-of-dashed-parts.svc.cluster.local:11211`. According to the [Domain Name RFC](https://www.ietf.org/rfc/rfc1035.html#section-2.3.4), the size limit for FQDN is 255. With Helm limiting its release name to 53 characters and the Kubernetes internal label/name restriction of 63 characters, there is a much higher chance that such release will fail at a much earlier stage than seeing it fail due to a too long FQDN inside `app.ini` (if even possible with the mentioned restrictions). It seems much more important to ensure that the resource names and labels do not exceed their limits. <details> <summary>Especially when Kubernetes automatically adds suffixes to names.</summary> When I stretch the limit of release and namespace names, I the release does not work, as I am getting errors from and Postgres StatefulSet. `helm upgrade -i --create-namespace --namespace some-really-long-namespace-with-lots-of-dashed-parts-in-it release-name-covering-so-much-more-components ./` This generates a Gitea StatefulSet that creates a pod with label `controller-revision-hash=release-name-covering-so-much-more-components-gitea-f858b9499` which is 62 characters long. This is something we have under control by not allowing a very long release name. So far so good. What we don't have under control is what the Postgres dependency does. In my case it tries to render a pod with the similar label `controller-revision-hash=release-name-covering-so-much-more-components-postgresql-7c8cc97779`, which is longer than 63 characters. This leads to a release that will never be ready-to-use. I tested if tools like `kubeval` (or its successor [kubeconform](https://github.com/yannh/kubeconform)) would detect such cases. Unfortunately, this is Kubernetes internal naming and not detectable. We would need to reject name lengths that would potentially create names/labels longer than 63 characters. </details> <details> <summary>Or a chart dependency trims on its own - incorrectly.</summary> `helm upgrade -i --create-namespace --namespace some-really-long-namespace-with-lots-of-dashed-parts-in-it release-name-covering-so-much-more-components-n-stuff ./` does not render the templates correctly. The Postgres chart trucates the two service names in such way that it leaves two identical service names. Leading to ``` Error: 1 error occurred: * services "release-name-covering-so-much-more-components-n-stuff-postgresq" already exists ``` </details> --- So my suggestion is to drop the `trunc` within `gitea.default_domain` (namespace limit restriction by Kubernetes catches that first) and `memcached.dns` helper functions while keeping the fix within `gitea.default_domain`. That is a valid fix. Good catch. Those changes would be three lines and definitely _no_ breaking change. Apologies to @jouve for not mentioning this earlier. I actually did not knew it behaves like it does before testing your Pull Request and trying to break Kubernetes. 😃
jouve force-pushed dns from d14a17a605 to b8d2dbdd0a 2023-04-19 19:52:25 +00:00 Compare
jouve force-pushed dns from b8d2dbdd0a to ebe07ef640 2023-04-19 19:56:39 +00:00 Compare
justusbunsi reviewed 2023-04-19 20:16:37 +00:00
@ -97,3 +97,3 @@
{{- define "memcached.dns" -}}
{{- printf "%s-memcached.%s.svc.%s:%g" .Release.Name .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached | trunc 63 | trimSuffix "-" -}}
{{- printf "%s.%s.svc.%s:%g" (include "common.names.dependency.fullname" (dict "chartName" "memcached" "chartValues" .Values.memcached "context" .)) .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached -}}
Member

What's the intention of using the common package from bitnami? Consistency with names?
This has the potential to break when updating memcached only and keep Postgres version, or vice versa. The "common" package is not a direct dependency of this chart.

What's the intention of using the common package from bitnami? Consistency with names? This has the potential to break when updating memcached only and keep Postgres version, or vice versa. The "common" package is not a direct dependency of this chart.
Author
Contributor

it's to match how bitnami charts calculate the name :
own name with common.names.fullname and common.names.dependency.fullname when in the context of a subchart.

example in bitnami/wordpress

you don't need direct dependency to common, templates are shared between all charts / there is no scope: https://helm.sh/docs/chart_template_guide/subcharts_and_globals/#sharing-templates-with-subcharts

Same a you, I'd expect that a backward imcompatible of memcached or pg would break many things.

I'll revert that tomorrow anyway

it's to match how bitnami charts calculate the name : own name with [common.names.fullname](https://github.com/bitnami/charts/blob/main/bitnami/common/templates/_names.tpl#L21) and [common.names.dependency.fullname](https://github.com/bitnami/charts/blob/main/bitnami/common/templates/_names.tpl#L39) when in the context of a subchart. example in [bitnami/wordpress](https://github.com/bitnami/charts/blob/main/bitnami/wordpress/templates/_helpers.tpl#L7) you don't need direct dependency to common, templates are shared between all charts / there is no scope: https://helm.sh/docs/chart_template_guide/subcharts_and_globals/#sharing-templates-with-subcharts Same a you, I'd expect that a backward imcompatible of memcached or pg would break many things. I'll revert that tomorrow anyway
jouve force-pushed dns from ebe07ef640 to ffbc3ae44e 2023-04-24 21:21:29 +00:00 Compare
Member

Dang it. I totally missed that you already committed changes.

Dang it. I totally missed that you already committed changes.
justusbunsi requested changes 2023-05-30 18:41:01 +00:00
justusbunsi left a comment
Member

Sorry for not reviewing earlier.

Sorry for not reviewing earlier.
@ -96,4 +96,3 @@
{{- end -}}
{{- define "memcached.dns" -}}
{{- printf "%s-memcached.%s.svc.%s:%g" .Release.Name .Release.Namespace .Values.clusterDomain .Values.memcached.service.ports.memcached | trunc 63 | trimSuffix "-" -}}
Member

Same question as above below.

EDIT: I expected this to be second comment 😆.

Same question as ~~above~~ below. EDIT: I expected this to be second comment 😆.
@ -92,15 +92,15 @@ app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
{{- define "postgresql.dns" -}}
{{- printf "%s-postgresql.%s.svc.%s:%g" .Release.Name .Release.Namespace .Values.clusterDomain .Values.postgresql.global.postgresql.service.ports.postgresql -}}
Member

Any reason you removed the -postgresql suffix? Running helm template --create-namespace --namespace gitea gitea-release ./ and looking at the generated Postgres service, its name is gitea-release-postgresql, leading to FQDN (+port) gitea-release-postgresql.gitea.svc.cluster.local:5432.

With these changes, the app.ini gets the value HOST=gitea-release.gitea.svc.cluster.local:5432, which is incorrect from what I see.

Any reason you removed the `-postgresql` suffix? Running `helm template --create-namespace --namespace gitea gitea-release ./` and looking at the generated Postgres service, its name is `gitea-release-postgresql`, leading to FQDN (+port) `gitea-release-postgresql.gitea.svc.cluster.local:5432`. With these changes, the app.ini gets the value `HOST=gitea-release.gitea.svc.cluster.local:5432`, which is incorrect from what I see.
Member

@jouve Do you want to continue with this PR?

@jouve Do you want to continue with this PR?
Member

This looks stale meanwhile. @justusbunsi Do you want to continue here and finish it? Since you've done so much reviewing already. Or should we close? You decide!

This looks stale meanwhile. @justusbunsi Do you want to continue here and finish it? Since you've done so much reviewing already. Or should we close? You decide!
Member

@pat-s Yes, it contains valid fixes. I am going to create a superseding PR.

@pat-s Yes, it contains valid fixes. I am going to create a superseding PR.
Member

Superseded by #560.

Superseded by #560.
justusbunsi closed this pull request 2023-11-11 16:39:54 +00:00
pat-s referenced this issue from a commit 2023-11-14 22:23:02 +00:00
All checks were successful
check-and-test / check-and-test (pull_request) Successful in 38s
Required
Details

Pull request closed

Sign in to join this conversation.
No description provided.