Display SVG files as images instead of text #14101

techknowlogick · 2020-12-22T00:03:39Z

jtran commented

2020-12-22 00:03:39 +00:00

(Migrated from github.com)

This is an attempt to close #1095. It makes the following changes:

Detects SVG image files and serves them with Content-Type: image/svg+xml instead of text/plain, text/html, or text/xml. This allows users to view an SVG as an image when browsing a repo's tree and when it's embedded in a markdown file.
Serves extra security headers for SVG files in raw and media routes: Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox and X-Content-Type-Options: nosniff. This prevents the execution of scripts in untrusted SVG files.
Adds regression tests for the above
UI toggle on the view file page to switch between the text source and the rendered image for SVG files
Adds a config setting to disable the feature
~Hides the "raw" button when viewing SVG files since unsafe styles aren't displayed, and viewing the image without styles may be confusing. Users can always right-click on the image to view or save it.~

I tested this with https://github.com/edgar-bonet/test-svg-mime in Firefox and Chrome.

This is an attempt to close #1095. It makes the following changes: - Detects SVG image files and serves them with `Content-Type: image/svg+xml` instead of `text/plain`, `text/html`, or `text/xml`. This allows users to view an SVG as an image when browsing a repo's tree and when it's embedded in a markdown file. - Serves extra security headers for SVG files in raw and media routes: `Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox` and `X-Content-Type-Options: nosniff`. This prevents the execution of scripts in untrusted SVG files. - Adds regression tests for the above - UI toggle on the view file page to switch between the text source and the rendered image for SVG files - Adds a config setting to disable the feature - ~Hides the "raw" button when viewing SVG files since unsafe styles aren't displayed, and viewing the image without styles may be confusing. Users can always right-click on the image to view or save it.~ I tested this with https://github.com/edgar-bonet/test-svg-mime in Firefox and Chrome.

👍 10 🎉 2

techknowlogick reviewed 2020-12-22 02:12:28 +00:00

techknowlogick left a comment

Thank you for this PR, and especially thank you for including tests & documentation :)

This is not an exhaustive review, just a few preliminary comments.

Thank you for this PR, and especially thank you for including tests & documentation :) This is not an exhaustive review, just a few preliminary comments.

custom/conf/app.example.ini

techknowlogick commented

2020-12-22 02:05:31 +00:00

Could this be grouped with other ui settings (eg. [ui.svg])?

Could this be grouped with other ui settings (eg. `[ui.svg]`)?

techknowlogick commented

2020-12-22 02:07:54 +00:00

Another note, could the setting itself be ENABLE_RENDER?

Another note, could the setting itself be `ENABLE_RENDER`?

techknowlogick commented

2020-12-22 02:11:39 +00:00

Once the above have been modified, could you also add the additional settings to the "cheat sheet" found in docs/ directory.

integrations/view_test.go

techknowlogick commented

2020-12-22 02:04:32 +00:00

Could you update the year to 2020?

silverwind (Migrated from github.com) reviewed 2020-12-22 22:01:34 +00:00

routers/repo/download.go

silverwind (Migrated from github.com) commented

2020-12-22 22:01:34 +00:00

			ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

This fixes the SMIL animation in Firefox (which counts as inline style according to this) and matches the CSP GitHub uses.

```suggestion ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox") ``` This fixes the SMIL animation in Firefox (which counts as inline style according to [this](https://bugzilla.mozilla.org/show_bug.cgi?id=763879)) and matches the CSP GitHub uses.

silverwind (Migrated from github.com) reviewed 2020-12-22 22:03:55 +00:00

custom/conf/app.example.ini

silverwind (Migrated from github.com) commented

2020-12-22 22:03:55 +00:00

; Whether to render SVG files as images.  If SVG rendering is disabled, SVG files are displayed as text and cannot be embedded in markdown files as images.

We usually try to keep these in sync.

```suggestion ; Whether to render SVG files as images. If SVG rendering is disabled, SVG files are displayed as text and cannot be embedded in markdown files as images. ``` We usually try to keep these in sync.

lunny reviewed 2021-01-06 02:59:23 +00:00

routers/repo/view.go

						
				@ -399,0 +400,4 @@

					isDisplayingRendered := !isDisplayingSource

					isRepresentableAsText := base.IsRepresentableAsText(buf)

					ctx.Data["IsRepresentableAsText"] = isRepresentableAsText

					if !isRepresentableAsText {

lunny commented

2021-01-06 02:59:23 +00:00

So we should also change ctx.Data["IsDisplayingSource"]?

So we should also change `ctx.Data["IsDisplayingSource"]`?

lunny reviewed 2021-01-06 03:46:23 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

lunny commented

2021-01-06 03:46:23 +00:00

Is unsafe-inline necessary here? ref. https://content-security-policy.com/unsafe-inline/

Is `unsafe-inline` necessary here? ref. https://content-security-policy.com/unsafe-inline/

jtran (Migrated from github.com) reviewed 2021-01-06 03:50:22 +00:00

routers/repo/view.go

						
				@ -399,0 +400,4 @@

					isDisplayingRendered := !isDisplayingSource

					isRepresentableAsText := base.IsRepresentableAsText(buf)

					ctx.Data["IsRepresentableAsText"] = isRepresentableAsText

					if !isRepresentableAsText {

jtran (Migrated from github.com) commented

2021-01-06 03:50:21 +00:00

Fixed.

jtran (Migrated from github.com) reviewed 2021-01-06 03:53:43 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

jtran (Migrated from github.com) commented

2021-01-06 03:53:43 +00:00

Without it, many SVG images look nothing like they're intended to look. Many SVG attributes have a style alternative.

That said, I originally omitted it, but I was asked to add it in another comment in this PR. As long as it's safe, I think that including it is a good thing.

Without it, many SVG images look nothing like they're intended to look. Many SVG attributes have a style alternative. That said, I originally omitted it, but I was asked to add it in another comment in this PR. As long as it's safe, I think that including it is a good thing.

lunny reviewed 2021-01-06 04:23:56 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

lunny commented

2021-01-06 04:23:56 +00:00

Please see my ref article, It said

When you want to allow inline scripts or styles on a page that uses CSP, there two much better options: nonce or hash.

Please see my ref article, It said > When you want to allow inline scripts or styles on a page that uses CSP, there two much better options: nonce or hash.

jtran (Migrated from github.com) reviewed 2021-01-06 05:57:28 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

jtran (Migrated from github.com) commented

2021-01-06 05:57:27 +00:00

I read the article. I'm having trouble understanding how nonce or hash is actually better. If I understand it correctly, a hash prevents modification between the server and client, which is not more than what HTTPS would do. So this is to make the HTTP case safer.

If you think that's what we should do, it's not clear to me how to actually implement it. It seems like we would need to parse the SVG file enough to either insert nonces at the appropriate places in the SVG or read all the inline styles so that we can compute their hashes. Is this what you're saying? I'm guessing this must include all the style="..." attributes. There could potentially be many of them. I'll think about it some more.

I read the article. I'm having trouble understanding how nonce or hash is actually better. If I understand it correctly, a hash prevents modification between the server and client, which is not more than what HTTPS would do. So this is to make the HTTP case safer. If you think that's what we should do, it's not clear to me how to actually implement it. It seems like we would need to parse the SVG file enough to either insert nonces at the appropriate places in the SVG or read all the inline styles so that we can compute their hashes. Is this what you're saying? I'm guessing this must include all the `style="..."` attributes. There could potentially be many of them. I'll think about it some more.

CirnoT (Migrated from github.com) reviewed 2021-01-08 23:33:40 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")

CirnoT (Migrated from github.com) commented

2021-01-08 23:33:39 +00:00

@lunny The suggestion from the ref article is not applicable in this case, as the CSP is applied when rendering user-generated content (the SVG file). The general purpose of using nonce is so that website can include inline script in non-user controlled HTML content while at the same time benefiting from CSP blocking possible XSS attacks.

zeripath reviewed 2021-01-09 11:15:33 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 11:15:33 +00:00

mostly we use 1024 bytes - I think that's what works best for most external libraries too.

zeripath reviewed 2021-01-09 11:36:25 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 11:36:25 +00:00

would it be possible to add a test to ensure that if a file contains SVG beyond the sniff length that the SVG is still stripped? I could imagine it being something that unfamiliar developers might want to remove and we must not allow it to sneak back in unsafely.

jtran (Migrated from github.com) reviewed 2021-01-09 18:26:56 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

jtran (Migrated from github.com) commented

2021-01-09 18:26:56 +00:00

I copied this from the go http module that was being used before. I can add a test, but I'm pretty sure 512 is what was being used before.

I copied this from the [`go http module`](https://github.com/golang/go/blob/dev.boringcrypto.go1.15/src/net/http/sniff.go#L13) that was being used before. I can add a test, but I'm pretty sure 512 is what was being used before.

zeripath reviewed 2021-01-09 20:36:33 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 20:36:33 +00:00

I stand corrected! I'm misremembering the bytes/runes thing and lfs etc.

Would you be able to add a test that ensures a file with the svg tag after 512 chars still gets sanitized?

I stand corrected! I'm misremembering the bytes/runes thing and lfs etc. Would you be able to add a test that ensures a file with the svg tag after 512 chars still gets sanitized?

zeripath reviewed 2021-01-09 20:38:34 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 20:38:34 +00:00

(I know your code doesn't change this - it's protecting against someone else removing the sanitization later by mistake

jtran (Migrated from github.com) reviewed 2021-01-09 20:56:34 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

jtran (Migrated from github.com) commented

2021-01-09 20:56:34 +00:00

I'd be happy to add a test, but to be clear, there is no sanitizing or stripping in this PR. The security is by the nature of how <img> tags and content-security-policies work. So I'm not sure what the test would actually assert. If the svg tag weren't in the first 512 bytes, the file would get detected as something else, such as text/plain, text/html, or text/xml and served as that. When viewing the file, gitea would display it as a text file. When viewing the raw file, the browser will render it as text. When embedded in an <img> tag from a markdown file, the image would fail to display.

I'd be happy to add a test, but to be clear, there is no sanitizing or stripping in this PR. The security is by the nature of how `<img>` tags and content-security-policies work. So I'm not sure what the test would actually assert. If the svg tag weren't in the first 512 bytes, the file would get detected as something else, such as text/plain, text/html, or text/xml and served as that. When viewing the file, gitea would display it as a text file. When viewing the raw file, the browser will render it as text. When embedded in an `<img>` tag from a markdown file, the image would fail to display.

jtran (Migrated from github.com) reviewed 2021-01-09 21:05:08 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

jtran (Migrated from github.com) commented

2021-01-09 21:05:08 +00:00

BTW, as it's currently implemented, we only attempt to look for an SVG content type if it's already been detected as one of: text/plain, text/html, text/xml. So it's not possible that having the svg tag later in the file than 512 bytes would cause a file to be detected as something other than those three content types.

zeripath reviewed 2021-01-09 22:21:14 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 22:21:13 +00:00

That's exactly my point - we want the HTML file with embedded SVG after 512 to have the SVG sanitised away and displayed without it.

The test would simply assert that there were no SVG els.

The point is to prevent some well meaning change to the sanitizer leading to unsanitised SVG leaking.

That's exactly my point - we want the HTML file with embedded SVG after 512 to have the SVG sanitised away and displayed without it. The test would simply assert that there were no SVG els. The point is to prevent some well meaning change to the sanitizer leading to unsanitised SVG leaking.

zeripath reviewed 2021-01-09 22:27:46 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

zeripath commented

2021-01-09 22:27:46 +00:00

If you really don't want to do it - it can be added later but it'd be good to get the test in as an example of something that is not going to be supported and also something we don't want to happen.

jtran (Migrated from github.com) reviewed 2021-01-11 02:07:22 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

jtran (Migrated from github.com) commented

2021-01-11 02:07:22 +00:00

I think there's a misunderstanding. Maybe I'm not being clear enough. If there's a file that's detected as text/html with a root <svg> tag after 512 bytes, and we serve it with Content-Type: text/html, the browser will never up-convert to image/svg+xml because we're also serving the header X-Content-Type-Options: nosniff, which tells the browser, "take our word for it; don't try to do your own sniffing".

So I don't see a need to assert that there are no <svg> tags. Am I missing something?

I think there's a misunderstanding. Maybe I'm not being clear enough. If there's a file that's detected as text/html with a root `<svg>` tag after 512 bytes, and we serve it with `Content-Type: text/html`, the browser will never up-convert to `image/svg+xml` because we're also serving the header [`X-Content-Type-Options: nosniff`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options), which tells the browser, "take our word for it; don't try to do your own sniffing". So I don't see a need to assert that there are no `<svg>` tags. Am I missing something?

jtran (Migrated from github.com) reviewed 2021-01-11 02:31:33 +00:00

modules/base/tool.go

						
				@ -31,1 +38,4 @@

				var svgTagRegex = regexp.MustCompile(`(?s)\A\s*(?:<!--.*?-->\s*)*<svg\b`)

				var svgTagInXMLRegex = regexp.MustCompile(`(?s)\A<\?xml\b.*?\?>\s*(?:<!--.*?-->\s*)*<svg\b`)

				// EncodeMD5 encodes string to md5 hex value.

jtran (Migrated from github.com) commented

2021-01-11 02:31:32 +00:00

I'm working on a test that asserts that a file with a root <svg> tag after 512 bytes is detected as text. Hopefully, that's what you mean.

I'm working on a test that asserts that a file with a root `<svg>` tag after 512 bytes is detected as text. Hopefully, that's what you mean.

zeripath approved these changes 2021-01-12 20:16:14 +00:00

lunny reviewed 2021-01-13 00:23:48 +00:00

routers/repo/download.go

						
				@ -47,2 +47,4 @@

						ctx.Resp.Header().Set("Content-Disposition", fmt.Sprintf(`inline; filename="%s"`, name))

						ctx.Resp.Header().Set("Access-Control-Expose-Headers", "Content-Disposition")

						if base.IsSVGImageFile(buf) {

							ctx.Resp.Header().Set("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox")