Optimize logger #17308

Open
wxiaoguang wants to merge 12 commits from wxiaoguang/optimize-logger into main
Contributor

Background:

  1. The logger outputted too many redundant logs (eg: Started/Completed)
  2. No easy way to find a go func when seeing a log
  3. There was a bug introduced by https://github.com/go-gitea/gitea/pull/15176 , the log level isn't printed during startup

This PR

Fixes the problems above. It makes the log system more friendly to developers.

A setting option ROUTER_LOG_HANDLER is introduced. It's default to v2(the new). If users need to use v1 for fail2ban and others, the option can be set to v1 to use the old handler.

ps: In this PR, the long polling handlers also can be recognized, see the still-executing log in the screenshot, so no worry about can not see logs for slow queries.

About performance:

In v1, every request outputs 2 logs (Started/Completed)
In v2 (this), every request only outputs one log

  • all runtime reflections of handler functions are cached
  • the mutexes work in fast path for most cases (atomic incr) because there is seldom concurrency writings.

So generally speaking, the logger context doesn't cost much, using v2 with logger context will not affect performance

Instead, the performance may be improved because now only 1 log is outputted for each request.

Screenshots

Before

gitea-log1

After

gitea2

# Background: 1. The logger outputted too many redundant logs (eg: Started/Completed) 2. No easy way to find a go func when seeing a log 3. There was a bug introduced by https://github.com/go-gitea/gitea/pull/15176 , the log level isn't printed during startup # This PR Fixes the problems above. It makes the log system more friendly to developers. A setting option `ROUTER_LOG_HANDLER` is introduced. It's default to `v2`(the new). If users need to use `v1` for fail2ban and others, the option can be set to `v1` to use the old handler. ps: In this PR, the long polling handlers also can be recognized, see the `still-executing` log in the screenshot, so no worry about can not see logs for slow queries. ### About performance: In v1, every request outputs 2 logs (Started/Completed) In v2 (this), every request only outputs one log * all runtime reflections of handler functions are cached * the mutexes work in fast path for most cases (atomic incr) because there is seldom concurrency writings. So generally speaking, the `logger context` doesn't cost much, using v2 with `logger context` will not affect performance Instead, the performance may be improved because now only 1 log is outputted for each request. ### Screenshots #### Before ![gitea-log1](https://user-images.githubusercontent.com/2114189/137277361-c82c91cb-0f67-4932-b831-881cd04acca5.png) #### After ![gitea2](https://user-images.githubusercontent.com/2114189/137277399-c7b7087e-ae75-4004-9d55-9089fabf0b62.png)
zeripath reviewed 2021-10-14 08:28:24 +00:00
Contributor

This is wrong.

Why do you want the stack trace to start at the log.Stack level?

This is wrong. Why do you want the stack trace to start at the log.Stack level?
wxiaoguang reviewed 2021-10-14 08:33:18 +00:00
Author
Contributor

It is correct. If you do not print the whole stack, it's very difficult for developers to find where the panic was recovered. See the background (4).

I spent much time debugging this annoying issue.

It is correct. If you do not print the whole stack, it's very difficult for developers to find where the panic was recovered. See the background (4). I spent much time debugging this annoying issue.
wxiaoguang reviewed 2021-10-14 08:36:51 +00:00
Author
Contributor

The first stack frame is skipped already, I think ....

image

The first stack frame is skipped already, I think .... ![image](https://user-images.githubusercontent.com/2114189/137281815-5f3f00ee-172d-43d3-af2b-1df513b26799.png)
zeripath reviewed 2021-10-14 09:10:28 +00:00
Contributor

but the log line itself tells you where it was logged from?

Let's look at an example from https://github.com/go-gitea/gitea/issues/17030

2021/09/12 17:50:21 .../graceful/manager.go:135:func1() [C] PANIC during RunWithShutdownContext: runtime error: invalid memory address or nil pointer dereference
    Stacktrace: /usr/local/go/src/runtime/panic.go:212 (0x43ea5a)
    /usr/local/go/src/runtime/signal_unix.go:734 (0x458a32)
    /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:92 (0xc2235b)
    /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:142 (0xc22367)
    /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:78 (0xc21811)
    /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_iterate.go:19 (0xc41316)
    /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_iterate.go:38 (0xc412f3)
    /go/src/code.gitea.io/gitea/models/repo_mirror.go:117 (0x146012c)
    /go/src/code.gitea.io/gitea/services/mirror/mirror.go:55 (0x2009284)
    /go/src/code.gitea.io/gitea/modules/cron/tasks_basic.go:25 (0x2018374)
    /go/src/code.gitea.io/gitea/modules/cron/tasks.go:88 (0x2017c21)
    /go/src/code.gitea.io/gitea/modules/graceful/manager.go:139 (0xc61685)
    /go/src/code.gitea.io/gitea/modules/cron/tasks.go:82 (0x2015ba4)
    /go/src/code.gitea.io/gitea/modules/cron/tasks.go:56 (0x2015a27)
    /go/src/code.gitea.io/gitea/vendor/github.com/gogs/cron/cron.go:171 (0x2004498)
    /usr/local/go/src/runtime/asm_amd64.s:1371 (0x47b360)

The panic was clearly caught in .../graceful/manager.go:135:func1()

If you don't skip the first two frames it looks like it arose from there.

but the log line itself tells you where it was logged from? Let's look at an example from https://github.com/go-gitea/gitea/issues/17030 ``` 2021/09/12 17:50:21 .../graceful/manager.go:135:func1() [C] PANIC during RunWithShutdownContext: runtime error: invalid memory address or nil pointer dereference Stacktrace: /usr/local/go/src/runtime/panic.go:212 (0x43ea5a) /usr/local/go/src/runtime/signal_unix.go:734 (0x458a32) /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:92 (0xc2235b) /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:142 (0xc22367) /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/rows.go:78 (0xc21811) /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_iterate.go:19 (0xc41316) /go/src/code.gitea.io/gitea/vendor/xorm.io/xorm/session_iterate.go:38 (0xc412f3) /go/src/code.gitea.io/gitea/models/repo_mirror.go:117 (0x146012c) /go/src/code.gitea.io/gitea/services/mirror/mirror.go:55 (0x2009284) /go/src/code.gitea.io/gitea/modules/cron/tasks_basic.go:25 (0x2018374) /go/src/code.gitea.io/gitea/modules/cron/tasks.go:88 (0x2017c21) /go/src/code.gitea.io/gitea/modules/graceful/manager.go:139 (0xc61685) /go/src/code.gitea.io/gitea/modules/cron/tasks.go:82 (0x2015ba4) /go/src/code.gitea.io/gitea/modules/cron/tasks.go:56 (0x2015a27) /go/src/code.gitea.io/gitea/vendor/github.com/gogs/cron/cron.go:171 (0x2004498) /usr/local/go/src/runtime/asm_amd64.s:1371 (0x47b360) ``` The panic was clearly caught in `.../graceful/manager.go:135:func1()` If you don't skip the first two frames it looks like it arose from there.
wxiaoguang reviewed 2021-10-14 09:20:30 +00:00
Author
Contributor

OK, I can revert the behavior. But in my mind depending the .../graceful/manager.go:135:func1() prefix of the log is not a good idea. And some stack traces are directly outputted to HTML response (in development mode), there is no way to see the recovery point if the prefix is missing.

Reverted.

OK, I can revert the behavior. But in my mind depending the `.../graceful/manager.go:135:func1()` prefix of the log is not a good idea. And some stack traces are directly outputted to HTML response (in development mode), there is no way to see the recovery point if the prefix is missing. Reverted.
zeripath reviewed 2021-10-14 18:43:42 +00:00
zeripath left a comment
Contributor

I don't completely understand what you're trying to do here but I really dislike the "panic occurred " changes.

I don't completely understand what you're trying to do here but I really dislike the "panic occurred " changes.
Contributor

Hmm I don't like this change of log line as it's pretty clear that it doesn't get recovered because it's logged at Fatal level. I don't understand what this "panic occurred, no recover," adds except make it more difficult to see the actual error that caused the panic.

			log.Fatal("PANIC: %v\n%s", panicked, log.Stack(2))
Hmm I don't like this change of log line as it's pretty clear that it doesn't get recovered because it's logged at Fatal level. I don't understand what this "panic occurred, no recover," adds except make it more difficult to see the actual error that caused the panic. ```suggestion log.Fatal("PANIC: %v\n%s", panicked, log.Stack(2)) ```
@ -126,6 +126,7 @@ func getStacktraceLogLevel(section *ini.Section, key string, defaultValue string
func generateLogConfig(sec *ini.Section, name string, defaults defaultLogOptions) (mode, jsonConfig, levelName string) {
level := getLogLevel(sec, "LEVEL", LogLevel)
levelName = level.String()
Contributor

This is what fixes the log level reporting regression.

This is what fixes the log level reporting regression.
Contributor

Seems somewhat of a shame to recouple common with public.

How about:

// WrapHandler wraps a handler for a router
func WrapHandler(handler http.HandlerFunc, prefix string, friendlyName ...string ) func(next http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(resp http.ResponseWriter, req *http.Request) {
			if !strings.HasPrefix(req.URL.Path, prefix) {
				next.ServeHTTP(resp, req)
				return
			}
			UpdateContextHandlerFuncInfo(req.Context(), handler, friendlyName...)
			handler(resp, req)
		})
	}
}
Seems somewhat of a shame to recouple common with public. How about: ```go // WrapHandler wraps a handler for a router func WrapHandler(handler http.HandlerFunc, prefix string, friendlyName ...string ) func(next http.Handler) http.Handler { return func(next http.Handler) http.Handler { return http.HandlerFunc(func(resp http.ResponseWriter, req *http.Request) { if !strings.HasPrefix(req.URL.Path, prefix) { next.ServeHTTP(resp, req) return } UpdateContextHandlerFuncInfo(req.Context(), handler, friendlyName...) handler(resp, req) }) } } ```
Contributor

I deliberately chose to leave the anonymous function reporting in the logger because I thought it was useful to indicate that this was coming from within an anonymous function instead of giving the false impression of being within the main function.

I deliberately chose to leave the anonymous function reporting in the logger because I thought it was useful to indicate that this was coming from within an anonymous function instead of giving the false impression of being within the main function.
Contributor

Why isn't this function in modules/log?

Why isn't this function in modules/log?
@ -49,0 +53,4 @@
}
if routerLogHandler == nil {
log.Warn("unknown router log handler '%s', fall back to v2")
routerLogHandler = NewLoggerHandlerV2(setting.RouterLogLevel)
Contributor

Same as above, IMHO "panic occurred and gets recovered" adds nothing and makes it harder to see the panic.

					log.Error("%v", combinedErr)
Same as above, IMHO "panic occurred and gets recovered" adds nothing and makes it harder to see the panic. ```suggestion log.Error("%v", combinedErr) ```
Contributor
						log.Error("%v", combinedErr)
```suggestion log.Error("%v", combinedErr) ```
Contributor
					log.Error("%v", combinedErr)
```suggestion log.Error("%v", combinedErr) ```
Contributor

What does recovered here mean?

It's a panic just log it as a PANIC. This "panic occurred and gets recovered" adds nothing but make it harder to see the PANIC.

					log.Error("%v", combinedErr)
What does recovered here mean? It's a panic just log it as a PANIC. This "panic occurred and gets recovered" adds nothing but make it harder to see the PANIC. ```suggestion log.Error("%v", combinedErr) ```
wxiaoguang reviewed 2021-10-15 19:19:02 +00:00
Author
Contributor

Sometimes the code may just print the error and panic again to pass the error to caller. So I think this message makes it clear that the panic recovery stops here.

Sometimes the code may just print the error and panic again to pass the error to caller. So I think this message makes it clear that the panic recovery stops here.
wxiaoguang reviewed 2021-10-15 19:20:45 +00:00
Author
Contributor

Why it should be? It is really simple and no other modules need it. And it is specialized for the new logger system (eg: fileNameShort, funcNameShort).

Why it should be? It is really simple and no other modules need it. And it is specialized for the new logger system (eg: fileNameShort, funcNameShort).
wxiaoguang reviewed 2021-10-15 19:21:26 +00:00
@ -49,0 +53,4 @@
}
if routerLogHandler == nil {
log.Warn("unknown router log handler '%s', fall back to v2")
routerLogHandler = NewLoggerHandlerV2(setting.RouterLogLevel)
Author
Contributor

As above.

As above.
wxiaoguang reviewed 2021-10-15 19:21:30 +00:00
Author
Contributor

As above.

As above.
wxiaoguang reviewed 2021-10-15 19:21:35 +00:00
Author
Contributor

As above.

As above.
wxiaoguang reviewed 2021-10-15 19:23:41 +00:00
Author
Contributor

We have line numbers in logs, it is easier to indicate where the log comes from. And when you see a anonymous func name like func1, func2, you still do not know what it is, it just makes the log longer and hard to read important information.

Trust me, I just tried and finally choose to remove the suffix. It makes the log more clear. ?

We have line numbers in logs, it is easier to indicate where the log comes from. And when you see a anonymous func name like `func1`, `func2`, you still do not know what it is, it just makes the log longer and hard to read important information. Trust me, I just tried and finally choose to remove the suffix. It makes the log more clear. ?
wxiaoguang reviewed 2021-10-15 19:26:08 +00:00
Author
Contributor

But the couple was already there (in routers/install/routes.go), I just move the code to a new place.

But the couple was already there (in `routers/install/routes.go`), I just move the code to a new place.
wxiaoguang reviewed 2021-10-15 19:27:45 +00:00
Author
Contributor

OK, I get your point, will improve it later. Thanks ~

OK, I get your point, will improve it later. Thanks ~
zeripath reviewed 2021-10-19 20:50:46 +00:00
Contributor

This should be:

go graceful.GetManager().RunWithShutdownContext(func (baseCtx context.Context) { ... })

or if you're really sure you don't need to wait for this goroutine to finish before moving on to terminate:

go func(basectx context.Context) { ... } (graceful.getManager().ShutdownContext())

But it's probably better to just have it tracked.

Graceful tracks functions that are run and will ensure that they're finished before moving on to terminate. Avoid allowing goroutines to escape outside of that control.

This should be: ```go go graceful.GetManager().RunWithShutdownContext(func (baseCtx context.Context) { ... }) ``` or if you're really sure you don't need to wait for this goroutine to finish before moving on to terminate: ```go go func(basectx context.Context) { ... } (graceful.getManager().ShutdownContext()) ``` But it's probably better to just have it tracked. Graceful tracks functions that are run and will ensure that they're finished before moving on to terminate. Avoid allowing goroutines to escape outside of that control.
zeripath reviewed 2021-10-19 20:57:13 +00:00
Contributor

I don't think the ticker is the right thing here as this function could take more than a second to process

			for {
				select {
				case <-baseCtx.Done():
					return
				case <-time.After(time.Second):
I don't think the ticker is the right thing here as this function could take more than a second to process ```suggestion for { select { case <-baseCtx.Done(): return case <-time.After(time.Second): ```
zeripath reviewed 2021-10-19 21:14:38 +00:00
@ -0,0 +217,4 @@
if localPanicErr != nil {
// we do not recover any panic, so let the framework handle the panic error
panic(localPanicErr)
Contributor

hmm... won't this make the panic stack start here when it's eventually caught?

hmm... won't this make the panic stack start here when it's eventually caught?
wxiaoguang reviewed 2021-10-20 02:25:11 +00:00
Author
Contributor

I am sure I don't want to wait for the goroutine. It just works like a background thread to check slow queries.

Update to go func(basectx context.Context) { ... } (graceful.getManager().ShutdownContext()), but I can not quite understand what's the difference here. They seem the same in my mind.

I am sure I don't want to wait for the goroutine. It just works like a background thread to check slow queries. Update to `go func(basectx context.Context) { ... } (graceful.getManager().ShutdownContext())`, but I can not quite understand what's the difference here. They seem the same in my mind.
wxiaoguang reviewed 2021-10-20 02:27:06 +00:00
Author
Contributor

I think the ticker is the right thing. We do not care about accurate duration here. It just does the check periodly, 500ms or 1500ms are all OK. And the code runs really fast here, no problem. I did my best to make sure everything work as fast as possible.

Using NewTicker can save a memory allocation, because time.After always allocates a new Timer struct.

I think the ticker is the right thing. We do not care about accurate duration here. It just does the check periodly, 500ms or 1500ms are all OK. And the code runs really fast here, no problem. I did my best to make sure everything work as fast as possible. Using NewTicker can save a memory allocation, because time.After always allocates a new Timer struct.
wxiaoguang reviewed 2021-10-20 02:41:14 +00:00
@ -0,0 +217,4 @@
if localPanicErr != nil {
// we do not recover any panic, so let the framework handle the panic error
panic(localPanicErr)
Author
Contributor

I think no problem here. the stack is full:

package main
func T3() {
	panic(1)
}
func T2() {
	defer func() {
		e := recover()
		panic(e)
	}()
	T3()
}
func T1() {
	T2()
}
func main() {
	T1()
}
panic: 1 [recovered]
        panic: 1

goroutine 1 [running]:
main.T2.func1()
        /Users/xiaoguang/work/test/go-test/main-cast.go:10 +0x25
panic({0x1058c40, 0x1075988})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
main.T3(...)
        /Users/xiaoguang/work/test/go-test/main-cast.go:4
main.T2()
        /Users/xiaoguang/work/test/go-test/main-cast.go:12 +0x4a
main.T1(...)
        /Users/xiaoguang/work/test/go-test/main-cast.go:16
main.main()
        /Users/xiaoguang/work/test/go-test/main-cast.go:20 +0x18
I think no problem here. the stack is full: ``` package main func T3() { panic(1) } func T2() { defer func() { e := recover() panic(e) }() T3() } func T1() { T2() } func main() { T1() } ``` ``` panic: 1 [recovered] panic: 1 goroutine 1 [running]: main.T2.func1() /Users/xiaoguang/work/test/go-test/main-cast.go:10 +0x25 panic({0x1058c40, 0x1075988}) /usr/local/go/src/runtime/panic.go:1038 +0x215 main.T3(...) /Users/xiaoguang/work/test/go-test/main-cast.go:4 main.T2() /Users/xiaoguang/work/test/go-test/main-cast.go:12 +0x4a main.T1(...) /Users/xiaoguang/work/test/go-test/main-cast.go:16 main.main() /Users/xiaoguang/work/test/go-test/main-cast.go:20 +0x18 ```
wxiaoguang reviewed 2021-10-20 02:43:23 +00:00
@ -0,0 +217,4 @@
if localPanicErr != nil {
// we do not recover any panic, so let the framework handle the panic error
panic(localPanicErr)
Author
Contributor

And in real usage, this panic recover in logger context won't be triggered. We have other recovers. For example, if I do panic in web.Home, the output is:

2021/10/20 10:28:30 routers/web/base.go:132:1() [E] PANIC: 1
        /Users/xiaoguang/work/gitea/routers/web/home.go:26 (0x387f0eb)
                Home: panic(1)
        /Users/xiaoguang/work/gitea/modules/web/route.go:76 (0x35c3621)
                Wrap.func1: t(ctx)
...
2021/10/20 10:28:31 handler: web/home.go:25(web.Home) failed GET / for 127.0.0.1:54300, panic in 439.885644ms, err=1

Everything is fine here. The recover in logger context is just a fallback mechanism, make sure we can see a failed log, it doesn't affect anything.

And in real usage, this panic recover in logger context won't be triggered. We have other recovers. For example, if I do panic in `web.Home`, the output is: ``` 2021/10/20 10:28:30 routers/web/base.go:132:1() [E] PANIC: 1 /Users/xiaoguang/work/gitea/routers/web/home.go:26 (0x387f0eb) Home: panic(1) /Users/xiaoguang/work/gitea/modules/web/route.go:76 (0x35c3621) Wrap.func1: t(ctx) ... 2021/10/20 10:28:31 handler: web/home.go:25(web.Home) failed GET / for 127.0.0.1:54300, panic in 439.885644ms, err=1 ``` Everything is fine here. The recover in logger context is just a fallback mechanism, make sure we can see a `failed` log, it doesn't affect anything.
This repo is archived. You cannot comment on pull requests.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: lunny/gitea#17308
No description provided.