At DramaFever, we have logging requirements for our services, largely so we make life not-terrible for our Ops Team. We want to be able to configure where a log writes to, what gets output from a log, and so on. For Python services, these requirements can all be met just by logging the error; it’ll automatically use the configuration and get sent to our error reporting system, Sentry. However, things weren’t as smooth for our Go services, which had the extra chore of manually sending the Sentry error after every log call. That seemed like something we could fix.
To ensure the strategy would create useful output that included all of our Ops team’s requirements, I worked with Tim, our Director of Operations at the time.
The first step was to establish the reasoning behind our current log levels.
- Debug is to be used by developers only. Ops should never see these messages.
- Info is to be used to surface relevant information to Ops while the service is running and things are going according to plan. Most often, this means version information, which ports we’re listening on, etc.
- Warn is to be used when something goes wrong but we can gracefully degrade. The request didn’t fail, but we should probably know something didn’t go the way we had hoped.
- Error is to be used only when something went wrong that could not gracefully degrade. If the response code isn’t 400 or above, it’s not worthy of an error level message.
Beyond clarifying the semantics, we also hammered out some usability concerns. Timestamps that don’t use the `/` separators or the `T` are easier for Unix tools to process. Keeping the fixed-width fields (the timestamp and log level) at the beginning of the line, before things like the filename of the line that raised the log, also helps when filtering or sorting.
Once we established that our output would make the Ops team happy and productive (which are always good goals for your Ops team!), I set about actually building the thing.
The work done with the Ops team paid off: I had a clear understanding of what needed to be built, and knew what the output needed to look like. I had some good thoughts on how to get there. Building the actual logger wasn’t that hard. Create a type that has an `io.Writer`, a mutex, and the level you want to log at:
When logging, we’ll obtain a lock on the mutex, write to `out`, and release the lock. The lock keeps writes to a single file concurrency-safe, and modeling the destination as an `io.Writer` instead of an `*os.File` lets us write to stdout, a buffer (when testing), or really anything else. Interfaces are great.
Then it’s a simple matter of defining helper methods that write to `out`. We want our typical `Infof`, etc. helpers, which write to the `io.Writer` only if they fit the level configured on the `Logger`. So if our `Logger` has a `Level` of info, `Debugf` will not print anything.
For the actual output, I cribbed from the standard library pretty heavily. We didn’t need or want the configurability of output the standard library offers, so our code is more concise. We also wanted to support helper functions; sometimes a service has its own logging requirements or helper functions, and we didn’t want those to swallow the line number of the true call that raised the log. So we included a call depth property on our `Logger`, and ways to set it, so the logger looks at the call-that-called-it when deciding which file and line to attribute a log entry to.
But of course, we wanted to do more than write to a file, or we would have used one of the numerous logging libraries already available to us. The main feature of this library was supposed to be Sentry integration. To achieve that, we added an optional sentry property to our `Logger` type, containing a `*raven.Client` that we could then use to send log entries to Sentry. We had to add some helper functions, like `AddMeta`, to the `Logger` so we could associate metadata with log entries in Sentry. We also needed helpers to set the package prefixes in Sentry that help determine whether an error is from our code or one of our dependencies, as well as the release metadata that tells us which release(s) an error is present in. Those are all pretty straightforward: set a property, send it along with the log entry when reporting the error.
It’s reporting the error that’s actually interesting.
To do that, I needed a helper method that sends some data to Sentry. Then I could just call that helper within `Errorf`. Right? Sort of.
Sentry isn’t as simple as “I’ll record a line of text”. It has a more complex, deeper understanding of the data it’s working with. It can display HTTP requests in useful ways. It gathers stack traces for errors. Things like that. But problematically, there isn’t always an `error` variable available when we reach an error condition. And some errors aren’t related to an HTTP request.
So I needed to figure out how to include those things sensibly, but not require them.
Our logging functions have a signature like this:
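Something along these lines, mirroring `fmt.Printf` (a sketch; only the variadic `...interface{}` shape matters here):

```go
package main

import "fmt"

type Logger struct{}

// Errorf mirrors fmt.Printf: a format string plus a variadic list
// of interface{} values that the logger can later inspect.
func (l *Logger) Errorf(format string, args ...interface{}) {
	fmt.Printf("ERROR "+format+"\n", args...)
}

func (l *Logger) Warnf(format string, args ...interface{}) {
	fmt.Printf("WARN "+format+"\n", args...)
}

func main() {
	var l Logger
	l.Errorf("it broke: %v", "oops") // prints: ERROR it broke: oops
}
```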
You’ll recognise these as similar to the `fmt` package’s functions. But by using type assertions, we can check whether those `interface{}` arguments are `error` instances, and take further action. For example, when an argument is an `error`, we capture a stack trace and log it as an exception in Sentry. When it’s an `*http.Request`, we log it as a request in Sentry, with all the assorted metadata that comes with that.
But calling is still simple and natural:
There is, unfortunately, a downside: any meta information you want to include with the Sentry error will get written to your log file, and it’s sometimes not incredibly helpful to have a log file filled with the exploded output of an HTTP request on every line. In theory, the `AddMeta` helper exists to fill this need:
But that’s not as nice, and still isn’t the developer experience we’re aiming for.
Our solution was a middleware pattern that leans heavily on `context.Context`. We use helpers to embed our `Logger` instances in the `context.Context`, and helper methods to retrieve them. These helpers are built into the logging package itself, so everyone’s using the same set/get code for `Logger` instances. This has the nice side effect of allowing our middleware to be aware of the `Logger`s. So we can, for example, write the following:
The `context.Context`-aware handler we wrap in that `SentryRequestMiddleware` function will automatically associate the `*http.Request` being processed with any errors raised while processing that request.
The ability to pass our `Logger` instance around using `context.Context` has proven invaluable for writing reusable code that still outputs useful logs.
We’re able to pass the `Logger` around using `context.Context` and assign request-specific variables to it because we rely heavily on copying in the `Logger` API. We purposefully and defensively copy a lot when using the `Logger`. You may have noticed above that `AddMeta` returns a `Logger` instead of mutating the `Logger` in place; that’s because we wanted the behaviour to be unambiguous: any change you make to a `Logger` will only be reflected in calls that are passed that new instance. The old instance will continue to behave as before.
So, for example:
The Sentry error generated from `log1` above will not have the `*http.Request` associated with it, but the error generated from `log2` will. This allows us to make `Logger` instances that are specific to users or requests without fear of accidentally stepping all over another request’s data.
We’re pretty happy with the things our logging package allows us to do. And because we love open source (and thought it was silly to pay for this repo to be private), we’ve released it under an MIT license. You can find it on our GitHub. Issues and pull requests are welcome, but we use it as a tool for our specific needs, first and foremost. If you have different needs, hard forks are encouraged! We also make no promises about backwards compatibility on the `master` branch. We use vendoring or version pinning internally to manage this, and encourage you to do the same if you’re using this library.