When I started here at DramaFever, I inherited a little Go service that resizes and crops images dynamically for display across a variety of devices. Nothing too complex, and not on too critical a path, but still a nice feature to have. You give the service the URL of an image and some transformations, and it applies those transformations to the image and returns the result. It’s useful for dynamically cropping and scaling images for mobile, tablet, and desktop devices. It’s a quiet service that hums along without bothering anyone. That is, until… well, read the error log:

fatal error: runtime: out of memory

During periods of high usage, my image service would eat up all the memory on the machine and crash. Fortunately, it’s Dockerized, so recovery is pretty easy – the container just restarts and processing picks back up. The magic of load balancing keeps it from affecting too many people: only the users whose requests were in flight when the service bounced get those requests dropped on the floor. That’s not super terrible, but we can still do better.

We came up with a deceptively simple solution: just don’t run out of memory. If the service is about to run out of memory and it gets another request, it just doesn’t handle that request.

The Plan

In order to do this, we need to know two things:

  • The amount of memory we’re using
  • The amount of memory available to us

Then we need to make a decision. If we’re using, e.g., 90% of the memory available to us, we simply respond with a 500 status code. We keep doing that until our garbage collector has had a chance to run and our memory usage drops.

Allocated Memory

Surprisingly, how much memory we’re using is the easier number to find. The Go Team generously provided a runtime package that contains a MemStats struct and a ReadMemStats func. ReadMemStats populates the MemStats struct, and then we examine MemStats.Alloc to see the number of bytes of heap memory allocated and not yet freed.
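This is a minimal sketch of reading that number. The name allocatedBytes is a hypothetical helper:

```go
package main

import "runtime"

// allocatedBytes (hypothetical helper) returns MemStats.Alloc: bytes of heap
// objects that are allocated and not yet freed. Note that ReadMemStats
// stops the world while it runs.
func allocatedBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.Alloc
}
```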

It is well worth noting that ReadMemStats is a stop-the-world event, so you should be careful not to call it too frequently (where defining “too frequently” is left as an exercise to the reader).

Available Memory

However, determining how much memory we have available to us is a completely different matter. Go gives us the (poorly documented) syscall package, which provides an Rlimit struct and a Getrlimit func. When asked about the resident set size resource, Getrlimit fills Rlimit.Max with the hard limit. That is the same limit reported (in kilobytes) by running $ ulimit -H -m from your command line. Plain $ ulimit -m shows the soft limit, which lands in Rlimit.Cur. The hard limit tells us the maximum resident set size: how much memory the process may keep resident in RAM.
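Here is a minimal sketch of that call for Unix-like systems. One assumption: it hard-codes 5 as the resource number for RLIMIT_RSS, the value on Linux for most architectures (MIPS differs), because package syscall does not appear to export that constant on Linux. If you would rather not hard-code it, golang.org/x/sys/unix provides unix.RLIMIT_RSS. The name rssHardLimit is hypothetical. The “unlimited” check assumes Linux, where RLIM_INFINITY is the largest uint64.

```go
package main

import (
	"math"
	"syscall"
)

// rlimitRSS is RLIMIT_RSS, the resource behind `ulimit -m`.
// Assumption: 5 on Linux for most architectures (not MIPS).
const rlimitRSS = 5

// rssHardLimit (hypothetical helper) returns the hard RSS limit
// (Rlimit.Max) and whether that limit is "unlimited".
func rssHardLimit() (uint64, bool, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(rlimitRSS, &rl); err != nil {
		return 0, false, err
	}
	// On Linux, "unlimited" comes back as the largest uint64,
	// 18,446,744,073,709,551,615.
	return rl.Max, rl.Max == math.MaxUint64, nil
}
```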

Unfortunately, “unlimited” is a perfectly valid value for ulimit -m. In that case our Rlimit.Max is 18,446,744,073,709,551,615, the largest possible uint64. So we also need to know how many bytes of memory are installed in the machine itself. The straightforward but tedious steps to do this are:

  • Open /proc/meminfo
  • Read the contents of /proc/meminfo
  • Close /proc/meminfo
  • Match the line with the string “MemTotal:”
  • Split that line on the space character
  • Get the second value from the end (this will be the total installed memory in kB)
  • Convert that string to an unsigned 64-bit integer
  • Multiply that value by 1024.
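Here are the steps above as a Go sketch. It is Linux only, and parseMemTotal and totalMemory are hypothetical names. os.ReadFile covers the open, read, and close steps. strings.Fields stands in for splitting on a single space, so runs of padding spaces don’t produce empty fields, and taking the second field from the end works the same either way. Unlike the prose, the sketch returns errors instead of ignoring them.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemTotal (hypothetical helper) finds the "MemTotal:" line in the
// contents of /proc/meminfo, takes the second field from the end (the size
// in kB), and converts it to bytes.
func parseMemTotal(meminfo string) (uint64, error) {
	for _, line := range strings.Split(meminfo, "\n") {
		if !strings.HasPrefix(line, "MemTotal:") {
			continue
		}
		fields := strings.Fields(line) // splits on runs of whitespace
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed MemTotal line: %q", line)
		}
		kb, err := strconv.ParseUint(fields[len(fields)-2], 10, 64)
		if err != nil {
			return 0, err
		}
		return kb * 1024, nil
	}
	return 0, fmt.Errorf("MemTotal not found")
}

// totalMemory (hypothetical helper) reads /proc/meminfo and returns the
// installed memory in bytes. os.ReadFile opens, reads, and closes the file.
func totalMemory() (uint64, error) {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	return parseMemTotal(string(data))
}
```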

Of course, that could go wrong in a bunch of different ways. Let’s ignore them all for the sake of brevity.

So, if we’re lucky, we now know two numbers: how much memory is installed in the machine, and the maximum amount of memory the OS will allow us to allocate. (Let’s also assume that the OS and other processes take up only a small percentage of the installed memory, so our microservice can eat the lion’s share.) Our available memory is the smaller of those two numbers, because we’ll OOM if we try to allocate more memory than that.
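Here is a sketch of that final step, using a hypothetical function named availableMemory. Because “unlimited” comes back as the largest uint64, the installed memory wins automatically in that case, with no special handling:

```go
package main

// availableMemory (hypothetical helper) returns the smaller of the hard RSS
// limit and the installed memory, both in bytes. An "unlimited" limit is the
// largest uint64, so the installed memory is returned in that case.
func availableMemory(rssLimit, installed uint64) uint64 {
	if rssLimit < installed {
		return rssLimit
	}
	return installed
}
```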

In Summary

When our program starts, we determine our available memory. Then we periodically check our allocated memory, using a coarse interval so that we don’t stop the world unnecessarily often. When our allocated memory reaches a fixed percentage (we used 90%) of our available memory, we stop processing new HTTP requests until some old requests finish and our garbage collector has had a chance to run.

That’s a huge pain in the butt, so we wrote a library to do it for us: