Fatal error: runtime: out of memory on trivial steps

Hi,

We use the Elastic CI Stack with our builds, and in the last few days we’ve started seeing fatal errors like this in builds:

fatal error: runtime: out of memory
 
runtime stack:
runtime.throw(0x1445ed0, 0x16)
	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0x7ffdf03bda28 sp=0x7ffdf03bd9f8 pc=0x434712
runtime.sysMap(0xc000000000, 0x4000000, 0x20078f8)
	/usr/local/go/src/runtime/mem_linux.go:169 +0xc5 fp=0x7ffdf03bda68 sp=0x7ffdf03bda28 pc=0x4183e5
runtime.(*mheap).sysAlloc(0x1ff2fc0, 0x400000, 0x0, 0x0)
	/usr/local/go/src/runtime/malloc.go:715 +0x1cd fp=0x7ffdf03bdb10 sp=0x7ffdf03bda68 pc=0x40b98d
runtime.(*mheap).grow(0x1ff2fc0, 0x1, 0x0)
	/usr/local/go/src/runtime/mheap.go:1286 +0x11c fp=0x7ffdf03bdb78 sp=0x7ffdf03bdb10 pc=0x42669c
runtime.(*mheap).allocSpan(0x1ff2fc0, 0x1, 0x2a00, 0x2007908, 0x0)
	/usr/local/go/src/runtime/mheap.go:1124 +0x6a0 fp=0x7ffdf03bdbf8 sp=0x7ffdf03bdb78 pc=0x4263e0
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:871 +0x64 fp=0x7ffdf03bdc50 sp=0x7ffdf03bdbf8 pc=0x4628a4
runtime.(*mheap).alloc(0x1ff2fc0, 0x1, 0x4012a, 0x2200000003)
	/usr/local/go/src/runtime/mheap.go:865 +0x81 fp=0x7ffdf03bdca0 sp=0x7ffdf03bdc50 pc=0x425941
runtime.(*mcentral).grow(0x2003e98, 0x0)
	/usr/local/go/src/runtime/mcentral.go:255 +0x79 fp=0x7ffdf03bdce0 sp=0x7ffdf03bdca0 pc=0x417e09
runtime.(*mcentral).cacheSpan(0x2003e98, 0x2200000003)
	/usr/local/go/src/runtime/mcentral.go:106 +0x2bc fp=0x7ffdf03bdd28 sp=0x7ffdf03bdce0 pc=0x41793c
runtime.(*mcache).refill(0x7f22dae76108, 0x2a)
	/usr/local/go/src/runtime/mcache.go:138 +0x85 fp=0x7ffdf03bdd48 sp=0x7ffdf03bdd28 pc=0x417425
runtime.(*mcache).nextFree(0x7f22dae76108, 0x7f22dae7612a, 0x7f22b6556000, 0x7ffdf03bddd0, 0x7ffdf03bddf8)
	/usr/local/go/src/runtime/malloc.go:868 +0x87 fp=0x7ffdf03bdd80 sp=0x7ffdf03bdd48 pc=0x40c1b7
runtime.mallocgc(0x180, 0x14250a0, 0x7ffdf03bde01, 0x40d352)
	/usr/local/go/src/runtime/malloc.go:1036 +0x793 fp=0x7ffdf03bde20 sp=0x7ffdf03bdd80 pc=0x40caf3
runtime.newobject(0x14250a0, 0x4000)
	/usr/local/go/src/runtime/malloc.go:1165 +0x38 fp=0x7ffdf03bde50 sp=0x7ffdf03bde20 pc=0x40cee8
runtime.malg(0x8000, 0x2007960)
	/usr/local/go/src/runtime/proc.go:3360 +0x31 fp=0x7ffdf03bde90 sp=0x7ffdf03bde50 pc=0x43ed21
runtime.mpreinit(0x1fdbae0)
	/usr/local/go/src/runtime/os_linux.go:339 +0x2d fp=0x7ffdf03bdeb0 sp=0x7ffdf03bde90 pc=0x4318cd
runtime.mcommoninit(0x1fdbae0)
	/usr/local/go/src/runtime/proc.go:630 +0x108 fp=0x7ffdf03bdef8 sp=0x7ffdf03bdeb0 pc=0x438218
runtime.schedinit()
	/usr/local/go/src/runtime/proc.go:547 +0x95 fp=0x7ffdf03bdf50 sp=0x7ffdf03bdef8 pc=0x437e35
runtime.rt0_go(0x7ffdf03bdf88, 0x2, 0x7ffdf03bdf88, 0x0, 0x0, 0x2, 0x7ffdf03c00fc, 0x7ffdf03c0115, 0x0, 0x7ffdf03c011f, ...)
	/usr/local/go/src/runtime/asm_amd64.s:214 +0x125 fp=0x7ffdf03bdf58 sp=0x7ffdf03bdf50 pc=0x465145

We’re using c5.xlarge instances, so I’m a little surprised we could run out of memory on steps that just push previously built Docker images, etc.

Here’s a timeline:

Is there anything we might be able to do to stop this from happening?

Hi @geoffharcourt! :wave:

Yeah, that seems odd, sorry! That trace shows the Go runtime failing to allocate memory while it’s still starting up (schedinit), so it looks like the host was already out of memory before the process even got going.

Are you running the agent itself with any process-level restrictions? I don’t think we apply any by default.
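
For example, here are a couple of rough checks you could run on an instance (just a sketch, assuming the agent runs as the buildkite-agent systemd service, as it does on the Elastic CI Stack):

    # Resource limits of the running agent process
    cat /proc/$(pgrep -o -f buildkite-agent)/limits
    # Any systemd cgroup memory limit applied to the agent service
    systemctl show buildkite-agent --property=MemoryLimit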

Is there anything that might be left running on the host between builds and consuming memory?

Do you have any memory monitoring for the instances? Could you add some diagnostics to your Elastic CI Stack’s environment hook to dump ps -ejHF or something similar, so we can see whether there are lingering processes, or some that are leaking memory?
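
Something roughly like this in the environment hook would do it (just a sketch; the /etc/buildkite-agent/hooks/environment path is the standard agent hooks location, so adjust for your setup, and the "--- " prefix makes it a collapsible group in the build log):

    #!/usr/bin/env bash

    # Hypothetical diagnostics for the agent's environment hook
    # (assumed path: /etc/buildkite-agent/hooks/environment).
    echo "--- :mag: Host memory diagnostics"
    free -m                            # overall memory and swap usage on the instance
    ps -ejHF                           # full process tree, as suggested above
    ps aux --sort=-rss | head -n 15    # largest resident-memory processes first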

Cheers,
Sam

Hi @sj26, we weren’t, but the errors stopped a few days later. They persisted even across our overnight scale-down and the scale-up the next morning, so it didn’t appear to be tied to particular instances.

We’ll keep an eye out for this happening again to see if we can get any further data. Thanks for your help!
