Tracking down memory leaks in Go applications can be tricky. The first pass at an issue usually involves pprof, which is very helpful in determining which object types are consuming memory and which stack frames originally allocated those objects. There are hundreds of blog posts that cover how to use it to triangulate where in your program a memory leak may be occurring.
For more complex memory leaks, though, these approaches require a fair degree of inference about why any given object isn't being released by the garbage collector, as well as complete knowledge of how your entire application -- including any imported packages -- may be keeping track of objects.
Luckily, the Go runtime provides a debug.WriteHeapDump() function that produces a heap dump file in a reasonably well-documented format. Unfortunately, there seem to be (as of February 2023) no tools that can perform any meaningful processing on those files. That's where heapspurs comes in.
Heapspurs parses the Go heap format and provides a small toolkit that can help identify exactly why an object is still hanging around.
Before diving into tool usage, it probably helps to have some understanding of the way Go handles memory. The article Go Memory Management provides an accessible and mercifully short introduction to that topic.
To build the heapspurs utility itself, go build ./cmd/heapspurs
should just work. Once you've done that, you'll need a heap file to operate on. In the program that you're troubleshooting, add code similar to the following, and trigger it when the system is in a state of unexpected memory use (I typically set up a handler for SIGUSR1 to cause a heap dump, but any approach should work):
// Assumes the enclosing file imports "os", "runtime", and "runtime/debug".
f, err := os.Create(filename)
if err != nil {
    panic("Could not open file for writing: " + err.Error())
} else {
    runtime.GC() // collect first, so the dump contains only reachable objects
    debug.WriteHeapDump(f.Fd())
    f.Close()
}
It's recommended to force a garbage collection cycle immediately prior to writing the heap, so that your analysis deals with only those objects that are actually reachable.
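As mentioned above, I typically trigger this from a SIGUSR1 handler. A minimal sketch of that wiring might look like the following; the writeHeapDump helper and the "heapdump" filename are placeholders rather than anything the example program above actually uses:

package main

import (
    "os"
    "os/signal"
    "runtime"
    "runtime/debug"
    "syscall"
)

// writeHeapDump wraps the snippet above; the filename is arbitrary.
func writeHeapDump(filename string) {
    f, err := os.Create(filename)
    if err != nil {
        panic("Could not open file for writing: " + err.Error())
    }
    defer f.Close()
    runtime.GC()
    debug.WriteHeapDump(f.Fd())
}

func main() {
    // Write a heap dump whenever the process receives SIGUSR1 (Unix-only).
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGUSR1)
    go func() {
        for range sigs {
            writeHeapDump("heapdump")
        }
    }()

    // ... the rest of your application runs here ...
    select {}
}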
Once you have done that, you can start investigating what's going on with your application's memory use.
If you want to simply see what records exist in the heapdump itself, you can invoke the tool with the --print
flag:
./heapspurs heapdump --print
This produces a huge volume of data, even for a relatively small program, in the order in which the records were written to the file. The output will generally contain records that look like the following:
DumpParams: BigEndian=false, PointerSize=8, Heap=0xc000000000-0xc004000000, Architecture=amd64, GOEXPERIMENT=go1.19.5, Cpus=12
TypeDescriptor for 'fmt.' @ 0x1003db320: Objects are 8 bytes
Itab @ 0x100456590: 0x1003db320
...
Object @ 0xc000016000 with 4 pointers in 64 bytes
Pointer[0]@0xc000016000 = 0x7ff7bfeff450
Pointer[1]@0xc000016010 = 0x7ff7bfeff45c
Pointer[2]@0xc000016020 = 0x7ff7bfeff467
Pointer[3]@0xc000016030 = 0x7ff7bfeff470
...
Goroutine[1] @ 0xc0000061a0: Waiting (wait for GC cycle), Stack @ 0xc0000d2700
StackFrame[0] @ 0xc0000d2700: runtime.systemstack_switch with 0 pointers in 8 bytes; child = 0x0
StackFrame[1] @ 0xc0000d2708: runtime/debug.WriteHeapDump with 0 pointers in 5832 bytes; child = 0xc0000d2700
StackFrame[2] @ 0xc0000d3dd0: main.main with 1 pointers in 432 bytes; child = 0xc0000d2708
Pointer[0]@0xc0000d3e70 = 0xc0000a6100
StackFrame[3] @ 0xc0000d3f80: runtime.main with 1 pointers in 96 bytes; child = 0xc0000d3dd0
Pointer[0]@0xc0000d3fc8 = 0xc0000d3fb8
...
DataSegment @ 0x100609660-0x100615fd0 with 2947 pointers
Pointer[0]@0x100609668 = 0x10036c590
Pointer[1]@0x100609670 = 0x10036c1f0
Pointer[2]@0x100609678 = 0x10036c4c0
Pointer[3]@0x100609680 = 0x10036c5f0
Pointer[4]@0x100609688 = 0x1005b4e40
...
BssSegment @ 0x100642fe0-0x100677460 with 10815 pointers
Pointer[0]@0x100642fe0 = 0xc0000f0000
Pointer[1]@0x100642fe8 = 0xc0000f00b0
Pointer[2]@0x100642ff0 = 0xc0000f0160
Pointer[3]@0x100642ff8 = 0xc0000aa008
Pointer[4]@0x100643000 = 0xc00008e060
...
RegisteredFinalizer @ 0xc0000a8420: FuncVal: 0x100412dd0, Type: 0x10039d6e0, Object Type: 0x10039d6e0
MemStats: {Alloc:257584 TotalAlloc:331968 Sys:13548560 Lookups:0 Mallocs:1632 Frees:688 HeapAlloc:257584 HeapSys:3768320 HeapIdle:2973696 HeapInuse:794624 HeapReleased:2752512 HeapObjects:944 StackInuse:393216 StackSys:393216 MSpanInuse:49248 MSpanSys:65088 MCacheInuse:14400 MCacheSys:15600 BuckHashSys:4154 GCSys:8102480 OtherSys:1199702 NextGC:4194304 LastGC:1677192712944952000 PauseTotalNs:18336 PauseNs:[18336 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] NumGC:1}
End Of File
In most cases, you're looking for unexpected objects and trying to figure out what anchors are preventing the garbage collector from deallocating them. In general, there are three things that can anchor an object and prevent it from being collected:
- A variable in an active stack frame
- A global variable in the BSS Segment
- A global variable in the Data Segment
The first thing you'll need to do is get the address of an object that you think should be deallocated but isn't. There are several ways you might discover this (e.g., from your debugger or from the --search flag described below), but the simplest is often just printing it out:
fmt.Printf("%T address: 0x%x\n", object, unsafe.Pointer(object))
Once you have the address of the object of interest, you can ask which anchor(s) are keeping it alive, using the --anchors flag:
# ./heapspurs heapdump --address 0xc000019680 --anchors
BssSegment @ 0x100642fe0-0x100677460 with 10815 pointers
This tells us that the object at 0xc000019680 is ultimately rooted in the BSS segment, meaning that there is a chain of pointers starting from the BSS (global variables) that ultimately leads to our object. (In many cases, the anchor list will also include one or more stack frames that transitively point to the object in question.)
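To make the notion of a "global variable anchor" concrete, here's a minimal, hypothetical sketch (the Session type and sessions map are not from the program being analyzed; they just illustrate the pattern):

// A package-level variable lives in the BSS or Data segment, so the garbage
// collector treats it as a root: anything reachable from it is never collected.
type Session struct {
    buf []byte
}

var sessions = map[string]*Session{} // global: anchored in the BSS segment

func track(id string, s *Session) {
    sessions[id] = s // if entries are never deleted, s (and s.buf) stay reachable forever
}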
You can also ask about the object's direct owners by providing a --owners 1
flag (the "1" indicates that you only want to see the things directly pointing to the object):
# ./heapspurs heapdump --address 0xc000019680 --owners 1
Object @ 0xc000019680 with 11 pointers in 1152 bytes
Object @ 0xc0000076c0 with 11 pointers in 416 bytes
Object @ 0xc000007860 with 11 pointers in 416 bytes
Object @ 0xc000480000 with 11 pointers in 1152 bytes
You can ask for an arbitrary depth of owners (i.e., --owners 2
will show owners and owners' owners); or if you just want to print all owners back to every anchor, you can specify a depth of -1
:
./heapspurs heapdump --address 0xc000019680 --owners -1
Object @ 0xc000019680 with 11 pointers in 1152 bytes
Object @ 0xc0000076c0 with 11 pointers in 416 bytes
Object @ 0xc000007860 with 11 pointers in 416 bytes
Object @ 0xc000480000 with 11 pointers in 1152 bytes
Object @ 0xc000482000 with 11 pointers in 416 bytes
Object @ 0xc0004821a0 with 11 pointers in 416 bytes
BssSegment @ 0x100642fe0-0x100677460 with 10815 pointers
This, of course, all gets a bit tricky to reconstruct in your head. The most intuitive way to consume information about object relationships is as a graph, so by default the tool produces an svg file:
./heapspurs heapdump --address 0xc000019680
Rendering graph (7 nodes)...
The default output is left in heapdump.svg
, which you should be able to open in any web browser.
The object that you specified is highlighted in yellow, and all heap records that point to it -- even transitively -- are shown. From the graph above, we can determine that the object of interest is pointed to by a relatively large (1152-byte) object, which is in turn pointed to from the BSS segment (i.e., global program scope). This might provide enough information to get you on the right track -- especially when combined with the information you get from pprof -- but there's a good chance that you'll need some additional information.
Finally, you may find it useful to examine the raw contents of an object's memory, either because you know what it is and want to check the values of its underlying variables, or because you have a hunch about what it might be and would like to sanity-check your guess. The --hexdump
flag gives you that information:
# ./heapspurs heapdump --address 0xc0004821a0 --hexdump
00000000 40 33 fb 0d 00 70 00 00 40 2f 03 0e 00 70 00 00 |@3...p..@/...p..|
00000010 e0 36 fb 0d 00 70 00 00 e0 36 fb 0d 00 70 00 00 |.6...p...6...p..|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 48 00 c0 00 00 00 38 2f 03 0e 00 70 00 00 |..H.....8/...p..|
00000040 0c 09 04 00 01 00 00 00 a0 21 48 00 c0 00 00 00 |.........!H.....|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000120 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000160 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000180 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Pointer: 0x20
Pointer: 0x28
Pointer: 0x30
Pointer: 0x50
Pointer: 0x88
Pointer: 0xf8
Pointer: 0x130
Pointer: 0x148
Pointer: 0x150
Pointer: 0x168
Pointer: 0x170
The output is a raw hexdump of the object's contents, followed by a list of the offsets within the object that are known to be pointers (e.g., Pointer: 0x30 indicates that the bytes at that offset -- 00 00 48 00 c0 00 00 00 -- are a pointer, in the length and byte order of the architecture that generated the dump; in this case, 0xc000480000).
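If you want to sanity-check that decoding, a quick test (little-endian, 8-byte pointers, matching the DumpParams in this example) might look like:

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    // The eight bytes at offset 0x30 of the hexdump above.
    raw := []byte{0x00, 0x00, 0x48, 0x00, 0xc0, 0x00, 0x00, 0x00}
    fmt.Printf("0x%x\n", binary.LittleEndian.Uint64(raw)) // prints 0xc000480000
}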
Unfortunately, the heap dump file produced by Go does not contain any type information, which is why everything above is identified only by its record type. There are a couple of ways heapspurs can pull in additional information about your application to provide some hints.
Global symbols live in the BSS and Data segments, both of which appear in the heap dump file. The same symbols are also present in the program binary itself, along with each symbol's value. For the mainstream Go compiler, on most platforms, a symbol's value is the same as the memory address it is loaded at. (This is notably not the case for the "TinyGo" compiler.)
Heapspurs can attempt to extract this information from your program and incorporate it into its rendering of BSS and Data Segment information. To use this, pass the --program
flag to heapspurs, with the name of the binary that generated the heap you're analyzing. For example:
# ./heapspurs heapdump --program myprogram --print
...
BssSegment @ 0x100642fe0-0x100677460 with 10815 pointers
Pointer[0]@0x100642fe0 (runtime.bss) = 0xc000146000
Pointer[1]@0x100642fe8 (compress/flate.fixedOffsetEncoding) = 0xc0001460b0
Pointer[2]@0x100642ff0 (compress/flate.huffOffset) = 0xc000146160
Pointer[3]@0x100642ff8 (context.background) = 0xc000020178
Pointer[4]@0x100643000 (context.closedchan) = 0xc00002a0c0
Pointer[5]@0x100643008 (context.todo) = 0xc000020180
When graphed, this will include a label on references from the BssSegment and DataSegment nodes, indicating which symbol is keeping the object anchored:
In this example, the BSS symbol runtime.allm
is pointing to the object that is keeping our object alive. This isn't actually all that interesting, since that's where Go stores all of the OS threads that are available to do things (it's "all the m's" as that term is explained at https://go.dev/src/runtime/HACKING). But since we know what that is (see its definition in runtime/runtime2.go), we can try to figure things out ourselves.
./heapspurs heapdump --program ./heapspurs --print
...
Object @ 0xc000480000 with 11 pointers in 1152 bytes
Pointer[0]@0xc000480000 = 0xc0004821a0
Pointer[2]@0xc000480050 = 0xc000482000
Pointer[3]@0xc0004800b8 = 0x100413090
Pointer[6]@0xc000480140 = 0xc00048c000
Pointer[7]@0xc000480150 = 0xc000019680
Okay, we can see that the thing we're trying to track down is in Pointer[7]
, which makes it the eighth pointer in m
. Counting pointers from the top, we see, in order:
- g0
- gsignal
- mstartfn
- curg
- preemptoff
- cgoCallers
- alllink
- freelink
So we know that this object is the freelink
pointer on an m
, which means it is in the "list of m's waiting to be freed when their m.exited is set". If the object were further down in the tree, you could keep doing this kind of thing until you figured out the relationships that were keeping the object you're interested in alive. Yes, this approach is a bit tedious and error-prone, as you have to figure out which things in any given object declaration are pointers, but it does at least give a path to success. If you're doing this, keep the following in mind:
- Embedded structures are not pointers, but they can contain pointers, and you need to count those.
- Slices are pointers, but arrays are not.
- Strings are pointers.
- uintptrs are not pointers.
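As an illustration of those rules, here's a hypothetical struct (not taken from any real program) annotated with how its fields would line up with the Pointer[N] slots reported for it in a dump:

type inner struct {
    label string // 1 pointer (the string data)
    count int    // not a pointer
}

type example struct {
    name  string   // Pointer[0]: string data
    items []int    // Pointer[1]: slice backing array
    fixed [4]byte  // no pointers: arrays are not pointers
    addr  uintptr  // not a pointer: uintptr is just an integer
    inner          // embedded struct: its label field counts as Pointer[2]
    next  *example // Pointer[3]: an ordinary pointer
}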
In the case of large trees with deep paths from anchors to "leaked" objects, the approach above can be too cumbersome to be practical. So we have one more tool in our toolkit.
Generally, when working on a difficult leak, we'll have a set of objects that we suspect are (or might be) involved, so we can mark those objects in the heap with unique 64-bit values -- object identifiers -- placed at the very top of their structures. At this point, we're going to pivot our examples to a real-world product that I've used this tool to help find leaks in. For its core objects, we've included an "ObjectId" field (declared as type ObjectId uint64) that is initialized to a class-specific value whenever an object is created. So, e.g.:
type webrtcSource struct {
    Oid ingest.ObjectId
    ...
}

sourceImpl := &webrtcSource{
    Oid: ingest.ObjectId_pionwebrtcsource_webrtcSource, // 0xcaffe14e00000016
    ...
}
Based on the Object IDs that we're using, I create an OID mapping file: a simple text file that contains one identifier per line, with the unique 64-bit identifier and the name it corresponds to separated by a single space. An excerpt from that file:
0xcaffe14e00000012 pionwebrtcsource.WebrtcSourceService
0xcaffe14e00000013 pionwebrtcsource.statsInterceptor
0xcaffe14e00000014 pionwebrtcsource.statsInterceptorFactory
0xcaffe14e00000015 pionwebrtcsource.statsRtcpWriter
0xcaffe14e00000016 pionwebrtcsource.webrtcSource
In our scheme, we're using a 32-bit unique cookie ("0xcaffe14e") to guard against accidental collisions with data that isn't intended to be an object identifier, followed by a 32-bit identifier that indicates the actual object type. You can use whatever scheme you want, as long as it's not likely to accidentally match other data that you might store in the first eight bytes of an object.
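For reference, one way such identifiers might be declared looks like the sketch below. Only the ObjectId type and the 0xcaffe14e00000015 / 0xcaffe14e00000016 values come from the examples above; the cookie constant and its name are illustrative:

package ingest

// ObjectId is placed as the first field of instrumented structs so that
// heapspurs can recognize them in a raw heap dump.
type ObjectId uint64

// A fixed 32-bit cookie in the high word guards against collisions with
// ordinary data; the low word identifies the concrete type.
const objectIdCookie ObjectId = 0xcaffe14e << 32

const (
    ObjectId_pionwebrtcsource_statsRtcpWriter ObjectId = objectIdCookie | 0x15
    ObjectId_pionwebrtcsource_webrtcSource    ObjectId = objectIdCookie | 0x16
)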
Once you have your objects instrumented in this way, you can pass heapspurs a pointer to your OID file, and it will use those names in association with any of those objects where appropriate. For example:
# ./heapspurs --oid oid.txt --program server heapdump --print
...
pionwebrtcsource.webrtcSource @ 0xc0001956c0 with 15 pointers in 448 bytes
Pointer[0]@0xc000195700 = 0xc000a0b260
Pointer[1]@0xc000195710 = 0xc000372820
Pointer[2]@0xc000195718 = 0xc0009227e0
Pointer[3]@0xc000195738 = 0xc000372820
Pointer[4]@0xc000195740 = 0xc0009406c0
Pointer[5]@0xc000195748 = 0xc000a0b1d0
Pointer[6]@0xc000195758 = 0xc00034ae00
Pointer[7]@0xc000195768 = 0xc000a0b200
Pointer[8]@0xc000195770 = 0xc000b82000
Pointer[9]@0xc000195790 = 0x10172b500 (time.localLoc)
Pointer[10]@0xc000195798 = 0xc000410f60
Pointer[11]@0xc0001957a0 = 0xc000410fc0
Pointer[12]@0xc0001957a8 = 0xc000365040
Pointer[13]@0xc0001957b0 = 0xc000a0b230
...
ingest.WebrtcSource @ 0xc000372820 with 9 pointers in 208 bytes
Pointer[0]@0xc000372828 = 0xc0009227e0
Pointer[1]@0xc000372838 = 0xc000041460
Pointer[2]@0xc000372848 = 0xc00003dd80
Pointer[3]@0xc000372858 = 0x100e4258a
Pointer[4]@0xc000372870 = 0x100e44c39
Pointer[5]@0xc0003728b8 = 0xc000a3da90
Pointer[6]@0xc0003728c0 = 0xc0001ba000
Pointer[7]@0xc0003728d0 = 0xc00092f800
Pointer[8]@0xc0003728e8 = 0xc0001956c0 (pionwebrtcsource.webrtcSource)
(Note that the time.localLoc label shown above came from the BSS analysis, not from the OID file.)
Most importantly, this approach can provide labels on nodes in the memory graph:
./heapspurs --oid oid.txt --program server --address 0xc000372820 heapdump
Rendering graph (1774 nodes)...
Finally, once you've provided an OID file to heapspurs, it can also help you find the address of objects that you're interested in, using the --search
flag, which takes a regular expression as its argument, and tells you about any objects with names that match that regular expression:
./heapspurs --oid oid.txt --find WebrtcSource heapdump
ingest.WebrtcSource @ 0xc000372820 with 9 pointers in 208 bytes
Pointer[0]@0xc000372828 = 0xc0009227e0
Pointer[1]@0xc000372838 = 0xc000041460
Pointer[2]@0xc000372848 = 0xc00003dd80
Pointer[3]@0xc000372858 = 0x100e4258a
Pointer[4]@0xc000372870 = 0x100e44c39
Pointer[5]@0xc0003728b8 = 0xc000a3da90
Pointer[6]@0xc0003728c0 = 0xc0001ba000
Pointer[7]@0xc0003728d0 = 0xc00092f800
Pointer[8]@0xc0003728e8 = 0xc0001956c0 (pionwebrtcsource.webrtcSource)
pionwebrtcsource.WebrtcSourceService @ 0xc0001f3140 with 3 pointers in 64 bytes
Pointer[0]@0xc0001f3160 = 0xc00084e5d0
Pointer[1]@0xc0001f3168 = 0xc00034ae00
Pointer[2]@0xc0001f3170 = 0xc00004b0e0
End Of File
Sometimes you'll find memory that hasn't been collected even though it doesn't trace back to a stack frame or global segment:
# ./heapspurs heapdump --oid oid.txt --address 0xc0002116c0 --anchors
#
As far as I know, the only way this situation can arise is when you have a cycle of objects, one (or more) of which has a finalizer set on it. To help diagnose these situations, heapspurs will highlight any objects that have finalizers with a thick, red border:
In this example, the pointer from the red object at the bottom of the graph back to the cmafsink.cmafSink object will prevent everything in this graph from being cleaned up (as well as any objects that these objects point to, transitively).
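To illustrate how such a cycle comes about, here is a minimal, hypothetical sketch (the node type and makeCycle function are made up for this example):

package main

import (
    "fmt"
    "runtime"
)

type node struct {
    peer *node
}

func makeCycle() {
    a := &node{}
    b := &node{peer: a}
    a.peer = b // a and b now reference each other

    // Per the runtime.SetFinalizer documentation, a cycle that includes an
    // object with a finalizer is not guaranteed to be collected, because
    // there is no finalization order that respects the dependencies between
    // its members.
    runtime.SetFinalizer(a, func(n *node) { fmt.Println("finalizer ran") })
}

func main() {
    makeCycle()
    runtime.GC()
    // A heap dump written at this point may still contain both node objects,
    // reachable only through a RegisteredFinalizer record, which is why
    // --anchors reports nothing for them.
}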
There's definitely a lot more that could be added to this tool to make it more useful. One approach that I haven't had time to pursue, but which would be very useful, would be recovery of object layout information from the executable itself. There's a fairly good description of how one might start going about this in the post "Analyzing Golang Executables -- JEB in Action". Once this information is extracted, we could parse out the types of the pointers in known objects, and then recursively follow them -- basically, automating the process described above using pointer counting.
With object layout information, we could then find objects that have a unique combination of {size, pointer positions}, and assume that any objects in the heap that match those patterns are necessarily those kinds of objects, which gives us a starting point to begin our object traversal. This could be used instead of an OID file, or supplemented by it.
This approach is complicated by the fact that the moduledata structure can and does change between versions; in particular, the recently added rdata and gofunc fields change the location of typelinks within the structure. In theory, we could work around this by linking a helper function into the program being debugged that exports type information from the runtime structures; however, access to these structures via go:linkname is broken as of Go 1.18 (cf. "go:linkname must refer to declared function or variable", golang/go issue #51706). Ultimately, extracting this information may be a matter of building a custom, patched version of the runtime that emits module type information via an exported interface. For bonus points, we might even try to get a stable API for accessing this limited subset of type information upstreamed.
It may be possible to pull in additional information from pprof
output as well to assist in object identification. I have not yet done much research in this direction.
On a separate note: currently, when creating the graph, no attempt is made to remove cycles of objects that point to the object in question but do not lie on any path to an anchor except through that object itself. Such cycles aren't interesting when tracking down leaks, since they aren't responsible for keeping the object from being collected, so it would be nice to have an option that removes them. The same analysis could also be used to identify and highlight cycles that have a finalizer set on one of their members.