Heap Layouts of Various Executions

By: enferex
November 26, 2012

The following is just a lame excuse for generating some graphs, scanning a process' heap, and answering the question: "What does the heap look like?"

The table below provides data and visualizations of the heap layout of various programs. These programs were not recompiled, and the C standard library (libc), which provides malloc(), is GNU glibc version 2.16. The glibc on the test machine was provided by Arch Linux. The data was gathered on November 26, 2012.

What?

So what the hell is this? It is an imprecise picture of a process' heap segment at program exit.

How?

All malloc() function calls, and the requested size of memory for each allocation, are captured in a data file (sniff.log for each program). At program exit, the heap is scanned and the value at each address is inspected.

The heap scanner starts at the beginning of the heap segment and works its way towards the end. It inspects each word-sized value (assuming 64-bit words on 64-bit-aligned boundaries) and interprets it as an address. If that address resides inside the heap segment, the scanner outputs a record to the data file (sniff_scan1.log) saying that the scanned location contains a pointer to heap data.
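The scan loop can be sketched in a few lines. This is not the actual scanner library, just a toy model of the idea: the "heap" is a plain byte buffer, and the names scan_heap, base, and WORD, as well as the little-endian layout, are assumptions made for illustration.

```python
import struct

WORD = 8  # assumed 64-bit words, matching the alignment described above

def scan_heap(heap: bytes, base: int):
    """Return (address, value) for every aligned word whose value lands inside the heap."""
    end = base + len(heap)
    hits = []
    for off in range(0, len(heap) - WORD + 1, WORD):
        # Treat the 8 bytes at this offset as an unsigned little-endian integer.
        (value,) = struct.unpack_from("<Q", heap, off)
        if base <= value < end:
            hits.append((base + off, value))
    return hits

# Toy heap: 32 bytes that pretend to start at address 0x1000.
base = 0x1000
heap = bytearray(32)
struct.pack_into("<Q", heap, 0, 0x1010)   # word at 0x1000 holds a heap address
struct.pack_into("<Q", heap, 8, 42)       # small integer, outside the heap range
struct.pack_into("<Q", heap, 24, 0x1000)  # word at 0x1018 holds a heap address

hits = scan_heap(bytes(heap), base)
for addr, value in hits:
    print(f"{addr:#x} -> {value:#x}")
```

Only the two words holding in-range values are reported; the zero word and the small integer are skipped.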

To generate the plots, the malloc() data in sniff.log and the who-points-to-who data in sniff_scan1.log are compared. The result of this comparison is a .dot file, which is used to build the .svg and .png images. The comparison is simple: each heap address output by the scanner (sniff_scan1.log) is matched to a malloc() in sniff.log. If the address does not match, it is still output. Either way, the address contains data that looks like a heap address. The result is a .dot file depicting which address points to which other address.
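A rough sketch of that comparison step follows. It is not the real comparison script, and the actual log formats are not shown here, so the inputs are assumed to be already parsed: allocs is a list of (start address, requested size) pairs and edges is a list of (scanned address, value found there) pairs. The helper names find_alloc, node, and to_dot are made up, and matching an address to the allocation that contains it is one plausible reading of "matched to a malloc()".

```python
import bisect

def find_alloc(allocs, addr):
    """Return the start of the allocation containing addr, or None if there is none."""
    starts = [start for start, _ in allocs]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size = allocs[i]
        if start <= addr < start + size:
            return start
    return None

def node(allocs, addr):
    """Name a graph node after the owning allocation, or the raw address if unmatched."""
    start = find_alloc(allocs, addr)
    owner = addr if start is None else start
    return f'"{owner:#x}"'

def to_dot(allocs, edges):
    """Build Graphviz DOT text with one edge per points-to record."""
    allocs = sorted(allocs)  # find_alloc relies on sorted start addresses
    lines = ["digraph heap {"]
    for src, dst in edges:
        lines.append(f"  {node(allocs, src)} -> {node(allocs, dst)};")
    lines.append("}")
    return "\n".join(lines)

# Hypothetical data: two allocations and three scanner hits.
allocs = [(0x1000, 16), (0x1020, 32)]
edges = [(0x1008, 0x1020), (0x1030, 0x1000), (0x1020, 0x1018)]
dot = to_dot(allocs, edges)
print(dot)
```

The last edge points at 0x1018, which belongs to no recorded allocation, so it is still emitted under its raw address, mirroring the "still output" behavior described above.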

Caveat

This is not scientific! First, the heap is never cleared before it is scanned, so the scanner could be operating on heap data from other processes. Second, the heap scan works like a conservative garbage collector: if the value at a heap address looks like another address in the heap, the scanner reports that the address points to that value. Large values in the heap (e.g. large integers) can look like pointers to other heap objects, and the scanner will report them as addresses. In other words, if it looks like an address, the heap scanner will say it is one.
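To make the false-positive case concrete, here is a tiny illustration. The heap base, heap size, and the byte_count variable are all invented for the example; the point is only that a range check cannot tell an integer from a pointer.

```python
HEAP_BASE = 0x602000  # hypothetical heap start address
HEAP_SIZE = 64        # hypothetical heap length in bytes

def looks_like_heap_pointer(value, base, size):
    """The only test a conservative scan can apply: is the value inside the heap range?"""
    return base <= value < base + size

# An ordinary integer (say, a byte count) that was never meant to be a pointer.
byte_count = 6299680  # numerically equal to HEAP_BASE + 32

print(looks_like_heap_pointer(byte_count, HEAP_BASE, HEAP_SIZE))
print(looks_like_heap_pointer(42, HEAP_BASE, HEAP_SIZE))
```

The byte count passes the check purely by numeric coincidence, so a scan like this would draw an edge for it, while the small integer is ignored.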

Why?

If this isn't scientific, why bother? I was bored... mmm kay?!

Notes

Why do some programs have few allocations but a fairly "exciting" graph?
For instance, 'hostname' has two allocations but quite a few points in its graph. As covered in the Caveat above, the heap is scanned and never cleared. So some of those addresses might come from data left lying around by previous processes and/or from values that merely look like pointers (e.g. really large integers).

Why didn't I implement feature-x?
I was bored and wanted to write a heap scanner and generate graphs. This is an approximation of the heap layout, nothing scientific, just some fun and such.

The Goods

Heap scanner library source code.
Comparison script (generates the .dot file from the scanner's output).
The above are located here.

The Sexxies

The web-graphs are generated using Gephi output files (gexf) and the sigma.js JavaScript library. The .svg and .png are generated using Graphviz.

Command	Data	malloc() Calls	Notes
bzip2 pg2591.txt	[svg | png | gexf | data]	11	537K Grimm's Fairy Tales
bunzip2 pg2591.txt.bz2	[svg | png | gexf | data]	9	141K Grimm's Fairy Tales (compressed)
hostname	[svg | png | gexf | data]	2
ls -la	[svg | png | gexf | data]	2111	13 files and directories
mkdir foo_dir	[svg | png | gexf | data]	30
rmdir foo_dir	[svg | png | gexf | data]	30
touch me	[svg | png | gexf | data]	30
rm me	[svg | png | gexf | data]	36
ps aux	[svg | png | gexf | data]	2408	Around 152 processes
who	[svg | png | gexf | data]	1851	1 sexxy user (me)