Skip to content

Latest commit

 

History

History
373 lines (322 loc) · 18.4 KB

how-bloaty-works.md

File metadata and controls

373 lines (322 loc) · 18.4 KB

How Bloaty Works

At a high level, Bloaty's goal is to create a map of the binary where every byte has a label attached to it. Every byte starts out as unknown (unattributed). As we scan the binary we assign labels to different ranges of the file. For example, if the user selected the "sections" data source we scan the section table and use the section name as the label for each range.

Ideally these labeled ranges will cover the entire file by the time we are done. In practice we usually can't achieve perfect 100% coverage. To compensate for this, we have various kinds of "fallback" labels we attach to mystery regions of the file. This is how we guarantee an important invariant of Bloaty: the totals given in Bloaty's output will always match the total size of the file. This ensures that we always account for the entire file, even if we don't have detailed information for every byte.

The ELF/Mach-O/etc. data structures we are traversing were not designed to enable size profiling. They were designed to assist linkers, loaders, debuggers, stack unwinders, etc. to run and debug the binary. This means that Bloaty's size analysis is inherently an unconventional use of ELF/Mach-O metadata. Bloaty has to be clever about how to use the available information to achieve its goal. This can pose a challenge, but also makes Bloaty fun to work on. Getting the coverage close to 100% requires a lot of ingenuity (and some heuristics).

Range Map

RangeMap (as defined in range_map.h) is the core data structure of Bloaty. It is a sparse map of [start, end) -> std::string that associates regions of VM or file space to a label.

By the time Bloaty is finished, it has built a complete map of both VM and file space for the binary. You can view these maps by running Bloaty with '-v':

$ ./bloaty bloaty -v -d sections
FILE MAP:
0000000-00002e0         736             [LOAD #2 [R]]
00002e0-00002fc          28             .interp
00002fc-0000320          36             .note.gnu.build-id
0000320-0000340          32             .note.ABI-tag
0000340-0000510         464             .gnu.hash
0000510-0001db8        6312             .dynsym
0001db8-0003c8d        7893             .dynstr
0003c8d-0003c8e           1             [LOAD #2 [R]]
0003c8e-0003e9c         526             .gnu.version
0003e9c-0003ea0           4             [LOAD #2 [R]]
0003ea0-0004020         384             .gnu.version_r
0004020-0066f30      405264             .rela.dyn
0066f30-00680b8        4488             .rela.plt
00680b8-0069000        3912             [Unmapped]
0069000-0069017          23             .init
0069017-0069020           9             [LOAD #3 [RX]]
0069020-0069be0        3008             .plt
0069be0-0069c40          96             .plt.got
0069c40-02874d1     2218129             .text
[...]

VM MAP:
000000-0002e0           736             [LOAD #2 [R]]
0002e0-0002fc            28             .interp
0002fc-000320            36             .note.gnu.build-id
000320-000340            32             .note.ABI-tag
000340-000510           464             .gnu.hash
000510-001db8          6312             .dynsym
001db8-003c8d          7893             .dynstr
003c8d-003c8e             1             [LOAD #2 [R]]
003c8e-003e9c           526             .gnu.version
003e9c-003ea0             4             [LOAD #2 [R]]
003ea0-004020           384             .gnu.version_r
004020-066f30        405264             .rela.dyn
066f30-0680b8          4488             .rela.plt
0680b8-069000          3912             [-- Nothing mapped --]
069000-069017            23             .init
069017-069020             9             [LOAD #3 [RX]]
[...]

The file map refers to file offsets, and these always run from 0 to the size of the file. The VM map refers to VM addresses; these start at 0 for shared libraries and position-independent binaries, but these will start at some non-zero address if the binary was linked to be loaded at a fixed address.

Note that some of the regions in the map have labels like [LOAD #2 [R]] instead of a true section name. This is because the section table does not always cover every byte of the file. Bloaty gives these regions a fallback label that contains the segment name instead. We must attach some kind of label to every byte of the file, otherwise Bloaty's totals would not match the file size.

Also notice that there is an entry in the VM map that says [-- Nothing mapped --]. This is calling attention to the fact that there is a gap in the address space here. Since nothing is mapped, these regions of the VM space don't actually need to accessible in the target process image. However, unless this unused space aligns with page boundaries, it will probably end up getting mapped anyway.

Sometimes we know a region's start but not its end. For example, Mach-O symbols have an address but not a size (whereas ELF symbols have both). To support this case, RangeMap supports adding an address with kUnknownSize. A range with unknown size will automatically extend to the beginning of the next region, even if the next region is added later.

If we try to add a label to a range of the binary that has already been assigned a label, the first label assigned takes precedence. This means that the order in which we scan data structures is significant. So our general strategy is to scan our most granular and detailed information first. We scan generic information as a last resort, to give at least some information for parts of the binary that we couldn't find any more specific information about.

VM Space and File Space

Loadable binaries have two fundamental domains of space we are trying to map: VM space and file space. File space is the bytes of the input file. VM space is the bytes of memory when the executable is loaded at runtime. Some regions of the binary exist only in file space (like debug info) and some regions exist only in VM space (like .bss, zero-initialized data). Even entities that exist in both spaces can have different sizes in each.

We create two separate RangeMap structures for these two domains. For convenience, we put them together into a single structure called DualMap:

struct DualMap {
  RangeMap vm_map;
  RangeMap file_map;
};

We populate these two maps simultaneously as we scan the file. We must populate both maps even if we only care about one of them, because most of the metadata we're scanning gives us VM addresses or file offsets, not both. For example, debug info always refers to VM addresses, because it's intended for debugging at runtime. Even if we only care about file size, we still have to scan VM addresses and translate them to file offsets.

Bloaty's overall analysis algorithm (in pseudo-code) is:

for (auto f : files) {
  // Always start by creating the base map.
  DualMap base_map = ScanBaseMap(f);

  // Scan once for every data source the user selected with '-d'.
  std::vector<DualMap> maps;
  for (auto s : data_sources) {
    maps.push_back(ScanDataSource(f, s));
  }
}

Base Map

To translate between VM and file space, we always begin by creating a "base map." The base map is just a DualMap like any other, but we give it special meaning:

  • It defines what ranges of file and VM space constitute "the entire binary" (ie. the "TOTALS" row of the final report).
  • We use it to translate between VM space and File space.

This means that the base map must be exhaustive, and must also provide translation for any entity that exists in both VM and file space. For example, suppose we are scanning the "symbols" data source and we see in the symbol table that address 0x12345 corresponds to symbol foo. We will add that to VM map immediately, but we will also use the base map to translate address 0x12355 to a file offset so we can add that range to the file map.

How does the base map store translation info? I left one thing out about RangeMap above. In addition to storing a label for every region, it can also (optionally) store a member called other_start. This stores the corresponding offset in the other space, and lets you translate addresses from one to the other. The other_start member is only used in the base map.

We build the base map by scanning either the segments (program headers) or sections of the binary. These give both VM address and file offset for regions of the binary that are loaded into memory. To make sure we cover the entire file space, we use [Unmapped] as a last ditch fallback for any regions of the on-disk binary that didn't have any segment/section data associated with them. This ensures that Bloaty always accounts for the entire physical binary, even if we can't find any information about it.

Scanning Data Sources

Once we have built the base map, we can get on to the meat of Bloaty's work. We can now scan the binary according to whatever data source(s) the user has selected.

Segments and Sections

The segments and sections data sources are relatively straightforward. For the most part we can simply scan the segments/sections table and call it a day.

For ELF, segments and sections have distinct tables in the binary that can be scanned independently. This means that technically a section could span multiple segments, but in practice segments/sections form a 1:many relationship, where each section is contained entirely within a single segment.

Currently Bloaty only reports PT_LOAD and PT_TLS segments. We scan PT_LOAD segments first, so if there is overlap with PT_TLS the PT_LOAD label will win. In the future It may make sense to scan PT_TLS first, as this is more granular data that can give insight into the per-thread runtime overhead of TLS variables. It may also make sense to scan other segment types, to give more granular info.

ELF segments do not have names. To distinguish between different PT_LOAD segments, we include both a segment offset and the segment flags in the label, eg. LOAD #2 [R].

For Mach-O, segments are contained within a file-level table of "load commands." Each load command has a type, and technically speaking, segments are a subset of all load commands. However Bloaty's segments data source reports many non-segment load commands such as the symbol table (LC_SYMTAB, LC_DYSYMTAB), code signature (LC_CODE_SIGNATURE), and more. Segments can have zero or more sections, so in Mach-O files the 1:many nature of segments and sections is enforced by the file format.

For segments and sections we have to decide how to attribute the regions of the file that correspond to the segment/section headers themselves. Bloaty's general philosophy is to include the metadata with the data, so each label shows the true weight of everything associated with that label. This would suggest that the .text label should include the .text section as well as the section header entry for the .text section. However this would hide the overhead of the ELF headers, which can be significant if there are many sections. Bloaty currently has no higher-level data source that could show the ELF headers separately from the ELF data, and even if there was such a data source it would have narrow usefulness so people would probably not think to use it very often.

There is not an easy answer to this question. At the moment Bloaty will include section headers with the corresponding section, but will not include segment headers with the corresponding segment. This may or may not be the best solution to this problem, and this may change if another solution proves to work better.

Symbols

The symbols data source is where Bloaty's deep parsing of the binary delivers the most benefit, as it provides detailed information that you cannot get from a linker map or symbol table.

For example, take the following data from running Bloaty on itself:

$ ./bloaty bloaty -d symbols,sections
    FILE SIZE        VM SIZE
 --------------  --------------
[...]
   0.2%   116Ki   1.6%   116Ki    AArch64_printInst
    84.9%  98.8Ki  84.9%  98.8Ki    .text
    14.9%  17.4Ki  14.9%  17.4Ki    .rodata
     0.1%     156   0.1%     156    .eh_frame
     0.0%      24   0.0%       0    .symtab
     0.0%      18   0.0%       0    .strtab
     0.0%       8   0.0%       8    .eh_frame_hdr
[...]
   0.1%  50.1Ki   0.7%  49.8Ki    reg_name_maps
    59.6%  29.8Ki  59.8%  29.8Ki    .rela.dyn
    40.0%  20.0Ki  40.2%  20.0Ki    .data.rel.ro
     0.4%     216   0.0%       0    .symtab
     0.0%      14   0.0%       0    .strtab

I excerpted two symbols from the report. Between these two symbols, Bloaty has found seven distinct kinds of data that contributed to these two symbols. If you wrote a tool that naively just parsed the symbol table, you would only find the first of these seven:

  1. .text./.data.rel.ro: this is the data we obtain by simply following the symbol table entry. This is the primary code or data emitted by the function or variable.

  2. .eh_frame: this is the "unwind information" for a function. It is used for many things, including C++ exceptions and stack traces when no frame pointer is available.

  3. .eh_frame_hdr: this is metadata about the .eh_frame section.

  4. .symtab: this is the function/variable's symbol table entry itself. It is a fixed size for every entry. The fact that reg_name_maps above has a .symtab size of 216 indicates that there must actually be 9 different symbols being represented by this entry. Bloaty has combined them because they all have the same name. We can break them apart if we want using:

    $ ./bloaty bloaty -d compileunits,symbols --source-filter=reg_name_maps$
        FILE SIZE        VM SIZE
     --------------  --------------
      20.3%  10.2Ki  20.3%  10.1Ki    ../third_party/capstone/arch/AArch64/AArch64Mapping.c
       100.0%  10.2Ki 100.0%  10.1Ki    reg_name_maps
      18.9%  9.45Ki  18.9%  9.43Ki    ../third_party/capstone/arch/X86/X86Mapping.c
       100.0%  9.45Ki 100.0%  9.43Ki    reg_name_maps
      16.4%  8.20Ki  16.4%  8.18Ki    ../third_party/capstone/arch/PowerPC/PPCMapping.c
       100.0%  8.20Ki 100.0%  8.18Ki    reg_name_maps
      10.7%  5.35Ki  10.7%  5.33Ki    ../third_party/capstone/arch/Mips/MipsMapping.c
       100.0%  5.35Ki 100.0%  5.33Ki    reg_name_maps
       9.1%  4.57Ki   9.1%  4.55Ki    ../third_party/capstone/arch/SystemZ/SystemZMapping.c
       100.0%  4.57Ki 100.0%  4.55Ki    reg_name_maps
       8.7%  4.35Ki   8.7%  4.31Ki    ../third_party/capstone/arch/ARM/ARMMapping.c
       100.0%  4.35Ki 100.0%  4.31Ki    reg_name_maps
       7.0%  3.52Ki   7.0%  3.49Ki    ../third_party/capstone/arch/TMS320C64x/TMS320C64xMapping.c
       100.0%  3.52Ki 100.0%  3.49Ki    reg_name_maps
       6.9%  3.44Ki   6.9%  3.41Ki    ../third_party/capstone/arch/Sparc/SparcMapping.c
       100.0%  3.44Ki 100.0%  3.41Ki    reg_name_maps
       2.0%  1.02Ki   2.0%    1016    ../third_party/capstone/arch/XCore/XCoreMapping.c
       100.0%  1.02Ki 100.0%    1016    reg_name_maps
     100.0%  50.1Ki 100.0%  49.8Ki    TOTAL
    Filtering enabled (source_filter); omitted file = 46.7Mi, vm = 7.08Mi of entries
    
  5. .strtab: this is the text of the function/variables's name itself in the string table. Longer names take up more space in the binary, and Bloaty's analysis here reflects that (though the symbol table is not loaded at runtime, so it's not costing RAM).

  6. .rela.dyn: these are relocations embedded into the executable. Normally we would associate relocations with .o files and not the final linked binary. However shared objects and position-independent executables must also emit relocations for any global variable that is initialized to an address of some other data. These relocations can take up a significant amount of space, indeed more space than the data itself in this case! Without this deep analysis of the binary, this cost would be invisible. Bloaty scans all relocation tables and "charges" each relocation entry to the function/data that requires the relocation (not the function being pointed to).

  7. .rodata: Bloaty has found some data associated with this function. Sometimes data doesn't get its own symbol table entry, for whatever reason. Bloaty can attribute anonymous data to the function that uses it by disassembling the binary looking for instructions that reference a different part of the binary. If the same anonymous data is used by more than one function, then the first one scanned will "win" and assume the whole cost, as Bloaty has no concept of sharing the cost. Every byte of the file must have exactly one label associated with it.

Note that this is more granular information than you can get from a linker map file. A linker map file will break down some of these sections by compile unit, but the symbol-level granularity is limited to the primary code/data for each symbol (#1 in the list above).

Compile Units

Like symbols, we can see that Bloaty is capable of breaking down lots of sections by compile unit:

$ ./bloaty bloaty -d compileunits,sections
    FILE SIZE        VM SIZE
 --------------  --------------
  37.9%  17.7Mi  49.4%  3.52Mi    [160 Others]
  15.0%  7.04Mi   3.4%   246Ki    ../third_party/protobuf/src/google/protobuf/descriptor.cc
    33.9%  2.38Mi   0.0%       0    .debug_info
    32.6%  2.29Mi   0.0%       0    .debug_loc
    17.2%  1.21Mi   0.0%       0    .debug_str
     6.5%   468Ki   0.0%       0    .debug_ranges
     5.3%   381Ki   0.0%       0    .debug_line
     2.8%   204Ki  83.1%   204Ki    .text
     1.0%  70.9Ki   0.0%       0    .strtab
     0.4%  25.7Ki  10.4%  25.7Ki    .eh_frame
     0.2%  13.3Ki   0.0%       0    .symtab
     0.1%  10.6Ki   4.3%  10.6Ki    .rodata
     0.1%  3.97Ki   1.6%  3.97Ki    .eh_frame_hdr
     0.0%  1.03Ki   0.4%  1.03Ki    .rela.dyn
     0.0%     368   0.1%     368    .data.rel.ro
     0.0%       0   0.0%      81    .bss
[...]

To attribute all of the different .debug_* sections, Bloaty includes parsers for all of the different DWARF formats that live in these sections. We also use the DWARF data to find which symbols belong to which compile units.

The compileunits data source contains much of the same data that you could get from a linker map. Since each compile unit generally comes from a separate .o file, a linker map can often give good data about which parts of the binary came from which translation units. However Bloaty is able to derive this data without needing a linker map file, which may be tricky to obtain. The compileunits data source is also useful when combined with other data sources in hierarchical profiles.