Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linux: module symbols: Fixes and code improvements #1509

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions volatility3/framework/constants/linux/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -352,3 +352,6 @@ def flags(self) -> str:
MODULE_MAXIMUM_CORE_SIZE = 20000000
MODULE_MAXIMUM_CORE_TEXT_SIZE = 20000000
MODULE_MINIMUM_SIZE = 4096

# Kallsyms
KSYM_NAME_LEN = 512
71 changes: 36 additions & 35 deletions volatility3/framework/symbols/linux/extensions/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,65 +213,63 @@ def get_elf_table_name(self):
)
return elf_table_name

def get_symbols(self):
def get_symbols(
self,
) -> Iterable[Tuple[int, interfaces.objects.ObjectInterface]]:
"""Get symbols of the module

Yields:
A symbol object
A tuple containing the ELF symbol index and the corresponding ELF symbol object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes assumptions about the ordering of the symbols returned (and it's also technically a change in the API since it'll break anything the currently doesn't expect a tuple) so if you want to make this change it'll need a MAJOR version bump somewhere, but I can't see the use in it? What's the reason for needing to return the index?

The only existing call to this that I saw get updated throws the value of the index away?

Copy link
Contributor Author

@gcmoreira gcmoreira Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I can't see the use in it?

yes, I tried to explain this here:

Hey @ikelos, this is part of a larger pull request that I'm breaking into smaller, more manageable changes. While these updates might seem disconnected at the moment, they will align once the full implementation is in place.

Yes, symbols in the symbol table are ordered, which is important because it is not the only table in a Linux kernel module. The module also contains a types table, where the indices correspond to those in the symbol table. This means that the type for the symbol at index (n) in the symbol table is located at the same index in the types table.

We could handle this in the caller using enumerate, but that would mean get_symbols() must always preserve the original order. It cannot skip any symbol, whether it's invalid, null, empty, or whatever. If skipped, the symbols could shift in relation to the other tables, breaking the framework.

I believe this approach ensures the symbols stay correctly aligned with their indices.
Should this be considered a major version bump for the whole framework? Unfortunately, object extensions don't have versioning in place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel there's a better way of marrying the two up? The symbols in an ISF can have a type parameter, telling you which type they use? Is that not coming out of the dwarf2json? Enumerating them here seems really fragile and we'll have to remember it if we ever tinker with any of the way the symbol tables work. See here.

Recalculating it each time just opens up the chance something will go wrong later, the mapping is static data, it should be encoded in the ISF itself, surely?

Copy link
Contributor Author

@gcmoreira gcmoreira Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it's not related to those symbols. These are kernel module symbols, and this information is not part of the ISF. It is gathered directly from memory rather than from d2j. The symbol types are not included in the elf_sym structure and never will be, as this data is embedded within the module, not within the symbol itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikelos If we want to avoid a version major bump, I can make an intermediate function like:
get_symbols_ex() .. which yield index and symbol .. and get_symbols() calls it but ignoring the index. Then, who needs both can still call directly to get_symbols_ex().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I missed where they came from then (there's a lot of special things in the linux extensions that I'm not up to speed on). If you're happy it makes sense, then that's ok. Yes, please put it in a different function and make the old one a shell that just returns the same type and has the same signature and name as the original. Probably wise to include a deprecation warning in that call too, so anyone that is using it knows to shift themselves over...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't someone just call enumerate on the generator before first use though? If you're changing the function already I guess it makes it less brittle to ensure it's always count from the start, just trying to keep API changes to a minimum. As I say, if really want it I'm ok with it, just want to make sure it doesn't need changing again down the line... 5:)

"""

if not hasattr(self, "_elf_table_name"):
self._elf_table_name = self.get_elf_table_name()
if symbols.symbol_table_is_64bit(self._context, self.get_symbol_table_name()):
prefix = "Elf64_"
else:
prefix = "Elf32_"
syms = self._context.object(
self.get_symbol_table_name() + constants.BANG + "array",
if not self.section_strtab or self.num_symtab < 1:
return None

elf_table_name = self.get_elf_table_name()
symbol_table_name = self.get_symbol_table_name()

is_64bit = symbols.symbol_table_is_64bit(self._context, symbol_table_name)
sym_name = "Elf64_Sym" if is_64bit else "Elf32_Sym"
sym_type = self._context.symbol_space.get_type(
elf_table_name + constants.BANG + sym_name
)
elf_syms = self._context.object(
symbol_table_name + constants.BANG + "array",
layer_name=self.vol.layer_name,
offset=self.section_symtab,
subtype=self._context.symbol_space.get_type(
self._elf_table_name + constants.BANG + prefix + "Sym"
),
count=self.num_symtab + 1,
subtype=sym_type,
count=self.num_symtab,
ikelos marked this conversation as resolved.
Show resolved Hide resolved
)
if self.section_strtab:
yield from syms
for elf_sym_num, elf_sym_obj in enumerate(elf_syms):
# Prepare the symbol object for methods like get_name()
elf_sym_obj.cached_strtab = self.section_strtab
ikelos marked this conversation as resolved.
Show resolved Hide resolved
yield elf_sym_num, elf_sym_obj

def get_symbols_names_and_addresses(self) -> Iterable[Tuple[str, int]]:
"""Get names and addresses for each symbol of the module

Yields:
A tuple for each symbol containing the symbol name and its corresponding value
"""

for sym in self.get_symbols():
sym_arr = self._context.object(
self.get_symbol_table_name() + constants.BANG + "array",
layer_name=self.vol.native_layer_name,
offset=self.section_strtab + sym.st_name,
)
try:
sym_name = utility.array_to_string(
sym_arr, 512
) # 512 is the value of KSYM_NAME_LEN kernel constant
except exceptions.InvalidAddressException:
layer = self._context.layers[self.vol.layer_name]
for _sym_num, sym in self.get_symbols():
sym_name = sym.get_name()
if not sym_name:
continue
if sym_name != "":

# Normalize sym.st_value offset, which is an address pointing to the symbol value
mask = self._context.layers[self.vol.layer_name].address_mask
sym_address = sym.st_value & mask
yield (sym_name, sym_address)
sym_address = sym.st_value & layer.address_mask
yield (sym_name, sym_address)

def get_symbol(self, wanted_sym_name):
"""Get symbol value for a given symbol name"""
def get_symbol(self, wanted_sym_name) -> Optional[int]:
"""Get symbol address for a given symbol name"""
for sym_name, sym_address in self.get_symbols_names_and_addresses():
if wanted_sym_name == sym_name:
return sym_address

return None

def get_symbol_by_address(self, wanted_sym_address):
def get_symbol_by_address(self, wanted_sym_address) -> Optional[str]:
"""Get symbol name for a given symbol address"""
for sym_name, sym_address in self.get_symbols_names_and_addresses():
if wanted_sym_address == sym_address:
Expand All @@ -285,6 +283,7 @@ def section_symtab(self):
return self.kallsyms.symtab
elif self.has_member("symtab"):
return self.symtab

raise AttributeError("Unable to get symtab")

@property
Expand All @@ -293,6 +292,7 @@ def num_symtab(self):
return int(self.kallsyms.num_symtab)
elif self.has_member("num_symtab"):
return int(self.member("num_symtab"))

raise AttributeError("Unable to determine number of symbols")

@property
Expand All @@ -303,6 +303,7 @@ def section_strtab(self):
# Older kernels
elif self.has_member("strtab"):
return self.strtab

raise AttributeError("Unable to get strtab")


Expand Down
34 changes: 21 additions & 13 deletions volatility3/framework/symbols/linux/extensions/elf.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,14 @@
# which is available at https://www.volatilityfoundation.org/license/vsl-v1.0
#

from typing import Dict, Tuple
from typing import Dict, Tuple, Optional
import logging

from volatility3.framework import constants
from volatility3.framework.constants.linux import (
ELF_IDENT,
ELF_CLASS,
KSYM_NAME_LEN,
)
from volatility3.framework import objects, interfaces, exceptions

Expand Down Expand Up @@ -328,22 +329,29 @@ def cached_strtab(self):
def cached_strtab(self, cached_strtab):
self._cached_strtab = cached_strtab

def get_name(self):
addr = self._cached_strtab + self.st_name
def get_name(self, max_size=KSYM_NAME_LEN) -> Optional[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is an additive change to the API. I'm not sure what the first versioned component above this would be though? Can we avoid this being a parameter, but make it a constant instead? I suspect the maximum doesn't change often, and it would keep the API the same but pull the value out to make it more destinctly something people can see and set when needed?

Copy link
Contributor Author

@gcmoreira gcmoreira Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can remove it from the parameters if you prefer. Since the ELF API could be used for other purposes than just kernel symbols, and for some of those use cases, a shorter maximum symbol name length could make more sense there, I assumed that a general maximum-maximum length would be fine for all cases but allowing to customize a shorter one if for some reason it's needed. I also made it a parameter with default value to minimize the impact of this change.

The Linux kernel has it's maximum length, but for general/user-space ELF files the maximum size can be different. The kernel used to have a maximum length of 128 bytes until Rust happened. In kernels 6.1, it was increased to 512 bytes in this commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, ok. I think still best to make it a constant, and then we can make a new class that overrides that class local constant down the line (and the class local constant can be set from constants if you want)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, do you mean something like this (internal or not)?

class elf_sym(objects.StructType):
    _MAX_NAME_LENGTH = KSYM_NAME_LEN

Copy link
Member

@ikelos ikelos Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constants.KSYM_NAME_LEN, but yeah, so there's constants (user overridable in one clear place, class constant so that inheritors could override it, and then the actual use point). Does that make sense or is it just over complicating things? I don't mind so much, so long as it's pulled out somewhere sensible... 5:)

"""Returns the symbol name

Args:
max_size: Maximum length for a symbol name string. Defaults to KSYM_NAME_LEN (512 bytes).

# Just get the first 255 characters, it should be enough for a symbol name
name_bytes = self._context.layers[self.vol.layer_name].read(addr, 255, pad=True)
Returns:
The symbol name
"""

if name_bytes:
idx = name_bytes.find(b"\x00")
if idx != -1:
name_bytes = name_bytes[:idx]
return name_bytes.decode("utf-8", errors="ignore")
else:
# If we cannot read the name from the address space,
# we return None.
addr = self._cached_strtab + self.st_name

layer = self._context.layers[self.vol.layer_name]
name_bytes = layer.read(addr, max_size, pad=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think a try/except should be wrapped around the layer read to prevent from memory smear ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With pad=True it shouldn't raise any exceptions

Copy link
Contributor

@Abyss-W4tcher Abyss-W4tcher Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I had concerns about this early part, if self.st_name is an absurd value due to smear ?

def read(self, offset: int, length: int, pad: bool = False) -> bytes:
"""Reads from the file at offset for length."""
if not self.is_valid(offset, length):
invalid_address = offset
if self.minimum_address < offset <= self.maximum_address:
invalid_address = self.maximum_address + 1
raise exceptions.InvalidAddressException(
self.name, invalid_address, "Offset outside of the buffer boundaries"
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When pad=True, this for because ignore_errors=pad, will return immediately:

for offset, _, mapped_offset, mapped_length, layer in self.mapping(
offset, length, ignore_errors=pad
):

Then this line will pad everything with zeros:

return recovered_data + b"\x00" * (length - len(recovered_data))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying it !

if not name_bytes:
return None

idx = name_bytes.find(b"\x00")
if idx != -1:
name_bytes = name_bytes[:idx]

return name_bytes.decode("utf-8", errors="ignore")


class elf_phdr(objects.StructType):
"""An elf program header"""
Expand Down
Loading