Check that vmlinux and kernel modules match core dump/running system #25

Closed
osandov opened this issue Dec 4, 2019 · 7 comments
Labels: bug (Something isn't working), debuginfo (Support for debugging information formats)

Comments

@osandov
Owner

osandov commented Dec 4, 2019

Right now, we'll blindly accept the debug info passed by the user. We should sanity check that it matches the program we're debugging. This issue is specifically for the kernel; userspace is going to need a different implementation.

Ideally, we should be able to check by build ID. Unfortunately, as far as I can tell, there is no easy way to get the build ID of vmlinux from either /proc/kcore or a vmcore. We probably want to add the build ID to the VMCOREINFO note, but in the meantime we can check OSRELEASE. See the discussion in delphix/sdb#41.
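As a rough illustration of the OSRELEASE check, here is a minimal sketch that parses the KEY=VALUE text of a VMCOREINFO note and compares its OSRELEASE entry against an expected release string. The function names `parse_vmcoreinfo` and `osrelease_matches` are hypothetical and are not drgn APIs; how the expected release is obtained from the vmlinux file is left out of the sketch.

```python
from typing import Dict


def parse_vmcoreinfo(data: bytes) -> Dict[str, str]:
    """Parse the KEY=VALUE lines of a VMCOREINFO note into a dict.

    Hypothetical helper for illustration only; not part of drgn.
    """
    info: Dict[str, str] = {}
    for line in data.decode("ascii", errors="replace").splitlines():
        # Split on the first "=" only, so keys such as "SYMBOL(name)"
        # and values containing "=" are preserved.
        key, sep, value = line.partition("=")
        if sep:
            info[key] = value
    return info


def osrelease_matches(vmcoreinfo: bytes, expected_release: str) -> bool:
    """Return True if the note's OSRELEASE equals expected_release."""
    return parse_vmcoreinfo(vmcoreinfo).get("OSRELEASE") == expected_release
```

A release string is a much weaker identifier than a build ID, since two different builds of the same kernel version can share it, so this can only catch obvious mismatches.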

I think we can already get the build ID for kernel modules via the sections we get from sysfs or the modules variable in the kernel.

As mentioned in the sdb issue, there should also be a way to override these sanity checks and load the debug information anyway.

@osandov added the help wanted (Seeking volunteers) and dependency (Needs change to external dependency) labels Dec 4, 2019
@vt-alt

vt-alt commented Jun 21, 2020

There is an easy way to determine the build ID of the running kernel: /sys/kernel/notes, which is the detached .notes section of vmlinux. It has existed since v2.6.23 (2007). For example, perf can show it with perf buildid-list --kernel.
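For reference, a minimal sketch of how the build ID could be pulled out of that file, assuming the standard ELF note layout (three 32-bit header words, then the name and descriptor, each padded to 4 bytes) and the NT_GNU_BUILD_ID note type (3) with owner name "GNU". The function names are hypothetical and this is not how drgn or perf implement it.

```python
import struct
import sys
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type from <elf.h>


def find_gnu_build_id(notes: bytes, byteorder: str = sys.byteorder) -> Optional[bytes]:
    """Return the GNU build ID from a blob of ELF notes, or None if absent."""
    fmt = ("<" if byteorder == "little" else ">") + "III"
    header_size = struct.calcsize(fmt)
    offset = 0
    while offset + header_size <= len(notes):
        namesz, descsz, note_type = struct.unpack_from(fmt, notes, offset)
        offset += header_size
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # so is the descriptor
        if (
            name == b"GNU\0"
            and note_type == NT_GNU_BUILD_ID
            and len(desc) == descsz  # reject a truncated descriptor
        ):
            return desc
    return None


def running_kernel_build_id(path: str = "/sys/kernel/notes") -> Optional[str]:
    """Read the running kernel's build ID as a hex string, if present."""
    with open(path, "rb") as f:
        build_id = find_gnu_build_id(f.read())
    return build_id.hex() if build_id is not None else None
```

The same parser should also work on a kernel module's build ID note, assuming it is exposed under /sys/module/<name>/notes/.note.gnu.build-id on the system in question.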

@osandov
Owner Author

osandov commented Jun 22, 2020

Nice, that works for the live case, thanks! I still haven't found anything for the vmcore case, which might be more important since it's more likely that you'll find the wrong kernel if you're debugging a vmcore from another machine.

@vt-alt

vt-alt commented Dec 5, 2020

By the way, sorry for going slightly off-topic: do you think Linux should show GNU build IDs in stack traces?

@osandov
Owner Author

osandov commented Nov 23, 2021

It looks like the build ID was added to VMCOREINFO in Linux 5.9 (torvalds/linux@0935288). We should use that now.
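To make the resulting policy concrete, here is a minimal sketch of a check that prefers the BUILD-ID entry of VMCOREINFO, falls back to OSRELEASE on kernels that lack it, and honors an override as requested earlier in this thread. It assumes VMCOREINFO has already been parsed into a mapping and that the build ID appears there as a hex string; `check_vmlinux`, `DebugInfoMismatch`, and the `force` parameter are hypothetical names, not drgn's actual interface.

```python
from typing import Mapping, Optional


class DebugInfoMismatch(Exception):
    """Raised when a vmlinux file does not appear to match the core dump."""


def check_vmlinux(
    vmcoreinfo: Mapping[str, str],
    file_build_id: Optional[str],
    file_release: Optional[str],
    force: bool = False,
) -> str:
    """Decide whether a vmlinux file may be used for this core dump.

    Returns "build-id" or "osrelease" for the check that passed, "forced"
    if a failed check was overridden, or "unchecked" if nothing could be
    compared. Raises DebugInfoMismatch on a mismatch without force.
    """
    core_id = vmcoreinfo.get("BUILD-ID")
    core_release = vmcoreinfo.get("OSRELEASE")
    if core_id and file_build_id:
        # Strongest check: compare build IDs as case-insensitive hex.
        method = "build-id"
        matches = core_id.lower() == file_build_id.lower()
    elif core_release and file_release:
        # Weaker fallback for kernels without BUILD-ID in VMCOREINFO.
        method = "osrelease"
        matches = core_release == file_release
    else:
        return "unchecked"
    if matches:
        return method
    if force:
        return "forced"
    raise DebugInfoMismatch(f"vmlinux does not match core dump ({method} differs)")
```

Returning the name of the check that was used lets a caller warn when only the weaker OSRELEASE comparison, or no comparison at all, was possible.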

@osandov removed the dependency (Needs change to external dependency) label Nov 23, 2021
@osandov
Owner Author

osandov commented Nov 23, 2021

@vt-alt sorry I missed your other question before! I do think drgn needs an API for getting the debug info module (and corresponding build ID) that a symbol came from. I don't have concrete plans for that quite yet, though.

@osandov removed the help wanted (Seeking volunteers) label Apr 28, 2022
@osandov self-assigned this Apr 28, 2022
@brenns10
Contributor

brenns10 commented Feb 4, 2023

@osandov do you happen to know, is this bug still valid for the latest drgn? I recall we now match modules by build ID rather than name, which means that mismatched modules may get loaded at address 0. But from what I've observed, drgn still loads vmlinux regardless of mismatched build ID. Is that accurate?

@osandov
Copy link
Owner Author

osandov commented Feb 6, 2023

Yup, that's all correct at the moment. My (perpetual) rework of this area will fix this.

@osandov mentioned this issue Jul 5, 2023
@osandov added the bug (Something isn't working) label Jul 5, 2023
@osandov added the debuginfo (Support for debugging information formats) label Dec 17, 2024
osandov added a commit that referenced this issue Dec 17, 2024
drgn currently provides limited control over how debugging information
is found. drgn has hardcoded logic for where to search for debugging
information. The most the user can do is provide a list of files for
drgn to try in addition to the default locations (with the -s CLI option
or the drgn.Program.load_debug_info() method).

The implementation is also a mess. We use libdwfl, but its data model is
slightly different from what we want, so we have to work around it or
reimplement its functionality in several places: see commits
e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn:
debug_info: report userspace core dump debug info ourselves"), and
1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for
some examples. The mismatched combination of libdwfl and our own code is
difficult to maintain, and the lack of control over the whole debug info
pipeline has made it difficult to fix several longstanding issues.

The solution is a major rework removing our libdwfl dependency and
replacing it with our own model. This (huge) commit is that rework
comprising the following components:

- drgn.Module/struct drgn_module, a representation of a binary used by a
  program.
- Automatic discovery of the modules loaded in a program.
- Interfaces for manually creating and overriding modules.
- Automatic discovery of debugging information from the standard
  locations and debuginfod.
- Interfaces for custom debug info finders and for manually overriding
  debugging information.
- Tons of test cases.

A lot of care was taken to make these interfaces extremely flexible yet
cohesive. The existing interfaces are also reimplemented on top of the
new functionality to maintain backwards compatibility, with one
exception: drgn.Program.load_debug_info()/-s would previously accept
files that it didn't find loaded in the program. This turned out to be a
big footgun for users, so now this must be done explicitly (with
drgn.ExtraModule/--extra-symbols).

The API and implementation both owe a lot to libdwfl:

- The concepts of modules, module address ranges/section addresses, and
  file biases are heavily inspired by the libdwfl interfaces.
- Ideas for determining modules in userspace processes and core dumps
  were taken from libdwfl.
- Our implementation of ELF symbol table address lookups is based on
  dwfl_module_addrinfo().

drgn has taken these concepts and fine-tuned them based on lessons
learned.

Credit is also due to Stephen Brennan for early testing and feedback.

Closes #16, closes #25, closes #332.

Signed-off-by: Omar Sandoval <[email protected]>
osandov added three more commits with the same message that referenced this issue Dec 18, 2024