Support for text based stub libraries (.tbd) files #265
My sense is that the data structures and parsing belongs in a new |
@indygreg first off, thank you for the kind words! :) So re So generally speaking, in the past, I've opted for:
These were the general motivating "directives" of the library. So e.g., when there was some push for wasm support, i was 50/50 on including/not including, creating own submodule, or pulling in external, etc. We may still do that! But on the subject of On that note, i think it could be reasonable to add the variant, e.g. here:
#[derive(Debug)]
#[allow(clippy::large_enum_variant)]
/// Either a collection of multiple architectures, or a single mach-o binary
pub enum Mach<'a> {
    /// A "fat" multi-architecture binary container
    Fat(MultiArch<'a>),
    /// A regular Mach-o binary
    Binary(MachO<'a>),
    /// A Tbd file
    TBD(TBD<'a>), // or maybe a `&'a str` ?
}
As noted by @willglynn a way to do this might be to have an external There are two major problems with this approach:
so the above two are the major problems I see in adding support for In particular, parsing/loading the
which will likely massively explode the amount of deps. Maybe this can be "fixed" with a cfg on that variant, etc., but this is in violation of 1., and it just doesn't seem very clean to me. Also, having had experience with cfgs and rust features, I do believe that using them like this is an anti-pattern, and can cause unintentional bloat and/or unusual recompiles when features don't unify in a workspace with multiple crates. Anyway, these are some of my fears, and so i'm heavily biased towards adding parsing functionality in goblin itself for tbd (as opposed to external deps); however, as I said, I could be persuaded. Some random thoughts/questions:
So to summarize:
Thoughts appreciated :D EDIT: added clarification i'm heavily biased towards adding parsing functionality inside goblin itself, instead of using external crate with cfgs |
I decided to shave a yak today and I implemented a minimal crate for parsing This would seemingly be more ammunition for not using an uncontrolled 3rd party crate for the FWIW I validated my |
So i'm wondering if a simple, more flexible approach here is just to detect the tbd (glancing at your code, this seems somewhat complicated/tedious), and just return a |
If you are proposing In order to avoid a future API break, do you think it would be worth declaring a new |
YAML supports UTF-8, UTF-16, and UTF-32 in both byte orders (see YAML 1.2 § 5.2). I've been assuming that Edit: I'd be happy with an |
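To make the encoding point concrete, here is a minimal sketch of how the encoding of a YAML stream could be guessed from its first bytes, following the byte-pattern table in YAML 1.2 § 5.2. The names `YamlEncoding` and `sniff_yaml_encoding` are hypothetical and are not part of goblin or of any crate mentioned in this thread.

```rust
/// Character encodings a YAML 1.2 stream may use (YAML 1.2, section 5.2).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum YamlEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
}

/// Guess the encoding from a byte order mark, or from the zero bytes that
/// surround a leading ASCII character when no BOM is present.
/// Falls back to UTF-8, which is the default in the spec.
pub fn sniff_yaml_encoding(bytes: &[u8]) -> YamlEncoding {
    match bytes {
        // UTF-32 is tested before UTF-16 because the UTF-32LE BOM
        // (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
        [0x00, 0x00, 0xFE, 0xFF, ..] => YamlEncoding::Utf32Be,
        [0x00, 0x00, 0x00, _, ..] => YamlEncoding::Utf32Be,
        [0xFF, 0xFE, 0x00, 0x00, ..] => YamlEncoding::Utf32Le,
        [_, 0x00, 0x00, 0x00, ..] => YamlEncoding::Utf32Le,
        [0xFE, 0xFF, ..] => YamlEncoding::Utf16Be,
        [0x00, _, ..] => YamlEncoding::Utf16Be,
        [0xFF, 0xFE, ..] => YamlEncoding::Utf16Le,
        [_, 0x00, ..] => YamlEncoding::Utf16Le,
        // Covers the UTF-8 BOM (EF BB BF), plain ASCII, and empty input.
        _ => YamlEncoding::Utf8,
    }
}
```

A caller that only wants to handle UTF-8 could use this to reject or transcode other inputs before handing the data to a YAML parser.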
Oh I actually think I like this idea, I believe you’re proposing eg:
pub struct Tbd<'a> {
    pub bytes: &'a [u8],
}
then as first approximation (Maybe final!) we can direct users to parse the file using eg your crate if you publish it or, etc? On the other hand the user could just do all the same by attempting such a parse if they encounter an Unknown variant? I guess the advantage to returning the TBD(TBD) variant is just to signal to the user we didn’t encounter something expected (but still leave it up to them how to parse it further...)
That's exactly what I'm proposing.
Yes. By introducing the enum variant, goblin effectively says we know this data is related to mach-o. That could be a useful signal to end-users and a nudge that they should consider doing something with the data. It does add an API break though. I'm unsure if you are willing to do that with the minimal added benefit considering there would be no built-in support for reading the TBD content. Although we could potentially throw something useful in there, such as sniffing the YAML document versions and exposing the TBD version via the goblin API. |
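As a rough illustration of the "sniff the version" idea, the sketch below wraps the raw bytes in the `Tbd` struct discussed above and inspects the leading YAML document tag. `TbdVersion` and `sniff_version` are hypothetical names. The tag strings (`!tapi-tbd-v2`, `!tapi-tbd-v3`, `!tapi-tbd` with a `tbd-version: 4` key, and an untagged document for v1) are assumptions based on my reading of the format comment in LLVM's TextStub.cpp and should be checked against it. The sketch assumes UTF-8 input without a BOM and is not a YAML parser.

```rust
/// Hypothetical wrapper in the shape proposed in this thread: the raw
/// bytes are handed back and parsing is left to the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tbd<'a> {
    pub bytes: &'a [u8],
}

/// Hypothetical enum of TBD format versions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TbdVersion {
    V1,
    V2,
    V3,
    V4,
}

impl<'a> Tbd<'a> {
    /// Best-effort version detection from the first line of the document.
    /// Returns `None` when the data does not look like a TBD file.
    pub fn sniff_version(&self) -> Option<TbdVersion> {
        let text = std::str::from_utf8(self.bytes).ok()?;
        let mut lines = text.lines();
        // The document is assumed to start with `---` plus an optional tag.
        let tag = lines.next()?.strip_prefix("---")?.trim();
        match tag {
            "!tapi-tbd-v2" => Some(TbdVersion::V2),
            "!tapi-tbd-v3" => Some(TbdVersion::V3),
            // Assumed v4 layout: generic tag plus an explicit version key.
            "!tapi-tbd" => {
                let is_v4 = lines.any(|l| {
                    l.strip_prefix("tbd-version:")
                        .map(|v| v.trim() == "4")
                        .unwrap_or(false)
                });
                if is_v4 { Some(TbdVersion::V4) } else { None }
            }
            // Assumed v1 layout: no tag, but a top-level `archs:` key.
            "" => {
                if lines.any(|l| l.starts_with("archs:")) {
                    Some(TbdVersion::V1)
                } else {
                    None
                }
            }
            _ => None,
        }
    }
}
```

Something along these lines would let the variant carry a small amount of useful metadata without pulling a YAML dependency into goblin.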
Thank you for maintaining goblin. It is a joy being able to open binary files from any platform and analyze their contents without having to install a myriad of tools to support various binary formats.
I recently found myself wanting to parse text based stub libraries (.tbd files) from Rust and was curious if you would be receptive to including support in goblin. (I might contribute support myself.)
.tbd files are essentially descriptors of mach-o dylibs. Apple uses them in their SDKs to describe dylibs. I think the motive behind these files is that they enable linkers to do their job using a minimal representation of the dylib without having to ship full dylibs in SDKs. This helps reduce the size of the SDK.
I'm unsure if there is a canonical specification for this file format. However, there's a comprehensive inline comment in the LLVM source code at https://github.com/llvm/llvm-project/blob/main/llvm/lib/TextAPI/MachO/TextStub.cpp that defines it.
The file format is YAML. If I were to implement support for parsing these files in Rust, I'd likely define a bunch of Rust structs representing the various components and then use serde for (de)serialization from/to YAML. If we did this in goblin, we'd pick up a handful of new crate dependencies. I'm unsure if that would be desirable. Of course, we could always define a conditional crate feature to toggle support for text based stub libraries.
Given that text based stub libraries describe mach-o libraries and are used widely on Apple platforms, I can make a compelling case for their inclusion in goblin as a supported format. I was unable to find any Rust crates for parsing this file format on crates.io, so there appears to be a market need.
Are you interested in supporting text based stub libraries in goblin? If so, do you have any thoughts on YAML parsing and new crate dependencies?