Accept dex files as input #68

Open
dazzleworth opened this issue Dec 17, 2019 · 5 comments

@dazzleworth

Is this in the works? It would reduce the need for conversion tools.

@leibnitz27
Owner

Hey @dazzleworth - so in general, I strongly favour the 'unix philosophy' - do one thing, do it well, and compose with other tools.

The only real reason I'd imagine adding native dex support would be if there's some useful content inside a dex file that gets impossibly lost / corrupted when converting into a jar, such that (eg) dex2jar produced a sub-optimal jar, which led to cfr producing sub-optimal java.

@dazzleworth
Author

dazzleworth commented Dec 17, 2019

If you look at dex2jar (https://github.com/pxb1988/dex2jar), the latest commit was 3 months ago. Its competitor enjarify (https://github.com/google/enjarify) has not been touched for 3 years, which is surprising given it's a Google project. Such infrequent updates worry me about the conversion quality. Dalvik bytecode also evolves quickly, e.g. multidex support, etc.

Whereas JADX (https://github.com/skylot/jadx) and PNF's JEB have engines that work on dex code directly, and they are actively maintained.
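To make the point about format evolution concrete, here is a rough Python sketch (not taken from any tool mentioned in this thread) that reads the version from a dex file's 8-byte magic, checks the Adler-32 checksum stored in the header, and lists the `classes*.dex` entries a multidex APK carries. The version table is a partial list based on the AOSP dex format documentation; `dex_version`, `checksum_ok` and `dex_entries` are hypothetical helper names.

```python
import re
import struct
import zipfile
import zlib

# Partial map of dex format versions to the Android release that introduced
# them, following the AOSP "Dalvik Executable format" documentation.
DEX_VERSIONS = {
    "035": "baseline; readable by all releases",
    "037": "Android 7.0 (API 24)",
    "038": "Android 8.0 (API 26)",
    "039": "Android 9 (API 28)",
}

_HEADER_SIZE = 0x70  # fixed size of the dex header_item
_DEX_ENTRY = re.compile(r"classes(\d*)\.dex")


def dex_version(dex: bytes) -> str:
    """Return the three-digit version from the magic b'dex\\nNNN\\x00'."""
    if len(dex) < _HEADER_SIZE or dex[:4] != b"dex\n" or dex[7] != 0:
        raise ValueError("not a dex file")
    return dex[4:7].decode("ascii")


def checksum_ok(dex: bytes) -> bool:
    """Check the little-endian Adler-32 at offset 8, which covers bytes 12..end."""
    (stored,) = struct.unpack_from("<I", dex, 8)
    return zlib.adler32(dex[12:]) == stored


def dex_entries(apk) -> list:
    """Return an APK's top-level classes*.dex entries: classes.dex, classes2.dex, ..."""
    with zipfile.ZipFile(apk) as archive:
        names = [n for n in archive.namelist() if _DEX_ENTRY.fullmatch(n)]
    # classes.dex has no number and sorts first; classesN.dex sorts by N.
    return sorted(names, key=lambda n: int(_DEX_ENTRY.fullmatch(n).group(1) or 1))
```

A converter that wants to keep up with new Android releases has to track both of these axes: new dex versions (new instructions and structures) and apps split across several dex files.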

@dazzleworth
Author

dazzleworth commented Dec 18, 2019

At the very least, please consider using jadx.api.* as your starting point for integration with dex files. Let skylot handle the dex pre-processing for you.

@ghost

ghost commented Dec 31, 2019

> The only real reason I'd imagine adding native dex support would be if there's some useful content inside a dex file that gets impossibly lost / corrupted when converting into a jar, such that (eg) dex2jar produced a sub-optimal jar, which led to cfr producing sub-optimal java.

I have seen Android Dex classes contain meaningful SourceFile attributes, which dex2jar (for some reason) does not translate into .class SourceFile attributes (even with -d), and so useful data is lost.

JADX helpfully outputs this info as /* compiled from: ... */ comments above each class def.
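One way to check whether a converted jar kept this information is to look for the class-level `SourceFile` attribute in each `.class` file. The following is a minimal, standard-library-only Python sketch, assuming class files laid out per the JVM specification (chapter 4) with constant-pool tags up to `CONSTANT_Package`; `source_file_of` and `missing_source_files` are hypothetical helper names, not part of dex2jar, JADX, or CFR.

```python
import struct
import zipfile
from typing import Iterator, Optional

# Payload size (bytes) following each fixed-size constant-pool tag (JVMS 4.4).
_FIXED_CP_SIZES = {
    3: 4, 4: 4,                       # Integer, Float
    5: 8, 6: 8,                       # Long, Double (each takes two pool slots)
    7: 2, 8: 2, 16: 2, 19: 2, 20: 2,  # Class, String, MethodType, Module, Package
    9: 4, 10: 4, 11: 4, 12: 4,        # Fieldref, Methodref, InterfaceMethodref, NameAndType
    17: 4, 18: 4,                     # Dynamic, InvokeDynamic
    15: 3,                            # MethodHandle
}


class _Reader:
    """Big-endian cursor over the raw class-file bytes."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def u2(self) -> int:
        (value,) = struct.unpack_from(">H", self.data, self.pos)
        self.pos += 2
        return value

    def u4(self) -> int:
        (value,) = struct.unpack_from(">I", self.data, self.pos)
        self.pos += 4
        return value

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def skip(self, n: int) -> None:
        self.pos += n

    def skip_attributes(self) -> None:
        for _ in range(self.u2()):
            self.skip(2)          # attribute_name_index
            self.skip(self.u4())  # attribute_length, then the payload


def source_file_of(class_bytes: bytes) -> Optional[str]:
    """Return the class-level SourceFile value, or None if it is absent."""
    r = _Reader(class_bytes)
    if r.u4() != 0xCAFEBABE:
        raise ValueError("not a class file")
    r.skip(4)  # minor_version, major_version
    pool_count = r.u2()
    utf8 = {}  # constant-pool index -> decoded CONSTANT_Utf8 string
    index = 1
    while index < pool_count:
        tag = r.take(1)[0]
        if tag == 1:  # CONSTANT_Utf8 (modified UTF-8, decoded leniently here)
            utf8[index] = r.take(r.u2()).decode("utf-8", errors="replace")
        elif tag in _FIXED_CP_SIZES:
            r.skip(_FIXED_CP_SIZES[tag])
        else:
            raise ValueError(f"unknown constant-pool tag {tag}")
        index += 2 if tag in (5, 6) else 1
    r.skip(6)            # access_flags, this_class, super_class
    r.skip(2 * r.u2())   # interfaces
    for _ in range(2):   # fields, then methods: same member_info layout
        for _ in range(r.u2()):
            r.skip(6)    # access_flags, name_index, descriptor_index
            r.skip_attributes()
    for _ in range(r.u2()):  # class-level attributes
        name_index, length = r.u2(), r.u4()
        if utf8.get(name_index) == "SourceFile":
            return utf8.get(r.u2())  # sourcefile_index
        r.skip(length)
    return None


def missing_source_files(jar) -> Iterator[str]:
    """Yield the .class entries of a jar (path or file object) lacking SourceFile."""
    with zipfile.ZipFile(jar) as archive:
        for name in archive.namelist():
            if name.endswith(".class") and source_file_of(archive.read(name)) is None:
                yield name
```

Running `missing_source_files` over a dex2jar output jar would list the classes where the attribute was dropped; comparing against the dex side still requires a dex reader such as dexlib2.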

@Lanchon

Lanchon commented Dec 31, 2019

hi @leibnitz27 ,

since 2015 i've been building a toolset that builds upon dexlib2 (smali's dex i/o and analysis library), relies on apktool for some important functionality (loosely coupled), and to a lesser degree on dex2jar. hopefully i can provide some relevant context to this discussion...

AFAIK, no reliable dex to jar conversion tool exists.

dex2jar is abandonware. the author of dex2jar is missing in action. in 2017, tired of waiting for updates and to help users of my tools, i published versions of dex2jar fixed for android 7 and 8, while official releases had already been stuck at android 6 support for years. it was low hanging fruit anyway. some official development has happened since, but nothing very significant.

i sort of dove into dex2jar's code to revive some other dead tools from the dex2jar toolset. i made maybe around 40 or 50 commits until i gave up. dex2jar is notorious for small bugs, and IMHO it was hastily put together. no doubt the developer is smart and very prolific, but you can't help but think that the guy just sat down and spat a lot of code without regard for design, cleanness, maintainability, etc. the code is kind of a mess, and i believe at this point it is easier to start a new project than to continue dex2jar. and the things i wanted to accomplish with dex2jar's codebase i eventually coded from scratch, but on the dalvik bytecode side of the world.

enjarify has been abandonware since 2016. the author (with google) claims he made it because dex2jar was full of small issues (i certainly believed him later). he claims his codebase had almost no loose ends. unfortunately enjarify is even slower than dex2jar, and it is written in python. (i think we need tools hosted in the JVM for the JVM reversing ecosystem.) and enjarify is incomplete: it does not translate annotations; the author just didn't bother. this means its usefulness is limited in practice.

enjarify was written on company time. goog later pulled the author away from the project, and he is not interested in maintaining it on his own time. (as a personal project, he did rewrite enjarify completely in rust to learn the language, but the rust codebase is a little behind python's.) so enjarify, albeit supposedly better written than dex2jar, is dead: stuck on android 6 and unable to process annotations. it is not JVM, and the author also reinvented the wheel (i think) with his own bytecode I/O libraries (no finger pointing to CFR here :) ) so that makes maintaining it more of a chore. and no one has. here is a short conversation i had with the author about this in 2016: google/enjarify#4 (comment)

> in general, I strongly favour the 'unix philosophy' - do one thing, do it well, and compose with other tools.

abso-effing-lutely! what we need is not CFR reading dex, we need a proper and maintained dex to jar tool. if you ever consider adding dex support to CFR, please, please, work on a conversion tool instead. i don't know why you wanted to rewrite CFR in c++, rust or your choice of native language. tools for the ecosystem should be self-hosted in the JVM as much as possible. and if java feels old, you can always use your preferred niche JVM language instead. in the JVM we already have the mature and maintained ASM (or so i think; i've only used it trivially, and it has viable competitors too), so your work on your own classfile reader was not very productive (unless you thought ASM was not good enough for the job). and when you someday abandon CFR, that decision will add to the burden on future maintainers, if any.

but that's history. i'm guessing your codebase only implements classfile readers, so it makes sense to use ASM or the like for a conversion tool if you ever consider working on one. rolling your own makes it less likely that anyone will maintain your code later.

for completeness, the DEX side is not so clear cut unfortunately...

dex2jar has its own dex I/O library. as i said before, i wouldn't use that codebase; i'm not a fan of its quality. JF's dexlib2 from smali is quality code. JF is also a googler. google likes and uses smali: they want two independent dex I/O libraries written from scratch from the spec, as a means to verify one against the other. the android codebase includes smali to run tests. unfortunately (and very much so for me, since i use it heavily), smali/dexlib2 is on life support: JF only puts a little time into it here and there and doesn't seem interested in the project anymore. for the last couple of years there have been no clear indications of what is and isn't supported in the codebase. support for newer versions of android came in late, sometimes so late that google's own users of smali had to fork it and update it themselves to be able to keep testing their android codebase with it.

my best guess regarding the current status of dexlib2 is that it supports Android 9 and almost all of Android 10 except for 'hiddenapi_class_data_item', which means it can I/O the code of all Android 10 apps, but not the code of Android 10's framework. see: JesusFreke/smali#731

when dexlib2 went stagnant, i regretted basing my tools on it instead of hacking into the official 'dx' code, which i thought at the time would be maintained forever. but it was not, and now it has been replaced by r8/d8.

the r8/d8 codebase is definitely a strong contender as a basis for a dex I/O library. it is the current official state of the art, it's new, and hopefully it will be maintained for years. d8 is dx rewritten from scratch for performance, to support the new java features, and to integrate proguard's functionality. it provides separate predexing and linking stages (for incremental compilation), so it can read as well as write dex bytecode. r8/d8 is supposedly well written, and by the well-known author of the first ever compiling javascript engine: v8.

ASMDEX was abandoned a long time ago. also, supposedly jadx and other tools have their own readers (which i never looked into). i've been bitten too many times by abandoned code to care.

one more thing: i'm happy with dexlib's quality as i said, but it doesn't seem to be written with performance in mind the way r8/d8 is. but the whole tooling situation is a bit desperate, and i couldn't care less for performance when what we need are dependable tools.

so those are my $0.02.
