Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup Bech32 logic, isolate or make away with bech32 dependency #3195

Open
optout21 opened this issue Jul 19, 2024 · 10 comments
Open

Cleanup Bech32 logic, isolate or make away with bech32 dependency #3195

optout21 opened this issue Jul 19, 2024 · 10 comments

Comments

@optout21
Copy link
Contributor

optout21 commented Jul 19, 2024

Bech32 logic is used for Lightning invoice decoding & encoding, and the bech32 crate is used for that.
This analysis is triggered by the upgrade from bech32 v0.9.1 to bech32 v0.11.0, during which the library API has changed significantly.
The upgrade is being handled by PR #3181 ( #3176 ) .

Here is an analysis of needed/used functionalities, and recommendations:

  • Type for 5-bit numbers

    • used to be u5 type, a wrapped byte, but that's replaced by Fe32, which is roughly similar, but has different focus (field element arithmetic)
    • Fe32 has some minor shortcomings (no Ord ordering, no default)
    • Recommendation is to create own type for this. It is a simple u8 wrapper with a few conversion methods.
    • As this type is used in many places, rolling own implementation minimizes the dependency points to an external crate.
  • Bech32 character encoding.

    • This can be used from Fe32
    • Recommendation is to duplicate logic, as it's rather simple (2 lookup tables)
  • Encoding/decoding packed 5 bit values to bytes

    • this is an important operation, can be used from bech32 crate (Fe32IterExt, ByteIterExt).
    • it's also possible to duplicate logic, either way seems fine
  • Checksum verification and creation

    • This can be used from bech32 create
    • As this has some hashing operations, recommendation is to use from external (implementation can be taken over later; lightning needs only one of the hash method)
  • Parsing of HRP, data part and Checksum

    • Can be used from external create, or taken over, minimal logic.
  • Parsing of tagged values

    • This is own implementation, as bech32 has no lightning-specific logic (and probably will not have later, not in scope).

Conclusion:

  • Best seems to be to re-implement some logic (u5 type, packed decoding, etc.), while reusing some logic from the external crate,
    but in a reduced and isolated way (checksum handling). Getting rid of the external dependency entirely is also a possibility.
  • Own implementation can have the API finetuned for the lightning use case, and reduce the clutter due to type and error conversions.

Options for location of bech32 logic:

  • A mod within lightning crate (e.g. bech32.rs)
  • A separate create within the repo (e.g. bech32-lightning)
  • The lightning-invoice crate is NOT a good option, as lightning crate cannot use it, and currently there is some need there (invoice encoding for signing).
  • Later it could be isolated into a new repo, potentially usable by other Rust lightning implementations
@tcharding
Copy link
Contributor

I had a bit of a play with both the PRs aimed at closing this issue, unless I'm missing something I don't see any reason to keep your custom u5 type. Of course you could elect to not use bech32 at all but I think that would be seen by us (the authors of bech32 as a failure by us). It looks to me like you should be able to do everything you want with the new crate after rust-bitcoin/rust-bech32#189 and rust-bitcoin/rust-bech32#190 are merged and released.

Admittedly the iterator api code is a bit gnarly to read, I'm happy to help if you get stuck.

@apoelstra
Copy link

apoelstra commented Jul 31, 2024

We can add these if they make derives more convenient (I think Default makes sense, at least as much as it makes sense for every std numeric type, though Ord does not, hence us using ArbitraryOrd). But I'm curious why you need an ordering for elements of an unordered field?

For that matter I'm curious where you're directly using field elements at all.

@apoelstra
Copy link

Reading through the codebase it looks like you have a ton of custom checksummed-data-parsing code that should be able to be removed entirely and replaced with calls into the new API, and you shouldn't ever need to deal with FEs.

@apoelstra
Copy link

Ah, ok, because LN invoices have fields which are 5-bit aligned you probably do want to work with Fes, at least during parsing and serialization. But I don't think you ever need to store them at rest.

@apoelstra
Copy link

apoelstra commented Aug 5, 2024

@optout21 so, after having worked on this for a few days I've nearly converted the serialization logic to the new bech32 library. I think this issue is really nontrivial. Essentially, the LN invoice format is a collection of fields which are ad-hoc aligned on 5-bit and 8-bit boundaries, and in the old bech32 API (which provided no assistance with conversions) you were expected to just "produce a slice of u5s somehow".

This results in the following issues:

  • multiple ad-hoc implementations of u8->u5 conversion (the encode_int_be_base32/try_stretch stuff in lightning-invoice/src/ser.rs, construct_invoice_preimage in lightning/src/util/invoice.rs, the write_base32 stuff in lightning/src/ln/features.rs, and probably others that I haven't found yet
  • the use of Vecs everywhere resulting in temporary and unnecessary allocations; it's quite difficult to eliminate these with the old API
  • the bech32 ToBase32 and Base32Len traits, which are weird implementation details, pollute the public ldk API in several places. There are comments saying "these are not exported" which I guess need to be obeyed manually.
  • Vec<u5> appearing in the API as a public field of structs, even though this is conceptually a type which only has a right to exist ephemerally during encoding/decoding
  • Multiple signing-related functions that take &[u5], which is expected to be a base32-encoded version of a raw data part, which is then decoded from base32 back into bytes, after artificially adding some padding to make sure that this doesn't fail. This pollutes the API in many places, the whole path is full of unnecessary allocations and inefficiencies, and the logic is hard to follow because of all the conversions (this code all dates to 2018 AFAICT when the invoice stuff was a one-man project, and it has just been moved around since then).

So if you naively try to translate the old logic to the new one by just replacing types, you'll basically take an existing huge mess and kinda smear it around. What you need to do is (a) experiment a bit with the new iterator-based API (which probably needs much more extensive doccomments; I hadn't considered something as elaborate as the LN invoice format when I wrote that), and (b) identify all the existing places where bech32 logic is re-implemented inline in rust-lighting, and (c) rewrite all the serialization logic.

As I said, I'm nearly finish this on the serialization side. I haven't looked at deserialization yet. Will try to open a PR today or tomorrow, though I'm a bit behind on other stuff since the recent rust-bitcoin summit.

@optout21
Copy link
Contributor Author

optout21 commented Aug 7, 2024

But I'm curious why you need an ordering for elements of an unordered field?

Ordering would have been needed for the tags in a bech32-encoded lightning invoice, each tag has a 32-bit tag value (usually denoted by their Bech32 char representation), and they are kept in a map, they are ordered, etc.

BTW, here the 32-bit elements are just that: 32-bit numbers, the field properties are not used.

@optout21
Copy link
Contributor Author

optout21 commented Aug 7, 2024

So if you naively try to translate the old logic to the new one by just replacing types, you'll basically take an existing huge mess and kinda smear it around.

Thanks for looking into this, @apoelstra , you're right.
I've started looking into this as a simple library dependency upgrade, just replacing types, but it turned out to be more... :D
After digging deeper, I've changed course, towards minimizing/eliminating rust-bech32 usage. The focus of the library and the needs in rust-lightning are a bit different (e.g. no need for field properties, but need for lightning invoice parsing, which is not in the scope of the rust-bitcoin/rust-bech32 projects).
As I see that now there is no open issue/PR on rust-bech32 from me.

@apoelstra
Copy link

lightning-invoices have a bech32 checksum on them. This requires field properties and is exactly what rust-bech32 is designed for. As far as I can tell, every other use of u5s in the library is an API mistake that comes from storing and manipulating partially-parsed data.

@optout21
Copy link
Contributor Author

optout21 commented Aug 9, 2024

Update: Default support has been added (rust-bitcoin/rust-bech32#184), and Ord can be worked-around (needed only by UnknownSemantics).

@apoelstra
Copy link

FYI I have a branch where I've been working on this https://github.com/apoelstra/rust-lightning/tree/2024-08--new-bech32

Though it is a bit of a mess; it needs #3234

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants