Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: do not parse base62 strings of unexpected length #8

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

nilium
Copy link
Contributor

@nilium nilium commented Feb 23, 2024

Motivation

I noticed that from_base62 would accept inputs of arbitrary length and decode them from base62 instead of validating length ahead of time. Further, if a KSUID decoded to an excessively long string of bytes, it would quietly accept this and only copy the remaining 20 bytes of data. As a result, this meant that rust-ksuid would accept strings of invalid length instead of rejecting them like the Segment KSUID implementation.

I'm unclear if this is a behavior that rust-ksuid users depend on, so maybe should have a default-on strict crate feature (or, preferably, nonstrict that's off by default) so users have to opt into parsing invalid KSUIDs. If that's the right path forward, users should probably also be made aware that rust-ksuid might allocate much more memory than is needed for decoding a KSUID. I have a feeling this isn't a great way to OOM code that uses the crate, but sending enough unusually long KSUIDs might pose a risk to those applications from either memory allocation or consuming time decoding long base62 strings.

Solution

Before decoding any base62 string, check that the string's input length is 27 bytes and add tests to verify long and short strings are still invalid.

Reject parsing base62 strings that would result in unexpectedly
long payloads and potentially large allocations. Without this, it
is possible to use a base62 string to pass in extra bytes of data
that are unused (but do consume memory for a time). I don't know
if this is something people rely on, but it seems unlikely that
anyone would find this desirable. This might represent a possible
way to OOM any code accepting KSUIDs that didn't limit incoming
payload sizes already, but there are probably better options for that.

Add a test for both long and short base62 strings to try to skip
allocating anything for strings of the wrong length by decoding
them. Previously, length checks only happened after decoding and
would only take the tail end of the decoded base62 string.
Copy link

@svix-james svix-james left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR -- looks good to me!

I don't think there's any need to maintain compatibility for invalid lengths, but will defer to others. @tasn any thoughts there?

@svix-jplatte
Copy link
Member

Hm, right below after decoding the base62 strings, we have some logic for ignoring extra bytes though. If we're going to error for over-long IDs anyways, we should at least remove that code. But it being there suggests that there is a compatibility concern there.

@nilium
Copy link
Contributor Author

nilium commented Feb 28, 2024

I think the two later checks have different purposes now and can't really be removed, but for different reasons than they might have originally had:

  1. Handling excess bytes is necessary because base_encode will sometimes produce vectors over 20 bytes in length (there are a few in the test cases JSON file that have 22 bytes; I assume this is just a side effect of 160 not mapping evenly to base62, so some extra padding bits will be left over). So, the function still has to get it back down to 20 bytes for use as a KSUID. Right now this is done with a copy, it could also be done with try_into after slicing, but I have a feeling there isn't much difference in how it gets there.

  2. When checking for fewer bytes (the else when checking the loaded vec's length), base_encode could in case of a bug (none exist that I know of, this is just about handling the case safely) return a shorter vec than can be handled, so at minimum it's needed as a bounds check for that. It should be incredibly unlikely but it's better than panicking.

Copy link
Member

@svix-jplatte svix-jplatte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation sounds reasonable to me, but I'm don't have that much experience in this area so will leave merging up to @svix-james.

@tasn tasn merged commit f71ac1d into svix:main Feb 29, 2024
3 checks passed
@nilium nilium deleted the check-input-base62-length branch February 29, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants