[Feature request] Secure against length extension attacks #108

Open
chris-ha458 opened this issue Jul 19, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

@chris-ha458 commented Jul 19, 2023

Is your feature request related to a problem? Please describe.
Per-folder SHA-256 hashes are potentially vulnerable to length extension attacks.

Describe the solution you'd like

  1. Change the hash function to one that is resistant to length extension attacks.
    SHA-384 and SHA-512/256 are candidates; however, the latter is awkward to canonicalize because of the slash (it is part of its official name). If speed or size is a concern, BLAKE3 is extremely fast and secure as well.
  2. Add the file size in bytes when writing the hash and filename.
    This works and can be achieved with a minimal code change.
  3. Both of the above. It would be nice to have file sizes alongside the checksum for verification purposes, and even if a length extension attack were later found against the new hash, the scheme would remain secure. (A sketch of the combined approach follows this list.)
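
As a rough illustration of options 1 and 2 combined, here is a minimal sketch assuming the sha2 crate; the helper name, path handling, and the "size digest filename" line layout are only illustrative, not the pipeline's current output:

use sha2::{Digest, Sha384};
use std::{fs, io};

// Hypothetical helper: hash one file with SHA-384 and return a checksum line
// that also records the file size in bytes.
fn checksum_line(path: &str) -> io::Result<String> {
    let mut file = fs::File::open(path)?;
    let mut hasher = Sha384::new();
    // Stream the file through the hasher (Sha384 implements io::Write).
    let size = io::copy(&mut file, &mut hasher)?;
    let digest = hasher.finalize();
    Ok(format!("{} {:x}  {}", size, digest, path))
}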

Describe alternatives you've considered
DO NOT

  1. Use the xxHash from the zstd file.
    I considered this possibility since it is already present, but xxHash64 as used in zstd is a very fast hash function with minimal security guarantees. If we assume that somebody is manipulating the json.zst file with enough compute to actually mount a length extension attack, it would not provide any further security.

Additional context
I consider OSCAR an important part of the data-pipeline supply chain, so this is a bar I hope OSCAR can clear.
I am willing to further investigate potential hash functions and implementations, and to provide PRs if necessary.

Reference : oscar-project/documentation#13

chris-ha458 added the enhancement (New feature or request) label Jul 19, 2023
@chris-ha458 (Author)

Some more info that I've found

  1. SHA-384, SHA-512, and BLAKE2b are all available through GNU coreutils (sha2 utilities: https://www.gnu.org/software/coreutils/manual/html_node/sha2-utilities.html) and b2sum.
    GNU coreutils is widely available on practically every Linux distribution in common use, so staying compatible with these programs would be useful.
  2. Neither SHA-512/256 nor BLAKE3 is available there.
  3. Adding file sizes to the produced checksum file would make it incompatible with verification via sha256sum -c or sha384sum -c; as such, adding them would require further consideration of downstream uses. (The format those tools expect is illustrated below this list.)
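
For reference, the coreutils verifiers expect lines of the form their own tools emit: the hex digest, a two-character separator, then the filename. A hypothetical example of a line that sha384sum -c would accept (placeholders, not real output):

<96-character sha384 hex digest>  <relative path to file>

An extra size column would not be understood by these stock tools, so sizes would have to live in a separate file or a different format.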

From these findings, I will prepare a minimum viable PR that drop-in replaces SHA-256 with SHA-384 and updates all relevant documentation.
Previous datasets would not be affected: end users can continue to use sha256sum to verify them, while future users can use sha384sum to verify theirs. (A sketch of what the swap looks like in code follows.)

An option to select among hashes, with a robust method to support it, would be optimal, but that is beyond the scope of a "simple" drop-in patch.
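
To show why the change can be a drop-in one, here is a minimal sketch assuming the checksum code uses the sha2 crate through the generic Digest trait; the function name and call sites are illustrative, not the actual ungoliant code:

use sha2::{Digest, Sha256, Sha384};
use std::{fs, io};

// Generic over Digest, so Sha256 and Sha384 are interchangeable.
fn hex_digest<D: Digest + io::Write>(path: &str) -> io::Result<String> {
    let mut file = fs::File::open(path)?;
    let mut hasher = D::new();
    io::copy(&mut file, &mut hasher)?;
    // Hex-encode the digest byte by byte to stay generic over the output size.
    Ok(hasher.finalize().iter().map(|b| format!("{:02x}", b)).collect())
}

// Before: hex_digest::<Sha256>("shard.jsonl.zst")
// After:  hex_digest::<Sha384>("shard.jsonl.zst")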

chris-ha458 added a commit to chris-ha458/ungoliant that referenced this issue Jul 21, 2023
Changes Sha256 usage into Sha384. Addresses oscar-project#108
@chris-ha458 (Author)

So I prepared a PR to change SHA-256 into SHA-384.

I have been wrong on some points, though.
Length extension attacks only matter when the hash is used in a secret-prefix, MAC-style construction (the kind of misuse HMAC exists to prevent); since SHA-256 is only used here for plain file checksums, that attack context does not actually apply.

There is still value in changing to SHA-384, since it could serve as the baseline if a keyed construction such as HMAC is ever implemented.
Also, SHA-384 is noticeably faster than SHA-256 on 64-bit hardware, because the underlying SHA-512 compression function is more amenable to 64-bit implementations.

1 MiB input
Timing sha256sum... 10_000 iterations
real    0m23.134s
user    0m22.315s
sys     0m1.060s

Timing sha384sum... 10_000 iterations
real    0m16.110s
user    0m15.372s
sys     0m1.031s
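
The same comparison can be reproduced in-process; a minimal sketch assuming the sha2 crate (buffer size and iteration count mirror the CLI timing above, and the numbers will of course vary by machine):

use sha2::{Digest, Sha256, Sha384};
use std::time::Instant;

fn time_hash<D: Digest>(label: &str, data: &[u8], iterations: u32) {
    let start = Instant::now();
    for _ in 0..iterations {
        // Hash the whole buffer once per iteration.
        let _ = D::digest(data);
    }
    println!("{label}: {:?}", start.elapsed());
}

fn main() {
    let data = vec![0u8; 1 << 20]; // 1 MiB of zeroes
    time_hash::<Sha256>("sha256", &data, 10_000);
    time_hash::<Sha384>("sha384", &data, 10_000);
}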

@Uinelj (Member) commented Jul 25, 2023

Thanks a lot for the PR!

We are (slowly) removing code that is not immediately linked to the pipeline from ungoliant, putting it in oscar-tools, but we still have duplicates here and there. The PR should be made on oscar-tools rather than on the main project, but IIRC the code is very similar so it shouldn't be hard to do :)

Code is here: https://github.com/oscar-project/oscar-tools/blob/main/src/ops/checksum.rs

@chris-ha458 (Author)

It's very simple code and the main discussion point is whether to integrate it at all, rather than how.
Since it seems there is interest in integrating such code, I will prepare a new PR under oscar-tools.

chris-ha458 added a commit to chris-ha458/oscar-tools that referenced this issue Jul 25, 2023