Determine any cross-implementation API requirements #4
I propose that we use the following common KEM API (following what's currently in the ML-KEM reference implementation):

```c
int pqcp_mlkem512_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins);
int pqcp_mlkem512_ref_keypair(uint8_t *pk, uint8_t *sk);
int pqcp_mlkem512_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins);
int pqcp_mlkem512_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk);
int pqcp_mlkem512_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk);
```

The namespacing can be dropped for languages where it's done differently. (I'm assuming here that NIST is not going to change their mind about having those de-randomized APIs. I personally don't think there is a technical need for these, because replacing the […].)

This API is already being used by https://github.com/pq-code-package/mlkem-c-embedded and https://github.com/pq-code-package/mlkem-c-aarch64, as those are based on the reference implementations.

To save code size, it may be useful for some target platforms to have dynamic parameter selection:

```c
enum {
    MLKEM512 = 0,
    MLKEM768 = 1,
    MLKEM1024 = 2
};

int pqcp_mlkem_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_keypair(uint8_t *pk, uint8_t *sk, int param);
int pqcp_mlkem_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk, int param);
int pqcp_mlkem_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk, int param);
```

---
I agree with Matthias. In addition to that, I propose using extended names for the function arguments. I think it improves usability and reduces the chance of misuse.

Regarding the corresponding header files, here is what is currently in use for libjade: api.h. It is similar to other projects as well. Regarding namespacing and code organization: should we include, for instance, […]? Currently, in libjade, the name of each exposed function is the full implementation path (an example here). In the context of […]. Regarding derandomized APIs, they have the advantage of greatly simplifying known answer tests: […]

---
I'm a newbie to ML-KEM, so forgive any nonsensical input. Staying close to the standard/reference implementation makes sense.

NIST uses `_internal` for the derand functions, since they are geared to testing rather than consumer use. This seems to make the scope/usage clearer. In the proposal here they are 'first class'. Is this because we expect that in practice those will be used by consumers? Is this because we'd seen the need to do this in NSS?

The current functions are prefixed with […]. I can see this was done for […].

Given this: we will have a number of 'ref' implementations varying (I assume) by the nature of their optimizations (and in pqcp we have embedded and aarch64). How much of this should be exposed at the API? Any change between implementations will require code changes from the consumer, so they are no longer 'drop-in' replacements. I presume the point of the prefix proposed here is that it remains the same for all of the implementations within pqcp that are implemented in a particular language, therefore allowing swapping between them. (Accepting that with other languages such as Go, Java, etc., the namespacing at function level is not needed due to package definitions.)

As a follow-up to the point above about what explicit instruction support is needed for a particular implementation: is there a need to be able to query this at runtime, i.e. to expose a discovery API that, amongst other things, would report some specifics about the implementation actually being called?

---
I've proposed an update on this for the TSC this week.

---
I think this is worth discussing in a TSC meeting (maybe #82).

---
Agreed in the TSC meeting 2024-07-18 that we'd open up a new issue to discuss each proposal and gather feedback → decision.

---
I'm afraid we will have to revisit this in the next TSC meeting.

My understanding is that it is acceptable to only verify this once and then use the keys many times. I recently had a chat with @cryptojedi, and one proposal is a 9-function API consisting of the usual 5 functions (keygen, keygen_derand, encaps, encaps_derand, decaps) plus 4 extra functions for serialization and deserialization of the public and secret key. It could look something like this (but function names are to be discussed):

```c
int crypto_kem_serialize_sk(uint8_t sks[MLKEM_SECRETKEYBYTES],
                            const mlkem_secret_key *sk);
int crypto_kem_deserialize_sk(mlkem_secret_key *sk,
                              const uint8_t sks[MLKEM_SECRETKEYBYTES]);
int crypto_kem_serialize_pk(uint8_t pks[MLKEM_PUBLICKEYBYTES],
                            const mlkem_public_key *pk);
int crypto_kem_deserialize_pk(mlkem_public_key *pk,
                              const uint8_t pks[MLKEM_PUBLICKEYBYTES]);
int crypto_kem_keypair_derand(mlkem_public_key *pk, mlkem_secret_key *sk,
                              const uint8_t *coins);
int crypto_kem_keypair(mlkem_public_key *pk, mlkem_secret_key *sk);
int crypto_kem_enc_derand(uint8_t ct[MLKEM_CIPHERTEXTBYTES],
                          uint8_t ss[MLKEM_SSBYTES], const mlkem_public_key *pk,
                          const uint8_t *coins);
int crypto_kem_enc(uint8_t ct[MLKEM_CIPHERTEXTBYTES], uint8_t ss[MLKEM_SSBYTES],
                   const mlkem_public_key *pk);
int crypto_kem_dec(uint8_t ss[MLKEM_SSBYTES],
                   const uint8_t ct[MLKEM_CIPHERTEXTBYTES],
                   const mlkem_secret_key *sk);
```

The […]. If such an API were to be adopted, the input validation would only have to happen once, in the deserialization, and one can be sure that the keys passed to encaps/decaps have been validated. I have implemented a draft of the above API here, but I do not want to change it until we have reached consensus on an API - we should also coordinate this with liboqs. There are some major downsides to these changes: […]

---
It may make sense to send this to the pqc-forum after we have reached consensus, to ask for feedback from NIST and the community.

---
The new API above may be easier to sell as a performance improvement rather than a way to enforce input validation. I'll try to get some numbers before the meeting tonight. Beyond what was written above, there are three things that need to be discussed: […]

---
To add a bit of context: this 7-function API (modulo naming of functions) is already in use by BoringSSL [1], by Go [2], and by Zig [3]. I had some discussions with Gorjan Alagic and Douglas Stebila in the hallway sessions of Crypto about how libraries are supposed to enforce input validation without having to do it all the time. My impression was that those discussions converged to this 7-function API. As a summary of the advantages and disadvantages, what I remember is the following:

Advantages of the 7-function API: […]

Disadvantages: […]

[1] https://github.com/google/boringssl/blob/b7f5443cfc1298d77dfb9e6f2eea68035de521a4/include/openssl/experimental/kyber.h

---
I've done some benchmarking for our ML-KEM implementations. You can see the full results for all parameter sets for various AArch64 and x86_64 platforms here: https://github.com/pq-code-package/mlkem-c-aarch64/actions/runs/11720085631?pr=363

Let's, for example, consider Graviton 4 performance. The new API (with sk being a 64-byte seed) gives you the following results for ML-KEM-768: […]

whereas the old API (w/ public key check and secret key check) was this: […]

NB: If you need the pairwise-consistency (PCA) check in key generation, that is going to be much faster with the new API. You basically go from 33139 + 38645 + 49292 to 31750 + serialization + 13677 + 23713. That should be around 2x faster for keygen w/ PCA.

---
@cryptojedi wrote:

> "Can actually enforce input validation, without having to do it for every encapsulation and decapsulation"

The common use of ML-KEM is ephemeral, so what is really saved here? At best, one would not do a check at decapsulation -- but it is not clear to me that this would actually be FIPS compliant, since the internal secret key, in this case, has not come from the […].

> "Efficiency improves if the matrix is fully expanded in the internal representation of the secret key"

Agree.

> "Easily allow more tradeoffs between secret-key size and speed (potentially interesting for embedded devices)"

Agree.

> "3-function API can easily be implemented on top, but performance will be much worse, in particular if the serialized secret key is just a 64-byte seed"

The performance is only worse if the serialized SK is the seed? If it's the standard SK format (which is what PQCP currently uses), then there is no overhead? The redundant serialize+deserialize step amounts to the present inefficiency of ML-KEM requiring to re-expand the A-matrix, but not more.

> "7-function API is further away from the abstract cryptographic object of a KEM ("a KEM is a three-tuple of algorithms...")"

Agree.

===

I support the change from a technical standpoint, so long as the default SK format stays the standard one, not the seed one. My main concern is with FIPS validation. I would like to have a precise public statement from NIST on the following points: […]

---
Regarding seed vs. standard: there are currently ongoing discussions in the IETF that seem to heavily push for the seed format.

---
I very much agree on requesting clearer and public statements from NIST regarding input validation and FIPS certification!

---
> "Regarding seed vs. standard: There is currently ongoing discussions in the IETF that seem to heavily push for the seed format."

@cryptojedi Could you share some pointers?

---
To me, using just a seed makes sense mostly in two scenarios: […]

My impression is that the discussion about secret-key format gained traction due to all kinds of binding notions for KEMs introduced in https://eprint.iacr.org/2023/1933 and the observation by Sophie Schmieg that, with the expanded secret-key format, ML-KEM is not MAL-BIND-K-CT.

---
I think it makes sense to discuss switching the API and switching the serialized sk representation separately, as they seem orthogonal. To ease that, I additionally implemented the new API with the traditional sk format here, with benchmarks here: […]

Compare this to the current performance with the 3-function API (w/ sk and pk validation): […]

Unsurprisingly, we see that now the new […].

Regarding the secret-key seed, I do see that this can be advantageous in some scenarios. In most scenarios, however, it's adding no clear (?) benefit while potentially costing a lot of performance. So maybe we can leave this up to the PQCP sub-projects, depending on their scenarios and consumers?

---
In the TSC meeting on 11/07, we seemed to agree that it is a good idea to ask NIST if the proposed 7-function API is compliant with FIPS 203 and a valid way to enforce input validation. Most importantly, we want to ask if it can pass FIPS validation. If they say it is fine, ideally we would implement it across all PQCP projects and beyond. @franziskuskiefer @mbbarbosa @tfaoliveira @dstebila @jschanck: it would be great to hear your opinion.

One open question was whether a function producing a public key from a secret key should be added to the API as well.

Switching the secret key to a 64-byte seed was more controversial, as it does cost performance in many scenarios. For this change it is also clearer that it is allowed, as FIPS 203 explicitly states it. I'd like to defer that discussion for now, as it is somewhat independent of the API discussion and easier to change at a later point.

---
This approach makes sense to me.

---
Sounds like a good plan.

How would the seed be different from the coins in […]?

---
@dstebila I think the 64-byte seed is indeed exactly the coins in […].

---
@dstebila - yes, it is the coins from […].

---
In liboqs there is the proposal to support seed-only private keys as an additional algorithm variant for ML-KEM. If there are performance tradeoffs, I think it would be useful to support both seeds and expanded private keys in PQCP. |