Determine any cross-implementation API requirements #4

Open
dstebila opened this issue Feb 3, 2024 · 23 comments
@dstebila
Contributor

dstebila commented Feb 3, 2024

No description provided.

@mkannwischer
Contributor

I propose that we use the following common KEM API (following what's currently in the ML-KEM reference implementation):

int pqcp_mlkem512_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins);
int pqcp_mlkem512_ref_keypair(uint8_t *pk, uint8_t *sk);
int pqcp_mlkem512_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins);
int pqcp_mlkem512_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk);
int pqcp_mlkem512_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk);

The namespacing can be dropped for languages where it's done differently.
mlkem512 is to be replaced by the corresponding parameter set and ref can be replaced by an identifier of the implementation. This follows what NIST wants to see: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/Mf2kemwwreY/m/KArjoIhxAQAJ.
The keypair and enc wrappers basically just fill the coins with randombytes() (64 bytes for keypair, 32 bytes for enc) and then call the derandomized versions.
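For illustration, a minimal sketch of what such a wrapper could look like, assuming a platform-provided randombytes() function that fills a buffer with fresh randomness (its exact signature varies between projects):

#include <stddef.h>
#include <stdint.h>

int pqcp_mlkem512_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins);
int pqcp_mlkem512_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins);
void randombytes(uint8_t *out, size_t outlen); /* platform RNG; assumed for this sketch */

int pqcp_mlkem512_ref_keypair(uint8_t *pk, uint8_t *sk)
{
    uint8_t coins[64];                  /* keypair consumes 64 bytes of randomness */
    randombytes(coins, sizeof(coins));
    return pqcp_mlkem512_ref_keypair_derand(pk, sk, coins);
}

int pqcp_mlkem512_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk)
{
    uint8_t coins[32];                  /* enc consumes 32 bytes of randomness */
    randombytes(coins, sizeof(coins));
    return pqcp_mlkem512_ref_enc_derand(ct, ss, pk, coins);
}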

(I'm assuming here that NIST is not going to change their mind about having those de-randomized APIs. I personally don't think there is a technical need for them, because replacing the randombytes() implementation with a deterministic one for testing works just as well, as Daniel J. Bernstein argues in the thread. We've had good experience with that approach in PQClean and also in pqm4.)

This API is already being used by https://github.com/pq-code-package/mlkem-c-embedded and https://github.com/pq-code-package/mlkem-c-aarch64 as those are based on the reference implementations.

To save code size, it may be useful on some target platforms to have dynamic parameter selection.
In that case, I would additionally allow an API with an extra parameter selecting the parameter set. Something like this would work:

enum {
    MLKEM512  = 0,
    MLKEM768  = 1,
    MLKEM1024 = 2
};

int pqcp_mlkem_ref_keypair_derand(uint8_t *pk, uint8_t *sk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_keypair(uint8_t *pk, uint8_t *sk, int param);
int pqcp_mlkem_ref_enc_derand(uint8_t *ct, uint8_t *ss, const uint8_t *pk, const uint8_t *coins, int param);
int pqcp_mlkem_ref_enc(uint8_t *ct, uint8_t *ss, const uint8_t *pk, int param);
int pqcp_mlkem_ref_dec(uint8_t *ss, const uint8_t *ct, const uint8_t *sk, int param);
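For illustration, a minimal sketch of how a caller could use this runtime-selectable variant; the MAX size macros are assumptions for this sketch (sized for ML-KEM-1024, the largest parameter set), not part of the proposal:

#include <stdint.h>

#define MLKEM_MAX_PUBLICKEYBYTES  1568   /* ML-KEM-1024 sizes */
#define MLKEM_MAX_SECRETKEYBYTES  3168
#define MLKEM_MAX_CIPHERTEXTBYTES 1568
#define MLKEM_MAX_SSBYTES         32

int example_roundtrip(void)
{
    uint8_t pk[MLKEM_MAX_PUBLICKEYBYTES];
    uint8_t sk[MLKEM_MAX_SECRETKEYBYTES];
    uint8_t ct[MLKEM_MAX_CIPHERTEXTBYTES];
    uint8_t ss_a[MLKEM_MAX_SSBYTES], ss_b[MLKEM_MAX_SSBYTES];

    /* The same entry points serve all parameter sets; only the selector changes. */
    if (pqcp_mlkem_ref_keypair(pk, sk, MLKEM768))   return -1;
    if (pqcp_mlkem_ref_enc(ct, ss_a, pk, MLKEM768)) return -1;
    if (pqcp_mlkem_ref_dec(ss_b, ct, sk, MLKEM768)) return -1;
    return 0; /* ss_a and ss_b now hold the same shared secret */
}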

@tfaoliveira

I agree with Matthias. In addition to that, I propose to use extended names for the function arguments. I think it improves usability and reduces the chance of misuse.

int pqcp_mlkem768_keypair_derand(
  uint8_t *public_key,
  uint8_t *secret_key,
  const uint8_t *keypair_coins
);

int pqcp_mlkem768_keypair(
  uint8_t *public_key,
  uint8_t *secret_key
);

int pqcp_mlkem768_enc_derand(
  uint8_t *ciphertext,
  uint8_t *shared_secret,
  const uint8_t *public_key,
  const uint8_t *enc_coins
);

int pqcp_mlkem768_enc(
  uint8_t *ciphertext,
  uint8_t *shared_secret,
  const uint8_t *public_key
);

int pqcp_mlkem768_dec(
  uint8_t *shared_secret,
  const uint8_t *ciphertext,
  const uint8_t *secret_key
);

Regarding the corresponding header files, here is what is currently in use for libjade: api.h

It is similar to other projects as well.
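For illustration only, a minimal api.h along those lines might look as follows. The macro names are placeholders for this sketch; the sizes are the ML-KEM-768 values from FIPS 203:

#ifndef PQCP_MLKEM768_API_H
#define PQCP_MLKEM768_API_H

#include <stdint.h>

#define PQCP_MLKEM768_PUBLICKEYBYTES   1184
#define PQCP_MLKEM768_SECRETKEYBYTES   2400
#define PQCP_MLKEM768_CIPHERTEXTBYTES  1088
#define PQCP_MLKEM768_BYTES            32   /* shared-secret size */
#define PQCP_MLKEM768_KEYPAIRCOINBYTES 64
#define PQCP_MLKEM768_ENCCOINBYTES     32

int pqcp_mlkem768_keypair(uint8_t *public_key, uint8_t *secret_key);
int pqcp_mlkem768_enc(uint8_t *ciphertext, uint8_t *shared_secret,
                      const uint8_t *public_key);
int pqcp_mlkem768_dec(uint8_t *shared_secret, const uint8_t *ciphertext,
                      const uint8_t *secret_key);

#endif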


Regarding namespacing and code organization: should we include, for instance, ref, avx2, amd64/x86_64/x64, or armv7m in the function names to identify the implementation in more detail and avoid misuse?

Currently, in libjade, the name of each exposed function is the full implementation path — an example here.

In the context of amd64, with which I'm most familiar, I think it is important that each implementation is distributed with an extra file stating which CPU extensions the implementation requires (where applicable: Jasmin/assembly implementations). For example, a Keccak permutation using the andn instruction (defined in the BMI1 extension) can be significantly faster (up to 20%, IIRC). I think some CPUs with support for AVX2 do not have BMI1. In any case, it might be excessive to include the full set of CPU extensions that an implementation requires in the corresponding function names (note: the current mlkem768 requires BMI1, as it uses the non-AVX2 Keccak permutation as well).


Regarding derandomized APIs, they have the advantage of greatly simplifying known-answer tests.
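For illustration, a minimal KAT sketch against the ML-KEM-768 functions proposed above; the hard-coded sizes are the ML-KEM-768 values, and the expected byte strings would come from a KAT file:

#include <stdint.h>
#include <string.h>

int kat_mlkem768(const uint8_t keypair_coins[64], const uint8_t enc_coins[32],
                 const uint8_t expected_ct[1088], const uint8_t expected_ss[32])
{
    uint8_t pk[1184], sk[2400], ct[1088], ss[32];

    /* Fixed coins go straight into the derandomized calls, so no
     * randombytes() stub is needed for testing. */
    pqcp_mlkem768_keypair_derand(pk, sk, keypair_coins);
    pqcp_mlkem768_enc_derand(ct, ss, pk, enc_coins);

    if (memcmp(ct, expected_ct, sizeof(ct)) != 0) return -1;
    if (memcmp(ss, expected_ss, sizeof(ss)) != 0) return -1;
    return 0;
}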

@planetf1
Contributor

I'm a newbie to ML-KEM, so forgive any nonsensical input.

Staying close to the standard/reference implementation makes sense.

  • internal vs derand

NIST uses _internal for the derand functions since they are geared to testing rather than consumer use. This seems to make the scope/usage clearer.

In the proposal here they are 'first class'. Is this because we expect that in practice those will be used by consumers? Is this because we've seen the need to do this in NSS?

  • which implementation/namespace

The current functions are prefixed with
pqcp_mlkem512_ref_

I can see this was done for

  • our project (pqcp)
  • what algorithm (mlkem512)
  • which implementation (ref)

Given this

We will have a number of 'ref' implementations varying (I assume) by the nature of their optimizations (and in pqcp we have embedded and aarch64)

How much of this should be exposed in the API? Any change between implementations will require code changes from the consumer, so they are no longer 'drop-in' replacements.

I presume the rationale for the prefix proposed here is that it remains the same for all of the implementations within pqcp that are implemented in a particular language, therefore allowing swapping between them.

(Accepting that with other languages such as Go, Java, etc., namespacing at the function level is not needed due to package definitions.)

  • More info

As a follow-up to the point above about which explicit instruction support a particular implementation needs: is there a need to be able to query this at runtime, i.e. to expose a discovery API that, among other things, would report some specifics about the implementation actually being called? (A hypothetical sketch follows below.)
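For illustration, one hypothetical shape such a discovery API could take; none of the names or fields below exist in any PQCP project, they are purely illustrative:

#include <stdint.h>

typedef struct {
    const char *implementation;   /* e.g. "ref", "avx2", "aarch64" */
    const char *required_cpu_ext; /* e.g. "AVX2+BMI1", or "" for portable C */
    uint32_t    version;
} pqcp_mlkem_info;

/* Fills 'info' with details about the implementation that would actually be
 * dispatched at runtime; returns 0 on success. */
int pqcp_mlkem768_get_info(pqcp_mlkem_info *info);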

@planetf1
Contributor

planetf1 commented Jun 3, 2024

I've proposed an update on this for the TSC this week.

@franziskuskiefer

I think this is worth discussing in a TSC meeting (maybe #82).
Two observations on what's in here so far:

  • Randomised APIs will need a randomness source in most cases. In Rust we can define that with traits; in C that's a little harder to do. That's why we usually use de-randomised versions for integration.
  • There's another API that does not serialise all the keys (unpacked keys), which is faster and which folks may want to use in many cases (like TLS).

@planetf1
Contributor

Agreed in the TSC meeting on 2024-07-18 that we'd open a new issue to discuss each proposal and gather feedback toward a decision.

@mkannwischer
Contributor

I'm afraid we will have to revisit this in the next TSC meeting.
This came up when we were discussing input validation as mandated by FIPS203.
Context: FIPS203 mandates input validation of the public key before encapsulation and of the secret key before decapsulation. Besides checking they are of the right size, the checks are the following:

  • For the public key, one needs to verify that coefficients are actually in [0, q-1].
  • For the secret key, one needs to verify that the pre-computed H(pk) actually corresponds to the hash of the public key.

My understanding is that it is acceptable to only verify this once and then use the keys many times.

I recently had a chat with @cryptojedi and one proposal is a 9-function API consisting of the usual 5 functions (keygen, keygen_derand, encaps, encaps_derand, decaps) plus 4 extra functions for serialization and deserialization of the public and secret key.

It could look something like this (function names are to be discussed):

int crypto_kem_serialize_sk(uint8_t sks[MLKEM_SECRETKEYBYTES],
                            const mlkem_secret_key *sk);
int crypto_kem_deserialize_sk(mlkem_secret_key *sk,
                              const uint8_t sks[MLKEM_SECRETKEYBYTES]);

int crypto_kem_serialize_pk(uint8_t pks[MLKEM_PUBLICKEYBYTES],
                            const mlkem_public_key *pk);
int crypto_kem_deserialize_pk(mlkem_public_key *pk,
                              const uint8_t pks[MLKEM_PUBLICKEYBYTES]);

int crypto_kem_keypair_derand(mlkem_public_key *pk, mlkem_secret_key *sk,
                              const uint8_t *coins);
int crypto_kem_keypair(mlkem_public_key *pk, mlkem_secret_key *sk);


int crypto_kem_enc_derand(uint8_t ct[MLKEM_CIPHERTEXTBYTES],
                          uint8_t ss[MLKEM_SSBYTES], const mlkem_public_key *pk,
                          const uint8_t *coins);
int crypto_kem_enc(uint8_t ct[MLKEM_CIPHERTEXTBYTES], uint8_t ss[MLKEM_SSBYTES],
                   const mlkem_public_key *pk);
int crypto_kem_dec(uint8_t ss[MLKEM_SSBYTES],
                   const uint8_t ct[MLKEM_CIPHERTEXTBYTES],
                   const mlkem_secret_key *sk);

The mlkem_public_key and mlkem_secret_key would then be opaque to the user, such that it is clear that they have to be produced by the deserialization functions (and that the inputs have hence been checked). The serialized secret key then consists of only 64 bytes (the coins that are used in keygen_derand), as specifically allowed by FIPS203.
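For illustration, one way to keep the key types effectively opaque while still letting callers allocate them; the concrete contents below are an assumption for this sketch:

#include <stdint.h>

/* Sketch only. Defining the struct lets callers allocate it, but the fields
 * are internal; keys are only ever produced by keypair/deserialize, so they
 * have always been validated. mlkem_public_key would be defined analogously. */
typedef struct {
    uint8_t seed[64];   /* the keygen coins, as permitted by FIPS203 */
    /* ... plus any pre-computed material, e.g. the expanded matrix A ... */
} mlkem_secret_key;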

If such an API were to be adopted, the input validation would only have to happen once, in the deserialization, and one could be sure that the keys passed to encaps/decaps have been validated.
This API has the additional benefit that one can pre-compute certain values during deserialization, speeding up encaps/decaps. This could, for example, include the pre-expansion of the matrix A, or pre-computing part of the polynomial multiplication (like here).

I have implemented a draft of the above API here, but I do not want to change it until we have reached consensus on an API - we should also coordinate this with liboqs.

There are some major downsides to this change:

  • If you do want to implement the original functions operating on byte arrays as wrappers, then performance will be much worse - especially for decapsulation. This would be bad, for example, for SUPERCOP benchmarks.
  • Everyone has gotten very used to the 3-function API, so this is going to be confusing and lots of places will have to change (liboqs, PQClean, pqm4, for example).

It may make sense to send this to the pqc-forum after we have reached consensus, to ask for feedback from NIST and the community.

@mkannwischer
Contributor

mkannwischer commented Nov 7, 2024

The new API above may be easier to sell as a performance improvement rather than a way to enforce the input validation. I'll try to get some numbers before the meeting tonight.

Beyond what was written above, three things need to be discussed:

  1. Do we really want to force serialization to a seed for the secret key? This is required to achieve MAL-BIND-K-PK (https://eprint.iacr.org/2024/523.pdf), but it costs a lot of performance. Maybe we do want to offer both (de)serialization using the seed and the traditional format. If the traditional format is deserialized, then we need to check that the H(pk) is correct. Note, however, that it's not possible to recover the seed from the traditional secret-key format - so this would be introducing additional failure modes.

  2. Is there any value in having a separate serialization for the public key? It seems that you always want to serialize the pk after key generation. In fact, BoringSSL does not separate the serialization of the pk, but instead returns the byte array directly. Before encaps, though, the deserialization is separate.

  3. How do we handle the checks after key generation? For example, there is a pair-wise consistency check (PCT) that verifies that decapsulating a fresh encapsulation under pk with sk returns the same shared secret. According to FIPS203 this check is optional, but according to the FIPS 140-3 IG it is mandatory if you want FIPS certification. (A sketch of such a check against the proposed API follows below.)
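For illustration, a sketch of such a pair-wise consistency check on top of the proposed API; where exactly it runs inside key generation, and the comparison used, are assumptions:

#include <stdint.h>
#include <string.h>

static int mlkem_pairwise_consistency_check(const mlkem_public_key *pk,
                                            const mlkem_secret_key *sk)
{
    uint8_t ct[MLKEM_CIPHERTEXTBYTES];
    uint8_t ss_enc[MLKEM_SSBYTES], ss_dec[MLKEM_SSBYTES];

    if (crypto_kem_enc(ct, ss_enc, pk) != 0) return -1;
    if (crypto_kem_dec(ss_dec, ct, sk) != 0) return -1;

    /* A constant-time comparison would be preferable in production. */
    return memcmp(ss_enc, ss_dec, MLKEM_SSBYTES) == 0 ? 0 : -1;
}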

@cryptojedi

To add a bit of context: this 7-function API (modulo naming of functions) is already in use by BoringSSL [1], by Go [2], and by Zig [3]. I had some discussions with Gorjan Alagic and Douglas Stebila in the hallway sessions of Crypto about how libraries are supposed to enforce input validation without having to do it all the time. My impression was that those discussions converged to this 7-function API. As a summary of the advantages and disadvantages, what I remember is the following:

Advantages of the 7-function API

  • Can actually enforce input validation, without having to do it for every encapsulation and decapsulation
  • Efficiency improves if the matrix is fully expanded in the internal representation of the secret key
  • Easily allow more tradeoffs between secret-key size and speed (potentially interesting for embedded devices)

Disadvantages

  • 7-function API is further away from the abstract cryptographic object of a KEM ("a KEM is a three-tuple of algorithms...")
  • 3-function API can easily be implemented on top, but performance will be much worse, in particular if the serialized secret key is just a 64-byte seed
  • More papers will be written about additional KEM properties with serialized or non-serialized keys and the subtle differences of those ;-)

[1] https://github.com/google/boringssl/blob/b7f5443cfc1298d77dfb9e6f2eea68035de521a4/include/openssl/experimental/kyber.h
[2] https://github.com/golang/go/blob/9177e12ccc2115e44de824ae7247ace88617c29a/src/crypto/internal/mlkem768/mlkem768.go#L98
[3] https://github.com/ziglang/zig/blob/cf87a1a7cf57123aad533a73d8e0fa4d4916a674/lib/std/crypto/ml_kem.zig#L419

@mkannwischer
Contributor

mkannwischer commented Nov 7, 2024

I've done some benchmarking for our ML-KEM implementations. You can see the full results for all parameter sets for various AArch64 and x86_64 platforms here: https://github.com/pq-code-package/mlkem-c-aarch64/actions/runs/11720085631?pr=363

Let's, for example, consider Graviton 4 performance. The new API (with sk being a 64-byte seed) gives you the following results for ML-KEM-768:

INFO  > ML-KEM-768
INFO  > 
       keypair cycles = 31750
  serialize_pk cycles = 183
  serialize_sk cycles = 5
deserialize_pk cycles = 22246
        encaps cycles = 13677
deserialize_sk cycles = 31496
        decaps cycles = 23713

whereas the old API (w/ public key check and secret key check) was this:

INFO  > ./test/build/mlkem768/bin/bench_mlkem768
INFO  > ML-KEM-768
INFO  > 
   keypair cycles = 33139
    encaps cycles = 38645
    decaps cycles = 49292

deserialize_pk + encaps is essentially the same as encaps before. If you can re-use the pk, encaps is around 3x faster (I'm struggling to think of a use case where this applies, though).

deserialize_sk + decaps is slower than decaps before (as you have to regenerate the key from the seed), but if you can avoid serializing the sk, decaps is 2x faster.
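For reference, a sketch of the flow being measured here: deserialize (and validate) the public key once, then reuse the in-memory key; pk_bytes is assumed to hold a serialized public key and the loop count is arbitrary:

int encaps_many(const uint8_t pk_bytes[MLKEM_PUBLICKEYBYTES])
{
    mlkem_public_key pk;
    uint8_t ct[MLKEM_CIPHERTEXTBYTES], ss[MLKEM_SSBYTES];

    if (crypto_kem_deserialize_pk(&pk, pk_bytes) != 0)
        return -1;                    /* input validation happens exactly once, here */

    for (int i = 0; i < 1000; i++)
        crypto_kem_enc(ct, ss, &pk);  /* no per-call validation or matrix re-expansion */

    return 0;
}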

NB: If you need the pair-wise consistency check (PCT) in key generation, that's going to be much faster with the new API. You basically go from 33139 + 38645 + 49292 to 31750 + serialization + 13677 + 23713. That should be around 2x faster for keygen w/ PCT.

@hanno-becker
Contributor

hanno-becker commented Nov 7, 2024

@cryptojedi wrote: "Can actually enforce input validation, without having to do it for every encapsulation and decapsulation"

The common use of ML-KEM is ephemeral, so what is really saved here? At best, one would not do a check at decapsulation -- but it is not clear to me that this would actually be FIPS compliant, since the internal secret key, in this case, has not come from the deserialize() function, but directly out of the internal KeyGen.

"Efficiency improves if the matrix is fully expanded in the internal representation of the secret key"

Agree

"Easily allow more tradeoffs between secret-key size and speed (potentially interesting for embedded devices)"

Agree

"3-function API can easily be implemented on top, but performance will be much worse, in particular if the serialized secret key is just a 64-byte seed"

The performance is only worse if the serialized SK is the seed? If it's the standard SK format (which is what PQCP currently uses), then there is no overhead? The redundant serialize+deserialize step amounts to the present inefficiency of ML-KEM requiring re-expansion of the A-matrix, but no more.

"7-function API is further away from the abstract cryptographic object of a KEM ("a KEM is a three-tuple of algorithms...)"

Agree.

===

I support the change from a technical standpoint, so long as the default SK format stays the standard one, not the seed one.

My main concern is with FIPS validation. I would like to have a precise public statement from NIST on the following points:

  • Whether splitting the API as proposed, and dropping redundant deserialize+serialize steps, is still FIPS203 compliant.
  • Whether or not an explicit SK re-validation has to be performed at the end of the new KeyGen, seeing that the SK does not go through Deserialize() in the most efficient flow. One may say that it's 'obvious' that the SK is valid, but there are things like the PCT after all, so I would like clarity on whether one has to suffix KeyGen with a re-validation or not.
  • Whether it is compliant with the IG to implement the (mandatory!) PCT on the internal key types. This would greatly alleviate the overhead of PCT and be a strong argument for the new API. Or, whether such internal PCT would need to be complemented with a random check that serialize and deserialize are inverse to each other -- which would again defeat the point of the speedups, but be somewhat in the spirit of the current PCT.

@cryptojedi

Regarding seed vs. standard: There are currently ongoing discussions in the IETF that seem to heavily push for the seed format.
Regarding performance gains:

  • the ephemeral use case saves a little bit (or potentially a lot, depending on the serialized secret-key format) because you never need to serialize/deserialize
  • the static use case is not common yet, but ML-KEM is carefully designed with IND-CCA2 security by default to enable exactly this use case. It will become more common once applications also work on post-quantum authentication.

@cryptojedi

I very much agree on requesting more clear and public statements from NIST regarding input validation and FIPS certification!

@hanno-becker
Contributor

"Regarding seed vs. standard: There is currently ongoing discussions in the IETF that seem to heavily push for the seed format."

@cryptojedi Could you share some pointers?

@cryptojedi

@cryptojedi

To me, using just a seed makes sense mostly in two scenarios:
1.) in embedded devices with very limited storage;
2.) if a secret key is received (instead of generated by yourself). In this second case, the recipient can locally ensure that the keypair was properly generated. This is not the case if secret keys are handed over in expanded format.

My impression is that the discussion about the secret-key format gained traction due to all kinds of binding notions for KEMs introduced in https://eprint.iacr.org/2023/1933 and the observation by Sophie Schmieg that, with the expanded secret-key format, ML-KEM is not MAL-BIND-K-CT.
How much this matters in practice, or whether there are even applications that explicitly want to avoid certain binding properties (for example, to achieve certain notions of deniability?), I don't know.

@mkannwischer
Contributor

mkannwischer commented Nov 7, 2024

I think it makes sense to discuss switching the API and switching the serialized sk representation separately, as they seem orthogonal.

To ease that, I additionally implemented the new API with the traditional sk format here, with benchmarks here.
Taking the same Graviton 4 examples from before:

INFO  > ML-KEM-768
INFO  > 
       keypair cycles = 31758
  serialize_pk cycles = 185
  serialize_sk cycles = 399
deserialize_pk cycles = 22234
        encaps cycles = 13663
deserialize_sk cycles = 21939
        decaps cycles = 23697

compare this to the current performance with the 3-function API (w/ sk and pk validation):

INFO  > ./test/build/mlkem768/bin/bench_mlkem768
INFO  > ML-KEM-768
INFO  > 
   keypair cycles = 33139
    encaps cycles = 38645
    decaps cycles = 49292

Unsurprisingly, we see that the new deserialize_sk + decaps is now roughly the same speed as the old decaps (which already included the secret-key validation). This means we can add a 3-function API on top that has roughly the same speed as before (a sketch follows below).
With that, it is fairly clear that this change is advantageous. If we can get NIST to publicly state that this is okay, then hopefully we can agree that this is a good change and we should implement it across the PQCA.
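A sketch of such a wrapper for decapsulation; the name and error handling are assumptions, and it simply composes deserialization with the core call:

int crypto_kem_dec_bytes(uint8_t ss[MLKEM_SSBYTES],
                         const uint8_t ct[MLKEM_CIPHERTEXTBYTES],
                         const uint8_t sks[MLKEM_SECRETKEYBYTES])
{
    mlkem_secret_key sk;

    /* Validation (size and H(pk) check) happens here on every call, which is
     * why the byte-array wrapper cannot be faster than the old 3-function API. */
    if (crypto_kem_deserialize_sk(&sk, sks) != 0)
        return -1;

    return crypto_kem_dec(ss, ct, &sk);
}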

Regarding the secret key seed, I do see that this can be advantageous in some scenarios. In most scenarios, however, it's adding no clear (?) benefit while potentially costing a lot of performance. So maybe we can leave this up to the PQCP sub-projects depending on their scenarios and consumers?
For https://github.com/pq-code-package/mlkem-c-aarch64, we'd very likely stick to the old format. If eventually a consumer wants the seed format, we'd probably add additional functions that also allow serializing as seed.

@mkannwischer
Contributor

In the TSC meeting on 11/07 we seemed to agree that it is a good idea to ask NIST if the proposed 7-function API is compliant with FIPS203 and a valid way to enforce input validation. Most importantly we want to ask if it can pass FIPS validation. If they say it is fine, ideally we would implement it across all PQCP projects and beyond.
Hopefully this will also be discussed in the next OQS meeting.

@franziskuskiefer @mbbarbosa @tfaoliveira @dstebila @jschanck: It would be great to hear your opinion.

One open question was whether a function producing a public key from a secret key should be added to the API as well.

Switching the secret key to a 64-byte seed was more controversial, as it does cost performance in many scenarios. For this change it is also clearer that it is allowed, as FIPS203 explicitly permits it. I'd like to defer that discussion for now, as it is somewhat independent of the API discussion and easier to change at a later point.
NB: Without the API change, you really do not want to switch to a seed (as otherwise you always have to do keygen as a part of decaps).

@mbbarbosa

This approach makes sense to me.

@dstebila
Contributor Author

dstebila commented Nov 8, 2024

In the TSC meeting on 11/07 we seemed to agree that it is a good idea to ask NIST if the proposed 7-function API is compliant with FIPS203 and a valid way to enforce input validation. Most importantly we want to ask if it can pass FIPS validation. If they say it is fine, ideally we would implement it across all PQCP projects and beyond. Hopefully this will also be discussed in the next OQS meeting.

Sounds like a good plan.

One open question was whether a function producing a public key from a secret key should be added to the API as well.

Switching the secret key to a 64-byte seed was more controversial, as it does cost performance in many scenarios. For this change it is also clearer that it is allowed, as FIPS203 explicitly permits it. I'd like to defer that discussion for now, as it is somewhat independent of the API discussion and easier to change at a later point. NB: Without the API change, you really do not want to switch to a seed (as otherwise you always have to do keygen as a part of decaps).

How would the seed be different from the coins in crypto_kem_keypair_derand?

@cryptojedi

@dstebila I think the 64-byte seed is indeed exactly the coins in crypto_kem_keypair_derand.

@mkannwischer
Contributor

@dstebila - yes, it is the coins from crypto_kem_keypair_derand, but the proposal here is to change the secret-key representation to that, so keypair would be outputting the seed instead of the traditional secret key (or rather, it would be outputting a data structure that has to contain the seed in addition to any other parts of the secret key, as computing the traditional representation from the seed is easy, but the other way around is impossible). A sketch of what the (de)serialization could then look like follows below.
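For illustration, a sketch of seed-format (de)serialization under that proposal, assuming the secret-key structure keeps the 64-byte keygen coins around (internal names are illustrative):

#include <stdint.h>
#include <string.h>

int crypto_kem_serialize_sk(uint8_t sks[64], const mlkem_secret_key *sk)
{
    memcpy(sks, sk->seed, 64);   /* the serialized secret key is just the seed */
    return 0;
}

int crypto_kem_deserialize_sk(mlkem_secret_key *sk, const uint8_t sks[64])
{
    mlkem_public_key pk;
    /* Regenerating from the seed is why deserialize_sk costs about as much as
     * key generation in the benchmarks above; the recomputed pk is discarded. */
    return crypto_kem_keypair_derand(&pk, sk, sks);
}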

@bhess

bhess commented Nov 11, 2024

In liboqs there is the proposal to support seed-only private keys as an additional algorithm variant for ML-KEM. If there are performance tradeoffs, I think it would be useful to support both seeds and expanded private keys in PQCP.
