Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ZK certificate verification #672

Open
sinui0 opened this issue Dec 3, 2024 · 2 comments
Open

feat: ZK certificate verification #672

sinui0 opened this issue Dec 3, 2024 · 2 comments
Assignees
Milestone

Comments

@sinui0
Copy link
Member

sinui0 commented Dec 3, 2024

We want to support having the Prover prove the certificate chain to the Notary without revealing the server identity but instead obtaining a commitment to the Server's DNS name.

@sinui0 sinui0 added this to the Q1 Wishlist milestone Dec 3, 2024
@themighty1
Copy link
Member

Our goal is to prove in zk that a hash pre-image is a DNS name of the end-entity certificate which links to a trusted root certificate without revealing the identity of the end-entity.

Meta information like the lengths of the certificates in the chain or the lengths/offsets of certain fields in the certificates must not be revealed to the verifier, otherwise the leaked info could be used by a malicious verifier to establish the identity of the end-entity.

Solutions which are incomplete or not optimal:

Cert validation with Risc Zero (r0)

Automata already has the needed circuits for r0 here https://github.com/automata-network/tdx-attestation-sdk/blob/af497429fb82effcc3d0ec83659a8b9d2e20aebb/zk/risc0/methods/guest/src/main.rs#L17
(they verify the cert chain as part of the sgx quote verification).

It takes 90M r0 cpu cycles to prove the chain https://discord.com/channels/974410303255232542/1278660294868074496/1280132404714930227

(via DMs melynx confirmed that 90M equals 30 mins on an M1 Pro machine. Although melynx said that he has some optimizations in mind which would bring down the cost to 10 mins).

halo2-zkcert (https://github.com/zkCert/halo2-zkcert)

Their circuits don't parse the certificates and don't chain them. They prove a statement "I know a public key which verifies a signature over a certificate". But what we need is to additionally prove that the public key was parsed out from a parent certificate and that the certificates are well-formed.

Revealing fields at known offsets.

This approach will not work since x509 uses variable-length encoding of its fields: ASN.1 DER. See https://datatracker.ietf.org/doc/html/rfc5280#section-4.1

It is not possible to reliably predict at which offset in the certificate a certain field will be located.
To learn the offset of a field in the cert, one has to parse the bytes which encode the length of the field.

Parsing with QuickSilver (QS).

This approach will be too expensive both in terms of prover upload bandwidth and compute since the QS circuit has to pay the cost of multiplexing all branches.
This is a limitation of our parsing circuit design. There doesn't seem to be a way to design a parsing circuit and keep the cost of multiplexing low.

(See the circuits below)

A promising solution: parsing (and hashing) with LogRobin++ (LR)

(https://eprint.iacr.org/2024/1427)

LR will not incur the cost of branch multiplexing like QS does. In LR, the prover still has to bear the computation cost of evaluating each branch but the branches of our circuit contain a miniscule amount of AND gates. See the circuits below.

Circuits for zk x509 parsing.

Below will try to use the same identifiers and terminology as in
https://datatracker.ietf.org/doc/html/rfc5280#section-4.1

Below is the pseudocode for parsing a single field of x509. The prover's private input - the tbsCertificate will be padded to the size of 4000 bytes - this should be enough to hide the length of 99% of certificates used able on the internet.

Parsing. Step 1. Parsing the serial number.

The circuit below proves a statement "I know such tbsCertificate that when parsing is applied to it, the result is parsed_serial_number".

/// Arguments are the private inputs to the circuit:
///
/// `tbsCertificate` - TBSCertificate.
/// `parsed_serial_number` - `tbsCertificate` with the `version and `serialNumber` fields
///            parsed and drained.
fn parse_serial_number(tbsCertificate: [u8; 4000], parsed_serial_number: [u8; 4000]) {
    let c = tbsCertificate;
    // An offset in the certificate.
    let mut i = 0;

    // TBS cert length.
    let mut tbs_length = [0u8; 2];
    tbs_length.copy_from_slice(&c[i..i + 2]);
    assert!(u16::from_be_bytes(tbs_length) < 3999u16);
    i += 2;

    // Parse the version:

    // tag 0
    assert!(c[i] == 0);
    i += 1;
    // version length
    assert!(c[i] == 1);
    i += 1;
    // only v3 is supported
    assert!(c[i] == 2);
    i += 1;

    // Parse CertificateSerialNumber

    // ASN.1 type: INTEGER
    assert!(c[i] == 2);
    i += 1;

    // integer value length (let's assume it can never be < 5 or > 100)
    let len = c[i];
    i += 1;

    let mut expected_parsed = [0u8; 4000];

    for value in 5..100 {
        if value == len {
            expected_parsed.copy_from_slice(&c[i + value as usize..]);
        }
    }

    assert!(expected_parsed == parsed_serial_number);
}

Parsing. Step 2. Parsing the signature.

We can reuse the parsed_serial_number private input from the previous circuit to parse out the signature.
(The input re-use is possible due to IT-MAC which QS permits).

fn parse_signature(parsed_serial_number: [u8; 4000], parsed_signature: [u8; 4000]) {
    // Parse the signature in the same way we parsed the version and drain it.
    // Return the drained certificate zero-padded to 4000 bytes.
    unimplemented!()
}

// ... And so on for each field in TBSCertificate.

Parsing. Final Step.

After all parsing steps are taken, the final drained certificate will contain all zeroes
[0; 4000] - this input can be opened to the verifier to convince them that parsing
was done correctly.

Unrolling with QuickSilver is expensive.

Note that in order to unroll the for loop in parse_serial_number when using QuickSilver, we must to evaluate each branch and then multiplex the results from each branch into one output like this:

fn unroll_with_quicksilver(c: [u8; 4000]) {
    // assume the offset is 10
    let i = 10;
    // assume the parsed length of the field was 34
    let len = 34;

    // Expected value for each branch. A total of 95 branches.
    let mut expected_parsed = [[0u8; 4000]; 95];

    // Whether the branch is active. Will be set to `true` only for one branch.
    let mut is_active = [false; 95];

    if len == 5 {
        expected_parsed[0].copy_from_slice(&c[i + 5..]);
        is_active[0] = true;
    } else if len == 6 {
        expected_parsed[1].copy_from_slice(&c[i + 6..]);
        is_active[1] = true;
    }
    // ... and so on for each iteration value
    else if len == 100 {
        expected_parsed[94].copy_from_slice(&c[i + 100..]);
        is_active[94] = true;
    }

    // Multiplex all branches into a single output.
    let mut multiplexed = [0u8; 4000];
    for i in 0..95 {
        // pseudocode: multiply each bit in `expected_parsed` by the `is_active` bit.
        // If `is_active` if false, the `product` will be zero.
        // Each bit multiplication will cost 1 AND gate.
        let product = expected_parsed[i] * is_active;
        multiplexed += product;
    }
}

The cost of parsing with QuickSilver.

The total AND gate cost of the parse_serial_number circuit when using QS is:

  • A) assert! statements: 100K AND gates (approx)

  • B) value == len comparison for 8-bit values costs 16 AND gates. We have 95 comparisons
    for a total: 16*95= ~1.5K AND gates.

  • C) multiplexing the branches: 4000 * 8 * 95 = 3040K AND gates

Unrolling with LogRobin++ is cheap.

When using LogRobin++ instead of QuickSilver, the multiplexing cost C) is not present.

Additionally (this comment also applies to QS), we will combine all smaller parsing circuits into one large circuit, so that the final assert!(expected_parsed == witnessed_parsed) for each subcircuit will be folder into a single assert! at the end of the large circuit.

This will bring the A) cost to almost zero.

Hashing the parsed certificate with LogRobin++ in the naive way.

In order to verify the signature on the certificate, the cert needs to be hashed.

Naively, we could have a circuit where in each branch a set amount of bytes get hashed. This will require a circuit with ~4000 branches and within each branch there would be up to 4000/64 (assuming a sha256 64-byte block size) = ~63 hash evaluations.

In LogRobin++ the bandwidth cost is bounded by the most expensive branch. But the evaluation has to be done on ALL branches by the prover.
Assuming the cost of a sha256 circuit is 25K AND gates, the total cost of the naive circuit with 4000 branches would be 4000(branches)*32(sha256 evaluations per branch on average)*25K = 3000 million AND gates, which is not tenable and calls for an optimization.

Optimising the hashing with LogRobin++.

We need to create a circuit which minimizes the amount of hashing within the branches. Instead, the branch should only compute the data to be hashed.

Below is an example circuit which will bring the compute complexity to only 1.5 million AND gates.

Note that in order to hide the length of the pre-image, we need to hash all 4000 bytes iof the certificate, including the padding bytes. The trick is that the padding bytes will not count towards the final digest even though they will be hashed.

#[allow(clippy::too_many_arguments)]
pub fn hash_one_block(
    // The ceritificate data that needs to be hashed.
    cert: [u8; 4000],
    // How many more bytes of the cert are left to be hashed.
    // `len_remaining` is re-used from certificate parsing circuits.
    // It is guaranteed to have a correct value.
    len_remaining: u16,
    // The hash state from the previous circuit invocation. On first invocation it will
    // be set to `initial_sha256_state`.
    hash_state: [u32; 32],
    // args with the _out suffix can be considered as circuit "outputs" (that's how circom calls them)
    // yet they are in reality private inputs used to make assertions on the state transition.
    cert_out: [u8; 4000],
    len_remaining_out: u16,
    hash_state_out: [u32; 32],
    // This is public inputs:
    //
    // The initial sha256 state.
    initial_sha256_state: [u32; 32],
    // Total data hashed so far, including the padding bytes.
    total_hashed: u16,
) {
    if total_hashed == 0 {
        assert!(hash_state == initial_sha256_state);
    };
    let to_be_hashed = if len_remaining > 63 {
        // We have a full block
        let to_be_hashed: Vec<u8> = cert[0..64].to_vec();
        let mut padded = cert[64..].to_vec();
        padded.append(&mut vec![0u8; 64]);
        assert_eq!(cert_out.to_vec(), padded);
        assert!(len_remaining_out == len_remaining - 64);
        to_be_hashed
    } else {
        // We have the last partial block.
        //
        // This is a rough approximations of sha256 padding the last block and appending the block
        // length.
        let mut to_be_hashed = [0u8; 64];
        if len_remaining == 1 {
            to_be_hashed[0..1].copy_from_slice(&cert[0..1]);
            to_be_hashed[63] = 1;
        } else if len_remaining == 2 {
            to_be_hashed[0..2].copy_from_slice(&cert[0..2]);
            to_be_hashed[63] = 2;
        }

        // .... And so on for each len from 1 to 63
        assert!(cert_out == [0u8; 4000]);
        assert!(len_remaining_out == 0);
        to_be_hashed.to_vec()
    };

    // Permutes the state, returning a new state.
    fn hash(data: Vec<u8>, state: [u32; 32]) -> [u32; 32] {
        unimplemented!()
    }

    let new_state = hash(to_be_hashed, hash_state);

    if len_remaining == 0 {
        // we were hashing the padding bytes, don't change the state
        assert!(hash_state == hash_state_out);
    } else {
        // we were hashing the actual cert, change the state
        assert!(hash_state_out == new_state);
    }
}

This circuit can be run 63 times, each time increasing total_hashed value and re-using the private inputs from the previous invocation.
In reality, we will combine the 63 invocation into one large circuit so that the asserts! will only need to be enforced on a final invocation.
The compute cost will be dominated by the hashing: 63*25K = 1575K AND gates.

@themighty1
Copy link
Member

This paper https://eprint.iacr.org/2024/2010 has useful circuits and protocols which can be borrowed for ECDSA sig verification in zk using QuickSilver.
Seems like the idea in the section "Input consistency for circuits over different Fields" can be used with QuickSilver to first sha256-hash the cert in a binary circuit and then use the digest in an arithmetic circuit to prove the signature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants