RVPS Improvements Brainstorm #637

Open
fitzthum opened this issue Dec 18, 2024 · 7 comments

Comments

@fitzthum
Member

fitzthum commented Dec 18, 2024

We've discussed changing the RVPS in #238 and #350 and #407 (cc @thomas-fossati, @Xynnn007, and @deeglaze), but we've yet to really figure out what we should do. I have a few concrete ideas.

First, a brief overview of what we have. There are three important parts of the RVPS.

  1. The interface to the attestation service, where we use reference values to evaluate the AS policy. This API is pretty simple. It looks roughly like this:

    pub async fn get_digests(&self) -> Result<HashMap<String, Vec<String>>>
    

    The RVPS takes no input and returns a map of ids to vectors of reference values (usually hashes). We're assuming that for each id there could be multiple valid values.
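Here is a minimal Rust sketch of the contract described above. Each id maps to every value currently accepted for it, and a claim passes if it matches any of those values. `InMemoryRvps`, `register`, and `matches` are hypothetical names used only for illustration. Unlike the real API, this version is synchronous and does not return a `Result`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the RVPS, for illustration only.
/// The real `get_digests` is async and returns a `Result`; this sketch is
/// synchronous and infallible to stay std-only.
#[derive(Default)]
pub struct InMemoryRvps {
    // id -> every reference value currently accepted for that id
    values: HashMap<String, Vec<String>>,
}

impl InMemoryRvps {
    /// Add one more accepted value for `id`; exact duplicates are ignored.
    pub fn register(&mut self, id: &str, value: &str) {
        let entry = self.values.entry(id.to_string()).or_default();
        if !entry.iter().any(|v| v == value) {
            entry.push(value.to_string());
        }
    }

    /// Same shape as the AS-facing API: no input, map of id -> values.
    pub fn get_digests(&self) -> HashMap<String, Vec<String>> {
        self.values.clone()
    }
}

/// How a policy typically consumes the map: a claim passes if it equals
/// any of the values registered under `id`; an unknown id never passes.
pub fn matches(digests: &HashMap<String, Vec<String>>, id: &str, claim: &str) -> bool {
    digests
        .get(id)
        .map_or(false, |vals| vals.iter().any(|v| v == claim))
}
```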

  2. The internal representation of reference values.

    pub struct ReferenceValue {
        #[serde(default = "default_version")]
        pub version: String,
        pub name: String,
        #[serde(deserialize_with = "primitive_date_time_from_str")]
        pub expiration: DateTime<Utc>,
        #[serde(rename = "hash-value")]
        pub hash_value: Vec<HashValuePair>,
    } 
    

    Note that the internal representation assumes that reference values are hashes. This assumption may not hold: a reference value could be a version number, or the policy could even calculate hashes internally. The HashValuePair struct carries information about the hashing algorithm that we never pass to the client. Maybe we don't need it?
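The following Rust sketch shows one way to drop the hash assumption. A reference value becomes an enum that holds either a digest (with its algorithm) or a plain version string. It is then flattened into the existing `get_digests` map shape. `Value`, `Digest`, `Version`, and `to_digest_map` are hypothetical names. To keep the example std-only, the expiration is a Unix timestamp rather than a `DateTime<Utc>`. Skipping expired entries is an assumption about how that field would be applied.

```rust
use std::collections::HashMap;

/// Hypothetical replacement for `hash_value: Vec<HashValuePair>`:
/// a reference value is not necessarily a hash.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    /// A digest, keeping the algorithm that produced it.
    Digest { alg: String, hex: String },
    /// A plain version string, e.g. a firmware version number.
    Version(String),
}

impl Value {
    /// Render the value as the AS consumes it today: a bare string.
    /// The algorithm is dropped here, as it is in the current API.
    fn as_policy_string(&self) -> String {
        match self {
            Value::Digest { hex, .. } => hex.clone(),
            Value::Version(v) => v.clone(),
        }
    }
}

#[derive(Debug, Clone)]
pub struct ReferenceValue {
    pub name: String,
    /// Seconds since the Unix epoch (std-only stand-in for `DateTime<Utc>`).
    pub expiration: u64,
    pub values: Vec<Value>,
}

/// Flatten unexpired reference values into the `get_digests` map shape.
pub fn to_digest_map(rvs: &[ReferenceValue], now: u64) -> HashMap<String, Vec<String>> {
    let mut out: HashMap<String, Vec<String>> = HashMap::new();
    for rv in rvs.iter().filter(|rv| rv.expiration > now) {
        out.entry(rv.name.clone())
            .or_default()
            .extend(rv.values.iter().map(Value::as_policy_string));
    }
    out
}
```

The algorithm is still dropped during flattening, as it is today, so the enum mainly helps at ingestion and storage. Exposing the algorithm to the policy would require changing the map's value type.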

  3. Extractors produce the reference value struct above after verifying some input. Currently we only have a sample extractor and an in-toto extractor, which uses Go bindings. We also have a pre-processor, which currently doesn't seem to do much and has a confusingly named Ware interface.

Improvements

  • Add evidence groups/context: I'm planning to add an abstraction that allows us to group reference values. In some ways this will be simple: we can change the get_digests API to take a group name. The AS policy won't be aware of the groups. It will continue to operate the same way using the reference value map, but the AS will be able to switch the contents of the map based on some context. The tricky question is how the AS should decide which group of values to use. We can implement this as a second step based on a local config, an init-data field, or something else.
  • Move rvps cli tool into kbs-client: We currently have a separate tool for sending values to the RVPS. I think we should combine this with our kbs-client or with a trustee cli tool that we might develop. My impression is that the easiest way to use the RVPS is to use the JSON storage backend and simply modify the file directly, skipping extraction and pre-processing altogether. We need to make it easier to provision reference values, especially if we want people to write meaningful policies.
  • Fix up interfaces described above: Maybe we should change the internal representation and clarify or get rid of the pre-processing step.
  • Add more extractors: Self-explanatory.
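This is a minimal sketch of the grouping idea from the first bullet, assuming group names are opaque strings. `get_digests` gains a group argument but returns the same map shape, so existing policies are unaffected. `GroupedRvps` and its methods are hypothetical. For an unknown group, the sketch returns an empty map instead of falling back to a default. This is a deliberate choice so that a misconfigured group fails closed.

```rust
use std::collections::HashMap;

/// Hypothetical grouped RVPS store: group name -> (id -> accepted values).
/// Group names are opaque strings; the RVPS attaches no meaning to them.
#[derive(Default)]
pub struct GroupedRvps {
    groups: HashMap<String, HashMap<String, Vec<String>>>,
}

impl GroupedRvps {
    /// Add an accepted value for `id` within `group`; duplicates are ignored.
    pub fn register(&mut self, group: &str, id: &str, value: &str) {
        let vals = self
            .groups
            .entry(group.to_string())
            .or_default()
            .entry(id.to_string())
            .or_default();
        if !vals.iter().any(|v| v == value) {
            vals.push(value.to_string());
        }
    }

    /// Same output shape as today's `get_digests`, scoped to one group, so the
    /// policy itself does not change. An unknown group yields an empty map
    /// rather than silently falling back to some other group.
    pub fn get_digests(&self, group: &str) -> HashMap<String, Vec<String>> {
        self.groups.get(group).cloned().unwrap_or_default()
    }
}
```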

Adopting specifications

Personally I'm not very interested in adopting specifications just for the sake of it or in the name of hypothetical interoperability. Of course, there is value to using standards, but we should make sure our work in this area helps us move more quickly towards a usable and insightful project rather than bogging us down.

Probably the simplest way for us to interact with standards is in the extractor, which would let us take various bundles of reference values and convert them to our internal representation. Not every spec is a good fit for the extractor, though. For instance, my impression of CoRIM is that its scope exceeds our current extractor interface. Something like RIM might be a better fit. My understanding here is still a bit fuzzy. The point is that unless we want to do a somewhat significant rework, we should be looking for specs that simply bundle reference values.

The RVPS is currently pretty simple and I think we should try to keep it that way or make it even simpler. Fundamentally, the job of the RVPS isn't very complicated. LMK if you have any ideas about this stuff. I will probably start working on some of the simpler stuff in January.

@deeglaze

I'm not familiar enough with confidential containers to make a recommendation here since it is such a constrained application. My context has been in bringing up an attestation verification service that can support multiple types of evidence to verify different environments based on a body of endorsements, a data model for expressing what applied to the evidence, and a policy for processing that data model into attestation results.

In my understanding of the confidential containers RVPS, it's more like the SPIRE server where workloads are directly registered, and that's the only place they're registered. If it's there, it's permitted. There's very little in terms of supporting multiple clients, where your attestation service can be operated by a different party.

In my head, I've been thinking more along the lines of constructing an endorsement ecosystem that can be used by attestation services. An RVPS then provides a view of what's an appropriate knowledge base. It can also carry more ephemeral operator-provided endorsements, in the same way that the RVPS already has an ingestion API. You'll have attested ephemeral state of operating nodes that serves as reference values for other services. You'll also have more persistent reference values, such as the authenticity of some digest. In the middle are "security posture" endorsements, which come from periodic analysis reports that match software component analyses against CVE databases.

I've proposed this as a talk to the OC3 conference, so I hope to make this clearer in a recorded setting with visuals.

I don't know what an "extractor" is in your context. For my part, I wrote an "extractor" to parse out the RIM details from the PCClient event log, since the SP800-155 event was designed for that purpose: https://github.com/google/gce-tcb-verifier/blob/main/extract/extract.go

> I'm planning to add an abstraction that allows us to group reference values.

Do you mean for this to segment trust domains or product lines? In the generic framework of CoRIM, your attester itself gets to name its measured environment, so you can use the environment-map concept to segregate product lines and their endorsements. If your attester is more generic than that, then part of the evidence itself can be a self-identification: what does the node believe itself to be? You can use this to narrow the lookup query for reference values.

Self-identification would name the exact collection of endorsements that are appropriate to check. This is the concept of a CoBOM, a bill of materials. You'd need to bake this into an unmeasured disk volume and use it only as a "hint", since policy ultimately decides which reference values are acceptable.

@fitzthum
Member Author

> I'm not familiar enough with confidential containers to make a recommendation here since it is such a constrained application. My context has been in bringing up an attestation verification service that can support multiple types of evidence to verify different environments based on a body of endorsements, a data model for expressing what applied to the evidence, and a policy for processing that data model into attestation results.

Trustee is also intended for use cases outside of CoCo; anything involving confidential attestation.

> In my understanding of the confidential containers RVPS, it's more like the SPIRE server where workloads are directly registered, and that's the only place they're registered. If it's there, it's permitted. There's very little in terms of supporting multiple clients, where your attestation service can be operated by a different party.

One feature of the extractors is that they can check the signatures of reference value bundles, the idea being that some reference values will be received from a third party. The entire RVPS could also be run remotely by someone else, although we only support one and it isn't widely tested. So there is some sense of a broader ecosystem at this point, but it's pretty simple. We also have an expiration field in our internal reference value representation.

Note that reference values registered in the RVPS may or may not map directly to a specific workload.

> Do you mean for this to segment trust domains or product lines?

Either way. At the RVPS level, the grouping mechanism would be totally generic. The AS has a couple of different options for how to use the feature. I haven't totally figured out what I prefer here, but one option would be to allow the group to be selected via init-data. Init-data is the CoCo spec for data that the host puts into fields like hostdata. The host/orchestrator could map groups to specific workloads, broader workload types, hardware platforms, or anything else. Ultimately, the group id would be reported in the attestation token, and the KBS policy would check that it is as expected in the context of resource requests.
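The flow described here can be sketched in Rust under a few assumptions. The AS takes a group name from init-data if one is present and otherwise uses a locally configured default. It reports the chosen group in the token claims. The KBS releases a resource only if the appraisal succeeded and the group matches what it expects. `Claims`, `rv_group`, `appraise`, and `kbs_release` are hypothetical names. The token here is a plain struct rather than a signed EAR/JWT.

```rust
use std::collections::HashMap;

/// id -> accepted reference values (today's `get_digests` shape).
type Digests = HashMap<String, Vec<String>>;

/// Hypothetical attestation-token claims: the appraisal outcome plus the
/// reference-value group that was used to produce it.
#[derive(Debug, PartialEq)]
pub struct Claims {
    pub affirming: bool,
    pub rv_group: String,
}

/// AS side: pick the group from init-data if present, else a local default,
/// then require every evidence claim to match a value in that group.
/// (An empty evidence map trivially passes in this sketch.)
pub fn appraise(
    groups: &HashMap<String, Digests>,
    init_data_group: Option<&str>,
    default_group: &str,
    evidence: &HashMap<String, String>,
) -> Claims {
    let group = init_data_group.unwrap_or(default_group);
    let affirming = match groups.get(group) {
        Some(digests) => evidence
            .iter()
            .all(|(k, v)| digests.get(k).map_or(false, |vals| vals.contains(v))),
        None => false, // unknown group: nothing can match
    };
    Claims { affirming, rv_group: group.to_string() }
}

/// KBS side: release a resource only if the appraisal succeeded *and* the
/// reported group is the one expected for this resource.
pub fn kbs_release(claims: &Claims, expected_group: &str) -> bool {
    claims.affirming && claims.rv_group == expected_group
}
```

The group choice comes from the host. In this sketch, the KBS-side check on `rv_group` is what stops a host from selecting a more permissive group for a workload.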

@thomas-fossati
Contributor

thomas-fossati commented Dec 19, 2024

On the “Adopting specifications” topic, two considerations (likely to change in the future, but valid on 2024/12/19):

  1. The CoRIM spec has not yet fully stabilised, and
  2. There is no OSS implementation available in Rust.

Given those, at this point, focusing on getting the interfaces right seems like the right investment.
If you modularise the ingest (is this what you call extractors?) and have a sensible internal representation and APIs, then adding CoRIM, CycloneDX, in-toto, or any other suitable format - including proprietary ones - should amount to adding a new ingest plugin and a synthesizer for format-specific IDs.
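Here is a minimal sketch of the modular ingest Thomas describes. It assumes each format is a plugin that turns a raw bundle into (id, value) pairs, and that plugins are looked up by format name. `IngestPlugin`, `Registry`, and the toy `KeyValuePlugin` (one `name=value` pair per line) are hypothetical. A real CoRIM or in-toto plugin would also verify signatures and synthesize format-specific IDs before emitting values.

```rust
use std::collections::HashMap;

/// Hypothetical ingest-plugin interface: raw bundle -> (id, value) pairs.
pub trait IngestPlugin {
    fn ingest(&self, bundle: &str) -> Result<Vec<(String, String)>, String>;
}

/// Toy plugin for a "name=value"-per-line format, standing in for a real one.
/// Blank lines are skipped; any other line without '=' is rejected.
pub struct KeyValuePlugin;

impl IngestPlugin for KeyValuePlugin {
    fn ingest(&self, bundle: &str) -> Result<Vec<(String, String)>, String> {
        bundle
            .lines()
            .filter(|l| !l.trim().is_empty())
            .map(|l| {
                l.split_once('=')
                    .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
                    .ok_or_else(|| format!("malformed line: {}", l))
            })
            .collect()
    }
}

/// Registry keyed by format name: adding a format means registering a plugin.
#[derive(Default)]
pub struct Registry {
    plugins: HashMap<String, Box<dyn IngestPlugin>>,
}

impl Registry {
    pub fn register(&mut self, format: &str, plugin: Box<dyn IngestPlugin>) {
        self.plugins.insert(format.to_string(), plugin);
    }

    pub fn ingest(&self, format: &str, bundle: &str) -> Result<Vec<(String, String)>, String> {
        self.plugins
            .get(format)
            .ok_or_else(|| format!("no plugin registered for format {}", format))?
            .ingest(bundle)
    }
}
```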

On the ID topic, one suggestion I'd like to make is to look at CoRIM’s environment-map (i.e., roughly, non-empty<{ ? instance-id , ? class-id }>), which has a sensible shape, at least for the use cases I have seen. You can define a serialisation for it (e.g., URI-based, detCBOR, or other) as well as your own pattern-matching rules.
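Below is a rough Rust rendering of the environment-map shape with one possible pattern-matching rule. Several assumptions apply. Class-id and instance-id are simplified to strings, whereas in CoRIM they are typed identifiers. The serialisation is a made-up `env:<class>/<instance>` form that uses `*` for an absent field. An absent field in a reference value's ID matches anything. `EnvId`, `to_uri`, and `matches` are hypothetical.

```rust
/// Hypothetical mirror of the environment-map shape: at least one of
/// class-id / instance-id must be present (the "non-empty" constraint).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EnvId {
    class_id: Option<String>,
    instance_id: Option<String>,
}

impl EnvId {
    /// Returns `None` if both fields are absent.
    pub fn new(class_id: Option<&str>, instance_id: Option<&str>) -> Option<Self> {
        if class_id.is_none() && instance_id.is_none() {
            return None;
        }
        Some(Self {
            class_id: class_id.map(String::from),
            instance_id: instance_id.map(String::from),
        })
    }

    /// One possible URI-style serialisation; "*" marks an absent field.
    pub fn to_uri(&self) -> String {
        format!(
            "env:{}/{}",
            self.class_id.as_deref().unwrap_or("*"),
            self.instance_id.as_deref().unwrap_or("*")
        )
    }

    /// Matching rule (assumption): a field absent from this reference-value
    /// ID matches anything; a present field must equal the evidence's field.
    pub fn matches(&self, evidence: &EnvId) -> bool {
        fn field_ok(pattern: &Option<String>, actual: &Option<String>) -> bool {
            match pattern {
                None => true,
                Some(p) => actual.as_ref() == Some(p),
            }
        }
        field_ok(&self.class_id, &evidence.class_id)
            && field_ok(&self.instance_id, &evidence.instance_id)
    }
}
```

The `*` placeholder would collide with an identifier that is literally `*`. A real serialisation, such as detCBOR, would avoid that ambiguity.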

On the “group” topic, I have a clarifying question: by group, do you mean a collection of claims that belong to the same environment (i.e., the "semantic squashing" topic I raised back in the day)? Or something else?

Just in case you haven't seen it, a while ago I assembled a few ideas on how to evolve RVPS. Caveat: Those ideas may be entirely broken or obsolete :-)

@fitzthum
Member Author

> On the “group” topic, I have a clarifying question: by group, do you mean a collection of claims that belong to the same environment (i.e., the "semantic squashing" topic I raised in #238)? Or something else?

Yes, this is meant to help address that concern. The grouping mechanism is meant to be very generic. It could be used to express CoRIM stuff like a target environment or a manifest, but it could also be other divisions. I wonder if we should have two levels of groups. That might bring things a little closer to CoRIM.

Speaking of shape, one place where we seem to differ from the shape you mention is that the RVPS returns HashMap<String, Vec<String>>. Regardless of any grouping mechanism, if there are duplicate keys in a group, their values are put into a single list. This is because the policy can't support duplicate keys in the reference values and we only have one set of reference values per policy. This is probably fine, although it might make it harder, or even impossible, to express multiple specific combinations of reference values.
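A small Rust example shows how specific combinations get lost. It uses two hypothetical manifests, each listing one firmware value and one kernel value. After they are flattened into the `HashMap<String, Vec<String>>` shape, a per-key policy accepts a mixed pair that neither manifest contains. `Manifest`, `flatten`, `accepted_flat`, and `accepted_exact` are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical manifest: one specific, intended combination of values.
type Manifest = HashMap<&'static str, &'static str>;

/// Flatten manifests into today's HashMap<String, Vec<String>> shape.
fn flatten(manifests: &[Manifest]) -> HashMap<String, Vec<String>> {
    let mut out: HashMap<String, Vec<String>> = HashMap::new();
    for m in manifests {
        for (k, v) in m {
            let vals = out.entry(k.to_string()).or_default();
            if !vals.iter().any(|x| x.as_str() == *v) {
                vals.push(v.to_string());
            }
        }
    }
    out
}

/// Per-key check, as a policy sees the flattened map.
fn accepted_flat(flat: &HashMap<String, Vec<String>>, evidence: &Manifest) -> bool {
    evidence.iter().all(|(k, v)| {
        flat.get(*k)
            .map_or(false, |vals| vals.iter().any(|x| x.as_str() == *v))
    })
}

/// Whole-combination check against the original manifests.
fn accepted_exact(manifests: &[Manifest], evidence: &Manifest) -> bool {
    manifests.iter().any(|m| m == evidence)
}
```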

@fitzthum
Member Author

fitzthum commented Dec 19, 2024

Actually, I wonder if we should introduce a reference URI similar to the resource URI. That would basically represent two groups and a name. We should also consider how to reflect this in the attestation token.

@thomas-fossati
Contributor

thomas-fossati commented Dec 20, 2024

> Actually, I wonder if we should introduce a reference URI similar to the resource URI. That would basically represent two groups and a name.

Can you give a concrete example?

> We should also consider how to reflect this in the attestation token.

Warning: Brainstorm material 😄

I think about this in terms of a couple of ID "synthesiser" interfaces. One on the RVPS side, at ingest, and another on the AS side, at verification. The former is computed on the reference value contents and returns the ID for storage, the latter is computed on evidence claims and returns the ID for lookup.

In Golang-ish terms, an interface:

type IDSynthesizer interface {
  FromRefvalue(v RefValue) ID
  FromEvidence(v Evidence) ID
}

that each "attestation scheme" (e.g., TDX, CCA, SEV-SNP) must implement. E.g.,

func (o CCA) FromEvidence(v Evidence) ID {
  return "cca:" + b64(v.ImplementationID) + "/" + b64(v.InstanceID)
}

func (o CCA) FromRefvalue(v RefValue) ID {
  return "cca:" + b64(v.EnvMap.ClassID) + "/" + b64(v.EnvMap.InstanceID)
}
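Since the project code is in Rust, the same interface could also be written as a Rust trait. This sketch rests on a few assumptions. The field names mirror the Go pseudocode, IDs are plain strings, and hex encoding stands in for `b64` so the example needs no external crates. The key property is that both methods produce the same ID for matching reference values and evidence.

```rust
/// Hypothetical inputs, mirroring the field names in the Go pseudocode.
pub struct EnvMap {
    pub class_id: Vec<u8>,
    pub instance_id: Vec<u8>,
}
pub struct RefValue {
    pub env_map: EnvMap,
}
pub struct Evidence {
    pub implementation_id: Vec<u8>,
    pub instance_id: Vec<u8>,
}

/// Each attestation scheme computes a storage ID at ingest and a lookup ID
/// at verification; matching requires the two to agree.
pub trait IdSynthesizer {
    fn id_for_refvalue(&self, v: &RefValue) -> String;
    fn id_for_evidence(&self, v: &Evidence) -> String;
}

/// Lowercase hex, used here instead of base64 to stay std-only.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

pub struct Cca;

impl IdSynthesizer for Cca {
    fn id_for_refvalue(&self, v: &RefValue) -> String {
        format!("cca:{}/{}", hex(&v.env_map.class_id), hex(&v.env_map.instance_id))
    }

    fn id_for_evidence(&self, v: &Evidence) -> String {
        format!("cca:{}/{}", hex(&v.implementation_id), hex(&v.instance_id))
    }
}
```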

@fitzthum
Member Author

fitzthum commented Dec 20, 2024

> Can you give a concrete example?

I am picturing a policy with something like this.

executables := 3 if {
	input.snp.launch_measurement == reference_value("rvps:///snp/launch_measurement/measurement1")
}

Or the URI could be rvps:///snp/manifest1/launch_measurement. In general I want this hierarchy to be user-defined and I don't want to assume there is any unique identification of guests. This URI might help the policy to be less vague about which reference values are being used. That said, it could be a big mistake to bake the whole thing into the policy itself. Maybe the first two groups, or one of them, should be set outside the policy so that they can be changed without changing the policy. There's an interesting tension here between having the policy be explicit about the reference values and having the policy be flexible and not need to be updated.
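Here is a sketch of how such a URI might be parsed. It assumes the `rvps:///<group>/<subgroup>/<name>` layout from the policy example, with exactly three non-empty segments. `ReferenceUri`, `parse_reference_uri`, and `resolve` are hypothetical. `resolve` shows one way to handle the tension described here: a policy can give only the leaf name, and the two group levels come from AS configuration. That split is an assumption, not a decided design.

```rust
/// Hypothetical parsed form of `rvps:///<group>/<subgroup>/<name>`.
#[derive(Debug, PartialEq)]
pub struct ReferenceUri<'a> {
    pub group: &'a str,
    pub subgroup: &'a str,
    pub name: &'a str,
}

const PREFIX: &str = "rvps:///";

/// Parse a fully qualified reference URI with exactly three non-empty segments.
pub fn parse_reference_uri(uri: &str) -> Result<ReferenceUri<'_>, String> {
    let path = uri
        .strip_prefix(PREFIX)
        .ok_or_else(|| format!("missing {} prefix: {}", PREFIX, uri))?;
    let parts: Vec<&str> = path.split('/').collect();
    match parts.as_slice() {
        &[group, subgroup, name] if !group.is_empty() && !subgroup.is_empty() && !name.is_empty() => {
            Ok(ReferenceUri { group, subgroup, name })
        }
        _ => Err(format!("expected three non-empty segments: {}", uri)),
    }
}

/// Accept either a full URI or a bare leaf name; for a bare name, the two
/// group levels are filled in from (hypothetical) AS configuration, so they
/// can change without editing the policy.
pub fn resolve<'a>(
    policy_ref: &'a str,
    cfg_group: &'a str,
    cfg_subgroup: &'a str,
) -> Result<ReferenceUri<'a>, String> {
    if policy_ref.starts_with(PREFIX) {
        parse_reference_uri(policy_ref)
    } else if !policy_ref.is_empty() && !policy_ref.contains('/') {
        Ok(ReferenceUri { group: cfg_group, subgroup: cfg_subgroup, name: policy_ref })
    } else {
        Err(format!("not a reference URI or bare name: {}", policy_ref))
    }
}
```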

Regorus, the crate we use to evaluate OPA (Rego) policies in Rust, allows you to register extension functions in Rust that can be called within the policy. I am thinking about using this to provide reference values for reasons that I will describe shortly.
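This sketch shows the bookkeeping a `reference_value` function could do, independent of Regorus. Every URI that the evaluated rules request is recorded, so rules for other platforms that never run leave no trace. `TrackingLookup` is hypothetical, and the sketch deliberately omits how it would be registered as a Regorus extension. `RefCell` lets the lookup record requests through a shared reference.

```rust
use std::cell::RefCell;
use std::collections::{BTreeSet, HashMap};

/// Hypothetical backing store for a policy-callable `reference_value(uri)`.
/// Registration as a Regorus extension is not shown; this only illustrates
/// recording which references the evaluated rules asked for.
pub struct TrackingLookup {
    values: HashMap<String, Vec<String>>, // reference URI -> accepted values
    requested: RefCell<BTreeSet<String>>,  // URIs requested during evaluation
}

impl TrackingLookup {
    pub fn new(values: HashMap<String, Vec<String>>) -> Self {
        Self { values, requested: RefCell::new(BTreeSet::new()) }
    }

    /// Record the request (even if the URI is unknown), then return the values.
    pub fn reference_value(&self, uri: &str) -> Option<&[String]> {
        self.requested.borrow_mut().insert(uri.to_string());
        self.values.get(uri).map(Vec::as_slice)
    }

    /// Sorted list of URIs that were actually requested.
    pub fn requested(&self) -> Vec<String> {
        self.requested.borrow().iter().cloned().collect()
    }
}
```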

> I think about this in terms of a couple of ID "synthesiser" interfaces. One on the RVPS side, at ingest, and another on the AS side, at verification.

I like the idea of having some mechanism to keep track of which reference values were used to evaluate the policy. What comes to mind is to have the RVPS generate a report of the reference values that were requested (perhaps as a JWT) and store it in an extension in the EAR Appraisal.

By implementing the reference_value function shown above as a Regorus extension, we could keep track of exactly which values are used when evaluating a policy, even if the policy also mentions values for other platforms whose rules are never executed. I'm not totally sure what would go in this report. One option would be to just put the reference values themselves there. This seems clunky, though, especially since we already have all the TCB evidence in the attestation token. Instead, maybe we would just store the RVPS UUID of each reference value. Someone who has received an attestation token could in theory take the report and ask the RVPS to validate it. The RVPS could then look up the reference values and say whether they have been revoked or whatever else.
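The validation step could look roughly like this. The sketch assumes the report is a list of RVPS ids, with plain strings standing in for UUIDs, and that the RVPS tracks a valid or revoked status for each id. `Status`, `Finding`, and `validate_report` are hypothetical. Unknown ids are reported separately from revoked ones, because a reference value that was later deleted is not the same finding as one that was explicitly revoked.

```rust
use std::collections::HashMap;

/// Hypothetical per-reference-value status as the RVPS might track it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Status {
    Valid,
    Revoked,
}

/// A problem found while validating a report.
#[derive(Debug, PartialEq)]
pub enum Finding {
    Unknown(String),
    Revoked(String),
}

/// Check a report (the RVPS ids of the reference values used during
/// appraisal) against current RVPS state. An empty result means every
/// entry is still known and unrevoked.
pub fn validate_report(registry: &HashMap<String, Status>, report: &[&str]) -> Vec<Finding> {
    report
        .iter()
        .filter_map(|id| match registry.get(*id) {
            None => Some(Finding::Unknown(id.to_string())),
            Some(Status::Revoked) => Some(Finding::Revoked(id.to_string())),
            Some(Status::Valid) => None,
        })
        .collect()
}
```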

Anyway, I think reporting the reference values isn't super high-priority, since they are trusted at the time of verification, but it would be good to have some story about how to do this.
