Skip to content

updated publisher spec #147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

updated publisher spec #147

wants to merge 11 commits into from

Conversation

0x73746F66
Copy link

@0x73746F66 0x73746F66 commented May 10, 2025

TLDR

The TEA Publisher OpenAPI specification has inconsistencies with the consumer API
and documentation, causing potential integration issues and confusion for implementers in the upcoming Beta 1 hackathon on May 28th.


  • Extended parameter documentation to be more complete
  • We now have schema examples
  • Version number updated to 0.0.3 to match consumer API

This PR is focussed on the Publisher API spec, that was mostly unchanged since my last contrinution. I noticed it needed significant updates to align with the latest architecture documentation and all the updated that had been given to the consumer API spec (how would consumers consume things that weren't able to be published? This had to be fixed!)

Incompatibilities with consumer spec

  1. Field Naming Inconsistencies:

Consumer: Uses url in artifact format
Publisher: Uses artifact_url in artifact format

  1. Parameter Structure:

Consumer: Uses id-type/id-value for product queries
Publisher: Uses direct filter parameters

  1. Missing Consumer Endpoints:
  • /release/{uuid}/collection for latest collection
  • /release/{uuid}/collections for all collections
  • /artifact/{uuid} for artifact metadata
  1. Schema Differences:

Consumer requires versions field in component schema
Publisher's component doesn't match this requirement

Changes

  1. Terminology Standardization

Before:

  • Used leaf endpoints and schemas in publisher API
  • Inconsistent with consumer API's component/release model

After:

  • Replaced with proper /component and /release endpoints
  • Aligned object model with documentation

Why: Consistent terminology across consumer/publisher APIs prevents integration errors.

  1. TEI URN Integration

Before:

  • Limited TEI support in product endpoints
  • No validation for TEI URN format

After:

  • Added tei_urns array to product schema
  • Implemented URI pattern validation: ^urn:tei:[a-zA-Z0-9]+:[a-zA-Z0-9\\.-]+:.+$

Why: TEI URNs are the primary discovery mechanism according to documentation.

  1. Collection Update Mechanism

Before:

  • Collection update reasons used undocumented enum values
  • No clear versioning mechanism

After:

  • Added proper collection_update_reason schema with values from docs:
    • INITIAL_RELEASE
    • VEX_UPDATED
    • ARTIFACT_UPDATED
    • ARTIFACT_ADDED
    • ARTIFACT_REMOVED
  • Added /collection/{uuid}/{version} endpoint

Why: Matches documented collection versioning requirements.

  1. Artifact Structure

Before:

  • Used inconsistent artifact structure with objects
  • Single checksum value per artifact

After:

  • Aligned with consumer API using formats array
  • Multiple checksums with algorithm type support
  • Consistent properties with consumer API

Why: Allows full interoperability between consumer/publisher systems.

  1. Component/Release Relationship

Before:

  • Flat hierarchy with limited relationships between objects

After:

  • Clear product → component → release → collection hierarchy
  • Proper reference arrays between object types

Why: Matches documented model and enables proper artifact organization.

Signed-off-by: Chris Langton <[email protected]>
- Add authentication schemes
- Create discovery endpoints (/.well-known/tea/{id})
- Standardise property naming between consumer/publisher APIs
- Implement advanced filtering for components and artifacts
- Add lifecycle status endpoints

Signed-off-by: Chris Langton <[email protected]>
@0x73746F66 0x73746F66 marked this pull request as ready for review May 10, 2025 13:16
@0x73746F66 0x73746F66 requested review from oej and madpah as code owners May 10, 2025 13:16
@ppkarwasz
Copy link
Contributor

@0x73746F66,

Thank you for updating the Publisher spec! 💯 It was certainly way of out of date!

However, I notice that the proposed spec has also a lot of subtle differences (schema names, parameter names, schema contents) from the current Consume API. Should we mark this PR as draft and work on it after the hackaton?

@0x73746F66 0x73746F66 marked this pull request as draft May 21, 2025 10:43
Copy link
Contributor

@taleodor taleodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the Pull Request! Did first read through for the document, there may be more things to discuss later, but raised issues in the comments that require further work and / or discussion.

post:
description: Create TEA Product entry for the supplied product identifier
operationId: createTeaProduct
requestBody:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, most of those properties are not currently part of the Consumer API spec. Additionally, it would be helpful to create a shared Schema object as we did throughout the Consumer API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, refer to PURL in the consumer API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, don't see how this is relevant to PURL at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to have to be clear then, because of you're referring to the markdown spec and find these then perhaps the consumer spec should be updated after beta1 to align like this has

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment, those are listed in markdown as possible identifiers, we shouldn't have a separate field for each type and instead expand TeaIdentifier. We may also want to be a little stricter about identifier types - it's possible that markdown needs to be updated here as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer API returns the constructed PURL as a dynamic response property

To be able to do that, the publisher needs to gather data elements and store them to construct the PURL

A string that is a full PURL can be described by th publisher as a loaded optional string, where some values are duplicated every API call

Or an API spec that implements the intent of returning a partial PURL could do so wth partial PURL support by not including some of the additional elements I added

I believe reduction of duplicates, and addition of fields available in a PURL, is the best way to implement the spec.
If we do anything lese it is either adding duplication, and it is putting the responcibility onto publishers to know how to construct a PURL, and being optional this will probabl be skipped...

If we want to do ourselves a favour later when we implement the PURL support for searching, we should be forward thinking now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer API returns the constructed PURL as a dynamic response property

A single entity may have several PURLs - that is the state of the PURL spec at the moment, I specifically brought this question on the last PURL community meeting. Having consistent and unique PURLs is a noble goal, but I don't think that is achievable.

As a basic example, imagine a Docker image that also has code hosted on GitHub. You may have PURL pkg:github/... and pkg:docker/... for the same thing where both are valid. That is also part of the reason why separate TEI is needed.

Therefore, I believe we should actually leave it to publishers to define their identifiers.

Finally, PURL is very domain (type) specific and things like sku and barcode are not even basic PURL elements, so not sure why they are here in this context.

$ref: '#/components/responses/404-object-by-id-not-found'
tags:
- TEA Component
/component/{componentIdentifier}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using same name for {uuid} across the spec and refer to the Schema uuid definition as in the Consumer API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense is we merge them and the meaning has no distinctions from publisher and consumer. i.e. a publisher does not have the primary key yet (until the database record is inserted) so it cannot be the same entity as the resultant response entity in the consumer API spec.

It is common for request models to be distinct from response models in all API designs - this is why I deveated deliberately

$ref: '#/components/operations/standardDelete'
tags:
- TEA Release
/collection:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss whether explicit CRUD on Collections should be allowed, see #152

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the referenced issue has a mixture of obviously good ideas but I believe that dogma might be winning over critical thinking a little here. Without question why these views are better than allowing publishers the flexibilities is problematic, we automatically impose an arbitrary constraint. Perhaps constraints should be sparingly applied to a stnadard like TEA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this is actually purely implementation prospective. I think we should add calls related to adding artifacts, then it becomes more clear that having both CRUD on collections and on artifacts conflicts with each other logically. For now, since artifact CRUD is not there it may be less visible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRUD is a por description for this spec, given R for read is not apparent and therefore chaining requests that rely on prior call results with new unknown UUIDs can be a logical constraint which only a read can resolve..

Better to reduce chaining requests for creation, and offer incremental limited updater methods expected to be called at much later dates, dependant on having a timely GET request for freshness

Would you agree a single POST with everything is both concise, organised, and performant? With consideration any complications that can be addressed with a GET + PATCH chain should be back ported to the POST to maintain these desirable characteristics and avoid dependency on complicated chaining logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me, but right now we have collection type update enum, which has things like Vex Updated, Artifact Updated, Artifact Removed, Artifact Added. If you do batch requests where you both add and update artifacts, this concept becomes murky.

Another issue with batch requests is that it eventually would lead to conversation about collection own lifecycle - which to me is more bureaucracy without much gain. I prefer a system where I upload an artifact and it becomes visible right away, instead of a system where I need to first upload an artifact (or a batch) and then approve a new collection on top of that. This is a point to discuss though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not only "approve" - it's in many cases "sing the collection" too. It's like a commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a conversation on this.

My current implementation idea is that signing should happen automatically on the backend - we can add constraints that artifacts should be signed, etc.

In SaaS scenarios, we can easily have dozens of releases per day for a single component and there can be many components. Each of those releases can have multiple collection versions. I cannot imagine a human signing all of them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keyless signing (Cosign signature) is definitely a superior choice, however we'll need to support signing with Tuff, CMK, and PGP

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosign as a service is not a superior choice for a lot of reasons. The software is cool though. Signing is discussed elsewhere and we need to open that discussion.

Signed-off-by: Chris Langton <[email protected]>
@0x73746F66
Copy link
Author

Awesome feedback @taleodor ty. I have some cnversations resolved and others with clarifications t odiscuss

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants