[Infoplat-1563] Validate protobuf schema evolution compatibility using schema registry (#824)

* [WIP] add schema registry based validator and associated readme

* add beholder validator and tests to validate a schema using the beholder config

* WIP add beholder ci validate command to the github actions

* Update the ci beholder validator actions and add changeset

* update the beholder validator ci action

* update the readme as per review comments

* correct the default branch for the github actions while determining the changed files

* add the working directory to github compose action to take compose file from

* update the docker compose step in actions for beholder validator

* update the registries for the docker images to be pulled in

* add the repository to use aws account id for the login action to ecr

* add the login ecr id to pull from docker repositories

* correct the latest image to use the input image tag

* add origin so checkout can work

* fix the issue with comma after git checkout

* correct the full path to docker registry to run the docker image

* correct the path to repo for beholder schemas according to the docker file

* fetch depth of 0 so it can fetch all the repos

* try replacing the depth with actual number instead of string

* fetch all remote branches so that the git checkout succeeds

* fetch the origin branch and then use the same for git diff

* add the repo path to the changed files so it can pick changed schemas from repo path

* correct the path variable

* fix the issue with changed files not being picked

* remove the trailing character to fix the EOF errors

* fix the issue with schema registry url missing actual host name

* update the validate schema in current branch step to only run if the branch is not default branch

* fix issues with the actions not recognizing 2 or more changed files

* refactor the actions for schema beholder validate

* update the readme after the tests are concluded

* remove docker registry in favor of only using aws docker registry

* add default branch as environment variable so that they cannot be used for injection of code

* add back deleted files

* add image tag as env variable to avoid code injection attacks and remove contributing section

* update the actions path with the env variable to avoid code injection
anukin authored Feb 4, 2025
1 parent 6f4520c commit f70c304
Showing 7 changed files with 499 additions and 67 deletions.
5 changes: 5 additions & 0 deletions .changeset/tough-bobcats-clean.md
@@ -0,0 +1,5 @@
---
"ci-beholder-validator": major
---

- Added a beholder schema validator to include actions relating to beholder
281 changes: 281 additions & 0 deletions actions/ci-beholder-validator/README.md
@@ -0,0 +1,281 @@
# Schema Registry Validator Action

Easily validate your schema changes for backward compatibility using a **local
Redpanda** instance within a GitHub Actions workflow. The **Schema Registry
Validator** action ensures that you catch breaking schema changes before merging
pull requests.

---

## Table of Contents

1. [Overview](#overview)
2. [How It Works](#how-it-works)
3. [Requirements & Prerequisites](#requirements--prerequisites)
4. [Usage](#usage)
5. [Example: End-to-End Integration](#example-end-to-end-integration)
6. [Configuration](#configuration)
   - [Required Inputs](#required-inputs)
   - [Optional AWS Inputs](#optional-aws-inputs)
7. [Schema Configuration (`beholder.yaml`)](#schema-configuration-beholderyaml)
8. [Troubleshooting](#troubleshooting)

---

## Overview

When you update .proto or .avsc schema files, you risk introducing incompatible
changes that can break your applications. This action:

1. Starts a local Redpanda registry for schema validation
2. Pulls a validator image from AWS ECR
3. Detects changed schema files in your pull request
4. Checks schema evolution compatibility against your default branch
5. Reports any failures directly in GitHub Actions logs

This helps maintain stable contracts across services that rely on these
schemas, ensuring changes remain backward-compatible.
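
As a rough illustration of the change-detection step, the sketch below filters a
list of changed paths (for example, the output of `git diff --name-only`) down
to schema files. The function name and the suffix set are assumptions made for
illustration; the action's actual detection logic may differ.

```python
from pathlib import PurePosixPath

# Assumption: only these suffixes count as schema files, as named in the Overview.
SCHEMA_SUFFIXES = {".proto", ".avsc"}


def changed_schema_files(changed_paths):
    """Return the changed paths that look like schema files, in input order."""
    return [
        path
        for path in changed_paths
        if PurePosixPath(path).suffix in SCHEMA_SUFFIXES
    ]


if __name__ == "__main__":
    # Hypothetical list of files changed in a pull request.
    sample = ["schemas/pet.proto", "README.md", "schemas/on_ramp_event.avsc"]
    print(changed_schema_files(sample))
```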

---

## How It Works

1. **Checkout & Detect Changes**

   - The action clones your repo, fetches the default branch, and detects which
     schema files changed in the pull request.

2. **Spin Up Redpanda**

   - A Docker Compose file starts a local Redpanda service (with schema registry)
     inside your GitHub Actions runner.

3. **Pull Schema Validator**

   - The action retrieves the schema-validator Docker image from your specified
     registry (e.g., AWS ECR).

4. **Validate “Master” (Default) Branch**

   - Checks all schemas on your default branch to ensure they’re valid.

5. **Validate PR Branch (Changed Files)**

   - Only the files that changed in this PR are validated. If a breaking change is
     introduced, you’ll see the failure logs in your PR’s workflow run.

6. **Report Success or Failure**

   - The action logs details about which files passed or failed schema
     compatibility checks. A failing run will block merging until resolved.
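
The validator image's internals are not shown in this README. As a hedged
sketch of what a compatibility check against a schema registry can look like,
the code below builds a request for the Confluent-style
`POST /compatibility/subjects/{subject}/versions/latest` endpoint, which
Redpanda's schema registry also exposes. The registry URL, the subject name,
and both function names are assumptions for illustration only.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(registry_url, subject, schema_text, schema_type):
    """Build a POST request asking whether schema_text is compatible with
    the latest registered version of the given subject."""
    quoted_subject = urllib.parse.quote(subject, safe="")
    url = (
        f"{registry_url.rstrip('/')}"
        f"/compatibility/subjects/{quoted_subject}/versions/latest"
    )
    body = json.dumps({"schema": schema_text, "schemaType": schema_type})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def is_compatible(request):
    """Send the request and return the registry's verdict.

    Requires a reachable registry, so it is not called in this sketch.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)["is_compatible"]


if __name__ == "__main__":
    # Hypothetical values: a local registry address and a made-up subject name.
    request = build_compatibility_request(
        "http://localhost:8081", "my_app-Pet", 'syntax = "proto3";', "PROTOBUF"
    )
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes the payload
easy to inspect without a running registry.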

---

## Requirements & Prerequisites

- **GitHub repository** with Actions enabled
- **Docker** available in the GitHub Actions runner
- A `beholder.yaml` file describing your schemas (see
  [Schema Configuration](#schema-configuration-beholderyaml))
- If you’re pulling from AWS ECR:
  - Valid AWS credentials or IAM role
  - [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials)
    (used within the workflow)

---

## Usage

Add this action to your workflow as shown in the example below. An example repo
can be found
[here](https://github.com/smartcontractkit/schema_validator_example).

```yaml
name: schema-validator-example
on:
  push:
    branches:
      - main
      - test
  pull_request:
    paths:
      - ".github/workflows/schema_validator_example.yaml"
      - "**.proto"
      - "**.avsc"
      - "**/beholder.yaml"
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  validate-schema:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Add this line to enable OIDC token for the job
      contents: read # This is required for actions/checkout
    steps:
      - name: Validate
        uses: smartcontractkit/.github/actions/ci-beholder-validator@{{sha-of-action}}
        with:
          role-session-name: schema-validator-example
          aws-role-arn: ${{ secrets.aws-role-arn }}
          aws-region: us-west-2
          aws-account-number: ${{ secrets.AWS_ACCOUNT_ID }}
          image-tag: "1f3c06f003948fe07df0f40287f217f9d9aa778c"
```
Note: replace `{{sha-of-action}}` with the commit SHA of the action version you
want to use.

A detailed README and the source code for the docker image can be found
[here](https://github.com/smartcontractkit/atlas/tree/master/beholder/schema_validator).

---

## Example: End-to-End Integration

Below is a **complete example** of how you might integrate this action into your
repository. It assumes:

- Your default branch is named `main`.
- You have a schema file at `schemas/pet.proto`.
- You have a `beholder.yaml` referencing `./schemas/pet.proto`.

### Step 1: Prepare `beholder.yaml`

Create a `beholder.yaml` in the root of your repo that lists your schemas.
Note: the `schema` field **must** give the path to the schema file relative to
the root of the repo.

```yaml
beholder:
  domain: my_app
  schemas:
    - entity: Pet
      schema: "./schemas/pet.proto"
```

### Step 2: Create `.github/workflows/schema_validation.yml`

```yaml
name: schema-validator-example
on:
  push:
    branches:
      - main
      - test
  pull_request:
    paths:
      # Path of this workflow file, so edits to it also trigger a run
      - ".github/workflows/schema_validation.yml"
      - "**.proto"
      - "**.avsc"
      - "**/beholder.yaml"
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  validate-schema:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Add this line to enable OIDC token for the job
      contents: read # This is required for actions/checkout
    steps:
      - name: Validate
        uses: smartcontractkit/.github/actions/ci-beholder-validator@{{sha-of-action}}
        with:
          role-session-name: schema-validator-example
          aws-role-arn: ${{ secrets.aws-role-arn }}
          aws-region: us-west-2
          aws-account-number: ${{ secrets.AWS_ACCOUNT_ID }}
          image-tag: "1f3c06f003948fe07df0f40287f217f9d9aa778c"
```

If you are creating a new repo, you will need additional setup, such as IAM
roles and permissions. A sample PR can be found
[here](https://github.com/smartcontractkit/infra/pull/6974/files). Once a PR
like this has been applied, you will have access to
`${{ secrets.aws-role-arn }}`. `aws-account-number` is used to fetch the image
from ECR, so it is the AWS account ID of the production account. You can add
the secrets to the repo by following the guide
[here](https://smartcontract-it.atlassian.net/wiki/spaces/RE/pages/906985607/GitHub+Repo+Configuration#GitHub-Secrets).

### Step 3: Open a Pull Request

When you push changes to `schemas/pet.proto` in a feature branch and open a PR,
GitHub Actions will automatically:

1. Spin up Redpanda for testing
2. Pull and run the schema-validator container
3. Compare changes in `pet.proto` to what’s in the `main` branch
4. Fail if there’s a backward-incompatible change

If validation passes, you’ll see a success status in your PR checks.

---

## Configuration

### Required Inputs

- `docker-registry`: Registry to pull the validator image from (currently `aws`).

### Optional AWS Inputs

If `docker-registry` is `aws`, you can supply:

- `aws-region`: AWS region hosting ECR (e.g., `us-west-2`).
- `aws-role-arn`: IAM Role ARN to assume for ECR.
- `aws-role-duration-seconds`: Session duration (default: 900).
- `aws-account-number`: AWS account ID (required if your ECR is in a non-default
  account).
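
To show how these inputs can fit together, the sketch below composes a full
image reference from the account number, region, and tag, using the standard
ECR registry host format `<account>.dkr.ecr.<region>.amazonaws.com`. The
function name and the repository name are hypothetical; the action may build
the reference differently.

```python
def ecr_image_uri(account_number, region, repository, image_tag):
    """Compose a full ECR image reference suitable for `docker pull`."""
    registry = f"{account_number}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{image_tag}"


if __name__ == "__main__":
    # Hypothetical account number and repository name, for illustration only.
    print(ecr_image_uri("123456789012", "us-west-2", "example-repo", "latest"))
```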

---

## Schema Configuration (`beholder.yaml`)

The `beholder.yaml` file describes your schemas and their locations. Here’s an
example:

```yaml
beholder:
  domain: your_domain
  schemas:
    - entity: UserEvent
      schema: "./schemas/user_event.proto"
    - entity: OnRampEvent
      schema: "./schemas/on_ramp_event.avsc"
```

- `domain`: Logical namespace for your schemas, e.g., `payment_service`.
- `schemas`: List of entities, each with a friendly `entity` name and `schema` path.
- Paths such as `schema: "./schemas/user_event.proto"` must match your actual
  folder structure.
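
A quick local check can catch path mismatches before CI does. The sketch below
assumes the file has already been parsed into a dictionary (for example with
PyYAML's `yaml.safe_load`) and reports every listed schema path that does not
exist under the repo root. The function name is hypothetical and this check is
not part of the action itself.

```python
from pathlib import Path


def missing_schema_paths(config, repo_root):
    """Return schema paths from the parsed config that are not files under repo_root."""
    root = Path(repo_root)
    missing = []
    for entry in config.get("beholder", {}).get("schemas", []):
        schema_path = entry["schema"]
        # Paths in beholder.yaml are relative to the root of the repo.
        if not (root / schema_path).is_file():
            missing.append(schema_path)
    return missing


if __name__ == "__main__":
    # Parsed form of the example beholder.yaml above.
    config = {
        "beholder": {
            "domain": "your_domain",
            "schemas": [
                {"entity": "UserEvent", "schema": "./schemas/user_event.proto"},
                {"entity": "OnRampEvent", "schema": "./schemas/on_ramp_event.avsc"},
            ],
        }
    }
    print(missing_schema_paths(config, "."))
```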

---

## Troubleshooting

1. AWS Authentication Failed
   - Check if `aws-role-arn` is correct.
   - Ensure the role has ECR pull permissions.
   - Verify `aws-region` matches your ECR location.
2. Schema Validation Failed
   - Make sure your changes are backward-compatible.
   - Verify schema file paths in `beholder.yaml` match the actual files.
   - Check logs for parse or registry errors.
3. Docker Issues
   - The runner must have Docker installed (e.g., `ubuntu-latest` includes it).
   - Check network connectivity to ECR or your registry.
   - If you see “pull access denied,” confirm you’re logged in or have correct
     permissions.
4. File Not Detected
   - Verify your PR modifies `.proto`, `.avsc`, or a path included under
     `on.pull_request.paths`.
   - Ensure `beholder.yaml` is in the correct location.
5. Workflow Fails to Start
   - Confirm your workflow is enabled in the Actions tab.
   - Check for syntax errors in your YAML file.
   - Ensure the workflow is in the correct directory.

---