This example shows how you can create a Pachyderm pipeline to automatically version and save data you've labeled in Superb.ai to use in downstream machine learning workflows.
The integration connects to your SuperbAI project and ingests the data into Pachyderm on a cron schedule.
Once your data is in Pachyderm, you can run data tests, train a model, or perform any other type of data automation, all with full end-to-end reproducibility.
You will need an account for each of the tools. Free accounts can now be used to run this example!
- Superb.AI account
- Set up a Pachyderm Hub cluster
- Generate an Access API Key in SuperbAI.
- Put the key and your user name in the secrets.json file.
- Create the Pachyderm secret
pachctl create secret -f secrets.json
- Create the cron pipeline to synchronize your Sample project from SuperbAI to Pachyderm. This pipeline will run every minute to check for new data (you can configure it to run more or less often in the cron spec in sample_project.yml).
pachctl create pipeline -f sample_project.yml
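
The steps above can be sketched in Python as a small helper that writes secrets.json and builds a cron input fragment. This is only an illustration under stated assumptions: it assumes the secret file is a standard Kubernetes Secret manifest, and the secret name `superbai-credentials`, the field names `access_key` and `user_name`, and the cron input name `tick` are hypothetical. Use the names that the secrets.json and sample_project.yml shipped with this example actually expect.

```python
import json


def build_secret_manifest(access_key: str, user_name: str) -> dict:
    """Return a Kubernetes Secret manifest as a plain dict.

    The secret name and the two field names are hypothetical placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "superbai-credentials"},  # hypothetical name
        "type": "Opaque",
        # stringData takes plain-text values, so no manual base64 encoding
        "stringData": {
            "access_key": access_key,  # hypothetical field name
            "user_name": user_name,    # hypothetical field name
        },
    }


def build_cron_input(interval: str = "@every 1m") -> dict:
    """Return a cron input fragment for a pipeline spec.

    The input name "tick" is a hypothetical placeholder.
    """
    return {"cron": {"name": "tick", "spec": interval}}


if __name__ == "__main__":
    # Replace the placeholder values with your own key and user name.
    manifest = build_secret_manifest("YOUR_ACCESS_KEY", "YOUR_USER_NAME")
    with open("secrets.json", "w") as f:
        json.dump(manifest, f, indent=2)

    # Example: check for new data every five minutes instead of every minute.
    print(json.dumps(build_cron_input("@every 5m"), indent=2))
```

Running the script writes a secrets.json that can then be passed to `pachctl create secret -f secrets.json`. The printed fragment shows the shape a cron input might take if you change the interval in sample_project.yml.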