Pachyderm – Automate data transformations with data versioning and lineage

Pachyderm is cost-effective at scale and enables data engineering teams to automate complex pipelines with sophisticated data transformations across any type of data. It provides parallelized processing of multi-stage, language-agnostic pipelines with data versioning and data lineage tracking, making it a CI/CD engine for data.

Features

Data-driven pipelines trigger automatically when changes to the data are detected.
Immutable data lineage with data versioning of any data type.
Autoscaling and parallel processing built on Kubernetes for resource orchestration.
Uses standard object stores for data storage with automatic deduplication.
Runs across all major cloud providers and on-premises installations.
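
To make the versioning, deduplication, and data-driven triggering ideas above more concrete, here is a minimal conceptual sketch in Python. It is not Pachyderm's implementation or API: `ContentStore`, `commit`, and `changed_paths` are hypothetical names invented for this illustration, and the sketch assumes content is identified by its SHA-256 digest.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}    # hex digest -> bytes; identical content is kept once
        self.commits = []  # each commit is an immutable map of path -> digest

    def commit(self, files):
        """Record a new snapshot of {path: bytes} and return its commit index."""
        snapshot = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # deduplicate by content
            snapshot[path] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def changed_paths(self, old, new):
        """Return paths that were added or modified between two commits."""
        before, after = self.commits[old], self.commits[new]
        return sorted(p for p, d in after.items() if before.get(p) != d)


store = ContentStore()
first = store.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,6"})
second = store.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,7", "c.csv": b"1,2,3"})

# A data-driven pipeline would only need to reprocess what changed.
to_process = store.changed_paths(first, second)
print(to_process)
print(len(store.blobs))
```

In this sketch, `c.csv` shares its content with `a.csv`, so no additional blob is stored for it, and only the added or modified paths are reported as needing work.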

Getting Started

To start deploying end-to-end, version-controlled data pipelines, run Pachyderm locally or deploy it on AWS, GCE, or Azure in about 5 minutes.

You can also refer to our complete documentation to see tutorials, check out example projects, and learn about advanced features of Pachyderm.

If you'd like to see some examples and learn about core use cases for Pachyderm:

Documentation

Official Documentation

Community

Keep up to date and get Pachyderm support via:

Follow us on Twitter.
Join our community Slack channel to get help from the Pachyderm team and other users.

Contributing

To get started, sign the Contributor License Agreement.

You should also check out our contributing guide.

Send us PRs; we would love to see what you do! You can also check our GitHub issues for things labeled "help-wanted" as a good place to start. We're sometimes bad about keeping that label up to date, so if you don't see any, just let us know.

Usage Metrics

Pachyderm automatically reports anonymized usage metrics. These metrics help us understand how people are using Pachyderm and make it better. They can be disabled by setting the METRICS environment variable to false in the pachd container.
