Custom datasets

Kedro supports a number of datasets out of the box, but you can also add support for any proprietary data format or filesystem in your pipeline.

For details on how to implement a custom dataset, see the Kedro documentation on advanced usage.
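As a rough illustration of what adding support for a format involves, here is a minimal sketch of a dataset class. It assumes the `AbstractDataSet` base class from `kedro.io` with its `_load`, `_save` and `_describe` hooks, as found in the Kedro releases that ship `kedro.extras.datasets`. `PlainTextDataSet` is a hypothetical name used only for this example, and the fallback base class is a stand-in so that the sketch runs without Kedro installed; it is not Kedro's real implementation.

```python
from pathlib import Path
from typing import Any, Dict

try:
    # Assumed import path for Kedro releases that include kedro.extras.datasets.
    from kedro.io import AbstractDataSet
except ImportError:
    # Stand-in so the sketch runs without Kedro. It mirrors only the
    # load/save delegation and is not Kedro's real base class.
    class AbstractDataSet:
        def load(self) -> Any:
            return self._load()

        def save(self, data: Any) -> None:
            self._save(data)


class PlainTextDataSet(AbstractDataSet):
    """Hypothetical dataset that reads and writes a UTF-8 text file."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> str:
        # Return the whole file content as a single string.
        return self._filepath.read_text(encoding="utf-8")

    def _save(self, data: str) -> None:
        # Create missing parent directories before writing.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(data, encoding="utf-8")

    def _describe(self) -> Dict[str, Any]:
        # Describe the dataset instance, e.g. for logging.
        return {"filepath": str(self._filepath)}
```

Calling `PlainTextDataSet("data/notes.txt").save("hello")` followed by `.load()` should round-trip the text. A real proprietary format would replace the two `pathlib` calls with its own reader and writer.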

Contributing a custom dataset implementation

One of the easiest ways to contribute back to Kedro is to share a custom dataset. Kedro has a kedro.extras.datasets sub-package where you can add a new custom dataset implementation and share it with others. You can find out more in the Kedro contribution guide on GitHub.

To contribute your custom dataset:

Add your dataset package to kedro/extras/datasets/.

For example, for the ImageDataSet dataset, the directory structure should be:

kedro/extras/datasets/image
├── __init__.py
└── image_dataset.py

If the dataset is complex, create a README.md file to explain how it works and document its API.
Add tests with full coverage for the dataset in tests/extras/datasets.
Make a pull request against the main branch of Kedro's GitHub repository.
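The layout described in the steps above can be scaffolded with a short script. This is only a sketch: `scaffold_dataset_package` is a hypothetical helper, not part of Kedro, and the test file name and the per-dataset test sub-directory are assumptions about how the tests might be organised under tests/extras/datasets.

```python
from pathlib import Path
from typing import List


def scaffold_dataset_package(repo_root: str, name: str) -> List[Path]:
    """Create empty placeholder files for a contributed dataset.

    `repo_root` is the root of a local Kedro checkout and `name` is the
    dataset package name, e.g. "image".
    """
    root = Path(repo_root)
    package_dir = root / "kedro" / "extras" / "datasets" / name
    # Assumed location: one test sub-directory per dataset package.
    tests_dir = root / "tests" / "extras" / "datasets" / name

    layout = (
        (package_dir, ["__init__.py", f"{name}_dataset.py", "README.md"]),
        (tests_dir, ["__init__.py", f"test_{name}_dataset.py"]),
    )

    created = []
    for directory, filenames in layout:
        directory.mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            path = directory / filename
            path.touch(exist_ok=True)  # leaves existing files untouched
            created.append(path)
    return created
```

Running `scaffold_dataset_package(".", "image")` from the root of a checkout would create `kedro/extras/datasets/image/image_dataset.py` alongside the other placeholders. The README.md is only needed if the dataset is complex.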
