Kedro supports a number of datasets out of the box, but you can also add support for any proprietary data format or filesystem in your pipeline.
You can find further information about how to add support for custom datasets in specific documentation covering advanced usage.
One of the easiest ways to contribute back to Kedro is to share a custom dataset. Kedro has a kedro.extras.datasets sub-package where you can add a new custom dataset implementation to share it with others. You can find out more in the Kedro contribution guide on GitHub.
To contribute your custom dataset:
- Add your dataset package to kedro/extras/datasets/. For example, in our ImageDataSet example, the directory structure should be:

  kedro/extras/datasets/image
  ├── __init__.py
  └── image_dataset.py

- If the dataset is complex, create a README.md file to explain how it works and document its API.
- The dataset should be accompanied by full test coverage in tests/extras/datasets.
- Make a pull request against the main branch of Kedro's GitHub repository.
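
The file layout described in the steps above can be sketched with a small helper that creates empty placeholder files. This is only an illustration, not part of Kedro: the function name scaffold_dataset_package is hypothetical, and the location and name of the test module (tests/extras/datasets/image/test_image_dataset.py) are an assumption that the tests mirror the package layout.

```python
import tempfile
from pathlib import Path
from typing import List


def scaffold_dataset_package(repo_root: Path, name: str = "image") -> List[Path]:
    """Create empty placeholder files for a contributed dataset package.

    Hypothetical helper for illustration; it only touches files and does
    not generate any dataset code.
    """
    package_dir = repo_root / "kedro" / "extras" / "datasets" / name
    # Assumption: tests mirror the package layout under tests/extras/datasets.
    tests_dir = repo_root / "tests" / "extras" / "datasets" / name

    files = [
        package_dir / "__init__.py",
        package_dir / f"{name}_dataset.py",
        package_dir / "README.md",  # optional, useful for complex datasets
        tests_dir / "__init__.py",
        tests_dir / f"test_{name}_dataset.py",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return files


if __name__ == "__main__":
    # Run against a throwaway directory so no real checkout is modified.
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_dataset_package(Path(tmp)):
            print(created.relative_to(tmp))
```

To use it on a real clone, pass the path to your local Kedro checkout as repo_root, then fill in the dataset implementation and its tests before opening the pull request.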