add factory docs

Signed-off-by: Nok <[email protected]>
kedro-org · Nov 8, 2024 · 52840e2
1 parent 6e3e4d1
commit 52840e2
Showing 2 changed files with 42 additions and 1 deletion.
diff --git a/RELEASE.md b/RELEASE.md
@@ -14,6 +14,7 @@
 ## Documentation changes
 * Updated CLI autocompletion docs with new Click syntax.
 * Standardised `.parquet` suffix in docs and tests.
+* Added an example to explain how dataset factories work.
 
 ## Community contributions
 * [Hyewon Choi](https://github.com/hyew0nChoi)

diff --git a/docs/source/data/kedro_dataset_factories.md b/docs/source/data/kedro_dataset_factories.md
@@ -1,7 +1,47 @@
 # Kedro dataset factories
 You can load multiple datasets with similar configuration using dataset factories, introduced in Kedro `0.18.12`.
 
-The syntax allows you to generalise your configuration and reduce the number of similar catalog entries by matching datasets used in your project's pipelines to dataset factory patterns.
+Dataset factories introduce a syntax that lets you generalise your configuration and reduce the number of similar catalog entries by matching the datasets used in your project's pipelines against dataset factory patterns.
+
+For example, consider the following catalog entry:
+```yaml
+factory_data:
+  type: pandas.CSVDataset
+  filepath: data/01_raw/factory_data.csv
+```
+
+With a dataset factory, it can be rewritten as:
+```yaml
+"{placeholder}_data":
+  type: pandas.CSVDataset
+  filepath: data/01_raw/{placeholder}_data.csv
+```
+
+At runtime, the pattern is matched against the dataset names used as node inputs and outputs:
+```python
+            ...
+            node(
+                func=process_factory,
+                inputs="factory_data",
+                outputs=None,
+            ),
+            ...
+```
+This works like a **regular expression**, or an `f-string` in reverse. Here, the dataset name `factory_data` matches the pattern `{placeholder}_data` because it ends with the `_data` suffix, so `placeholder` resolves to `factory`.
+
+Similarly, if you change the name of the node input:
+```diff
+-                inputs="factory_data",
++                inputs="transaction_data",
+```
+
+The pattern then resolves to:
+```yaml
+transaction_data:
+  type: pandas.CSVDataset
+  filepath: data/01_raw/transaction_data.csv
+```
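
The "reverse `f-string`" matching described above can be sketched in plain Python. This is only an illustration and not Kedro's actual implementation: the helper names `pattern_to_regex` and `resolve_entry` are hypothetical, and the template assumes the resolved file keeps the `_data` suffix, as in the resolved `transaction_data` entry.

```python
import re

# Hypothetical pattern and catalog template, mirroring the YAML examples.
PATTERN = "{placeholder}_data"
TEMPLATE = {
    "type": "pandas.CSVDataset",
    "filepath": "data/01_raw/{placeholder}_data.csv",
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a factory-style pattern into a regex with one named group per placeholder."""
    # Splitting on "{name}" with a capturing group alternates between
    # literal text (even indexes) and placeholder names (odd indexes).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)
        else:
            regex += f"(?P<{part}>.+)"
    return re.compile(regex)


def resolve_entry(dataset_name: str, pattern: str, template: dict):
    """Return the template with placeholders filled in, or None if the name does not match."""
    match = pattern_to_regex(pattern).fullmatch(dataset_name)
    if match is None:
        return None
    values = match.groupdict()
    # Only string values can contain placeholders; other values pass through unchanged.
    return {
        key: value.format(**values) if isinstance(value, str) else value
        for key, value in template.items()
    }


if __name__ == "__main__":
    print(resolve_entry("factory_data", PATTERN, TEMPLATE))
    print(resolve_entry("transaction_data", PATTERN, TEMPLATE))
    print(resolve_entry("model_output", PATTERN, TEMPLATE))
```

In this sketch, a name without the `_data` suffix (such as `model_output`) does not match the pattern, so no catalog entry is produced for it.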
+
 
 ```{warning}
 Datasets are not included in the core Kedro package from Kedro version **`0.19.0`**. Import them from the [`kedro-datasets`](https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets) package instead.