Replies: 1 comment 4 replies
-
So I think you're on the right track - I'd recommend that you don't manipulate your catalog at runtime. You can do this via a hook, but it's funky. Could this be achieved with a runtime parameter that dynamically filters your data? You could do something like this:
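A minimal sketch of that idea (the partition-key layout, parameter names, and function name here are illustrative assumptions, not Kedro APIs): a `PartitionedDataSet` loads as a dict mapping partition keys to load functions, so a node can filter that dict by a date range passed in as run parameters.

```python
from datetime import datetime


def filter_partitions(partitions: dict, params: dict) -> dict:
    """Keep only partitions whose year/month/day/hour path falls in [start, end].

    `partitions` is what a PartitionedDataSet loads: a dict mapping partition
    keys (relative paths, assumed here to look like "2021/03/05/14/events.csv")
    to load functions. `params` is assumed to carry ISO-format "start"/"end".
    """
    start = datetime.fromisoformat(params["start"])  # e.g. "2021-03-01T00:00:00"
    end = datetime.fromisoformat(params["end"])

    selected = {}
    for key, load_func in partitions.items():
        # Assumption: the first four path segments encode year/month/day/hour.
        year, month, day, hour = key.split("/")[:4]
        ts = datetime(int(year), int(month), int(day), int(hour))
        if start <= ts <= end:
            selected[key] = load_func
    return selected
```

You could then wire this up as a node taking the partitioned dataset and `parameters` as inputs, and supply the range at run time with something like `kedro run --params "start=2021-03-01T00:00:00,end=2021-03-31T23:00:00"` - no catalog mutation needed.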
-
Hi,
just starting to pick up Kedro, and I'm super excited after my first test pipelines. It was very easy to fit my existing code into nodes!
But I have searched for a design pattern I could use as a template for my use case, and haven't found a good one.
I have a use case where data is stored in partitions (paths) based on year/month/day/hour, and every hour holds several files.
I would like to build a dataset that includes a selection of multiple years/months/days/hours,
where the "selection" can be different for every run of the pipeline (think: include data from date/time to date/time).
A use case that must exist elsewhere, I think... but I have not found a good reference for a Kedro implementation of how to create a dataset like this.
After reading a bit about PartitionedDataSet, my idea is to create a node that will generate a list of paths/files (based on datetime input, maybe as run params and hooks) and use this as the filepath argument for a PartitionedDataSet.
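For context, a `PartitionedDataSet` pointed at the root of such a layout would be declared in `catalog.yml` roughly like this (dataset name and paths are illustrative; partitions under `path` are discovered recursively, so each `year/month/day/hour/file.csv` becomes one partition key):

```yaml
events_by_hour:
  type: PartitionedDataSet
  path: data/01_raw/events        # root folder containing year/month/day/hour subfolders
  dataset: pandas.CSVDataSet      # underlying dataset used to load each file
  filename_suffix: ".csv"         # only pick up CSV files
```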
But I guess this is a "normal" use case, so there might be some existing solutions that you know about.
Any pointers are gratefully accepted!
Otherwise I "just" need to figure out whether I should update the catalog from a node... and test whether I should update an existing dataset in the catalog with the filepath argument, or create a new dataset in the catalog every time I run the pipeline.
There are so many options :-)