This is a Kedro project, which was generated using kedro 0.19.2
.
Take a look at the Kedro documentation to get started.
The file in conf/base/catalog.yml
contains configuration that can be adjusted.
Running first steps in the pipeline will populate the data
folder accordingly. Due to git-lfs
issues, no
data is uploaded. Data embeddings with the actual laws are uploaded to huggingface for easier access. Repo id
ftopal/german-law-dataset
Embeddings computation takes some time so feel free to execute this in Colab with GPU environment.
Notebook for this and how to execute it is shown in notebooks/Compute Embeddings.ipynb
.
One can execute kedro run
to run everything end to end, or via kedro run --from-nodes=...
to run certain functions.
The workflow itself is defined in the src/law_buddy/pipelines/pipeline.py
In order to get the best out of the template:
- Don't remove any lines from the
.gitignore
file we provide - Make sure your results can be reproduced by following a data engineering convention
- Don't commit data to your repository
- Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in
conf/local/
You can install environment.yml
file to replicate the same environment. Additionally,
an environment_full.yml
is also provided which pins all the versions in the environment.
conda env create -f environment.yml
You can install dependencies in the environment.yml
file by passing them into pip.
You can run the project with:
kedro run
Have a look at the file src/tests/test_run.py
for instructions on how to write your tests. You can run your tests as follows:
pytest
You can configure the coverage threshold in your project's pyproject.toml
file under the [tool.coverage.report]
section.
Note: Using
kedro jupyter
orkedro ipython
to run your notebook provides these variables in scope:context
, 'session',catalog
, andpipelines
.
kedro jupyter notebook
kedro jupyter lab
And if you want to run an IPython session:
kedro ipython
To automatically strip out all output cell contents before committing to git
, you can use tools like nbstripout
.
For example, you can add a hook in .git/config
with nbstripout --install
.
This will run nbstripout
before anything is committed to git
.
Note: Your output cells will be retained locally.
Further information about building project documentation and packaging your project