Commit 608b6cd

Merge branch 'release/0.6.1' into main

htahir1 committed Feb 7, 2022
2 parents 8d689a1 + d9857d1
Showing 146 changed files with 3,022 additions and 1,096 deletions.
4 changes: 1 addition & 3 deletions .github/codecov.yml
@@ -10,9 +10,7 @@ coverage:
default:
threshold: 1% # allow coverage to drop by 1% before failing the PR

patch:
default:
threshold: 1% # allow coverage to drop by 1% before failing the PR
patch: off

# do not run coverage on changes
changes: off
1 change: 1 addition & 0 deletions .github/workflows/main.yml
@@ -68,6 +68,7 @@ jobs:
python -m poetry run zenml integration install mlflow -f
python -m poetry run zenml integration install gcp -f
python -m poetry run zenml integration install kubeflow -f
python -m poetry run zenml integration install azure -f
python -m poetry run pip install click~=8.0.3
- name: Lint
1 change: 1 addition & 0 deletions .github/workflows/pull_request.yml
@@ -67,6 +67,7 @@ jobs:
python -m poetry run zenml integration install mlflow -f
python -m poetry run zenml integration install gcp -f
python -m poetry run zenml integration install kubeflow -f
python -m poetry run zenml integration install azure -f
python -m poetry run pip install click~=8.0.3
- name: Lint
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/update_todos.yml
@@ -31,7 +31,7 @@ jobs:
JIRA_USERNAME: [email protected]
JIRA_API_TOKEN: ${{ secrets.JIRA_ACCESS_TOKEN }}
JIRA_BOARD_ID: 10004
JIRA_ISSUE_TYPE_ID: 10017
JIRA_ISSUE_TYPE_ID: 10027
JIRA_DONE_STATUS_CATEGORY_ID: 3
JIRA_ISSUE_LABEL: todo_comment
JIRA_REMOVED_TODO_LABEL: todo_removed
6 changes: 3 additions & 3 deletions .gitignore
@@ -68,8 +68,9 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
# mkdocs documentation
docs/mkdocs/api_docs
docs/mkdocs/index.md

# PyBuilder
target/
@@ -171,7 +172,6 @@ local_test/
!cloudbuild.yaml
!cloudbuild-develop.yaml
docs/book/_build/
docs/mkdocs/api_docs/
zenml_examples/

# GitHub Folder YAML files not to be ignored
6 changes: 3 additions & 3 deletions README.md
@@ -8,7 +8,7 @@

At its core, **ZenML pipelines execute ML-specific workflows** from sourcing data to splitting, preprocessing, training, all the way to the evaluation of results and even serving. There are many built-in batteries to support common ML development tasks. ZenML is not here to replace the great tools that solve these individual problems. Rather, it **integrates natively with popular ML tooling** and gives standard abstraction to write your workflows.

🎉 **Version 0.6.0 out now!** [Check out the release notes here](https://github.com/zenml-io/zenml/releases).
🎉 **Version 0.6.1 out now!** [Check out the release notes here](https://github.com/zenml-io/zenml/releases).

[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/zenml)](https://pypi.org/project/zenml/)
[![PyPI Status](https://pepy.tech/badge/zenml)](https://pepy.tech/project/zenml)
@@ -52,7 +52,7 @@ ZenML pipelines are designed to be written early on the development lifecycle. D
| 🧘‍♀️ **[ZenML 101]** | New to ZenML? Here's everything you need to know! |
| ⚛️ **[Core Concepts]** | Some key terms and concepts we use. |
| 🗃 **[Functional API Guide]** | Build production ML pipelines with simple functions. |
| 🚀 **[New in v0.6.0]** | New features, bug fixes. |
| 🚀 **[New in v0.6.1]** | New features, bug fixes. |
| 🗳 **[Vote for Features]** | Pick what we work on next! |
| 📓 **[Docs]** | Full documentation for creating your own ZenML pipelines. |
| 📒 **[API Reference]** | The detailed reference for ZenML's API. |
@@ -67,7 +67,7 @@ ZenML pipelines are designed to be written early on the development lifecycle. D
[ZenML 101]: https://docs.zenml.io/
[Core Concepts]: https://docs.zenml.io/core-concepts
[Functional API Guide]: https://docs.zenml.io/v/docs/guides/functional-api
[New in v0.6.0]: https://github.com/zenml-io/zenml/releases
[New in v0.6.1]: https://github.com/zenml-io/zenml/releases
[Vote for Features]: https://zenml.io/discussion
[Docs]: https://docs.zenml.io/
[API Reference]: https://apidocs.zenml.io/
42 changes: 42 additions & 0 deletions RELEASE_NOTES.md
@@ -1,3 +1,45 @@
# 0.6.1

ZenML 0.6.1 is out and it's all about the cloud ☁️! We have improved AWS integration and a brand-new [Azure](https://github.com/zenml-io/zenml/tree/0.6.1/src/zenml/integrations/azure) integration! Run your pipelines on AWS and Azure now and let us know how it went on our [Slack](https://zenml.io/slack-invite).

Smaller changes that you'll notice include much-awaited updates and fixes, including the first iterations of scheduling pipelines and tracking more reproducibility-relevant data in the metadata store.

For a detailed look at what's changed, see below.

## What's changed

* Add MVP for scheduling by @htahir1 in https://github.com/zenml-io/zenml/pull/354
* Add S3 artifact store and filesystem by @schustmi in https://github.com/zenml-io/zenml/pull/359
* Update 0.6.0 release notes by @alex-zenml in https://github.com/zenml-io/zenml/pull/362
* Fix cuda-dev base container image by @stefannica in https://github.com/zenml-io/zenml/pull/361
* Mark ZenML as typed package by @schustmi in https://github.com/zenml-io/zenml/pull/360
* Improve error message if ZenML repo is missing inside kubeflow container entrypoint by @schustmi in https://github.com/zenml-io/zenml/pull/363
* Spell whylogs and WhyLabs correctly in our docs by @stefannica in https://github.com/zenml-io/zenml/pull/369
* Feature/add readme for mkdocs by @AlexejPenner in https://github.com/zenml-io/zenml/pull/372
* Cleaning up the assets pushed by gitbook automatically by @bcdurak in https://github.com/zenml-io/zenml/pull/371
* Turn codecov off for patch updates by @htahir1 in https://github.com/zenml-io/zenml/pull/376
* Minor changes and fixes by @schustmi in https://github.com/zenml-io/zenml/pull/365
* Only include python files when building local docs by @schustmi in https://github.com/zenml-io/zenml/pull/377
* Prevent access to repo during step execution by @schustmi in https://github.com/zenml-io/zenml/pull/370
* Removed duplicated Section within docs by @AlexejPenner in https://github.com/zenml-io/zenml/pull/379
* Fixing the materializer registry to spot sub-classes of defined types by @bcdurak in https://github.com/zenml-io/zenml/pull/368
* Computing hash of step and materializer works in notebooks by @htahir1 in https://github.com/zenml-io/zenml/pull/375
* Sort requirements to improve docker build caching by @schustmi in https://github.com/zenml-io/zenml/pull/383
* Make sure the s3 artifact store is registered when the integration is activated by @schustmi in https://github.com/zenml-io/zenml/pull/382
* Make MLflow integration work with kubeflow and scheduled pipelines by @stefannica in https://github.com/zenml-io/zenml/pull/374
* Reset _has_been_called to False ahead of pipeline.connect by @AlexejPenner in https://github.com/zenml-io/zenml/pull/385
* Fix local airflow example by @schustmi in https://github.com/zenml-io/zenml/pull/366
* Improve and extend base materializer error messages by @schustmi in https://github.com/zenml-io/zenml/pull/380
* Windows CI issue by @schustmi in https://github.com/zenml-io/zenml/pull/389
* Add the ability to attach custom properties to the Metadata Store by @bcdurak in https://github.com/zenml-io/zenml/pull/355
* Handle case when return values do not match output by @AlexejPenner in https://github.com/zenml-io/zenml/pull/386
* Quickstart code in docs fixed by @AlexejPenner in https://github.com/zenml-io/zenml/pull/387
* Fix mlflow tracking example by @stefannica in https://github.com/zenml-io/zenml/pull/393
* Implement azure artifact store and fileio plugin by @schustmi in https://github.com/zenml-io/zenml/pull/388
* Create todo issues with separate issue type by @schustmi in https://github.com/zenml-io/zenml/pull/394
* Log that steps are cached while running pipeline by @alex-zenml in https://github.com/zenml-io/zenml/pull/381
* Schedule added to context for all orchestrators by @AlexejPenner in https://github.com/zenml-io/zenml/pull/391

# 0.6.0

ZenML 0.6.0 is out now. We've made some big changes under the hood, but our biggest public-facing addition is our new integration to support all your data logging needs: [`whylogs`](https://github.com/whylabs/whylogs). Our core architecture was [thoroughly reworked](https://github.com/zenml-io/zenml/pull/305) and is now in a much better place to support our ongoing development needs.
12 changes: 9 additions & 3 deletions docs/README.md
@@ -11,7 +11,9 @@ The documentation source files can be found in this repository at `docs/book`

## API Docs

The ZenML API docs are generated from our python docstrings using [Sphinx](https://www.sphinx-doc.org/en/master/). The API docs will be automatically updated each release using a Github workflow and can be found at [https://apidocs.zenml.io](https://apidocs.zenml.io/).
The ZenML API docs are generated from our Python docstrings using [mkdocs](https://www.mkdocs.org/).
The API docs will be automatically updated each release using a GitHub workflow and can be found
at [https://apidocs.zenml.io](https://apidocs.zenml.io/).

### Building the API Docs locally

@@ -24,9 +26,13 @@ poetry install
poetry run zenml integration install -f
poetry run pip install click~=8.0.3 typing-extensions~=3.10.0.2
```
* Run `poetry run bash scripts/generate-docs.sh` from the repository root
* Run `poetry run bash scripts/serve_api_docs.sh` from the repository root -
running it from elsewhere can lead to unexpected errors. This script composes the docs
hierarchy and serves it (by default at http://127.0.0.1:8000/).
* If port 8000 is already taken, you can instead change into the docs folder in your terminal
and run `mkdocs serve` from there.

The generated HTML files will be inside the directory `docs/sphinx_docs/_build/html`
The generated .md files will be inside the directory `docs/mkdocs/`

## Contributors

Binary file removed docs/book/.gitbook/assets/architecture.png
Binary file removed docs/book/.gitbook/assets/compare.png
Binary file removed docs/book/.gitbook/assets/core_concepts_zenml.png
Binary file removed docs/book/.gitbook/assets/cover_image.png
Binary file removed docs/book/.gitbook/assets/localstack (1).png
Binary file removed docs/book/.gitbook/assets/localstack.png
Binary file removed docs/book/.gitbook/assets/monet.png
Binary file removed docs/book/.gitbook/assets/sam_frustrated.jpg
Binary file removed docs/book/.gitbook/assets/sam_zen_mode (1).jpg
Binary file removed docs/book/.gitbook/assets/sam_zen_mode (2).jpg
Binary file removed docs/book/.gitbook/assets/sam_zen_mode (3).jpg
Binary file removed docs/book/.gitbook/assets/sam_zen_mode.jpg
Binary file removed docs/book/.gitbook/assets/tensorboard_inline.png
Binary file removed docs/book/.gitbook/assets/zenml-deck-q2-21-3-.png
6 changes: 4 additions & 2 deletions docs/book/features/integrations.md
@@ -23,11 +23,13 @@ These are the third-party integrations that ZenML currently supports:
| ----------- | ------ | ---- | -------------------- | ------- |
| Apache Airflow || Orchestrator | Works for local environment | [airflow_local](https://github.com/zenml-io/zenml/tree/main/examples/airflow_local) |
| Apache Beam || Distributed Processing | | |
| AWS || Cloud | Use S3 buckets as ZenML artifact stores | |
| Azure || Cloud | Use Azure Blob Storage buckets as ZenML artifact stores | |
| BentoML || Cloud | Looking for community implementors | |
| Dash || Visualizer | For Pipeline and PipelineRun visualization objects. | [lineage](https://github.com/zenml-io/zenml/tree/main/examples/lineage) |
| Evidently || Monitoring | Allows for visualization of drift as well as export of a `Profile` object | [drift_detection](https://github.com/zenml-io/zenml/tree/release/0.5.7/examples/drift_detection) |
| Facets || Visualizer | | [statistics](https://github.com/zenml-io/zenml/tree/main/examples/statistics) |
| GCP || Cloud | | |
| GCP || Cloud | Use Google Cloud Storage buckets as ZenML artifact stores | |
| Graphviz || Visualizer | For Pipeline and PipelineRun visualization objects. | [dag_visualizer](https://github.com/zenml-io/zenml/tree/main/examples/dag_visualizer) |
| Great Expectations || Data Validation | Looking for community implementors | |
| KServe || Inference | Looking for community implementors | |
@@ -42,7 +44,7 @@ These are the third-party integrations that ZenML currently supports:
| scikit-learn || Training | | [caching chapter](https://docs.zenml.io/v/docs/guides/functional-api/caching) |
| Seldon || Cloud | Looking for community implementors | |
| Tensorflow || Training | | [quickstart](https://github.com/zenml-io/zenml/tree/main/examples/quickstart) |
| Whylogs || Logging | Integration fully implemented for data logging | [whylogs](https://github.com/zenml-io/zenml/tree/main/examples/whylogs) |
| whylogs || Logging | Integration fully implemented for data logging | [whylogs](https://github.com/zenml-io/zenml/tree/main/examples/whylogs) |

✅ means the integration is already implemented.
⛏ means we are looking to implement the integration soon.
28 changes: 0 additions & 28 deletions docs/book/features/step-fixtures.md
@@ -40,34 +40,6 @@ class MyStep(BaseStep):

Please note in both examples above that the name of the parameter can be anything, but the type hint is what is important.

## Using the `StepContext`

`StepContext` provides additional context inside a step function. It is used to access materializers and artifact URIs inside a step function.

You do not need to create a `StepContext` object yourself and pass it when creating the step, as long as you specify
it in the signature. ZenML will create the `StepContext` and automatically pass it when executing your step.

Note: When using a `StepContext` inside a step, ZenML disables caching for this step by default as the context provides
access to external resources which might influence the result of your step execution.
To enable caching anyway, explicitly enable it in the `@step` decorator or when initializing your custom step class.

Within a step, there are many things for which you can use the `StepContext` object. For example:

```python
@enable_INTEGRATION # can be `enable_whylogs`, `enable_mlflow` etc.
@step
def my_step(
context: StepContext,
):
context.get_output_materializer() # Returns a materializer for a given step output.
context.get_output_artifact_uri() # Returns the URI for a given step output.
context.metadata_store # Access to the [Metadata Store](https://apidocs.zenml.io/latest/api_docs/metadata_stores/)
context.INTEGRATION # Access to an integration, e.g. `context.whylogs`
```

For more information, check the [API reference](https://apidocs.zenml.io/latest/api_docs/steps/)

## Using the `BaseStepConfig`

`BaseStepConfig` instances can be passed when creating a step.
12 changes: 6 additions & 6 deletions docs/book/guides/common-usecases/custom-materializer.md
@@ -14,8 +14,8 @@ data to and from the artifact stores lives in the materializers.
class BaseMaterializer(metaclass=BaseMaterializerMeta):
"""Base Materializer to realize artifact data."""

ASSOCIATED_ARTIFACT_TYPES = []
ASSOCIATED_TYPES = []
ASSOCIATED_ARTIFACT_TYPES = ()
ASSOCIATED_TYPES = ()

def __init__(self, artifact: "BaseArtifact"):
"""Initializes a materializer with the given artifact."""
@@ -71,8 +71,8 @@ In order to control more precisely how data flowing between steps is treated, on
```python
class MyCustomMaterializer(BaseMaterializer):
"""Define my own materialization logic"""
ASSOCIATED_ARTIFACT_TYPES = [...]
ASSOCIATED_TYPES = [...]
ASSOCIATED_ARTIFACT_TYPES = (...)
ASSOCIATED_TYPES = (...)


def handle_input(self, data_type: Type[Any]) -> Any:
@@ -135,8 +135,8 @@ from zenml.io import fileio
from zenml.materializers.base_materializer import BaseMaterializer

class MyMaterializer(BaseMaterializer):
ASSOCIATED_TYPES = [MyObj]
ASSOCIATED_ARTIFACT_TYPES = [DataArtifact]
ASSOCIATED_TYPES = (MyObj, )
ASSOCIATED_ARTIFACT_TYPES = (DataArtifact, )

def handle_input(self, data_type: Type[MyObj]) -> MyObj:
"""Read from artifact store"""
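The `ASSOCIATED_TYPES` tuples and `handle_input`/`handle_return` hooks changed above can be sketched without ZenML itself. The following is an illustrative toy, not ZenML's actual implementation: the class names, the JSON-on-disk storage, and the `materializer_for` lookup are all assumptions made for the example, including the sub-class matching behavior referenced in PR #368.

```python
import json
import os
import tempfile
from typing import List, Tuple, Type


class MyObj:
    def __init__(self, name: str):
        self.name = name


class BaseMaterializer:
    # Tuples (not lists) of associated types, mirroring the change in this diff.
    ASSOCIATED_TYPES: Tuple[Type, ...] = ()

    def __init__(self, uri: str):
        self.uri = uri  # directory in the (local) artifact store


class MyMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MyObj,)

    def handle_input(self, data_type: Type[MyObj]) -> MyObj:
        # Read the artifact back from the artifact store.
        with open(os.path.join(self.uri, "data.json")) as f:
            return MyObj(json.load(f)["name"])

    def handle_return(self, obj: MyObj) -> None:
        # Write the artifact to the artifact store.
        with open(os.path.join(self.uri, "data.json"), "w") as f:
            json.dump({"name": obj.name}, f)


# A minimal registry: resolve a materializer by walking ASSOCIATED_TYPES,
# also matching sub-classes of the registered types.
REGISTRY: List[Type[BaseMaterializer]] = [MyMaterializer]


def materializer_for(data_type: Type) -> Type[BaseMaterializer]:
    for cls in REGISTRY:
        if any(issubclass(data_type, t) for t in cls.ASSOCIATED_TYPES):
            return cls
    raise KeyError(f"No materializer registered for {data_type}")


uri = tempfile.mkdtemp()
mat = materializer_for(MyObj)(uri)
mat.handle_return(MyObj("example"))   # persist
restored = mat.handle_input(MyObj)    # load back
```

Swapping the tuple for a list would still iterate here; the tuple makes the association immutable after class definition, which is the safer default for a registry key.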
@@ -104,4 +104,4 @@ FacetStatisticsVisualizer().visualize(output)

It produces the following visualization:

![Statistics for boston housing dataset](../../.gitbook/assets/statistics\_boston\_housing.png)
![Statistics for boston housing dataset](../../assets/statistics-boston-housing.png)
2 changes: 1 addition & 1 deletion docs/book/guides/common-usecases/visualizers.md
@@ -65,7 +65,7 @@ WhylogsVisualizer().visualize(whylogs_outputs)

It produces the following visualization:

![WhyLogs visualization](../../assets/whylogs/whylogs-visualizer.png)
![whylogs visualization](../../assets/whylogs/whylogs-visualizer.png)

### Drift with [`evidently`](https://github.com/evidentlyai/evidently)

2 changes: 1 addition & 1 deletion docs/book/guides/functional-api/materialize-artifacts.md
@@ -43,7 +43,7 @@ class Floats(Base):
class SQLALchemyMaterializerForSQLite(BaseMaterializer):
"""Read/Write float to sqlalchemy table."""

ASSOCIATED_TYPES = [float]
ASSOCIATED_TYPES = (float, )

def __init__(self, artifact):
super().__init__(artifact)
25 changes: 14 additions & 11 deletions docs/book/introduction/quickstart-guide.md
@@ -13,7 +13,7 @@ or view it on [GitHub](https://github.com/zenml-io/zenml/tree/main/examples/quic

## Install and initialize

```python
```shell
# Install the dependencies for the quickstart
pip install zenml tensorflow
```
@@ -26,18 +26,23 @@ HuggingFace, PyTorch Lightning etc.
Once the installation is completed, you can go ahead and create your first ZenML repository for your project. As
ZenML repositories are built on top of Git repositories, you can create yours in a desired empty directory through:

```python
```shell
# Initialize ZenML
zenml init
```

Now, the setup is completed. For the next steps, just make sure that you are executing the code within your
ZenML repository.

## Define ZenML Steps
## Run your first pipeline

In the code that follows, you can see that we are defining the various steps of our pipeline. Each step is
decorated with `@step`, the main low-level abstraction that is currently available for creating pipeline steps.
decorated with `@step`. The pipeline in turn is decorated with the `@pipeline` decorator.

{% hint style="success" %}
Note that type hints are used for inputs and outputs of each step. The routing of step outputs
to step inputs is handled within the pipeline definition.
{% endhint %}
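The type-hint-driven routing the hint describes can be sketched in plain Python. This is an illustrative toy, not ZenML's actual wiring: the `step` decorator, `run_pipeline` helper, and the single-input restriction are all assumptions made for the example.

```python
from typing import Callable, Dict, get_type_hints


def step(func: Callable) -> Callable:
    """Record a step's declared input/output types from its type hints."""
    hints = get_type_hints(func)
    func.output_type = hints.pop("return", None)
    func.input_types = hints  # remaining hints are the inputs
    return func


@step
def importer() -> int:
    return 41


@step
def trainer(value: int) -> int:
    return value + 1


def run_pipeline(*steps) -> Dict[str, object]:
    """Feed each step's output into the next, checking the declared types."""
    outputs: Dict[str, object] = {}
    prev = None
    for s in steps:
        if s.input_types:
            # Toy restriction: exactly one input per step.
            (name, expected), = s.input_types.items()
            assert isinstance(prev, expected), f"{name} expects {expected}"
            prev = s(prev)
        else:
            prev = s()
        outputs[s.__name__] = prev
    return outputs


results = run_pipeline(importer, trainer)  # {'importer': 41, 'trainer': 42}
```

The point of the sketch is the hint block above: the wiring between steps is derived entirely from the annotations, so a step whose input type does not match the previous step's output fails before any work runs.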

![Quickstart steps](../assets/quickstart-diagram.png)

@@ -76,7 +81,6 @@ def trainer(

model.fit(X_train, y_train)

# write model
return model


@@ -85,10 +89,10 @@ def evaluator(
X_test: np.ndarray,
y_test: np.ndarray,
model: tf.keras.Model,
) -> float:
) -> Output(loss=float, acc=float):
"""Calculate the accuracy on the test set"""
test_acc = model.evaluate(X_test, y_test, verbose=2)
return test_acc
loss, acc = model.evaluate(X_test, y_test, verbose=1)
return loss, acc


@pipeline
@@ -123,13 +127,12 @@ If you had a hiccup or you have some suggestions/questions regarding our framewo

## Wait, how is this useful?

The above code looks like its yet another standard pipeline framework that added to your work, but there is a lot
The above code looks like yet another standard pipeline framework added to your work, but there is a lot
going on under the hood that is mighty helpful:

- All data is versioned and tracked as it flows through the steps.
- All parameters and return values are tracked by a central metadata store that you can later query.
- Individual step outputs are now cached, so you can swap out the trainer for other implementations and iterate fast.
- Code is versioned with `git`.

With just a little more work, one can:

@@ -141,7 +144,7 @@ training loops with automatic deployments.

Best of all: We let you and your infra/ops team decide what the underlying tools are to achieve all this.

Keep reading to learn how all of the above can be achieved.
Keep reading to learn how all the above can be achieved.

## Next Steps?
