docs: DSS CLI and Notebook Creation (#115)
* add `docs-lint` tox environment to run docs tests
* replace "MLFlow" with "MLflow" in docs
* Add docs for dss CLI, notebook creation
* Fix how-to/manage-dss section
ca-scribner authored May 3, 2024
1 parent 273ed1c commit 5d79362
Showing 20 changed files with 324 additions and 50 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@ GPU and drivers, so the containerized environments can only focus on user-space
- Seamless GPU utilization
- Out-of-the box ML Environments with JupyterLab
- Easy data passing between local machine and containerized ML Environments
- [MLFlow](https://github.com/mlflow/mlflow) for lineage tracking
- [MLflow](https://github.com/mlflow/mlflow) for lineage tracking

## Requirements

24 changes: 20 additions & 4 deletions docs/.custom_wordlist.txt
@@ -1,10 +1,26 @@
DSS
JupyterLab
MLFlow
GPUs
runtime
Initialize
Jupyter
JupyterLab
MLflow
MicroK
microk
OCI
config
hostpath
initialize
initialized
initializing
io
jupyter
kubeconfig
kubeflownotebookswg
microk
microk8s.io
pvc
hostpath
reinitialise
runtime
scipy
snapcraft
toolkits
26 changes: 26 additions & 0 deletions docs/how-to/dss/dss-status.rst
@@ -0,0 +1,26 @@
Get Status of DSS
=================

This guide explains how to check the status of your DSS environment.

Overview
--------

The `dss status` command provides a quick way to check the state of your DSS environment, including whether MLflow is ready and whether a GPU is detected.

Checking the DSS Status
-----------------------

To see the status of DSS, run the following command:

.. code-block:: bash

   dss status

If you have a DSS environment running and no GPU available, the expected output is:

.. code-block:: none

   [INFO] MLflow deployment: Ready
   [INFO] MLflow URL: http://10.152.183.68:5000
   [INFO] GPU acceleration: Disabled
12 changes: 12 additions & 0 deletions docs/how-to/dss/index.rst
@@ -0,0 +1,12 @@
Manage DSS
==========

Use these guides for detailed steps on installing and managing the Data Science Stack.

.. toctree::
:maxdepth: 1

install-dss-cli
initialize-dss
dss-status
purge-dss
58 changes: 58 additions & 0 deletions docs/how-to/dss/initialize-dss.rst
@@ -0,0 +1,58 @@
Initialize DSS
==============

This guide explains how to initialize the DSS environment through the Data Science Stack (DSS) Command Line Interface (CLI).

Overview
--------

The `dss initialize` command provides a way to initialize the DSS environment. This command:

* stores credentials for the MicroK8s cluster
* allocates storage for all DSS Notebooks to share
* deploys an `MLflow <MLflow Docs_>`_ model registry

Prerequisites
-------------

Before initializing DSS, ensure you have the following:

- DSS CLI installed on your workstation.
- `MicroK8s`_ installed on your workstation.

Initializing the DSS Environment
--------------------------------

Initialize DSS through the `dss initialize` command, for example:

.. code-block:: shell

   dss initialize --kubeconfig "$(microk8s config)"

where we provide the content of our MicroK8s cluster's kubeconfig using the `--kubeconfig` option.

.. note::
Don't forget the quotes around `$(microk8s config)` - without them, the content may be interpreted by your shell.
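
If you prefer not to rely on inline command substitution, a minimal alternative sketch (the file name below is only an example) is to write the kubeconfig to a file first and pass its content from there:

.. code-block:: shell

   # Save the MicroK8s kubeconfig to a file (example file name), then pass its content
   microk8s config > microk8s-kubeconfig.yaml
   dss initialize --kubeconfig "$(cat microk8s-kubeconfig.yaml)"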

The expected output of the `dss initialize` command is:

.. code-block:: none

   [INFO] Executing initialize command
   [INFO] Storing provided kubeconfig to /home/user/.dss/config
   [INFO] Waiting for deployment mlflow in namespace dss to be ready...
   [INFO] Deployment mlflow in namespace dss is ready
   [INFO] DSS initialized. To create your first notebook run the command:
   dss create
   Examples:
   dss create my-notebook --image=pytorch
   dss create my-notebook --image=kubeflownotebookswg/jupyter-scipy:v1.8.0

From this point, DSS is ready for you to :doc:`create your first notebook </how-to/jupyter-notebook/create-notebook>`.

Conclusion
----------

This guide explained how to initialize the DSS environment through the DSS CLI. You can now proceed to :doc:`create your first notebook </how-to/jupyter-notebook/create-notebook>`.
36 changes: 36 additions & 0 deletions docs/how-to/dss/install-dss-cli.rst
@@ -0,0 +1,36 @@
Install DSS CLI
===============

This guide explains how to install the Data Science Stack (DSS) Command Line Interface (CLI).

Overview
--------

The DSS CLI is distributed as a snap accessible from the `snap store <dss snap store_>`_.

Prerequisites
-------------

Before proceeding, ensure that you have the following:

- A system with `snap`_ installed.

Installing the DSS Snap
-----------------------

To install the DSS snap, run the following command:

.. code-block:: bash

   sudo snap install data-science-stack

Then you can invoke the DSS CLI with the following command:

.. code-block:: bash

   dss

Conclusion
----------

The Data Science Stack CLI has been successfully installed on your system. You can now start using the DSS CLI to manage your data science projects.
51 changes: 51 additions & 0 deletions docs/how-to/dss/purge-dss.rst
@@ -0,0 +1,51 @@
Purge DSS
===========

This guide explains how to purge (remove) the Data Science Stack (DSS) environment from your MicroK8s cluster.

Overview
--------

The `dss purge` command provides a way to remove everything deployed by DSS from your MicroK8s cluster. This includes all the DSS components, such as MLflow and Jupyter Notebooks.

.. note::

This action removes the components of the DSS environment, but it does not remove the DSS CLI or your MicroK8s cluster. To remove those, `remove their snaps <https://snapcraft.io/docs/quickstart-tour>`_.

Prerequisites
-------------

This guide applies if you have the following:

- DSS initialized on your system.

Purging the DSS Environment
---------------------------

To purge all DSS components from your machine, run:

.. code-block:: bash

   dss purge

This will remove:

* all Jupyter Notebooks
* the MLflow server
* any data stored within the DSS environment

.. caution::

This action is irreversible. All data stored within the DSS environment will be lost.

The expected output from the above command is:

.. code-block:: none

   [INFO] Waiting for namespace dss to be deleted...
   [INFO] Success: All DSS components and notebooks purged successfully from the Kubernetes cluster.

Conclusion
----------

All elements of the DSS environment have been purged from your MicroK8s cluster. You can now reinitialise DSS on your system if you wish to continue using it.
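
If you do wish to reinitialise, the same command used during the original setup applies, for example:

.. code-block:: bash

   dss initialize --kubeconfig "$(microk8s config)"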
1 change: 1 addition & 0 deletions docs/how-to/index.rst
@@ -7,6 +7,7 @@ managing and interacting with containerised ML Environments via the DSS.
.. toctree::
:maxdepth: 2

dss/index
jupyter-notebook/index
mlflow/index
list-commands
2 changes: 2 additions & 0 deletions docs/how-to/jupyter-notebook/access-ui.rst
@@ -1,3 +1,5 @@
.. _access_ui:

Access the Jupyter Notebooks UI
===============================

30 changes: 15 additions & 15 deletions docs/how-to/jupyter-notebook/connect-mlflow.rst
@@ -1,12 +1,12 @@
Connect from Notebook to MLFlow
Connect from Notebook to MLflow
===============================

This guide provides instructions on how to integrate MLFlow with your Jupyter Notebook in the Data Science Stack (DSS) environment for tracking experiments.
This guide provides instructions on how to integrate MLflow with your Jupyter Notebook in the Data Science Stack (DSS) environment for tracking experiments.

Overview
--------

MLFlow is a platform for managing the end-to-end machine learning life cycle. It includes tracking experiments, packaging code into reproducible runs, and sharing and deploying models. DSS environments are pre-configured to interact with an MLFlow server through the `MLFLOW_TRACKING_URI` environment variable set in each notebook.
MLflow is a platform for managing the end-to-end machine learning life cycle. It includes tracking experiments, packaging code into reproducible runs, and sharing and deploying models. DSS environments are pre-configured to interact with an MLflow server through the `MLFLOW_TRACKING_URI` environment variable set in each notebook.

Prerequisites
-------------
@@ -16,14 +16,14 @@ Before you begin, ensure the following:
- You have an active Jupyter Notebook in the DSS environment.
- You understand basic operations within a Jupyter Notebook.

Installing MLFlow
Installing MLflow
-----------------

To interact with MLFlow, the MLFlow Python library needs to be installed within your notebook environment. There are two ways to install the MLFlow library:
To interact with MLflow, the MLflow Python library needs to be installed within your notebook environment. There are two ways to install the MLflow library:

1. **Within a Notebook Cell** (Recommended):

It's recommended to install MLFlow directly within a notebook cell to ensure the library is available for all subsequent cells during your session.
It's recommended to install MLflow directly within a notebook cell to ensure the library is available for all subsequent cells during your session.

.. code-block:: none
@@ -32,43 +32,43 @@ To interact with MLFlow, the MLFlow Python library needs to be installed within
2. **Using the Notebook's Terminal**:

Alternatively, you can install MLFlow from the notebook's terminal with the same command. This method also installs MLFlow for the current session:
Alternatively, you can install MLflow from the notebook's terminal with the same command. This method also installs MLflow for the current session:

.. code-block:: bash

   pip install mlflow

Remember, any installations via the notebook or terminal will not persist after the notebook is restarted (e.g., stopped and started again with `dss start` and `dss stop`). Therefore, the first method is preferred to ensure consistency across sessions.

Connecting to MLFlow library
Connecting to MLflow library
----------------------------

After installing MLFlow, you can directly interact with the MLFlow server configured for your DSS environment:
After installing MLflow, you can directly interact with the MLflow server configured for your DSS environment:

.. code-block:: python
import mlflow
# Initialise the MLFlow client
# Initialise the MLflow client
c = mlflow.MlflowClient()
# The tracking URI should be set automatically from the environment variable
print(c.tracking_uri) # Prints the MLFlow tracking URI
print(c.tracking_uri) # Prints the MLflow tracking URI
# Create a new experiment
c.create_experiment("test-experiment")
This example shows how to initialise the MLFlow client, check the tracking URI, and create a new experiment. The `MLFLOW_TRACKING_URI` should already be set in your environment, allowing you to focus on your experiments without manual configuration.
This example shows how to initialise the MLflow client, check the tracking URI, and create a new experiment. The `MLFLOW_TRACKING_URI` should already be set in your environment, allowing you to focus on your experiments without manual configuration.
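
As a further illustration, once the tracking URI is picked up from the environment, logging a run might look like the following minimal sketch (the experiment name, parameter, and metric values here are purely illustrative):

.. code-block:: python

   import mlflow

   # Assumes MLFLOW_TRACKING_URI is already set in the notebook environment
   mlflow.set_experiment("test-experiment")

   with mlflow.start_run():
       mlflow.log_param("learning_rate", 0.01)  # illustrative hyperparameter
       mlflow.log_metric("accuracy", 0.95)      # illustrative metric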

Further Information
-------------------

For more detailed information on using MLFlow, including advanced configurations and features, refer to the official MLFlow documentation:
For more detailed information on using MLflow, including advanced configurations and features, refer to the official MLflow documentation:

* `MLFlow Docs`_
* `MLflow Docs`_

Conclusion
----------

By following these steps, you can seamlessly integrate MLFlow into your Jupyter Notebooks within the DSS environment, leveraging robust tools for managing your machine learning experiments effectively.
By following these steps, you can seamlessly integrate MLflow into your Jupyter Notebooks within the DSS environment, leveraging robust tools for managing your machine learning experiments effectively.

55 changes: 54 additions & 1 deletion docs/how-to/jupyter-notebook/create-notebook.rst
@@ -1,4 +1,57 @@
Create Notebook
===============

Create Notebook
This guide provides instructions on how to create a Jupyter Notebook in the Data Science Stack (DSS) environment.

Overview
--------

A Jupyter Notebook can be created using the DSS command line interface (CLI). This notebook will include different packages and toolkits depending on the image used to create it.

Prerequisites
-------------

Before creating a notebook, ensure you have the following:

- DSS CLI installed on your workstation
- DSS initialized

Creating a Notebook
-------------------

1. **Select an image**:

Before creating a notebook, you need to select an image that includes the packages and toolkits you need. To see a list of recommended images and their aliases, run:

.. code-block:: bash

   dss create --help

The help text includes a list of recommended images and aliases, so you don't need to type the full image name. For this guide, we will use the image `kubeflownotebookswg/jupyter-scipy:v1.8.0`.

2. **Create the notebook**:

Create a new notebook using ``dss create``:

.. code-block:: bash

   dss create my-notebook --image kubeflownotebookswg/jupyter-scipy:v1.8.0

This will pull the notebook image and start a Notebook server, printing the URL of the notebook once complete. Expected output:

.. code-block:: none

   [INFO] Executing create command
   [INFO] Waiting for deployment my-notebook in namespace dss to be ready...
   [INFO] Deployment my-notebook in namespace dss is ready
   [INFO] Success: Notebook my-notebook created successfully.
   [INFO] Access the notebook at http://10.152.183.42:80.

3. **Access the notebook**:

To :doc:`access the Notebook </how-to/jupyter-notebook/access-ui>`, use the URL provided in the output.

Conclusion
----------

Notebooks are a powerful tool for data scientists and analysts to explore, visualise, and analyse data. By creating a notebook in the DSS environment, you can leverage the power of the Data Science Stack to run your analyses.
2 changes: 1 addition & 1 deletion docs/how-to/list-commands.rst
@@ -35,7 +35,7 @@ Listing available DSS commands
--help Show this message and exit.
Commands:
create Create a Jupyter notebook in DSS and connect it to MLFlow.
create Create a Jupyter notebook in DSS and connect it to MLflow.
initialize Initialize DSS on the given Kubernetes cluster.
list Lists all created notebooks in the DSS environment.
logs Prints the logs for the specified notebook or DSS component.