docs: DSS CLI and Notebook Creation (#115)
* add `docs-lint` tox environment to run docs tests
* replace "MLFlow" with "MLflow" in docs
* Add docs for dss CLI, notebook creation
* Fix how-to/manage-dss section
ca-scribner authored May 3, 2024
1 parent 273ed1c commit 5d79362
Showing 20 changed files with 324 additions and 50 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@ GPU and drivers, so the containerized environments can only focus on user-space
- Seamless GPU utilization
- Out-of-the box ML Environments with JupyterLab
- Easy data passing between local machine and containerized ML Environments
- [MLFlow](https://github.com/mlflow/mlflow) for lineage tracking
- [MLflow](https://github.com/mlflow/mlflow) for lineage tracking

## Requirements

24 changes: 20 additions & 4 deletions docs/.custom_wordlist.txt
@@ -1,10 +1,26 @@
DSS
JupyterLab
MLFlow
GPUs
runtime
Initialize
Jupyter
JupyterLab
MLflow
MicroK
microk
OCI
config
hostpath
initialize
initialized
initializing
io
jupyter
kubeconfig
kubeflownotebookswg
microk
microk8s.io
pvc
hostpath
reinitialise
runtime
scipy
snapcraft
toolkits
26 changes: 26 additions & 0 deletions docs/how-to/dss/dss-status.rst
@@ -0,0 +1,26 @@
Get Status of DSS
=================

This guide explains how to check the status of your DSS environment.

Overview
--------

The `dss status` command provides a quick way to check the state of your DSS environment, including whether MLflow is ready and whether a GPU is detected.

Checking the DSS Status
-----------------------

To see the status of DSS, run the following command:

.. code-block:: bash

   dss status

If you have a DSS environment running and no GPU available, the expected output is:

.. code-block:: none

   [INFO] MLflow deployment: Ready
   [INFO] MLflow URL: http://10.152.183.68:5000
   [INFO] GPU acceleration: Disabled
12 changes: 12 additions & 0 deletions docs/how-to/dss/index.rst
@@ -0,0 +1,12 @@
Manage DSS
==========

Use these guides for detailed steps on installing and managing the Data Science Stack.

.. toctree::
:maxdepth: 1

install-dss-cli
initialize-dss
dss-status
purge-dss
58 changes: 58 additions & 0 deletions docs/how-to/dss/initialize-dss.rst
@@ -0,0 +1,58 @@
Initialize DSS
==============

This guide explains how to initialize the DSS environment through the Data Science Stack (DSS) Command Line Interface (CLI).

Overview
--------

The `dss initialize` command provides a way to initialize the DSS environment. This command:

* stores credentials for the MicroK8s cluster
* allocates storage for all DSS Notebooks to share
* deploys an `MLflow <MLflow Docs_>`_ model registry

Prerequisites
-------------

Before initializing DSS, ensure you have the following:

- DSS CLI installed on your workstation.
- `MicroK8s`_ installed on your workstation.

Initializing the DSS Environment
--------------------------------

Initialize DSS through the `dss initialize` command, for example:

.. code-block:: shell

   dss initialize --kubeconfig "$(microk8s config)"

where we provide the content of our MicroK8s cluster's kubeconfig using the `--kubeconfig` option.

.. note::
Don't forget the quotes around `$(microk8s config)` - without them, the content may be interpreted by your shell.
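
If you prefer not to rely on inline command substitution, a minimal alternative sketch (the file name below is only an example) is to write the kubeconfig to a file first and pass its content from there:

.. code-block:: shell

   # Save the MicroK8s kubeconfig to a file (example file name), then pass its content
   microk8s config > microk8s-kubeconfig.yaml
   dss initialize --kubeconfig "$(cat microk8s-kubeconfig.yaml)"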

The expected output of the `dss initialize` command is:

.. code-block:: none

   [INFO] Executing initialize command
   [INFO] Storing provided kubeconfig to /home/user/.dss/config
   [INFO] Waiting for deployment mlflow in namespace dss to be ready...
   [INFO] Deployment mlflow in namespace dss is ready
   [INFO] DSS initialized. To create your first notebook run the command:
   dss create
   Examples:
   dss create my-notebook --image=pytorch
   dss create my-notebook --image=kubeflownotebookswg/jupyter-scipy:v1.8.0

From this point, DSS is ready for you to :doc:`create your first notebook </how-to/jupyter-notebook/create-notebook>`.

Conclusion
----------

This guide explained how to initialize the DSS environment through the DSS CLI. You can now proceed to :doc:`create your first notebook </how-to/jupyter-notebook/create-notebook>`.
36 changes: 36 additions & 0 deletions docs/how-to/dss/install-dss-cli.rst
@@ -0,0 +1,36 @@
Install DSS CLI
===============

This guide explains how to install the Data Science Stack (DSS) Command Line Interface (CLI).

Overview
--------

The DSS CLI is distributed as a snap accessible from the `snap store <dss snap store_>`_.

Prerequisites
-------------

Before proceeding, ensure that you have the following:

- A system with `snap`_ installed.

Installing the DSS Snap
-----------------------

To install the DSS snap, run the following command:

.. code-block:: bash

   sudo snap install data-science-stack

Then you can invoke the DSS CLI with the following command:

.. code-block:: bash

   dss

Conclusion
----------

The Data Science Stack CLI has been successfully installed on your system. You can now start using the DSS CLI to manage your data science projects.
51 changes: 51 additions & 0 deletions docs/how-to/dss/purge-dss.rst
@@ -0,0 +1,51 @@
Purge DSS
===========

This guide explains how to purge (remove) the Data Science Stack (DSS) environment from your MicroK8s cluster.

Overview
--------

The `dss purge` command provides a way to remove everything deployed by DSS from your MicroK8s cluster. This includes all the DSS components, such as MLflow and Jupyter Notebooks.

.. note::

This action removes the components of the DSS environment, but it does not remove the DSS CLI or your MicroK8s cluster. To remove those, `remove their snaps <https://snapcraft.io/docs/quickstart-tour>`_.

Prerequisites
-------------

This guide applies if you have the following:

- DSS initialized on your system.

Purging the DSS Environment
---------------------------

To purge all DSS components from your machine, run:

.. code-block:: bash

   dss purge

This will remove:

* all Jupyter Notebooks
* the MLflow server
* any data stored within the DSS environment

.. caution::

This action is irreversible. All data stored within the DSS environment will be lost.

The expected output from the above command is:

.. code-block:: none

   [INFO] Waiting for namespace dss to be deleted...
   [INFO] Success: All DSS components and notebooks purged successfully from the Kubernetes cluster.

Conclusion
----------

All elements of the DSS environment have been purged from your MicroK8s cluster. You can now reinitialise DSS on your system if you wish to continue using it.
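
If you do wish to reinitialise, the same command used during the original setup applies, for example:

.. code-block:: bash

   dss initialize --kubeconfig "$(microk8s config)"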
1 change: 1 addition & 0 deletions docs/how-to/index.rst
@@ -7,6 +7,7 @@ managing and interacting with containerised ML Environments via the DSS.
.. toctree::
:maxdepth: 2

dss/index
jupyter-notebook/index
mlflow/index
list-commands
2 changes: 2 additions & 0 deletions docs/how-to/jupyter-notebook/access-ui.rst
@@ -1,3 +1,5 @@
.. _access_ui:

Access the Jupyter Notebooks UI
===============================

30 changes: 15 additions & 15 deletions docs/how-to/jupyter-notebook/connect-mlflow.rst
@@ -1,12 +1,12 @@
Connect from Notebook to MLFlow
Connect from Notebook to MLflow
===============================

This guide provides instructions on how to integrate MLFlow with your Jupyter Notebook in the Data Science Stack (DSS) environment for tracking experiments.
This guide provides instructions on how to integrate MLflow with your Jupyter Notebook in the Data Science Stack (DSS) environment for tracking experiments.

Overview
--------

MLFlow is a platform for managing the end-to-end machine learning life cycle. It includes tracking experiments, packaging code into reproducible runs, and sharing and deploying models. DSS environments are pre-configured to interact with an MLFlow server through the `MLFLOW_TRACKING_URI` environment variable set in each notebook.
MLflow is a platform for managing the end-to-end machine learning life cycle. It includes tracking experiments, packaging code into reproducible runs, and sharing and deploying models. DSS environments are pre-configured to interact with an MLflow server through the `MLFLOW_TRACKING_URI` environment variable set in each notebook.

Prerequisites
-------------
@@ -16,14 +16,14 @@ Before you begin, ensure the following:
- You have an active Jupyter Notebook in the DSS environment.
- You understand basic operations within a Jupyter Notebook.

Installing MLFlow
Installing MLflow
-----------------

To interact with MLFlow, the MLFlow Python library needs to be installed within your notebook environment. There are two ways to install the MLFlow library:
To interact with MLflow, the MLflow Python library needs to be installed within your notebook environment. There are two ways to install the MLflow library:

1. **Within a Notebook Cell** (Recommended):

It's recommended to install MLFlow directly within a notebook cell to ensure the library is available for all subsequent cells during your session.
It's recommended to install MLflow directly within a notebook cell to ensure the library is available for all subsequent cells during your session.

.. code-block:: none
@@ -32,43 +32,43 @@ To interact with MLFlow, the MLFlow Python library needs to be installed within
2. **Using the Notebook's Terminal**:

Alternatively, you can install MLFlow from the notebook's terminal with the same command. This method also installs MLFlow for the current session:
Alternatively, you can install MLflow from the notebook's terminal with the same command. This method also installs MLflow for the current session:

.. code-block:: bash

   pip install mlflow

Remember, any installations via the notebook or terminal will not persist after the notebook is restarted (e.g., stopped and started again with `dss start` and `dss stop`). Therefore, the first method is preferred to ensure consistency across sessions.

Connecting to MLFlow library
Connecting to MLflow library
----------------------------

After installing MLFlow, you can directly interact with the MLFlow server configured for your DSS environment:
After installing MLflow, you can directly interact with the MLflow server configured for your DSS environment:

.. code-block:: python
import mlflow
# Initialise the MLFlow client
# Initialise the MLflow client
c = mlflow.MlflowClient()
# The tracking URI should be set automatically from the environment variable
print(c.tracking_uri) # Prints the MLFlow tracking URI
print(c.tracking_uri) # Prints the MLflow tracking URI
# Create a new experiment
c.create_experiment("test-experiment")
This example shows how to initialise the MLFlow client, check the tracking URI, and create a new experiment. The `MLFLOW_TRACKING_URI` should already be set in your environment, allowing you to focus on your experiments without manual configuration.
This example shows how to initialise the MLflow client, check the tracking URI, and create a new experiment. The `MLFLOW_TRACKING_URI` should already be set in your environment, allowing you to focus on your experiments without manual configuration.
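
As a further illustration, once the tracking URI is picked up from the environment, logging a run might look like the following minimal sketch (the experiment name, parameter, and metric values here are purely illustrative):

.. code-block:: python

   import mlflow

   # Assumes MLFLOW_TRACKING_URI is already set in the notebook environment
   mlflow.set_experiment("test-experiment")

   with mlflow.start_run():
       mlflow.log_param("learning_rate", 0.01)  # illustrative hyperparameter
       mlflow.log_metric("accuracy", 0.95)      # illustrative metric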

Further Information
-------------------

For more detailed information on using MLFlow, including advanced configurations and features, refer to the official MLFlow documentation:
For more detailed information on using MLflow, including advanced configurations and features, refer to the official MLflow documentation:

* `MLFlow Docs`_
* `MLflow Docs`_

Conclusion
----------

By following these steps, you can seamlessly integrate MLFlow into your Jupyter Notebooks within the DSS environment, leveraging robust tools for managing your machine learning experiments effectively.
By following these steps, you can seamlessly integrate MLflow into your Jupyter Notebooks within the DSS environment, leveraging robust tools for managing your machine learning experiments effectively.

55 changes: 54 additions & 1 deletion docs/how-to/jupyter-notebook/create-notebook.rst
@@ -1,4 +1,57 @@
Create Notebook
===============

Create Notebook
This guide provides instructions on how to create a Jupyter Notebook in the Data Science Stack (DSS) environment.

Overview
--------

A Jupyter Notebook can be created using the DSS command line interface (CLI). This notebook will include different packages and toolkits depending on the image used to create it.

Prerequisites
-------------

Before creating a notebook, ensure you have the following:

- DSS CLI installed on your workstation
- DSS initialized

Creating a Notebook
-------------------

1. **Select an image**:

Before creating a notebook, you need to select an image that includes the packages and toolkits you need. To see a list of recommended images and their aliases, run:

.. code-block:: bash

   dss create --help

The help text includes a list of recommended images and aliases, so you don't need to type the full image name. For this guide, we will use the image `kubeflownotebookswg/jupyter-scipy:v1.8.0`.

2. **Create the notebook**:

Create a new notebook using ``dss create``:

.. code-block:: bash

   dss create my-notebook --image kubeflownotebookswg/jupyter-scipy:v1.8.0

This will pull the notebook image and start a Notebook server, printing the URL of the notebook once complete. Expected output:

.. code-block:: none

   [INFO] Executing create command
   [INFO] Waiting for deployment my-notebook in namespace dss to be ready...
   [INFO] Deployment my-notebook in namespace dss is ready
   [INFO] Success: Notebook my-notebook created successfully.
   [INFO] Access the notebook at http://10.152.183.42:80.

3. **Access the notebook**:

To :doc:`access the Notebook </how-to/jupyter-notebook/access-ui>`, use the URL provided in the output.

Conclusion
----------

Notebooks are a powerful tool for data scientists and analysts to explore, visualise, and analyse data. By creating a notebook in the DSS environment, you can leverage the power of the Data Science Stack to run your analyses.
2 changes: 1 addition & 1 deletion docs/how-to/list-commands.rst
@@ -35,7 +35,7 @@ Listing available DSS commands
--help Show this message and exit.
Commands:
create Create a Jupyter notebook in DSS and connect it to MLFlow.
create Create a Jupyter notebook in DSS and connect it to MLflow.
initialize Initialize DSS on the given Kubernetes cluster.
list Lists all created notebooks in the DSS environment.
logs Prints the logs for the specified notebook or DSS component.