Tests

This project uses unit, smoke and integration tests, written both as Python files and as notebooks. For more information, see a quick introduction to unit, smoke and integration tests. To execute the tests manually in the different environments, first make sure you are in the correct environment, as described in SETUP.md.

Test execution

The following sections give more details on how to execute the unit, smoke and integration tests:

Unit tests

Unit tests ensure that each class or function behaves as it should. Every time a developer makes a pull request to the staging or master branch, a battery of unit tests is executed.

For executing the Python unit tests for the utilities:

pytest tests/unit -m "not notebooks and not spark and not gpu"

For executing the Python unit tests for the notebooks:

pytest tests/unit -m "notebooks and not spark and not gpu"

For executing the Python GPU unit tests for the utilities:

pytest tests/unit -m "not notebooks and not spark and gpu"

For executing the Python GPU unit tests for the notebooks:

pytest tests/unit -m "notebooks and not spark and gpu"

For executing the PySpark unit tests for the utilities:

pytest tests/unit -m "not notebooks and spark and not gpu"

For executing the PySpark unit tests for the notebooks:

pytest tests/unit -m "notebooks and spark and not gpu"

Smoke tests

Smoke tests make sure that the system works and are executed just before the integration tests every night.

For executing the Python smoke tests:

pytest tests/smoke -m "smoke and not spark and not gpu"

For executing the Python GPU smoke tests:

pytest tests/smoke -m "smoke and not spark and gpu"

For executing the PySpark smoke tests:

pytest tests/smoke -m "smoke and spark and not gpu"

Integration tests

Integration tests make sure that the program results are acceptable.

For executing the Python integration tests:

pytest tests/integration -m "integration and not spark and not gpu"

For executing the Python GPU integration tests:

pytest tests/integration -m "integration and not spark and gpu"

For executing the PySpark integration tests:

pytest tests/integration -m "integration and spark and not gpu"
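
Each `-m` argument in the commands above is a boolean expression over marker names: a test is selected when the expression is true for the set of markers it carries. The sketch below approximates that rule in plain Python to show which tests a given expression picks up. The helper `is_selected` and the example marker sets are hypothetical illustrations, not part of pytest or of this repo.

```python
class _MarkerLookup(dict):
    """Treat any marker name that the test does not carry as False."""

    def __missing__(self, name):
        return False


def is_selected(expression, markers):
    """Approximate pytest's -m matching (hypothetical helper, not pytest code).

    `expression` combines marker names with and/or/not, and `markers`
    is the set of marker names attached to one test.
    """
    names = _MarkerLookup({marker: True for marker in markers})
    # eval is acceptable here only because the expressions are our own
    # hard-coded strings; never use it on untrusted input.
    return bool(eval(expression, {"__builtins__": {}}, names))


# Example marker sets (assumed for illustration).
cpu_notebook_test = {"notebooks"}
gpu_notebook_test = {"notebooks", "gpu"}
spark_utility_test = {"spark"}

print(is_selected("notebooks and not spark and not gpu", cpu_notebook_test))   # True
print(is_selected("notebooks and not spark and not gpu", gpu_notebook_test))   # False
print(is_selected("not notebooks and spark and not gpu", spark_utility_test))  # True
```

Because every command names all three of `notebooks`, `spark` and `gpu`, each test falls into exactly one of the groups, so no test is run twice across the commands.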

How to create tests on notebooks with Papermill

The notebooks in this repo are tested with Papermill in the unit, smoke and integration tests.

In the unit tests we just make sure the notebook runs. In the smoke tests, we run it with a small dataset or a small number of epochs to make sure that, apart from running, it provides reasonable metrics. Finally, in the integration tests, we use a bigger dataset and more epochs, and we test that the metrics are what we expect.

Developing unit tests with Papermill

Executing a notebook with Papermill is easy, and it is what we mostly do in the unit tests. The following is one of the tests in tests/unit/test_notebooks_python.py.

import pytest
import papermill as pm
from tests.notebooks_common import OUTPUT_NOTEBOOK, KERNEL_NAME

@pytest.mark.notebooks
def test_sar_single_node_runs(notebooks):
    notebook_path = notebooks["sar_single_node"]
    pm.execute_notebook(notebook_path, OUTPUT_NOTEBOOK, kernel_name=KERNEL_NAME)

Notice that the input of the test function is a fixture defined in conftest.py. For more information, see the definition of fixtures in pytest.
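
As a rough illustration of such a fixture, the sketch below shows how a `notebooks` fixture could map short names to notebook paths. It is a minimal sketch under assumptions: the helper `notebook_paths`, the folder layout and the file name are hypothetical, and the actual conftest.py in this repo may be organized differently.

```python
import os

import pytest


def notebook_paths(repo_root):
    """Map short notebook names to file paths (folder and file names are assumed)."""
    folder = os.path.join(repo_root, "notebooks")
    return {
        "sar_single_node": os.path.join(folder, "sar_single_node.ipynb"),
    }


@pytest.fixture(scope="module")
def notebooks():
    """Hand the dictionary of notebook paths to any test that requests it."""
    # Assumes this conftest.py sits in the tests/ folder, one level below the repo root.
    repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    return notebook_paths(repo_root)
```

A test receives the dictionary simply by declaring a parameter called `notebooks`, which is why `test_sar_single_node_runs` can look up `notebooks["sar_single_node"]` without importing anything.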

To execute this test, first make sure you are in the correct environment, as described in SETUP.md:

pytest tests/unit/test_notebooks_python.py::test_sar_single_node_runs

Developing smoke and integration tests with Papermill

A more advanced option is used in the smoke and integration tests, where we not only execute the notebook, but inject parameters and recover the computed metrics.

The first step is to tag the parameters that we are going to inject, which requires modifying the notebook: we add a tag with the name parameters to the cell that defines them. To add a tag, go to the notebook menu and select View, Cell Toolbar and Tags. A tag field will appear on every cell. The variables in the cell tagged with parameters can be injected. The typical variables that we inject are MOVIELENS_DATA_SIZE, EPOCHS and other configuration variables for our algorithms.

Papermill injects parameters in a simple way: it generates a copy of the notebook (in our code we call it OUTPUT_NOTEBOOK) and creates a new cell with the injected variables.
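
The idea can be sketched in plain Python on a notebook represented as a dictionary. This is a simplified illustration, not Papermill's implementation: the function `inject_parameters` and the toy notebook are hypothetical, and the fallback used when no cell is tagged is a choice made for this sketch.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with a new cell that assigns `parameters`.

    Hypothetical helper: `notebook` is a plain dict in an nbformat-like shape.
    """
    output = copy.deepcopy(notebook)  # the input notebook is left untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
    }
    # Place the new cell right after the cell tagged "parameters", so the
    # injected values override the defaults defined there.
    for index, cell in enumerate(output["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            output["cells"].insert(index + 1, injected)
            break
    else:
        # No tagged cell found: this sketch puts the new cell at the top.
        output["cells"].insert(0, injected)
    return output


notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": 'TOP_K = 5\nMOVIELENS_DATA_SIZE = "1m"',
        },
        {"cell_type": "code", "metadata": {}, "source": "print(TOP_K)"},
    ]
}
output_notebook = inject_parameters(notebook, {"TOP_K": 10, "MOVIELENS_DATA_SIZE": "100k"})
print(output_notebook["cells"][1]["source"])
```

Because the injected cell runs after the tagged one, the defaults in the notebook stay in place for interactive use while the test values take effect in the executed copy.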

The second modification to the notebook is to record the metrics we want to test using pm.record("output_variable", python_variable_name). We normally use the last cell of the notebook to record all the metrics. These are the metrics that we check in the smoke and integration tests.

This is an example of how we write a smoke test. The complete code can be found in tests/smoke/test_notebooks_python.py:

import pytest
import papermill as pm
from tests.notebooks_common import OUTPUT_NOTEBOOK, KERNEL_NAME

TOL = 0.05  # relative tolerance for the metric checks

@pytest.mark.smoke
def test_sar_single_node_smoke(notebooks):
    notebook_path = notebooks["sar_single_node"]
    # Execute the notebook once, injecting the parameters into the output copy
    pm.execute_notebook(
        notebook_path,
        OUTPUT_NOTEBOOK,
        kernel_name=KERNEL_NAME,
        parameters=dict(TOP_K=10, MOVIELENS_DATA_SIZE="100k"),
    )
    # Recover the metrics that the notebook recorded with pm.record
    results = pm.read_notebook(OUTPUT_NOTEBOOK).dataframe.set_index("name")["value"]
    assert results["precision"] == pytest.approx(0.326617179, rel=TOL)
    assert results["recall"] == pytest.approx(0.175956743, rel=TOL)

As the code shows, we inject the dataset size and the top k, and we recover the precision and recall at k.
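
In the test above, TOL is passed to pytest.approx as a relative tolerance, so each metric may deviate from its expected value by up to 5% of that value. The sketch below works out what that means for the precision check. The helper `within_relative_tolerance` is a hypothetical stand-in that mirrors only the relative part of the comparison; pytest.approx also applies a very small absolute tolerance.

```python
def within_relative_tolerance(actual, expected, rel):
    """Return True when `actual` is within rel * |expected| of `expected`.

    Hypothetical helper that approximates the relative check of pytest.approx.
    """
    return abs(actual - expected) <= rel * abs(expected)


TOL = 0.05
expected_precision = 0.326617179

# Band of accepted values around the expected precision.
lower = expected_precision * (1 - TOL)
upper = expected_precision * (1 + TOL)
print(f"accepted precision range: [{lower:.4f}, {upper:.4f}]")  # [0.3103, 0.3429]

print(within_relative_tolerance(0.33, expected_precision, TOL))  # True, inside the band
print(within_relative_tolerance(0.30, expected_precision, TOL))  # False, outside the band
```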

To execute this test, first make sure you are in the correct environment, as described in SETUP.md:

pytest tests/smoke/test_notebooks_python.py::test_sar_single_node_smoke

More details on how to integrate Papermill with notebooks can be found in the Papermill repo.