The E2E tests for the TF operator are implemented as Argo workflows. For more background on Argo (not required to understand the rest of this document), see this link.
Test results can be monitored at the Prow dashboard.
At a high level, the E2E test suites are structured as Python test classes. Each test class contains one or more tests. A test typically performs the following steps:
- Creates a ksonnet component from a TFJob spec;
- Creates the specified TFJob;
- Verifies some expected results (e.g. number of pods started, job status);
- Deletes the TFJob.
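The steps above can be sketched in plain Python. This is only an illustration of the create/verify/delete flow: `FakeTFJobClient`, `run_simple_test`, and the simplified spec layout are hypothetical stand-ins invented for this sketch, not the real helper modules or the real TFJob schema, and the ksonnet step is reduced to building a dict.

```python
class FakeTFJobClient:
    """In-memory stand-in for a cluster; NOT the real tf_job_client API."""

    def __init__(self):
        self.jobs = {}

    def create(self, name, spec):
        # Pretend every requested replica immediately becomes a running pod.
        replicas = sum(r["replicas"] for r in spec["replicaSpecs"].values())
        self.jobs[name] = {"pods": replicas, "status": "Running"}

    def get(self, name):
        return self.jobs.get(name)

    def delete(self, name):
        self.jobs.pop(name, None)


def run_simple_test(client, name, spec, expected_pods):
    """Create the job, verify its observable state, then always clean up."""
    client.create(name, spec)
    try:
        job = client.get(name)
        assert job["pods"] == expected_pods, "unexpected number of pods"
        assert job["status"] == "Running", "unexpected job status"
    finally:
        # Delete even if a verification failed, so later tests start clean.
        client.delete(name)
    return client.get(name) is None


# Hypothetical, simplified spec: two workers and one parameter server.
spec = {"replicaSpecs": {"Worker": {"replicas": 2}, "PS": {"replicas": 1}}}
client = FakeTFJobClient()
deleted = run_simple_test(client, "my-job", spec, expected_pods=3)
```

Putting the deletion in a `finally` block is a design choice of this sketch: a failed verification should not leave a job behind for the next test.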
An example can be found here.
A test class can have several test methods. Each method executes a series of user actions (e.g. starting or deleting a TFJob) and verifies the expected results (e.g. the TFJob exits with the correct status, pods are deleted, etc.).
Test classes should follow this pattern:
class MyTest(test_util.TestCase):
    def __init__(self, args):
        # Initialize environment
        pass

    def test_case_1(self):
        # Test code
        pass

    def test_case_2(self):
        # Test code
        pass

if __name__ == "__main__":
    test_runner.main(module=__name__)
Ideally, the test code should contain only API calls. Any common functionality used by the tests should be added to one of the helper modules:
- k8s_util - for K8s operations like querying/deleting a pod
- ks_util - for ksonnet operations
- tf_job_client - for TFJob-specific operations, such as waiting for the job to be in a certain phase
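As an illustration of the kind of logic that belongs in a helper module rather than in a test, here is a minimal sketch of a "wait for a phase" helper. The function name `wait_for_phase`, its parameters, and the phase strings are assumptions made for this example; it is not the actual tf_job_client interface.

```python
import time


def wait_for_phase(get_phase, expected_phases, timeout_s=10.0, interval_s=0.1,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_phase() until it returns one of expected_phases.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        phase = get_phase()
        if phase in expected_phases:
            return phase
        if clock() >= deadline:
            raise TimeoutError(
                "phase %r not in %r after %.1fs"
                % (phase, sorted(expected_phases), timeout_s))
        sleep(interval_s)


# Usage with a scripted sequence of phases instead of a real cluster.
# The phase names here are hypothetical.
phases = iter(["Creating", "Running", "Done"])
result = wait_for_phase(lambda: next(phases), {"Done", "Failed"},
                        sleep=lambda _: None)
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real delays; a test would then call it once instead of writing its own polling loop.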
Add a new TFJob spec if you want to use your own spec instead of an existing one. An example can be found here. All TFJob specs should be placed in the same directory.
These specs are similar to actual TFJob specs. Note that many of them use the tf-operator-test-server as the test image. This gives us more control over when each replica exits, and allows us to send specific requests, such as fetching the runtime TensorFlow config.
Updating the Argo workflow is needed only if you are creating a new test class. Creating a new test class is recommended if you are implementing a new feature and want to group all relevant E2E tests together.
New test classes should be added as Argo workflow steps to the workflows.libsonnet file.
Under the templates section, add the following to the dag:
{
  name: "my-test",
  template: "my-test",
  dependencies: ["setup-kubeflow"],
},
This will configure Argo to run my-test after setting up the Kubeflow cluster.
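To make the effect of the dependencies field concrete, the following sketch computes a valid execution order for the two tasks named above using Python's standard graphlib module. It only illustrates how a dependency list constrains ordering in a DAG; it is not how Argo schedules steps internally.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on, mirroring the
# "dependencies" field of the DAG entry.
dag = {
    "setup-kubeflow": set(),
    "my-test": {"setup-kubeflow"},
}

# static_order() yields every task only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['setup-kubeflow', 'my-test']
```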
Next, add the following lines toward the end of the file:
$.parts(namespace, name, overrides).e2e(prow_env, bucket).buildTestTemplate(
  "my-test"),
This assumes that there is a corresponding Python file named my_test.py (note the difference between dashes and underscores).
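The naming convention can be written down as a one-line mapping. This is a sketch of the rule as inferred from the single my-test / my_test.py example; the helper name `module_file_for` is hypothetical and does not exist in the repository.

```python
def module_file_for(template_name):
    """Map a workflow template name to the Python file it is assumed to use."""
    # Template names use dashes; Python module file names use underscores.
    return template_name.replace("-", "_") + ".py"


file_name = module_file_for("my-test")
print(file_name)  # my_test.py
```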