Remove Config objects #30

Merged · 27 commits · Mar 4, 2024

Commits (the diff shown further below is from 1 commit, 35220aa):
460ff9b  Export AnomalyDetector (ejnnr, Feb 28, 2024)
dbae3bf  Make tasks more flexible (ejnnr, Feb 29, 2024)
f16b9ca  Iterating on tasks (ejnnr, Feb 29, 2024)
9073a85  Mostly fix tests (ejnnr, Feb 29, 2024)
54c34a6  [WIP] Remove configs (ejnnr, Mar 1, 2024)
51e6a25  Remove unused DatasetConfigs (ejnnr, Mar 1, 2024)
48f8292  Rename task file (ejnnr, Mar 1, 2024)
79b51ec  WIP on removing ScriptConfig and TrainConfig (ejnnr, Mar 2, 2024)
bdd56fb  Remove backdoor loading/storing logic (ejnnr, Mar 2, 2024)
62e618a  Remove TrainConfig (ejnnr, Mar 2, 2024)
94c54ed  Adjust abstractions (ejnnr, Mar 2, 2024)
4c7e0c2  Remove loggers (ejnnr, Mar 3, 2024)
6809a7e  Fix bugs and tests (ejnnr, Mar 3, 2024)
6f0e472  Move save_path and max_batch_size arguments (ejnnr, Mar 3, 2024)
ae98812  Remove another unused file (ejnnr, Mar 3, 2024)
31a7993  Remove more unused code (ejnnr, Mar 3, 2024)
f0dacc5  Minor improvements and remove TODOs (ejnnr, Mar 3, 2024)
0267bd1  Fix demo notebook (ejnnr, Mar 3, 2024)
975289e  Add WaNet warning (ejnnr, Mar 3, 2024)
1b82635  Update gitignore (ejnnr, Mar 3, 2024)
35220aa  Update documentation somewhat (ejnnr, Mar 3, 2024)
f9ab02b  Remove simple_parsing dependency (ejnnr, Mar 3, 2024)
80463e2  Merge remote-tracking branch 'origin/main' into no-configs (ejnnr, Mar 4, 2024)
d61c676  Adjust tampering/LM code to no-config style (ejnnr, Mar 4, 2024)
565f456  Add convenience method to clone WanetBackdoor instance (VRehnberg, Mar 4, 2024)
2c1b38c  Minor changes to WaNet cloning (ejnnr, Mar 4, 2024)
f7e9300  Merge pull request #34 from VRehnberg/wanet-partial-clone-method (ejnnr, Mar 4, 2024)
Update documentation somewhat
ejnnr committed Mar 3, 2024
commit 35220aabc9c65952ef20999d193f00947d1a6819
4 changes: 2 additions & 2 deletions README.md
@@ -31,13 +31,13 @@ installing `cupbearer`, in particular if you want to control CUDA version etc.

## Running experiments
We provide scripts in `cupbearer.scripts` for more easily running experiments.
-See [demo.ipynb](demo.ipynb) for a quick example of how to use them---this is likely
+See [the demo notebook](notebooks/simple_demo.ipynb) for a quick example of how to use them---this is likely
also the best way to get an overview of how the components of `cupbearer` fit together.

These "scripts" are Python functions and designed to be used from within Python,
e.g. in a Jupyter notebook or via [submitit](https://github.com/facebookincubator/submitit/tree/main)
if on Slurm. But of course you could also write a simple Python wrapper and then use
-them from the CLI. Their configuration interface is designed to be very general,
+them from the CLI. The scripts are designed to be pretty general,
which sometimes comes at the cost of being a bit verbose---we recommend writing helper
functions for your specific use case on top of the general script interface.
Of course you can also use the components of `cupbearer` directly without going through
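
As an illustration of the submitit route mentioned above, here is a hedged sketch; the `train_detector` keyword arguments and the `my_task`/`my_detector` variables are placeholder assumptions, not the verified interface:

```python
# Sketch only: submitting a cupbearer script to Slurm via submitit.
# my_task / my_detector are hypothetical placeholders, and train_detector's
# keyword arguments are assumptions rather than the verified signature.
import submitit

from cupbearer.scripts import train_detector

executor = submitit.AutoExecutor(folder="slurm_logs")
executor.update_parameters(timeout_min=60, slurm_partition="gpu")  # cluster-specific
job = executor.submit(train_detector, task=my_task, detector=my_detector)
job.result()  # blocks until the Slurm job finishes
```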
139 changes: 0 additions & 139 deletions docs/adding_a_script.md

This file was deleted.

87 changes: 0 additions & 87 deletions docs/adding_a_task.md
@@ -1,88 +1 @@
# Adding a new task

The only component that a task absolutely needs is an implementation of the
`TaskConfigBase` abstract class:
```python
class TaskConfigBase(BaseConfig, ABC):
    @abstractmethod
    def build_reference_data(self) -> Dataset:
        pass

    @abstractmethod
    def build_model(self) -> Model:
        pass

    def build_params(self):
        return None

    @abstractmethod
    def build_anomalous_data(self) -> Dataset:
        pass
```
If your config has any parameters, you should use a dataclass to set them. E.g.
```python
@dataclass
class MyTaskConfig(TaskConfigBase):
    my_required_param: str
    my_optional_param: int = 42

    ...
```
This will automagically let you override these parameters from the command line
(and any parameters without default values will be required).

`build_reference_data` and `build_anomalous_data` both need to return `pytorch` `Dataset`s.
`build_model` needs to return a `models.Model`, which is a special type of `flax.linen.Module`.
`build_params` can return a parameter dict for the returned `Model` (if `None`, the model
will be randomly initialized, which is usually not what you want).

In practice, the datasets and the model will have to come from somewhere, so you'll
often implement a few things in addition to the task config class. There are predefined
interfaces for datasets and models, and if possible I suggest using those (either
using their existing implementations, or adding your own). For example, consider
the adversarial example task:
```python
@dataclass
class AdversarialExampleTask(TaskConfigBase):
    run_path: Path

    def __post_init__(self):
        self._reference_data = TrainDataFromRun(path=self.run_path)
        self._anomalous_data = AdversarialExampleConfig(run_path=self.run_path)
        self._model = StoredModel(path=self.run_path)

    def build_anomalous_data(self) -> Dataset:
        return self._anomalous_data.build_dataset()

    def build_model(self) -> Model:
        return self._model.build_model()

    def build_params(self):
        return self._model.build_params()

    def build_reference_data(self) -> Dataset:
        return self._reference_data.build_dataset()
```
This task only has one parameter, the path to the training run of a base model.
It then uses the training data of that run as reference data, and an adversarial
version of it as anomalous data. The model is just the trained base model, loaded
from disk.

You can also add new scripts in the `scripts` directory, to generate the datasets
and/or train the model. For example, the adversarial examples task has an
associated script `make_adversarial_examples.py`. (To get the model, we can simply
use the existing `train_classifier.py` script.)

There's no formal connection between scripts and the rest of the library---you can
leave it up to users to run the necessary preparatory scripts before using your new
task. But if feasible, you may want to automate this. For example, the `AdversarialExampleDataset`
automatically runs `make_adversarial_examples.py` if the necessary files are not found.

Finally, you need to register your task to make it accessible from the command line
in the existing scripts. Simply add the task config class to the `TASKS` dict in `tasks/__init__.py`
(with an arbitrary name as the key).

Then you should be able to run commands like
```bash
python -m cupbearer.scripts.train_detector --task my_task --detector my_detector --task.my_required_param foo
```
43 changes: 0 additions & 43 deletions docs/configuration.md

This file was deleted.

70 changes: 13 additions & 57 deletions docs/high_level_structure.md
@@ -3,32 +3,13 @@ In this document, we'll go over all the subpackages of `cupbearer` to see what r
they play and how to extend them. For more details of extending `cupbearer`, see
the other documentation files on specific subpackages.

-## Configuration
-Different parts of `cupbearer` interface with each other through many configuration
-dataclasses. Each dataset, model, task, detector, script, etc. should expose all its
-hyperparameters and configuration options through such a dataclass. That way,
-all options will automatically be configurable from the command line.
-
-Many of the configuration dataclass ABCs have one or several `build()` methods that
-create the actual object of interest based on the configuration. For example,
-the `DetectorConfig` ABC has an abstract `build()` method that must return an
-`AnomalyDetector` instance.
-
-See [configuration.md](configuration.md) for more details on the configuration
-dataclasses and what to keep in mind when writing your own.

## Helper subpackages
### `cupbearer.data`
The `data` package contains implementations of basic datasets, transforms,
and specialized datasets (e.g. datasets consisting only of adversarial examples).
-The key interface is the `DatasetConfig` class. It has a `build()` method that
-needs to return a pytorch `Dataset` instance.
-
-In principle, you don't need to use the `DatasetConfig` interface (or anything
-from the `data` package) to implement new tasks or detectors. Tasks and detectors
-just pass `Dataset` instances between each other. But unless you have a good reason
-to avoid the `DatasetConfig` interface, it's best to use it since it already works
-with the scripts and you get some features such as configuring transforms for free.
+Using this subpackage is optional; you can define tasks directly using standard
+pytorch `Dataset`s.
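
For instance, a minimal sketch of task data built from plain pytorch `Dataset`s (shapes and labels below are purely illustrative):

```python
# Plain pytorch Datasets are enough as task data; no cupbearer-specific
# wrapper is required. Shapes and labels here are illustrative only.
import torch
from torch.utils.data import TensorDataset

reference_data = TensorDataset(
    torch.randn(1000, 3, 32, 32), torch.zeros(1000, dtype=torch.long)
)
# Anomalous data in the same format, drawn from a shifted distribution.
anomalous_data = TensorDataset(
    torch.randn(1000, 3, 32, 32) + 0.5, torch.zeros(1000, dtype=torch.long)
)
```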

### `cupbearer.models`
Unlike the `data` package, you have to use the `models` package at the moment.
@@ -37,53 +18,32 @@ to the model's activations. Using the implementations from the `models` package
ensures a consistent way to get activations from models. As long as you don't want
to add new model architectures, most of the details of this package won't matter.

-For now, only linear computational graphs are supported, i.e. each model needs to
-be a fixed sequence of computational steps performed one after the other
-(like a `Sequential` module in many deep learning frameworks). A `Computation`
-is just a type alias for such a sequence of steps. The `Model` class takes such a
-`Computation` and is itself a `flax.linen.Module` that implements the computation.
-The main thing it does on top of `flax.linen.Sequential` is that it can also return
-all the activations of the model. It also has a function for plotting the architecture
-of the model.
-
-Similar to the `DataConfig` interface, there's a `ModelConfig` with a `build()`
-method that returns a `Model` instance.
+In the future, we'll likely deprecate the `HookedModel` interface and just support
+standard `torch.nn.Module`s via pytorch hooks.
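
For reference, the pytorch-hook approach would look roughly like this (standard torch APIs only, nothing cupbearer-specific):

```python
# Capturing activations from a plain torch.nn.Module with forward hooks.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 2),
)
activations = {}

def make_hook(name):
    def hook(module, args, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if name:  # skip the root module itself
        module.register_forward_hook(make_hook(name))

model(torch.randn(1, 10))
# activations now maps "0", "1", "2" to each submodule's output tensor.
```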

### `cupbearer.utils`
-The `utils` package contains many miscellaneous helper functions. You probably won't
-interact with these too much, but here are a few that it may be good to know about:
-- `utils.trainer` contains a `Trainer` class that's a very simple version of pytorch
-lightning for flax. You certainly don't need to use this in any scripts you add,
-but it may save you some boilerplate. NOTE: we might deprecate this in the future
-and replace it with something like `elegy`.
-- `utils.utils.save` and `utils.utils.load` can save and store pytrees. They use the
-`orbax` checkpointer under the hood, but add some hacky support for saving/loading
-types.
-
-We'll cover a few more functions from the `utils` package when we talk about scripts.
+The `utils` package contains some miscellaneous helper functions. Most of these are
+for internal use, but see the example notebooks for the helpful ones.

## Tasks
-The `tasks` package contains the `TaskConfigBase` ABC, which is the interface any
-task needs to implement, as well as all the existing tasks. To add a new task:
-1. Create a new module or subpackage in `tasks`, where you implement a new class
-that inherits `TaskConfigBase`.
-2. Add your new class to the `TASKS` dictionary in `tasks/__init__.py`.
+The `tasks` package contains the `Task` class, which is the interface any
+task needs to implement, as well as all the existing tasks. To add a new task,
+you can either inherit `Task` or simply write a function that returns a `Task` instance.
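
In sketch form (the `Task` constructor arguments shown here are assumptions for illustration; check `cupbearer.tasks` for the actual interface):

```python
# Hedged sketch: the Task(...) keyword arguments below are assumed, not verified.
from cupbearer.tasks import Task

def my_task(model, reference_data, anomalous_data) -> Task:
    return Task(
        model=model,
        reference_data=reference_data,
        anomalous_data=anomalous_data,
    )
```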

-Often, you'll also need to implement a new type of dataset or model.
+Often, you'll also need to implement a new type of dataset or model for your task.
That code probably belongs in the `data` and `model` packages,
though sometimes it's a judgement call.

-See [adding_a_task.md](adding_a_task.md) for more details.

## Detectors
-The `detectors` package is similar to `tasks`, but for anomaly detectors. In addition
-to the `DetectorConfig` interface, it also contains an `AnomalyDetector` ABC, which
-any detection method needs to subclass for its actual implementation.
+The `detectors` package is similar to `tasks`, but for anomaly detectors. The key
+interface is `AnomalyDetector`.

See [adding_a_detector.md](adding_a_detector.md) for more details.

## Scripts
-The `scripts` package contains command line scripts and their configurations.
+The `scripts` package contains Python functions for running common workflows.
Two scripts are meant to be used by all detectors/tasks:
- `train_detector` trains a detector on a task and saves the trained detector to disk.
- `eval_detector` evaluates a stored (or otherwise specified) detector and evaluates
@@ -92,7 +52,3 @@ Two scripts are meant to be used by all detectors/tasks:
All other scripts are helper scripts for specific tasks or detectors. For example,
most tasks will need a script to train the model to be analyzed, and perhaps to prepare
the dataset.
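
Assuming both shared scripts take the task and detector directly, usage would look roughly like this (argument names are illustrative, not verified):

```python
# Hedged sketch of the two shared scripts; argument names are assumptions,
# and task / detector are hypothetical placeholders.
from cupbearer.scripts import eval_detector, train_detector

train_detector(task=task, detector=detector, save_path="runs/demo")
eval_detector(task=task, detector=detector, save_path="runs/demo")
```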

-There's a lot more to be said about scripts, see the [README](../README.md) for a brief
-overview of *running* scripts, and [adding_a_script.md](adding_a_script.md) for details
-on writing new scripts.