Commit

Simplify save_dir and some directory -> dir renames (#151)
* wip renames

* renames in docs

* readme

* data dir renamme in docs

* rename in code from data_directory to data_dir

* maintaining update

* fix capitalization

* further updates

* tweak

* do not overwrite

* add overwrite save dir

* add overwrite save dir to config

* update configs with all info

* use full train configuration

* only upload if does not exist

* tests for save

* overwrite param

* better set up and test for overwrite

* docs

* update docs with overwrite

* from overwrite_save_dir to overwrite

* missed rename

* remove machine specific from vlc

* unindent so test actually runs

* check for local and cached checkpoints

* should be and

* write out predict config before preds start like we do for train config

* update all configs and use only first 10 digits of hash

* dry run check after save is configured; more robust test

* reorder

* show save directory

* copy edits

* update template

* fix test

* lower case for consistency

* fix test
ejm714 authored Oct 25, 2021
1 parent b4d0a3d commit e1c1f03
Showing 35 changed files with 478 additions and 336 deletions.
2 changes: 1 addition & 1 deletion .github/MAINTAINING.md
@@ -113,7 +113,7 @@ make publish_models

This will generate a public file name for each model based on the config hash and upload the model weights to the three DrivenData public S3 buckets. It will also generate a folder in `zamba/models/official_models/{your_model_name}` that contains the official config as well as reference yaml and json files. You should PR everything in this folder.

Lastly, you need to update the template in `templates`. The template should contain all the same info as the model's `config.yaml`, plus placeholders for `data_directory` and `labels` in `train_config`, and `data_directory`, `filepaths`, and `checkpoint` in `predict_config`.
Lastly, you need to update the template in `templates`. The template should contain all the same info as the model's `config.yaml`, plus placeholders for `data_dir` and `labels` in `train_config`, and `data_dir`, `filepaths`, and `checkpoint` in `predict_config`.

### New model checklist

2 changes: 1 addition & 1 deletion README.md
@@ -82,7 +82,7 @@ See the [Quickstart](https://zamba.drivendata.org/docs/quickstart/) page or the
### Training a model

```console
$ zamba train --data-dir path/to/videos --labels path_to_labels.csv --save-path my_trained_model
$ zamba train --data-dir path/to/videos --labels path_to_labels.csv --save_dir my_trained_model
```

The newly trained model will be saved to the specified save directory. The folder will contain a model checkpoint as well as training configuration, model hyperparameters, and validation and test metrics. Run `zamba train --help` to list all possible options to pass to `train`.
41 changes: 26 additions & 15 deletions docs/docs/configurations.md
@@ -32,7 +32,7 @@ All video loading arguments can be specified either in a [YAML file](yaml-config
from zamba.models.config import PredictConfig
from zamba.models.model_manager import predict_model

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")
video_loader_config = VideoLoaderConfig(
model_input_height=240,
model_input_width=426,
@@ -146,14 +146,16 @@ All possible model inference parameters are defined by the [`PredictConfig` clas

class PredictConfig(ZambaBaseModel)
| PredictConfig(*,
data_directory: DirectoryPath = Path.cwd()
data_dir: DirectoryPath = Path.cwd(),
filepaths: FilePath = None,
checkpoint: FilePath = None,
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
gpus: int = 0,
num_workers: int = 3,
batch_size: int = 2,
save: Union[bool, pathlib.Path] = True,
save: bool = True,
save_dir: Optional[Path] = None,
overwrite: bool = False,
dry_run: bool = False,
proba_threshold: float = None,
output_class_names: bool = False,
@@ -164,9 +166,9 @@
...
```

**Either `data_directory` or `filepaths` must be specified to instantiate `PredictConfig`.** If neither is specified, the current working directory will be used as the default `data_directory`.
**Either `data_dir` or `filepaths` must be specified to instantiate `PredictConfig`.** If neither is specified, the current working directory will be used as the default `data_dir`.

#### `data_directory (DirectoryPath, optional)`
#### `data_dir (DirectoryPath, optional)`

Path to the directory containing videos for inference. Defaults to the current working directory.

@@ -194,9 +196,18 @@ The number of CPUs to use during training. The maximum value for `num_workers` i

The batch size to use for inference. Defaults to `2`

#### `save (bool, optional)`
#### `save (bool)`

Whether to save out the predictions to a CSV file. By default, predictions will be saved at `zamba_predictions.csv`. Defaults to `True`
Whether to save out predictions. If `False`, predictions are not saved. Defaults to `True`.

#### `save_dir (Path, optional)`

An optional directory in which to save the model predictions and configuration yaml. If no `save_dir` is specified and `save` is `True`, outputs will be written to the current working directory. Defaults to `None`.

#### `overwrite (bool)`

If `True`, will overwrite `zamba_predictions.csv` and `predict_configuration.yaml` in `save_dir` if they exist. Defaults to `False`.
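Taken together, the new saving options can be sketched in a YAML configuration file — the folder names below are hypothetical:

```yaml
predict_config:
  data_dir: example_vids/
  save: true
  save_dir: predictions/  # outputs written here instead of the working directory
  overwrite: true         # replace existing zamba_predictions.csv and predict_configuration.yaml
```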

#### `dry_run (bool, optional)`

@@ -237,7 +248,7 @@ All possible model training parameters are defined by the [`TrainConfig` class](
class TrainConfig(ZambaBaseModel)
| TrainConfig(*,
labels: Union[FilePath, pandas.DataFrame],
data_directory: DirectoryPath = # your current working directory ,
data_dir: DirectoryPath = # your current working directory ,
checkpoint: FilePath = None,
scheduler_config: Union[str, zamba.models.config.SchedulerConfig, NoneType] = 'default',
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
@@ -256,8 +267,8 @@ class TrainConfig(ZambaBaseModel)
verbose=True, mode='max'),
weight_download_region: zamba.models.utils.RegionEnum = 'us',
split_proportions: Dict[str, int] = {'train': 3, 'val': 1, 'holdout': 1},
save_directory: pathlib.Path = # your current working directory ,
overwrite_save_directory: bool = False,
save_dir: pathlib.Path = # your current working directory ,
overwrite: bool = False,
skip_load_validation: bool = False,
from_scratch: bool = False,
predict_all_zamba_species: bool = True,
@@ -270,7 +281,7 @@ class TrainConfig(ZambaBaseModel)

Either the path to a CSV file with labels for training, or a dataframe of the training labels. There must be columns for `filename` and `label`. **`labels` must be specified to instantiate `TrainConfig`.**

#### `data_directory (DirectoryPath, optional)`
#### `data_dir (DirectoryPath, optional)`

Path to the directory containing training videos. Defaults to the current working directory.

@@ -326,13 +337,13 @@ Because `zamba` needs to download pretrained weights for the neural network arch

The proportion of data to use during training, validation, and as a holdout set. Defaults to `{"train": 3, "val": 1, "holdout": 1}`
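To make the proportions concrete, here is a hypothetical sketch of turning `split_proportions` into per-split video counts — an illustration only, not zamba's actual splitting logic:

```python
# Hypothetical illustration of proportional splitting; not zamba's actual implementation.
def split_counts(n_videos, proportions):
    """Turn proportions like {'train': 3, 'val': 1, 'holdout': 1} into video counts."""
    total = sum(proportions.values())
    counts = {name: n_videos * share // total for name, share in proportions.items()}
    # Give any rounding remainder to the training split
    counts["train"] += n_videos - sum(counts.values())
    return counts

print(split_counts(100, {"train": 3, "val": 1, "holdout": 1}))
# → {'train': 60, 'val': 20, 'holdout': 20}
```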

#### `save_directory (Path, optional)`
#### `save_dir (Path, optional)`

Directory in which to save model checkpoint and configuration file. If not specified, will save to a `version_*` folder in your working directory.
Directory in which to save model checkpoint and configuration file. If not specified, will save to a `version_n` folder in your current working directory.

#### `overwrite_save_directory (bool, optional)`
#### `overwrite (bool, optional)`

If `True`, will save outputs in `save_directory` and overwrite the directory if it exists. If False, will create an auto-incremented `version_n` folder within `save_directory` with model outputs. Defaults to `False`.
If `True`, will save outputs in `save_dir` and overwrite the directory if it exists. If False, will create an auto-incremented `version_n` folder within `save_dir` with model outputs. Defaults to `False`.
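As a sketch, the renamed training options might appear in a YAML configuration like this (paths are hypothetical):

```yaml
train_config:
  data_dir: example_vids/
  labels: example_labels.csv
  save_dir: my_model/
  overwrite: true  # write into my_model/ directly instead of an auto-incremented version_n folder
```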

#### `skip_load_validation (bool, optional)`

6 changes: 3 additions & 3 deletions docs/docs/debugging.md
@@ -12,7 +12,7 @@ Before kicking off a full run of inference or model training, we recommend testi
In Python, add `dry_run=True` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", dry_run=True
data_dir="example_vids/", dry_run=True
)
```

@@ -30,7 +30,7 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error
In Python, add `batch_size` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", batch_size=1
data_dir="example_vids/", batch_size=1
)
```

@@ -66,7 +66,7 @@ Reduce the number of workers (subprocesses) used for data loading. By default `n
In Python, add `num_workers` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", num_workers=1
data_dir="example_vids/", num_workers=1
)
```

10 changes: 5 additions & 5 deletions docs/docs/extra-options.md
@@ -22,7 +22,7 @@ For using a YAML file with the Python package and other details, see the [YAML C
In Python this can be specified in [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/",
data_dir="example_vids/",
weight_download_region='asia',
)
```
@@ -50,7 +50,7 @@ Say that you have a large number of videos, and you are more concerned with dete
from zamba.models.config import PredictConfig
from zamba.models.model_manager import predict_model

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")

video_loader_config = VideoLoaderConfig(
model_input_height=50, model_input_width=50, total_frames=16
@@ -139,7 +139,7 @@ For example, to take the 16 frames with the highest probability of detection:
total_frames=16,
)

train_config = TrainConfig(data_directory="example_vids/", labels="example_labels.csv",)
train_config = TrainConfig(data_dir="example_vids/", labels="example_labels.csv",)

train_model(video_loader_config=video_loader_config, train_config=train_config)
```
@@ -162,15 +162,15 @@ Both can be specified in either [`predict_config`](configurations.md#prediction-
=== "YAML file"
```yaml
predict_config:
data_directory: example_vids/
data_dir: example_vids/
num_workers: 5
batch_size: 4
# ... other parameters
```
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/",
data_dir="example_vids/",
num_workers=5,
batch_size=4,
# ... other parameters
4 changes: 2 additions & 2 deletions docs/docs/models/denspose.md
@@ -41,7 +41,7 @@ Once that is done, here's how to run the DensePose model:
=== "Python"
```python
from zamba.models.densepose import DensePoseConfig
densepose_conf = DensePoseConfig(data_directory="PATH_TO_VIDEOS", render_output=True)
densepose_conf = DensePoseConfig(data_dir="PATH_TO_VIDEOS", render_output=True)
densepose_conf.run_model()
```

@@ -68,7 +68,7 @@ Options:
containing images/videos.
--filepaths PATH Path to csv containing `filepath` column
with videos.
--save-path PATH An optional directory for saving the output.
--save-dir PATH An optional directory for saving the output.
Defaults to the current working directory.
--config PATH Specify options using yaml configuration
file instead of through command line
16 changes: 8 additions & 8 deletions docs/docs/predict-tutorial.md
@@ -37,17 +37,17 @@ Minimum example for prediction using the Python package:
from zamba.models.model_manager import predict_model
from zamba.models.config import PredictConfig

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")
predict_model(predict_config=predict_config)
```

The only two arguments that can be passed to `predict_model` are `predict_config` and (optionally) `video_loader_config`. The first step is to instantiate [`PredictConfig`](configurations.md#prediction-arguments). Optionally, you can also specify video loading arguments by instantiating and passing in [`VideoLoaderConfig`](configurations.md#video-loading-arguments).

### Required arguments

To run `predict_model` in Python, you must specify either `data_directory` or `filepaths` when `PredictConfig` is instantiated.
To run `predict_model` in Python, you must specify either `data_dir` or `filepaths` when `PredictConfig` is instantiated.

* **`data_directory (DirectoryPath)`:** Path to the folder containing your videos.
* **`data_dir (DirectoryPath)`:** Path to the folder containing your videos.

* **`filepaths (FilePath)`:** Path to a CSV file with a column for the filepath to each video you want to classify. The CSV must have a column for `filepath`. Filepaths can be absolute or relative to the data directory.

@@ -57,7 +57,7 @@ For detailed explanations of all possible configuration arguments, see [All Opti

By default, the [`time_distributed`](models/index.md#time-distributed) model will be used. `zamba` will output a `.csv` file with rows labeled by each video filename and columns for each class (i.e., species). The default prediction will store all class probabilities, so that cell (i,j) can be interpreted as *the probability that animal j is present in video i.*

By default, predictions will be saved to `zamba_predictions.csv`. You can save predictions to a custom directory using the `--save-path` argument.
By default, predictions will be saved to `zamba_predictions.csv` in your working directory. You can save predictions to a custom directory using the `--save-dir` argument.
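For instance, a hypothetical run that writes predictions to a custom folder:

```console
$ zamba predict --data-dir example_vids/ --save-dir predictions/
```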

```console
$ cat zamba_predictions.csv
@@ -90,7 +90,7 @@ Add the path to your video folder. For example, if your videos are in a folder c
```
=== "Python"
```python
predict_config = PredictConfig(data_directory='example_vids/')
predict_config = PredictConfig(data_dir='example_vids/')
predict_model(predict_config=predict_config)
```

@@ -109,7 +109,7 @@ Add the model name to your command. The `time_distributed` model will be used if
=== "Python"
```python
predict_config = PredictConfig(
data_directory='example_vids/', model_name='slowfast'
data_dir='example_vids/', model_name='slowfast'
)
predict_model(predict_config=predict_config)
```
@@ -137,7 +137,7 @@ Say we want to generate predictions for the videos in `example_vids` indicating
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/", proba_threshold=0.5
data_dir="example_vids/", proba_threshold=0.5
)
predict_model(predict_config=predict_config)
predictions = pd.read_csv("zamba_predictions.csv")
@@ -153,7 +153,7 @@ Say we want to generate predictions for the videos in `example_vids` indicating

### 4. Specify any additional parameters

And there's so much more! You can also do things like specify your region for faster model download (`--weight-download-region`), use a saved model checkpoint (`--checkpoint`), or specify a different path where your predictions should be saved (`--save`). To read about a few common considerations, see the [Guide to Common Optional Parameters](extra-options.md) page.
And there's so much more! You can also do things like specify your region for faster model download (`--weight-download-region`), use a saved model checkpoint (`--checkpoint`), or specify a different folder where your predictions should be saved (`--save-dir`). To read about a few common considerations, see the [Guide to Common Optional Parameters](extra-options.md) page.

### 5. Test your configuration with a dry run

30 changes: 16 additions & 14 deletions docs/docs/quickstart.md
@@ -65,7 +65,8 @@ $ zamba predict --data-dir example_vids/
```

`zamba` will output a `.csv` file with rows labeled by each video filename and columns for each class (i.e., species). The default prediction will store all class probabilities, so that cell `(i,j)` is *the probability that animal `j` is present in video `i`.* Comprehensive predictions are helpful when a single video contains multiple species.
Predictions will be saved to `zamba_predictions.csv` in the current working directory by default. You can save out predictions to a different folder using the `--save-path` argument.

Predictions will be saved to `zamba_predictions.csv` in the current working directory by default. You can save out predictions to a different folder using the `--save-dir` argument.

Adding the argument `--output-class-names` will simplify the predictions to return only the *most likely* animal in each video:

@@ -108,7 +109,7 @@ eleph.MP4,elephant
leopard.MP4,leopard
```

By default, the trained model and additional training output will be saved to a `version_*` folder in the current working directory. For example,
By default, the trained model and additional training output will be saved to a `version_n` folder in the current working directory. For example,

```console
$ zamba train --data-dir example_vids/ --labels example_labels.csv
@@ -134,8 +135,6 @@ Once zamba is installed, you can see more details of each function with `--help`
To get help with `zamba predict`:

```console
$ zamba predict --help

Usage: zamba predict [OPTIONS]

Identify species in a video.
@@ -162,11 +161,13 @@ Options:
specified, will use all GPUs found on
machine.
--batch-size INTEGER Batch size to use for training.
--save / --no-save Whether to save out predictions to a csv
file. If you want to specify the location of
the csv, use save_path instead.
--save-path PATH Full path for prediction CSV file. Any
needed parent directories will be created.
--save / --no-save Whether to save out predictions. If you want
to specify the output directory, use
save_dir instead.
--save-dir PATH An optional directory in which to save the
model predictions and configuration yaml.
Defaults to the current working directory if
save is True.
--dry-run / --no-dry-run Runs one batch of inference to check for
bugs.
--config PATH Specify options using yaml configuration
@@ -193,6 +194,8 @@ Options:
loaded prior to inference. Only use if
you're very confident all your videos can be
loaded.
-o, --overwrite Overwrite outputs in the save directory if
they exist.
-y, --yes Skip confirmation of configuration and
proceed right to prediction.
--help Show this message and exit.
@@ -228,11 +231,10 @@ Options:
machine.
--dry-run / --no-dry-run Runs one batch of train and validation to
check for bugs.
--save-dir PATH Directory in which to save model checkpoint
and configuration file. If not specified,
will save to a folder called
'zamba_{model_name}' in your working
directory.
--save-dir PATH An optional directory in which to save the
model checkpoint and configuration file. If
not specified, will save to a `version_n`
folder in your working directory.
--num-workers INTEGER Number of subprocesses to use for data
loading.
--weight-download-region [us|eu|asia]