Simplify save_dir and some directory -> dir renames #151

Merged
merged 36 commits into from
Oct 25, 2021
Changes from 9 commits
73ed0cb
wip renames
ejm714 Oct 23, 2021
5802102
renames in docs
ejm714 Oct 23, 2021
be8478d
readme
ejm714 Oct 23, 2021
1e986d3
data dir renamme in docs
ejm714 Oct 23, 2021
5630332
rename in code from data_directory to data_dir
ejm714 Oct 23, 2021
7f28c59
maintaining update
ejm714 Oct 23, 2021
3e51468
fix capitalization
ejm714 Oct 23, 2021
2b8b282
further updates
ejm714 Oct 23, 2021
3618be5
tweak
ejm714 Oct 23, 2021
49a3029
do not overwrite
ejm714 Oct 23, 2021
bd45a60
add overwrite save dir
ejm714 Oct 23, 2021
bacc14d
add overwrite save dir to config
ejm714 Oct 23, 2021
efb5ec2
update configs with all info
ejm714 Oct 25, 2021
c7578b5
use full train configuration
ejm714 Oct 25, 2021
70d9b1c
only upload if does not exist
ejm714 Oct 25, 2021
72adb39
tests for save
ejm714 Oct 25, 2021
0578a8a
overwrite param
ejm714 Oct 25, 2021
ddd4c78
better set up and test for overwrite
ejm714 Oct 25, 2021
bfbb24f
docs
ejm714 Oct 25, 2021
dc28179
update docs with overwrite
ejm714 Oct 25, 2021
0b4a098
from overwrite_save_dir to overwrite
ejm714 Oct 25, 2021
2515b3f
missed rename
ejm714 Oct 25, 2021
247fc5c
remove machine specific from vlc
ejm714 Oct 25, 2021
2b8f546
unindent so test actually runs
ejm714 Oct 25, 2021
2bbe343
check for local and cached checkpoints
ejm714 Oct 25, 2021
e7fee37
should be and
ejm714 Oct 25, 2021
e6eb0db
write out predict config before preds start like we do for train config
ejm714 Oct 25, 2021
a5680a2
update all configs and use only first 10 digits of hash
ejm714 Oct 25, 2021
1d1bfbb
dry run check after save is configured; more robust test
ejm714 Oct 25, 2021
488c47a
reorder
ejm714 Oct 25, 2021
61d3878
show save directory
ejm714 Oct 25, 2021
e78cea0
copy edits
ejm714 Oct 25, 2021
78270d5
update template
ejm714 Oct 25, 2021
4dd9aae
fix test
ejm714 Oct 25, 2021
2a0a315
lower case for consistency
ejm714 Oct 25, 2021
59322a7
fix test
ejm714 Oct 25, 2021
2 changes: 1 addition & 1 deletion .github/MAINTAINING.md
@@ -113,7 +113,7 @@ make publish_models

This will generate a public file name for each model based on the config hash and upload the model weights to the three DrivenData public s3 buckets. This will generate a folder in `zamba/models/official_models/{your_name_name}` that contains the official config as well as reference yaml and json files. You should PR everything in this folder.

Lastly, you need to update the template in `templates`. The template should contain all the same info as the model's `config.yaml`, plus placeholders for `data_directory` and `labels` in `train_config`, and `data_directory`, `filepaths`, and `checkpoint` in `predict_config`.
Lastly, you need to update the template in `templates`. The template should contain all the same info as the model's `config.yaml`, plus placeholders for `data_dir` and `labels` in `train_config`, and `data_dir`, `filepaths`, and `checkpoint` in `predict_config`.

### New model checklist

2 changes: 1 addition & 1 deletion README.md
@@ -82,7 +82,7 @@ See the [Quickstart](https://zamba.drivendata.org/docs/quickstart/) page or the
### Training a model

```console
$ zamba train --data-dir path/to/videos --labels path_to_labels.csv --save-path my_trained_model
$ zamba train --data-dir path/to/videos --labels path_to_labels.csv --save-dir my_trained_model
```

The newly trained model will be saved to the specified save directory. The folder will contain a model checkpoint as well as training configuration, model hyperparameters, and validation and test metrics. Run `zamba train --help` to list all possible options to pass to `train`.
36 changes: 21 additions & 15 deletions docs/docs/configurations.md
@@ -32,7 +32,7 @@ All video loading arguments can be specified either in a [YAML file](yaml-config
from zamba.models.config import PredictConfig
from zamba.models.model_manager import predict_model

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")
video_loader_config = VideoLoaderConfig(
model_input_height=240,
model_input_width=426,
@@ -146,14 +146,15 @@ All possible model inference parameters are defined by the [`PredictConfig` clas

class PredictConfig(ZambaBaseModel)
| PredictConfig(*,
data_directory: DirectoryPath = Path.cwd()
data_dir: DirectoryPath = Path.cwd(),
filepaths: FilePath = None,
checkpoint: FilePath = None,
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
gpus: int = 0,
num_workers: int = 3,
batch_size: int = 2,
save: Union[bool, pathlib.Path] = True,
save: bool = True,
save_dir: Optional[Path] = None,
dry_run: bool = False,
proba_threshold: float = None,
output_class_names: bool = False,
@@ -164,9 +165,9 @@ class PredictConfig(ZambaBaseModel)
...
```

**Either `data_directory` or `filepaths` must be specified to instantiate `PredictConfig`.** If neither is specified, the current working directory will be used as the default `data_directory`.
**Either `data_dir` or `filepaths` must be specified to instantiate `PredictConfig`.** If neither is specified, the current working directory will be used as the default `data_dir`.

#### `data_directory (DirectoryPath, optional)`
#### `data_dir (DirectoryPath, optional)`

Path to the directory containing videos for inference. Defaults to the current working directory.

@@ -194,9 +195,14 @@ The number of CPUs to use during training. The maximum value for `num_workers` i

The batch size to use for inference. Defaults to `2`

#### `save (bool, optional)`
#### `save (bool)`

Whether to save out the predictions to a CSV file. By default, predictions will be saved at `zamba_predictions.csv`. Defaults to `True`
Whether to save out predictions. If `False`, predictions are not saved. Defaults to `True`.

#### `save_dir (Path, optional)`

An optional directory in which to save the model predictions and configuration yaml. If
no `save_dir` is specified and `save` is True, outputs will be written to the current working directory. Defaults to `None`
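The interaction between `save` and `save_dir` described above can be sketched as follows. This is a minimal illustration, not zamba's implementation: the helper name `resolve_predictions_path` is hypothetical, and it assumes the predictions file keeps the name `zamba_predictions.csv` inside whichever directory is chosen.

```python
from pathlib import Path
from typing import Optional


def resolve_predictions_path(
    save: bool = True, save_dir: Optional[Path] = None
) -> Optional[Path]:
    """Hypothetical helper: where predictions would be written, or None."""
    if not save:
        # save=False: predictions are not written anywhere
        return None
    # No save_dir given: fall back to the current working directory
    target_dir = Path(save_dir) if save_dir is not None else Path.cwd()
    # Assumed file name, taken from the default mentioned in the docs
    return target_dir / "zamba_predictions.csv"


print(resolve_predictions_path(save_dir=Path("my_predictions")))
print(resolve_predictions_path(save=False))
```

Splitting the old `Union[bool, Path]` field into a boolean and a separate optional path means each argument answers exactly one question: whether to save, and where.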

#### `dry_run (bool, optional)`

@@ -237,7 +243,7 @@ All possible model training parameters are defined by the [`TrainConfig` class](
class TrainConfig(ZambaBaseModel)
| TrainConfig(*,
labels: Union[FilePath, pandas.DataFrame],
data_directory: DirectoryPath = # your current working directory ,
data_dir: DirectoryPath = # your current working directory ,
checkpoint: FilePath = None,
scheduler_config: Union[str, zamba.models.config.SchedulerConfig, NoneType] = 'default',
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
@@ -256,8 +262,8 @@ class TrainConfig(ZambaBaseModel)
verbose=True, mode='max'),
weight_download_region: zamba.models.utils.RegionEnum = 'us',
split_proportions: Dict[str, int] = {'train': 3, 'val': 1, 'holdout': 1},
save_directory: pathlib.Path = # your current working directory ,
overwrite_save_directory: bool = False,
save_dir: pathlib.Path = # your current working directory ,
overwrite_save_dir: bool = False,
skip_load_validation: bool = False,
from_scratch: bool = False,
predict_all_zamba_species: bool = True,
@@ -270,7 +276,7 @@ class TrainConfig(ZambaBaseModel)

Either the path to a CSV file with labels for training, or a dataframe of the training labels. There must be columns for `filename` and `label`. **`labels` must be specified to instantiate `TrainConfig`.**

#### `data_directory (DirectoryPath, optional)`
#### `data_dir (DirectoryPath, optional)`

Path to the directory containing training videos. Defaults to the current working directory.

@@ -326,13 +332,13 @@ Because `zamba` needs to download pretrained weights for the neural network arch

The proportion of data to use during training, validation, and as a holdout set. Defaults to `{"train": 3, "val": 1, "holdout": 1}`
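Reading the default values as relative weights, the share of data in each split works out as below. This is only arithmetic on the documented default; how zamba actually assigns individual videos to splits is not shown here.

```python
from fractions import Fraction

# Default from the docs: relative weights, not percentages
split_proportions = {"train": 3, "val": 1, "holdout": 1}

total = sum(split_proportions.values())
shares = {
    name: Fraction(weight, total) for name, weight in split_proportions.items()
}

for name, share in shares.items():
    print(f"{name}: {share} ({float(share):.0%})")
# train: 3/5 (60%)
# val: 1/5 (20%)
# holdout: 1/5 (20%)
```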

#### `save_directory (Path, optional)`
#### `save_dir (Path, optional)`

Directory in which to save model checkpoint and configuration file. If not specified, will save to a `version_*` folder in your working directory.
Directory in which to save model checkpoint and configuration file. If not specified, will save to a `version_n` folder in your current working directory.

#### `overwrite_save_directory (bool, optional)`
#### `overwrite_save_dir (bool, optional)`

If `True`, will save outputs in `save_directory` and overwrite the directory if it exists. If False, will create an auto-incremented `version_n` folder within `save_directory` with model outputs. Defaults to `False`.
If `True`, will save outputs in `save_dir` and overwrite the directory if it exists. If `False`, will create an auto-incremented `version_n` folder within `save_dir` with model outputs. Defaults to `False`.
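The auto-increment behaviour can be sketched with `pathlib`. The helper `next_version_dir` is hypothetical and is not taken from zamba's code; in particular, starting the numbering at `version_0` is an assumption.

```python
import tempfile
from pathlib import Path


def next_version_dir(save_dir: Path, overwrite_save_dir: bool = False) -> Path:
    """Hypothetical helper mimicking the documented behaviour."""
    save_dir = Path(save_dir)
    if overwrite_save_dir:
        # Write directly into save_dir, replacing earlier outputs
        return save_dir
    # Collect the numeric suffixes of existing version_n folders
    existing = [
        int(p.name.split("_", 1)[1])
        for p in save_dir.glob("version_*")
        if p.is_dir() and p.name.split("_", 1)[1].isdigit()
    ]
    # Assumption: numbering starts at 0 when no version folder exists yet
    return save_dir / f"version_{max(existing, default=-1) + 1}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = next_version_dir(root)
    first.mkdir()
    second = next_version_dir(root)
    print(first.name, second.name)  # version_0 version_1
```

Defaulting to a fresh `version_n` folder keeps earlier training runs intact unless the caller explicitly opts in to overwriting.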

#### `skip_load_validation (bool, optional)`

6 changes: 3 additions & 3 deletions docs/docs/debugging.md
@@ -12,7 +12,7 @@ Before kicking off a full run of inference or model training, we recommend testi
In Python, add `dry_run=True` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", dry_run=True
data_dir="example_vids/", dry_run=True
)
```

@@ -30,7 +30,7 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error
In Python, add `batch_size` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", batch_size=1
data_dir="example_vids/", batch_size=1
)
```

@@ -66,7 +66,7 @@ Reduce the number of workers (subprocesses) used for data loading. By default `n
In Python, add `num_workers` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/", num_workers=1
data_dir="example_vids/", num_workers=1
)
```

10 changes: 5 additions & 5 deletions docs/docs/extra-options.md
@@ -22,7 +22,7 @@ For using a YAML file with the Python package and other details, see the [YAML C
In Python this can be specified in [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
```python
predict_config = PredictConfig(
data_directory="example_vids/",
data_dir="example_vids/",
weight_download_region='asia',
)
```
@@ -50,7 +50,7 @@ Say that you have a large number of videos, and you are more concerned with dete
from zamba.models.config import PredictConfig
from zamba.models.model_manager import predict_model

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")

video_loader_config = VideoLoaderConfig(
model_input_height=50, model_input_width=50, total_frames=16
@@ -139,7 +139,7 @@ For example, to take the 16 frames with the highest probability of detection:
total_frames=16,
)

train_config = TrainConfig(data_directory="example_vids/", labels="example_labels.csv",)
train_config = TrainConfig(data_dir="example_vids/", labels="example_labels.csv",)

train_model(video_loader_config=video_loader_config, train_config=train_config)
```
@@ -162,15 +162,15 @@ Both can be specified in either [`predict_config`](configurations.md#prediction-
=== "YAML file"
```yaml
predict_config:
data_directory: example_vids/
data_dir: example_vids/
num_workers: 5
batch_size: 4
# ... other parameters
```
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/",
data_dir="example_vids/",
num_workers=5,
batch_size=4,
# ... other parameters
4 changes: 2 additions & 2 deletions docs/docs/models/denspose.md
@@ -41,7 +41,7 @@ Once that is done, here's how to run the DensePose model:
=== "Python"
```python
from zamba.models.densepose import DensePoseConfig
densepose_conf = DensePoseConfig(data_directory="PATH_TO_VIDEOS", render_output=True)
densepose_conf = DensePoseConfig(data_dir="PATH_TO_VIDEOS", render_output=True)
densepose_conf.run_model()
```

@@ -68,7 +68,7 @@ Options:
containing images/videos.
--filepaths PATH Path to csv containing `filepath` column
with videos.
--save-path PATH An optional directory for saving the output.
--save-dir PATH An optional directory for saving the output.
Defaults to the current working directory.
--config PATH Specify options using yaml configuration
file instead of through command line
16 changes: 8 additions & 8 deletions docs/docs/predict-tutorial.md
@@ -37,17 +37,17 @@ Minimum example for prediction using the Python package:
from zamba.models.model_manager import predict_model
from zamba.models.config import PredictConfig

predict_config = PredictConfig(data_directory="example_vids/")
predict_config = PredictConfig(data_dir="example_vids/")
predict_model(predict_config=predict_config)
```

The only two arguments that can be passed to `predict_model` are `predict_config` and (optionally) `video_loader_config`. The first step is to instantiate [`PredictConfig`](configurations.md#prediction-arguments). Optionally, you can also specify video loading arguments by instantiating and passing in [`VideoLoaderConfig`](configurations.md#video-loading-arguments).

### Required arguments

To run `predict_model` in Python, you must specify either `data_directory` or `filepaths` when `PredictConfig` is instantiated.
To run `predict_model` in Python, you must specify either `data_dir` or `filepaths` when `PredictConfig` is instantiated.

* **`data_directory (DirectoryPath)`:** Path to the folder containing your videos.
* **`data_dir (DirectoryPath)`:** Path to the folder containing your videos.

* **`filepaths (FilePath)`:** Path to a CSV file with a column for the filepath to each video you want to classify. The CSV must have a column for `filepath`. Filepaths can be absolute or relative to the data directory.
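A `filepaths` CSV of the kind described above can be produced with the standard library. This is a minimal sketch: the video names are placeholders, and the text is built in memory rather than written to disk so the snippet has no side effects.

```python
import csv
import io

# Placeholder video names, relative to the data directory
video_files = ["eleph.MP4", "leopard.MP4"]

buffer = io.StringIO()
# The only required column is `filepath`
writer = csv.DictWriter(buffer, fieldnames=["filepath"], lineterminator="\n")
writer.writeheader()
for name in video_files:
    writer.writerow({"filepath": name})

csv_text = buffer.getvalue()
print(csv_text)
```

Writing `csv_text` to a file and passing that file's path as `filepaths` would then select exactly those videos.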

@@ -57,7 +57,7 @@ For detailed explanations of all possible configuration arguments, see [All Opti

By default, the [`time_distributed`](models/index.md#time-distributed) model will be used. `zamba` will output a `.csv` file with rows labeled by each video filename and columns for each class (i.e., species). The default prediction will store all class probabilities, so that cell (i,j) can be interpreted as *the probability that animal j is present in video i.*

By default, predictions will be saved to `zamba_predictions.csv`. You can save predictions to a custom directory using the `--save-path` argument.
By default, predictions will be saved to `zamba_predictions.csv` in your working directory. You can save predictions to a custom directory using the `--save-dir` argument.

```console
$ cat zamba_predictions.csv
@@ -90,7 +90,7 @@ Add the path to your video folder. For example, if your videos are in a folder c
```
=== "Python"
```python
predict_config = PredictConfig(data_directory='example_vids/')
predict_config = PredictConfig(data_dir='example_vids/')
predict_model(predict_config=predict_config)
```

@@ -109,7 +109,7 @@ Add the model name to your command. The `time_distributed` model will be used if
=== "Python"
```python
predict_config = PredictConfig(
data_directory='example_vids/', model_name='slowfast'
data_dir='example_vids/', model_name='slowfast'
)
predict_model(predict_config=predict_config)
```
@@ -137,7 +137,7 @@ Say we want to generate predictions for the videos in `example_vids` indicating
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/", proba_threshold=0.5
data_dir="example_vids/", proba_threshold=0.5
)
predict_model(predict_config=predict_config)
predictions = pd.read_csv("zamba_predictions.csv")
@@ -153,7 +153,7 @@ Say we want to generate predictions for the videos in `example_vids` indicating

### 4. Specify any additional parameters

And there's so much more! You can also do things like specify your region for faster model download (`--weight-download-region`), use a saved model checkpoint (`--checkpoint`), or specify a different path where your predictions should be saved (`--save`). To read about a few common considerations, see the [Guide to Common Optional Parameters](extra-options.md) page.
And there's so much more! You can also do things like specify your region for faster model download (`--weight-download-region`), use a saved model checkpoint (`--checkpoint`), or specify a different folder where your predictions should be saved (`--save-dir`). To read about a few common considerations, see the [Guide to Common Optional Parameters](extra-options.md) page.

### 5. Test your configuration with a dry run

28 changes: 14 additions & 14 deletions docs/docs/quickstart.md
@@ -65,7 +65,8 @@ $ zamba predict --data-dir example_vids/
```

`zamba` will output a `.csv` file with rows labeled by each video filename and columns for each class (i.e., species). The default prediction will store all class probabilities, so that cell `(i,j)` is *the probability that animal `j` is present in video `i`.* Comprehensive predictions are helpful when a single video contains multiple species.
Predictions will be saved to `zamba_predictions.csv` in the current working directory by default. You can save out predictions to a different folder using the `--save-path` argument.

Predictions will be saved to `zamba_predictions.csv` in the current working directory by default. You can save out predictions to a different folder using the `--save-dir` argument.
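To make the `(i,j)` layout concrete, the sketch below reads a small probability table and picks the most likely class per video. The probabilities are made-up illustration data, and the index column name `filepath` is an assumption about the CSV header.

```python
import csv
import io

# Made-up probabilities in the documented layout: one row per video,
# one column per class. The `filepath` header is an assumption.
predictions_csv = """filepath,blank,elephant,leopard
eleph.MP4,0.05,0.90,0.05
leopard.MP4,0.10,0.15,0.75
"""

top_class = {}
for row in csv.DictReader(io.StringIO(predictions_csv)):
    video = row.pop("filepath")
    # Remaining keys are class names; pick the one with the highest probability
    top_class[video] = max(row, key=lambda name: float(row[name]))

print(top_class)
# {'eleph.MP4': 'elephant', 'leopard.MP4': 'leopard'}
```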

Adding the argument `--output-class-names` will simplify the predictions to return only the *most likely* animal in each video:

@@ -108,7 +109,7 @@ eleph.MP4,elephant
leopard.MP4,leopard
```

By default, the trained model and additional training output will be saved to a `version_*` folder in the current working directory. For example,
By default, the trained model and additional training output will be saved to a `version_n` folder in the current working directory. For example,

```console
$ zamba train --data-dir example_vids/ --labels example_labels.csv
@@ -134,8 +135,6 @@ Once zamba is installed, you can see more details of each function with `--help`
To get help with `zamba predict`:

```console
$ zamba predict --help

Usage: zamba predict [OPTIONS]

Identify species in a video.
@@ -162,11 +161,13 @@ Options:
specified, will use all GPUs found on
machine.
--batch-size INTEGER Batch size to use for training.
--save / --no-save Whether to save out predictions to a csv
file. If you want to specify the location of
the csv, use save_path instead.
--save-path PATH Full path for prediction CSV file. Any
needed parent directories will be created.
--save / --no-save Whether to save out predictions. If you want
to specify the output directory, use
save_dir instead.
--save-dir PATH An optional directory in which to save the
model predictions and configuration yaml.
Defaults to the current working directory if
save is True.
--dry-run / --no-dry-run Runs one batch of inference to check for
bugs.
--config PATH Specify options using yaml configuration
@@ -228,11 +229,10 @@ Options:
machine.
--dry-run / --no-dry-run Runs one batch of train and validation to
check for bugs.
--save-dir PATH Directory in which to save model checkpoint
and configuration file. If not specified,
will save to a folder called
'zamba_{model_name}' in your working
directory.
--save-dir PATH An optional directory in which to save the
model checkpoint and configuration file. If
not specified, will save to a `version_n`
folder in your working directory.
--num-workers INTEGER Number of subprocesses to use for data
loading.
--weight-download-region [us|eu|asia]