release commit
krkaufma committed Dec 5, 2019
0 parents commit 47db19b
Showing 40 changed files with 7,918 additions and 0 deletions.
23 changes: 23 additions & 0 deletions .coveragerc
@@ -0,0 +1,23 @@
# .coveragerc to control coverage.py
[run]
branch = True
source = */src/*
# omit = bad_file.py

[report]
# Regexes for lines to exclude from consideration
exclude_lines =
    # Have to re-enable the standard pragma
    pragma: no cover

    # Don't complain about missing debug-only code:
    def __repr__
    if self\.debug

    # Don't complain if tests don't hit defensive assertion code:
    raise AssertionError
    raise NotImplementedError

    # Don't complain if non-runnable code isn't run:
    if 0:
    if __name__ == .__main__.:
48 changes: 48 additions & 0 deletions .gitignore
@@ -0,0 +1,48 @@
# Excluded artifact directories
build/
data/
models/


# Temporary and binary files
*~
*.py[cod]
*.so
*.cfg
!setup.cfg
*.orig
*.log
*.pot
__pycache__/*
.cache/*
.*.swp
*/.ipynb_checkpoints/*

# Project files
.ropeproject
.project
.pydevproject
.settings
.idea

# Package files
*.egg
*.eggs/
.installed.cfg
*.egg-info

# Unittest and coverage
htmlcov/*
.coverage
.tox
junit.xml
coverage.xml

# Build and docs folder/files
build/*
dist/*
sdist/*
docs/api/*
docs/_build/*
cover/*
MANIFEST
6 changes: 6 additions & 0 deletions AUTHORS.rst
@@ -0,0 +1,6 @@
============
Contributors
============
Alex Rosengarten, [email protected]
Kevin Kaufmann, [email protected]
Chaoyi Zhu, [email protected]
23 changes: 23 additions & 0 deletions CHANGELOG.rst
@@ -0,0 +1,23 @@
=========
Changelog
=========

Version 0.1.0
=============
- Initial project structure established.

Version 0.1.1
=============
- Added a model "plugin" system. This allows parallel model development with reasonable version control.


Version 0.1.2
=============
- Added script to create manifest file. Manifest file is used to manage all of the data for the project -- it's a central place to look up the location of each image file and to understand its known metadata.

- [ ] TODO: need to refactor `make_data.py`, i.e. the data reading module, to use the manifest file instead of relying on a specific file structure.

Version 0.1.3
=============
- Added notebooks module for interactive investigations. First investigation: Model interpretability.
- Fixed excessive warnings when reading in TIFF images.
21 changes: 21 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2018 Vecchio Laboratory

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
264 changes: 264 additions & 0 deletions README.md
@@ -0,0 +1,264 @@
# Vecchio-Group-Phase-2



A small ML Training and Evaluation Framework for reliable experimentation.


## Setup

This project targets a GPU-enabled Linux Workstation. Additional work may be required for testing on other operating systems or on CPUs.

We target Python 3.6+.

1. Create and activate a virtual environment (via [anaconda](https://docs.conda.io/en/latest/miniconda.html), [virtualenv](https://virtualenv.pypa.io/en/latest/), or [pyenv](https://github.com/pyenv/pyenv)). Anaconda is recommended.
2. `pip install -e .`


### Dev Install

Development installations are recommended if you'd like to contribute to this library.

`pip install -e .[dev]`

### Testing

`python setup.py test`

After the tests run, the coverage report can be found at `docs/cov/`.

## How to Run Experiments

### Collecting data into a manifest file

This project includes a `make_manifest.py` script that assumes the following:

- All of your EBSD data exists under a single directory (likely called `data`) on your filesystem.
- The folders that contain image data (`.tiff` files) also contain a `Grain List.txt` file with metadata about the images in the folder.
- The folder structure for the images, while arbitrary, will not change at the time experiments are run.

Before training starts, first create a manifest file to represent your dataset.
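
As a rough illustration of the assumptions listed above, the following sketch walks a root directory and writes one CSV row per `.tiff` image. It is not the actual `make_manifest.py`: the function name `build_manifest` and the column names `path` and `metadata_file` are hypothetical, and the real script also reads metadata out of `Grain List.txt`, which this sketch only records by path.

```python
import csv
import os


def build_manifest(source_directory, output_path="manifest.csv"):
    """Walk source_directory and write one CSV row per .tiff image found."""
    rows = []
    for folder, _subdirs, filenames in os.walk(source_directory):
        # Assumption from the README: image folders also hold a `Grain List.txt`.
        if "Grain List.txt" not in filenames:
            continue
        for name in filenames:
            if name.lower().endswith(".tiff"):
                rows.append({
                    "path": os.path.join(folder, name),  # hypothetical column
                    "metadata_file": os.path.join(folder, "Grain List.txt"),
                })
    # Sort so the manifest does not depend on filesystem listing order.
    rows.sort(key=lambda row: row["path"])
    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "metadata_file"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Because the manifest stores file paths, moving or renaming image folders afterwards would invalidate it, which is why the folder structure must stay fixed once experiments begin.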

### Splitting data manifest into groups

We also include a `split_manifest.py` script whose purpose is to divide the EBSD image dataset into train, test, and validation groups.

After creating a manifest file and before you start experimenting, use this script to add split assignments to the manifest file so that experiments are repeatable.

This script should add a column (likely called `_split`) that will be used in the training process.
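
To make the idea of a seeded, optionally stratified split concrete, here is a minimal sketch using only the standard library. It is not the implementation in `split_manifest.py`: the function name `assign_splits`, the default ratios, and the tag values `train`, `validation`, and `test` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def assign_splits(labels, test_size=0.2, val_size=0.1, seed=42, stratified=True):
    """Return one 'train'/'validation'/'test' tag per entry in labels."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        # Without stratification, every row falls into a single group.
        groups[label if stratified else None].append(index)

    splits = [None] * len(labels)
    for key in sorted(groups, key=str):  # fixed iteration order
        indices = groups[key]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        n_val = round(len(indices) * val_size)
        for position, index in enumerate(indices):
            if position < n_test:
                splits[index] = "test"
            elif position < n_test + n_val:
                splits[index] = "validation"
            else:
                splits[index] = "train"
    return splits
```

The returned list could be written back to the manifest as the `_split` column. Stratifying by label keeps each class represented in roughly the same proportion in every group, which matters when some phases have far fewer images than others.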

### Train a Model

Define a model by subclassing either a Regression or Classification type in the `src/vecchio/models.py` file.
Thereafter, train your model in a repeatable way using the `train.py` or `regression_train.py` scripts below.
Consider saving the exact command that is run, because the specific arguments will be integral in the evaluation phase.
See documentation below for arguments and example usages.
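
One way to save the exact command is to append it to a text file next to the model output. The sketch below is a suggestion, not something the training scripts are known to do: the helper name `record_command` and the file name `command.txt` are hypothetical.

```python
import shlex
import sys
from datetime import datetime
from pathlib import Path


def record_command(output_dir, argv=None):
    """Append the current command line, with a timestamp, to a log file."""
    argv = sys.argv if argv is None else argv
    # shlex.quote keeps arguments containing spaces or quotes copy-pasteable.
    command = " ".join(shlex.quote(arg) for arg in argv)
    log_path = Path(output_dir) / "command.txt"  # hypothetical file name
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a") as handle:
        handle.write(f"{datetime.now().isoformat()} python {command}\n")
    return command
```

Calling `record_command("models/my_run")` at the top of a script would leave a record of the model name, manifest, and label column that the evaluation scripts need later.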

### Evaluate Model Performance

For classification or regression, please make use of the `eval.py` and `regression_eval.py` scripts (docs below).


## Docs

## `make_manifest.py --help`

```
usage: make_manifest.py [-h] [-o OUTPUT] source_directory

Produce an EBSD data manifest file from a root directory.

positional arguments:
  source_directory      Root directory to begin parsing data

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        /path/to/output.csv Default: `manifest.csv`
```

Example usage:
```
python src/make_manifest.py data/
```

```
python src/make_manifest.py data/ -o my_manifest.csv
```

## `split_manifest.py --help`
```
usage: split_manifest.py [-h] [-ts TEST_SIZE] [-vs VAL_SIZE] [-s SEED]
                         [-sm {shuffle,stratified-shuffle}] [-o OUTPUT]
                         manifest label_column

Produce an EBSD data manifest file from a root directory.

positional arguments:
  manifest              path/to/manifest.csv
  label_column          Name of column considered the label or `y`.

optional arguments:
  -h, --help            show this help message and exit
  -ts TEST_SIZE, --test-size TEST_SIZE
                        Ratio of dataset to include in test set
  -vs VAL_SIZE, --validation-size VAL_SIZE
                        Ratio of dataset to include in validation set
  -s SEED, --seed SEED  Random seed
  -sm {shuffle,stratified-shuffle}, --split-method {shuffle,stratified-shuffle}
  -o OUTPUT, --output OUTPUT
                        (optional) /path/to/mutated/copy/of/manifest.csv
```

Example usage:
```
python src/split_manifest.py manifest.csv 'Phase'
```

```
python src/split_manifest.py manifest.csv 'Phase' -s 42 -sm 'stratified-shuffle'
```

```
python src/split_manifest.py manifest.csv 'Phase' -o split_manifest.csv -vs 0.1 -sm 'stratified-shuffle'
```
## `train.py --help`
```
usage: train.py [-h] [--weight-classes] [-bs BATCH_SIZE] [-e EPOCHS]
                [-md MIN_DELTA] [-p PATIENCE] [-o OUTPUT]
                {XceptionClassifier,ResNet50Model} manifest label_column

Train a Classification ML Model

positional arguments:
  {XceptionClassifier,ResNet50Model}
                        The model to use for classification.
  manifest              path/to/manifest.csv
  label_column          Column name for label (y) from manifest file.

optional arguments:
  -h, --help            show this help message and exit
  --weight-classes      Flag that turns on class balancing (should not be used
                        with regression).
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size (default: 32)
  -e EPOCHS, --epochs EPOCHS
                        Number of epochs (default: 10000)
  -md MIN_DELTA, --min-delta MIN_DELTA
                        Minimum change int the monitored quantity to qualify
                        as an improvement. Default: 0.001
  -p PATIENCE, --patience PATIENCE
                        Number of epochs with no improvement after which
                        training will be stopped Default: 25
  -o OUTPUT, --output OUTPUT
                        (optional) path/to/output/directory/
```

Example Usage:

```
python src/train.py XceptionClassifier manifest.csv Phase -e 1
```
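
The `--weight-classes` flag turns on class balancing. How `train.py` computes its weights is not shown here, so the following is only a sketch of one common heuristic, in which each class is weighted by `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, common classes get weights below 1.
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label column with a 3:1 imbalance between two classes.
weights = balanced_class_weights(["A"] * 6 + ["B"] * 2)
```

With these weights, mistakes on the rarer class contribute more to the loss, so the model is not rewarded for simply predicting the majority class. Class weighting only makes sense for discrete labels, which is consistent with the note that the flag should not be used with regression.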


## `regression_train.py --help`
```
usage: regression_train.py [-h] [-bs BATCH_SIZE] [-e EPOCHS] [-md MIN_DELTA]
                           [-p PATIENCE] [-o OUTPUT]
                           {MultiLabelLinearRegressor,XceptionRegressor}
                           manifest label_columns [label_columns ...]

Train a Regression ML Model

positional arguments:
  {MultiLabelLinearRegressor,XceptionRegressor}
                        The model to use for regression.
  manifest              path/to/manifest.csv
  label_columns         Column(s) name for label (y) from manifest file.

optional arguments:
  -h, --help            show this help message and exit
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size (default: 32)
  -e EPOCHS, --epochs EPOCHS
                        Number of epochs (default: 10000)
  -md MIN_DELTA, --min-delta MIN_DELTA
                        Minimum change int the monitored quantity to qualify
                        as an improvement. Default: 0.001
  -p PATIENCE, --patience PATIENCE
                        Number of epochs with no improvement after which
                        training will be stopped Default: 25
  -o OUTPUT, --output OUTPUT
                        (optional) path/to/output/directory/
```

Example usage:

```
python src/regression_train.py MultiLabelLinearRegressor tests/test_regression.csv Lattice_a -e 1
```

```
python src/regression_train.py XceptionRegressor tests/test_regression.csv Lattice_a Lattice_b Lattice_c -bs 128
```
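
Both training scripts accept `--min-delta` and `--patience`, which together describe a patience-based early-stopping rule. The sketch below shows how such a rule typically behaves on a sequence of monitored loss values. It assumes the monitored quantity is a loss to be minimized and is not taken from the scripts themselves; the function name `early_stopping_epoch` is hypothetical.

```python
def early_stopping_epoch(losses, min_delta=0.001, patience=25):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        # Only a drop larger than min_delta counts as an improvement.
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # the patience budget was never exhausted


# The loss plateaus after epoch 1, so three stalled epochs trigger a stop.
stopped_at = early_stopping_epoch(
    [1.0, 0.5, 0.4999, 0.4998, 0.4997], min_delta=0.001, patience=3
)
```

Under a rule like this, the large default epoch count (10000) acts as an upper bound, and training normally ends once the monitored value stops improving for `patience` consecutive epochs.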

## `eval.py --help`

```
usage: eval.py [-h] [-o OUTPUT] [-bs BATCH_SIZE]
               trained_model manifest label_column

Evaluate a Classification ML Model

positional arguments:
  trained_model         /path/to/model_checkpoint.h5
  manifest              path/to/manifest.csv
  label_column          Column name for label (y) from manifest file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        (optional) path/to/output/directory/
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size (default: 32)
```

Example usage:

```
python src/eval.py models/XceptionClassifier-0.0.0/2019-04-07-06-37/model_checkpoint.h5 manifest.csv Phase -o class_eval.csv
```

## `regression_eval.py --help`
```
usage: regression_eval.py [-h] [-o OUTPUT] [-bs BATCH_SIZE]
                          trained_model manifest label_columns
                          [label_columns ...]

Evaluate a Regression ML Model

positional arguments:
  trained_model         /path/to/model_checkpoint.h5
  manifest              path/to/manifest.csv
  label_columns         Column(s) name for label (y) from manifest file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        (optional) path/to/output/directory/
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Training batch size (default: 32)
```

Example usage:

```
python src/regression_eval.py models/XceptionRegressor-0.0.0/2019-04-07-06-37_small/model_checkpoint.h5 manifest.csv Lattice_a Lattice_b Lattice_c -o regress_eval.csv
```

# FAQ

Q: Error: `Failed to load the native TensorFlow runtime.` (I don't have a GPU / haven't set up GPU drivers)

A: Try installing a non-GPU version of TensorFlow: `pip install tensorflow`
2 changes: 2 additions & 0 deletions dev_requirements.txt
@@ -0,0 +1,2 @@
sphinx
pytest