
Minor spelling tweaks (mlcommons#39)
brettkoonce authored and deepakn94 committed May 13, 2018
1 parent aa596ff commit cdb75f0
Showing 8 changed files with 18 additions and 18 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ Each reference implementation provides the following:
* A Dockerfile which can be used to run the benchmark in a container.
* A script which downloads the appropriate dataset.
* A script which runs and times training the model.
-* Documentaiton on the dataset, model, and machine setup.
+* Documentation on the dataset, model, and machine setup.

# Running Benchmarks

4 changes: 2 additions & 2 deletions benchmark_readme_template.md
@@ -26,11 +26,11 @@ Cite paper describing model plus any additional attribution requested by code au
### List of layers
Brief summary of structure of model
### Weight and bias initialization
-How are weights and biases intialized
+How are weights and biases initialized
### Loss function
Name/description of loss function used
### Optimizer
-Name of optimzier used
+Name of optimizer used
# 5. Quality
### Quality metric
What is the target quality metric
2 changes: 1 addition & 1 deletion image_classification/README.md
@@ -97,7 +97,7 @@ For more information on preprocessing, see this file and documentation:
https://github.com/tensorflow/models/tree/master/research/inception#getting-started

### Training and test data separation
-This is proivded by the imagenet dataset and original authors.
+This is provided by the Imagenet dataset and original authors.

### Training data order
Each epoch goes over all the training data, shuffled every epoch.
4 changes: 2 additions & 2 deletions object_detection/caffe2/README.md
@@ -74,11 +74,11 @@ Box: Log loss for true class.
Mask: per-pixel sigmoid, average binary cross-entropy loss.

### Optimizer
-Momentum SGD. Weight decay of 0.0001, momenum of 0.9.
+Momentum SGD. Weight decay of 0.0001, momentum of 0.9.
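
For illustration, a minimal NumPy sketch of a momentum-SGD update with the hyperparameters named above; momentum 0.9 and weight decay 0.0001 come from the text, the learning rate is a placeholder, and this is not the benchmark's Caffe2 code:

```python
# Minimal sketch of momentum SGD with L2 weight decay (momentum=0.9,
# weight_decay=1e-4 from the text above; lr is an arbitrary placeholder).
import numpy as np

def momentum_sgd_step(w, grad, velocity, lr=0.02, momentum=0.9, weight_decay=1e-4):
    grad = grad + weight_decay * w              # weight decay as an L2 gradient term
    velocity = momentum * velocity - lr * grad  # accumulate a velocity term
    return w + velocity, velocity               # updated weights and velocity

# Example usage with random gradients:
w = np.zeros(10)
v = np.zeros(10)
w, v = momentum_sgd_step(w, np.random.randn(10), v)
```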

# 5. Quality
### Quality metric
-As Mask R-CNN can provide both boxes and masks, we evalute on both box and mask mAP.
+As Mask R-CNN can provide both boxes and masks, we evaluate on both box and mask mAP.

### Quality target
Box mAP of 0.377, mask mAP of 0.339
10 changes: 5 additions & 5 deletions reinforcement/README.md
@@ -87,22 +87,22 @@ This benchmark includes both the environment and training for 9x9 go. There are

### Structure

-This task has a non-trivial network structure, including a search tree. A good overview of the sructure can be found here: https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0.
+This task has a non-trivial network structure, including a search tree. A good overview of the structure can be found here: https://medium.com/applied-data-science/alphago-zero-explained-in-one-diagram-365f5abf67e0.

### Weight and bias initialization and Loss Function
-Network weights are initialized randomly. Initializion and loss are described here;
+Network weights are initialized randomly. Initialization and loss are described here;
["Mastering the Game of Go with Deep Neural Networks and Tree Search"](https://www.nature.com/articles/nature16961)

### Optimizer
We use a MomentumOptimizer to train the primary network.

# 4. Quality

-Due to the difficulty of training a highly proficient go model, our quality metric and termination criteria is based on predicting moves from human reference games. Currently published results indicate that it takes weeks of time and/or cluster sized resources to achieve a high level of play. Given more liminited time and resources, it is possible to predict a significant number of moves from professional or near-professional games.
+Due to the difficulty of training a highly proficient go model, our quality metric and termination criteria is based on predicting moves from human reference games. Currently published results indicate that it takes weeks of time and/or cluster sized resources to achieve a high level of play. Given more limited time and resources, it is possible to predict a significant number of moves from professional or near-professional games.

### Quality metric

-Provided in with this benchmark are records of human games and the quality metric is the percent of the time the model chooses the same move the human chose in each position. Each position is attempted twice by the model (keep in mind the model's choice is non-deterministic). The metric is calculated as the number of correct predictions devided by the number of predictions attempted.
+Provided in with this benchmark are records of human games and the quality metric is the percent of the time the model chooses the same move the human chose in each position. Each position is attempted twice by the model (keep in mind the model's choice is non-deterministic). The metric is calculated as the number of correct predictions divided by the number of predictions attempted.

The particular games we use are from Iyama Yuta 6 Title Celebration, between contestants Murakawa Daisuke, Sakai Hideyuki, Yamada Kimio, Hyakuta Naoki, Yuki Satoshi, and Iyama Yuta.
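
A minimal Python sketch of how this metric could be computed; `choose_move` and `positions` are hypothetical stand-ins rather than the benchmark's actual API:

```python
# Each human position is attempted twice (the model is non-deterministic);
# quality = correct predictions / predictions attempted.
def move_prediction_accuracy(choose_move, positions, attempts_per_position=2):
    correct = 0
    attempted = 0
    for board, human_move in positions:      # positions: iterable of (board, human move)
        for _ in range(attempts_per_position):
            attempted += 1
            if choose_move(board) == human_move:
                correct += 1
    return correct / attempted
```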

@@ -121,7 +121,7 @@ Informally, we have observed that quality should improve roughly linearly with t
36h 24%
60h 34%

-Note that quality does not necessarily monotically increase.
+Note that quality does not necessarily monotonically increase.

### Evaluation frequency
Evaluation should be preformed for every model which is trained (regardless if it wins the "model evaluation" round).
2 changes: 1 addition & 1 deletion reinforcement/tensorflow/minigo/yomi/README.md
@@ -1,7 +1,7 @@
# Yomi Server

[Yomi](https://senseis.xmp.net/?Yomi) means reading in Japanese Go jargon, and
-in the context of Minigo, is the name of a kubrnetes workload designed to play
+in the context of Minigo, is the name of a kubernetes workload designed to play
games against various engines (mostly itself).

TODO(kashomon): describe how this works.
6 changes: 3 additions & 3 deletions translation/README.md
@@ -90,10 +90,10 @@ We use WMT17 ende training for tranding, and we evaluate using the WMT 2014 Engl
We combine all the files together and subtokenize the data into a vocabulary.

### Training and test data separation
-We use the trian and evaluation sets provided explicity by the authors.
+We use the train and evaluation sets provided explicitly by the authors.

### Training data order
-We split the data into 100 blocks, and we shuffle interneraly in the blocks.
+We split the data into 100 blocks, and we shuffle internally in the blocks.


# 4. Model
@@ -103,7 +103,7 @@ This is an implementation of the Transformer translation model as described in t

### Structure

-Transformer is a neural network architecture that solves sequence to sequence problems using attention mechanisms. Unlike traditional neural seq2seq models, Transformer does not involve recurrent connections. The attention mechanism learns dependencies between tokens in two sequences. Since attention weights apply to all tokens in the sequences, the Tranformer model is able to easily capture long-distance depedencies.
+Transformer is a neural network architecture that solves sequence to sequence problems using attention mechanisms. Unlike traditional neural seq2seq models, Transformer does not involve recurrent connections. The attention mechanism learns dependencies between tokens in two sequences. Since attention weights apply to all tokens in the sequences, the Tranformer model is able to easily capture long-distance dependencies.

Transformer's overall structure follows the standard encoder-decoder pattern. The encoder uses self-attention to compute a representation of the input sequence. The decoder generates the output sequence one token at a time, taking the encoder output and previous decoder-outputted tokens as inputs.

6 changes: 3 additions & 3 deletions translation/tensorflow/transformer/README.md
@@ -1,7 +1,7 @@
# Transformer Translation Model
This is an implementation of the Transformer translation model as described in the [Attention is All You Need](https://arxiv.org/abs/1706.03762) paper. Based on the code provided by the authors: [Transformer code](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py) from [Tensor2Tensor](https://github.com/tensorflow/tensor2tensor).

-Transformer is a neural network architecture that solves sequence to sequence problems using attention mechanisms. Unlike traditional neural seq2seq models, Transformer does not involve recurrent connections. The attention mechanism learns dependencies between tokens in two sequences. Since attention weights apply to all tokens in the sequences, the Tranformer model is able to easily capture long-distance depedencies.
+Transformer is a neural network architecture that solves sequence to sequence problems using attention mechanisms. Unlike traditional neural seq2seq models, Transformer does not involve recurrent connections. The attention mechanism learns dependencies between tokens in two sequences. Since attention weights apply to all tokens in the sequences, the Tranformer model is able to easily capture long-distance dependencies.

Transformer's overall structure follows the standard encoder-decoder pattern. The encoder uses self-attention to compute a representation of the input sequence. The decoder generates the output sequence one token at a time, taking the encoder output and previous decoder-outputted tokens as inputs.
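
For context, a short NumPy sketch of the scaled dot-product attention that the paragraph above alludes to; it is illustrative only and not the benchmark's TensorFlow implementation:

```python
# Scaled dot-product attention over one sequence pair: each query attends to
# every key position and returns a softmax-weighted sum of the values.
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (seq_len, d_model) arrays; returns one attended vector per query."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # weighted sum of value vectors
```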

Expand All @@ -26,7 +26,7 @@ The model also applies embeddings on the input and output tokens, and adds a con

## Walkthrough

-Below are the commands for running the Transformer model. See the [Detailed instrutions](#detailed-instructions) for more details on running the model.
+Below are the commands for running the Transformer model. See the [Detailed instructions](#detailed-instructions) for more details on running the model.

```
PARAMS=big
@@ -226,7 +226,7 @@ Aside from the main file to train the Transformer model, we provide other script

[data_download.py](data_download.py) downloads and extracts data, then uses `Subtokenizer` to tokenize strings into arrays of int IDs. The int arrays are converted to `tf.Examples` and saved in the `tf.RecordDataset` format.

-The data is downloaded from the Workshop of Machine Transtion (WMT) [news translation task](http://www.statmt.org/wmt17/translation-task.html). The following datasets are used:
+The data is downloaded from the Workshop of Machine Translation (WMT) [news translation task](http://www.statmt.org/wmt17/translation-task.html). The following datasets are used:

* Europarl v7
* Common Crawl corpus
