Merge pull request #15 from jacquelinegarrahan/keras
Keras
jacquelinegarrahan authored Oct 15, 2020
2 parents b108c46 + 26ebec0 commit bca0dc3
Showing 9 changed files with 278 additions and 59 deletions.
29 changes: 23 additions & 6 deletions docs/index.md
@@ -101,8 +101,6 @@ Models and variables may be constructed using a yaml configuration file. The con

The model section is used for the initialization of model classes. The `model_class` entry is used to specify the model class to initialize. The `model_from_yaml` method will attempt to import the specified class. Additional model-specific requirements may be provided. These requirements will be checked before model construction. Model keyword arguments may be passed via the config file or with the function kwarg `model_kwargs`. All models are assumed to accept `input_variables` and `output_variables` as keyword arguments.
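
As a rough illustration of what resolving `model_class` and checking requirements could involve, the sketch below imports a class from a dotted path and compares pinned versions against installed packages. The helper names `import_class` and `check_requirements` are hypothetical and are not taken from `lume-model`; the actual `model_from_yaml` implementation may differ.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def import_class(dotted_path: str):
    """Import a class from a dotted path such as 'package.module.ClassName'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = import_module(module_name)
    return getattr(module, class_name)


def check_requirements(requirements: dict) -> list:
    """Return human-readable problems; an empty list means all pins are met."""
    problems = []
    for package, pinned in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        # Assumption: pins are exact versions, as in `tensorflow: 2.3.1`.
        if installed != str(pinned):
            problems.append(f"{package}: found {installed}, expected {pinned}")
    return problems
```

A caller would typically run `check_requirements` first and only call `import_class` when the returned list is empty.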

In order to use the `KerasModel` execution class, instructions must be provided to format inputs for model execution and parse the model output. Input formatting in the yaml uses the `order` and `shape` entries to format the model input. The output format requires indexing for each output variable. Similar functionality might be implemented for custom model classes; however, this is not supported out-of-the-box with `lume-model`.

The example below outlines the specification for a model compatible with the `lume-model` keras/tensorflow toolkit.

```yaml
@@ -121,10 +119,8 @@ model:
shape: [1, 4]
output_format:
type: softmax
indices:
Species: [0]
```


Variables are constructed with the minimal data requirements for inputs/outputs.

@@ -173,8 +169,29 @@ The `KerasModel` packaged in the toolkit will be compatible with models saved us

### Development requirements:
- The model must be trained using the custom scaling layers provided in `lume_model.keras.layers` OR using preprocessing layers packaged with Keras OR the custom layers must be defined during build and made accessible during loading by the user. Custom layers are not supported out of the box by this toolkit.
- The Keras model must use named input layers so that the model accepts a dictionary input, OR `KerasModel` must be subclassed and its `format_input` and `parse_output` methods overridden: `format_input` must build the model input from a dictionary mapping input variable names to values, and `parse_output` must parse the model output into a dictionary.
- The Keras model must use named input layers so that the model accepts a dictionary input, OR `KerasModel` must be subclassed and its `format_input` and `parse_output` methods overridden: `format_input` must build the model input from a dictionary mapping input variable names to values, and `parse_output` must parse the model output into a dictionary. Named input layers require use of the Keras functional API for model construction.

An example of a model built using the functional API is given below:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense
import tensorflow as tf

sepal_length_input = keras.Input(shape=(1,), name="SepalLength")
sepal_width_input = keras.Input(shape=(1,), name="SepalWidth")
petal_length_input = keras.Input(shape=(1,), name="PetalLength")
petal_width_input = keras.Input(shape=(1,), name="PetalWidth")
inputs = [sepal_length_input, sepal_width_input, petal_length_input, petal_width_input]
merged = keras.layers.concatenate(inputs)
dense1 = Dense(8, activation='relu')(merged)
output = Dense(3, activation='softmax', name="Species")(dense1)

# Compile model
model = keras.Model(inputs=inputs, outputs=[output])
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

```

Models built this way accept inputs as a dictionary mapping each variable name to a NumPy array of values.
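
To make the dictionary input format concrete, the sketch below converts a list of row records into a mapping from input name to a column of values. It uses plain Python lists so that it runs without TensorFlow; in practice each column would typically be wrapped in a NumPy array before calling `model.predict`. The helper name `rows_to_named_inputs` is hypothetical.

```python
INPUT_NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]


def rows_to_named_inputs(rows, input_names=INPUT_NAMES):
    """Turn row-wise samples into {input name: column of values}."""
    columns = {name: [] for name in input_names}
    for row in rows:
        if len(row) != len(input_names):
            raise ValueError("each row needs one value per named input")
        for name, value in zip(input_names, row):
            columns[name].append(float(value))
    return columns


# Two samples, each with one value per named input layer.
named_inputs = rows_to_named_inputs([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
```

The keys must match the `name` given to each `keras.Input` layer, since that is how the model routes each column to the right input.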

### Configuration file
The `KerasModel` can be instantiated using the utility function `lume_model.utils.model_from_yaml`.
198 changes: 198 additions & 0 deletions examples/IrisTraining.ipynb
@@ -0,0 +1,198 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# iris example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from tensorflow import keras\n",
"import tensorflow as tf\n",
"\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Flatten\n",
"from tensorflow.keras.utils import to_categorical\n",
"from sklearn.preprocessing import LabelEncoder\n",
"import pandas as pd\n",
"iris = load_iris()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iris[\"data\"][0].shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
"data.columns = [\"SepalLength\", \"SepalWidth\", \"PetalLength\", \"PetalWidth\"]\n",
"\n",
"data[\"Species\"] = iris.target\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_dataset = data.sample(frac=0.8,random_state=0)\n",
"test_dataset = data.drop(train_dataset.index)\n",
"train_labels = train_dataset.pop('Species')\n",
"test_labels = test_dataset.pop('Species')\n",
"train_dataset.keys()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# encode class values as integers\n",
"encoder = LabelEncoder()\n",
"encoder.fit(train_labels)\n",
"encoded_Y = encoder.transform(train_labels)\n",
"\n",
"# convert integers to dummy variables (i.e. one hot encoded)\n",
"dummy_y = to_categorical(encoded_Y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
" # define model\n",
"def build_model():\n",
" # create model\n",
" sepal_length_input = keras.Input(shape=(1,), name=\"SepalLength\")\n",
" sepal_width_input = keras.Input(shape=(1,), name=\"SepalWidth\")\n",
" petal_length_input = keras.Input(shape=(1,), name=\"PetalLength\")\n",
" petal_width_input = keras.Input(shape=(1,), name=\"PetalWidth\")\n",
" inputs = [sepal_length_input, sepal_width_input, petal_length_input, petal_width_input]\n",
" merged = keras.layers.concatenate(inputs)\n",
" dense1 = Dense(8, activation='relu')(merged)\n",
" output = Dense(3, activation='softmax', name=\"Species\")(dense1)\n",
"\n",
" # Compile model\n",
" model = keras.Model(inputs=inputs, outputs=[output])\n",
" optimizer = tf.keras.optimizers.Adam()\n",
" model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])\n",
" return model\n",
"\n",
"model = build_model()\n",
"keras.utils.plot_model(model, \"my_first_model_with_shape_info.png\", show_shapes=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_stats = train_dataset.describe()\n",
"train_stats = train_stats.transpose()\n",
"train_stats"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_x = train_dataset.to_dict(\"series\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=15)\n",
"\n",
"history = model.fit(train_x, dummy_y, epochs=1000,\n",
" validation_split = 0.2, verbose=1, callbacks=[early_stop])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"files/iris_model.h5\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.input_names"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.output_names"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
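
The label-encoding cell in the notebook relies on `LabelEncoder` and `to_categorical`. As a dependency-free sketch of the same idea (not the library implementations), the functions below map class labels to integer indices and then to one-hot vectors. The function names are hypothetical.

```python
def encode_labels(labels):
    """Map each distinct label to an integer index, using sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


def one_hot(encoded, num_classes):
    """Convert integer class indices to one-hot rows."""
    return [[1.0 if i == idx else 0.0 for i in range(num_classes)] for idx in encoded]


# Iris targets are already the integers 0, 1 and 2, so encoding leaves them unchanged.
encoded, classes = encode_labels([2, 0, 1, 0])
dummy_y = one_hot(encoded, len(classes))
```

Each one-hot row has the same width as the `Dense(3, activation='softmax')` output layer, which is what `categorical_crossentropy` expects.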
12 changes: 1 addition & 11 deletions examples/files/iris_config.yaml
@@ -3,20 +3,10 @@ model:
model_class: lume_model.keras.KerasModel
requirements:
tensorflow: 2.3.1
args:
kwargs:
model_file: examples/files/iris_model.h5
input_format:
order:
- SepalLength
- SepalWidth
- PetalLength
- PetalWidth
shape: [1, 4]
output_format:
type: softmax
indices:
Species: [0]


input_variables:
SepalLength:
Binary file modified examples/files/iris_model.h5
Binary file not shown.
27 changes: 26 additions & 1 deletion lume_model/keras/README.md
@@ -6,7 +6,32 @@ The `KerasModel` packaged in the toolkit will be compatible with models saved us

## Development requirements:
- The model must be trained using the custom scaling layers provided in `lume_model.keras.layers` OR using preprocessing layers packaged with Keras OR the custom layers must be defined during build and made accessible during loading by the user. Custom layers are not supported out of the box by this toolkit.
- The Keras model must use named input layers so that the model accepts a dictionary input, OR `KerasModel` must be subclassed and its `format_input` and `parse_output` methods overridden: `format_input` must build the model input from a dictionary mapping input variable names to values, and `parse_output` must parse the model output into a dictionary.
- The Keras model must use named input layers so that the model accepts a dictionary input, OR `KerasModel` must be subclassed and its `format_input` and `parse_output` methods overridden: `format_input` must build the model input from a dictionary mapping input variable names to values, and `parse_output` must parse the model output into a dictionary. Named input layers require use of the Keras functional API for model construction.

An example of a model built using the functional API is given below:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense
import tensorflow as tf

sepal_length_input = keras.Input(shape=(1,), name="SepalLength")
sepal_width_input = keras.Input(shape=(1,), name="SepalWidth")
petal_length_input = keras.Input(shape=(1,), name="PetalLength")
petal_width_input = keras.Input(shape=(1,), name="PetalWidth")
inputs = [sepal_length_input, sepal_width_input, petal_length_input, petal_width_input]
merged = keras.layers.concatenate(inputs)
dense1 = Dense(8, activation='relu')(merged)
output = Dense(3, activation='softmax', name="Species")(dense1)

# Compile model
model = keras.Model(inputs=inputs, outputs=[output])
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

```

Models built this way accept inputs as a dictionary mapping each variable name to a NumPy array of values.
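
For models without named input layers, the subclassing route mentioned in the requirements can be sketched as follows. `StandInKerasModel` is a hypothetical placeholder defined here so that the example runs without `lume-model` or TensorFlow. The method names `format_input` and `parse_output` follow the `KerasModel` code in this commit, while `OrderedVectorModel` and its `order` argument are illustrative assumptions.

```python
class StandInKerasModel:
    """Hypothetical stand-in exposing the two hooks a subclass would override."""

    def format_input(self, input_dictionary: dict):
        return input_dictionary

    def parse_output(self, model_output):
        raise NotImplementedError


class OrderedVectorModel(StandInKerasModel):
    """Feeds a single flat vector to the model instead of named inputs."""

    def __init__(self, order, output_names):
        self._order = list(order)
        self._output_names = list(output_names)

    def format_input(self, input_dictionary: dict):
        # One sample with one feature per entry in the fixed ordering.
        return [[float(input_dictionary[name]) for name in self._order]]

    def parse_output(self, model_output):
        # Pair each raw output with its variable name.
        return dict(zip(self._output_names, model_output))


model = OrderedVectorModel(["SepalLength", "SepalWidth"], ["Species"])
vector = model.format_input({"SepalWidth": 3.5, "SepalLength": 5.1})
parsed = model.parse_output([2])
```

Keeping the ordering inside the subclass means callers can still pass inputs by name, whatever layout the underlying model expects.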

## Configuration file
The `KerasModel` can be instantiated using the utility function `lume_model.utils.model_from_yaml`.
40 changes: 19 additions & 21 deletions lume_model/keras/__init__.py
@@ -33,17 +33,15 @@ def __init__(
model_file: str,
input_variables: Dict[str, InputVariable],
output_variables: Dict[str, OutputVariable],
input_format: dict,
output_format: dict,
input_format: dict = {},
output_format: dict = {},
) -> None:
"""Initializes the model and stores inputs/outputs.
Args:
model_file (str): Path to model file generated with keras.save()
input_variables (Dict[str, InputVariable]): dictionary mapping names to model input variables
output_variables (Dict[str, OutputVariable]): dictionary mapping names to model output variables
input_format (dict): Instructions for building model input
output_format (dict): Instructions for parsing model output
"""

@@ -57,7 +55,7 @@ def __init__(
# load model in thread safe manner
self._thread_graph = tf.Graph()
with self._thread_graph.as_default():
self.model = load_model(
self._model = load_model(
model_file,
custom_objects={
"ScaleLayer": ScaleLayer,
@@ -89,7 +87,7 @@ def evaluate(self, input_variables: List[InputVariable]) -> List[OutputVariable]

# call prediction in threadsafe manner
with self._thread_graph.as_default():
model_output = self.model.predict(formatted_input)
model_output = self._model.predict(formatted_input)

output = self.parse_output(model_output)

@@ -161,37 +159,37 @@ def _prepare_outputs(self, predicted_output: dict):
return list(self.output_variables.values())

def format_input(self, input_dictionary: dict):
"""Formats input to be fed into model
"""Formats input to be fed into model. For the base KerasModel, inputs should
be assumed to be in dictionary format.
Args:
input_dictionary (dict): Dictionary mapping input to value.
"""
formatted_dict = {}
for input_variable, value in input_dictionary.items():
if isinstance(value, (float, int)):
formatted_dict[input_variable] = np.array([value])
else:
formatted_dict[input_variable] = [value]

vector = []
for item in self._input_format["order"]:
vector.append(input_dictionary[item])

# Convert to numpy array and reshape
vector = np.array(vector)
vector = vector.reshape(tuple(self._input_format["shape"]))

return vector
return formatted_dict

def parse_output(self, model_output):
"""Parses model output to create dictionary variable name -> value
"""Parses model output to create dictionary variable name -> value. This assumes
that outputs have been labeled during model creation.
Args:
model_output (np.ndarray): Raw model output
"""
output_dict = {}

if self._output_format["type"] == "softmax":
for value, idx in self._output_format["indices"].items():
for idx, output_name in enumerate(self._model.output_names):
softmax_output = list(model_output[idx])
output_dict[value] = softmax_output.index(max(softmax_output))
output_dict[output_name] = softmax_output.index(max(softmax_output))

if self._output_format["type"] == "raw":
for value, idx in self._output_format["indices"].items():
output_dict[value] = model_output[idx]
for idx, output_name in enumerate(self._model.output_names):
output_dict[output_name] = model_output[idx]

return output_dict
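
The updated `parse_output` logic can be illustrated without TensorFlow. The sketch below is a standalone approximation, not the method itself: it takes the output names and a sequence holding one result per named output, and returns either the index of the largest softmax probability or the raw value. The function name and the `output_type` parameter are assumptions made for this example.

```python
def parse_named_output(output_names, model_output, output_type="softmax"):
    """Map each output name to its parsed value."""
    parsed = {}
    for idx, name in enumerate(output_names):
        values = model_output[idx]
        if output_type == "softmax":
            probabilities = list(values)
            # Index of the most probable class (first one on ties).
            parsed[name] = probabilities.index(max(probabilities))
        elif output_type == "raw":
            parsed[name] = values
        else:
            raise ValueError(f"unsupported output type: {output_type!r}")
    return parsed


result = parse_named_output(["Species"], [[0.1, 0.7, 0.2]])
```

Because the names come from the model rather than from the config file, the `indices` mapping removed in this commit is no longer needed.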