This document describes how to develop a TensorFlow Estimator model with the DLRover trainer.
TensorFlow Estimator encapsulates training, evaluation, prediction, and export-for-serving actions. In DLRover, both custom Estimators and pre-made Estimators are supported.
A DLrover program with Estimator typically consists of the following four steps:
Each Column identifies a feature name, its type, and whether it is the label. The following snippet defines two feature columns in the example.
train_set = {
"reader": FileReader("test.data"),
"columns": (
Column.create( # type: ignore
name="x",
dtype="float32",
is_label=False,
),
Column.create( # type: ignore
name="y",
dtype="float32",
is_label=True,
),
),
}
The first feature is x and its type is float32. The second feature is y, which is the label; its type is also float32.
dlrover.trainer helps build input_fn for the train set and test set with those columns.
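Conceptually, each CSV record produced by the reader is split by the column definitions into a feature dict and a label. The sketch below illustrates that parsing step only; the helper name `parse_line` and the plain-dict column format are assumptions for illustration, and the real input_fn in dlrover.trainer wraps this logic in a tf.data pipeline.

```python
# Hypothetical sketch: how column metadata turns one CSV line into
# (features, label). Not the actual dlrover.trainer implementation.
COLUMNS = [
    {"name": "x", "dtype": float, "is_label": False},
    {"name": "y", "dtype": float, "is_label": True},
]


def parse_line(line, columns=COLUMNS):
    """Split one CSV record into a feature dict and a label value."""
    values = line.strip().split(",")
    features, label = {}, None
    for col, raw in zip(columns, values):
        value = col["dtype"](raw)
        if col["is_label"]:
            label = value
        else:
            features[col["name"]] = value
    return features, label


features, label = parse_line("3.0,7.0")
```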
In some cases, the readers provided by the DLRover trainer do not satisfy the user's needs. Users then need to develop a custom reader and set it in the conf.
One required argument of the __init__ method is path. The key functions are read_data_by_index_range and count_data. count_data is used to determine how many samples the dataset holds before training. During training, read_data_by_index_range is called to fetch training data.
import numpy as np

from dlrover.trainer.tensorflow.reader.base_reader import ElasticReader


class FakeReader(ElasticReader):
    def __init__(self, path=None):
        self.count = 1
        super().__init__(path=path)

    def count_data(self):
        # Report the total number of samples before training starts.
        self._data_nums = 10

    def read_data_by_index_range(self, start_index, end_index):
        # Generate one CSV record per index in [start_index, end_index).
        data = []
        for i in range(start_index, end_index):
            x = np.random.randint(1, 1000)
            y = 2 * x + np.random.randint(1, 5)
            d = "{},{}".format(x, y)
            data.append(d)
        return data
You need to initialize your reader and set it in the conf. Here is an example:
eval_set = {"reader": FakeReader("./eval.data"), "columns": train_set["columns"]}
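The trainer drives an index-range reader in shards: it first asks for the dataset size, then pulls records batch by batch. The loop below is an illustrative sketch of that contract, using a toy reader defined inline (not the actual trainer code):

```python
class RangeReader:
    """Toy reader mirroring the ElasticReader contract sketched above."""

    def __init__(self, path=None):
        self._path = path
        self._data_nums = 0

    def count_data(self):
        # Pretend the file holds 10 records.
        self._data_nums = 10

    def read_data_by_index_range(self, start_index, end_index):
        return ["record-%d" % i for i in range(start_index, end_index)]


reader = RangeReader("./train.data")
reader.count_data()  # the trainer learns the dataset size first
batch_size = 4
batches = []
for start in range(0, reader._data_nums, batch_size):
    end = min(start + batch_size, reader._data_nums)
    batches.append(reader.read_data_by_index_range(start, end))
```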
The key function is iterator. During training, iterator will be called to get training data.
class Reader:
    def __init__(self, path=None, batch_size=None):
        pass

    def get_data(self):
        # Your custom reading logic goes here.
        while True:
            yield "1,1"

    def iterator(self):
        # Called by the trainer to stream training data.
        while True:
            for d in self.get_data():
                yield d
You need to initialize your reader and set it in the conf. Here is an example:
eval_set = {"reader": Reader("./eval.data"), "columns": train_set["columns"]}
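With an iterator-style reader, the trainer simply pulls records from the generator. A small self-contained demo of that consumption pattern (the toy StreamReader below is redefined here for illustration and mirrors the Reader sketched above):

```python
import itertools


class StreamReader:
    """Toy iterator-style reader; get_data stands in for custom I/O."""

    def __init__(self, path=None, batch_size=None):
        self._path = path
        self._batch_size = batch_size

    def get_data(self):
        # Custom reading logic would go here.
        while True:
            yield "1,1"

    def iterator(self):
        # The trainer consumes this generator during training.
        for d in self.get_data():
            yield d


reader = StreamReader("./eval.data")
first_three = list(itertools.islice(reader.iterator(), 3))
```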
The heart of every Estimator, whether pre-made or custom, is its model function, model_fn, which is a method that builds graphs for training, evaluation, and prediction.
In dlrover.trainer, we assume the Estimator is a custom Estimator, and pre-made Estimators can be converted to custom Estimators with little overhead.
When relying on a custom Estimator, you must write the model function yourself; refer to the tutorial.
You can convert an existing pre-made Estimator by writing an adaptor to fit with dlrover.trainer.
As we can see, model_fn is the key part of an Estimator. During training and evaluation, model_fn is called with a different mode and the corresponding graph is returned. Thus, you can define a custom Estimator whose model_fn acts as a wrapper around a pre-made Estimator's model_fn.
In the example of DeepFMAdaptor, DeepFMEstimator in deepctr.estimator.models is a pre-made estimator.
import tensorflow as tf
from tensorflow import feature_column

from deepctr.estimator.models.deepfm import DeepFMEstimator


class DeepFMAdaptor(tf.estimator.Estimator):
    """Adaptor wrapping the pre-made DeepFMEstimator."""

    def model_fn(self, features, labels, mode, params):
        """
        features: dict whose keys are feature names and values are tensors.
        labels: tensor corresponding to the column whose `is_label` is True.
        """
        # bucketized_column requires a numeric source column, not a raw
        # tensor, so the column is built from the feature name.
        x = feature_column.numeric_column("x")
        x_buckets = feature_column.bucketized_column(x, boundaries=[1, 3, 5])
        linear_feature_columns = [x_buckets]
        dnn_feature_columns = [x]
        self.estimator = DeepFMEstimator(
            linear_feature_columns,
            dnn_feature_columns,
            task=params["task"],
        )
        # Delegate graph construction to the pre-made estimator's model_fn.
        return self.estimator._model_fn(
            features, labels, mode, self.run_config
        )
Estimators by default save checkpoints with variable names rather than the
object graph described in the Checkpoint guide.
The checkpoint hook is added by dlrover.trainer.estimator_executor
.
Estimators export SavedModels through tf.estimator.Estimator.export_saved_model.
The exporter hook is added by dlrover.trainer.estimator_executor
.
When the job is launched, dlrover.trainer.estimator_executor
parses the conf and builds input_fn,
estimator and related hooks.
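The conf referenced on the command line below is typically a class exposing the train and eval sets. A minimal hand-rolled stand-in is sketched here; the attribute names train_set and eval_set follow the snippets above, while the FakeReader stub and everything else are assumptions (real confs use dlrover.trainer's Column and FileReader helpers):

```python
# Hypothetical TrainConf-style object; the FakeReader stub below is a
# placeholder for a real reader class.
class FakeReader:
    def __init__(self, path=None):
        self.path = path


class TrainConf:
    train_set = {
        "reader": FakeReader("./train.data"),
        "columns": ("x", "y"),  # real confs use Column.create(...)
    }
    eval_set = {
        "reader": FakeReader("./eval.data"),
        "columns": train_set["columns"],
    }
```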
You can install dlrover in your image.
pip install dlrover[tensorflow] -U
Or you also can build your image from the dlrover base image.
FROM registry.cn-hangzhou.aliyuncs.com/intell-ai/dlrover:deeprec_criteo_v1
COPY model_zoo /home/model_zoo
docker build -t ${IMAGE_NAME} -f ${DockerFile} .
docker push ${IMAGE_NAME}
We need to set the command of the PS and worker to train the model, like the DeepCTR example:
command:
- /bin/bash
- -c
- " cd ./examples/tensorflow/criteo_deeprec \
&& python -m dlrover.trainer.entry.local_entry \
--platform=Kubernetes --conf=train_conf.TrainConf \
--enable_auto_scaling=True"
Then, we can submit the job with kubectl.
kubectl -n dlrover apply -f ${JOB_YAML_FILE}