NerlPlanner
NerlPlanner is a tool that assists with generating JSON configuration files for Nerlnet.
There are 3 types of configuration files that configure a Nerlnet distributed ML experiment:
Distributed Configuration File (dc_<name>.json) describes the layout of a Nerlnet distributed machine learning cluster. It consists of:
- Worker model definitions (multiple kinds of models are supported)
- Device communication properties (IPv4 and port)
- Entities of Nerlnet (Client, Router, Source) and their allocation to devices.
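To make this concrete, the sketch below shows the kind of information a DC file gathers, written as a Python dict and printed as JSON. All key names (workers, devices, clients, routers, sources, ipv4, port) and values are illustrative assumptions based on the description above, not the exact schema NerlPlanner emits.

```python
import json

# Hypothetical sketch of DC file content -- key names are assumptions,
# not the actual schema written by NerlPlanner.
dc_sketch = {
    "workers": {"w1": "worker_dnn.json"},          # worker model definitions
    "devices": [                                   # device communication properties
        {"name": "pc1", "ipv4": "192.168.0.10", "port": 8080,
         "entities": ["r1", "c1", "s1"]}           # entity allocation to this device
    ],
    "clients": [{"name": "c1", "port": 8081, "workers": ["w1"]}],
    "routers": [{"name": "r1", "port": 8082}],
    "sources": [{"name": "s1", "port": 8083, "type": "csv"}],
}

print(json.dumps(dc_sketch, indent=2))
```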
The first step of creating a DC file is generating worker models using the NerlPlanner worker model generation dialog.
This is the worker dialog menu, which can be used to generate worker model JSON files.
A JSON model can also be imported into this dialog for editing and saved as a new model.
- Load json: load a previously saved worker configuration JSON (a former network model) into the dialog.
- Select the JSON file output directory.
- Name of the worker network model JSON file.
This section is responsible for the model definition.
The model type lets the user select between several pre-defined models. The NN type is a custom neural network for which the user should configure all layers of the network. Pre-defined model types add layers before and after the hidden layers to achieve the desired functionality (e.g., classification, regression, or text prediction network types).
Declare the number of neurons in each layer. Each layer must have a size definition. Layer sizes are separated by ",".
A simple layer size is a positive integer that defines the number of neurons of a 1D layer. In addition, a complex layer size string declares a multi-dimensional layer, e.g., a CNN layer.
Example of a CNN that mixes complex and simple layer size representations:
First layer complex size of CNN: "128x128k3x3s2x2x1p1x1".
Second layer size: "4096".
Third layer size: "128".
Last layer size: "4".
The resulting layer sizes input string should be: "128x128k3x3s2x2x1p1x1,4096,128,4"
Example of a simple DNN layer sizes list:
32,16,4,2,1.
Example of a simple Autoencoder NN list:
32,16,4,16,32.
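The sketch below shows one way to interpret such a layer-sizes string: it simply splits on "," and separates simple integer sizes from complex multi-dimensional ones. It is illustrative only, not NerlPlanner's own parser, and the reading of k/s/p in the complex string as kernel, stride, and padding is an assumption not confirmed by this page.

```python
# Illustrative helper (not NerlPlanner code): split a layer-sizes string
# into per-layer tokens and classify each as simple or complex.
def split_layer_sizes(layer_sizes: str) -> list:
    layers = []
    for token in layer_sizes.split(","):
        token = token.strip()
        if token.isdigit():
            # Simple size: a positive integer, the number of neurons of a 1D layer.
            layers.append({"kind": "simple", "neurons": int(token)})
        else:
            # Complex size, e.g. "128x128k3x3s2x2x1p1x1" for a CNN layer
            # (k/s/p presumably encode kernel, stride and padding -- an assumption).
            layers.append({"kind": "complex", "spec": token})
    return layers

print(split_layer_sizes("128x128k3x3s2x2x1p1x1,4096,128,4"))
```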
Parameters of the optimization process:
- Learning Rate (float)
- Epochs (int)
- Optimizer Type and Optimizer Args
- Loss Method
- Infra Type: currently only OpenNN is supported.
- Distributed System Type: none means independent workers. Federated architectures can be selected from the list.
- Distributed System Token: a token that defines a cluster of workers that communicate with one another. E.g., in federated learning, the parameter server and its workers communicate, and the token is the handshake between the server and its workers.
- Distributed System Arguments: custom arguments passed to the distributed ML cluster, depending on the system type.
TODO add link to page that describes distributed system types.
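As a rough illustration, the fields above might end up in a worker model file like the following sketch. Key names and example values are assumptions for orientation only; the dialog's actual output schema may differ.

```python
# Hypothetical worker model parameters -- key names and values are
# assumptions for illustration, not the dialog's actual output.
worker_params_sketch = {
    "layersSizes": "32,16,4,2,1",     # comma-separated layer sizes (see above)
    "learningRate": 0.01,             # float
    "epochs": 50,                     # int
    "optimizer": "ADAM",
    "optimizerArgs": "",
    "lossMethod": "MSE",
    "infraType": "opennn",            # currently only OpenNN is supported
    "distributedSystemType": "none",  # "none" = independent workers
    "distributedSystemToken": "none", # shared token of a distributed/federated cluster
    "distributedSystemArgs": "",
}
```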
Use the browse button to select the file generated by the worker model definition dialog.
Then give the worker a name and click on Add.
In this step, entities are added through the "Entities" section.
This section contains a right pane with 3 list boxes that accumulate entities by their type.
There are 3 types of entities in Nerlnet: Client, Source and Router.
This is a figure of the "Entities" section in NerlPlanner:
The client entity is a communication layer that hosts multiple workers (from 1.1).
First, a client is generated by filling in the name and port and clicking on Add.
Then workers can be bound to the client as follows:
- Select the client from the clients list box in the right pane.
- Click Load in the left pane to load the selected client.
- Choose available workers from the drop-down list and click on Add.
- The client's workers will appear in the workers list box.
The workers list is updated with the client's workers each time a client is loaded.
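After binding, a client can be thought of as holding a name, a port, and the list of workers it hosts, as in the hypothetical sketch below (field names are assumptions, not the DC file's actual schema).

```python
# Hypothetical client entry after binding two workers -- illustrative only.
client_sketch = {
    "name": "c1",
    "port": 8081,
    "workers": ["w1", "w2"],  # workers hosted by this client
}
```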
A source is an entity that streams data toward the workers for training or prediction, depending on the current phase of the Nerlnet cluster.
Currently, only a CSV source type is supported.
A source supports several policies for sending batches to its target workers:
- Round Robin: a batch is sent to a worker chosen by the round-robin method.
- Random: a batch is sent to a randomly chosen worker.
- Casting: a batch is sent to all workers.
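A hypothetical source entry might therefore carry its communication properties, its CSV type, its sending policy, and its target workers, as sketched below (field names and policy identifiers are assumptions).

```python
# Hypothetical source entry -- field names and policy identifiers are assumptions.
source_sketch = {
    "name": "s1",
    "port": 8083,
    "type": "csv",            # only CSV sources are currently supported
    "policy": "roundrobin",   # or "random" / "casting"
    "workers": ["w1", "w2"],  # target workers that receive the batches
}
```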
A router is an entity that connects entities together in Nerlnet.
Routers help the user form communication graph layouts that influence the path data flows through until it reaches a worker.
Routers collect statistics of messages and batches that are routed.
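Conceptually, routers define the edges of the communication graph. The sketch below is one hypothetical way to picture such a layout as an adjacency mapping; the entity names and the representation are illustrative assumptions, not NerlPlanner's own format.

```python
# Hypothetical communication-graph layout: each router lists the entities
# it connects. Illustrative only -- not NerlPlanner's representation.
routing_sketch = {
    "r1": ["c1", "s1", "r2"],  # r1 connects client c1, source s1 and router r2
    "r2": ["c2"],              # r2 connects client c2
}
```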
TODO - update when generating is supported by NP
Noa and Ohad - TODO continue the explanations as I did about fields (each one of them).