
NerlPlanner

David edited this page May 18, 2024 · 19 revisions


NerlPlanner is a tool that assists with generating JSON configuration files for Nerlnet.
There are 3 types of configuration files that configure a Nerlnet distributed ML experiment:

1. Distributed Configuration JSON File

The Distributed Configuration File (dc_<name>.json) describes the layout of a Nerlnet distributed machine learning cluster. It consists of:

  • Worker model definitions (multiple kinds of models are supported),
  • Device communication properties (IPv4 and port),
  • Nerlnet entities (Client, Router, Source) and their allocation to devices.
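As a rough illustration of how these three ingredients fit together in one file, the sketch below builds a hypothetical DC structure. All field names here are assumptions for illustration only, not the authoritative Nerlnet schema; NerlPlanner itself is the supported way to generate the file.

```python
import json

# Illustrative only: key names are assumptions, not the exact Nerlnet schema.
dc = {
    "workers": {                      # worker models definition (see 1.1)
        "w1": "model_dnn.json",
    },
    "devices": [                      # device communication properties
        {"name": "pc1", "ipv4": "192.168.0.10", "port": 8080},
    ],
    "entities": {                     # clients, routers, sources (see 1.4)
        "clients": [{"name": "c1", "port": 8081, "workers": ["w1"]}],
        "routers": [{"name": "r1", "port": 8082}],
        "sources": [{"name": "s1", "port": 8083, "policy": "roundrobin"}],
    },
}
print(json.dumps(dc, indent=2))
```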

The first step in creating a DC file is generating worker models, using the NerlPlanner worker model dialog.

Click on create/edit worker .json and the dialog menu will appear. Follow these steps: 1.1 Worker Model Definition JSON File, then 1.2 Import model to DC file.

1.1 Worker Model Definition JSON File

This is the worker dialog menu, which can be used for generating worker model JSON files.
A JSON model can also be imported into this dialog for editing and saved as a new model.

1.1.1 File Section:

  • Load json: loads a worker configuration JSON of a previously saved network model.
  • Select JSON file output directory.
  • Name of the worker network model JSON file.

1.1.2 Model Definition

This section is responsible for the definition of the model.

1.1.2.1 Model type

Model type allows the user to select between several pre-defined models. A custom neural network is the NN type, where the user configures all layers of the network. Pre-defined projects add layers before and after the hidden layers to achieve the desired functionality (e.g., classification, regression, or text prediction network types).

1.1.2.2 Layer Sizes

Declare the number of neurons in each layer. Every layer must have a size definition, and layer sizes are separated by ",". A simple layer size is a positive integer that defines the number of neurons of a 1D layer. A complex layer size is a string that declares a multi-dimensional layer, e.g., a CNN layer. Example of a CNN that mixes complex and simple layer size representations:
First layer complex size of CNN: "128x128k3x3s2x2x1p1x1".
Second layer size: "4096".
Third layer size: "128".
Last layer size: "4".
The input string to layer sizes should be: "128x128k3x3s2x2x1p1x1,4096,128,4"
Example of a simple DNN layer sizes list:
32,16,4,2,1.
Example of a simple Autoencoder NN list:
32,16,4,16,32.
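The layer-sizes string above can be tokenized mechanically. The sketch below is a best-effort parser under the assumption that in a complex token the `k`, `s`, and `p` markers introduce kernel, stride, and padding dimensions respectively; this reading is inferred from the example string, not from a Nerlnet specification.

```python
import re

def _dims(s):
    """Split an 'x'-separated dimension string into a list of ints."""
    return [int(d) for d in s.split("x")]

def parse_layer_size(token):
    """Parse one layer-size token, simple ("4096") or complex
    ("128x128k3x3s2x2x1p1x1")."""
    if token.isdigit():                       # simple 1D layer size
        return {"size": int(token)}
    m = re.fullmatch(
        r"(?P<dims>\d+(?:x\d+)*)"             # layer dimensions
        r"(?:k(?P<kernel>\d+(?:x\d+)*))?"     # kernel (assumed meaning of 'k')
        r"(?:s(?P<stride>\d+(?:x\d+)*))?"     # stride (assumed meaning of 's')
        r"(?:p(?P<pad>\d+(?:x\d+)*))?",       # padding (assumed meaning of 'p')
        token,
    )
    if m is None:
        raise ValueError(f"unrecognized layer size: {token!r}")
    out = {"dims": _dims(m["dims"])}
    for key in ("kernel", "stride", "pad"):
        if m[key]:
            out[key] = _dims(m[key])
    return out

def parse_layer_sizes(spec):
    """Parse a comma-separated layer-sizes string into per-layer dicts."""
    return [parse_layer_size(t) for t in spec.split(",")]

layers = parse_layer_sizes("128x128k3x3s2x2x1p1x1,4096,128,4")
print(layers[0])  # {'dims': [128, 128], 'kernel': [3, 3], 'stride': [2, 2, 1], 'pad': [1, 1]}
```

The same function handles the pure-integer DNN and autoencoder lists above, since every token there takes the simple branch.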

1.1.2.3 Optimizer Definitions

Parameters of the optimization process:

  • Learning Rate (float)
  • Epochs (int)
  • Optimizer Type and Optimizer Args
  • Loss Method

1.1.3 Distributed System Configurations

  • Infra Type: currently only OpenNN is supported.
  • Distributed System Type: none means independent workers.
    Federated architectures can be selected from the list.
  • Distributed System Token: a token that defines a cluster of workers that communicate with one another. E.g., in federated learning, the parameter server and workers communicate, and the token is the handshake between the server and its workers.
  • Distributed System Arguments: custom arguments passed to the distributed ML cluster, depending on the system type.
    TODO add link to page that describes distributed system types.
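Putting the model definition, optimizer, and distributed-system sections together, a worker model file produced by the dialog might resemble the sketch below. Every key name is an illustrative assumption, not the exact schema; the dialog itself is the authoritative generator.

```python
import json

# Hypothetical worker model definition; key names are illustrative
# assumptions, not the exact Nerlnet schema.
worker_model = {
    "modelType": "nn",                      # custom neural network type
    "layerSizes": "128x128k3x3s2x2x1p1x1,4096,128,4",
    "optimizer": {
        "learningRate": 0.01,               # float
        "epochs": 50,                       # int
        "type": "ADAM",                     # optimizer type
        "args": "",                         # optimizer args
        "lossMethod": "MSE",
    },
    "distributedSystem": {
        "infraType": "opennn",              # currently the only option
        "type": "none",                     # none = independent workers
        "token": "none",                    # cluster handshake token
        "args": "",
    },
}
print(json.dumps(worker_model, indent=2))
```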

1.2 Import model to DC file

Use the browse button to select the file generated by the worker model definition dialog.
Then give the worker a name and click on add.

A worker can be duplicated easily by selecting an existing model from the workers list box. Once a worker is selected, click on load, fill in a new and unique name for the worker, and click on add. The model graph can be viewed using the "show worker model" button.

1.3 Default Settings and Special Entities Parameters

Complete the settings and special entities parameters and click on the save button of each section. Types: Frequency (int), BatchSize (int), Port (int). The port belongs to the machine on which the special entity is intended to run; the entity is bound to the machine in the "Devices" section.

1.4 Add Entities

In this step, entities are added through the "Entities" section.
Its right pane contains 3 list boxes that accumulate entities by type.
There are 3 types of entities in Nerlnet: Client, Source and Router.
This is a figure of the "Entities" section in NerlPlanner:

1.4.1 Client

The client entity is a communication layer that hosts multiple workers (from 1.1).
First, a client is generated by filling in the name and port and clicking on add.
Then workers can be bound to the client as follows:

  • Select the client from the clients list box in the right pane.
  • Click on load in the left pane to load the current client.
  • Choose available workers from the drop-down list and click on add.
  • The client's workers will appear in the workers list box.
    The workers list is refreshed with the client's workers each time a client is loaded.

1.4.2 Source

Source is an entity that streams data toward workers for training or prediction, depending on the current phase of the Nerlnet cluster.
Currently, only a CSV source type is supported.
Source supports several policies of sending batches to its target workers:

  • Round Robin: a batch is sent to a worker chosen by RR method.
  • Random: a batch is sent to a randomly chosen worker.
  • Casting: a batch is sent to all workers.
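The three policies can be sketched as batch-dispatch strategies. This is an illustrative helper, not Nerlnet source code; the function name and signature are assumptions.

```python
import random

def dispatch(batches, workers, policy, rng=random):
    """Yield (worker, batch) pairs under the given sending policy.
    Illustrative sketch of the three source policies, not the
    Nerlnet implementation."""
    for i, batch in enumerate(batches):
        if policy == "roundrobin":          # cycle through workers in order
            yield workers[i % len(workers)], batch
        elif policy == "random":            # pick a worker at random
            yield rng.choice(workers), batch
        elif policy == "casting":           # send the batch to every worker
            for w in workers:
                yield w, batch
        else:
            raise ValueError(f"unknown policy: {policy}")

pairs = list(dispatch(["b0", "b1", "b2"], ["w1", "w2"], "roundrobin"))
# round robin alternates workers: [('w1', 'b0'), ('w2', 'b1'), ('w1', 'b2')]
```

Note that casting multiplies traffic by the number of target workers, while round robin and random each deliver every batch exactly once.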

1.4.3 Router

Router is an entity that connects entities together in Nerlnet.
Routers help the user form communication graph layouts that influence the path data flows through until it reaches a worker.
Routers collect statistics on the messages and batches they route.

Experiment Flow JSON File

TODO - update when generating is supported by NP

Communication Map JSON File

TODO - update when generating is supported by NP

Noa and Ohad - TODO continue the explanations as I did about fields (each one of them).