The DRAGNN C++ API & Multithreading Model

[TOC]

Elements of DRAGNN

The DRAGNN framework allows its users to easily create, train, and use sets of machine learning subsystems in a cohesive manner. To this end, the DRAGNN C++ code is centered around three main types of objects: Components, which represent ML subsystems; TransitionStates, which hold the state of a single inference; and ComputeSessions, which represent the overall state of a DRAGNN computation.

Components and TransitionStates

The fundamental concept behind DRAGNN is that it allows users to compose sets of ML subsystems in a single computation. Each of those subsystems is represented in the DRAGNN framework by a pair of classes - a Component subclass and a TransitionState subclass. While the Component will generally contain the logic code for the subsystem, the state of a computation should be held in a TransitionState instance. This way, a batch of inferences can be represented by a set of TransitionStates that are acted on by the Component. As an example, the DRAGNN SyntaxNet backend has two parts - a SyntaxNetComponent and a SyntaxNetTransitionState. Here, the SyntaxNetComponent owns the transition system, feature extractor, and so forth, but the actual parser state for each sentence being examined is contained in the SyntaxNetTransitionState.
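The split between shared logic and per-inference state can be pictured with a minimal sketch. The `ToyComponent` and `ToyTransitionState` types below are hypothetical stand-ins invented for illustration; they are not the DRAGNN base classes and do not reproduce their interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-inference state: one instance per batch element.
struct ToyTransitionState {
  int position = 0;  // How far this inference has advanced.
  int length = 0;    // Number of steps this inference needs.
  bool IsFinal() const { return position >= length; }
};

// Hypothetical component: owns the shared logic and acts on a batch of states.
class ToyComponent {
 public:
  explicit ToyComponent(const std::vector<int> &lengths) {
    for (int length : lengths) {
      assert(length >= 0);
      ToyTransitionState state;
      state.length = length;
      states_.push_back(state);
    }
  }

  // Advances every non-final state by one step.
  void Advance() {
    for (ToyTransitionState &state : states_) {
      if (!state.IsFinal()) ++state.position;
    }
  }

  // True once every state in the batch is final.
  bool IsTerminal() const {
    for (const ToyTransitionState &state : states_) {
      if (!state.IsFinal()) return false;
    }
    return true;
  }

 private:
  std::vector<ToyTransitionState> states_;
};
```

In this sketch the component never stores per-sentence data itself, so the same logic serves a batch of any size.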

Note that the actual /inference/ is not done by the Component or TransitionState - TensorFlow handles that. The Component and TransitionState are instead responsible for holding state that would otherwise be difficult or impossible to represent with TensorFlow idioms.

Components also are responsible for keeping beam history (if applicable) - that is, the location of a given state in the beam at a given step - but a helper Beam class is provided if a beam is needed.

Finally, Components can also publish a set of Component-specific translators (see [TODO: link]), which can provide additional data to Components that execute later in the computation. For the SyntaxNetComponent, for instance, we provide translators like 'reduce-step', which returns the data associated with the parent of the requested token.

Translators

One of the most powerful and flexible features of DRAGNN is the ability to use activations from previous components as inputs to the current inference. This is done through Translator functions. Translator functions take a desired value - for example, a token index in a previous component - and convert that value into the batch, beam, and step indices that correspond to the location of the TensorFlow data for that value in that component. These translations can be straightforward - the 'identity(N)' translator, for instance, returns the inputs to the inference performed on that transition state's data in the previous component at step N - or complex, like the 'parent-shift-reduce-step(N)' translator provided by the SyntaxNet component, which returns the data from the reduction step performed on the parent token of the token at step N.
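As a rough illustration of what a translation produces, the sketch below maps a raw feature value to a (batch, beam, step) location. `ToyLinkLocation`, `ToyTranslator`, `ToyIdentity`, and `MakeToyTableTranslator` are hypothetical names; the real translator interface is not shown in this document and may differ.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical location of an activation in a previous component.
struct ToyLinkLocation {
  int batch_idx;
  int beam_idx;
  int step_idx;
};

// Hypothetical translator signature: maps a raw feature value to a location.
using ToyTranslator =
    std::function<ToyLinkLocation(int batch_idx, int beam_idx, int value)>;

// Identity-style translation: the value is used directly as the step index.
ToyLinkLocation ToyIdentity(int batch_idx, int beam_idx, int value) {
  return ToyLinkLocation{batch_idx, beam_idx, value};
}

// Table-driven translation: the value is first mapped through a lookup table
// (for example, token index -> step at which that token was processed).
ToyTranslator MakeToyTableTranslator(std::vector<int> step_for_value) {
  return [step_for_value](int batch_idx, int beam_idx,
                          int value) -> ToyLinkLocation {
    assert(value >= 0 && value < static_cast<int>(step_for_value.size()));
    return ToyLinkLocation{batch_idx, beam_idx, step_for_value[value]};
  };
}
```

The table-driven variant stands in for the "complex" case: the component supplies extra knowledge (here a made-up table) that the caller does not have.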

Some translators are 'universal' - that is, they apply to any component. However, Components may declare a set of translators that they can provide that go beyond the universal translators. These backend-specific translators often provide richer access to data, or access in more meaningful ways.

For a complete list of translators provided in the DRAGNN baseline, please see [TODO: LINK ME].

ComputeSession

The ComputeSession contains the state of a single DRAGNN computation. It holds local, independent copies of Component objects, TransitionStates, and input data, making it a completely independent container for that computation. The ComputeSession object is also the basic API layer for DRAGNN - most external computation should use its interface rather than diving deeper into Components and TransitionStates.

ComputeSessions are created by ComputeSessionPools, which will handle all the relevant initialization and setup tasks for that ComputeSession. After a computation is complete, the ComputeSession should be returned to the Pool that created it; failing to do this will not leak resources, but it will cause the ComputeSessionPool to allocate more ComputeSessions than are necessary.

ComputeSessionPool

The ComputeSessionPool is a constructor, initializer, and storage object for ComputeSessions. When a pool is created, it is passed a MasterSpec specification that describes the DRAGNN graph that will be computed; when a ComputeSession is requested, the ComputeSessionPool will return one that is set up to compute based on the pool's MasterSpec.

When a DRAGNN computation is complete and the ComputeSession is no longer needed, the calling code should return the ComputeSession to the ComputeSessionPool; the pool will reset its internal state and reuse it. If this is done consistently, the number of extant ComputeSessions is limited to the number of parallel computations.

The ComputeSessionPool is threadsafe; DRAGNN multithreading generally should take place at the ComputeSession level (as discussed below).

The ComputeSession API

NOTE: All of the functionality described below is already wrapped in TensorFlow ops. This is for developers who want C++ access to the DRAGNN framework.

The core elements of the ComputeSession API are intended to allow calling code to collate a set of inputs for a TensorFlow inference using the internal state of a Component, take the result of that inference and feed it back into the Component so that the Component can advance its internal state, and then repeat this process until the Component indicates that it no longer needs to advance. This process is repeated for all Components, and when no Components are left, the computation is complete.

The key features of the API are as follows. Note that most functions are keyed on the component name - this is the string 'ComponentSpec.name' from the ComponentSpec message that describes the component in question in the MasterSpec.

Data Extraction Functions

The following functions allow the caller to extract data from the internal state of a Component.

int GetInputFeatures(
      const string &component_name,
      std::function<int32 *(int num_items)> allocate_indices,
      std::function<int64 *(int num_items)> allocate_ids,
      std::function<float *(int num_items)> allocate_weights,
      int channel_id) const

GetInputFeatures extracts the fixed features from the given component for the given channel ID and places them in the memory allocated by the allocator functions. Each allocator function should take an int representing how many elements will be extracted, allocate backing memory to hold that many elements, and return a pointer to it. (This is used to wrap TensorFlow tensor allocation code and efficiently extract large amounts of data in the current op kernels.)
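The allocator-callback pattern can be sketched without TensorFlow by backing each allocator with a `std::vector`. Everything here is an assumption made for illustration: `ToyExtractFeatures` is a hypothetical stand-in for the extraction call, the feature values are made up, the `<cstdint>` types substitute for `int32` and `int64`, and treating the return value as the number of items is a guess rather than documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a feature extraction call. It asks each allocator
// for room for `num_items` elements, then fills the returned buffers.
int ToyExtractFeatures(
    std::function<std::int32_t *(int num_items)> allocate_indices,
    std::function<std::int64_t *(int num_items)> allocate_ids,
    std::function<float *(int num_items)> allocate_weights) {
  const int num_items = 3;  // Made-up feature count for illustration.
  std::int32_t *indices = allocate_indices(num_items);
  std::int64_t *ids = allocate_ids(num_items);
  float *weights = allocate_weights(num_items);
  for (int i = 0; i < num_items; ++i) {
    indices[i] = i;
    ids[i] = 100 + i;
    weights[i] = 1.0f;
  }
  return num_items;  // Assumption: the count of extracted items.
}

// Caller-owned buffers; each allocator resizes its vector and returns data().
struct ToyFeatureBuffers {
  std::vector<std::int32_t> indices;
  std::vector<std::int64_t> ids;
  std::vector<float> weights;
};

int ToyFillBuffers(ToyFeatureBuffers *buffers) {
  assert(buffers != nullptr);
  return ToyExtractFeatures(
      [buffers](int n) -> std::int32_t * {
        buffers->indices.resize(n);
        return buffers->indices.data();
      },
      [buffers](int n) -> std::int64_t * {
        buffers->ids.resize(n);
        return buffers->ids.data();
      },
      [buffers](int n) -> float * {
        buffers->weights.resize(n);
        return buffers->weights.data();
      });
}
```

Because the callee decides how many elements it needs, the caller never has to pre-size the buffers or copy the data a second time.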

int BulkGetInputFeatures(const string &component_name,
                         const BulkFeatureExtractor &extractor);

BulkGetInputFeatures extracts all fixed features, advances the Component via the oracle, and repeats the process until the component is terminal. This is intended to efficiently extract features (like, for instance, word embeddings). The passed BulkFeatureExtractor object contains allocator functions and formatting functions required to correctly lay out the data that is being extracted.

std::vector<LinkFeatures> GetTranslatedLinkFeatures(
      const string &component_name, int channel_id)

GetTranslatedLinkFeatures extracts /linked/ features for the given component and channel ID. Linked features are indices into previous components' data tensors; this function call will extract the raw data, translate it via the relevant Translator call, and return a set of filled-out LinkFeatures protos that indicate how to access the relevant data.

Component Advancement Functions

The following functions allow their caller to advance the state of a given Component.

void AdvanceFromOracle(const string &component_name)

The AdvanceFromOracle function advances the given component one step, according to whatever oracle it has.

void AdvanceFromPrediction(const string &component_name,
                           const float score_matrix[],
                           int score_matrix_length)

Advances the given component based on the given matrix of scores. The scores are generally the outputs of a neural net, and should be padded to (batch size) x (max beam size), ordered so that each beam element's scores are contiguous.
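One plausible reading of this layout is a row-major buffer indexed by batch item, then beam slot, then action. The sketch below works under that assumption; `ToyScoreOffset`, `ToyMakePaddedScores`, and the `num_actions` parameter are hypothetical names that do not appear in the API above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: [batch][beam slot][action], row-major, so that all scores
// for one beam element sit next to each other.
std::size_t ToyScoreOffset(int batch_idx, int beam_idx, int action_idx,
                           int max_beam_size, int num_actions) {
  assert(batch_idx >= 0);
  assert(beam_idx >= 0 && beam_idx < max_beam_size);
  assert(action_idx >= 0 && action_idx < num_actions);
  return (static_cast<std::size_t>(batch_idx) * max_beam_size + beam_idx) *
             num_actions +
         action_idx;
}

// Builds a zero-filled score buffer covering every (batch, beam) slot,
// including beam slots that are currently unused (the padding).
std::vector<float> ToyMakePaddedScores(int batch_size, int max_beam_size,
                                       int num_actions) {
  assert(batch_size >= 0 && max_beam_size >= 0 && num_actions >= 0);
  return std::vector<float>(
      static_cast<std::size_t>(batch_size) * max_beam_size * num_actions,
      0.0f);
}
```

Under this assumption, the length passed as `score_matrix_length` would be the size of the padded buffer.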

bool IsTerminal(const string &component_name)

Returns true if all batch items in the given component report that they are final.
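Taken together, the advancement functions support a simple driver loop for one component. The sketch below uses a hypothetical `ToyAdvanceSession` that only counts steps, plus a placeholder `ToyRunInference` in place of the TensorFlow inference; the method names mirror the ones listed above, but nothing here is the real ComputeSession.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a ComputeSession. It only counts steps; a real
// session would update its transition states from the scores.
class ToyAdvanceSession {
 public:
  explicit ToyAdvanceSession(int steps_needed) : remaining_(steps_needed) {}

  bool IsTerminal(const std::string & /*component_name*/) const {
    return remaining_ <= 0;
  }

  void AdvanceFromPrediction(const std::string & /*component_name*/,
                             const float score_matrix[],
                             int score_matrix_length) {
    assert(score_matrix != nullptr && score_matrix_length > 0);
    --remaining_;
  }

 private:
  int remaining_;
};

// Placeholder for the inference step: returns a dummy score buffer.
std::vector<float> ToyRunInference() { return std::vector<float>(4, 0.0f); }

// Drives one component until it reports that it is terminal. Returns the
// number of inference steps taken.
int ToyRunComponent(ToyAdvanceSession *session, const std::string &name) {
  assert(session != nullptr);
  int steps = 0;
  while (!session->IsTerminal(name)) {
    const std::vector<float> scores = ToyRunInference();
    session->AdvanceFromPrediction(name, scores.data(),
                                   static_cast<int>(scores.size()));
    ++steps;
  }
  return steps;
}
```

In a real computation, feature extraction for the current step would happen at the top of the loop body, before the inference call.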

Computation Advancement Functions

void SetInputData(const std::vector<string> &data)

Passes a set of input data to the ComputeSession. Input data should be in the form of serialized protobuf messages (as can be seen in the graph builder test). The components that were used to construct the graph will have a certain type of proto that they expect - for instance, the SyntaxNetComponent expects Sentence protos - and will attempt to deserialize the input data into that form. Each element in the vector will become a batch element; batch sizes should be controlled by limiting the number of items passed into this function.

void InitializeComponentData(const string &component_name,
                             int max_beam_size)

This function performs setup tasks on the given component, and requests that it set itself up using the given beam size. If there is a previous component, the previous component's final transition states will be passed to the requested component at this time.

void FinalizeData(const string &component_name)

This function essentially "completes" a component, forcing it to write out its current best prediction to the backing data for other components to use. This function should always be called on a component before calling InitializeComponentData on the next component in the sequence.

std::vector<string> GetSerializedPredictions()

This function completes a computation, taking the current set of predictions and data, re-serializing them, and returning them as output. Be sure that FinalizeData has been called on all components before this function is called, or the output predictions will be incomplete. Once this function has been called, the ComputeSession is done and can be returned to the pool - it has no further use. (ComputeSessions can be returned early, as well, but they will be reset and their internal state will be lost.)
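The required call order across a whole computation can be sketched with a stand-in that records calls instead of computing anything. `ToyLoggingSession` and `ToyRunComputation` are hypothetical, the component names are made up, and the fixed two steps per component is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in that records the order of calls.
class ToyLoggingSession {
 public:
  void SetInputData(const std::vector<std::string> &data) {
    data_ = data;
    log_.push_back("SetInputData");
  }
  void InitializeComponentData(const std::string &name, int max_beam_size) {
    assert(max_beam_size > 0);
    steps_left_ = 2;  // Made-up step count per component.
    log_.push_back("Initialize:" + name);
  }
  bool IsTerminal(const std::string & /*name*/) const {
    return steps_left_ == 0;
  }
  void AdvanceFromOracle(const std::string &name) {
    --steps_left_;
    log_.push_back("Advance:" + name);
  }
  void FinalizeData(const std::string &name) {
    log_.push_back("Finalize:" + name);
  }
  std::vector<std::string> GetSerializedPredictions() {
    log_.push_back("GetSerializedPredictions");
    return data_;  // A real session would return re-serialized predictions.
  }
  const std::vector<std::string> &log() const { return log_; }

 private:
  std::vector<std::string> data_;
  std::vector<std::string> log_;
  int steps_left_ = 0;
};

// Runs every component in order, finalizing each before initializing the next.
std::vector<std::string> ToyRunComputation(
    ToyLoggingSession *session, const std::vector<std::string> &components,
    const std::vector<std::string> &inputs, int max_beam_size) {
  assert(session != nullptr);
  session->SetInputData(inputs);
  for (const std::string &name : components) {
    session->InitializeComponentData(name, max_beam_size);
    while (!session->IsTerminal(name)) session->AdvanceFromOracle(name);
    session->FinalizeData(name);
  }
  return session->GetSerializedPredictions();
}
```

The point of the sketch is the ordering: each component is finalized before the next is initialized, and predictions are read only after the last FinalizeData call.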

The ComputeSessionPool API

NOTE: All of the functionality described below is already wrapped in TensorFlow ops. This is for developers who want C++ access to the DRAGNN framework.

The ComputeSessionPool creates and manages completed ComputeSessions. It has functions to provide and take ownership of ComputeSession objects.

ComputeSessionPool(const MasterSpec &master_spec,
                   const GridPoint &hyperparams)

This constructs a ComputeSessionPool with the associated MasterSpec and hyperparameter grid point. All ComputeSessions that this pool creates will perform computations according to this master spec. If the master spec must be changed, destroy this pool and create another one.

std::unique_ptr<ComputeSession> GetSession();

Creates (or reuses, if possible) a ComputeSession based on the pool's master spec. If the MasterSpec is ill-formed, this method will CHECK-fail. Ownership of the compute session is passed to the caller; for efficiency, the owned pointer should be returned to the pool via ReturnSession().

void ReturnSession(std::unique_ptr<ComputeSession> session);

Returns a ComputeSession object to the pool. This ComputeSession object will be reset and re-used.
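The get/return ownership pattern can be sketched with a small mutex-guarded free list. `ToySessionPool` and `ToyPooledSession` are hypothetical; the real pool also takes a MasterSpec and performs component setup, which this sketch omits entirely.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical session with some resettable state.
struct ToyPooledSession {
  int steps_taken = 0;
  void Reset() { steps_taken = 0; }
};

// Hypothetical pool: hands out sessions and reuses returned ones.
class ToySessionPool {
 public:
  // Transfers ownership of a session to the caller, creating one if needed.
  std::unique_ptr<ToyPooledSession> GetSession() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_.empty()) {
      ++num_created_;
      return std::unique_ptr<ToyPooledSession>(new ToyPooledSession());
    }
    std::unique_ptr<ToyPooledSession> session = std::move(free_.back());
    free_.pop_back();
    return session;
  }

  // Takes ownership back, resets the session, and keeps it for reuse.
  void ReturnSession(std::unique_ptr<ToyPooledSession> session) {
    assert(session != nullptr);
    session->Reset();  // Discard state from the previous computation.
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(std::move(session));
  }

  int num_created() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return num_created_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::unique_ptr<ToyPooledSession>> free_;
  int num_created_ = 0;
};
```

A caller would write `auto session = pool.GetSession();`, run its computation, and then call `pool.ReturnSession(std::move(session));`. In this sketch, a session that is never returned is still freed by its `unique_ptr`, but the pool has to construct a fresh one for the next request.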

Multithreading

DRAGNN was designed to be thread-safe at the ComputeSession level. It should be fully safe to have multiple threads running computations, pulling ComputeSessions from the same ComputeSessionPool and executing them independently. Const methods in the ComputeSession are also thread-safe, so (for instance) multiple threads could be used to extract each channel's fixed features (and this does happen at the TensorFlow level) - but only one thread could be used when calling BulkGetInputFeatures, which is not a const method.
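The session-per-thread model can be sketched with `std::thread`. `ToyThreadSession` and `ToyRunInParallel` are hypothetical; the sketch only demonstrates that when each thread works on its own session object, the per-session calls need no locking. Building it may require linking against the platform's thread library.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical session: all mutable state is private to one computation.
struct ToyThreadSession {
  int steps_taken = 0;
  void Advance() { ++steps_taken; }
};

// Runs `num_threads` independent computations, one session per thread. No
// session is touched by more than one thread while the workers are running.
std::vector<int> ToyRunInParallel(int num_threads, int steps_per_computation) {
  assert(num_threads > 0);
  // Sized once up front, so pointers to elements stay valid below.
  std::vector<ToyThreadSession> sessions(num_threads);

  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    ToyThreadSession *session = &sessions[i];
    workers.emplace_back([session, steps_per_computation]() {
      for (int step = 0; step < steps_per_computation; ++step) {
        session->Advance();
      }
    });
  }
  for (std::thread &worker : workers) worker.join();

  // Safe to read after join(): every worker has finished.
  std::vector<int> results;
  for (const ToyThreadSession &session : sessions) {
    results.push_back(session.steps_taken);
  }
  return results;
}
```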