Smart Cache #4

amaiberg · 2013-10-10T04:38:10Z

Currently, the DataCache that holds intermediate CASes is not only stored in memory, it is also inefficient: it keeps a duplicate of the entire CAS produced by each execution. This strategy won't scale to larger experiments, especially those with many phases and cross-opts.

A new "smart" cache is proposed to address this issue. The cache will be implemented as a trie with "subtraces" as its keys and FeatureStructures as its values. A "subtrace" is a 2-tuple composed of an input number and an executed descriptor sequence. The input number designates the position of the input currently being processed. The executed descriptor sequence is the concatenation of all the component descriptors that form the linear "root-to-node" path to the node currently being executed in the configuration space tree. A FeatureStructure here is an instance of either an Entity type or an Annotation type that was added by a UIMA AnalysisEngine.
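As a rough illustration of the key described above, a subtrace could be modelled as a small immutable value type. This is only a sketch: the class name `Subtrace`, its methods, and the use of plain strings to stand in for component descriptors are assumptions made for the example, not existing code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical value type for a "subtrace"; all names here are illustrative.
final class Subtrace {
    // Position of the input currently being processed.
    private final int inputNumber;
    // Root-to-node sequence of executed component descriptors
    // (plain strings stand in for real descriptors in this sketch).
    private final List<String> descriptors;

    Subtrace(int inputNumber, List<String> descriptors) {
        this.inputNumber = inputNumber;
        // Defensive copy so the key cannot change after construction.
        this.descriptors = Collections.unmodifiableList(new ArrayList<>(descriptors));
    }

    int inputNumber() { return inputNumber; }

    List<String> descriptors() { return descriptors; }

    // Subtrace of a child node: same input, one more executed descriptor.
    Subtrace extend(String descriptor) {
        List<String> next = new ArrayList<>(descriptors);
        next.add(descriptor);
        return new Subtrace(inputNumber, next);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Subtrace)) return false;
        Subtrace other = (Subtrace) o;
        return inputNumber == other.inputNumber && descriptors.equals(other.descriptors);
    }

    @Override
    public int hashCode() {
        return Objects.hash(inputNumber, descriptors);
    }
}
```

Keeping the type immutable means a subtrace can be shared safely between sibling branches of the configuration space tree; each branch calls `extend` and gets its own key.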

A trie will be created for each input number, since each input will have an entirely different CAS. Each node in the trie will contain the FeatureStructures added by one incremental processing step of an AnalysisEngine. To obtain the CAS to feed into the AnalysisEngine about to be executed, the executor simply provides the associated "subtrace" to the trie. The trie then traverses the "root-to-node" path given by the "subtrace," collecting the FeatureStructures stored along the way and adding them to the CAS. The resulting CAS represents the state leading up to the execution of the current AnalysisEngine.
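A minimal sketch of the lookup described above, under a few assumptions: `SubtraceTrie`, `put`, and `collect` are hypothetical names, descriptors are represented as plain strings, and the type parameter `FS` stands in for UIMA FeatureStructures so the example runs without UIMA on the classpath. Copying the collected structures into an actual CAS is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache: one trie per input number, keyed by descriptor sequence.
final class SubtraceTrie<FS> {

    private static final class Node<T> {
        final Map<String, Node<T>> children = new HashMap<>();
        final List<T> added = new ArrayList<>(); // structures added at this step only
        boolean stored = false;                  // true once an engine's output was cached here
    }

    // One root per input number, since each input has its own CAS.
    private final Map<Integer, Node<FS>> roots = new HashMap<>();

    // Cache only the structures that the last engine on the path added.
    void put(int inputNumber, List<String> path, List<FS> added) {
        Node<FS> node = roots.computeIfAbsent(inputNumber, k -> new Node<>());
        for (String descriptor : path) {
            node = node.children.computeIfAbsent(descriptor, k -> new Node<>());
        }
        node.added.clear();
        node.added.addAll(added);
        node.stored = true;
    }

    // Walk root-to-node and gather every increment along the way.
    // Returns null on a cache miss (some step on the path was never stored).
    List<FS> collect(int inputNumber, List<String> path) {
        Node<FS> node = roots.get(inputNumber);
        if (node == null) {
            return null;
        }
        List<FS> result = new ArrayList<>(node.added);
        for (String descriptor : path) {
            node = node.children.get(descriptor);
            if (node == null || !node.stored) {
                return null;
            }
            result.addAll(node.added);
        }
        return result;
    }
}
```

With this layout, two branches that share a prefix (say `tokenizer` followed by either `tagger` or `parser`) store the tokenizer's output once, and each branch adds only its own increment. That shared prefix is where the savings over duplicating the whole CAS per execution would come from.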

@CollinM
@elmer-garduno

ghost assigned amaiberg Oct 10, 2013
