Currently, the DataCache that holds intermediate CASes is not only kept entirely in memory, it also stores them inefficiently, keeping a duplicate of the entire CAS resulting from each execution. This strategy won't scale to larger experiments, especially those with many phases and cross-opts.
A new "smart" cache is proposed to address this issue. The cache will be implemented as a trie that uses "subtraces" as its keys and FeatureStructures as its values. A "subtrace" is a 2-tuple composed of an input number and an executed descriptor sequence. The input number identifies the position of the input currently being processed. The executed descriptor sequence is the concatenation of all component descriptors that form the linear "root-to-node" path to the node currently being executed in the configuration space tree. A FeatureStructure here is an instance of either an Entity type or an Annotation type that was added by a UIMA AnalysisEngine.
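As a rough illustration of the key described above, a subtrace could be modeled as a small immutable value type. This is only a sketch: the class name `Subtrace`, its methods, and the use of plain strings for component descriptors are assumptions made for the example, not part of the existing code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical key type: (input number, executed descriptor sequence). */
final class Subtrace {
    private final int inputNumber;
    // Descriptors are plain strings here; that is an assumption of this sketch.
    private final List<String> descriptors;

    Subtrace(int inputNumber, List<String> descriptors) {
        this.inputNumber = inputNumber;
        // Defensive copy so the key cannot change after construction.
        this.descriptors = Collections.unmodifiableList(new ArrayList<>(descriptors));
    }

    int inputNumber() { return inputNumber; }

    List<String> descriptors() { return descriptors; }

    /** Subtrace of the child node reached by executing one more component. */
    Subtrace extend(String descriptor) {
        List<String> next = new ArrayList<>(descriptors);
        next.add(descriptor);
        return new Subtrace(inputNumber, next);
    }

    /** True if this subtrace lies on the root-to-node path of {@code other}. */
    boolean isPrefixOf(Subtrace other) {
        return inputNumber == other.inputNumber
                && descriptors.size() <= other.descriptors.size()
                && other.descriptors.subList(0, descriptors.size()).equals(descriptors);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Subtrace)) return false;
        Subtrace s = (Subtrace) o;
        return inputNumber == s.inputNumber && descriptors.equals(s.descriptors);
    }

    @Override
    public int hashCode() {
        return Objects.hash(inputNumber, descriptors);
    }
}
```

Value-based `equals` and `hashCode` let two executions that followed the same path on the same input produce interchangeable keys.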
A separate trie will be created for each input number, since different inputs have entirely different CASes. Each node in the trie will contain the FeatureStructures added by one incremental processing step of an AnalysisEngine. To retrieve the CAS to be passed to the AnalysisEngine about to be executed, the executor simply provides the associated "subtrace" to the trie. The trie then traverses the "root-to-node" path given by the "subtrace," collecting the FeatureStructures stored along it and adding them to the CAS. The resulting CAS represents the state leading up to the execution of the current AnalysisEngine.
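The lookup described above might look roughly like the following. It is a minimal sketch under several assumptions: the class name `SubtraceTrieCache` and its methods are hypothetical, descriptors are represented as strings, and the type parameter `F` stands in for a UIMA FeatureStructure so the example runs without UIMA on the classpath. Rebuilding an actual CAS from the collected values is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical trie-backed cache; F stands in for a UIMA FeatureStructure. */
final class SubtraceTrieCache<F> {

    private static final class Node<F> {
        final Map<String, Node<F>> children = new HashMap<>();
        // Only the structures added by the engine at this node (the increment).
        final List<F> added = new ArrayList<>();
        // False for nodes created only as placeholders on the way to a deeper node.
        boolean populated;
    }

    // One trie root per input number, since inputs do not share CAS content.
    private final Map<Integer, Node<F>> roots = new HashMap<>();

    /** Stores the structures that the last engine on the path added. */
    void put(int inputNumber, List<String> descriptors, List<F> added) {
        Node<F> node = roots.computeIfAbsent(inputNumber, k -> new Node<>());
        for (String descriptor : descriptors) {
            node = node.children.computeIfAbsent(descriptor, k -> new Node<>());
        }
        node.added.clear();
        node.added.addAll(added);
        node.populated = true;
    }

    /**
     * Walks the root-to-node path and gathers every increment stored on it.
     * Returns empty if any step of the path has not been cached yet.
     */
    Optional<List<F>> collect(int inputNumber, List<String> descriptors) {
        Node<F> node = roots.get(inputNumber);
        if (node == null) {
            return Optional.empty();
        }
        List<F> result = new ArrayList<>();
        for (String descriptor : descriptors) {
            node = node.children.get(descriptor);
            if (node == null || !node.populated) {
                return Optional.empty();
            }
            result.addAll(node.added);
        }
        return Optional.of(result);
    }
}
```

In this sketch the caller passes the sequence of components that have already been executed, so the collected list corresponds to the CAS contents just before the next engine runs. Because sibling branches share their common prefix nodes, the increments of shared upstream components are stored once rather than once per branch.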
@CollinM
@elmer-garduno