Smart Cache #4

Open
amaiberg opened this issue Oct 10, 2013 · 0 comments

Currently, the DataCache containing intermediate CASes is not only stored in memory, it also does so inefficiently, keeping a duplicate of the entire CAS resulting from each execution. This strategy won't scale to larger experiments, especially those with many phases and cross-opts.

A new "smart" cache is proposed to address this issue. The cache will be implemented as a trie using "subtraces" as its keys and FeatureStructures as its values. A "subtrace" is a 2-tuple composed of an input number and an executed descriptor sequence. The input number designates the position of the input currently being processed. The executed descriptor sequence is the concatenation of all the component descriptors that form the linear "root-to-node" path to the node currently being executed in the configuration space tree. A FeatureStructure here is an instance of either an Entity type or an Annotation type that was added by a UIMA AnalysisEngine.

A separate trie will be created for each input number, since different inputs will have entirely different CASes. Each node in the trie will contain the FeatureStructures that correspond to the incremental processing performed by one AnalysisEngine. To retrieve the CAS to be passed to the AnalysisEngine currently being executed, the executor will simply have to provide the associated "subtrace" to the trie. The trie will then traverse the "root-to-node" path given by the "subtrace," collecting the FeatureStructures stored along it and adding them to the CAS. The CAS produced by this process will represent the state leading up to the execution of the current AnalysisEngine.
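
The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the proposed implementation: the names `SmartCache`, `TrieNode`, `put`, and `get` are hypothetical, plain strings stand in for UIMA FeatureStructures and component descriptors, and the returned list stands in for the reconstructed CAS. It also assumes that a lookup should count as a miss when any node on the path has not been populated yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A subtrace is (input number, executed descriptor sequence).
Subtrace = Tuple[int, Tuple[str, ...]]


@dataclass
class TrieNode:
    # Only the FeatureStructures added by the engine at this node
    # (strings are placeholders for real FeatureStructures).
    feature_structures: List[str] = field(default_factory=list)
    # True once an engine's output has actually been stored here.
    populated: bool = False
    # Child nodes keyed by the next component descriptor on the path.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


class SmartCache:
    """Hypothetical sketch: one trie per input number."""

    def __init__(self) -> None:
        self._roots: Dict[int, TrieNode] = {}

    def put(self, subtrace: Subtrace, added: List[str]) -> None:
        """Store the increment produced by the last engine in the subtrace."""
        input_number, descriptors = subtrace
        node = self._roots.setdefault(input_number, TrieNode())
        for descriptor in descriptors:
            node = node.children.setdefault(descriptor, TrieNode())
        node.feature_structures = list(added)
        node.populated = True

    def get(self, subtrace: Subtrace) -> Optional[List[str]]:
        """Walk the root-to-node path, collecting increments along the way.

        Returns None if the path is missing or only partially cached.
        """
        input_number, descriptors = subtrace
        node = self._roots.get(input_number)
        if node is None:
            return None
        collected: List[str] = []
        for descriptor in descriptors:
            node = node.children.get(descriptor)
            if node is None or not node.populated:
                return None
            collected.extend(node.feature_structures)
        return collected


cache = SmartCache()
cache.put((0, ("tokenizer",)), ["Token"])
cache.put((0, ("tokenizer", "tagger")), ["PosTag"])
cache.put((0, ("tokenizer", "parser")), ["Parse"])
rebuilt = cache.get((0, ("tokenizer", "tagger")))
```

In this sketch the two paths that share the `tokenizer` prefix store its output only once, which is the space saving the proposal is after compared with duplicating the whole CAS per execution.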

@CollinM
@elmer-garduno

@ghost ghost assigned amaiberg Oct 10, 2013