-
Notifications
You must be signed in to change notification settings - Fork 0
/
TODO
23 lines (23 loc) · 1004 Bytes
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
- implement stemmer for strings
- implement a simple stemmer
- build an analyzer to split the strings
? the analyzer uses tokens to split the stream
? it can throw out string parts, which have no meaning
- builds an array of strings with number of occurences (needed for scoring)
* the index writer will do this, as i think the analyzer just has to
deliver the array itself
* IndexSearcher
* build indexes for attributes
* implement tree structures for index
* binary tree
* b-tree
* word tree for strings
* index has only the position of document in storage
* add scoring for findings
* could be implemented as per document score stored in the index
* findings of string in the document in relation to findings in all documents
* info should be accessible after building the index
* implement resultset
* should be streamlined to gather resultset from multiple queries
* sorts the result if needed
* returns only the top x documents