IR2

Information Storage and Retrieval course project, phase 2. This program reads an input file of multiple documents separated by `<p>`, tokenizes them, stems the tokens using the Paice/Husk algorithm, and builds a dictionary from the results. The dictionary is a ternary search tree (TST) implemented with a data structure called Train. The TST was implemented with several different data structures; Train performed best in both memory use and running time.
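The indexing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses a plain node-based ternary search tree rather than the Train structure, omits Paice/Husk stemming entirely, and assumes documents are numbered from 0 in the order they appear. The names `TstIndex`, `add`, `lookup`, and `build` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: a node-based ternary search tree mapping terms to posting lists. */
class TstIndex {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;          // smaller char, next char of the term, larger char
        List<Integer> postings;   // non-null only on nodes where a term ends
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    /** Records that docId contains term. Assumes docIds arrive in non-decreasing order. */
    void add(String term, int docId) {
        if (term.isEmpty()) return;
        root = add(root, term, 0, docId);
    }

    private Node add(Node n, String term, int i, int docId) {
        char c = term.charAt(i);
        if (n == null) n = new Node(c);
        if (c < n.ch) n.lo = add(n.lo, term, i, docId);
        else if (c > n.ch) n.hi = add(n.hi, term, i, docId);
        else if (i < term.length() - 1) n.eq = add(n.eq, term, i + 1, docId);
        else {
            if (n.postings == null) n.postings = new ArrayList<>();
            int size = n.postings.size();
            // Skip the docId if it was already recorded for this term.
            if (size == 0 || n.postings.get(size - 1) != docId) n.postings.add(docId);
        }
        return n;
    }

    /** Returns the sorted posting list for term, or an empty list if the term is absent. */
    List<Integer> lookup(String term) {
        if (term.isEmpty()) return Collections.emptyList();
        Node n = root;
        int i = 0;
        while (n != null) {
            char c = term.charAt(i);
            if (c < n.ch) n = n.lo;
            else if (c > n.ch) n = n.hi;
            else if (i < term.length() - 1) { n = n.eq; i++; }
            else return n.postings == null ? Collections.<Integer>emptyList() : n.postings;
        }
        return Collections.emptyList();
    }

    /** Splits text into documents on "<p>", tokenizes each one, and indexes the tokens. */
    static TstIndex build(String text) {
        TstIndex index = new TstIndex();
        String[] docs = text.split("<p>");
        for (int docId = 0; docId < docs.length; docId++) {
            // Assumption: tokens are lowercase runs of letters and digits.
            for (String token : docs[docId].toLowerCase().split("[^a-z0-9]+")) {
                index.add(token, docId); // a stemmer would be applied to token here
            }
        }
        return index;
    }
}
```

For example, `TstIndex.build("cats chase mice<p>dogs chase cats").lookup("chase")` returns the posting list for documents 0 and 1. Because documents are indexed in order, each posting list stays sorted without an explicit sort.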

The user can type a query to search the document set. The program merges the posting lists of the query terms to find the documents that match the input vector. The output is the indices of the documents that contain all of the query's tokens.
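Since the result must contain every query token, the merge amounts to intersecting the posting lists. The sketch below shows the standard two-pointer intersection of sorted lists; whether the project merges in exactly this way is an assumption, and the names `PostingMerge`, `intersect`, and `intersectAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: AND-merge of sorted posting lists. */
class PostingMerge {
    /** Intersects two posting lists that are sorted in ascending order. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }  // document contains both terms
            else if (x < y) i++;                   // advance the list with the smaller id
            else j++;
        }
        return out;
    }

    /** Intersects all lists, stopping early once the running result is empty. */
    static List<Integer> intersectAll(List<List<Integer>> lists) {
        if (lists.isEmpty()) return new ArrayList<>();
        List<Integer> result = lists.get(0);
        for (int k = 1; k < lists.size() && !result.isEmpty(); k++) {
            result = intersect(result, lists.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> postings = Arrays.asList(
            Arrays.asList(0, 1, 4, 7),
            Arrays.asList(1, 2, 4),
            Arrays.asList(1, 4, 9));
        System.out.println(intersectAll(postings)); // prints [1, 4]
    }
}
```

Each pairwise merge takes time linear in the combined length of the two lists, which is why keeping the posting lists sorted at indexing time pays off at query time.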

Repository contents:

- IR2
- IR.jar
- README.md
- input file.dat
- paice rules.txt
- small size input file.dat
- user manual.txt


arminbashizade/IR2
