Repository for Task 3 of the Information Retrieval team project: Latent Semantic Indexing.
Used Technologies:
- Python (version 3.6)
- Libraries:
  - Natural Language Toolkit (NLTK) for Python
  - NumPy
  - SciPy (v0.19)
  - json
- Web page UI built with Angular
- We recommend using Anaconda as the package manager and runtime (https://www.continuum.io/downloads)
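
The core LSI computation these libraries support can be sketched with a truncated SVD from SciPy. The following is a minimal sketch using a toy term-document matrix; the project's actual matrix construction, dimensionality, and variable names will differ:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy term-document matrix (rows = terms, columns = documents); the real
# code builds this from the preprocessed 20news-bydate corpus.
A = csr_matrix(np.array([
    [1., 0., 1., 0.],
    [1., 1., 0., 0.],
    [0., 1., 1., 1.],
    [0., 0., 1., 1.],
]))

k = 2  # number of latent dimensions to keep
U, s, Vt = svds(A, k=k)  # truncated SVD: A is approximated by U @ diag(s) @ Vt

# Document representations in the latent space (columns of diag(s) @ Vt)
doc_vectors = np.diag(s) @ Vt

# Fold a query into the same space: q_hat = Sigma^{-1} U^T q
q = np.array([1., 1., 0., 0.])  # query term counts over the same vocabulary
q_hat = np.diag(1.0 / s) @ U.T @ q

# Rank documents by cosine similarity to the folded-in query
sims = (doc_vectors.T @ q_hat) / (
    np.linalg.norm(doc_vectors, axis=0) * np.linalg.norm(q_hat)
)
ranking = np.argsort(-sims)  # document indices, most similar first
```

Queries are folded into the latent space rather than compared against raw term vectors, which is what lets LSI match documents that share no literal query terms.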
Preprocessed files are persisted and already available in the project. If you want to re-run the preprocessing, follow the steps under "How to Run from Scratch".
- Copy the `20news-bydate` folder into `LatentSemanticIndexing/data`
- Execute `python /main.py`
- Navigate to http://localhost:8000 in your web browser.
- If successful, the user interface of the search should appear.
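
The serving step can be sketched as follows, assuming `main.py` hosts the Angular UI with Python's built-in HTTP server (this is a guess for illustration; the actual `main.py` may work differently):

```python
import http.server
import socketserver

def make_ui_server(port=8000):
    """Create an HTTP server that serves static files from the current
    directory (e.g. a built Angular UI) at http://localhost:<port>."""
    socketserver.TCPServer.allow_reuse_address = True  # tolerate quick restarts
    return socketserver.TCPServer(("", port), http.server.SimpleHTTPRequestHandler)

# A real run would call make_ui_server(8000).serve_forever() and then open
# http://localhost:8000. Here we bind an ephemeral port just to demonstrate.
httpd = make_ui_server(port=0)
bound_port = httpd.server_address[1]
httpd.server_close()
```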
It is assumed that the newsgroup folder is available in `LatentSemanticIndexing/data` as described above.
- Go to `src/preprocessing/LemmatizationFilePreprocessing`
- If you have never used the NLTK stopword removal list and the tokenizer, follow the subsequent steps. Otherwise, continue with step 3 (running the preprocessing script).
- Execute the following in a Python interpreter:

  ```python
  import nltk
  nltk.download()
  ```
- The NLTK downloader window opens.
- Click on "Corpora", search for "stopwords" and "wordnet" and download both.
- Furthermore, click on "Models", search for "averaged_perceptron_tagger" and "punkt", and download both.
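
Alternatively, the same resources can be fetched non-interactively (the resource names are the ones listed in the steps above):

```python
import nltk

# Download the corpora ("stopwords", "wordnet") and models
# ("averaged_perceptron_tagger", "punkt") needed by the preprocessing.
for resource in ("stopwords", "wordnet", "averaged_perceptron_tagger", "punkt"):
    nltk.download(resource, quiet=True)
```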
- Run the program `LemmatizationFilePreprocessing.py`