The Wikipedia search engine!
Demo » (service might be unavailable)
Wikoogle is a Wikipedia information retrieval system. In other words, it is a Wikipedia search engine.
In addition to the other dependencies used to build the project, explained below, you need:
- Wikipedia dumps. Pick a few dumps, and only ones that have pages-articles-multistream in their name. Download them and put them in the dumps directory at the root of the project
- Enough RAM. The amount depends on the number of dumps you want to index, and you can easily run out of memory during the indexing or running phase. For example, if you have >= 3 dumps (> 3 GB decompressed) you will need at least 4 GB of free RAM. On the other hand, if you index a single small dump (from 600 MB to 1.3 GB decompressed), 2 GB free should be enough
- python (>= 3.7)
- pipenv
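The requirements above can be checked with a short script before installing. The following is a minimal sketch and not part of the project: the function names are hypothetical, while the dumps directory name and the pages-articles-multistream marker are taken from the list above.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 7)  # minimum version stated in the requirements
DUMP_MARKER = "pages-articles-multistream"  # substring required in dump names


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.7."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pipenv_available():
    """Return True if a pipenv executable is found on PATH."""
    return shutil.which("pipenv") is not None


def find_dumps(dumps_dir="dumps"):
    """List files in dumps_dir whose name contains the multistream marker."""
    root = Path(dumps_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_file() and DUMP_MARKER in p.name
    )


def report(dumps_dir="dumps"):
    """Build a human-readable summary of the three checks."""
    dumps = find_dumps(dumps_dir)
    total_mb = sum(p.stat().st_size for p in dumps) / 1e6
    return [
        f"python >= 3.7: {python_ok()}",
        f"pipenv on PATH: {pipenv_available()}",
        f"dumps found: {len(dumps)} ({total_mb:.0f} MB on disk)",
    ]
```

Running `print("\n".join(report()))` from the root of the project prints the summary. Note that the size reported is the size on disk: for dumps that are still compressed it is smaller than the decompressed size the RAM guidance refers to.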
After checking the requirements:
python --version
pipenv --version
follow these steps from the root of the project:
- Install the dependencies
pipenv install
pipenv run python -m nltk.downloader 'popular'
- Specify the entrypoint
export FLASK_APP=main.py
or in Windows PowerShell (Windows-key + X -> Windows PowerShell)
$env:FLASK_APP = "main.py"
- Run the application, remembering to set the PORT (e.g. 8888)
cd src
pipenv run python -m flask run --host 0.0.0.0 --port PORT --no-reload
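Setting FLASK_APP differs between shells, so the last two steps can also be wrapped in a small cross-platform launcher. This is only a sketch under assumptions: build_flask_command, build_env and launch are hypothetical helper names, not part of the project, and the script is assumed to be started with pipenv run python from the root of the project so that Flask is importable.

```python
import os
import subprocess
import sys


def build_flask_command(port, host="0.0.0.0"):
    """Return the argv list equivalent to the flask run command above."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return [sys.executable, "-m", "flask", "run",
            "--host", host, "--port", str(port), "--no-reload"]


def build_env(entrypoint="main.py", base=None):
    """Copy the environment and set FLASK_APP, as export / $env: would."""
    env = dict(os.environ if base is None else base)
    env["FLASK_APP"] = entrypoint
    return env


def launch(port, src_dir="src"):
    """Run the app from src_dir; blocks until the server stops."""
    result = subprocess.run(build_flask_command(port), cwd=src_dir,
                            env=build_env(), check=False)
    return result.returncode
```

Calling `launch(8888)` is then equivalent to exporting FLASK_APP, changing into src and running the flask command with PORT set to 8888.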
As an alternative to the installation above, you can install and run the project within a Linux container. Be sure to have Docker installed: https://docs.docker.com/get-docker/
- Build the image information_retrieval (you can change the tag)
docker image build -t information_retrieval -f Dockerfile.dev .
The image is based on the python:latest image. If the process fails due to a missing image, download it with
docker pull python
and retry.
- Create the container with the name ir_container (you can change it)
docker container create -p 8888:8888 -v ${PWD}:/app -it --name ir_container information_retrieval
You can change the port mapping and the name of the container. 8888 is the only port exposed by the image, so change only the host port (the first number), not the container port. Be sure to give ENOUGH RAM to the container (see the requirements at the beginning), otherwise the next step might fail
- Run
docker container start -ia ir_container # or the name you specified before
cd src
export FLASK_APP=main.py
python -m flask run --host 0.0.0.0 --port 8888 --no-reload
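To make the port-mapping rule concrete, the sketch below assembles the Docker commands from a tag, a container name and a host port, keeping the container port fixed at 8888. docker_commands is a hypothetical helper, not part of the project; it only builds argument lists and does not call Docker itself. The start command is written as docker container start -ia, which is an assumption about the intended form of the Run step.

```python
import os

CONTAINER_PORT = 8888  # the only port exposed by the image; keep it fixed


def docker_commands(tag="information_retrieval", name="ir_container",
                    host_port=8888, project_dir=None):
    """Return the build, create and start commands as argv lists."""
    project_dir = os.getcwd() if project_dir is None else project_dir
    build = ["docker", "image", "build", "-t", tag,
             "-f", "Dockerfile.dev", "."]
    create = ["docker", "container", "create",
              # -p takes HOST:CONTAINER, so only the left side may change
              "-p", f"{int(host_port)}:{CONTAINER_PORT}",
              "-v", f"{project_dir}:/app",
              "-it", "--name", name, tag]
    start = ["docker", "container", "start", "-ia", name]
    return [build, create, start]
```

For example, `docker_commands(host_port=9000)` maps host port 9000 to container port 8888, so the application would be reached at localhost:9000 while the flask command inside the container still uses 8888. Each list can be passed to `subprocess.run`.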
The usage is straightforward: you can check out the online demo here: http://212.237.42.43:8080/ or, after you have started the application on your computer, open your browser at localhost:PORT, where PORT is the port you specified in the previous steps.
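If the page does not load, it can help to check whether the application is answering at all. The following is a minimal sketch using only the standard library; app_url and is_reachable are hypothetical names, and the check deliberately bypasses any configured proxy because the target is a local instance.

```python
import urllib.error
import urllib.request


def app_url(port, host="localhost"):
    """Build the address of a locally running instance."""
    return f"http://{host}:{int(port)}/"


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers an HTTP request with any status."""
    # An empty ProxyHandler disables proxies taken from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False
```

With the application started on port 8888, `is_reachable(app_url(8888))` should return True; a False result usually means the server is not running or a different port was used.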
Wikoogle resembles Google (at least, we try): the query language is almost the same, and you can configure the search parameters of the models (e.g. PageRank, query expansion) from the user-friendly menu.
All major modern browsers are supported:
- Chrome (>=57)
- Edge (>=16)
- Firefox (>=52)
See the docs here: you will find evaluation and performance measures, as well as other aspects of the project such as challenges, architecture, models and so on.