This project is a dockerized version of OERhörnchen 2.0 (oerhoernchen20) by Matthias Andrasch. It tries to combine the central OER search engine approach proposed by Matthias with the personalized search engine approach proposed by Nele Hirsch, since users can also add their own data to the search engine's index.
The general idea of this prototype is to establish an example pipeline of how an OER search engine could work.
This PROTOTYPE is based on the following technologies, all packed into Docker containers (except for the Scrapy crawler):
- Scrapy: First, OER repositories are crawled using Scrapy. Currently the following repositories are crawled:
  - HOOU
  - parts of OERinfo
  - TIB-AV-Portal
  - digiLL
  - OpenRUB
  - HHU-Mediathek
  - MOOCs of oncampus
  - ZOERR
  Entries without a proper license are skipped and not stored. The results are stored in a MySQL database.
- MySQL: Used to store the results of the crawl.
- Logstash: Logstash regularly checks the MySQL database for new items and for changes to existing entries.
- Elasticsearch: Elasticsearch is the search engine; it indexes the data it receives from Logstash.
- VueJS: The web app is built with VueJS and uses ReactiveSearch to connect to Elasticsearch. It provides the user interface for making search queries and adding items to Elasticsearch. It is served by Nginx, which also acts as a reverse proxy connecting the app to the Docker network.
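The Logstash step can be pictured as incremental polling. A common way to set this up is the Logstash JDBC input with a tracking column and the :sql_last_value placeholder; whether this project's pipeline works exactly like that is an assumption. The Python sketch below only imitates the idea, with an in-memory SQLite database standing in for MySQL; the table name oer_items and the updated_at column are hypothetical.

```python
import sqlite3

# Assumption: the table name "oer_items" and the "updated_at" column
# (ISO-8601 text) are hypothetical stand-ins for the crawler's MySQL schema.
SCHEMA = (
    "CREATE TABLE oer_items ("
    "id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
)


def fetch_changes(conn, sql_last_value):
    """Return rows modified after sql_last_value and the new tracking value.

    Imitates the Logstash JDBC input idea: each scheduled run selects only
    rows newer than the value remembered from the previous run.
    """
    rows = conn.execute(
        "SELECT id, title, updated_at FROM oer_items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (sql_last_value,),
    ).fetchall()
    # Remember the newest timestamp seen; keep the old value if nothing changed.
    new_last_value = rows[-1][2] if rows else sql_last_value
    return rows, new_last_value


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO oer_items (title, updated_at) VALUES (?, ?)",
        [("Intro to Statistics", "2020-01-01T10:00:00"),
         ("Open Textbook", "2020-01-02T09:30:00")],
    )
    rows, last = fetch_changes(conn, "1970-01-01T00:00:00")
    print(len(rows), last)  # first run: both rows are new
    rows, last = fetch_changes(conn, last)
    print(len(rows), last)  # second run: nothing changed since last run
```

With a strict > comparison, a row that is written later but carries exactly the last seen timestamp would be skipped; comparing with >= and letting the index overwrite documents by ID avoids that at the cost of re-sending a few rows.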
- Make sure you have git installed (https://www.atlassian.com/git/tutorials/install-git).
- Make sure you have Docker installed.
- git clone
- cd oerhoernchen20_docker/docker_hoernchen
- docker-compose up
  This may take a while on the first run, as some images have to be built.
- After a few minutes you should be able to go to http://localhost and see your "OERhörnchen 2.0" instance ready. If it still says "Loading entries", you may just have to wait a bit longer.
- (For Docker pros: Don't be confused by the indexer container throwing some errors at the beginning. It starts right away and looks for Elasticsearch even if Elasticsearch is not ready yet. It stops throwing errors after a minute or so. This still needs fixing, which shouldn't be too complicated.)
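One possible fix for the indexer errors is to wait until Elasticsearch reports a usable cluster state before starting work. The sketch below is a hypothetical helper, not part of the repository: it polls Elasticsearch's _cluster/health API with wait_for_status=yellow and retries with a fixed delay. The URL assumes Elasticsearch's default port 9200 on localhost; inside the Docker network the Elasticsearch service name would replace localhost.

```python
import time
import urllib.request

# Assumption: Elasticsearch listens on its default port 9200 on localhost;
# from inside the Docker network use the Elasticsearch service name instead.
ES_HEALTH_URL = (
    "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=5s"
)


def es_is_ready(url=ES_HEALTH_URL):
    """Return True if the cluster health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts and HTTP errors (URLError and
        # HTTPError are OSError subclasses) all mean "not ready yet".
        return False


def wait_until_ready(probe, max_attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True and return the attempt number.

    Raises TimeoutError if the service is still not ready after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(f"service not ready after {max_attempts} attempts")
```

The indexer would call wait_until_ready(es_is_ready) once before connecting. The probe and sleep functions are parameters so the retry loop can be tested without a running cluster; a Docker Compose healthcheck on the Elasticsearch service is an alternative that keeps this logic out of the indexer.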
- Make sure you have python3 installed (https://docs.python-guide.org/starting/install3/osx/).
- Go to the project root, then create and activate a virtual environment and install the dependencies:
  cd oer_scrapy
  python3 -m venv oerhoernchen
  source oerhoernchen/bin/activate
  pip3 install -r requirements.txt
- The crawler can be run with
  scrapy crawl hoou_spider
  It assumes that the MySQL database is running, so you should run the docker-compose up command from before first.
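Before starting the crawler, a quick check whether MySQL is reachable can save a confusing stack trace. This is a hypothetical helper, not part of the repository; it assumes docker-compose publishes MySQL on localhost:3306 (MySQL's default port), so check docker-compose.yml for the actual host and port.

```python
import socket

# Assumption: docker-compose publishes MySQL on localhost:3306 (MySQL's
# default port); check docker-compose.yml for the actual host and port.
MYSQL_HOST = "localhost"
MYSQL_PORT = 3306


def port_is_open(host=MYSQL_HOST, port=MYSQL_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("MySQL is reachable, the crawler can be started.")
    else:
        print("MySQL is not reachable; run docker-compose up first.")
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the crawler's credentials or database exist.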
- The Elasticsearch mapping currently uses LRMI (Learning Resource Metadata Initiative) metadata, as used by HOOU.
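For orientation, an Elasticsearch mapping for a few LRMI / schema.org properties might look like the following. The field selection and types are assumptions for illustration, not the project's actual mapping, and the typeless "mappings" format assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical subset of LRMI / schema.org properties; the project's real
# mapping (field names, analyzers, nesting) may look different.
LRMI_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "description": {"type": "text"},
            "url": {"type": "keyword"},
            "license": {"type": "keyword"},
            "learningResourceType": {"type": "keyword"},
            "inLanguage": {"type": "keyword"},
            "keywords": {"type": "keyword"},
            "dateModified": {"type": "date"},
        }
    }
}


def mapping_body():
    """Serialize the mapping as the JSON body for creating the index."""
    return json.dumps(LRMI_MAPPING, indent=2)


if __name__ == "__main__":
    # This JSON could be sent as the body of PUT /<index-name>.
    print(mapping_body())
```

Keyword fields such as license or learningResourceType suit exact-match filters and facets in the search UI, while text fields like name and description are analyzed for full-text search.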
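To back up all MySQL databases, run the following (replace some-mysql with the name of your MySQL container and /some/path/on/your/host with a path on your machine):
docker exec some-mysql sh -c 'exec mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"' > /some/path/on/your/host/all-databases.sql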
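The same backup can be scripted, for example for a cron job. This Python sketch wraps the dump command with subprocess and writes a timestamped file; the function names and the file name pattern are hypothetical, and some-mysql remains a placeholder for your container name.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def dump_command(container="some-mysql"):
    """Build the docker exec argv equivalent to the shell dump command."""
    # No host shell is involved, so the outer single quotes of the shell
    # version are dropped; $MYSQL_ROOT_PASSWORD is expanded by sh inside
    # the container.
    return [
        "docker", "exec", container, "sh", "-c",
        'exec mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"',
    ]


def backup(target_dir, container="some-mysql"):
    """Write a timestamped dump of all databases to target_dir, return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(target_dir) / f"all-databases-{stamp}.sql"
    try:
        with open(target, "wb") as out:
            subprocess.run(dump_command(container), stdout=out, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Do not leave a truncated dump behind if docker or mysqldump fails.
        target.unlink(missing_ok=True)
        raise
    return target
```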
- Idea: use the Logstash fingerprint filter for duplicate detection?
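The fingerprint idea boils down to deriving a stable document ID from a field that identifies a resource, so that re-indexing the same resource overwrites the existing document instead of creating a duplicate. In Logstash this would typically combine the fingerprint filter with the document_id option of the elasticsearch output. The Python sketch below shows only the principle; using the resource URL as the key and SHA-256 as the hash are assumptions.

```python
import hashlib


def fingerprint(url):
    """Return a SHA-256 hex digest of the resource URL to use as document ID."""
    # Only surrounding whitespace is removed; further URL normalization
    # (case, trailing slashes, query strings) is a separate design decision.
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def index_documents(docs, index=None):
    """Simulate an index keyed by fingerprint: same URL, same ID, overwrite."""
    index = {} if index is None else index
    for doc in docs:
        index[fingerprint(doc["url"])] = doc
    return index


if __name__ == "__main__":
    crawled = [
        {"url": "https://example.org/course/1", "name": "Course 1"},
        {"url": "https://example.org/course/2", "name": "Course 2"},
        # Same resource crawled again with an updated title.
        {"url": "https://example.org/course/1", "name": "Course 1 (updated)"},
    ]
    print(len(index_documents(crawled)))  # 2 documents, not 3
```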
Just take it.
To the extent possible under law, Steffen Rörtgen has waived all copyright and related or neighboring rights to Docker-Hoernchen 2.0.
This work is published from: Deutschland.