This repository contains the development done by Oesía for Task “2.5 Data comparison components”.
Download articles and legal documents from public procurement sources:
- Tender and contract data of European government bodies from OpenOpps via API or Amazon-S3 bucket (credentials are required)
- Legislative texts via JRC-Acquis dataset.
- Public procurement notices via TED dataset.
The tool is used to analyze the information extracted from the Knowledge Graph (KG) through the two project APIs designed to work with it: the KG core API and the search API.
The core API returns bidding information, while the search API takes an OCID and searches for documents containing similar descriptions (the "description" field), regardless of the language.
This comparison tool analyses the bidding and award data extracted during an initial load. To perform this initial load, you must indicate the number of tenders and awards that you want to retrieve from the KG. To use the search API, you must also provide the ID of the document you want to compare. All of these parameters are set together before the tool starts.
- Install Docker and Docker-Compose
- Software and Hardware requirements for the Docker platform
- Clone this repo
git clone https://github.com/TBFY/Data-Comparison-Components.git
- Modify the docker-compose.yml file to adjust the volume of data to be extracted from the KG-API and the search-API on initial tool loading:
- IDTENDER_SEARCH: ID of the tender for which search-api looks up similar tenders
- TOTAL_DATOS_TENDER_SEARCH: Maximum number of records to retrieve from search-api
- TOTAL_DATOS_TENDER: Maximum number of Tender records to retrieve from kg-api
- STATUS_DATOS_TENDER: Status of the tender (planning, planned, active, canceled, unsuccessful, complete, withdrawn); if left empty, the filter is not applied
- TITLE_DATOS_TENDER: A word to filter by; if left empty, the filter is not applied
- DESCRIPTION_DATOS_TENDER: A word to filter by; if left empty, the filter is not applied
- TOTAL_DATOS_AWARD: Maximum number of Award records to retrieve from kg-api
- STATUS_DATOS_AWARD: Status of the award (pending, active, canceled, unsuccessful); if left empty, the filter is not applied
- TITLE_DATOS_AWARD: A word to filter by; if left empty, the filter is not applied
- DESCRIPTION_DATOS_AWARD: A word to filter by; if left empty, the filter is not applied
- TOTAL_DATOS_CONTRACTING_PROCESS: Maximum number of Contracting Process records to retrieve from kg-api
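Because a mistyped status value would only show up after the load has started, it can help to check the status filters before editing docker-compose.yml. The following is a minimal sketch of such a pre-flight check. The helper name is_allowed is hypothetical and not part of this repository, the allowed values are copied from the list above with the same spelling, and it is an assumption that the tool itself does not already perform this validation.

```shell
#!/bin/sh
# Allowed status values, copied from the parameter list above.
TENDER_STATUSES="planning planned active canceled unsuccessful complete withdrawn"
AWARD_STATUSES="pending active canceled unsuccessful"

# is_allowed VALUE "LIST"  (hypothetical helper)
# Succeeds when VALUE is empty (filter not applied) or is one of the
# space-separated words in LIST; fails otherwise.
is_allowed() {
    [ -z "$1" ] && return 0
    for candidate in $2; do
        [ "$candidate" = "$1" ] && return 0
    done
    return 1
}
```

For example, `is_allowed "complete" "$AWARD_STATUSES" || echo "invalid award status"` reports a problem, since complete is only listed as a tender status.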
- Run the Docker container with:
docker-compose up -d
- You should be able to monitor the progress with:
docker-compose logs -f
- To stop the Docker container, use the command:
docker-compose stop
If you want to perform a new data load, stop the container, modify docker-compose.yml and start the container again; this restarts the whole process. The final result of the data load will contain more data than the limits indicated, because the related Tenders and Awards are also loaded.
- The admin site should be available at: http://localhost:5606/
- User: sirenadmin
- Password: password
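The admin site only responds once the containers have started, so a small polling loop can save repeated manual checks. This is a sketch under assumptions: the function name wait_for_url is hypothetical, curl is assumed to be installed on the host, and it is assumed that the site answers with a non-error HTTP status (below 400) when it is ready.

```shell
#!/bin/sh
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper)
# Polls URL until curl gets a non-error HTTP response or the attempts
# run out. Returns 0 on success, 1 on timeout.
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail makes curl exit non-zero on HTTP status 400 or above
        if curl --silent --fail --output /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

A possible use is `wait_for_url http://localhost:5606/ && echo "admin site is up"`, run after starting the container.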
The Docker host must meet the following requirement: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_set_vm_max_map_count_to_at_least_262144
- Modify /etc/sysctl.conf to add the following line, as described in the Elasticsearch documentation:
vm.max_map_count=262144
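To see whether the host already satisfies the setting above, the current value can be read back before starting the container. The sketch below assumes a Linux host where the value is exposed at /proc/sys/vm/max_map_count; the function name map_count_ok is hypothetical. It only reads the value and prints the sysctl command that would raise it for the running system, without changing anything itself.

```shell
#!/bin/sh
REQUIRED=262144

# map_count_ok VALUE  (hypothetical helper)
# Succeeds when VALUE meets the minimum required by Elasticsearch.
map_count_ok() {
    [ "$1" -ge "$REQUIRED" ]
}

# Read the current value; fall back to 0 if the file does not exist
# (for example on a non-Linux host).
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if map_count_ok "$current"; then
    echo "vm.max_map_count=$current is sufficient"
else
    echo "vm.max_map_count=$current is too low; to raise it until the next reboot, run:"
    echo "  sudo sysctl -w vm.max_map_count=$REQUIRED"
fi
```

A value set with sysctl -w is lost on reboot, which is why the line in /etc/sysctl.conf is still needed for a permanent change.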
The latest stable release can be found here:
https://hub.docker.com/repository/docker/tbfy/odc-tool
Please take a look at our contributing guidelines if you're interested in helping!