EWPP-2479: Implement command to update information about vocabularies. #64

Status: Open (wants to merge 3 commits into base: master)
2 changes: 1 addition & 1 deletion Dockerfile
@@ -7,7 +7,7 @@ COPY RoboFile.php RoboFile.php
COPY robo.yml robo.yml
COPY run.sh run.sh

-RUN composer --no-interaction install
+RUN composer --no-interaction --no-dev install

FROM tenforce/virtuoso:1.3.1-virtuoso7.2.2

29 changes: 29 additions & 0 deletions README.md
@@ -38,6 +38,7 @@ New default content can be added to [`robo.yml`](./robo.yml) as shown below:
```
data:
  - name: "corporate-body"
    title: "Corporate body"
    graph: "http://publications.europa.eu/resource/authority/corporate-body"
    url: "http://publications.europa.eu/resource/cellar/07e1a665-2b56-11e7-9412-01aa75ed71a1.0001.10/DOC_1"
    format: "rdf"
@@ -49,6 +50,7 @@ containing the archived RDF file name as shown below:
```
data:
  - name: "eurovoc-thesaurus"
    title: "EuroVoc Thesaurus"
    graph: "http://publications.europa.eu/resource/dataset/eurovoc"
    url: "http://publications.europa.eu/resource/cellar/9f2bd600-ae7b-11e7-837e-01aa75ed71a1.0001.09/DOC_1"
    file: "eurovoc_in_skos_core_concepts.rdf"
@@ -73,6 +75,13 @@ Visit the RDF storage at: http://localhost:8890

## Available commands

Update information about vocabularies:

```
$ docker-compose exec web vendor/bin/robo update_version
```
Run this command only after running `docker-compose up -d` and `docker-compose exec web composer update --dev` in this code base.

Fetch remote data:

```
@@ -115,6 +124,26 @@ DBA_PASSWORD

Default values set via environment variables will override values set in `robo.yml`.

## Update vocabularies version in source code

The latest versions of the vocabularies are currently published on the `https://op.europa.eu/en/home` site.
To update the source code automatically, follow these steps:

1. Download and start the supplied Docker images:
```
$ docker-compose up -d
```
2. Install the development dependencies with Composer:
```
$ docker-compose exec web composer update --dev
```
3. Update the vocabulary information automatically using the crawler:
```
$ docker-compose exec web vendor/bin/robo update_version
```
4. Test the built image in your application.
5. Commit the changed files, except for `composer.lock`.

## Working with Docker Compose

In Docker Compose, declare the service as follows:
101 changes: 101 additions & 0 deletions RoboFile.php
@@ -85,6 +85,107 @@ public function purge() {
]);
}

  /**
   * Update information about OP vocabularies with automatic updating URLs.
   *
   * @TODO Refactor after accepting POC.
   *
   * @command update_version
   */
  public function updateVersions() {
    $web_driver = \Facebook\WebDriver\Remote\RemoteWebDriver::create(
      "http://selenium:4444/wd/hub",
      [
        "browserName" => "chrome",
        "browserVersion" => "103.0",
      ]
    );
    // Index the vocabularies configured in robo.yml by title.
    $current_voc_titles = [];
    $data_values = [];
    foreach ($this->config->get('data') as $datum) {
      $current_voc_titles[] = $datum['title'];
      $data_values[$datum['title']] = $datum;
    }
    // Initialise everything that is read after the try block, so that an
    // early exception cannot leave it undefined.
    $raw_readme = file_get_contents('README.md');
    $updated_readme = FALSE;
    $rdf_urls_for_update = [];

    try {
      $parsedown = new Parsedown();
      $parsed_readme = $parsedown->parse($raw_readme);
      $crawler = new \Symfony\Component\DomCrawler\Crawler($parsed_readme);
      $links_to_op_vocs = $crawler->filter('li > a');
      /** @var \Facebook\WebDriver\WebDriverBy $webdriver_by */
      $webdriver_by = \Facebook\WebDriver\WebDriverBy::class;
      foreach ($links_to_op_vocs as $link) {
        // Use only links to OP.
        if (!in_array($link->textContent, $current_voc_titles)) {
          continue;
        }
        $data = $data_values[$link->textContent];
        $url = $link->getAttribute('href');
        echo "Updating vocabulary " . $url . "\n";
        $web_driver->get($url);
        sleep(2);

        // Skip vocabularies whose README link already points at the latest
        // version: in that case the "LATEST" label is shown twice.
        $latest_labels = $web_driver->findElements($webdriver_by::cssSelector('.eu-vocabularies-latest-version'));
        if (count($latest_labels) > 1) {
          echo "Vocabulary is already in its latest version. \n";
          continue;
        }
        // Find latest version.
        $dropdown_button = $web_driver->findElement($webdriver_by::cssSelector('button.dropdown-toggle'));
        $dropdown_button->click();
        $latest_link = $web_driver->findElement($webdriver_by::cssSelector('div.dropdown-menu span:first-child a'));
        $latest_link->click();
        sleep(2);
        // Escape regex metacharacters in the title before building the pattern.
        $title = str_replace(' ', '[[:space:]]', preg_quote($link->textContent, '/'));
        $regexp = '/^(\-[[:space:]]\[' . $title . '\])(\(.*\))$/m';
        $raw_readme = preg_replace($regexp, '$1' . '(' . $web_driver->getCurrentURL() . ')', $raw_readme);
        $updated_readme = TRUE;

        // Visit page with links to rdf files.
        $web_driver->findElement($webdriver_by::linkText('Downloads'))->click();
        sleep(2);
        $rdf_link_url = $web_driver->findElement($webdriver_by::partialLinkText($data['partial_link_text']))->getAttribute('href');
        parse_str(parse_url(urldecode($rdf_link_url))['query'], $query);
        $rdf_urls_for_update[$link->textContent] = $query['cellarURI'];
        echo "Vocabulary updated to " . $query['cellarURI'] . ". \n";
        echo "===================================================\n";
      }
    }
    catch (Exception $e) {
      echo 'Message: ' . $e->getMessage() . "\n";
    }
    // quit() ends the session and closes every window, so a further close()
    // call is not needed.
    $web_driver->quit();

    // Update robo.yml file.
    $rdf_data = $this->config->get('data');
    $updated = FALSE;
    foreach ($rdf_data as $index => $rdf_info) {
      if (!empty($rdf_urls_for_update[$rdf_info['title']])) {
        $updated = TRUE;
        $rdf_data[$index]['url'] = $rdf_urls_for_update[$rdf_info['title']];
      }
    }
    if ($updated) {
      $this->config->set('data', $rdf_data);
      $exported = $this->config->export();
      unset($exported['command']);
      unset($exported['options']);
      $content = \Symfony\Component\Yaml\Yaml::dump($exported, 10);
      file_put_contents('robo.yml', $content);
    }
    // Update README.md file.
    if ($updated_readme) {
      file_put_contents('README.md', $raw_readme);
    }
  }
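
Editorial note: the two string-handling steps in `updateVersions()` (rewriting a `- [Title](url)` entry in the README and reading `cellarURI` from a download link) can be tried without a browser. The sketch below is a minimal illustration, not code from this pull request. The helper names `replace_vocabulary_link()` and `extract_cellar_uri()` and the `example.org` URLs are hypothetical. It also swaps in two variations on the command's approach: a callback for the replacement, and `parse_str()` doing the URL-decoding after the query string has been split off, on the assumption that the link is singly encoded.

```php
<?php

/**
 * Replaces the URL of a "- [Title](url)" list entry in Markdown text.
 *
 * Hypothetical helper for illustration only.
 */
function replace_vocabulary_link(string $readme, string $title, string $new_url): string {
  // Escape regex metacharacters in the title, then accept any whitespace
  // character wherever the title contains a space.
  $title_pattern = str_replace(' ', '[[:space:]]', preg_quote($title, '/'));
  $pattern = '/^(\-[[:space:]]\[' . $title_pattern . '\])(\(.*\))$/m';
  // A callback keeps "$1"-like sequences in the URL from being read as
  // back-references.
  return preg_replace_callback($pattern, function (array $matches) use ($new_url) {
    return $matches[1] . '(' . $new_url . ')';
  }, $readme);
}

/**
 * Returns the "cellarURI" query parameter of a link, or NULL if it is absent.
 *
 * Hypothetical helper for illustration only.
 */
function extract_cellar_uri(string $href) {
  $query_string = parse_url($href, PHP_URL_QUERY);
  if (!is_string($query_string)) {
    return NULL;
  }
  // parse_str() URL-decodes the parameter values itself.
  parse_str($query_string, $query);
  return $query['cellarURI'] ?? NULL;
}

// Hypothetical inputs shaped like the data the command works with.
$readme = "- [Corporate body](http://example.org/old)\n"
  . "- [EuroVoc Thesaurus](http://example.org/eurovoc)\n";
echo replace_vocabulary_link($readme, 'Corporate body', 'http://example.org/new');

$href = 'https://example.org/download?cellarURI=http%3A%2F%2Fexample.org%2Fcellar%2Fabc%2FDOC_1&fileName=voc.rdf';
echo extract_cellar_uri($href), "\n";
```

Only the entry whose title matches is rewritten; other list items are left untouched. Whether decoding after splitting suits the real download links depends on how those links are encoded, which this page does not show.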

/**
* Run list of queries via isql-v.
*
8 changes: 7 additions & 1 deletion composer.json
@@ -8,10 +8,16 @@
"require": {
"consolidation/robo": "^1.0"
},
"require-dev": {
"erusev/parsedown": "^1.7",

Review comment (Contributor):
Have you considered https://github.com/thephpleague/commonmark? It is a robust, maintainable library. Please consider it after the POC.

"php-webdriver/webdriver": "^1.12",
"symfony/css-selector": "^4.4",
"symfony/dom-crawler": "^4.4"
},
"config": {
"sort-packages": true,
"platform": {
-"php": "7.0"
+"php": "8.0"
}
},
"scripts": {