Schema evolution, mapping and query logging improvement #20

Merged (14 commits) on Aug 24, 2020
19 changes: 11 additions & 8 deletions .env.example
Original file line number Diff line number Diff line change
@@ -1,16 +1,19 @@
SERVER_PORT=8080
MAPPED_POSTGRES_PORT=7777

# Only for development - testing on the local machine
DEV_MASTER_POSTGRES_PORT=7776

COPY_POSTGRES_HOST=
COPY_POSTGRES_DB=
COPY_POSTGRES_USER=
COPY_POSTGRES_PASSWORD=

FLYWAY_URL=
FLYWAY_USER=
FLYWAY_PASSWORD=
FLYWAY_PLACEHOLDERS_ORIGINAL_HOST_IP=
FLYWAY_PLACEHOLDERS_ORIGINAL_DB=
FLYWAY_PLACEHOLDERS_ORIGINAL_USER=
FLYWAY_PLACEHOLDERS_ORIGINAL_PASSWORD=
FLYWAY_PLACEHOLDERS_SUBSCRIPTION_NAME=
FLYWAY_URL=jdbc:postgresql://slave-db/tourismuser
FLYWAY_USER=tourismuser
FLYWAY_PASSWORD=postgres2
FLYWAY_PLACEHOLDERS_ORIGINAL_HOST_IP=master-db
FLYWAY_PLACEHOLDERS_ORIGINAL_DB=tourismuser
FLYWAY_PLACEHOLDERS_ORIGINAL_USER=vkgreplicate
FLYWAY_PLACEHOLDERS_ORIGINAL_PASSWORD=Azerty
FLYWAY_PLACEHOLDERS_SUBSCRIPTION_NAME=vkgsubscription_test
3 changes: 3 additions & 0 deletions .gitattributes
@@ -0,0 +1,3 @@
*.sh eol=lf
*.conf eol=lf
*.sql eol=lf
4 changes: 4 additions & 0 deletions .gitignore
@@ -211,3 +211,7 @@ vkg/odh.properties
.DS_Store

tmp*.*

*.local.properties
*.sql.gz
test/master/*.sql
16 changes: 7 additions & 9 deletions README.md
@@ -36,7 +36,7 @@ cd odh-vkg/

### Local deployment

1. Create the `.env` file in which the SPARQL endpoint port and the PG external port (for debugging purposes) are specified
1. Create the `.env` file in which, among other things, the SPARQL endpoint port and the PG external port (for debugging purposes) are specified

* `cp .env.example .env`

@@ -45,6 +45,7 @@ cd odh-vkg/
3. Visit the SPARQL endpoint

* Now we can open the link <http://localhost:8080> in the browser and test some SPARQL queries
* Note that synchronisation between the master and the slave takes some time. Until it is finished, some queries may return empty results.
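   * The synchronisation progress can be checked on the slave itself; a minimal sketch, assuming `psql` access to the slave instance and PostgreSQL's standard monitoring view:

```sql
-- Run on the slave: while the subscription is still catching up,
-- received_lsn lags behind latest_end_lsn; once they match, the
-- slave is in sync with the master.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```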

#### Docker environment

@@ -66,15 +67,15 @@ Install [Docker](https://docs.docker.com/install/) (with Docker Compose) locally

To start the container on the foreground:
```
docker-compose pull && docker-compose up
docker-compose pull && docker-compose up --pull
```
The container is run on the foreground and can be stopped by pressing CTRL-C.

##### Option 2: On the background

To start the container on the background:
```
docker-compose pull && docker-compose up -d
docker-compose pull && docker-compose up --pull -d
```

To stop it:
@@ -93,18 +94,18 @@ Current deployments:
* Production: https://sparql.opendatahub.bz.it/

#### Database synchronization
The SPARQL endpoints do not query directly the production database but slave read-only instances, which are synchronized with the master database through logical replication. For more details, see [the dedicated page](data/replication/slave/README.md).
The SPARQL endpoints do not query the production database directly but read-only slave instances, which are synchronized with the master database through logical replication. For more details, see [the dedicated page](docs/replication.md).


## Maintenance

### Schema evolution

[See the dedicated page](schema-evolution.md)
[See the dedicated page](docs/schema-evolution.md)

### Test database image

For building a newer version of the Docker image of the test database out of a fresh dump, please refer to [the dedicated page](data/test/README.md).
For building a newer version of the Docker image of the test database out of a fresh dump, please refer to [the dedicated page](docs/test-master.md).

This Docker image is published [on Docker Hub](https://hub.docker.com/r/ontopicvkg/odh-tourism-db).

@@ -142,6 +143,3 @@ Some examples of possible SPARQL queries can be found in the SPARQL Queries fold
### Schema

The schema of the VKG can be visualized [in the dedicated page](sparql_queries/schema.md).



4 changes: 0 additions & 4 deletions data/replication/slave/Dockerfile

This file was deleted.

5 changes: 0 additions & 5 deletions data/test/Dockerfile

This file was deleted.

14 changes: 12 additions & 2 deletions docker-compose.yml
@@ -15,22 +15,32 @@ services:
ONTOP_PORTAL_FILE: /opt/ontop/input/odh.portal.toml
ONTOP_CORS_ALLOWED_ORIGINS: "*"
ONTOP_DEV_MODE: "true"
EXTRA_FLYWAY_OPTIONS: "-mixed=true"
volumes:
- ./vkg:/opt/ontop/input
- ./jdbc:/opt/ontop/jdbc
- ./src:/opt/ontop/src
entrypoint: ["/wait-for-it.sh","master-db:5432","--timeout=0","--strict", "--", "/entrypoint.sh"]
nginx:
build:
context: ./
dockerfile: infrastructure/docker/nginx/Dockerfile
env_file: .env
ports:
- "${SERVER_PORT}:80"
db:
image: ontopicvkg/odh-tourism-db
master-db:
image: ontopicvkg/odh-tourism-db:master
environment:
- POSTGRES_USER=tourismuser
- POSTGRES_PASSWORD=postgres2
ports:
- "${DEV_MASTER_POSTGRES_PORT}:5432"
slave-db:
image: postgres:12.1
shm_size: 1g
ports:
- "${MAPPED_POSTGRES_PORT}:5432"
command: ["postgres", "-c", "wal_level=logical"]
environment:
- POSTGRES_USER=tourismuser
- POSTGRES_PASSWORD=postgres2
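
The `wal_level=logical` setting passed to the `slave-db` command above can be verified once the container is up; a quick check, assuming a `psql` session on that instance:

```sql
-- Logical replication requires wal_level = 'logical'; with the
-- docker-compose command above, this should return 'logical'.
SHOW wal_level;
```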
12 changes: 4 additions & 8 deletions data/replication/slave/README.md → docs/replication.md
@@ -1,4 +1,4 @@
# Slave Docker image for the ODH tourism dataset
# Replication tips

Inspired by https://blog.raveland.org/post/postgresql_lr_en/

@@ -51,23 +51,19 @@ GRANT SELECT ON ALL TABLES IN SCHEMA public TO vkgreplicate;

## Slave configuration

If the master is a Docker container on the same machine, one must make sure they are on the same Docker network (here `tourism`) than the master container.

```bash
docker run --name odh_db_slave -p 7778:5432 -e POSTGRES_USER=tourismuser -e POSTGRES_PASSWORD=postgres2 --network tourism -d ontopicvkg/odh-db-slave
```
If the master is a Docker container on the same machine, one must make sure the slave is on the same Docker network as the master container.

Connect to the shell and open `psql`
```bash
docker exec -it odh_db_slave /bin/sh
docker exec -it my_slave_db /bin/sh
psql -U tourismuser
```

Make sure that the slave can access the master via TCP. Check the firewall rules.

Let the slave subscribe to the publication:
```sql
CREATE SUBSCRIPTION subodh CONNECTION 'host=odh-hackathon-2019_db_1 dbname=tourismuser user=vkgreplicate password=Azerty' PUBLICATION odhpub;
CREATE SUBSCRIPTION subodh CONNECTION 'host=odh-tourism-db1 dbname=tourismuser user=vkgreplicate password=Azerty' PUBLICATION odhpub;
```
Note that the subscription `subodh` must not already exist (otherwise give it another name).
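
Existing subscriptions can be listed beforehand, so a clashing name can be spotted; a minimal check against PostgreSQL's catalog:

```sql
-- Run on the slave: lists all subscriptions in the cluster, so a
-- name conflict can be detected before CREATE SUBSCRIPTION fails.
SELECT subname, subenabled FROM pg_subscription;
```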

31 changes: 17 additions & 14 deletions schema-evolution.md → docs/schema-evolution.md
@@ -8,17 +8,18 @@ This documentation provides recommendations on how to proceed when the schema of
### JSON level

#### New key
A new JSON key is first safely ignored. One [can regenerate the corresponding derived table and trigger](#regenerating-a-derived-table-and-a-trigger) for creating the corresponding column.
A new JSON key is first safely ignored. One [can regenerate the corresponding derived table and trigger](#regenerating-a-derived-table-and-a-trigger) for creating the corresponding column.

#### Key removed
**TODO: check that it does not prevent anything from working immediately.**

One should, however, plan to remove the mapping entries using that key soon: they may break once the derived tables and triggers are regenerated, as the corresponding column will no longer appear.

In case of an array, the derived table for the old array is now useless. Please write a SQL script by hand for cleaning up the derived table and its trigger.
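
Such a cleanup script could look as follows; a hedged sketch in which the trigger and table names are purely illustrative:

```sql
-- Hypothetical cleanup after a JSON array key disappeared from the source:
-- drop the trigger that maintained the derived table, then the table itself.
DROP TRIGGER IF EXISTS accommodationsopen_rooms_trigger ON accommodationsopen;
DROP TABLE IF EXISTS accommodationsopen_rooms;
```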

#### Literal replaced by an Object or an Array
Not considered at the moment. To be investigated when the situation appears.


### Column level

#### New column
@@ -34,10 +35,10 @@ However, removing an additional column may break the replication. See [the dedic
### Table level

#### New table
Given that the subscription has been created for all the tables, it should stop the replication until the corresponding table is added on the slave. **TODO: test it**.
Given that the subscription has been created for all the tables, the replication stops after the first row is inserted into the new table, and remains blocked until the corresponding table is added on the slave.

#### Table removed
**TODO: test it**.
Replication does not seem to complain.

## Actions

Expand All @@ -51,19 +52,21 @@ ALTER SUBSCRIPTION ${subscription_name} DISABLE;
ALTER SUBSCRIPTION ${subscription_name} ENABLE;
```

### Regenerating a derived table and a trigger

**TODO: modify the script for performing all these actions**.
### Regenerating the derived tables of a mirror table

This SQL script performs the following actions:
1. It pauses the replication
2. It regenerates the derived table and trigger.
3. It populates the derived table from the mirror table.
4. It resumes the replication (see above).
1. It pauses the replication.
2. It regenerates all the derived tables and triggers of a mirror table.
3. It populates the derived tables from the mirror table.
4. It resumes the replication.

Steps:
1. Generates the script. **TODO: add the command**
2. [Publish it](#publish-a-migration-script)
1. Generate the script (change the parameter values)
```sh
cd scripts
python3 create_derived_tables_and_triggers_from_db.py regenerate -t accommodationsopen -u tourismuser -p postgres2 -h localhost -d tourismuser --port 7776 --subscription=vkgsubscription_test
```
2. [Publish](#publish-a-migration-script) the SQL script with the prefix `regen-`.


### Adding and removing columns in the mirror tables
35 changes: 26 additions & 9 deletions data/test/README.md → docs/test-master.md
@@ -1,13 +1,19 @@
# PostgreSQL Docker test image for Tourism ODH
# PostgreSQL Docker test image for ODH Tourism

Contains the Open Data fragment of the tourism dataset of the Open Data Hub. This Docker image is published [on Docker Hub](https://hub.docker.com/r/ontopicvkg/odh-tourism-db).
Contains the Open Data fragment of the tourism dataset of the Open Data Hub. The various versions of the Docker image are published [on Docker Hub](https://hub.docker.com/r/ontopicvkg/odh-tourism-db).

Note that this image is intended for development and test purposes on your local machine. It does not contain up-to-date data.

It has 2 main versions:
- Standalone: contains the original dump, the triggers and the derived views
- Master: contains the original dump, with a publication (for logical replication) already created.
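
For the master version, the publication is presumably created with a statement along these lines (the publication name `odhpub` is taken from the replication page; treat the exact form as an assumption):

```sql
-- Sketch of the publication baked into the master image: publish all
-- tables so that slaves can subscribe via logical replication.
CREATE PUBLICATION odhpub FOR ALL TABLES;
```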


## How to start manually

Standalone version:
```sh
docker run --name odh_db_running -p 7777:5432 -e POSTGRES_USER=tourismuser -e POSTGRES_PASSWORD=postgres2 -d ontopicvkg/odh-tourism-db
docker run --name odh_db_running -p 7777:5432 -e POSTGRES_USER=tourismuser -e POSTGRES_PASSWORD=postgres2 -d ontopicvkg/odh-tourism-db:standalone
```

Note that normally it is started by docker-compose in dev mode.
@@ -31,28 +37,39 @@ Make sure that the following statement is disabled (`public` needs to be in the
-- SELECT pg_catalog.set_config('search_path', '', false);
```

### Updating the script generating triggers and trigger tables
### Updating the script generating triggers and trigger tables (for the standalone version)

In case the schema has changed.

1. Create a temporary Docker image of PG out the original schema and the dump. **TODO: provide the Dockerfile**
1. Create a temporary Docker image of PG out the original schema and the dump (preferably without triggers). See below for the instructions.
2. Start this image.
3. Generate the script by connecting to this container.
```sh
cd scripts
python3 create_derived_tables_and_triggers_from_db.py all -u tourismuser -p postgres2 -h localhost -d tourismuser --port=7777
```
4. Stop and delete the container.
5. Remove the temporary image.


### Build the Docker image

#### Files to put in the data directory
#### Files to put in the `test/master` directory

* `original_schema.sql`
* `dump-tourism-201911121025.sql.gz` (you can use `gzip` to create it from the SQL file)
* `create_triggers_gen.sql` from the `scripts` directory
* `triggers_and_derived_tables.sql` from the `scripts` directory (for the standalone version)

#### Commands

Standalone version:
```sh
docker build --target standalone -t ontopicvkg/odh-tourism-db:standalone .
docker push ontopicvkg/odh-tourism-db:standalone
```

Master version:
```sh
docker build -t ontopicvkg/odh-tourism-db .
docker push ontopicvkg/odh-tourism-db
docker build --target master -t ontopicvkg/odh-tourism-db:master .
docker push ontopicvkg/odh-tourism-db:master
```
3 changes: 2 additions & 1 deletion infrastructure/docker/nginx/Dockerfile
@@ -3,7 +3,8 @@ FROM nginx:1.17.8-alpine
RUN apk add --update curl && \
rm -rf /var/cache/apk/*

COPY infrastructure/docker/nginx/nginx.conf /etc/nginx/conf.d/default.conf
COPY infrastructure/docker/nginx/default.conf /etc/nginx/conf.d/default.conf
COPY infrastructure/docker/nginx/nginx.conf /etc/nginx/nginx.conf

# expose port 80
EXPOSE 80
37 changes: 37 additions & 0 deletions infrastructure/docker/nginx/default.conf
@@ -0,0 +1,37 @@

proxy_cache_path /tmp/cache-nginx levels=1:2 keys_zone=my_cache:10m max_size=10g
inactive=1d use_temp_path=off;

server {
listen 80;

gzip on;

location / {
proxy_pass http://ontop:8080/;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 6000;
gzip_proxied any;
gzip_types *;

proxy_cache my_cache;
proxy_cache_lock on;
proxy_cache_background_update on;
proxy_cache_methods GET HEAD POST;
# proxy_cache_revalidate on;
# For large POST requests, the request_body is not considered in the key!
# In such a case, we do not cache it in Nginx.
# NB: one can adjust client_body_buffer_size for deciding to cache or not.
# https://stackoverflow.com/questions/18795701/nginx-proxy-cache-key-request-body-is-ignored-for-large-request-body#18986571
proxy_no_cache $request_body_file;
proxy_cache_key $request_uri|$request_body|$upstream_http_vary;
}
}


