Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating/Replacing feature would be nice. #68

Open
PauldeKoning opened this issue Dec 18, 2024 · 0 comments
Open

Updating/Replacing feature would be nice. #68

PauldeKoning opened this issue Dec 18, 2024 · 0 comments

Comments

@PauldeKoning
Copy link

PauldeKoning commented Dec 18, 2024

Heya, I am working with the dataset from Dublin Bus. This data updates regularly and I will create a script to automatically update it on an x interval. When trying to run the same script again it will fail as it will try to create the tables while they're already created. It would be nice to have a drop/replace schema option or to be able to add data that has not yet been added to the table.

Currently, I'm running my 'replace' as follows, which is purely a hack but it's a personal project so it's no bother.

  1. I create a script via gtfs-via-postgres [...] | sponge > gtfs.sql
  2. I replace BEGIN; in the script to:
BEGIN;
drop schema public cascade;
create schema public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
CREATE EXTENSION IF NOT EXISTS postgis;

using a script:

echo 'Replacing in SQL file to replace public schema'

sed -i -e 's/BEGIN;/BEGIN;\
drop schema public cascade;\
create schema public;\
GRANT ALL ON SCHEMA public TO postgres;\
GRANT ALL ON SCHEMA public TO public;\
CREATE EXTENSION IF NOT EXISTS postgis;/' gtfs.sql

echo 'Replacing done'
  1. I run the gtfs.sql script, in a transaction this replaces the public schema to be filled by the gtfs data.sql script

This works but is not pretty. The main problem is creating the indexes takes 20 minutes on my local pc. (using --stops-location-index)


I currently don't have the time to fully edit the project but starting from PostgreSQL 17, the ON_ERROR 'ignore' option on the COPY command would be a big help. This ignores any copy that cannot be done. Example query:

COPY public.routes (
                             route_id,
                             agency_id,
                             route_short_name,
                             route_long_name,
                             route_desc,
                             route_type,
                             route_url,
                             route_color,
                             route_text_color,
                             route_sort_order
    ) FROM STDIN WITH(format csv, ON_ERROR 'ignore')

This together with some IF NOT EXISTS on index and create table queries could help. If I get some time I will try to test this and make a PR.

@PauldeKoning PauldeKoning changed the title Updating/Replaceing feature would be nice. Updating/Replacing feature would be nice. Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant