Skip to content

Updating/Replacing feature would be nice. #68

Open
@PauldeKoning

Description

@PauldeKoning

Heya, I am working with the dataset from Dublin Bus. This data updates regularly and I will create a script to automatically update it on an x interval. When trying to run the same script again it will fail as it will try to create the tables while they're already created. It would be nice to have a drop/replace schema option or to be able to add data that has not yet been added to the table.

Currently, I'm running my 'replace' as follows, which is purely a hack but it's a personal project so it's no bother.

  1. I create a script via gtfs-via-postgres [...] | sponge > gtfs.sql
  2. I replace BEGIN; in the script to:
BEGIN;
drop schema public cascade;
create schema public;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO public;
CREATE EXTENSION IF NOT EXISTS postgis;

using a script:

echo 'Replacing in SQL file to replace public schema'

sed -i -e 's/BEGIN;/BEGIN;\
drop schema public cascade;\
create schema public;\
GRANT ALL ON SCHEMA public TO postgres;\
GRANT ALL ON SCHEMA public TO public;\
CREATE EXTENSION IF NOT EXISTS postgis;/' gtfs.sql

echo 'Replacing done'
  1. I run the gtfs.sql script, in a transaction this replaces the public schema to be filled by the gtfs data.sql script

This works but is not pretty. The main problem is creating the indexes takes 20 minutes on my local pc. (using --stops-location-index)


I currently don't have the time to fully edit the project but starting from PostgreSQL 17, the ON_ERROR 'ignore' option on the COPY command would be a big help. This ignores any copy that cannot be done. Example query:

COPY public.routes (
                             route_id,
                             agency_id,
                             route_short_name,
                             route_long_name,
                             route_desc,
                             route_type,
                             route_url,
                             route_color,
                             route_text_color,
                             route_sort_order
    ) FROM STDIN WITH(format csv, ON_ERROR 'ignore')

This together with some IF NOT EXISTS on index and create table queries could help. If I get some time I will try to test this and make a PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions