Update docs #102

Open · dcrasto opened this issue Jul 7, 2017 · 8 comments

dcrasto commented Jul 7, 2017

Currently using v0.5.2.
I have a playbook with 2 steps.
I can see the unloads finish on Redshift; however, the last step never executes and I get a timeout error on the first step. This only recently started happening, and there is nothing running on the cluster that would cause it to block. If I run it a 2nd or 3rd time it works just fine.

  • Is there a more verbose output?
  • What can I do differently?

I've even split the queries up into multiple steps, and still no go.
Thanks
danny

Error

‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/web-incremental.yml @ 2017-07-07-04-02
‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml @ 2017-07-07-04-02
2017/07/07 04:02:19 EXECUTING unload-kafka-to-s3 (in step unload-kafka-to-s3 @ dwh): /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql
2017/07/07 06:02:47 FAILURE: unload-kafka-to-s3 (step unload-kafka-to-s3 @ target dwh), ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
2017/07/07 06:02:47
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query unload-kafka-to-s3 /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql (in step unload-kafka-to-s3 @ target dwh), ERROR:
  - read tcp 172.35.5.39:50733->22.11.33.98:5439: read: connection reset by peer

Playbook

:targets:
  - :name:     __BI_DB_NAME__
    :type:     redshift
    :host:     __BI_DB_HOST__
    :database: __BI_DB_NAME__
    :port:     __BI_DB_PORT__
    :username: __BI_DB_USERNAME__
    :password: __BI_DB_PASSWORD__
:steps:
  - :name: unload-kafka-to-s3
    :queries:
      - :name: unload-kafka-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-to-s3.sql
        :template: true
  - :name: ad-bi-time-track
    :queries:
      - :name: ad-bi-time-track 
        :file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql

First step

UNLOAD ('
    with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
    SELECT listing_id, listing_key, event_type, event_unixtimestamp,
            event_date, event_datetime, listing_unixtimestamp,
            listing_date, listing_datetime, country, state, city,
            neighbourhood, category_l1, category_l2, category_l3,
            category_l4, language, user_id, user_name, geo_country,
            email, phone_number, live, seller_type, condition_id,
            price_type_id, price, currency_code, location_zip, ip,
            channel, images, title, featured_listing, first_ad, event_uuid,
            source, full_neighbourhood, listing_platform
    FROM  kafka.listings, last_time
    WHERE event_datetime > last_time  
          AND event_type = \'approved\'
    ORDER BY event_datetime
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/listings_' 
{{awsEnvCredentials}}
ESCAPE;

UNLOAD ('
    with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
    SELECT listing_id, event_unixtimestamp, event_uuid, partial, data_bag
    FROM  kafka.listings_databag, last_time
    WHERE (TIMESTAMP \'epoch\' + event_unixtimestamp * INTERVAL \'1 Second \') > last_time
    AND (event_uuid, listing_id) in (SELECT DISTINCT event_uuid, listing_id from kafka.listings where event_type = \'uae_listing_approved\')
    ORDER BY event_unixtimestamp
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/data_bag_' 
{{awsEnvCredentials}}
ESCAPE;

Second step

INSERT INTO snowplow_runner.time_track (
    SELECT MAX(event_datetime) as last_time from kafka.listings
);
BenFradet commented

ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer

I think you might want to dig into this; I'm assuming this is Redshift's IP?

dcrasto (Author) commented Jul 8, 2017

@BenFradet it looks like Redshift does reset the connection, but that's because the script never executes the second step: it keeps waiting for a response from the DB for the unload step, even though Redshift's dashboard indicates that step has already finished.

dcrasto (Author) commented Jul 9, 2017

@BenFradet Digging further:
I have updated the playbook to run everything serially, each query in its own step; however, the second unload (databag) times out.
Yet I can see the data in the S3 bucket, which corroborates what the Redshift dashboard shows: the unload finishes correctly. I just never get a success back from the second unload query. It hangs until the timeout occurs and the job fails.

Here's the updated playbook.

:targets:
  - :name:     __BI_DB_NAME__
    :type:     redshift
    :host:     __BI_DB_HOST__
    :database: __BI_DB_NAME__
    :port:     __BI_DB_PORT__
    :username: __BI_DB_USERNAME__
    :password: __BI_DB_PASSWORD__
:steps:
  - :name: unload-kafka-to-s3-listings
    :queries:
      - :name: unload-kafka-listings-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-listings-to-s3.sql
        :template: true
  - :name: unload-kafka-to-s3-databag
    :queries:
      - :name: unload-kafka-databag-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-databag-to-s3.sql
        :template: true
  - :name: ad-bi-time-track
    :queries:
      - :name: ad-bi-time-track 
        :file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql

alexanderdean (Member) commented

Hey @dcrasto - I'm struggling to see how this could be a bug in SQL Runner... It's much more likely to be an issue either in Redshift or in the Golang Postgres driver, no?

dcrasto (Author) commented Jul 9, 2017

Seems to be the case, @alexanderdean.
I will check with the AWS guys to see what's going on.

abrenaut commented

@dcrasto Any update on this? I'm running into the same issue: a SQL query finishes correctly, but then sql-runner hangs until the timeout occurs and the job fails.

dcrasto (Author) commented Jul 17, 2018

@abrenaut you have to update the TCP keepalive settings on the machine running the job:

/sbin/sysctl -w net.ipv4.tcp_keepalive_time=200 net.ipv4.tcp_keepalive_intvl=200 net.ipv4.tcp_keepalive_probes=5
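
These keepalive settings tell the kernel to start probing an idle TCP connection after 200 seconds instead of the default 7200, so firewalls and NAT gateways between the client and Redshift stop silently dropping the connection while a long-running query produces no traffic. To make the change survive a reboot, one option is a sysctl drop-in; the file name below is just an example:

# /etc/sysctl.d/99-tcp-keepalive.conf (example file name)
# Send the first keepalive probe after 200s of idle time (kernel default: 7200s)
net.ipv4.tcp_keepalive_time = 200
# Re-probe every 200s after that
net.ipv4.tcp_keepalive_intvl = 200
# Declare the connection dead after 5 unanswered probes
net.ipv4.tcp_keepalive_probes = 5

Load it with sysctl --system. Note that kernel keepalives only apply to sockets on which the client has enabled SO_KEEPALIVE.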

Also make sure there is no statement or session timeout set for the user connecting to the DB.
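
A Redshift-side timeout can produce the same symptom: statement_timeout (set per session, per user, or in the cluster parameter group) and any WLM queue timeout will kill queries that run too long. A quick sanity check for the session the runner uses, as a sketch:

-- Current timeout in milliseconds; 0 means no limit
SHOW statement_timeout;

-- Disable it for this session (a DBA can also clear it per user
-- or in the cluster parameter group)
SET statement_timeout TO 0;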

Works like a charm now.
Best

P.S. @alexanderdean, you might want to update the docs.

dcrasto closed this as completed Jul 17, 2018
BenFradet commented

@mhadam, do you mind taking care of updating the docs?

alexanderdean changed the title from "[bug] Stuck step" to "Updated docs" on Jul 17, 2018
alexanderdean reopened this on Jul 17, 2018
alexanderdean changed the title from "Updated docs" to "Update docs" on Jul 17, 2018