Update docs #102

Open · dcrasto opened this issue Jul 7, 2017 · 8 comments

dcrasto commented Jul 7, 2017

Currently using v0.5.2.
I have a playbook with 2 steps.
I can see the unloads finish on Redshift; however, the last step never executes and I get a timeout error on the first step. This only recently started happening, and there is nothing running on the cluster that would cause it to block. If I run it a 2nd or 3rd time it works just fine.

  • Is there a more verbose output?
  • What can I do differently?

I've even split the queries up into multiple steps, and still no go.
Thanks
danny

Error

‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/web-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/web-incremental.yml @ 2017-07-07-04-02
‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml.tpl’ -> ‘/opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml’
Using /opt/sql-runner/queries/redshift/playbooks/bi-kafka-incremental.yml @ 2017-07-07-04-02
2017/07/07 04:02:19 EXECUTING unload-kafka-to-s3 (in step unload-kafka-to-s3 @ dwh): /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql
2017/07/07 06:02:47 FAILURE: unload-kafka-to-s3 (step unload-kafka-to-s3 @ target dwh), ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer
2017/07/07 06:02:47
TARGET INITIALIZATION FAILURES:
QUERY FAILURES:
* Query unload-kafka-to-s3 /opt/sql-runner/queries/redshift/sql/bi-kafka/incremental/unload-kafka-to-s3.sql (in step unload-kafka-to-s3 @ target dwh), ERROR:
  - read tcp 172.35.5.39:50733->22.11.33.98:5439: read: connection reset by peer

Playbook

:targets:
  - :name:     __BI_DB_NAME__
    :type:     redshift
    :host:     __BI_DB_HOST__
    :database: __BI_DB_NAME__
    :port:     __BI_DB_PORT__
    :username: __BI_DB_USERNAME__
    :password: __BI_DB_PASSWORD__
:steps:
  - :name: unload-kafka-to-s3
    :queries:
      - :name: unload-kafka-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-to-s3.sql
        :template: true
  - :name: ad-bi-time-track
    :queries:
      - :name: ad-bi-time-track 
        :file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql

First step

UNLOAD ('
    with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
    SELECT listing_id, listing_key, event_type, event_unixtimestamp,
            event_date, event_datetime, listing_unixtimestamp,
            listing_date, listing_datetime, country, state, city,
            neighbourhood, category_l1, category_l2, category_l3,
            category_l4, language, user_id, user_name, geo_country,
            email, phone_number, live, seller_type, condition_id,
            price_type_id, price, currency_code, location_zip, ip,
            channel, images, title, featured_listing, first_ad, event_uuid,
            source, full_neighbourhood, listing_platform
    FROM  kafka.listings, last_time
    WHERE event_datetime > last_time  
          AND event_type = \'approved\'
    ORDER BY event_datetime
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/listings_' 
{{awsEnvCredentials}}
ESCAPE;

UNLOAD ('
    with last_time as (SELECT MAX(last_processed) as last_time from snowplow_runner.time_track)
    SELECT listing_id, event_unixtimestamp, event_uuid, partial, data_bag
    FROM  kafka.listings_databag, last_time
    WHERE (TIMESTAMP \'epoch\' + event_unixtimestamp * INTERVAL \'1 Second \') > last_time
    AND (event_uuid, listing_id) in (SELECT DISTINCT event_uuid, listing_id from kafka.listings where event_type = \'uae_listing_approved\')
    ORDER BY event_unixtimestamp
')
to 's3://{{systemEnv "S3_BI_KAKFA_BUCKET"}}/run_time={{systemEnv "RUN_TIMESTAMP"}}_{{systemEnv "RUN_TYPE"}}/data_bag_' 
{{awsEnvCredentials}}
ESCAPE;

Second step

INSERT INTO snowplow_runner.time_track (
    SELECT MAX(event_datetime) as last_time from kafka.listings
);
BenFradet commented

ERROR: read tcp 172.35.5.39:50733->:5439: read: connection reset by peer

I think you might want to dig into this; I'm assuming this is Redshift's IP?

dcrasto (Author) commented Jul 8, 2017

@BenFradet it looks like Redshift does reset the connection, but that's because the script never executes the second step: it keeps waiting for a response from the DB for the unload step, even though Redshift's dashboard indicates that step has already finished.

dcrasto (Author) commented Jul 9, 2017

@BenFradet Digging further:
I have updated the playbook to run everything serially, each query in its own step; however, the second unload (databag) times out.
Yet I can see the data in the S3 bucket, which corroborates what the Redshift dashboard shows: the unload finishes correctly. I just never get a success back from the second unload query. It hangs until the timeout occurs and the job fails.

Here's the updated playbook.

:targets:
  - :name:     __BI_DB_NAME__
    :type:     redshift
    :host:     __BI_DB_HOST__
    :database: __BI_DB_NAME__
    :port:     __BI_DB_PORT__
    :username: __BI_DB_USERNAME__
    :password: __BI_DB_PASSWORD__
:steps:
  - :name: unload-kafka-to-s3-listings
    :queries:
      - :name: unload-kafka-listings-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-listings-to-s3.sql
        :template: true
  - :name: unload-kafka-to-s3-databag
    :queries:
      - :name: unload-kafka-databag-to-s3
        :file: sql/bi-kafka/incremental/unload-kafka-databag-to-s3.sql
        :template: true
  - :name: ad-bi-time-track
    :queries:
      - :name: ad-bi-time-track 
        :file: sql/bi-kafka/recalculate/add-last-kafkalisting-to-timetrack.sql

alexanderdean (Member) commented

Hey @dcrasto - I'm struggling to see how this could be a bug in SQL Runner... It's much more likely to be an issue either in Redshift or in the Golang Postgres driver, no?

dcrasto (Author) commented Jul 9, 2017

Seems to be the case, @alexanderdean.
I will check with the AWS guys to see what's going on.

abrenaut commented

@dcrasto Any update on this? I'm running into the same issue: a SQL query finishes correctly, but then sql-runner hangs until the timeout occurs and the job fails.

dcrasto (Author) commented Jul 17, 2018

@abrenaut you have to update the TCP keepalive settings on the machine running the job:

/sbin/sysctl -w net.ipv4.tcp_keepalive_time=200 net.ipv4.tcp_keepalive_intvl=200 net.ipv4.tcp_keepalive_probes=5
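
These keepalive settings tell the kernel to start probing an idle TCP connection after 200 seconds instead of the default 7200, so firewalls and NAT gateways between the client and Redshift stop silently dropping the connection while a long-running query produces no traffic. To make the change survive a reboot, one option is a sysctl drop-in; the file name below is just an example:

# /etc/sysctl.d/99-tcp-keepalive.conf (example file name)
# Send the first keepalive probe after 200s of idle time (kernel default: 7200s)
net.ipv4.tcp_keepalive_time = 200
# Re-probe every 200s after that
net.ipv4.tcp_keepalive_intvl = 200
# Declare the connection dead after 5 unanswered probes
net.ipv4.tcp_keepalive_probes = 5

Load it with sysctl --system. Note that kernel keepalives only apply to sockets on which the client has enabled SO_KEEPALIVE.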

Also make sure there is no statement or session timeout set for the user connecting to the DB.
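
A Redshift-side timeout can produce the same symptom: statement_timeout (set per session, per user, or in the cluster parameter group) and any WLM queue timeout will kill queries that run too long. A quick sanity check for the session the runner uses, as a sketch:

-- Current timeout in milliseconds; 0 means no limit
SHOW statement_timeout;

-- Disable it for this session (a DBA can also clear it per user
-- or in the cluster parameter group)
SET statement_timeout TO 0;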

Works like a charm now.
Best

P.S. @alexanderdean, you might want to update the docs.

dcrasto closed this as completed Jul 17, 2018
BenFradet commented

@mhadam, do you mind taking care of updating the docs?

alexanderdean changed the title from "[bug] Stuck step" to "Updated docs" on Jul 17, 2018
alexanderdean reopened this on Jul 17, 2018
alexanderdean changed the title from "Updated docs" to "Update docs" on Jul 17, 2018