-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to print SQL statements for debug/perf reasons #5
Comments
@aaronsteers afaik sqlalchemy emits logs for all queries with the |
@edgarrmondragon - Great, thanks! Turns out I didn't need it in this pass. At first I was getting weird results due to discovery running first and taking a minute on its own - but then mitigated by running It would probably still be nice to have a docs entry or env var that can toggle this on or off - but I don't think it's very high priority as of now. |
@aaronsteers @edgarrmondragon I ran into performance issues with SQLAlchemy inspect and Snowflake during testing today too. TLDR; is that our default implementation of catalog discovery assumes it is cheap to query db metadata, which in snowflake it really isn't. The snowflake dialect is implemented using a mix of Also, during pytest testing, we set Will create issues for both 👍 |
@kgpayne https://github.com/transferwise/pipelinewise-tap-snowflake gets around that by having a |
Yeah, that makes sense that it isn't referencing
Definitely worth raising and worth logging. Thanks for calling this out. Can you say a bit more about your execution times for discovery? I saw discovery finish in 1.5 minutes on the Snowflake sample DB, which is not great but doesn't seem horrible. I was also running from Codespaces so roundtip network latency may have also been significantly better vs running on home WiFi. |
Doing some performance testing using the query history in the Snowflake UI, my first discovery query arrives at 11:53:38 AM and the first stream starts to load at 11:56:58 AM. So I guess performance isn't so terrible, and I agree it probably isn't a priority in this iteration. 3 minutes still feels like a long time for the terminal to hang for, and there is also inflated cost associated with constantly hitting the metadata tables, which we could look to improve in future 🙂 |
Agreed, and thanks for the additional detail. Having waited in the past upwards of 10-15 minutes for Salesforce to run discovery, I'm definitely tolerant of 3 minutes runtime for now, but I agree it makes sense to come back and tune this. This Meltano-side issue would actually be a big help here, since visibility/observability is part of the problem: On thinking through this a bit more... does the SDK print an INFO-level log when discovery is starting and finishing? (I don't think so.) If not, that's probably another quick win that wouldn't require a Meltano-side update. |
@edgarrmondragon and @kgpayne - Is there any option that you know of to print the SQL statements (commands or SELECT queries) as they are executing?
I believe I recall vaguely that SQLAlchemy supports something like this and I thought it could be helpful to enable it for scenarios where we want to debug what commands are being sent and/or to observe performance on specific commands.
The text was updated successfully, but these errors were encountered: