feat: Use Psycopg3 COPY #451
@@ -33,10 +33,11 @@ packages = [

[tool.poetry.dependencies]
python = ">=3.8"
faker = {version = "~=30.0", optional = true}
psycopg2-binary = "2.9.9"
faker = {version = "~=29.0", optional = true}
Any reason to downgrade this?
Nope! My branch was slightly outdated 😅
psycopg = "^3.2.3"
psycopg-binary = "^3.2.3"
Is there a reason we need both the source and binary packages?
@edgarrmondragon This can be changed to just psycopg. Although now I am wondering: if a user wants to use psycopg[c] or psycopg[binary], what would the suggestion be?
dbt-labs/dbt-postgres#96 is probably a good case study. Most users in the data space can't or don't want to build C extensions, so we'll probably prefer psycopg[binary].
# Use copy to run the copy statement.
# https://www.psycopg.org/psycopg3/docs/basic/copy.html
with connection.connection.cursor().copy(copy_statement) as copy:  # type: ignore[attr-defined]
What happens at this point if someone sets postgresql+psycopg2 for dialect+driver?
@edgarrmondragon It would raise an exception. In the current main branch, I don't think anything other than postgresql+psycopg2 would work anyway, so having this be configurable doesn't add much.
I'm not 100% sure, but I don't think we use driver-specific APIs anywhere; we rely on SQLAlchemy DDL/DML in all places, so I would expect most drivers to work. Maybe I'm wrong.
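For illustration only, here is a minimal sketch of how a target could fail early with a clear message when the configured dialect+driver cannot support the psycopg3 COPY call. The helper names `driver_from_url` and `ensure_copy_capable` are hypothetical and not part of this PR. The sketch assumes the SQLAlchemy URL scheme `postgresql+psycopg` selects psycopg3.

```python
def driver_from_url(sqlalchemy_url: str) -> str:
    """Return the driver part of a 'dialect+driver://...' URL ('' if absent).

    Hypothetical helper: plain string parsing, no SQLAlchemy import needed.
    """
    scheme, separator, _rest = sqlalchemy_url.partition("://")
    if not separator:
        raise ValueError(f"Not a database URL: {sqlalchemy_url!r}")
    _dialect, _plus, driver = scheme.partition("+")
    return driver


def ensure_copy_capable(sqlalchemy_url: str) -> None:
    """Raise early if the driver is not psycopg3.

    Assumption: only the 'psycopg' driver exposes cursor.copy(). An empty
    driver is rejected too, since it would fall back to the dialect default.
    """
    driver = driver_from_url(sqlalchemy_url)
    if driver != "psycopg":
        raise ValueError(
            f"COPY-based loading needs 'postgresql+psycopg', got driver {driver!r}"
        )
```

A check like this would turn the exception mentioned above into an explicit configuration error instead of a failure inside the COPY call.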
Co-authored-by: Edgar Ramírez Mondragón <[email protected]>
This PR uses the Psycopg3 COPY functionality to speed up batch inserts.
Psycopg3 does not require COPY operations to be file-based, so it avoids many of the value-escaping issues mentioned in #370.
It also drastically decreases memory usage, since we no longer need to build an in-memory file to serve to the database. All binding is done server-side (unlike psycopg2, which does it client-side).
This requires switching to psycopg3, as mentioned in #433.
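The row-by-row COPY pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `build_copy_statement` and `copy_rows` are hypothetical names, and the cursor is assumed to be a psycopg3 cursor, whose `copy()` context manager yields an object with `write_row()`.

```python
from typing import Any, Iterable, Sequence


def quote_identifier(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_statement(table: str, columns: Sequence[str]) -> str:
    """Build a COPY ... FROM STDIN statement for the given table and columns."""
    column_list = ", ".join(quote_identifier(c) for c in columns)
    return f"COPY {quote_identifier(table)} ({column_list}) FROM STDIN"


def copy_rows(
    cursor: Any,
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[Any]],
) -> int:
    """Stream rows to the server through COPY and return how many were sent.

    Assumption: `cursor` behaves like a psycopg3 cursor. Rows are passed as
    Python values one at a time, so no intermediate file is built and no
    manual escaping is done here.
    """
    statement = build_copy_statement(table, columns)
    sent = 0
    with cursor.copy(statement) as copy:
        for row in rows:
            copy.write_row(row)
            sent += 1
    return sent
```

With a real connection this would be called as `copy_rows(conn.cursor(), "users", ["id", "name"], records)`, where `conn` comes from `psycopg.connect(...)`. Because `rows` can be any iterable, a generator keeps only one record in memory at a time.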