Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several fixes and enhancements #39

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

musashiXXX
Copy link

The changes in this pull request fix:

  • newTable needed to be renamed to new_table in ETLAlchemySource.py
  • schema_transformer.py was not correctly retrieving the table name from the table_transformations dict, which would effectively disable all mappings in the table_schema_transformation_file.
  • MSSQL UNIQUEIDENTIFIER types will now be converted to PostgreSQL UUID types when migrating from MSSQL -> PostgreSQL
  • Added a per_table_buffers kwarg that allows for specifying the number of rows to fetch at once from the source database
  • We now check the varchar_length and try to adjust it for our target database, e.g., MSSQL varchar(max) -> PostgreSQL text

See line 12 in `schema_transformer.py` for details.
…(e.g., 1000.0)

even though the value is actually a whole number. This causes trouble
when migrating to other databases that expect a number without a
decimal. For example, when migrating SQL Server -> PostgreSQL, the
following error will be raised:

`psycopg2.DataError: invalid input syntax for integer: "1000048.0"`
…s for specifying, on a per-table basis, the buffer size when fetching rows from the source databse (i.e., the number of rows to fetch at a time, per table)
(more on that later). Fixed an issue with schema_transformer.py that
causes tables to not be renamed per the
table_schema_transformation_file.
the maximum allowable size, depending on our target database.
The previous required version causes this error:

ValueError: invalid literal for int() with base 10: '5 (Ubuntu 10'

when used with PostgreSQL >= 10
Copy link
Owner

@seanharr11 seanharr11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@musashiXXX thank you for the PR! One concern:

  1. I am not in agreement with your generalization of coercing data to a uuid.UUID if isinstance(bytearray, value) is True. While this likely solved your use case, I am concerned that it doesn't generalize to other columns that hold bytearray values. There may be a BINARY TEXT blob column in a DB, and we don't necessarily want to turn that into a UUID.

@musashiXXX
Copy link
Author

@seanharr11

I am not in agreement with your generalization of coercing data to a uuid.UUID if isinstance(bytearray, value) is True.

I agree. And the 8th commit in that PR un-does this behavior. The PR with the code you mentioned was submitted in haste. See here. There was this commit containing the ill-conceived coerce all bytea to uuid code, and then two commits later the correction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants