UnicodeDecodeError with Postgres -> MSSQL migration #28

Open
nickolay opened this issue Oct 3, 2017 · 0 comments
nickolay commented Oct 3, 2017

I tried to use [email protected] to copy some data from Postgres (via [email protected]) to MSSQL (via [email protected]) using Python 2.7, since per #14 Python 3 doesn't seem to be supported. (This is on Windows; I expected that to matter only later, when running external commands.)

It failed with:

  File "...\sqlalchemy\sql\compiler.py", line 1895, in visit_insert
    for crud_param_set in crud_params
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 101: ordinal not in range(128)

while trying to generate the INSERTs for the non-ASCII data to dump into the intermediate file.
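For reference, 0xd0 is the lead byte of a two-byte UTF-8 sequence (the Cyrillic range), so decoding UTF-8 data with the ASCII codec reproduces the same error independently of SQLAlchemy (the string below is just an illustrative example, not the actual table data):

```python
# "Привет" encoded as UTF-8 starts with the byte 0xd0, which the ASCII
# codec cannot decode -- the same failure mode as in the traceback.
data = u"\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
    print(exc)
```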

While poking around I realized I don't understand how non-Unicode data is supposed to be handled.

The source database uses UTF8 encoding and has this table definition:

CREATE TABLE public.TABLE_NAME
(
    id numeric(18,0) NOT NULL,
    name character varying(255) COLLATE pg_catalog."default",
    ...
)

The client_encoding is UTF8 (both with the ?client_encoding=utf8 in the connection string and without).

SQLAlchemy returns unicode strings, then in standardize_column_type (currently at https://github.com/seanharr11/etlalchemy/blob/master/etlalchemy/ETLAlchemySource.py#L201 ) they get converted to str:

elif "STRING" in base_classes:
    ...
    if isinstance(row[idx], unicode):
        row[idx] = row[idx].encode('utf-8', 'ignore')

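In other words, that `encode()` turns each unicode value into a UTF-8 byte string (a `str` on Python 2). Shown here in Python 3 syntax, where the types are `str` vs. `bytes`:

```python
# What standardize_column_type effectively does to string columns:
# a unicode value becomes a UTF-8 byte string. On Python 2 this is
# unicode -> str, which is what later trips up SQLAlchemy's compiler.
value = u"\u041f\u0440\u0438\u0432\u0435\u0442"
encoded = value.encode("utf-8", "ignore")
print(type(encoded).__name__)  # bytes (str on Python 2)
```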
But SQLAlchemy really likes unicode, and in visit_insert in compiler.py it does:

text = "INSERT "
#...
text += "INTO "
#...
table_text = preparer.format_table(insert_stmt.table)  # returns quoted_name
#...
text += table_text # gets converted to `unicode`
#...
    elif insert_stmt._has_multi_parameters:
        text += " VALUES %s" % ( # non-ascii characters passed here

...which raises the UnicodeDecodeError.
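The coercion that Python 2 performs implicitly when `%`-formatting a unicode template with a byte-string parameter can be written out explicitly (Python 3 syntax; the values are illustrative, not the actual compiler internals):

```python
# Python 2's implicit coercion, made explicit: interpolating a UTF-8
# byte string into a unicode template forces an ASCII decode of the bytes.
text = u"INSERT INTO table_name"           # compiler output is unicode
param = u"\u041f".encode("utf-8")          # column value encoded earlier

try:
    text + u" VALUES (%s)" % param.decode("ascii")  # what Python 2 does implicitly
except UnicodeDecodeError as exc:
    print(exc)  # same 'ascii' codec error as in the traceback
```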

Removing the encoding logic from standardize_column_type and adding it to dump_to_sql_statement in literal_value_generator.py:
fp.write(stmt.encode('UTF-8'))

...sorta works, but I'd like to understand what I'm missing.
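A sketch of that alternative (not tested against etlalchemy itself): keep values as unicode all the way through the compiler and encode only at the file boundary, e.g. by opening the dump file with `io.open` and an explicit encoding:

```python
import io
import os
import tempfile

# Sketch: keep the compiled statement as unicode and let the file
# object handle the UTF-8 encoding on write, instead of encoding
# column values up front in standardize_column_type.
stmt = u"INSERT INTO TABLE_NAME (name) VALUES ('\u041f\u0440\u0438\u0432\u0435\u0442');"

path = os.path.join(tempfile.mkdtemp(), "TABLE_NAME.sql")
with io.open(path, "w", encoding="utf-8") as fp:
    fp.write(stmt)  # encoded to UTF-8 on write; works on Python 2 and 3
```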

The migration log:

Sending source '<etlalchemy.ETLAlchemySource.ETLAlchemySource instance at 0x0000000004E663C8>' to destination 'mssql+pyodbc:///?odbc_connect=XXX'
ETLAlchemySource (INFO) -
        *************************
        *** Total Tables: 1 ***
        *************************
ETLAlchemySource (INFO) - Reading Table Schema 'TABLE_NAME'...
ETLAlchemySource (INFO) - Loaded indexes and FKs for table 'TABLE_NAME'
ETLAlchemySource (INFO) - Building query to fetch all rows from TABLE_NAME
ETLAlchemySource (INFO) - Done. (315 total rows)
ETLAlchemySource (INFO) - Loading all rows into memory...
ETLAlchemySource (INFO) - Done
ETLAlchemySource (INFO) - (id) NUMERIC
ETLAlchemySource (INFO) - Bases: ['NUMERIC']
ETLAlchemySource (INFO) -  --> id...{'Decimal': 315}
ETLAlchemySource (WARNING) - Column 'id' is of type 'Decimal', but contains no mantissas > 0. (i.e. 3.00, 2.00, etc..)

ETLAlchemySource (WARNING) - Coercing to 'Integer'
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (INFO) - (pname) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
...
ETLAlchemySource (WARNING) - Table '{0}' already exists - not creating table, reflecting to get new changes instead..
ETLAlchemySource (INFO) - Transforming & Dumping 315 total rows from table 'TABLE_NAME' into 'path_to/TABLE_NAME.sql'.
ETLAlchemySource (INFO) -  (TABLE_NAME) -- Transforming rows: 0 -> 315...(315 Total)
ETLAlchemySource (INFO) -  (TABLE_NAME) -- Dumping rows: 0 -> 315 to 'TABLE_NAME.sql'...(315 Total)[Table 0/1]
ETLAlchemySource (INFO) - Gathering unique columns for upsert.
ETLAlchemySource (INFO) - Unique columns are '[Column('id', INTEGER(), table=<TABLE_NAME>, primary_key=True, nullable=False, default=Sequence('id_identity', start=1, increment=1, metadata=MetaData(bind=Engine(mssql+pyodbc:///?odbc_connect=XXX))))]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '315' rows, and dumping to 'TABLE_NAME.sql'.
ETLAlchemySource (INFO) - Creating 'insert' stmts for (the remaining)315 rows, and dumping to 'TABLE_NAME.sql' (because they DNE in the table!).
ETLAlchemySource (INFO) -  (315) -- Inserting remaining '315' rows.
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    tgt.migrate()
  File "...\etlalchemy\ETLAlchemyTarget.py", line 86, in migrate
    migrate_data=migrate_data)
  File "...\etlalchemy\ETLAlchemySource.py", line 1140, in migrate
    pks, Session)
  File "...\etlalchemy\ETLAlchemySource.py", line 871, in dump_data
    self.dst_engine, T.name)
  File "...\etlalchemy\literal_value_generator.py", line 234, in dump_to_sql_statement
    compiler = LiteralCompiler(dialect, statement)
  File "...\sqlalchemy\dialects\mssql\base.py", line 1043, in __init__
    super(MSSQLCompiler, self).__init__(*args, **kwargs)
  File "...\sqlalchemy\sql\compiler.py", line 395, in __init__
    Compiled.__init__(self, dialect, statement, **kwargs)
  File "...\sqlalchemy\sql\compiler.py", line 190, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "...\sqlalchemy\sql\compiler.py", line 213, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "...\sqlalchemy\sql\visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "...\sqlalchemy\sql\compiler.py", line 1895, in visit_insert
    for crud_param_set in crud_params
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 101: ordinal not in range(128)

