You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I tried to use [email protected] to copy some data from Postgres (via [email protected]) to MSSQL (via [email protected]) using python 2.7 (since per #14 python3 doesn't seem to be supported). (This is on Windows, but I expected this to become important later when running external commands.)
It failed with:
File "...\sqlalchemy\sql\compiler.py", line 1895, in visit_insert
for crud_param_set in crud_params
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 101: ordinal not in range(128)
while trying to generate the INSERTs for the non-ASCII data to dump into the intermediate file.
While poking around I realized I don't understand how non-Unicode data is supposed to be handled.
The source database uses UTF8 encoding and has this table definition:
CREATE TABLE public.TABLE_NAME
(
id numeric(18,0) NOT NULL,
name character varying(255) COLLATE pg_catalog."default",
...
)
The client_encoding is UTF8 (both with the ?client_encoding=utf8 in the connection string and without).
elif "STRING" in base_classes
...
if isinstance(row[idx], unicode):
row[idx] = row[idx].encode('utf-8', 'ignore')
But SQLAlchemy really likes unicode, and in visit_insert in compiler.py it does:
text = "INSERT "
#...
text += "INTO "
#...
table_text = preparer.format_table(insert_stmt.table) # returns quoted_name
#...
text += table_text # gets converted to `unicode`
#...
elif insert_stmt._has_multi_parameters:
text += " VALUES %s" % ( # non-ascii characters passed here
...which raises the UnicodeDecodeError.
Removing the encoding logic from standardize_column_type and adding it to dump_to_sql_statement in literal_value_generator.py:
fp.write(stmt.encode('UTF-8'))
...sorta works, but I'd like to understand what I'm missing.
The migration log:
Sending source '<etlalchemy.ETLAlchemySource.ETLAlchemySource instance at 0x0000000004E663C8>' to destination 'mssql+pyodbc:///?odbc_connect=XXX'
ETLAlchemySource (INFO) -
*************************
*** Total Tables: 1 ***
*************************
ETLAlchemySource (INFO) - Reading Table Schema 'TABLE_NAME'...
ETLAlchemySource (INFO) - Loaded indexes and FKs for table 'TABLE_NAME'
ETLAlchemySource (INFO) - Building query to fetch all rows from TABLE_NAME
ETLAlchemySource (INFO) - Done. (315 total rows)
ETLAlchemySource (INFO) - Loading all rows into memory...
ETLAlchemySource (INFO) - Done
ETLAlchemySource (INFO) - (id) NUMERIC
ETLAlchemySource (INFO) - Bases: ['NUMERIC']
ETLAlchemySource (INFO) - --> id...{'Decimal': 315}
ETLAlchemySource (WARNING) - Column 'id' is of type 'Decimal', but contains no mantissas > 0. (i.e. 3.00, 2.00, etc..)
ETLAlchemySource (WARNING) - Coercing to 'Integer'
ETLAlchemySource (INFO) - Checking column for elimination status...
ETLAlchemySource (INFO) - (pname) VARCHAR
ETLAlchemySource (INFO) - Bases: ['STRING']
...
ETLAlchemySource (WARNING) - Table '{0}' already exists - not creating table, reflecting to get new changes instead..
ETLAlchemySource (INFO) - Transforming & Dumping 315 total rows from table 'TABLE_NAME' into 'path_to/TABLE_NAME.sql'.
ETLAlchemySource (INFO) - (TABLE_NAME) -- Transforming rows: 0 -> 315...(315 Total)
ETLAlchemySource (INFO) - (TABLE_NAME) -- Dumping rows: 0 -> 315 to 'TABLE_NAME.sql'...(315 Total)[Table 0/1]
ETLAlchemySource (INFO) - Gathering unique columns for upsert.
ETLAlchemySource (INFO) - Unique columns are '[Column('id', INTEGER(), table=<TABLE_NAME>, primary_key=True, nullable=False, default=Sequence('id_identity', start=1, increment=1, metadata=MetaData(bind=Engine(mssql+pyodbc:///?odbc_connect=XXX))))]'
ETLAlchemySource (INFO) - Creating 'upsert' statements for '315' rows, and dumping to 'TABLE_NAME.sql'.
ETLAlchemySource (INFO) - Creating 'insert' stmts for (the remaining)315 rows, and dumping to 'TABLE_NAME.sql' (because they DNE in the table!).
ETLAlchemySource (INFO) - (315) -- Inserting remaining '315' rows.
Traceback (most recent call last):
File "test.py", line 12, in <module>
tgt.migrate()
File "...\etlalchemy\ETLAlchemyTarget.py", line 86, in migrate
migrate_data=migrate_data)
File "...\etlalchemy\ETLAlchemySource.py", line 1140, in migrate
pks, Session)
File "...\etlalchemy\ETLAlchemySource.py", line 871, in dump_data
self.dst_engine, T.name)
File "...\etlalchemy\literal_value_generator.py", line 234, in dump_to_sql_statement
compiler = LiteralCompiler(dialect, statement)
File "...\sqlalchemy\dialects\mssql\base.py", line 1043, in __init__
super(MSSQLCompiler, self).__init__(*args, **kwargs)
File "...\sqlalchemy\sql\compiler.py", line 395, in __init__
Compiled.__init__(self, dialect, statement, **kwargs)
File "...\sqlalchemy\sql\compiler.py", line 190, in __init__
self.string = self.process(self.statement, **compile_kwargs)
File "...\sqlalchemy\sql\compiler.py", line 213, in process
return obj._compiler_dispatch(self, **kwargs)
File "...\sqlalchemy\sql\visitors.py", line 81, in _compiler_dispatch
return meth(self, **kw)
File "...\sqlalchemy\sql\compiler.py", line 1895, in visit_insert
for crud_param_set in crud_params
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 101: ordinal not in range(128)
I tried to use [email protected] to copy some data from Postgres (via [email protected]) to MSSQL (via [email protected]) using python 2.7 (since per #14 python3 doesn't seem to be supported). (This is on Windows, but I expected this to become important later when running external commands.)
It failed with:
while trying to generate the
INSERT
s for the non-ASCII data to dump into the intermediate file.While poking around I realized I don't understand how non-Unicode data is supposed to be handled.
The source database uses UTF8 encoding and has this table definition:
The
client_encoding
is UTF8 (both with the?client_encoding=utf8
in the connection string and without).SQLAlchemy returns
unicode
strings, then instandardize_column_type
(currently at https://github.com/seanharr11/etlalchemy/blob/master/etlalchemy/ETLAlchemySource.py#L201 ) they get converted tostr
:But SQLAlchemy really likes unicode, and in
visit_insert
in compiler.py it does:...which raises the UnicodeDecodeError.
Removing the encoding logic from
standardize_column_type
and adding it todump_to_sql_statement
in literal_value_generator.py:fp.write(stmt.encode('UTF-8'))
...sorta works, but I'd like to understand what I'm missing.
The migration log:
Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.
The text was updated successfully, but these errors were encountered: