load_table incorrectly calls data.values #209

randyzwitch · 2019-04-10T20:23:10Z

Related: #208

For the row-wise loader, the logic calls data.values to convert a pandas dataframe to an array before passing it to _build_input_rows(). The resulting numpy array promotes all data to a common supertype, which in some cases transforms the data in undesirable ways. This works when all columns are integers, since the data are promoted to the widest integer type needed to hold them.

When int columns appear alongside a float column, all columns are converted to float64. For bigint-sized values, this converts them to float, which then renders in scientific notation, causing the bug identified in #208.
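The promotion is easy to reproduce outside pymapd. The following is a minimal sketch using only pandas and numpy; the column names and values are made up for illustration and are not taken from #208.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a bigint-sized value that float64 cannot hold exactly.
big = 2**62 + 1

ints_only = pd.DataFrame({
    "small": np.array([1, 2], dtype="int32"),
    "big": np.array([big, 4], dtype="int64"),
})
# Same frame with one float column added.
mixed = ints_only.assign(ratio=[0.5, 1.5])

# All-integer frame: promoted to the widest integer type, values intact.
int_array = ints_only.values
print(int_array.dtype)                 # int64
print(int_array[0, 1] == big)          # True

# Mixed frame: every column becomes float64.
mixed_array = mixed.values
print(mixed_array.dtype)               # float64
print(int(mixed_array[0, 1]) == big)   # False, the integer was altered
print(str(mixed_array[0, 1]))          # rendered in scientific notation
```

Note that the float conversion does more than change how the value prints: the original integer can no longer be recovered from the array.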

Because the row-wise loader only requires an iterable, we don't have to convert to an array; we can iterate over the pandas rows and convert each cell to the required Thrift text object.
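A rough sketch of that approach, assuming DataFrame.itertuples as the row iterator: here a plain str() call stands in for the Thrift text object, and rows_as_text is a hypothetical helper name, not pymapd's actual _build_input_rows().

```python
import pandas as pd

big = 2**62 + 1  # hypothetical bigint-sized value

df = pd.DataFrame({"id": [big, 4], "ratio": [0.5, 1.5]})


def rows_as_text(frame):
    """Yield each row as a list of strings, one cell at a time.

    Hypothetical helper: str() is a stand-in for building the
    Thrift text object for each cell.
    """
    # itertuples walks the frame column by column per row, so each
    # cell keeps its own column's type instead of a common supertype.
    for row in frame.itertuples(index=False):
        yield [str(cell) for cell in row]


rows = list(rows_as_text(df))
print(rows[0][0] == str(big))  # True, the integer survives as text
```

Because no intermediate array is built, the int column is never promoted to float64, even though the frame also contains a float column.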

Labeling this as breaking, since technically I'll be changing how the data are loaded row-wise; hopefully no one is relying on the technically incorrect behavior.

https://github.com/omnisci/pymapd/blob/master/pymapd/connection.py#L498


Fixes #209

Calling .values on a dataframe creates an array promoted to the common datatype able to hold all the data. This can cause ints to be converted to float, along with other undesirable behavior. The intent is just to move from a dataframe to an iterable, which itertuples does, without changing the data representation in a cell.

Add integer CPU tests
Test table structure
Switch from testing ints to no nulls

randyzwitch added bug breaking labels Apr 10, 2019

randyzwitch changed the title ~~load_table incorrectly calls d~~ load_table incorrectly calls data.values Apr 10, 2019

randyzwitch closed this as completed in 5ad20cc Apr 11, 2019
