For the row-wise loader, the logic calls `data.values` to convert a pandas DataFrame to an array before passing it to `_build_input_rows()`. The resulting NumPy array promotes all data to a common supertype, which in some cases transforms the data in undesirable ways. This works when every column is an integer, because the values are simply widened to the largest integer type needed to hold them.
When int columns are mixed with a float column, every column is converted to float64. BIGINT-sized values are thereby turned into floats and then rendered in scientific notation, which causes the bug identified in #208.
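A minimal sketch of the promotion described above, using a made-up two-column frame (the column names and values are illustrative only and are not taken from #208):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one BIGINT-sized int64 column next to a float64 column.
original = 1234567890123456789
df = pd.DataFrame({
    "big_id": np.array([original, 2], dtype=np.int64),
    "price": np.array([1.5, 2.5], dtype=np.float64),
})

# .values builds one 2-D array, so both columns share a single dtype.
arr = df.values
promoted_dtype = arr.dtype

# The integer no longer survives a round trip through float64,
# and its text form switches to scientific notation.
round_tripped = int(arr[0, 0])
as_text = str(float(arr[0, 0]))

print(promoted_dtype, round_tripped == original, as_text)
```

The int64 column on its own holds the value exactly; the loss only appears once the float column forces the whole array to float64.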
Because the row-wise loader only requires an iterable, we don't have to convert to an array; we can iterate over the pandas rows and convert each cell to the required Thrift text object.
Labeling this as breaking, since technically I'm changing how data are loaded row-wise; hopefully no one is relying on the technically incorrect behavior.
Fixes #209
Calling `.values` on a DataFrame creates an array promoted to the common data type able to hold all the data. This can cause ints to be converted to float, along with other undesirable behavior.
The intent is just to move from a DataFrame to an iterable, which `itertuples` does, without changing the data representation in a cell.
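As a rough illustration of the `itertuples` approach, the sketch below iterates rows without building a combined array and turns each cell into text. Plain `str` stands in for the Thrift text object, and `rows_as_text` is a hypothetical helper, not pymapd's actual `_build_input_rows()`:

```python
import numpy as np
import pandas as pd

# Hypothetical data, mixing a BIGINT-sized int64 column with a float64 column.
df = pd.DataFrame({
    "big_id": np.array([1234567890123456789, 2], dtype=np.int64),
    "price": np.array([1.5, 2.5], dtype=np.float64),
})

def rows_as_text(frame):
    """Hypothetical stand-in: convert every cell to text, row by row."""
    # index=False drops the index; name=None yields plain tuples.
    return [[str(cell) for cell in row]
            for row in frame.itertuples(index=False, name=None)]

text_rows = rows_as_text(df)
first_row = next(df.itertuples(index=False, name=None))

print(text_rows[0])
```

Each cell keeps the type of its own column, so the integer is written out digit for digit while the float column stays a float.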
Add integer CPU tests
Test table structure
Switch from testing ints to no nulls
Related: #208
https://github.com/omnisci/pymapd/blob/master/pymapd/connection.py#L498