
load_table incorrectly calls data.values #209

Closed
randyzwitch opened this issue Apr 10, 2019 · 0 comments

Related: #208

For the row-wise loader, the logic calls data.values to convert a pandas DataFrame to a NumPy array before passing it to _build_input_rows(). The resulting array promotes every column to a common supertype, which in some cases silently changes the data. This is harmless when all columns are integers, because they are only widened to the largest integer width present.

When int columns sit alongside a float column, every column is converted to float64. Bigint-sized values become floats and are then written out in scientific notation, causing the bug identified in #208.
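To make the promotion concrete, here is a minimal sketch of the behavior described above. The column names and values are made up for illustration and are not taken from pymapd or from #208; only `DataFrame.values` and standard NumPy/pandas dtypes are used.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a bigint-sized id column next to a float column.
original_id = 1234567890123456789  # fits in int64
df = pd.DataFrame({
    "big_id": np.array([original_id, 2], dtype="int64"),
    "score": np.array([1.5, 2.5], dtype="float64"),
})

# All-integer frames are only widened, so the values survive intact.
ints_only = pd.DataFrame({
    "a": np.array([1, 2], dtype="int32"),
    "b": np.array([original_id, 4], dtype="int64"),
})
ints_dtype = ints_only.values.dtype      # int64

# A mixed int/float frame is promoted to float64 as a whole.
mixed = df.values
mixed_dtype = mixed.dtype                # float64

cell = mixed[0, 0]
cell_text = str(cell)                    # scientific notation at this magnitude
round_trips = int(cell) == original_id   # False: float64 cannot hold every int64 exactly

print(ints_dtype, mixed_dtype, cell_text, round_trips)
```

The integer column never contained a float, yet after `.values` its cells are float64 and no longer convert back to the original integer.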

Because the row-wise loader only requires an iterable, we don't have to convert to an array at all: we can iterate over the pandas rows and convert each cell to the required Thrift text object.
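A rough sketch of that approach, assuming `DataFrame.itertuples` as the row iterator. The helper `rows_as_text` is hypothetical and stands in for the real conversion to Thrift objects; it is not the pymapd implementation and only turns each cell into a string.

```python
import numpy as np
import pandas as pd

def rows_as_text(df):
    """Hypothetical helper: yield each row as a list of strings, one per cell."""
    # name=None yields plain tuples; index=False leaves out the DataFrame index.
    for row in df.itertuples(index=False, name=None):
        yield [str(cell) for cell in row]

# Hypothetical data: same shape of problem as in the issue (bigint plus float).
df = pd.DataFrame({
    "big_id": np.array([1234567890123456789, 2], dtype="int64"),
    "score": np.array([1.5, 2.5], dtype="float64"),
})

text_rows = list(rows_as_text(df))
print(text_rows[0])
```

Because each column keeps its own dtype during iteration, the bigint cell is stringified as an integer rather than as a promoted float.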

Labeling this as breaking, as technically I'll be changing how the data are loaded rowwise, and hopefully no one is relying on technically incorrect behavior.

https://github.com/omnisci/pymapd/blob/master/pymapd/connection.py#L498

@randyzwitch changed the title from "load_table incorrectly calls d" to "load_table incorrectly calls data.values" Apr 10, 2019
randyzwitch added a commit that referenced this issue Apr 11, 2019
Fixes #209

Calling .values on a dataframe creates an array, promoted to the common datatype able to hold all the data. This can cause ints to be converted to float and other undesirable behavior.

The intent is just to move from a dataframe to an iterable, which itertuples does without changing the data representation in any cell.

Add integer CPU tests

Test table structure

Switch from testing ints to no nulls
