Skip to content

From redshift DB, read_sql_query distorts values, but unload does not #2216

Open
@lampretl

Description

@lampretl

Describe the bug

In our private redshift database on AWS, I have a table with a column probability of type numeric(38, 20), its values are between 0 and 1. In AWS SageMaker jupyter notebook, I query/download the content of that table. The values obtained via read_sql_query are also negative, but the ones obtained via unload are all positive.

How to Reproduce

When executing the following code

query = "SELECT user_id, probability p FROM models.churn_short_term_predictions"
df1 = wr.redshift.read_sql_query(query, con=con)
print(df1.p.min())
df2 = wr.redshift.unload(query, con=con, path=staging_area()) 
print(df2.p.min())

the output is unexpectedly:

-0.09223216322371685616
0.00028089172873606330

Expected behavior

Outputs df1.p.min() and df2.p.min() should be equal. And certainly, both must be between 0 and 1.

Your project

No response

Screenshots

photo_2023-04-20_18-37-54

Screenshot at 2023-04-20 18-41-48

OS

Linux (AWS)

Python version

3.8.16

AWS SDK for pandas version

3.0.0

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions