
[Backend - Pyarrow] PostgreSQL ingestion maps ARRAY to wrong datatype in BigQuery #2137

Open
lfpll opened this issue Dec 11, 2024 · 0 comments

lfpll commented Dec 11, 2024

dlt version

1.4.1

Describe the problem

When ingesting an ARRAY column into BigQuery with the pyarrow backend, the column is wrongly mapped to a RECORD inside a RECORD. I remember seeing this before and I believe it is a pyarrow problem.
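
A minimal sketch of the mismatch, assuming the pyarrow backend yields the arrow type below for the INT[] column (the column name comes from init.sql further down; the integer width is an assumption):

import pyarrow as pa

# Hedged sketch: Postgres INT[] is assumed to arrive as an arrow list type.
list_of_ids = pa.field("list_of_ids", pa.list_(pa.int64()))
print(list_of_ids.type)  # list<item: int64>
# Expected BigQuery mapping: an INTEGER column with mode REPEATED.
# Observed mapping: a RECORD containing another RECORD (see Result below).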

Steps to reproduce

How to replicate

Create a .env file with the following variables (values redacted):

EXTRACT_DB_DELTA__DESTINATION__BIGQUERY__LOCATION
EXTRACT_DB_DELTA__DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__DRIVERNAME
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__HOST
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__USERNAME
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__PASSWORD
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__DATABASE
EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__PORT
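
If the env-var based credentials are hard to reproduce, the source credentials can also be built from an explicit connection string; a minimal sketch with placeholder values (not the reporter's real credentials):

from dlt.sources.credentials import ConnectionStringCredentials

# Placeholder values for the Postgres service defined in docker-compose.yml below.
credentials = ConnectionStringCredentials(
    "postgresql://postgres:postgres@localhost:5432/postgres"
)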

docker-compose.yml

version: '3.8'
services:
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: ${EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__USERNAME}
      POSTGRES_PASSWORD: ${EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__PASSWORD}
      POSTGRES_DB: ${EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__DATABASE}
    ports:
      - "${EXTRACT_DB_DELTA__SOURCES__SQL_DATABASE__SQL_DATABASE__CREDENTIALS__PORT}:5432"
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

volumes:
  postgres_data:

init.sql

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    credit NUMERIC,
    list_of_ids INT[]
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

INSERT INTO users (name, email, credit, list_of_ids) VALUES
    ('John Doe', '[email protected]',0.999999999, '{1,2}'),
    ('Jane Smith', '[email protected]',0.9999999, '{3}');

INSERT INTO products (name, price) VALUES
    ('Laptop', 999.99),
    ('Phone', 599.99),
    ('Tablet', 299.99);

Python script

import logging

import dlt
from dlt.sources.credentials import ConnectionStringCredentials
from dlt.sources.sql_database import sql_database

DATASET_NAME = "placeholder_dataset"  # hypothetical value; the real name was redacted

table_config = {"table_name": "users"}

pipeline = dlt.pipeline(
    pipeline_name="extract_db_delta",
    destination="bigquery",
    dataset_name=DATASET_NAME,
    progress="log",
)
# Credentials are resolved from the env vars defined in .env
credentials = ConnectionStringCredentials()

extra_table_args = {}
extra_sql_args = {
    # "table_adapter_callback": add_dw_extraction_time_column,
}

sources = sql_database(
    credentials=credentials,
    chunk_size=10000 * 5,
    backend="pyarrow",
    include_views=True,
    **extra_sql_args,
).with_resources(table_config["table_name"])

logging.info(f"Adding table: {table_config}")

table = sources.resources[table_config["table_name"]]
table.apply_hints(write_disposition="append", **extra_table_args)

info = pipeline.run(sources)
logging.info(info)
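
To confirm what dlt inferred for list_of_ids, the schema can be printed after the run; a small sketch appended to the script above (assuming dlt's default_schema and Schema.to_pretty_yaml() API):

# Hedged check: dump the schema dlt inferred and look at how list_of_ids is typed.
print(pipeline.default_schema.to_pretty_yaml())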

Operating system

Linux

Runtime environment

Local

Python version

3.11

dlt data source

sql_database

dlt destination

Google BigQuery

Other deployment details

I removed my company's name from the code, so there may be some issues with the environment variables or variable names.

Additional information

Result

(screenshot: BigQuery schema showing list_of_ids loaded as RECORD<RECORD<...>> instead of a REPEATED INTEGER column)
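
For comparison, the field I would expect BigQuery to end up with (a hedged sketch using the google-cloud-bigquery client types; arrays in BigQuery are REPEATED columns of the element type):

from google.cloud import bigquery

# Expected schema field for the Postgres INT[] column (a sketch, not what dlt produces today).
expected = bigquery.SchemaField("list_of_ids", "INTEGER", mode="REPEATED")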
