Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to connect to duckdb database residing on S3 #249

Open
fRoDdYy opened this issue Sep 13, 2023 · 13 comments
Open

Unable to connect to duckdb database residing on S3 #249

fRoDdYy opened this issue Sep 13, 2023 · 13 comments

Comments

@fRoDdYy
Copy link

fRoDdYy commented Sep 13, 2023

Referring to the documentation when I am trying to connect to my duckdb database by attaching it

outputs:
dev:
type: duckdb
path: tmp/dbt.duckdb
extensions:
- httpfs
- parquet
settings:
s3_region: ap-south-1
s3_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
s3_session_token: "{{ env_var('AWS_SESSION_TOKEN') }}"
attach:
- path: "s3://bucket-name/file-path/database.duckdb
read_only: true
threads: 1
external_root: s3://bucket-name/output
target: dev

I get the following error:

Runtime Error
Catalog Error: Cannot open database "s3://bucket-name/file-path/database.duckdb" in read-only mode: database does not exist

@jwills
Copy link
Collaborator

jwills commented Sep 13, 2023

Huh; the path in the attach statement doesn't match the path in the error and it's not totally obvious to me why-- will take a look.

@Mause
Copy link
Member

Mause commented Sep 13, 2023

The s3 attach via httpfs story is a bit of a strange one right now - I think you can only set auth via environment variables or query parameters

@jwills
Copy link
Collaborator

jwills commented Sep 13, 2023

ack, thank you @Mause! I swear I've made it work before, let me see if I can sort out how.

@fRoDdYy
Copy link
Author

fRoDdYy commented Sep 13, 2023

@Mause I tried it with environment variable too but the issue persists.

@jwills Yeah sorry I made that mistake while typing have corrected it though.

@Mause
Copy link
Member

Mause commented Sep 13, 2023

Maybe double check the environment variable names? https://duckdb.org/docs/extensions/httpfs#configuration-1

@NatElkins
Copy link

I am also interested in this use case. I'm trying to perform a query on a DuckDB database in S3. I'm pretty sure my env vars are correct because I'm able to run a query like:

with t as (
	SELECT *
	FROM iceberg_scan('s3a://path/to/files', allow_moved_paths = true)
)
SELECT *
from t;

without issue.

I've set my env vars like this:

SET s3_region = 'us-east-1';
SET s3_access_key_id = 'access_key_id';
SET s3_secret_access_key = 'secret_access_key';

and I've also executed the following:

INSTALL httpfs;
INSTALL iceberg;

LOAD httpfs;
LOAD iceberg;

(Iceberg not related directly to this problem, just noting it).

I also get the error:

Catalog Error: Cannot open database "s3://path/to/test.duckdb" in read-only mode: database does not exist

I know my repro isn't using dbt-duckdb, but I'm also interested in this use case with dbt-duckdb (was just testing with SQL to verify behavior outside of dbt-duckdb).

@jwills
Copy link
Collaborator

jwills commented Jan 31, 2024

Ah appreciate that @NatElkins -- I wonder if it works if you use the fsspec stuff? So as to treat S3 like a filesystem as opposed to trying to use the httpfs route?

@NatElkins
Copy link

NatElkins commented Feb 1, 2024

@jwills It doesn't seem to work, although I may be doing something wrong. I have the creds defined in my ~/.aws/credentials file.

import duckdb
from fsspec import filesystem

duckdb.register_filesystem(filesystem('s3', anon=False))
duckdb.connect("s3://bucket/path/to/test.duckdb'")
r1 = duckdb.sql("select * from test")
print(r1)

The error I get is:

duckdb.connect("s3://bucket/path/to/test.duckdb")
duckdb.duckdb.IOException: IO Error: Cannot open file "/Users/nathanielelkins/Projects/dbt_test/s3://bucket/path/to/test.duckdb": No such file or directory

@jwills
Copy link
Collaborator

jwills commented Feb 1, 2024

Yep, right there with you-- I cannot figure out how to get this to work right now. 😞

@AldricVS
Copy link

AldricVS commented Jun 3, 2024

I stumbled across this thread when searching for the same issue and I managed to make this working using the Secret provider system (see the S3 API Support Page).

The issue seems to be to set credentials using statements like SET s3_endpoint = ....

On the 0.10.3 version, when creating a secret like :

INSTALL httpfs; 
LOAD httpfs;
CREATE SECRET secret1 (
        TYPE S3,
        ENDPOINT '***',
        KEY_ID '****',
        SECRET '****',
        REGION '****'
);

then performing the "attach" statement :

ATTACH 's3://path/to/dbfile'
AS db (READ_ONLY)

it worked perfectly

@evan-climate
Copy link

Chiming in to say this is still a problem on 1.0.0 python release.

@ldolberg
Copy link

ldolberg commented Oct 2, 2024

Any workaround on this? for dagster with duck the workaround was to use boto3 to set the session manually. Is there any way to make it work?

This is my profiles.yml

analytics:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: '../{{ env_var("DUCKDB_DATABASE", "data/staging/data.duckdb") }}'
      use_credential_provider: aws
      extensions:
        - httpfs
        - parquet
      secrets:
        - type: s3
          provider: credential_chain
      settings:
        s3_region: us-west-2

and this is my source:

- name: external_source meta: external_location: "s3://{{env_var('S3_BUCKET','DEFINE_ME')}}/input/reonomy_{name}_*.json" tables: - name: summaries - name: properties

The error I'm getting is

22:49:00 Runtime Error in model base_reonomy_properties (models/source/base_reonomy_properties.sql) HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=input%2Freonomy_summaries_' (HTTP 403) 22:49:00

@jwills
Copy link
Collaborator

jwills commented Oct 2, 2024

Oh interesting— can you point me at the Dagster workaround and I will see if I can implement it here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants