names the duckdb database file after the pipeline that created it
rudolfix committed Feb 28, 2023
1 parent f38cee0 commit 85f2557
Showing 4 changed files with 8 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -14,6 +14,7 @@ experiments/*
secrets.toml
*.session.sql
*.duckdb
+*.wal

# Byte-compiled / optimized / DLL files
**/__pycache__/
6 changes: 4 additions & 2 deletions dlt/destinations/duckdb/configuration.py
@@ -9,7 +9,8 @@
from dlt.common.destination.reference import DestinationClientDwhConfiguration
from dlt.common.typing import DictStrAny, TSecretValue

-DEFAULT_DUCK_DB_NAME = "quack.duckdb"
+DUCK_DB_NAME = "%s.duckdb"
+DEFAULT_DUCK_DB_NAME = DUCK_DB_NAME % "quack"
LOCAL_STATE_KEY = "duckdb_database"


@@ -117,7 +118,8 @@ def _path_from_pipeline(self, default_path: str) -> str:
         context = Container()[PipelineContext]
         if context.is_active():
             try:
-                # get
+                # use pipeline name as default
+                default_path = DUCK_DB_NAME % context.pipeline().pipeline_name
                 return context.pipeline().get_local_state_val(LOCAL_STATE_KEY)  # type: ignore
             except KeyError:
                 pass
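
The naming order implied by the hunk above can be sketched in isolation. This is only an illustration, not the code in `configuration.py`: `pick_database_name` is a hypothetical helper, a plain mapping stands in for the pipeline's local state, and the assumption that a stored value takes precedence over the pipeline name is inferred from the `return` inside the `try` block. Only the three constants are taken from the diff.

```python
from typing import Mapping, Optional

# Constants copied from the diff.
DUCK_DB_NAME = "%s.duckdb"
DEFAULT_DUCK_DB_NAME = DUCK_DB_NAME % "quack"
LOCAL_STATE_KEY = "duckdb_database"


def pick_database_name(
    pipeline_name: Optional[str] = None,
    local_state: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick a database file name in the order the diff suggests."""
    # No active pipeline: fall back to the fixed default name.
    if pipeline_name is None:
        return DEFAULT_DUCK_DB_NAME
    # Assumption: a path already kept in local state wins over the pipeline name.
    if local_state and LOCAL_STATE_KEY in local_state:
        return local_state[LOCAL_STATE_KEY]
    # Otherwise the file is named after the pipeline.
    return DUCK_DB_NAME % pipeline_name


print(pick_database_name())         # quack.duckdb
print(pick_database_name("chess"))  # chess.duckdb
```

The sketch returns bare file names only; the tests in this commit suggest the real configuration also resolves the name to an absolute path in the current working directory.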
2 changes: 1 addition & 1 deletion docs/website/docs/destinations.md
@@ -155,7 +155,7 @@ python3 chess.py

### Destination Configuration

-By default, a DuckDB database will be created in the current working directory with a name `quack.duckdb`. After loading, it is available in `read/write` mode via `with pipeline.sql_client() as con:` which is a wrapper over `DuckDBPyConnection`. See [duckdb docs](https://duckdb.org/docs/api/python/overview#persistent-storage) for details.
+By default, a DuckDB database will be created in the current working directory with a name `<pipeline_name>.duckdb` (`chess.duckdb` in the example above). After loading, it is available in `read/write` mode via `with pipeline.sql_client() as con:` which is a wrapper over `DuckDBPyConnection`. See [duckdb docs](https://duckdb.org/docs/api/python/overview#persistent-storage) for details.

The `duckdb` credentials do not require any secret values. You are free to pass the configuration explicitly via the `credentials` parameter to `dlt.pipeline` or `pipeline.run` methods. For example:
```python
4 changes: 2 additions & 2 deletions tests/load/duckdb/test_duckdb_client.py
@@ -42,12 +42,12 @@ def test_duckdb_open_conn_default() -> None:
 def test_duckdb_database_path() -> None:
     # resolve without any path provided
     c = resolve_configuration(DuckDbClientConfiguration(dataset_name="test_dataset"))
-    assert c.credentials.database.lower() == os.path.abspath(DEFAULT_DUCK_DB_NAME).lower()
+    assert c.credentials.database.lower() == os.path.abspath("quack.duckdb").lower()
     # resolve without any path but with pipeline context
     p = dlt.pipeline(pipeline_name="quack_pipeline")
     c = resolve_configuration(DuckDbClientConfiguration(dataset_name="test_dataset"))
     # still cwd
-    db_path = os.path.abspath(os.path.join(".", DEFAULT_DUCK_DB_NAME))
+    db_path = os.path.abspath(os.path.join(".", "quack_pipeline.duckdb"))
     assert c.credentials.database.lower() == db_path.lower()
     # but it is kept in the local state
     assert p.get_local_state_val("duckdb_database").lower() == db_path.lower()