Releases: dlt-hub/dlt
0.4.9
Core Library
- SCD2 support by @jorritsandbrink in #1168 https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format to minimize number of rewrites during loading by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- pipeline state can be restored from filesystem destination by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove `staging-optimized` replace strategy for `synapse` by @jorritsandbrink in #1231
- fixes bug where configs were not injected for async functions by @sh-rp in #1241
- feat(transform): implement columns pivot map function by @IlyaFaer in #1152
- Add max_table_nesting to resource decorator by @sultaniman in #1242
- adds csv options to write headers, change delimiter, quotation style by @rudolfix in #1239
- Check for default schema and schema name in streamlit session by @sultaniman in #1155
- Add seconds and millisecond timestamps to filesystem date placeholders by @sultaniman in #1260
- send dlt telemetry wherever you want, not only segment by @zem360 in #1236
- Make merge write-disposition fall back to staging append if no primary or merge keys are specified by @sh-rp in #1225
- Add snowflake application parameter to configuration by @sultaniman in #1266
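To make the SCD2 item above more concrete, here is a minimal plain-Python sketch of the general "slowly changing dimension type 2" idea: when a record changes, its current version is retired by stamping an end time, and the new version is appended. It does not use dlt and is not dlt's implementation. The function name `apply_scd2`, the `valid_from` / `valid_to` column names, and the in-memory list of rows are assumptions made for illustration; the linked docs describe how dlt actually configures the strategy.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_scd2(history: List[Row], incoming: List[Row], key: str, now: datetime) -> List[Row]:
    """Return a new history: changed or vanished records are retired, new versions appended."""
    incoming_by_key = {row[key]: row for row in incoming}
    result: List[Row] = []
    active_keys = set()

    for row in history:
        row = dict(row)  # copy so the caller's history is not mutated
        if row["valid_to"] is None:  # this is the currently active version
            payload = {k: v for k, v in row.items() if k not in ("valid_from", "valid_to")}
            if incoming_by_key.get(row[key]) == payload:
                active_keys.add(row[key])  # unchanged: keep it active
            else:
                row["valid_to"] = now  # changed or missing: retire this version
        result.append(row)

    # Append a fresh active version for every key that is new or has changed
    for k, new_row in incoming_by_key.items():
        if k not in active_keys:
            result.append({**new_row, "valid_from": now, "valid_to": None})
    return result


t1 = datetime(2024, 4, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 4, 2, tzinfo=timezone.utc)
history = apply_scd2([], [{"id": 1, "city": "Berlin"}], key="id", now=t1)
history = apply_scd2(history, [{"id": 1, "city": "Paris"}], key="id", now=t2)
```

After the second call the history holds two rows for `id` 1: the retired "Berlin" version and the active "Paris" version, so earlier states remain queryable.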
Docs
- Added docs for deploying dlt with Prefect. by @dat-a-man in #1138
- a note on scd2 incoming high ts change by @rudolfix in #1273
- adding images and wordsmithing to Prefect walkthrough by @WillRaphaelson in #1276
Verified Sources
- Use `pyarrow`, `pandas`, `connectorx` or `sqlalchemy` backends when reading tables with `sql_database`. See README for details. dlt-hub/verified-sources#425
- Google ads source is available dlt-hub/verified-sources#428
- Pages endpoint for notion dlt-hub/verified-sources#429
New Contributors
- @romanperesypkin made their first contribution in #1230
- @WillRaphaelson made their first contribution in #1276
Full Changelog: 0.4.8...0.4.9
0.4.9a2
A pre-release that lets you try out the following features and includes the following bugfixes:
- SCD2 support by @jorritsandbrink in #1168 (we are still working on BigQuery support) https://dlthub.com/devel/general-usage/incremental-loading#scd2-strategy
- A fully configurable layout for filesystem files by @sultaniman in #1182 https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#files-layout
- picks file format matching item format by @rudolfix in #1222
- fix athena iceberg's trailing location by @romanperesypkin in #1230
- Pass options to parse iso like strings by @VioletM in #1219
- filesystem state sync by @sh-rp in #1184 - https://dlthub.com/devel/dlt-ecosystem/destinations/filesystem#syncing-of-dlt-state
- Remove `staging-optimized` replace strategy for `synapse` by @jorritsandbrink in #1231
- fixes bug where configs were not injected for async functions by @sh-rp in #1241
- adds options to write csv headers, change delimiter by @rudolfix in #1239
The final release is scheduled for next week.
0.4.8
Core Library
- Add Dremio as a destination by @maxfirman in #1026
- adds a fast loading of arrow tables/pandas to postgres via COPY csv by @rudolfix in #1185
- adds a csv writer for filesystem and postgres by @rudolfix in #1185
- saves parquet with all logical types, `spark` flavor is not a default any longer by @rudolfix in #1185
- feat(bigquery): add streaming inserts support by @IlyaFaer in #1123
- Feat: parameterize pipeline class in the primary factory method by @z3z1ma in #1176
- Fix: check for typeddict before class or subclass checks which fail by @z3z1ma in #1160
- fixes column order and add hints table variants by @rudolfix in #1127
- fixes schema versioning by @rudolfix in #1140
- regular initializers for credentials / config specs are type checked like dataclasses by @rudolfix in #1142
- fix streamlit app state display: Add yaml representer for pendulum datetime by @sultaniman in #1192
- `synapse` and `mssql` bugfixes and improvements (INSERT VALUES UNION) by @jorritsandbrink in #1174
- various improvements to arrow table normalization by @rudolfix in #1185
- arrow tables without rows create tables in destination by @rudolfix in #1185
- fixes Motherduck configuration to use `my_db` default database and makes password / token mandatory by @rudolfix in
Docs
- docs: add typechecking to embedded snippets by @sh-rp in #1130
- Fix typo with switched column names in schema evolution docs page by @b-per in #1132
- Docs: deploy with Kestra by @dat-a-man in #1087
- Docs: Deploy dlt on dagster by @dat-a-man in #1086
- Update example connection string by @MiConnell in #1188
- Changed directory of all the blog images to google cloud storage. by @dat-a-man in #1156
Verified Sources
- postgres replication / CDC by @jorritsandbrink dlt-hub/verified-sources#392
New Contributors
- @b-per made their first contribution in #1132
- @MiConnell made their first contribution in #1188
- @maxfirman made their first contribution in #1026
Full Changelog: 0.4.7...0.4.8
0.4.7
Core Library
- Custom destinations with `@dlt.destination` decorator by @sh-rp in #1065
- A BigQuery custom destination supporting STRUCT data types by @sh-rp in #1107
- Built-in Streamlit rewrite, UI improvements, dark theme by @sultaniman in #1060
- fixes various edge cases with Incremental data deduplication, for ordered and unordered results #971 by @rudolfix in #1062
- Adds new `dlt.mark` marker to materialize table schemas without data by @rudolfix in #1122
- validates class instances in typed dict by @rudolfix in #1082
- feat(airflow): allow re-using sources in airflow wrapper by @IlyaFaer in #1080
- feat(core): drop default value for write disposition by @IlyaFaer in #1057
- splits pandas and arrow imports to fix pyarrow.compute missing by @rudolfix in #1112
- improve no schema upgrade path exception by @sh-rp in #1125
Docs
- docs(airflow): add description of new decompose methods by @IlyaFaer in #1072
- check embedded code blocks by @sh-rp in #1093
- docs(kafka): describe the possible sync issues by @IlyaFaer in #1100
- Docs: schema evolution by @dat-a-man in #1078
- Add example link to the custom destination page by @VioletM in #1120
Full Changelog: 0.4.6...0.4.7
0.4.6
Core Library
- feat(airflow): expose the Airflow runner method to create custom DAGs by @IlyaFaer in #1014
- removes sql alchemy dependency and port parts of URL class by @rudolfix in #1028
- Parallelize decorator - run many regular generators in parallel by @steinitzu in #965
- Add main entry point to support calling dlt as python module by @sultaniman in #1023
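The parallelize item above is about running several regular (non-async) generators at the same time. As a rough illustration of that idea only, the following standard-library sketch drains each generator in its own worker thread. It is not dlt's decorator or its scheduling logic: `drain_in_parallel` and the sample generators are hypothetical names, and unlike a real pipeline this version collects each generator's items fully instead of interleaving them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, List


def drain_in_parallel(
    factories: List[Callable[[], Iterator[Any]]], max_workers: int = 4
) -> List[List[Any]]:
    """Run each generator factory in a worker thread and return its items, in input order."""

    def drain(factory: Callable[[], Iterator[Any]]) -> List[Any]:
        # The generator body executes inside the worker thread that calls list()
        return list(factory())

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of `factories`, not completion order
        return list(pool.map(drain, factories))


def letters() -> Iterator[str]:
    yield from ["a", "b", "c"]


def numbers() -> Iterator[int]:
    yield from range(3)


results = drain_in_parallel([letters, numbers])
```

Threads help here mainly when the generators spend their time waiting on I/O (for example, API calls), which is the typical case for data extraction.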
Library Bugfixes
- fixes naive datetime bug in incremental by @rudolfix in #1020
- Import missing pyarrow compute for transforms on arrowitems by @sh-rp in #1010
- delete normalized package in case it already existed by @sh-rp in #1012
- fix(core): validation error with TTableHintTemplate by @IlyaFaer in #1039
- adds test case where payload data contains PUA unicode characters by @willi-mueller in #1053
- fix add_limit behavior in edge cases by @sh-rp in #1052
- adds row_order to Incremental - automatically stop taking data when out of range by @rudolfix in #1041
- Fix to serialize load metrics as list instead of a dictionary by @sultaniman in #1051
- fix import schema workflow by @sh-rp in #1013
- rollback all changes to live schemas when extraction fails by @sh-rp in #1013
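The `row_order` item above describes stopping early once an ordered stream of rows leaves the requested range. The sketch below shows that idea in plain Python, under the assumption that rows arrive in ascending cursor order. `take_in_range`, the `updated_at` field, and the exclusive upper bound are illustrative choices, not dlt's `Incremental` API.

```python
from itertools import takewhile
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def take_in_range(rows: Iterable[Row], cursor: str, start: Any, end: Any) -> Iterator[Row]:
    """Yield rows with start <= row[cursor] < end, assuming ascending order by cursor."""
    # Stop pulling from the source as soon as the first row at or past `end` is seen
    before_end = takewhile(lambda row: row[cursor] < end, rows)
    # Rows before `start` still have to be read, but are skipped
    return (row for row in before_end if row[cursor] >= start)


consumed: List[int] = []  # records how far the source was actually read


def source() -> Iterator[Row]:
    for value in [1, 2, 3, 4, 5, 6]:
        consumed.append(value)
        yield {"updated_at": value}


selected = list(take_in_range(source(), "updated_at", start=2, end=4))
```

Because the rows are ordered, the source is abandoned after the first out-of-range row; values 5 and 6 are never requested.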
Docs
- Fix zendesk example test by @VioletM in #1027
- Edit arrow-pandas.md and fix a typo by @Bl3f in #1001
- Added info about file compression to filesystem docs by @dat-a-man in #975
- Update "create destination" docs with new file layouts by @steinitzu in #1032
- Docs update on how to set query limits. by @dat-a-man in #973
- Docs/Updated for slack alerts. by @dat-a-man in #1042
Verified Sources
- scrape web sites with spiders and Scrapy and send data to dlt @sultaniman dlt-hub/verified-sources#332
- `sql_database` recognizes `end_value` and `row_order` to return rows in range and optionally ordered. Backfill and proper Airflow intervals support @rudolfix dlt-hub/verified-sources#388
New Contributors
Full Changelog: 0.4.5...0.4.6
0.4.5
Core Library
- enables google drive filesystem for sources and destinations (second one experimental, google drive listings are only eventually consistent!) by @IlyaFaer in #932
- creates parallel Airflow DAGs in airflow helper to allow many resources to be executed at once @IlyaFaer in #966
- 855 create bigquery adapter for dlt resources: easily configure partitions, clustering, data retention etc. by @Pipboyguy in #952 and https://dlthub.com/docs/dlt-ecosystem/destinations/bigquery#bigquery-adapter
- Use BIGNUMERIC for large decimals in bigquery by @steinitzu in #984
- Normalize keys for Google secrets config provider by @sultaniman in #963
- does not lowercase postgres and redshift database names by @rudolfix in #990
- Introduce `hard_delete` and `dedup_sort` columns hint for `merge` by @jorritsandbrink in #960 and https://dlthub.com/docs/general-usage/incremental-loading#delete-records
- adjustment of pua start in typed json encoding, pass through on decoding errors by @rudolfix in #974
- creates isolated parallel Airflow DAGs in airflow helper to execute resources in parallel in isolated pipelines @IlyaFaer in #979
- Fix annotation processing and rebuilding, mark dataclass as complex by @sultaniman in #980
- allows async functions to be decorated with dlt.source by @rudolfix in #985
- allows right pipe operator to feed simple lists into a transformer @rudolfix in #985
- allows pendulum datetime as incremental cursor when loading arrow tables @rudolfix in #985
- enables Python 3.12 (mind that not all extras have python 3.12 libraries!) @rudolfix in #985
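The `hard_delete` and `dedup_sort` item above concerns two things a merge has to decide: which of several incoming rows with the same key wins, and when a row should be removed from the destination. The following plain-Python sketch illustrates one reading of those ideas and is not dlt's implementation. `merge_batch`, the `seq` and `deleted` column names, and the rule "highest sort value wins" are assumptions made for the example; the linked docs define the actual hint semantics.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_batch(
    destination: Dict[Any, Row], batch: List[Row], key: str, sort_column: str, delete_column: str
) -> Dict[Any, Row]:
    """Merge a batch into a copy of the destination, deduplicating and applying deletes."""
    # 1. Deduplicate the batch: for each key keep the row with the highest sort value
    latest: Dict[Any, Row] = {}
    for row in batch:
        current = latest.get(row[key])
        if current is None or row[sort_column] > current[sort_column]:
            latest[row[key]] = row

    # 2. Apply the surviving rows: delete flagged keys, upsert the rest
    merged = dict(destination)
    for k, row in latest.items():
        if row.get(delete_column):
            merged.pop(k, None)  # row is marked for deletion
        else:
            merged[k] = {c: v for c, v in row.items() if c != delete_column}
    return merged


destination = {
    1: {"id": 1, "name": "old", "seq": 1},
    2: {"id": 2, "name": "keep", "seq": 1},
}
batch = [
    {"id": 1, "name": "newer", "seq": 2, "deleted": False},
    {"id": 1, "name": "newest", "seq": 3, "deleted": False},
    {"id": 2, "name": "keep", "seq": 2, "deleted": True},
]
result = merge_batch(destination, batch, key="id", sort_column="seq", delete_column="deleted")
```

In this run, key 1 ends up with the "newest" row because it has the highest `seq`, and key 2 disappears because its latest row carries the delete flag.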
Docs
- docs(filesystem): include Google Drive into filesystem tutorial by @IlyaFaer in #962
- Fix typos/grammar in tutorial docs by @taljaards in #972
- add blog post observability by @adrianbr in #989
- Update arrow-pandas.md by @snehangsude in #992
- Clarify info about GoodData in modelling tools article by @mhauzirek in #956
- Fix small typings in contributing guide by @VioletM in #993
- Docs/google sheets update by @dat-a-man in #976
- Added "Incremental Configuration" section to SQL Databases documentat… by @dat-a-man in #977
Verified Sources
- Bing Webmaster source by @willi-mueller
New Contributors
- @taljaards made their first contribution in #972
- @mhauzirek made their first contribution in #956
- @snehangsude made their first contribution in #992
- @VioletM made their first contribution in #993
Full Changelog: 0.4.4...0.4.5
0.4.4
Core Library
- passes incremental from apply hints to resource function by @rudolfix in #953
- Handle UnionType when checking is_union_type and is_optional_type by @sultaniman in #951
- yanks orjson to <=0.3.10 by @rudolfix in #958
Docs
- Databricks workspace setup docs by @steinitzu in #949
Verified Source
- allows for table reflection at runtime, column selection and buffer control in `sql_database` @rudolfix (dlt-hub/verified-sources#351)
Full Changelog: 0.4.3...0.4.4
0.4.3
Core Library
- Databricks destination by @steinitzu and @phillem15 in #892
- Synapse destination by @jorritsandbrink in #900
- BigQuery Partitioning Improvements by @Pipboyguy in #887
- enable async generators as resources by @sh-rp in #905
- fix: use truthy value in ternary since 0 cause div by zero by @z3z1ma in #902
- feat(filesystem): add compression flag if the read file is GZ by @IlyaFaer in #912
- Enhancements in Filesystem Configuration by @Pipboyguy in #869
- add mark function to emit resource hints from decorated function by @rudolfix in #938
- handles nested Pydantic models when generating dlt schema by @sultaniman in #901
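The async generators item above refers to Python's `async def` functions that `yield`. For readers unfamiliar with them, here is a minimal standard-library sketch of defining and consuming one. It uses only `asyncio` and does not show how dlt evaluates such resources; `fetch_pages` and `collect` are hypothetical names, and `asyncio.sleep(0)` stands in for real network I/O.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fetch_pages(page_count: int) -> AsyncIterator[Dict[str, Any]]:
    """Async generator: yields one item per page, awaiting between pages."""
    for page in range(page_count):
        await asyncio.sleep(0)  # placeholder for awaiting an HTTP response
        yield {"page": page}


async def collect(items: AsyncIterator[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Consume an async iterator into a list."""
    return [item async for item in items]


rows = asyncio.run(collect(fetch_pages(3)))
```

While one generator is awaiting I/O, the event loop is free to advance others, which is why async generators suit sources that make many slow requests.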
Docs
- Restructure intro, getting started and tutorial by @burnash in #702
- Update the release instructions in CONTRIBUTING.md by @burnash in #867
- Add explicit sub section about streamlit under getting started by @sultaniman in #884
- Examples: google sheets by @AstrakhantsevaAA in #846
- Added URL-parser documentation by @dat-a-man in #909
Verified Sources
- feat(filesystem): implement a csv reader with duckdb engine @IlyaFaer dlt-hub/verified-sources#319
- fix(notion): define payload within the while-loop @glebzhidkov (dlt-hub/verified-sources#338)
- sql alchemy + connector x example @rudolfix (dlt-hub/verified-sources#334)
- Shopify: Standalone resource for partner API queries @steinitzu (dlt-hub/verified-sources#329)
- sql-database: detect precision and scale of supported column types @steinitzu (dlt-hub/verified-sources#324)
- feat(sources.kafka): implement Kafka source @IlyaFaer (dlt-hub/verified-sources#306)
New Contributors
- @Pipboyguy made their first contribution in #869
- @sultaniman made their first contribution in #883
Full Changelog: 0.4.2...0.4.3
0.4.2
Core Library
- Fix the data type used in the `from_db_type()` method from `MsSqlTypeMapper` by @jorritsandbrink in #863
- Use Secret Manager in CI by @AstrakhantsevaAA in #859
- Move destination adapters to `dlt.destination.adapters` by @rudolfix in #854
Docs
- Improve HubSpot source docs by @IlyaFaer in #864
- Add new topic to docs: Destination; improve Configuration docs by @rudolfix in #861
Full Changelog: 0.4.1...0.4.2
0.4.1
Major release
This is a major `dlt` release (as per our semantic versioning: https://github.com/dlt-hub/dlt?tab=readme-ov-file#adding-as-dependency). It brings several interesting new features, including schema evolution control, data contracts, deeper Pydantic integration, parametrized destinations, improvements to parallelism and data lineage, and many more.
There are no significant breaking changes, but minor ones exist; please refer to #763 for details.
Core Library
- Parametrized destinations - import destinations from `dlt.destinations` module and instantiate them by @steinitzu in #746
- schema and data contracts by @sh-rp in #594
- load package id in extract step by @rudolfix in #790
- named destinations: configure many destinations with different names by @sh-rp in #783
- rich tracing information from pipeline steps (extract, normalize, load) by @rudolfix in #801
- adds exception stack to pipeline trace by @rudolfix in #806
- fixed attribute check: getuid -> geteuid by @jorritsandbrink in #823
- allows running parallel pipelines in separate threads by @rudolfix in #813
- 791 test mssql credentialspy is odbc driver 18 dependent by @jorritsandbrink in #834
- adds extract and normalize traces by @rudolfix in #839
Plus some tooling changes
- introduce black formatting by @sh-rp in #583
- Fix: ensure accessor typing does not make static type checker error by @z3z1ma in #785
- Hot fix: add skipifgithubfork to nested_data example by @AstrakhantsevaAA in #802
- Fix Windows lint issue and implement CI lint matrix strategy by @jorritsandbrink in #827
Docs
- documents schema and data contract by @rudolfix in #782
- Added Kinesis documentation. by @dat-a-man in #804
- 788 clarify docs intro by @deanja in #797
- Fix links to source code by @AstrakhantsevaAA in #805
- Clarify docs dev process by @deanja in #809
- Qdrant ingestion pipeline example eg by @hibajamal in #775
- Personio doc: added more endpoints by @AstrakhantsevaAA in #829
New Contributors
- @deanja made their first contribution in #797
- @IlyaFaer made their first contribution in #820
- @jorritsandbrink made their first contribution in #823
Full Changelog: 0.3.25...0.4.1