Describe the feature
see slack thread

Setting `concurrent_batches=true` for `incremental_strategy=microbatch` yields:
[WARNING]: Found 1 microbatch model with the `concurrent_batches` config set to true, but the databricks adapter does not support running batches concurrently. Batches will be run sequentially.
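For reference, a minimal sketch of the kind of model config that triggers this warning (model name, column names, and dates are hypothetical, assuming dbt 1.9+ microbatch syntax):

```sql
-- models/fct_events.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_ts',       -- assumed event-time column
    begin='2024-01-01',
    batch_size='day',
    concurrent_batches=true      -- the flag the warning refers to
  )
}}

select *
from {{ ref('stg_events') }}
```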
@benc-db followed the dbt guide to activate concurrency, which yielded:
[DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
Quoting him:
There are ways to allow concurrent writes, but it looks like they may be dependent on the actual schema of the table: link
Considering that the whole microbatch feature was developed specifically with large datasets in mind, concurrency would be really powerful.
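For context: the Delta Lake concurrency-control docs describe `ConcurrentAppendException` as a conflict between operations touching the same partition (or any part of an unpartitioned table, which matches the "root of the table" wording above), and suggest making the separation explicit in the table layout. A hedged sketch of what that might look like here, assuming dbt-databricks `partition_by` support and a hypothetical `event_date` column; this is an assumption, not a verified fix:

```sql
-- Same hypothetical model as above, with the table additionally partitioned
-- by the batch date so concurrent batch writes land in disjoint partitions.
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_ts',
    begin='2024-01-01',
    batch_size='day',
    concurrent_batches=true,
    partition_by='event_date'    -- assumption: per-batch partitions may avoid ConcurrentAppendException
  )
}}

select *, cast(event_ts as date) as event_date
from {{ ref('stg_events') }}
```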
Describe alternatives you've considered
Given an incremental batch of 5 days, we tried using Jinja to loop through these 5 days and union them (to avoid overloading memory with too many rows), but that got "optimized away" by the internal query optimizer. It also just adds annoying complexity on top of your models.
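A sketch of the kind of Jinja loop described above (model name, column, and dates are placeholders, not the actual code):

```sql
-- Hypothetical day-by-day union; the query optimizer collapses it back
-- into a single scan, so it does not actually bound memory per day.
{% set days = ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'] %}

{% for day in days %}
select *
from {{ ref('stg_events') }}
where cast(event_ts as date) = '{{ day }}'
{% if not loop.last %}union all{% endif %}
{% endfor %}
```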
Additional context
none
Who will this benefit?
Clients with large datasets or heavy operations, where the row-count bottleneck does not scale linearly but explodes past some threshold (depending on memory).
Are you interested in contributing this feature?
Yes, but I have no idea how to approach this topic.