Describe the feature
see slack thread

Setting `concurrent_batches=true` for `incremental_strategy=microbatch` yields:
[WARNING]: Found 1 microbatch model with the `concurrent_batches` config set to true, but the databricks adapter does not support running batches concurrently. Batches will be run sequentially.
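For reference, a minimal sketch of the kind of model config that triggers this warning (model name, column names, and dates are hypothetical, assuming dbt 1.9+ microbatch syntax):

```sql
-- models/fct_events.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_ts',       -- assumed event-time column
    begin='2024-01-01',
    batch_size='day',
    concurrent_batches=true      -- the flag the warning refers to
  )
}}

select *
from {{ ref('stg_events') }}
```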
@benc-db followed the dbt guide to activate concurrency, which yielded:
[DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
Quoting him:
There are ways to allow concurrent writes, but it looks like they may be dependent on the actual schema of the table: link
Considering that the whole microbatch feature was developed specifically with large datasets in mind, concurrency would be really powerful.
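For context: the Delta Lake concurrency-control docs describe `ConcurrentAppendException` as a conflict between operations touching the same partition (or any part of an unpartitioned table, which matches the "root of the table" wording above), and suggest making the separation explicit in the table layout. A hedged sketch of what that might look like here, assuming dbt-databricks `partition_by` support and a hypothetical `event_date` column; this is an assumption, not a verified fix:

```sql
-- Same hypothetical model as above, with the table additionally partitioned
-- by the batch date so concurrent batch writes land in disjoint partitions.
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_ts',
    begin='2024-01-01',
    batch_size='day',
    concurrent_batches=true,
    partition_by='event_date'    -- assumption: per-batch partitions may avoid ConcurrentAppendException
  )
}}

select *, cast(event_ts as date) as event_date
from {{ ref('stg_events') }}
```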
Describe alternatives you've considered
Given an incremental batch of 5 days, we tried using Jinja to loop through these 5 days and union them (to avoid overloading memory with too many rows), but that got "optimized away" by the internal query optimizer. It also just adds annoying complexity on top of your models.
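A sketch of the kind of Jinja loop described above (model name, column, and dates are placeholders, not the actual code):

```sql
-- Hypothetical day-by-day union; the query optimizer collapses it back
-- into a single scan, so it does not actually bound memory per day.
{% set days = ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'] %}

{% for day in days %}
select *
from {{ ref('stg_events') }}
where cast(event_ts as date) = '{{ day }}'
{% if not loop.last %}union all{% endif %}
{% endfor %}
```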
Additional context
none
Who will this benefit?
Clients with large datasets or heavy operations, where the row-count bottleneck does not scale linearly but explodes past some threshold (depending on memory).
Are you interested in contributing this feature?
Yes, but I have no idea how to approach this topic.