support concurrent_batches for microbatch #914

Open
data-blade opened this issue Jan 27, 2025 · 0 comments
Labels
enhancement New feature or request

@data-blade

Describe the feature

See the Slack thread.

Setting `concurrent_batches=true` for `incremental_strategy=microbatch` yields this warning:

[WARNING]: Found 1 microbatch model with the `concurrent_batches` config set to true, but the databricks adapter does not support running batches concurrently. Batches will be run sequentially.
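To illustrate what the warning means by running batches "sequentially" rather than "concurrently", here is a conceptual sketch, not dbt's actual scheduler. It assumes daily batches over a half-open time window; `daily_batches`, `run_batch`, and `run_all` are hypothetical helpers, and `run_batch` only stands in for building and executing one batch's SQL.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_batches(start: date, end: date) -> list[tuple[date, date]]:
    """Split the half-open window [start, end) into one-day [lo, hi) batches."""
    batches = []
    current = start
    while current < end:
        nxt = current + timedelta(days=1)
        batches.append((current, min(nxt, end)))
        current = nxt
    return batches


def run_batch(window: tuple[date, date]) -> str:
    # Hypothetical stand-in for compiling and executing one batch's query.
    lo, hi = window
    return f"processed {lo.isoformat()} to {hi.isoformat()}"


def run_all(batches, concurrent: bool, max_workers: int = 4) -> list[str]:
    if not concurrent:
        # Sequential: the fallback the warning describes, one batch after another.
        return [run_batch(b) for b in batches]
    # Concurrent: batches are submitted to a thread pool; map() keeps input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_batch, batches))


if __name__ == "__main__":
    batches = daily_batches(date(2025, 1, 20), date(2025, 1, 25))
    print(len(batches))  # 5
    for line in run_all(batches, concurrent=True):
        print(line)
```

In the concurrent case, several batches write to the same target table at once, which is exactly where write conflicts like the one reported below can appear.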

@benc-db followed the dbt guide to enable concurrency, which yielded:

[DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.

Quoting him:

There are ways to allow concurrent writes, but it looks like they may be dependent on the actual schema of the table: link
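The error message itself says "Please try the operation again". A minimal sketch of that retry idea, with jittered exponential backoff, is below. Note this is a different technique from the schema-dependent approach in the quote, and it is not what the adapter does today. `ConcurrentAppendError` is a hypothetical stand-in for Delta's `ConcurrentAppendException`, and `with_retries` is a hypothetical helper; retries only help when conflicts are transient.

```python
import random
import time


class ConcurrentAppendError(Exception):
    """Hypothetical stand-in for Delta's ConcurrentAppendException."""


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a concurrent-append conflict, wait with jittered
    exponential backoff and try again, up to max_attempts calls in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentAppendError:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_write():
        # Hypothetical write that conflicts twice, then commits.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConcurrentAppendError("Files were added by a concurrent update")
        return "committed"

    print(with_retries(flaky_write, sleep=lambda s: None))  # committed
```

The `sleep` parameter is injectable so the backoff can be skipped in tests.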

Considering that the whole microbatch feature was developed specifically with large datasets in mind, concurrency would be really powerful.

Describe alternatives you've considered

Given an incremental batch of 5 days, we tried using Jinja to loop through those 5 days one at a time and then UNION the results (so as not to overload memory with too many rows), but the internal query optimizer "optimized away" the split. It also adds annoying complexity on top of your models.
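For reference, here is a Python sketch of the kind of query that Jinja loop produces: one SELECT per day, glued together with UNION ALL. The table `events` and column `event_ts` are hypothetical placeholders, and plain string formatting is used only for illustration (no escaping). Because every branch still ends up in a single statement, the engine is free to plan them together, which would be consistent with the split being "optimized away".

```python
from datetime import date, timedelta


def per_day_union_sql(table: str, ts_column: str, start: date, days: int) -> str:
    """Build one SELECT per day over [start, start + days) joined by UNION ALL.

    Mirrors the Jinja loop described above; table and column names are placeholders.
    """
    selects = []
    for i in range(days):
        lo = start + timedelta(days=i)
        hi = lo + timedelta(days=1)
        selects.append(
            f"SELECT * FROM {table} "
            f"WHERE {ts_column} >= '{lo.isoformat()}' AND {ts_column} < '{hi.isoformat()}'"
        )
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    print(per_day_union_sql("events", "event_ts", date(2025, 1, 20), 5))
```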

Additional context

none

Who will this benefit?

Clients with large datasets or heavy operations, where the row-count bottleneck does not scale linearly but explodes past some threshold (depending on available memory).

Are you interested in contributing this feature?

Yes, but I have no idea how to approach this topic.

@data-blade data-blade added the enhancement New feature or request label Jan 27, 2025