Feature: purely in-memory (non-database) DataManager implementations for various entity types #3531

Closed

Conversation

ikaakkola
Contributor

@ikaakkola ikaakkola commented Nov 23, 2022

(Fixes #3532)

This PR adds a new module ('flowable-inmemory-data') that provides in-memory, non-database DataManager implementations for activities, executions, variables, jobs, event subscriptions and identity links.

The in-memory DataManagers are based on concurrent maps and are fast compared to their MyBatis counterparts (4-5x faster than an H2 or HSQLDB in-memory database, even for simple processes). They do not suffer from optimistic locking failures, but they provide no true rollbacks (see Limitations below).
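
For illustration only, here is a minimal sketch of what a concurrent-map-backed store can look like. It assumes a single hypothetical entity type; `SketchEntity`, `MapBackedStore`, and their methods are invented for this example and are not the classes in this module.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical stand-in for a runtime entity such as an execution or a variable.
final class SketchEntity {
    final String id;
    final String processInstanceId;

    SketchEntity(String id, String processInstanceId) {
        this.id = id;
        this.processInstanceId = processInstanceId;
    }
}

// Hypothetical map-backed store (assumption: entities are keyed by their id).
final class MapBackedStore {
    private final Map<String, SketchEntity> entities = new ConcurrentHashMap<>();

    void insert(SketchEntity entity) {
        entities.put(entity.id, entity);
    }

    SketchEntity findById(String id) {
        return entities.get(id);
    }

    void delete(String id) {
        entities.remove(id);
    }

    // A query becomes a scan over the map values instead of an SQL statement.
    List<SketchEntity> findBy(Predicate<SketchEntity> filter) {
        List<SketchEntity> result = new ArrayList<>();
        for (SketchEntity entity : entities.values()) {
            if (filter.test(entity)) {
                result.add(entity);
            }
        }
        return result;
    }
}
```

A lookup by process instance would then be something like `store.findBy(e -> e.processInstanceId.equals("p1"))`. A real implementation would likely keep secondary indexes for frequent queries rather than scanning every value.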

The intended use of the high-performance in-memory DataManagers is "lambda"-style execution of BPMN processes, where the individual processes are very short-lived, usually synchronous, and have few or no external integrations, but the number of processes executed is high (we are running this implementation with 100+ process instances/second).

Limitations

The DataManager implementations cover entities of active processes only. There are no history implementations of the in-memory DataManagers, so any BPMN engine running these DataManagers should always set historyLevel to NONE.

The DataManagers do not support Native SQL queries at all. Any Native SQL query against any of the implementations will simply throw an exception.
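
As a rough sketch of that behavior, a native query entry point can fail fast. The class name, method name, and the choice of `UnsupportedOperationException` are assumptions made for this example; the exception type the module actually throws is not stated here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical manager showing only the native query entry point.
final class NativeQueryRejectingManager {

    // Assumption: native queries arrive as a parameter map, and the result
    // would be a list of entities (simplified to String ids here).
    List<String> findByNativeQuery(Map<String, Object> parameters) {
        throw new UnsupportedOperationException(
                "Native SQL queries are not supported by in-memory data managers");
    }
}
```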

The implementation has partial support for transactions: inserts and deletes are rolled back, but updates are left as-is.
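
The partial rollback semantics can be sketched roughly as follows. This is not the module's code: `MutableEntity`, `PartialRollbackSession`, and the bookkeeping strategy are hypothetical and only illustrate why inserts and deletes can be undone cheaply while in-place updates cannot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical entity whose state is mutated in place by updates.
final class MutableEntity {
    final String id;
    String state;

    MutableEntity(String id, String state) {
        this.id = id;
        this.state = state;
    }
}

// Hypothetical unit of work over a shared map-based store.
final class PartialRollbackSession {
    private final Map<String, MutableEntity> store;
    private final Set<String> insertedIds = new HashSet<>();
    private final List<MutableEntity> deleted = new ArrayList<>();

    PartialRollbackSession(Map<String, MutableEntity> store) {
        this.store = store;
    }

    void insert(MutableEntity entity) {
        store.put(entity.id, entity);
        insertedIds.add(entity.id);
    }

    void delete(String id) {
        MutableEntity removed = store.remove(id);
        // An entity inserted in this same session needs no restore on rollback.
        if (removed != null && !insertedIds.remove(id)) {
            deleted.add(removed);
        }
    }

    // Updates change the stored object directly; no copy of the old state is kept.
    void update(String id, String newState) {
        MutableEntity entity = store.get(id);
        if (entity != null) {
            entity.state = newState;
        }
    }

    void commit() {
        insertedIds.clear();
        deleted.clear();
    }

    void rollback() {
        for (String id : insertedIds) {
            store.remove(id);            // undo inserts
        }
        for (MutableEntity entity : deleted) {
            store.put(entity.id, entity); // undo deletes
        }
        commit();                         // reset the bookkeeping
    }
}
```

Undoing updates would require snapshotting each entity before its first modification, which is the part this sketch, like the limitation described above, leaves out.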

Implementation details

The implementation is a separate module because some of the current modules "cheat" and access data from other modules directly at the database level. This means that, for example, the Timer JobDataManager cannot be implemented as part of 'flowable-job-service', as it needs access to process definition classes and the repository service, which the 'flowable-job-service' module does not have. Having this as a separate module also makes it clear that it is (somewhat) a custom feature that standard BPMN execution environments are probably never expected to use.

The entity types that this implementation supports are the ones that our testing has shown to receive the most database operations, and as such provide the most performance benefit (ByteArrayDataManager is the one that is still left out of this feature for now). There is no technical reason not to support all the DataManagers, but implementing them is a rather slow process, and things like deployments and definitions do not see that many database operations.

Check List:

  • Unit tests: YES
  • Documentation: NO

@ikaakkola ikaakkola force-pushed the feature/memory-data-managers branch 2 times, most recently from e04bdb6 to f5e84c0 Compare January 27, 2023 13:05
@ikaakkola ikaakkola changed the title Feature: purely in-memory (non-database) DataManager implementations for various entity types WIP - Feature: purely in-memory (non-database) DataManager implementations for various entity types Jan 27, 2023
@ikaakkola ikaakkola marked this pull request as draft January 27, 2023 15:12
@ikaakkola
Contributor Author

ikaakkola commented Jan 27, 2023

I will rebase this onto flowable-6.8.0 master; marking as draft for now.

@ikaakkola ikaakkola force-pushed the feature/memory-data-managers branch 3 times, most recently from 202f96d to 49c54c0 Compare January 28, 2023 09:04
@ikaakkola ikaakkola changed the title WIP - Feature: purely in-memory (non-database) DataManager implementations for various entity types Feature: purely in-memory (non-database) DataManager implementations for various entity types Jan 31, 2023
@ikaakkola ikaakkola marked this pull request as ready for review January 31, 2023 09:46
@ikaakkola
Contributor Author

@tijsrademakers @dbmalkovsky any comments on whether you are interested in this feature?

Would it make more sense for you if only the changes to other parts of Flowable (required for this feature to function) were included in a PR, and I maintained this module independently?

@tijsrademakers
Contributor

Hi @ikaakkola, thanks for your question. We will have a discussion about it in the coming days, and then update this PR with feedback on the feature.

@ikaakkola ikaakkola force-pushed the feature/memory-data-managers branch from fef4b99 to 14317cc Compare August 1, 2023 06:15
A new module providing high-performance non-SQL in-memory data managers for various entity types.

The implementation is based on concurrent maps serving as the storage layer of the entities. This
removes the chance of locking exceptions entirely and increases the performance of process execution
significantly (2-5x faster compared to H2 or HSQLDB even for simple processes), but the managers
do not support (full) transactions. They implement a simple rollback / commit strategy where things
like deadletter jobs work, and items are removed from memory on rollback, but many parts of true
transactions, like rolling back updated objects, are not yet implemented.

Any data manager that does not have a non-SQL implementation will keep using the existing
database-based data managers. When running the non-SQL in-memory data managers, it makes most sense
to also run an in-memory database (e.g. H2 or HSQLDB) instead of a real database. The database is
used for things like process deployments and definitions, byte arrays and other entities that either
see limited performance improvements from memory data managers, or are complicated to implement due
to references to table data on the database level.

The in-memory data managers are for runtime processes only. It is intended that an executor using
them has history set explicitly to NONE. If history is enabled, the benefits of the in-memory
data managers vanish because each process still needs to do database operations to store history.
@ikaakkola ikaakkola force-pushed the feature/memory-data-managers branch from 14317cc to 01cee69 Compare August 1, 2023 08:27
@zhenchuan9r

Hi @ikaakkola, thanks for your question. We will have a discussion about it in the coming days, and then update this PR with feedback on the feature.

Please let me know your opinion on this PR.
We had similar optimization targets but did it in a different way.

Ilkka Kaakkola added 3 commits December 20, 2023 11:07
Implement missing DataManager query options and implement new Flowable 7.1.0-SNAPSHOT specific features
@ikaakkola
Contributor Author

I will opt to release and maintain this feature as a separate module. To help achieve that, I will create a few PRs that make it easier to override the DataManagers without resorting to reflection.

Successfully merging this pull request may close these issues.

Feature: In-Memory (non-database, non-sql) DataManagers
4 participants