Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend job executor acquisition to work with rootProcessInstanceId #4004

Closed
1 task done
Tracked by #4003
danielkelemen opened this issue Dec 11, 2023 · 4 comments
Closed
1 task done
Tracked by #4003
Assignees
Labels
scope:core-api Changes to the core API: engine, dmn-engine, feel-engine, REST API, OpenAPI type:subtask Issues that are subtasks of another issue. Must always be part of the breakdown of the parent issue. version:7.21.0-alpha5 version:7.21.0

Comments

@danielkelemen
Copy link
Member

danielkelemen commented Dec 11, 2023

Acceptance Criteria (Required on creation)

  • Job executor acquisition is updated so that it can now work with the rootProcessInstanceId, allowing it to respect exclusiveness in tasks across sub-process instances.
    • Add new column in the runtime job table.
      • Column name: ROOT_PROCESS_INSTANCE_ in table ACT_RU_JOB. Add it for all the databases.
      • Add a database index on the new column for all databases
        • Indicative name: ACT_IDX_JOB_ROOT_PROCINST
        • Add it to create, drop & upgrade scripts
    • Adjust the query and java grouping logic.
      • Use processInstanceId as fallback in case the rootProcessInstanceId is null.
      • Make code and queries dynamic, so no unnecessary conditions are evaluated.
      • Acquisition logic should remain fast.
  • New engine config is introduced that enables the new behaviour. It's deactivated by default.
    • Add Feature flag jobExecutorAcquireExclusiveOverProcessHierarchies and set it to false by default
  • Changes are covered by tests.
    • Add a Unit Test for testing Job Acquisition given a Root Process that calls a Subprocess (1 level deep) with the feature disabled

    • Add a Unit Test for testing Job Acquisition given a Root Process that calls a Subprocess (1 level deep) with the feature enabled*

    • Add Unit Test for testing Job Acquisition given a Root Process that calls a Subprocess which calls another Subprocess (2 levels deep) with the feature enabled*

    • Add db-instance-migration Test which tests a a given Root Process that calls a Subprocess (1 level deep) created in 7.20

      • The test here will verify the behaviour of a null rootProcessId of a process definition, created before the feature and guarantee the acquisition works with the legacy behaviour when the feature is disabled by default
      • Also, the test verifies indirectly the defaulting of the feature flag
    • Add db-instance-migration Test which tests a given Root Process that calls a Subprocess (1 level deep) created in 7.20 with the feature enabled*

    • Feature enabled*: jobExecutorAcquireExclusiveOverProcessHierarchies=true

Hints

Links

Breakdown

Tasks

  1. ci:migration ci:oracle
    psavidis
@danielkelemen danielkelemen added type:subtask Issues that are subtasks of another issue. Must always be part of the breakdown of the parent issue. scope:core-api Changes to the core API: engine, dmn-engine, feel-engine, REST API, OpenAPI potential:7.21.0 labels Dec 11, 2023
@psavidis psavidis self-assigned this Mar 5, 2024
@psavidis
Copy link
Contributor

psavidis commented Mar 8, 2024

Investigation Update

Context: I've tried to add a test to db-migration-tests which starts two 7.20 process instances (which have a null rootProcessInstanceId at their respective jobs) and then use the handy AssertJobExecutor.

  • The scenario tries to spin up 2 root processes having 2 subprocesses each
    • When the test starts, in order for the test to verify the job acquisition order, it needs to mutate the inhereted ProcessEngineConfiguration to set the custom AssertJobExecutor, register it to the engine, register the engine to it and then build a new process engine
      • See SequentialJobAcquisitionTest#testJobAcquisitionWithCallActivitySubProcesses setup logic
    • The above is a hacky way to be able to have the ability to make assertions on the JobExecutor at the test level

Problems

  • The manipulation of the inherited ProcessEngineConfiguration for setting a custom JobExecutor & building a new engine does not work at the time when the executor tries to execute the first SQL query.
  • There are no metadata, API or external information that can be used to assume the acquisition order of the JobExecutor e.g an approach could be to use HistoryService to fetch metadata about the job execution (if it was available)
  • The rest of the db-migration-tests use the JobExecutor API to execute directly a job.
  • This use case is the first one where the interest is to use the JobExecutor directly so that it decides the execution order
  • The configuration cannot be initialised earlier to see if it is going to work because the AssertJobExecutor needs to be shared between the modules test-fixture-720 & test-migration
Here is the Exception [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 102.801 s <<< FAILURE! - in org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest [ERROR] testJobAcquisitionWithCallActivitySubProcesses(org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest) Time elapsed: 102.801 s <<< ERROR! org.camunda.bpm.engine.ProcessEnginePersistenceException: An exception occurred in the persistence layer. Please check the server logs for a detailed message and the entire exception stack trace. at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.areJobsAvailable(ExclusiveOverProcessHierarchiesTest.java:203) at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.lambda$waitForJobExecutorToProcessAllJobs$0(ExclusiveOverProcessHierarchiesTest.java:172) at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForCondition(ExclusiveOverProcessHierarchiesTest.java:188) at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForJobExecutorToProcessAllJobs(ExclusiveOverProcessHierarchiesTest.java:172) at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.testJobAcquisitionWithCallActivitySubProcesses(ExclusiveOverProcessHierarchiesTest.java:104) Caused by: org.apache.ibatis.exceptions.PersistenceException:

Error querying database. Cause: java.sql.SQLException: PooledDataSource: Unknown severe error condition. The connection pool returned a null connection.

The error may exist in org/camunda/bpm/engine/impl/mapping/entity/Job.xml

The error may involve org.camunda.bpm.engine.impl.persistence.entity.JobEntity.selectJobByQueryCriteria

The error occurred while executing a query

Cause: java.sql.SQLException: PooledDataSource: Unknown severe error condition. The connection pool returned a null connection.

    at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.areJobsAvailable(ExclusiveOverProcessHierarchiesTest.java:203)
    at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.lambda$waitForJobExecutorToProcessAllJobs$0(ExclusiveOverProcessHierarchiesTest.java:172)
    at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForCondition(ExclusiveOverProcessHierarchiesTest.java:188)
    at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForJobExecutorToProcessAllJobs(ExclusiveOverProcessHierarchiesTest.java:172)
    at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.testJobAcquisitionWithCallActivitySubProcesses(ExclusiveOverProcessHierarchiesTest.java:104)

Caused by: java.sql.SQLException: PooledDataSource: Unknown severe error condition. The connection pool returned a null connection.
at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.areJobsAvailable(ExclusiveOverProcessHierarchiesTest.java:203)
at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.lambda$waitForJobExecutorToProcessAllJobs$0(ExclusiveOverProcessHierarchiesTest.java:172)
at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForCondition(ExclusiveOverProcessHierarchiesTest.java:188)
at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.waitForJobExecutorToProcessAllJobs(ExclusiveOverProcessHierarchiesTest.java:172)
at org.camunda.bpm.qa.upgrade.scenarios7210.jobexecutor.ExclusiveOverProcessHierarchiesTest.testJobAcquisitionWithCallActivitySubProcesses(ExclusiveOverProcessHierarchiesTest.java:104)

@psavidis
Copy link
Contributor

psavidis commented Mar 11, 2024

Investigation Update

Context: (A) db-instance-migration tests
Problem: Using a custom job executor (AssertJobExecutor) for testing the acquisition details is problematic
Why: The tests are designed to be given a config and an engine and be executed against it.

  • To test acquisition, this initialisation needs to be done during the test, hacking the given engine and config.
  • This did not work.

Solution: An alternative approach to overcome the limitations of the db migration tests is to use the AcquiredJobsCmd instead and check the jobIdBatches directly.

  • This might not be ideal compared to a blackbox test but it is still a good-enough approach to overcome the limitations imposed by the migration tests.
  • The side-effects created by the execution of AcquireJobsCmd e.g locking jobs can be bypassed by unlocking the jobs that the command locks by preserving the state before and restoring it after a test
    • As a result, the jobs will be unlocked again for the next test

Context: (B) Unit Test having the new configuration disabled have more acquisition entries than the expected (5 instead of 4)
Problem: When the feature is disabled, the job acquisition would select the same job twice
Why: The core-threadpool-size of DefaultJobExecutor is set to 3. As a result, when threads synchronisation is not affected by the debugger, besides the first thread additional threads try to lock the same job entries and an OptimisticLockingException is thrown.
Solution: Configure the core-thread-pool of the AssertJobExecutor to 1 to avoid that issue for testing purposes.

@psavidis
Copy link
Contributor

psavidis commented Mar 12, 2024

Investigation Update

Problem (A):

  • The approach to use the Clock for fast forwarding time so that the lockExpirationTime of the job entries belongs to the past and those jobs are available again for selection did not work.
  • The approach to maintain the state of the jobs before and after the AcquireJobsCmd execution and then compare it against the current to determine the locked jobs to be unlocked worked.

@psavidis psavidis assigned yanavasileva and unassigned psavidis Mar 15, 2024
@psavidis psavidis assigned yanavasileva and unassigned psavidis Mar 20, 2024
psavidis added a commit that referenced this issue Mar 22, 2024
Context: Exclusive jobs that originate from process hierarchies (processes which contain multi-instance subprocesses) can now be executed exclusively.

Why: The exclusive execution would only be applied on tasks that originate at a root level. Any subprocess spawned by root processes would not be correlated and considered for exclusive execution.

Changes: See below the changes of this feature

- New column `ROOT_PROC_INST_ID_ ` is introduced to correlated a process instance with its root parent
    - The column is added to all supported databases & `7.20_to_7.21` migration scripts.
- The query `selectNextJobsToExecute` is enriched to consider the root process instance id when the feature is enabled.
- The feature flag `jobExecutorAcquireExclusiveOverProcessHierarchies` enables / disables the feature.
- The feature is disabled by default for backwards compatibility with the legacy behaviour.

Tests:

- Unit tests under `ExclusiveJobAcquisitionTest` which cover:
    - Legacy behaviour with the feature disabled
    - Feature behaviour with the feature enabled & a 1-level deep process hierarchy (process which spawns a subprocess)
    - Feature behaviour with the feature enabled & a 2-level deep process hierarchy (process which spawns a subprocess - which spawns another subprocess)

- Migration tests under `ExclusiveOverProcessHierarchiesTest` which cover:
    - How the job acquisition and execution (see `AcquireJobsCmd`) behave with existing 7.20 process instances when the feature is disabled (legacy behaviour and backwards compatibility)
    - How the job acquisition and execution (see `AcquireJobsCmd`) behave with existing 7.20 process instances when the feature is enabled (feature behaviour)
    - The tests need to select all the jobs and lock them to perform their logic. After the test execution the jobs are unlocked.

Other Notable Changes: 

- Fixed flakiness of `DecisionDefinitionTest`
    - The Test class was susceptible to timezone changes due its test data using the current date; as a result, depending on the date of execution, changing to DST could fail the tests.
    - This behaviour is fixed by adjusting the test data to use a fixed past date, rendering the test executions immune to date of execution.
- The waiting behaviour on jobs for the tests to rely on the scheduler, shared by `SequentialJobAcquisitionTest`, `ExclusiveJobAcquisitionTest` is extracted into class `JobExecutorWaitUtils`

Co-authored-by: daniel.kelemen
Co-authored-by: yanavasileva
Co-authored-by: petros.savvidis

Related-to: #4004 , #4003
@psavidis
Copy link
Contributor

Merged feature to master. Closing this subtask.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
scope:core-api Changes to the core API: engine, dmn-engine, feel-engine, REST API, OpenAPI type:subtask Issues that are subtasks of another issue. Must always be part of the breakdown of the parent issue. version:7.21.0-alpha5 version:7.21.0
Projects
None yet
Development

No branches or pull requests

4 participants