-
-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TASK: Overhaul CatchUpHook error behaviour; At least once delivery for ERROR projections #5392
TASK: Overhaul CatchUpHook error behaviour; At least once delivery for ERROR projections #5392
Conversation
…tion is persisted. Even in error case.
Replaces: neos#5376 We agreed on using the default timeout for `FOR UPDATE` locks in the database. This test only failed previously because of the `NOWAIT`. The `NOWAIT` was only needed when we had a lot of catchup processes which always did a for update for EACH event which was not a good combination. Now when getting the lock once the default wait of the database is more sane and easier to implement than fetching the lowest position and always waiting.
CASE: $this->dbal->isTransactionActive() === false
This reverts commit dc5ff10. We discussed that we dont want to ensure exactly once delivery for external projections. Adding transaction logic via traits for the projections increases logic there. And while with alot of effort we could bring exactly once delivery to work for most cases php could still decide to die in the small timeframe where we commit the one transaction and then the second. So there is no gurantee except when using a dedicated store: neos#5377
…ernal projections see 175ab4c
… not put the projection in ERROR state ... instead a full rollback is done (which might be still changed) - calls "internals" of `ProjectionSubscriber` directly from the subscription engine - inline `handleEvent` method
updates bfb4655 to not do a full rollback if catchup hooks fail but continue and collect their errors, which will be rethrown later with the advantage that the catchup worked!
…ersistence manager
Previously the `CatchUpHookInterface` contained a `onBeforeBatchCompleted` which was explained as follows: This hook is called directly before the database lock is RELEASED It can happen that this method is called multiple times, even without having seen Events in the meantime. If there exist more events which need to be processed, the database lock is directly acquired again after it is released. ---------- `onAfterBatchCompleted` behaves similar just that it runs when the lock was released. This allows us to run all the tasks that `onAfterCatchUp` could also do (using possibly transactions and commiting work) We dont ensure that `onAfterBatchCompleted` is only called if there are more events to handle, as this would complicate the code and technically without acquiring a lock there is never a guarantee for that, so hooks might as well encounter the case more often and have to deal with that.
FYI, while we decided against exactly once delivery for external projections, one could still use a decorator like below. That also allows the original test c47d181 to pass: /** @before */
public function injectExternalFakeProjection(): void
{
$entityManager = $this->getObject(EntityManagerInterface::class);
if (!isset(self::$secondConnection)) {
self::$secondConnection = DriverManager::getConnection(
$entityManager->getConnection()->getParams(),
$entityManager->getConfiguration(),
$entityManager->getEventManager()
);
}
$this->secondFakeProjection = new DebugEventProjection(
sprintf('cr_%s_debug_projection', self::$contentRepositoryId->value),
self::$secondConnection
);
FakeProjectionFactory::setProjection(
'second',
ExternalDbalProjection::decorateForConnection(
$this->secondFakeProjection,
self::$secondConnection
)
);
} final readonly class ExternalDbalProjection implements ProjectionInterface
{
public function __construct(
private ProjectionInterface $decorated,
private Connection $dbal,
) {
}
public static function decorateForConnection(
ProjectionInterface $projection,
Connection $dbal,
): self {
return new self($projection, $dbal);
}
public function setUp(): void
{
$this->decorated->setUp();
}
public function status(): ProjectionStatus
{
return $this->decorated->status();
}
public function apply(EventInterface $event, EventEnvelope $eventEnvelope): void
{
if ($this->dbal->isTransactionActive()) {
throw new \RuntimeException(sprintf('Expected no transaction to be active while handling decorated apply for %s', $this->decorated::class), 1733255315);
}
$this->dbal->transactional(fn () => $this->decorated->apply($event, $eventEnvelope));
}
public function getState(): ProjectionStateInterface
{
return $this->decorated->getState();
}
public function resetState(): void
{
$this->decorated->resetState();
}
} but i decided against putting that in the source code - not even for testing. Instead there is an
|
Similarly to that doctrine migrations (`flow doctrine:migrate`) are also cannot be rollback in case of an error So we cannot wrap projection->setUp in transaction and errors (like in a migration) will be directly commited. Thus, it must be acted with care.
see neos#5321 (comment) the save-point will only be used for REAL projection errors now and never rolled back if catchup errors occur. With that change in code the save-points are less important because a real projection error should better be thrown at the start before any statements, and even if some statements were issued and a full rollback is done its unlikely that a reactivateSubscription helps that case. Instead, to repair projections you should replay
6423dc6
into
neos:feature/4746-rework-catchup-mechanism-3
Adjustments for #5321
This pr originated after discussing NOT do do central locking #5383 and to simplify the save point creation.
A full list of discussed and here implemented changes can be seen here:
#5321 (comment)
The new catchup hook error behaviour - implemented via this pr - is documented in #4908 (comment) with great detail.
Additional technical information
Introduce batching in subscription engine
The replay uses a batch size of
500
(seeContentRepositoryMaintainer
)The regular catchup does not batch to ensure the catchup is run completely and any errors don't stop it
Batching does not respect the causation identifier, meaning a parent node event might be handled but not its tethered node creation event if the batch stops there - but the current 9.0 state is not different.
Introduce onAfterBatchCompleted hook
Previously the
CatchUpHookInterface
contained aonBeforeBatchCompleted
which was explained as follows:onAfterBatchCompleted
behaves similar just that it runs when the lock was released. This allows us to run all the tasks thatonAfterCatchUp
could also do (using possibly transactions and commiting work)We dont ensure that
onAfterBatchCompleted
is only called if there are more events to handle, as this would complicate the code and technically without acquiring a lock there is never a guarantee for that, so hooks might as well encounter the case more often and have to deal with that.Behaviour for catchup hooks and the same connection
Introduce test to assert behaviour when catchup hooks use the persistence manager (
CatchUpHookWithPersistenceTest
)While not pretty, we rely on this working also for
onAfterEvent
in the redirect handler:see https://github.com/neos/redirecthandler-neosadapter/blob/8c8f0cc6f63a030201f01cf16a7f70079149314a/Classes/Service/NodeRedirectService.php#L167
Commit the transaction during the catchup hook is evil and will not be respected. In that case the projection will be ahead of its position and an doctrine error will be thrown as no transaction is there to be committed anymore. Only a full replay can solve this if done.
We discussed that it might be sensible to use a dedicated connection for the full content repository - though for that we need some kind of abstract connection concept first.
Upgrade instructions
Review instructions
Checklist
FEATURE|TASK|BUGFIX
!!!
and have upgrade-instructions