Fix concurrent transition double-execution and ungraceful locking#13
Merged
Conversation
…xecution LockManager::acquire() now treats the unique index as the atomic acquire: a concurrent INSERT collision (unique violation) or a transient deadlock maps to a graceful null (locked) instead of an uncaught exception. The unconditional global expired-lock sweep is removed from the hot path (it caused InnoDB gap-lock deadlocks under load); this entity's own expired lock is reclaimed on demand via a scoped point delete. WorkflowBehavior re-reads the persisted state field while holding the lock, so a caller acting on a stale in-memory entity can no longer re-apply a transition another caller already applied (lost update / double execution of commands). Adds a regression test proving a stale entity is blocked after a concurrent update.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13 +/- ##
============================================
+ Coverage 70.34% 70.41% +0.06%
- Complexity 1123 1144 +21
============================================
Files 62 62
Lines 4020 4076 +56
============================================
+ Hits 2828 2870 +42
- Misses 1192 1206 +14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This was referenced May 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes findings 1 & 2 of #12.
Problem
Under concurrent access, transitions were not safe:
transition()calls on the same entity ran the transition's command 3–13 times (scaling with contention) instead of once.StateMachineEngine::getCurrentState()reads the in-memory state and the engine never re-reads/row-locks the DB; the behaviour lock is released after each transition, so any caller that loaded the row in the originalfromstate re-applies the transition once it takes its turn on the lock.QueryException: Duplicate entry ... workflow_locks_uniqueinstead ofTransitionResult::locked(), becauseLockManager::acquire()did check-then-save()and never caught the unique collision. The unconditionaldeleteAll([workflow, table, id])between the check and the insert (no expiry filter) could also delete a concurrent fresh lock.Fix
LockManager:(workflow_name, entity_table, entity_id)is treated as the real mutex; theINSERTis the atomic acquire. A unique-violation collision now returnsnull(→locked) instead of throwing.DELETEon the lock table under high insert concurrency causes InnoDB gap-lock deadlocks). This entity's own expired lock is reclaimed on demand via a scoped point delete; cluster-wide GC remains available viadeleteExpired().null(try again later).WorkflowBehavior:refreshWorkflowState()), so the engine evaluates the transition against the authoritative current state. A stale caller is then correctlyblocked. Only the state field is touched and its dirty flag cleared sobeforeSavedoes not treat the refresh as a direct state change; no-op for unpersisted entities.Verification
Reproduced and verified with a multi-process harness on MySQL 8 / InnoDB (100 and 200 parallel workers transitioning the same entity through a guarded, side-effecting transition):
locked/blockedAdded
WorkflowBehaviorConcurrencyTestas a single-process regression (a stale handle is blocked after a concurrent update); it fails without the behaviour change and passes with it. Full suite green (297 tests);cs-checkclean;stanclean for the changed files (one unrelated pre-existingTimeoutSchedulernotice on PHP 8.4 remains, addressed separately).Note on the 5.4 Lock component
cakephp/cakephp#19370 (5.4) adds
Lock::synchronized(), a clean atomic distributed mutex that would be a good future backend forLockManager(and resolves the lock-acquire concerns natively). It does not by itself address the stale in-memory state (therefreshWorkflowStatepart is still required regardless of lock backend). This PR is a 5.2/5.3-compatible fix using the existingworkflow_lockstable; migrating the backend to the Lock component can follow once 5.4 is the baseline.