Root Cause
The issue was caused by database locking during queue processing: concurrent jobs attempted to update order lines individually, one row at a time. This caused lock contention on the affected records and left orders stuck in a confirming state.
The deployment the following evening included changes to optimise these updates and prevent recurrence.
Resolution
- Locked database processes were cleared.
- Queue processing was restored.
- A code update was deployed to replace the row-by-row updates that caused the lock contention.
Unexpected Consequences
- Affected orders were transitioned to approved status instead of expired.
Preventative Actions
- Refactored order line updates to execute as a single database query instead of individual updates.
- Reduced the risk of database lock contention.
- Improved internal monitoring of queue processing and database locks.
- Reviewed the incident communication process to ensure operational impacts (e.g. payment clearing behaviour) are clearly communicated in future.
- Ongoing development work includes rewriting parts of the core order flow, with a focus on reliability and fault tolerance.
- Ongoing development work also includes better visibility, recovery options and proactive notifications for venues.
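The change from individual updates to a single database query can be illustrated with a minimal sketch. It assumes a hypothetical `order_lines` table with `order_id` and `status` columns and uses Python's built-in `sqlite3` module only so the example runs on its own. The production schema, database engine and locking behaviour are not described in this report. The row-by-row version issues one `UPDATE` per order line, while the single-query version updates every line of an order in one statement, which generally reduces the number of statements competing for the same records.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical order_lines table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, status TEXT)"
    )
    with conn:  # commit the sample rows
        conn.executemany(
            "INSERT INTO order_lines (order_id, status) VALUES (?, 'pending')",
            [(1,), (1,), (1,), (2,)],
        )
    return conn


def confirm_lines_row_by_row(conn, order_id):
    """Pattern being replaced: one UPDATE statement per order line."""
    line_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM order_lines WHERE order_id = ?", (order_id,)
        )
    ]
    with conn:  # commit on success, roll back on error
        for line_id in line_ids:
            conn.execute(
                "UPDATE order_lines SET status = 'confirmed' WHERE id = ?", (line_id,)
            )
    return len(line_ids)


def confirm_lines_single_query(conn, order_id):
    """Replacement pattern: one set-based UPDATE covering every line of the order."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE order_lines SET status = 'confirmed' WHERE order_id = ?",
            (order_id,),
        )
    return cur.rowcount  # number of order lines updated


if __name__ == "__main__":
    conn = make_demo_db()
    print(confirm_lines_single_query(conn, 1))  # prints 3
```

Both functions leave the data in the same state; the difference is the number of statements executed per order, which is what the refactor targets.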
Current Status
The issue has been resolved and we are confident the fix prevents recurrence of this specific locking condition.
We apologise for the inconvenience caused and are committed to improving both system stability and communication during incidents.