Summary and Impact to Customers
On Monday, 6 June, SYNAQ Securemail experienced delays in the delivery of inbound mail. The issue lasted from 8:50am to 12:45pm.
As a result, inbound mail was delivered approximately 2 hours late.
Root Cause and Solution
The root cause of this event was a bug in the firmware on one of our core database servers. The bug affected the server's thermal management component, which caused the CPU to overheat. As a result, the database could not perform optimally, and inbound mail was delayed by approximately 2 hours.
To resolve the issue, our engineers failed over the affected query load to the slave database. Queries were then processed efficiently again, which allowed the backlog of mail to be delivered. In parallel, we logged an incident ticket with our upstream hardware provider to investigate the issue. They identified the bug in the existing firmware of the affected core database server and recommended a firmware upgrade, which was applied promptly.
Remediation Actions
• Lower the temperature alert thresholds in the monitoring of our database servers so that a rise in temperature is detected earlier.
• Investigate alternative methods to improve the performance of our applications so that the load on the CPU is reduced, allowing our core databases to function at optimal levels.
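
As an illustration of the first remediation action, the following is a minimal sketch of a two-level temperature check. It is not SYNAQ's actual monitoring configuration: the names TempThresholds and classify_temperature, and all threshold values, are hypothetical and chosen only to show how lowering the thresholds makes the same readings raise an alert sooner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TempThresholds:
    """Hypothetical alert levels in degrees Celsius (illustrative only)."""
    warning_c: float   # early-warning level
    critical_c: float  # level at which an engineer would be paged


def classify_temperature(reading_c: float, thresholds: TempThresholds) -> str:
    """Return 'ok', 'warning', or 'critical' for one CPU temperature reading."""
    if reading_c >= thresholds.critical_c:
        return "critical"
    if reading_c >= thresholds.warning_c:
        return "warning"
    return "ok"


# Assumed values for the sketch; real thresholds depend on the hardware.
previous = TempThresholds(warning_c=85.0, critical_c=95.0)
lowered = TempThresholds(warning_c=70.0, critical_c=85.0)

# A made-up sequence of rising CPU temperatures.
readings = [62.0, 71.5, 78.0, 86.0]

print([classify_temperature(r, previous) for r in readings])
print([classify_temperature(r, lowered) for r in readings])
```

With the lowered thresholds, the second reading already produces a warning, whereas the previous thresholds stay silent until the last reading.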