IS Securemail - Email Delays
Incident Report for SYNAQ
Postmortem

Summary and Impact to Customers

On Monday, 6 June 2016, SYNAQ Securemail experienced inbound mail delays from 08:50 to 12:45 CAT.

As a result, the delivery of inbound mail was delayed by approximately 2 hours.

Root Cause and Solution

The root cause of this event was a bug in the firmware of one of our core database servers. The bug affected the server's thermal management component, which caused the CPU to overheat. This degraded the database's performance and delayed inbound mail by approximately 2 hours.

To resolve the issue, our engineers failed the affected query load over to the slave database. Query processing then returned to normal, which allowed the backlog of mail to be delivered. In parallel, we logged an incident ticket with our upstream hardware provider. The provider identified the bug in the affected server's existing firmware and recommended a firmware upgrade, which was applied promptly.

Remediation Actions

• Lower the temperature monitoring thresholds on our database servers so that rising temperatures are detected earlier.

• Investigate ways to improve the performance of our applications and reduce CPU load, so that our core databases can operate at optimal levels.

Posted Nov 07, 2016 - 16:59 CAT

Resolved
All mail is now processing live. Delays are less than 5 minutes.
Posted Jun 06, 2016 - 12:45 CAT
Update
Update on delay duration: ETA for the full backlog to be cleared is 1 hour. New email is still being processed with a five-minute delay.
Posted Jun 06, 2016 - 11:47 CAT
Update
Update on delay duration: ETA for the full backlog to be cleared is 2 hours. New email is being processed with a five-minute delay.
Posted Jun 06, 2016 - 10:46 CAT
Monitoring
The backlog of queued mail is recovering. Engineers are monitoring.
Posted Jun 06, 2016 - 10:05 CAT
Identified
Cause of issue identified: engineers are implementing a remedy.
Posted Jun 06, 2016 - 09:21 CAT
Investigating
IS Securemail is experiencing incoming email delays of 30 minutes. SYNAQ engineers are investigating.
Posted Jun 06, 2016 - 08:50 CAT
This incident affected: SYNAQ Securemail.