CloudMail Incident - 2016-07-08 Degraded performance
Incident Report for SYNAQ
Postmortem

Summary and Impact to Customers

On Friday 8th July 2016, from 12:41 to 17:11 CAT, SYNAQ Cloud Mail experienced an LDAP replication incident.

The impact of the event was:

  • Intermittent delays to some inbound and outbound emails; and
  • Slow and unresponsive Admin and Webmail Consoles.

Root cause and Solution

The root cause of this event was a bug in one of the Master LDAP servers. The bug prevented the slave LDAP servers from synchronising with the Master, so they could not process the queries that the mail transport servers use to deliver email. As a result, some inbound and outbound emails were intermittently delayed.

In addition, the Admin and Webmail Consoles could not query the LDAP servers for authentication and account information, which made both consoles slow and unresponsive.

To resolve the issue, we remotely restarted the affected Master LDAP server, after which the slave LDAP servers resynchronised with it. Once synchronisation completed, the slave servers resumed processing mail queries and both the mail delivery and console issues were resolved.

Remediation Actions

• An incident ticket was logged with our Zimbra upstream provider to investigate the root cause of the issue.

• In addition, we are working on a comprehensive set of prevention measures so that this issue does not recur.

Posted Nov 07, 2016 - 16:58 CAT

Resolved
The issue has been resolved. All systems are performing nominally.
Posted Jul 08, 2016 - 17:11 CAT
Update
We're currently experiencing intermittent issues which may affect a small subset of clients. The following services may be affected:
- Admin Console may be slow
- Intermittent errors on outbound mail sent from Outlook using the Zimbra Connector
We have identified the issue as being related to LDAP and are working with our upstream provider to resolve it. Inbound email services are not affected.
Posted Jul 08, 2016 - 13:43 CAT
Investigating
CloudMail Incident - Degraded performance; potential errors when sending from Webmail and the Zimbra Connector
Posted Jul 08, 2016 - 12:41 CAT