Service Disruption - SYNAQ Cloud Mail
Incident Report for SYNAQ
Postmortem

Post Mortem Report

The purpose of this post-mortem is to detail the investigation into the Cloud Mail incident of May 26 2022, its findings and its impact.

Background

  • On Zimbra, we perform regular proactive maintenance on our LDAP servers.
  • This involves reloading the LDAP databases on all master and replica servers using a backup of one of the master databases.
  • This is done quarterly to maintain optimal performance of the LDAP service.
  • This process has been performed over the last 3 years in conjunction with our vendor without issue.

On Wednesday 25th May 2022:

  • An LDAP maintenance change was performed on the evening of the 25th at 20:00.
  • A SYNAQ engineer was working with our vendor's engineer on this change. A miscommunication between them resulted in the incorrect backup being applied: a legacy backup from January 18 2022 instead of the current backup taken from a master that evening.
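
One way to guard against applying a stale backup is an automated age check before the reload starts. The following is a minimal sketch of such a check, not a description of SYNAQ's actual tooling: the function name `is_backup_fresh`, the 24-hour threshold and the 19:30 timestamp for the same-evening backup are all assumptions made for illustration. Only the January 18 2022 backup date and the 20:00 change time come from this report.

```python
from datetime import datetime, timedelta


def is_backup_fresh(backup_taken_at, now, max_age=timedelta(hours=24)):
    """Return True only if the backup was taken within max_age before now.

    Hypothetical helper: the 24-hour default is an assumed policy,
    not a value taken from the incident report.
    """
    age = now - backup_taken_at
    # Reject backups that are too old, and also backups with a
    # timestamp in the future (which would indicate a clock problem).
    return timedelta(0) <= age <= max_age


# Start of the maintenance change, as stated in the report.
change_start = datetime(2022, 5, 25, 20, 0)

# The legacy backup that was applied by mistake.
legacy_backup = datetime(2022, 1, 18)

# Assumed timestamp for the backup taken from a master that evening.
same_evening_backup = datetime(2022, 5, 25, 19, 30)

legacy_ok = is_backup_fresh(legacy_backup, change_start)
current_ok = is_backup_fresh(same_evening_backup, change_start)
```

Under these assumptions the legacy backup is rejected and the same-evening backup is accepted, so a check of this kind would stop the reload before the wrong file is applied.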

On Thursday 26th May 2022: Authentication Failures Experienced

  • [07:00am]: We received reports that a subset of clients were experiencing authentication issues (i.e. could not log in to their mailboxes).
  • Our engineers investigated the issue and found that the LDAP servers were running an incorrect version of the LDAP database.
  • [8:03am – 9:31am]: SYNAQ engineers took the LDAP system down to restore the most current version of the backup to all LDAP servers. This reload took approximately 90 minutes.
  • Authentication services were restored at 09:31 AM.

Remedial Actions

Immediate – 0-3 months:

  • Improve and verify the standard operating procedure for the LDAP optimisation process, with additional test cases.
  • Build an additional monitoring alert that allows us to detect anomalous changes in expected data found in LDAP after a reload process is performed.
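
The monitoring alert described above could, for example, compare summary counts captured before and after a reload. The sketch below illustrates that idea only; the function name `detect_reload_anomalies`, the 1% tolerance, the category names and the sample counts are all hypothetical and do not reflect real SYNAQ data.

```python
def detect_reload_anomalies(before, after, tolerance=0.01):
    """Compare per-category entry counts taken before and after a reload.

    `before` and `after` map a category name (e.g. "accounts") to the
    number of LDAP entries in that category. Returns a dict of the
    categories whose relative change exceeds `tolerance`, mapped to
    their (old, new) counts. The 1% default is an assumed threshold.
    """
    anomalies = {}
    for name, old in before.items():
        new = after.get(name, 0)
        if old == 0:
            # Avoid division by zero: any entries appearing in a
            # previously empty category count as an unbounded change.
            change = 0.0 if new == 0 else float("inf")
        else:
            change = abs(new - old) / old
        if change > tolerance:
            anomalies[name] = (old, new)
    return anomalies


# Illustrative numbers only, not figures from the incident.
counts_before = {"accounts": 10000, "domains": 500}
counts_after = {"accounts": 9200, "domains": 500}

anomalies = detect_reload_anomalies(counts_before, counts_after)
```

With these sample counts, `anomalies` contains only the `accounts` category, because its count dropped by 8% while `domains` is unchanged. Loading an older backup would typically show up this way, since accounts created after the backup date would be missing.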

Long term – 6-12 months:

  • Zimbra 9 – LDAP fixes and improvements.
Posted Jun 03, 2022 - 14:53 CAT

Resolved
Dear Clients,

The SYNAQ Cloud Mail service is restored.
We apologise once again for any inconvenience caused.

Sincerely,
SYNAQ Technical Team
Posted May 26, 2022 - 09:49 CAT
Monitoring
Dear Clients,

Our engineers have implemented a fix for the SYNAQ Cloud Mail incident and are monitoring the service stability.
Posted May 26, 2022 - 09:29 CAT
Update
Dear Clients,

Our engineers are still working on the resolution of the SYNAQ Cloud Mail Incident. This is being treated as a matter of urgency.

We will send our next update in 30 minutes. We remain on track to resolve the incident within the next 10 minutes.
Posted May 26, 2022 - 09:08 CAT
Identified
Dear Clients,

Our engineers have identified the cause of the SYNAQ Cloud Mail incident and are working on a resolution.

We will send our next update in 30 minutes and the ETA for resolution will be 45 minutes.
Posted May 26, 2022 - 08:39 CAT
Investigating
Dear SYNAQ Client,

We are experiencing an authentication issue on the SYNAQ Cloud Mail platform.

We apologise for this inconvenience. Our engineers are looking into the problem and should have us back to normal in the shortest time possible!

Sincerely,
SYNAQ Technical Team
Posted May 26, 2022 - 08:03 CAT
This incident affected: SYNAQ Cloud Mail.