SYNAQ Archiving Incident 12-01-2017
Incident Report for SYNAQ
Postmortem

Summary and Impact to Customers

On Thursday, 12 January 2017, from 10:05 – 17:25, SYNAQ Archive experienced a platform outage.

As a result, a subset of Clients were unable to access their archived data.

Root cause and Solution

The root cause of this event was an unresponsive infrastructure switch responsible for managing multipath routing, which rendered the storage of the affected Archive cluster unreadable.

The unresponsive switch caused the Archive operating system to place the storage LUNs in an unreadable state; consequently, a subset of Clients on the affected Archive cluster were unable to access their Archive data.

To resolve the issue, we remotely restarted the affected switch to restore multipath routing management. A file system integrity scan was then run on all affected storage before the Archive service was restored and reactivated.

Remediation Actions

A project is underway to review and enhance the existing Archive environment, covering both infrastructure components and system methodology, in order to improve efficiency and prevent a recurrence of this incident.

Posted Jan 19, 2017 - 13:12 CAT

Resolved
All affected archives have been fully restored.
Posted Jan 12, 2017 - 17:25 CAT
Monitoring
The issue has been resolved and the affected archives are slowly recovering.
Posted Jan 12, 2017 - 14:26 CAT
Update
The team is still working on the issue. We apologize for any inconvenience caused.
Posted Jan 12, 2017 - 12:46 CAT
Identified
The team has identified the cause of the issue and is currently working on a resolution.
Posted Jan 12, 2017 - 11:17 CAT
Investigating
SYNAQ Archiving currently has a partial outage: a subset of clients are unable to access their archive data.
Posted Jan 12, 2017 - 10:05 CAT
This incident affected: SYNAQ Archive.