Summary and Impact to Customers
On Thursday 12th January 2017, from 10:05 to 17:25, SYNAQ Archive experienced a platform outage.
As a result, a subset of Clients was unable to access their archived data.
Root cause and Solution
The root cause of this event was an unresponsive infrastructure switch responsible for managing multipath routing, which caused the storage of the affected Archive cluster to become unreadable.
When the switch stopped responding, the Archive Operating System placed the storage LUNs in an unreadable state. As a result, a subset of Clients on the affected Archive cluster were unable to access their Archive data.
To resolve this issue, we remotely restarted the affected switch to bring multipath routing management back up. In addition, a file system integrity scan was initiated on all affected storage before the Archive service was restored and activated.
A project is underway to review and enhance the existing Archive environment, covering both infrastructure components and system methodology, in order to improve efficiency and prevent similar incidents from recurring.