Metrics outage 01:00 UTC May 20
At approximately 01:00 UTC on Sunday, May 20th, the database cluster that underpins the Wormly Metrics service suffered a partial outage.
As a result, Metrics data could not be received for any customer. As a secondary effect, incorrect alerts were sent in some cases. If you received spurious alert messages during this time, please let us know and we will ensure the cost of these messages is refunded to your account.
Unfortunately, the outage was lengthy. Service was restored at 04:20 UTC, with some short periods of instability over the following 3 hours.
The Uptime Monitoring service was not affected by this outage.
Our post-mortem is currently underway, so we don't yet have a firm understanding of the cause or of what mitigations might prevent a recurrence. We will keep you updated.
We're in the uptime business, and we regret any downtime whatsoever. On behalf of Wormly, I apologise for this incident and promise that we will continue to do our best to keep improving. The availability of the Metrics service has exceeded 99.9% over the past 24 months, but I'm sure we can do better.