Metrics outage 01:00 UTC May 20

At approximately 01:00 UTC on Sunday, May 20th, the database cluster that underpins the Wormly Metrics service suffered a partial outage.

As a result, we failed to receive Metrics for all customers. As a secondary effect, incorrect alerts were sent in some cases. If you received spurious alert messages during this time, please let us know and we will ensure these are refunded to your account.

Unfortunately, the outage was lengthy. Service was restored at 04:20, followed by some short periods of instability over the next three hours.

The Uptime Monitoring service was not affected by this outage.

Our post-mortem is currently underway, so we don’t yet have a firm idea of the cause or of what mitigations might prevent a recurrence. We will keep you updated.

We’re in the uptime business, and we really regret any downtime whatsoever. On behalf of Wormly, I apologise for this incident and promise that we will keep working to improve. The availability of the Metrics service has exceeded 99.9% over the past 24 months, but I’m sure we can do better.

Filed under: Announcements — Jules @ 8:04 pm - May 21, 2018 :: Comments Off

Never Offline

A blog hosted by James Peterson, director of insights @ Wormly

On a semi-regular basis James will be trying to demonstrate that website infrastructure really is an exciting topic, and that your users really do care about the uptime & speed of your website.