Partial Uptime Monitoring Outage November 30 06:35 – 14:43 UTC

During this period we experienced a failure of our Uptime Monitoring service which resulted in periods of failures not being detected for some customers; i.e. “false negatives”.

The root cause was relatively simple to uncover; a bug introduced in a code deploy that morning which did not trigger failures in our CI pipeline, nor our monitoring system.

Needless to say, our attention has been focussed on investigating why both our CI and platform monitoring systems did not reveal this problem sooner; an 8 hour failure window is most certainly not acceptable to either us or our customers.

Rest assured that when these sorts of failures occur – humbling as they are – we learn a great deal and apply that knowledge to improve the quality and reliability of our infrastructure.

Given our track record for exceptional reliability over the past 15+ years, we are very disappointed by this incident, and unreservedly apologise to all customers, both those impacted and those who were not.

Please don’t hesitate to contact support if you have any further queries or concerns.

Filed under: Downtime — Jules @ 09:11 - December 1, 2020 :: Comments Off on Partial Uptime Monitoring Outage November 30 06:35 – 14:43 UTC