SMS Delivery Outage

For several short periods between Feb 10 17:05 and Feb 11 09:36 (GMT+11), our New Jersey node (66.246.75.38) failed to deliver SMS and phone call alerts for some customers.

This was caused by the node’s inability to resolve the DNS record needed to connect with both our primary and backup SMS gateways.

This in turn was caused by failures of the New Jersey data center’s DNS resolvers.

That, however, should not have been a problem, because the standard operating environment (SOE) for all Wormly nodes includes a private DNS cache & resolver precisely to guard against this kind of failure.

However, this particular data center provider uses DHCP-based network configuration, so their DHCP server rewrote /etc/resolv.conf and pointed DNS resolution back at their own resolvers.

We have ensured that this cannot happen again by setting the immutable attribute on /etc/resolv.conf, a step that is now part of our SOE.
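This failure mode is also easy to test for. The Python sketch below is a hypothetical illustration, not Wormly's actual tooling. It parses resolv.conf-style text and reports whether every nameserver entry is a loopback address, on the assumption that the private DNS cache listens on 127.0.0.1 or ::1. If a DHCP client rewrote the file to point at the data center's resolvers, the check would return False. (On Linux, the immutable attribute itself is normally set with chattr +i and can be inspected with lsattr.)

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Strip '#' and ';' comments plus surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_only_local_resolver(resolv_conf_text):
    """True only if at least one nameserver is listed and all are loopback.

    Assumption (hypothetical): the node's private DNS cache listens on a
    loopback address such as 127.0.0.1 or ::1.
    """
    servers = nameservers(resolv_conf_text)
    if not servers:
        return False  # No nameserver at all is also a misconfiguration.
    for server in servers:
        try:
            if not ipaddress.ip_address(server).is_loopback:
                return False
        except ValueError:
            return False  # Unparseable entry: treat as not local.
    return True


# Demo with example contents (192.0.2.53 is a documentation-only address).
print(uses_only_local_resolver("nameserver 127.0.0.1\n"))  # True
print(uses_only_local_resolver("# rewritten by DHCP\nnameserver 192.0.2.53\n"))  # False
```

Run periodically against the real file, a check like this would flag a reverted resolv.conf before DNS lookups for the SMS gateways start failing.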

Needless to say, we apologize to the customers who were inconvenienced. Of course, no charges were billed for the failed deliveries.

We’re also pleased to report that our internal monitoring alerted us to this situation, so even without reports from a couple of helpful customers, we would have identified and corrected the problem in short order.

Thanks for your support and understanding!

Filed under: Announcements, Meta. Posted by Jules at 10:44 am, February 11, 2009.


Never Offline

A blog hosted by James Peterson, director of insights @ Wormly

On a semi-regular basis James will be trying to demonstrate that website infrastructure really is an exciting topic, and that your users really do care about the uptime & speed of your website.