The Late Troubles
Today, one of the data centers where Utopian.net server hardware resides (or is “colocated”) was plagued by a cascade of upstream network failures, which made several of the sites we host unavailable in turn. It is cold comfort that this wasn’t exactly server downtime: servers were effectively disconnected from the Internet, but continued to function normally, albeit uselessly.
The colocation provider is connected over multiple redundant links, but in a twist today a deranged router at one access point morally corrupted alternate links as well. This caused far too many of the web sites we host to be sporadically gone fishin’ between roughly 10 AM and 3 PM, PDT. Today’s localized network failure disconnected somewhere between 70 and 100 of our clients’ web sites, and also hampered several rather larger providers along the West Coast. It did not deter email delivery or other network services for any client domains.
The direct cause has been remedied, and service across our network is mostly normal at the time of this writing (which, viz.), as it has been since at least late afternoon. Minor issues, expected to last only a few seconds each, may yet occur during the next 24 hours.
We apologize for today’s interruption, and pass along the assurance that the colocation host is working with backbone access providers to investigate this scenario and ensure it does not happen again.
The Old Troubles
I will also take this unwelcome opportunity to apologize for and report on unrelated trouble with a hosting server early this month, roughly between March 8th and 10th. An overly optimistic software configuration confronted a series of traffic spikes, and failed. Once or twice spectacularly, and then with some subtlety once or twice more, for good measure. We have, so far, returned service to normal at the server most affected, and continue to closely monitor its performance.
In like a lion, out like a… well, lion. March begins and ends with unacceptable outages, but we’re working constantly on our own services and systems, as well as with our upstream providers, to return to the high availability that has always characterized these facilities.
We’ll be writing about better news in this spot soon (at least by April!), catching up on highlight projects from 2010’s end and filling you in on the new stuff we’re up to now. Stay tuned.