InGameNow has been down since Saturday... So have thousands of other sites, as 9,000 servers were caught in a massive fire at server and hosting company The Planet's Houston, Texas data center. The good news is that no one was injured and that 6,000 of those servers were restored last night at 5:06pm CDT. The bad news: InGameNow is still down.
There are numerous lessons to be learned in this ordeal (one of which is the strain this puts on a startup, its users and its engineers), but I am particularly struck by The Planet's response - which truthfully has been mixed.
I often write about the importance of transparency. For customers, The Planet has been very transparent and responsive. They have sent out numerous mailings (see below), are reachable via phone, and in Twitter-esque style, are updating a blog / forum each hour with "Data Center Status Updates". Considering the severity of the issue, I applaud this kind of regular communication, and frankly, it probably also relieves some of their CS burden.
That transparency, however, isn't extended to new customers - and that's disturbing. There is no mention of any issues in either "Recent News" or "Recent Blog Posts" - in fact, their main blog, which is linked directly from the homepage, hasn't been updated since May 28th (last post: Hello, World). Much of the page is dedicated to "Learning more about World Class Data Centers". Considering the circumstances, perhaps part of the page should be dedicated to support, or at least link to the Status Blog.
The part that really irks me is the default pop-up screen that asks you if you'd like to interact with Live Sales Representatives... I am tempted to type in "When will InGameNow.com be running?"
Update: 11:05am PST
InGameNow is now up... although, as you can tell, it is running slowly and graphics aren't rendering, because the fixes we have implemented will take a few hours to propagate.
The Planet's Initial Email:
> Dear Valued Customers:
> This evening at 4:55 in our H1 data center, electrical gear shorted,
> creating an explosion and fire that knocked down three walls
> surrounding our electrical equipment room Thankfully, no one was
> injured. In addition, no customer servers were damaged or lost.
>
> We have just been allowed into the building to physically inspect
> the damage. Early indications are that the short was in a high-
> volume wire conduit. We were not allowed to activate our backup
> generator plan based on instructions from the fire department.
>
> This is a significant outage, impacting approximately 9,000 servers
> and 7,500 customers. All members of our support team are in, and
> all vendors who supply us with data center equipment are on site.
> Our initial assessment, although early, points to being able to have
> some service restored by mid-afternoon on Sunday. Rest assured we
> are working around the clock.
>
> We are in the process of communicating with all affected customers.
> we are planning to post updates every hour via our forum and in our
> customer portal. Our interactive voice response system is updating
> customers as well.
>
> There is no impact in any of our other five data centers.
>
> I am sorry that this accident has occurred and apologize for the
> impact.
>
> Sincerely,
>
> Douglas J. Erwin
> Chairman & Chief Executive Officer