A Post-Mortem on Yesterday’s Site Outage

Many of you probably noticed that this site was down for several hours yesterday. I’m not sure of the exact duration, but my Google Analytics data suggests the site was down from about 9 am EST to sometime between noon and 1 pm EST. Let’s call it three hours.

Outage

Whenever the site is down I get stressed, knowing that it sucks to come to a site and get an obscure “Database connection error.” I was also annoyed because a few months ago my host had recommended I move to another server and hosting platform, something I did because they said it would greatly reduce the chance of an outage and improve the performance of the site.

The move certainly did the latter, but as yesterday showed, the new platform is not immune to major outages. I spent some time working with the host to find out the root cause (database issues, not related to my site specifically) and what measures were being taken to reduce the chances of this happening again. Being an IT person myself, I can speak their language, and that helps a little.

In any case, I feel pretty confident that the chances are small that we’ll see another outage like this one–at least one related to the same cause.

I want to apologize for the outage yesterday, and for any frustration it may have caused. I am monitoring things here, and hopefully we won’t see this particular problem again.