Google+ post by Jeff Kaplan on 2012-12-30 02:37:13 UTC

ELB postmortem
http://aws.amazon.com/message/680587/

Nice, tasty details.

Thanks for sharing. So the “control plane” basically looked at some persistent data for information about each ELB’s state, and this data was accidentally deleted? Having been there before, I would hate to be the guy who accidentally deleted those files and regretted it a nanosecond later.

I’m actually surprised they singled out a person in this. You usually hear failures described as “we failed…”. Even if it did come down to one person making the mistake, there were failures in the system as a whole that allowed it to happen and then let it go undetected for so long.