Postmortems

April 29, 2011 "For two periods during the first day of the issue,

April 29, 2011
“For two periods during the first day of the issue, the degraded EBS cluster affected the EBS APIs and caused high error rates and latencies for EBS calls to these APIs across the entire US East Region.”
Root cause: incorrect networking update during routine upgrade, followed by a “re-mirroring storm” which exhausted free space and blocked servers.
“Unlike a normal network interruption, this change disconnected both the primary and secondary network simultaneously, leaving the affected nodes completely isolated from one another.”

via a discussion of the ongoing (30+ hour) outage of the Apple Developer Center, apparently maintenance gone wrong.
https://news.ycombinator.com/item?id=6071233
http://aws.amazon.com/message/65648/