Postmortems

Topic Replies Activity
An update on Sunday’s service disruption | Google Cloud Blog 3 June 21, 2019
Matrix Security Postmortem 1 May 9, 2019
Fire alarm destroys hard drives Not really a post-mortem. I know. 7 May 8, 2019
1628: Swedish Navy’s new flagship sinks on its maiden voyage 1 May 2, 2019
NASA's Taurus launches - two satellites lost, supplier to blame 1 May 2, 2019
Mailchimp's day-long outage due to 32-bit ID wrap 1 April 30, 2019
Finding a CPU Design Bug in the Xbox 360 1 April 27, 2019
Accidents with hypergolics: NASA's catalogue 2 April 26, 2019
Intel's Spectre, and the Tay Bridge disaster 1 April 24, 2019
Boeing's 737 MAX (two total losses, many hundreds killed) 2 April 23, 2019
Consequences of GPS rollover in April 2019 1 April 17, 2019
From accident to investment: How to run better blameless postmortems 1 April 17, 2019
The RISKS digest has the occasional PM-worthy story 1 April 16, 2019
With Google+ disappearing, is anyone starting a similar community or blog anywhere else? 14 March 16, 2019
Where are we all congregating once this community goes away on Apr 2? 5 February 25, 2019
"Fortnite hit a new peak of 3.4 million concurrent players last Sunday… and that 2 February 13, 2019
An interesting inside look at Cloudbleed 1 February 2, 2019
"I would never have thought you could trust a random unauthenticated person on the 5 January 31, 2019
"A Cascading Failure of Distributed Systems", and some more. 4 January 17, 2019
Google+ post by Peter Scully on 2018-11-24 21:47:00 UTC 1 November 24, 2018
GitHub incident 2018-10-21 TL;DR a 43-second network partition due to hardware upgrades results 1 October 31, 2018
Originally shared by Troy Hunt 1 October 11, 2018
Microsoft on the loss of user data when updating to the 1809 version of 2 October 10, 2018
outage. TL;DR lightning strikes caused power instability, automated shutdown exposed some unexpected interdependencies. 1 September 24, 2018
Originally shared by James Salsman Anil Dash : 6 May 3, 2018
travis-ci outage in production TL;DR developer inadvertently ran truncate query against production database, 4 April 9, 2018
Via HN, the tale of chasing down a hardware bug in an ARM CPU. 3 February 19, 2018
"The first time we realized that something might be wrong was shortly after we'd 2 January 30, 2018
Originally shared by Ed S 1 December 6, 2017
First outage at honeycomb.io 2 August 24, 2017