Here are some more details on the Google API infrastructure outage linked earlier by 

Here are some more details on the Google API infrastructure outage linked earlier by @Paul_Lindner .

At least from the outside, it seems to be a nice reminder that it’s just as hard to make monitoring systems cope with exceptional circumstances as it is to build that same resilience into the systems they monitor.

I thought this was an interesting unintended consequence: “This rollback failed due to complexity in the configuration system which caused our security checks to reject [it]”
You might have guessed that a security check like that is fail-safe, but in this case it prolonged a service outage.