Cloudflare’s customer sites and their own website were down for about 30 mins, giving 502 errors. The root cause was an update to Web Application Firewall rules, where new rules which are run in a simulation mode (so they don’t affect traffic) happened to include a very expensive regexp which pegged CPUs.
This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.
Cloudflare run a CDN with many global points of presence and proxy a large proportion of website traffic. They also run the 22.214.171.124 DNS resolver and act as a DNS provider.
Many web services were affected, and many workflows and deployments which rely on online services.
Though some of them might not be because of Cloudflare, the ones I spot checked all do appear related. Medium, DigitalOcean, Shopify, CodeShip, Pingdom, and many more. The impact is staggering.
Via the discussion at HN.