The outage began on 18 November 2025, around 11:20 UTC, and affected a large portion of the internet.
Cloudflare’s own status updates indicate the root issue was a configuration file (automatically generated for bot-mitigation / threat-traffic handling) that grew larger than expected, triggering a software crash in a core traffic-handling subsystem.
Specifically, the company said: “In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack.”
Cloudflare also said they observed “a spike in unusual traffic” around 11:20 UTC that caused error rates to rise.
The company issued a fix, and by early afternoon UTC most services were restored.
What it affected
Because Cloudflare provides services for a very large portion of websites and apps (roughly 20% of global web traffic), the outage had widespread ripple effects.
Affected platforms included:
ChatGPT (OpenAI)
X (formerly Twitter)
Spotify, Claude, Canva, and others.
Public services were also hit: transport websites (such as NJ Transit) and government agencies reported degraded or unavailable service.
The outage underscores how dependent the internet has become on a few infrastructure providers: one service failure cascades broadly.
Preliminary RCA (Root Cause Analysis)
Configuration file growth: An automatically generated configuration file (for bot mitigation / threat-traffic handling) grew beyond its expected size. That triggered a crash in the software subsystem that applies that configuration.
Cascade effect: The crash in that subsystem created broader service degradation rather than a localized fault, affecting traffic handling across multiple Cloudflare services.
Not a malicious attack: Cloudflare has stated there is no evidence that the outage was caused by an external attack or malicious activity.
Triggering event: A spike in unusual traffic around 11:20 UTC may have stressed the system and exposed the latent bug.
Fix deployed: A change was implemented which resolved the issue, and normal service began recovering.
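The config-growth failure mode above suggests a simple guardrail: validate an auto-generated configuration before publishing it, rather than letting downstream consumers crash on an oversized file. Below is a minimal sketch of that idea; the size and entry-count limits, the `publish_config` helper, and the JSON format are all illustrative assumptions, not Cloudflare's actual pipeline.

```python
import json
import tempfile
from pathlib import Path

# Illustrative caps -- real values would come from what consumers can safely handle.
MAX_CONFIG_BYTES = 1_000_000
MAX_ENTRIES = 10_000

class ConfigTooLargeError(Exception):
    """Raised when a generated config exceeds its size budget."""

def publish_config(entries: list, dest: Path) -> None:
    """Serialize entries to JSON and write atomically, enforcing size caps."""
    if len(entries) > MAX_ENTRIES:
        raise ConfigTooLargeError(f"{len(entries)} entries exceeds cap of {MAX_ENTRIES}")
    payload = json.dumps(entries).encode()
    if len(payload) > MAX_CONFIG_BYTES:
        raise ConfigTooLargeError(f"{len(payload)} bytes exceeds cap of {MAX_CONFIG_BYTES}")
    # Write to a temp file in the same directory, then rename atomically,
    # so consumers never observe a partially written config.
    with tempfile.NamedTemporaryFile(dir=dest.parent, delete=False) as tmp:
        tmp.write(payload)
    Path(tmp.name).replace(dest)
```

The key design choice is failing loudly at generation time: an oversized config is rejected at the source instead of being shipped to every machine that consumes it.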
Key takeaways & lessons
Single point of failure risk: Even with distributed infrastructure, a bug in one subsystem at a major infrastructure provider can cascade broadly — many sites were impacted simply because they used Cloudflare.
Importance of limit/size controls: The configuration file grew beyond its expected size. Systems that automatically generate configs should enforce hard limits and validate output before deployment.
Traffic spikes + latent bugs = danger: The unusual traffic spike exposed a hidden bug. Systems must be stress-tested for unusual loads, not just typical ones.
Transparent communication: Cloudflare’s early acknowledgement and updates helped clarify the incident — good practice for critical infrastructure providers.
Resilience planning: Customers relying on third-party infrastructure should have failover or backup plans in case a provider goes down.
Monitoring & alerting: Adequate monitoring of error rates, internal config growth, and abnormal traffic flows can help detect issues before they ripple out.
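The resilience point above can be sketched in a few lines: try a primary endpoint and fall back to a secondary when it fails. This is a minimal illustration using only the standard library; the `fetch_with_failover` function and the idea of an ordered endpoint list are assumptions for the example, not a specific vendor's API.

```python
import urllib.request
import urllib.error

def fetch_with_failover(urls, timeout: float = 3.0) -> bytes:
    """Try each URL in order; return the first successful response body."""
    last_error = None
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # remember the failure and try the next endpoint
    raise RuntimeError(f"all endpoints failed: {last_error}")
```

In practice the fallback would be a mirror on a different provider (or a cached copy), so that an outage at one infrastructure vendor degrades the experience rather than taking the service fully offline.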
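One way to act on the monitoring point is a simple anomaly check: alert when the latest observation (config-file size, error rate, traffic volume) jumps well above its recent baseline. The sketch below is a toy moving-average detector; the window size and threshold multiplier are illustrative assumptions, and a production system would use proper time-series alerting.

```python
from collections import deque

class GrowthMonitor:
    """Flags samples that exceed a multiple of the recent average."""

    def __init__(self, window: int = 10, multiplier: float = 2.0):
        self.history = deque(maxlen=window)  # rolling window of recent samples
        self.multiplier = multiplier

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it should trigger an alert."""
        alert = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alert = value > baseline * self.multiplier
        self.history.append(value)
        return alert
```

For example, a config file that normally hovers near 100 KB would trip the alert on a sudden jump to 500 KB, giving operators a chance to intervene before the oversized file propagates.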