Sometimes the most valuable lessons come from our biggest mistakes. This is the story of how a single misconfigured Cloudflare compression rule broke our Server-Sent Events (SSE) streaming and brought down an entire API for several hours.
The Incident
Date: August 15, 2025
Duration: 4 hours 23 minutes
Impact: ~20% API downtime, 15,000+ affected users
Root Cause: Cloudflare Compression Rule Breaking SSE Streaming
What Happened
1. The Setup
I was working on performance optimization for our API endpoints. The goal was to reduce bandwidth usage and improve response times by enabling Cloudflare's compression features.
2. The Configuration
I enabled the Cloudflare compression rule:
Enable Brotli and Gzip Compression
Enables Cloudflare's default compression setting. Brotli is the preferred compression algorithm.
3. The Mistake
The issue wasn't immediately apparent. The compression rule looked safe, but I had forgotten a critical detail: our API used Server-Sent Events (SSE) for real-time streaming, and Cloudflare's compression breaks SSE.
The Technical Problem
How SSE Works
- SSE keeps one long-lived HTTP response open
- The server pushes chunks of data separated by \n\n
- The client processes these chunks incrementally as they arrive
What Cloudflare's Compression Does
- Brotli and Gzip both buffer data before compressing
- Instead of passing through each tiny SSE event immediately, Cloudflare waits to accumulate enough data for efficient compression
- That buffering breaks the "streaming" nature of SSE
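The buffering effect is easy to reproduce locally with a generic streaming compressor. The sketch below uses Python's standard `zlib` module in gzip mode; it illustrates how stream compressors behave in general and is not a model of Cloudflare's internals. The function name `emitted_sizes` and the sample events are made up for illustration.

```python
import zlib


def emitted_sizes(events, flush_each_event):
    """Compress events one at a time and record how many bytes come out per event."""
    compressor = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
    sizes = []
    for event in events:
        out = compressor.compress(event)
        if flush_each_event:
            # Z_SYNC_FLUSH forces all pending input out at the event boundary.
            out += compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(out))
    return sizes


# Five tiny SSE events, each terminated by a blank line.
events = [f"data: tick {i}\n\n".encode() for i in range(5)]

buffered = emitted_sizes(events, flush_each_event=False)
flushed = emitted_sizes(events, flush_each_event=True)

print("bytes emitted per event, no flush:  ", buffered)
print("bytes emitted per event, sync flush:", flushed)
```

Without an explicit flush, the compressor emits at most its stream header and holds the event data until enough input has accumulated or the stream ends. A compressing layer that does not flush at event boundaries therefore delays every event behind it.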
Why SSE Stops Working
- The connection may appear open, but the client never receives events in real time
- Cloudflare terminates the stream early if it thinks the compression buffer is incomplete
- All real-time functionality breaks completely
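The client side shows why held-back bytes are fatal: an SSE client only dispatches an event once the blank-line terminator has arrived. Here is a minimal sketch of an incremental parser, assuming `\n` line endings and handling only `data:` fields. The class name `SSEBuffer` is hypothetical, and a real client would also handle `event:`, `id:`, `retry:`, comments and CRLF.

```python
class SSEBuffer:
    """Accumulates raw bytes and returns the events completed by each chunk."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add a chunk of bytes; return the data of every event it completes."""
        self._pending += chunk
        events = []
        # An event is only complete once its blank-line terminator has arrived.
        while b"\n\n" in self._pending:
            raw, self._pending = self._pending.split(b"\n\n", 1)
            events.append(self._data_of(raw))
        return events

    @staticmethod
    def _data_of(raw_event):
        parts = []
        for line in raw_event.split(b"\n"):
            if line.startswith(b"data:"):
                value = line[5:]
                # A single leading space after the colon is not part of the data.
                parts.append(value[1:] if value.startswith(b" ") else value)
        return b"\n".join(parts).decode("utf-8")


parser = SSEBuffer()
print(parser.feed(b"data: hel"))           # nothing complete yet
print(parser.feed(b"lo\n\ndata: wor"))     # first event completes
print(parser.feed(b"ld\n\n"))              # second event completes
```

If an intermediary holds bytes back, `feed` is simply never called with the terminator, so the application sees an open connection and no events.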
The Cascade Failure
Minute 0-5: Rule Activation
- Cloudflare activated the compression rule
- All SSE connections started buffering instead of streaming
- Real-time updates stopped working
Minute 5-15: Service Degradation
- Users started experiencing errors
- Real-time features completely broken
- Error rates climbed to 100%
Hour 1-2: Investigation
- Team assembled for incident response
- Initial investigation focused on backend services
- SSE compression issue was overlooked
Hour 2-3: Discovery
- Finally checked the Cloudflare dashboard
- Discovered the compression rule was enabled
- Rule was immediately disabled
Hour 3-4: Recovery
- SSE streaming restored
- Service gradually recovered
- Real-time functionality working again
Root Cause Analysis
Primary Cause
Cloudflare Compression Breaking SSE: The compression rule was enabled without understanding that it buffers data, breaking real-time streaming.
Contributing Factors
- Lack of SSE Knowledge: Didn't understand how compression affects streaming
- Missing Validation: No testing of real-time features after rule changes
- Poor Monitoring: SSE health wasn't monitored
Impact Assessment
- 15,000+ users affected during peak hours
- 4+ hours of complete service unavailability
- Real-time features completely broken
Lessons Learned
1. Understand Your Protocols
- Never enable compression without understanding how it affects streaming protocols
- Test real-time features after any infrastructure changes
- SSE and WebSocket connections require special consideration
2. Test Real-Time Features
- Always test streaming functionality after compression changes
- Monitor SSE connection health and event delivery
- Use staging environments for infrastructure changes
3. Monitor Streaming Health
- Implement SSE health checks
- Monitor real-time event delivery
- Set up alerts for streaming failures
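One way an SSE health check could work is to measure how long the first complete event takes to arrive and alert when that exceeds a threshold. The sketch below is an assumption about what such a probe might look like, not the check we actually deployed. The function names and the 5 second threshold are placeholders.

```python
import time


def seconds_to_first_event(lines, clock=time.monotonic):
    """Return seconds until the first complete SSE event, or None if the stream ends first."""
    started = clock()
    saw_data = False
    for line in lines:
        stripped = line.rstrip(b"\r\n")
        if stripped.startswith(b"data:"):
            saw_data = True
        elif stripped == b"" and saw_data:
            # A blank line after at least one data line terminates the event.
            return clock() - started
    return None


def stream_is_healthy(lines, max_delay_seconds=5.0, clock=time.monotonic):
    """Healthy means a complete event arrived within the allowed delay (placeholder value)."""
    delay = seconds_to_first_event(lines, clock)
    return delay is not None and delay <= max_delay_seconds


# Example with an in-memory stream; a real probe would read from the network.
sample = [b": keep-alive\n", b"data: ready\n", b"\n"]
print(stream_is_healthy(sample))
```

In a real probe, `lines` could be the response object returned by `urllib.request.urlopen`, which yields the body line by line as bytes. A fully stalled stream blocks the loop, so the probe would also need a socket timeout, which this sketch does not handle.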
Prevention Measures
1. Automated Testing
- Test SSE functionality after any Cloudflare rule changes
- Implement automated streaming health checks
- Validate real-time features in staging
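A cheap automated check after a rule change is to request the SSE endpoint through the edge and inspect the response headers. The sketch below assumes the headers are already available as a dictionary; the function name `sse_header_problems` is hypothetical. A `Content-Encoding` header on an event stream is treated as a warning sign rather than proof of breakage, since a compressor that flushes after every event can still stream correctly.

```python
def sse_header_problems(headers):
    """Return a list of human-readable problems found in SSE response headers."""
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    problems = []

    content_type = normalized.get("content-type", "")
    if not content_type.startswith("text/event-stream"):
        problems.append(f"unexpected Content-Type: {content_type or 'missing'}")

    # A missing Content-Encoding header means the body was sent uncompressed.
    encoding = normalized.get("content-encoding", "identity")
    if encoding != "identity":
        problems.append(f"response is compressed with {encoding}")

    return problems


print(sse_header_problems({"Content-Type": "text/event-stream"}))
print(sse_header_problems({"Content-Type": "text/event-stream", "Content-Encoding": "br"}))
```

For the check to be meaningful, the test request should advertise compression support (for example `Accept-Encoding: br, gzip`), as a browser would. Otherwise the edge has no reason to compress and the problem stays hidden.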
2. Documentation
- Document protocol-specific requirements
- Create change impact checklists
- Maintain rollback procedures
3. Change Approval
- Require peer review for compression changes
- Test streaming protocols before production
- Schedule changes during low-traffic periods
Conclusion
This incident taught us that compression isn't always beneficial: it can break real-time protocols like SSE. The key lesson is to understand how infrastructure changes affect your specific use cases, especially streaming protocols.
What I Would Do Differently
- Research first: Understand how compression affects streaming protocols
- Test streaming: Always validate real-time features after changes
- Monitor SSE health: Implement proper streaming monitoring
- Document protocols: Create protocol-specific change guidelines
