AI Surveillance

Auto-Scaling RTSP Streaming Architecture with Dual Orchestrators & Zero Packet Drop

A surveillance platform needed to scale its video streaming infrastructure dynamically — handling anywhere from 10 to 200+ IP cameras with hundreds of concurrent viewers and AI processing workers — while guaranteeing zero packet loss during scaling operations and keeping every stream URL permanently stable.

Domain: AI Surveillance
Technologies: 11
Key Results: 6
Status: Delivered

The Challenge

Fixed streaming infrastructure couldn't handle the variable demands of a growing surveillance platform:

  • Scale Variability — Camera count and viewer demand fluctuated dramatically throughout the day (10x peak-to-trough ratio)
  • Over-Provisioning Cost — Provisioning for peak load meant 70%+ idle resources during off-peak hours
  • Packet Loss During Scaling — Adding or removing streaming servers caused stream interruptions, dropping frames for AI processing workers
  • URL Instability — Cameras and viewers configured with specific server IPs needed reconfiguration when infrastructure changed
  • Different Scaling Needs — Camera ingestion and viewer distribution had fundamentally different load patterns requiring independent scaling
  • AI Worker Disruption — AI processing pipelines crashed when their source stream server was scaled down

Our Solution

We designed a dual-orchestrator auto-scaling streaming architecture with separate ingestion and distribution clusters, a 5-phase graceful shutdown for zero packet drop, stable DNS-based URLs, and automated AI worker reconnection.

Architecture

  • Streaming Server: MediaMTX for RTSP/WebRTC/HLS protocol support
  • Ingestion Cluster: 1-10 servers receiving camera RTSP streams
  • Distribution Cluster: 2-20 servers serving viewers (WebRTC/HLS) and AI workers (RTSP)
  • Dual Orchestrators: Independent scaling controllers for ingestion and distribution
  • Load Balancers: Separate load balancers per cluster with protocol-appropriate algorithms
  • Service Registry: Redis for server status, stream mappings, and coordination
  • Health Monitoring: Active health checks with automated recovery
  • DNS Layer: Stable domain names pointing to load balancers (URLs never change)

Dual Orchestrator Design

Why Two Orchestrators

Ingestion and distribution have fundamentally different scaling characteristics:

  • Ingestion scales with camera count and inbound bandwidth (predictable, grows steadily)
  • Distribution scales with viewer count and AI worker demand (bursty, unpredictable)

Separate orchestrators allow each to scale independently with specialized policies, metrics, and thresholds — without one cluster's scaling decisions affecting the other.

Ingestion Orchestrator

  • Primary Metric: Camera connections per server
  • Secondary Metric: Inbound bandwidth utilization
  • Scale Up: When CPU exceeds threshold or the camera count per server exceeds capacity
  • Scale Down: When utilization drops below threshold for a sustained stabilization period
  • Server Range: 1 to 10 servers

Distribution Orchestrator

  • Primary Metric: Viewer + AI worker connections per server
  • Secondary Metric: Outbound bandwidth utilization
  • Scale Up: When CPU exceeds threshold or connections per server exceed capacity
  • Scale Down: When utilization drops below threshold for a sustained period (longer stabilization than ingestion)
  • Server Range: 2 to 20 servers (minimum 2 for high availability)
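
Both orchestrators share the same threshold shape and differ only in their parameters. The sketch below illustrates that policy in Python; the class, field names, and numeric defaults are assumptions for illustration, not the production values.

```python
import time
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Cluster-specific knobs; values below are illustrative defaults."""
    min_servers: int
    max_servers: int
    max_conns_per_server: int   # cameras for ingestion, viewers + AI workers for distribution
    cpu_scale_up_pct: float     # scale up above this CPU utilization
    scale_down_pct: float       # scale-down candidate below this utilization
    stabilization_s: int        # low utilization must persist this long

INGESTION = ScalingPolicy(1, 10, 50, 75.0, 30.0, 300)
DISTRIBUTION = ScalingPolicy(2, 20, 200, 75.0, 30.0, 600)  # longer stabilization window

def decide(policy: ScalingPolicy, servers: int, cpu_pct: float,
           conns_per_server: float, low_since: float | None) -> str:
    """Return 'up', 'down', or 'hold' for one evaluation tick.

    low_since is the timestamp at which utilization first dropped below
    the scale-down threshold, tracked by the caller across ticks.
    """
    if servers < policy.max_servers and (
        cpu_pct > policy.cpu_scale_up_pct
        or conns_per_server > policy.max_conns_per_server
    ):
        return "up"
    if (cpu_pct < policy.scale_down_pct and servers > policy.min_servers
            and low_since is not None
            and time.time() - low_since >= policy.stabilization_s):
        return "down"
    return "hold"
```

Each orchestrator loop would call decide() on its own cluster's metrics every tick and act on the result, which is what keeps one cluster's decisions from ever affecting the other.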

Zero Packet Drop: 5-Phase Graceful Shutdown

When a distribution server is scheduled for removal, a 5-phase process ensures no frames are lost:

Phase 1: Pre-Notification

Server marked as "DRAINING" in the service registry. Load balancer weight reduced so new connections route elsewhere. Redis pub/sub notifications and webhooks alert AI workers to prepare for migration.
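
For illustration, Phase 1 could look like the following with redis-py; the registry keys, channel name, and host are assumptions of this sketch, not the platform's actual schema.

```python
import json
import time
import redis

r = redis.Redis(host="registry.internal", decode_responses=True)  # hypothetical host

def mark_draining(server_id: str, drain_window_s: int = 180) -> None:
    # Flip the server's status in the registry so orchestrators and
    # load-balancer sync jobs see it as DRAINING rather than ACTIVE.
    r.hset(f"server:{server_id}", mapping={
        "status": "DRAINING",
        "drain_started_at": int(time.time()),
        "drain_window_s": drain_window_s,
    })
    # Broadcast so subscribed AI workers can begin checkpointing and
    # plan their migration before Phase 3 starts.
    r.publish("scaling-events", json.dumps({
        "event": "server_draining",
        "server_id": server_id,
        "deadline": int(time.time()) + drain_window_s,
    }))
```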

Phase 2: Load Balancer Update

Server removed from the load balancer backend pool. No new connections can reach the draining server. Existing connections continue uninterrupted.

Phase 3: AI Worker Migration

AI workers disconnect from the draining server and reconnect to healthy distribution servers. Checkpoint-based state preservation ensures processing resumes from the exact frame where it left off. Total gap: approximately 3 seconds with zero frames lost.

Phase 4: Viewer Draining

Remaining viewer connections drain naturally over a configurable window. Modern video players auto-reconnect to the same stable URL, which routes to healthy servers. Most viewers experience no interruption.

Phase 5: Cleanup

Verify all connections have closed. Remove server from the service registry. Destroy the cloud instance. Record scaling metrics.
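
End to end, the five phases read as a linear procedure. The outline below is a hedged sketch that reuses mark_draining from the Phase 1 example; the remaining helpers (load balancer updates, connection counts, instance teardown) are hypothetical stand-ins for the real orchestrator calls.

```python
import time

def graceful_shutdown(server_id: str, drain_window_s: int = 180) -> None:
    """5-phase removal of a distribution server; helpers are hypothetical."""
    # Phase 1: mark DRAINING, cut LB weight, notify AI workers (pub/sub + webhooks)
    mark_draining(server_id, drain_window_s)
    set_lb_weight(server_id, weight=0)

    # Phase 2: remove from the LB backend pool; existing sessions keep flowing
    remove_from_backend_pool(server_id)

    # Phase 3: wait for AI workers to checkpoint and rejoin a healthy server
    wait_until(lambda: ai_worker_count(server_id) == 0, timeout=60)

    # Phase 4: let remaining viewer sessions drain over the configured window;
    # players auto-reconnect via the stable URL, so stragglers can be closed.
    try:
        wait_until(lambda: viewer_count(server_id) == 0, timeout=drain_window_s)
    except TimeoutError:
        force_close_viewers(server_id)  # hypothetical hard cut after the window

    # Phase 5: verify, deregister, destroy the instance, record metrics
    assert connection_count(server_id) == 0
    deregister_server(server_id)
    destroy_cloud_instance(server_id)
    record_scaling_event("scale_down", server_id)

def wait_until(cond, timeout: float, poll_s: float = 2.0) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        if cond():
            return
        time.sleep(poll_s)
    raise TimeoutError("condition not met before deadline")
```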

Stable URLs

The URL architecture ensures cameras and clients never need reconfiguration:

  • Camera publish target: A stable ingestion domain name
  • Viewer/AI access target: A stable distribution domain name
  • DNS records point to load balancer IPs (which are permanent)
  • Load balancers handle routing to backend servers transparently
  • Backend servers can be added, removed, or replaced without URL changes
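
Concretely, the addresses clients are configured with might look like this (all domains, ports, and paths below are placeholders):

```python
# Hypothetical stable endpoints: the only addresses ever configured on devices.
CAMERA_PUBLISH_URL = "rtsp://ingest.example.com:8554/camera-042"   # camera -> ingestion LB
AI_WORKER_URL      = "rtsp://streams.example.com:8554/camera-042"  # AI worker <- distribution LB
VIEWER_HLS_URL     = "https://streams.example.com/camera-042/index.m3u8"  # browser playback

# DNS: ingest.example.com  -> ingestion load balancer IP (permanent)
#      streams.example.com -> distribution load balancer IP (permanent)
# Backend MediaMTX servers behind each LB can change at any time.
```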

Service Registry (Redis)

A centralized Redis instance coordinates the entire system:

  • Server status tracking (active, draining, offline)
  • Stream-to-server mapping (which camera is on which ingestion server)
  • AI worker state and checkpoint data
  • Load metrics per server for scaling decisions
  • Pub/sub channels for real-time coordination events
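
A sketch of what that keyspace could look like with redis-py follows; every key name and field here is an assumption chosen for readability, not the platform's actual schema.

```python
import redis

r = redis.Redis(host="registry.internal", decode_responses=True)  # hypothetical host

# Server status: one hash per server.
r.hset("server:dist-07", mapping={
    "cluster": "distribution", "status": "ACTIVE",
    "cpu_pct": 42.5, "connections": 118, "ip": "10.0.2.17",
})

# Stream-to-server mapping: which ingestion server holds each camera.
r.set("stream:camera-042", "ingest-02")

# AI worker state and last checkpoint, used for migration resume.
r.hset("worker:ai-worker-11", mapping={
    "stream": "camera-042", "server": "dist-07",
    "last_checkpoint_frame": 183_240,
})

# Real-time coordination happens on pub/sub channels, for example:
r.publish("scaling-events", '{"event": "server_draining", "server_id": "dist-07"}')
```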

AI Client Reconnection

An AI client library provides seamless reconnection:

  • Listens for server removal notifications via Redis pub/sub
  • Automatic frame checkpointing at regular intervals
  • Reconnection to a healthy distribution server on notification
  • Resume processing from checkpoint with minimal gap
  • Metrics reporting for reconnection events
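
A condensed sketch of such a client is shown below, using the same hypothetical registry schema and pub/sub channel as the earlier examples; the shipped library's API will differ.

```python
import json
import redis

class ReconnectingStreamClient:
    """Listens for drain events and migrates to a healthy server (sketch)."""

    def __init__(self, stream: str, registry_host: str = "registry.internal"):
        self.stream = stream
        self.r = redis.Redis(host=registry_host, decode_responses=True)
        self.server = self._pick_healthy_server()
        self.last_frame = 0  # advanced by the frame-processing loop (not shown)

    def _pick_healthy_server(self) -> str:
        # Hypothetical: choose any ACTIVE distribution server from the registry.
        for key in self.r.scan_iter("server:dist-*"):
            if self.r.hget(key, "status") == "ACTIVE":
                return key.split(":", 1)[1]
        raise RuntimeError("no healthy distribution server available")

    def checkpoint(self) -> None:
        # Called at regular intervals so a migration can resume exactly here.
        self.r.hset(f"worker:{self.stream}", "last_checkpoint_frame", self.last_frame)

    def watch_scaling_events(self) -> None:
        pubsub = self.r.pubsub()
        pubsub.subscribe("scaling-events")
        for msg in pubsub.listen():
            if msg["type"] != "message":
                continue
            event = json.loads(msg["data"])
            if event.get("event") == "server_draining" and event.get("server_id") == self.server:
                self.checkpoint()                          # freeze state first
                self.server = self._pick_healthy_server()  # then rejoin elsewhere
                # ...reopen RTSP via the stable URL and resume from last_checkpoint_frame
```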

Health Monitoring

  • Active health checks on every server at regular intervals
  • Automatic load balancer updates on server failures
  • Auto-recovery triggers for unresponsive servers
  • Uptime tracking and availability reporting
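
As a rough illustration, the monitoring loop could be structured as follows; the probe endpoint, server inventory, and recovery hooks are assumptions of the sketch rather than the deployed monitor.

```python
import time
import requests

SERVERS = {"dist-07": "10.0.2.17", "dist-08": "10.0.2.18"}  # hypothetical inventory
CHECK_INTERVAL_S = 10
FAILURES_BEFORE_RECOVERY = 3

def probe(ip: str) -> bool:
    # Assumption: each server answers HTTP on its API port; MediaMTX's API
    # is used here purely as an illustrative liveness signal.
    try:
        return requests.get(f"http://{ip}:9997/v3/paths/list", timeout=2).ok
    except requests.RequestException:
        return False

def monitor() -> None:
    strikes: dict[str, int] = {}
    while True:
        for server_id, ip in SERVERS.items():
            if probe(ip):
                strikes[server_id] = 0
                continue
            strikes[server_id] = strikes.get(server_id, 0) + 1
            if strikes[server_id] == 1:
                remove_from_backend_pool(server_id)  # hypothetical LB update
            if strikes[server_id] >= FAILURES_BEFORE_RECOVERY:
                trigger_auto_recovery(server_id)     # hypothetical restart/replace
        time.sleep(CHECK_INTERVAL_S)
```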

Key Features

  1. Dual Orchestrators — Independent scaling for ingestion and distribution clusters
  2. Zero Packet Drop — 5-phase graceful shutdown with AI worker migration
  3. Stable URLs — DNS-based routing ensures URLs never change during scaling
  4. AI Worker Reconnection — Checkpoint-based migration with ~3 second gap and zero frame loss
  5. Independent Scaling — Ingestion and distribution scale based on their own metrics
  6. Service Registry — Redis-based coordination for server status and stream mappings
  7. Health Monitoring — Active checks with automatic recovery
  8. Cost Optimization — Automatic scale-down during low-demand periods

Results

Packet Loss: 0.00% for AI workers during scaling operations
AI Reconnection: ~3 seconds with checkpoint-based resume
Scale Up Time: ~60 seconds from trigger to serving
Scale Down Time: ~220 seconds with full graceful shutdown
URL Stability: 100% — no URL changes across any scaling events
Uptime: 99.95% system availability

Technology Stack

MediaMTX · Python · FastAPI · Redis · Docker · Cloud VM APIs · Load Balancers · DNS · Prometheus · Grafana · WebSocket

Have a Similar Project in Mind?

Let's discuss how we can build a solution tailored to your needs.

Contact Us · Schedule Appointment