AI Surveillance

Auto-Scaling RTSP Streaming Architecture with Dual Orchestrators & Zero Packet Drop

A surveillance platform needed to scale its video streaming infrastructure dynamically — handling anywhere from 10 to 200+ IP cameras with hundreds of concurrent viewers and AI processing workers — while guaranteeing zero packet loss during scaling operations and keeping every stream URL permanently stable.

Domain: AI Surveillance
Technologies: 11
Key Results: 6
Status: Delivered

The Challenge

Fixed streaming infrastructure couldn't handle the variable demands of a growing surveillance platform:

  • Scale Variability — Camera count and viewer demand fluctuated dramatically throughout the day (10x peak-to-trough ratio)
  • Over-Provisioning Cost — Provisioning for peak load meant 70%+ idle resources during off-peak hours
  • Packet Loss During Scaling — Adding or removing streaming servers caused stream interruptions, dropping frames for AI processing workers
  • URL Instability — Cameras and viewers configured with specific server IPs needed reconfiguration when infrastructure changed
  • Different Scaling Needs — Camera ingestion and viewer distribution had fundamentally different load patterns requiring independent scaling
  • AI Worker Disruption — AI processing pipelines crashed when their source stream server was scaled down

Our Solution

We designed a dual-orchestrator auto-scaling streaming architecture with separate ingestion and distribution clusters, a 5-phase graceful shutdown for zero packet drop, stable DNS-based URLs, and automated AI worker reconnection.

Architecture

  • Streaming Server: MediaMTX for RTSP/WebRTC/HLS protocol support
  • Ingestion Cluster: 1-10 servers receiving camera RTSP streams
  • Distribution Cluster: 2-20 servers serving viewers (WebRTC/HLS) and AI workers (RTSP)
  • Dual Orchestrators: Independent scaling controllers for ingestion and distribution
  • Load Balancers: Separate load balancers per cluster with protocol-appropriate algorithms
  • Service Registry: Redis for server status, stream mappings, and coordination
  • Health Monitoring: Active health checks with automated recovery
  • DNS Layer: Stable domain names pointing to load balancers (URLs never change)

Dual Orchestrator Design

Why Two Orchestrators

Ingestion and distribution have fundamentally different scaling characteristics:

  • Ingestion scales with camera count and inbound bandwidth (predictable, grows steadily)
  • Distribution scales with viewer count and AI worker demand (bursty, unpredictable)

Separate orchestrators allow each to scale independently with specialized policies, metrics, and thresholds — without one cluster's scaling decisions affecting the other.

Ingestion Orchestrator

  • Primary Metric: Camera connections per server
  • Secondary Metric: Inbound bandwidth utilization
  • Scale Up: When CPU exceeds threshold or the camera count per server exceeds capacity
  • Scale Down: When utilization drops below threshold for a sustained stabilization period
  • Server Range: 1 to 10 servers

Distribution Orchestrator

  • Primary Metric: Viewer + AI worker connections per server
  • Secondary Metric: Outbound bandwidth utilization
  • Scale Up: When CPU exceeds threshold or connections per server exceed capacity
  • Scale Down: When utilization drops below threshold for a sustained period (longer stabilization than ingestion)
  • Server Range: 2 to 20 servers (minimum 2 for high availability)
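
Both orchestrators share the same threshold shape and differ only in their parameters. The sketch below illustrates that policy in Python; the class, field names, and numeric defaults are assumptions for illustration, not the production values.

```python
import time
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Cluster-specific knobs; values below are illustrative defaults."""
    min_servers: int
    max_servers: int
    max_conns_per_server: int   # cameras for ingestion, viewers + AI workers for distribution
    cpu_scale_up_pct: float     # scale up above this CPU utilization
    scale_down_pct: float       # scale-down candidate below this utilization
    stabilization_s: int        # low utilization must persist this long

INGESTION = ScalingPolicy(1, 10, 50, 75.0, 30.0, 300)
DISTRIBUTION = ScalingPolicy(2, 20, 200, 75.0, 30.0, 600)  # longer stabilization window

def decide(policy: ScalingPolicy, servers: int, cpu_pct: float,
           conns_per_server: float, low_since: float | None) -> str:
    """Return 'up', 'down', or 'hold' for one evaluation tick.

    low_since is the timestamp at which utilization first dropped below
    the scale-down threshold, tracked by the caller across ticks.
    """
    if servers < policy.max_servers and (
        cpu_pct > policy.cpu_scale_up_pct
        or conns_per_server > policy.max_conns_per_server
    ):
        return "up"
    if (cpu_pct < policy.scale_down_pct and servers > policy.min_servers
            and low_since is not None
            and time.time() - low_since >= policy.stabilization_s):
        return "down"
    return "hold"
```

Each orchestrator loop would call decide() on its own cluster's metrics every tick and act on the result, which is what keeps one cluster's decisions from ever affecting the other.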

Zero Packet Drop: 5-Phase Graceful Shutdown

When a distribution server is scheduled for removal, a 5-phase process ensures no frames are lost:

Phase 1: Pre-Notification

Server marked as "DRAINING" in the service registry. Load balancer weight reduced so new connections route elsewhere. Redis pub/sub notifications and webhooks alert AI workers to prepare for migration.
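
For illustration, Phase 1 could look like the following with redis-py; the registry keys, channel name, and host are assumptions of this sketch, not the platform's actual schema.

```python
import json
import time
import redis

r = redis.Redis(host="registry.internal", decode_responses=True)  # hypothetical host

def mark_draining(server_id: str, drain_window_s: int = 180) -> None:
    # Flip the server's status in the registry so orchestrators and
    # load-balancer sync jobs see it as DRAINING rather than ACTIVE.
    r.hset(f"server:{server_id}", mapping={
        "status": "DRAINING",
        "drain_started_at": int(time.time()),
        "drain_window_s": drain_window_s,
    })
    # Broadcast so subscribed AI workers can begin checkpointing and
    # plan their migration before Phase 3 starts.
    r.publish("scaling-events", json.dumps({
        "event": "server_draining",
        "server_id": server_id,
        "deadline": int(time.time()) + drain_window_s,
    }))
```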

Phase 2: Load Balancer Update

Server removed from the load balancer backend pool. No new connections can reach the draining server. Existing connections continue uninterrupted.

Phase 3: AI Worker Migration

AI workers disconnect from the draining server and reconnect to healthy distribution servers. Checkpoint-based state preservation ensures processing resumes from the exact frame where it left off. Total gap: approximately 3 seconds with zero frames lost.

Phase 4: Viewer Draining

Remaining viewer connections drain naturally over a configurable window. Modern video players auto-reconnect to the same stable URL, which routes to healthy servers. Most viewers experience no interruption.

Phase 5: Cleanup

Verify all connections have closed. Remove server from the service registry. Destroy the cloud instance. Record scaling metrics.
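
End to end, the five phases read as a linear procedure. The outline below is a hedged sketch that reuses mark_draining from the Phase 1 example; the remaining helpers (load balancer updates, connection counts, instance teardown) are hypothetical stand-ins for the real orchestrator calls.

```python
import time

def graceful_shutdown(server_id: str, drain_window_s: int = 180) -> None:
    """5-phase removal of a distribution server; helpers are hypothetical."""
    # Phase 1: mark DRAINING, cut LB weight, notify AI workers (pub/sub + webhooks)
    mark_draining(server_id, drain_window_s)
    set_lb_weight(server_id, weight=0)

    # Phase 2: remove from the LB backend pool; existing sessions keep flowing
    remove_from_backend_pool(server_id)

    # Phase 3: wait for AI workers to checkpoint and rejoin a healthy server
    wait_until(lambda: ai_worker_count(server_id) == 0, timeout=60)

    # Phase 4: let remaining viewer sessions drain over the configured window;
    # players auto-reconnect via the stable URL, so stragglers can be closed.
    try:
        wait_until(lambda: viewer_count(server_id) == 0, timeout=drain_window_s)
    except TimeoutError:
        force_close_viewers(server_id)  # hypothetical hard cut after the window

    # Phase 5: verify, deregister, destroy the instance, record metrics
    assert connection_count(server_id) == 0
    deregister_server(server_id)
    destroy_cloud_instance(server_id)
    record_scaling_event("scale_down", server_id)

def wait_until(cond, timeout: float, poll_s: float = 2.0) -> None:
    deadline = time.time() + timeout
    while time.time() < deadline:
        if cond():
            return
        time.sleep(poll_s)
    raise TimeoutError("condition not met before deadline")
```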

Stable URLs

The URL architecture ensures cameras and clients never need reconfiguration:

  • Camera publish target: A stable ingestion domain name
  • Viewer/AI access target: A stable distribution domain name
  • DNS records point to load balancer IPs (which are permanent)
  • Load balancers handle routing to backend servers transparently
  • Backend servers can be added, removed, or replaced without URL changes
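
Concretely, the addresses clients are configured with might look like this (all domains, ports, and paths below are placeholders):

```python
# Hypothetical stable endpoints: the only addresses ever configured on devices.
CAMERA_PUBLISH_URL = "rtsp://ingest.example.com:8554/camera-042"   # camera -> ingestion LB
AI_WORKER_URL      = "rtsp://streams.example.com:8554/camera-042"  # AI worker <- distribution LB
VIEWER_HLS_URL     = "https://streams.example.com/camera-042/index.m3u8"  # browser playback

# DNS: ingest.example.com  -> ingestion load balancer IP (permanent)
#      streams.example.com -> distribution load balancer IP (permanent)
# Backend MediaMTX servers behind each LB can change at any time.
```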

Service Registry (Redis)

A centralized Redis instance coordinates the entire system:

  • Server status tracking (active, draining, offline)
  • Stream-to-server mapping (which camera is on which ingestion server)
  • AI worker state and checkpoint data
  • Load metrics per server for scaling decisions
  • Pub/sub channels for real-time coordination events
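
A sketch of what that keyspace could look like with redis-py follows; every key name and field here is an assumption chosen for readability, not the platform's actual schema.

```python
import redis

r = redis.Redis(host="registry.internal", decode_responses=True)  # hypothetical host

# Server status: one hash per server.
r.hset("server:dist-07", mapping={
    "cluster": "distribution", "status": "ACTIVE",
    "cpu_pct": 42.5, "connections": 118, "ip": "10.0.2.17",
})

# Stream-to-server mapping: which ingestion server holds each camera.
r.set("stream:camera-042", "ingest-02")

# AI worker state and last checkpoint, used for migration resume.
r.hset("worker:ai-worker-11", mapping={
    "stream": "camera-042", "server": "dist-07",
    "last_checkpoint_frame": 183_240,
})

# Real-time coordination happens on pub/sub channels, for example:
r.publish("scaling-events", '{"event": "server_draining", "server_id": "dist-07"}')
```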

AI Client Reconnection

An AI client library provides seamless reconnection:

  • Listens for server removal notifications via Redis pub/sub
  • Automatic frame checkpointing at regular intervals
  • Reconnection to a healthy distribution server on notification
  • Resume processing from checkpoint with minimal gap
  • Metrics reporting for reconnection events
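
A condensed sketch of such a client is shown below, using the same hypothetical registry schema and pub/sub channel as the earlier examples; the shipped library's API will differ.

```python
import json
import redis

class ReconnectingStreamClient:
    """Listens for drain events and migrates to a healthy server (sketch)."""

    def __init__(self, stream: str, registry_host: str = "registry.internal"):
        self.stream = stream
        self.r = redis.Redis(host=registry_host, decode_responses=True)
        self.server = self._pick_healthy_server()
        self.last_frame = 0  # advanced by the frame-processing loop (not shown)

    def _pick_healthy_server(self) -> str:
        # Hypothetical: choose any ACTIVE distribution server from the registry.
        for key in self.r.scan_iter("server:dist-*"):
            if self.r.hget(key, "status") == "ACTIVE":
                return key.split(":", 1)[1]
        raise RuntimeError("no healthy distribution server available")

    def checkpoint(self) -> None:
        # Called at regular intervals so a migration can resume exactly here.
        self.r.hset(f"worker:{self.stream}", "last_checkpoint_frame", self.last_frame)

    def watch_scaling_events(self) -> None:
        pubsub = self.r.pubsub()
        pubsub.subscribe("scaling-events")
        for msg in pubsub.listen():
            if msg["type"] != "message":
                continue
            event = json.loads(msg["data"])
            if event.get("event") == "server_draining" and event.get("server_id") == self.server:
                self.checkpoint()                          # freeze state first
                self.server = self._pick_healthy_server()  # then rejoin elsewhere
                # ...reopen RTSP via the stable URL and resume from last_checkpoint_frame
```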

Health Monitoring

  • Active health checks on every server at regular intervals
  • Automatic load balancer updates on server failures
  • Auto-recovery triggers for unresponsive servers
  • Uptime tracking and availability reporting
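
As a rough illustration, the monitoring loop could be structured as follows; the probe endpoint, server inventory, and recovery hooks are assumptions of the sketch rather than the deployed monitor.

```python
import time
import requests

SERVERS = {"dist-07": "10.0.2.17", "dist-08": "10.0.2.18"}  # hypothetical inventory
CHECK_INTERVAL_S = 10
FAILURES_BEFORE_RECOVERY = 3

def probe(ip: str) -> bool:
    # Assumption: each server answers HTTP on its API port; MediaMTX's API
    # is used here purely as an illustrative liveness signal.
    try:
        return requests.get(f"http://{ip}:9997/v3/paths/list", timeout=2).ok
    except requests.RequestException:
        return False

def monitor() -> None:
    strikes: dict[str, int] = {}
    while True:
        for server_id, ip in SERVERS.items():
            if probe(ip):
                strikes[server_id] = 0
                continue
            strikes[server_id] = strikes.get(server_id, 0) + 1
            if strikes[server_id] == 1:
                remove_from_backend_pool(server_id)  # hypothetical LB update
            if strikes[server_id] >= FAILURES_BEFORE_RECOVERY:
                trigger_auto_recovery(server_id)     # hypothetical restart/replace
        time.sleep(CHECK_INTERVAL_S)
```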

Key Features

  1. Dual Orchestrators — Independent scaling for ingestion and distribution clusters
  2. Zero Packet Drop — 5-phase graceful shutdown with AI worker migration
  3. Stable URLs — DNS-based routing ensures URLs never change during scaling
  4. AI Worker Reconnection — Checkpoint-based migration with ~3 second gap and zero frame loss
  5. Independent Scaling — Ingestion and distribution scale based on their own metrics
  6. Service Registry — Redis-based coordination for server status and stream mappings
  7. Health Monitoring — Active checks with automatic recovery
  8. Cost Optimization — Automatic scale-down during low-demand periods

Results

Packet Loss: 0.00% for AI workers during scaling operations
AI Reconnection: ~3 seconds with checkpoint-based resume
Scale Up Time: ~60 seconds from trigger to serving
Scale Down Time: ~220 seconds with full graceful shutdown
URL Stability: 100% — no URL changes across any scaling events
Uptime: 99.95% system availability

Technology Stack

MediaMTX · Python · FastAPI · Redis · Docker · Cloud VM APIs · Load Balancers · DNS · Prometheus · Grafana · WebSocket

Have a Similar Project in Mind?

Let's discuss how we can build a solution tailored to your needs.

Contact Us · Schedule Appointment