MicrocosmWorks: Innovating and Designing the Digital Cosmos

Delivering IT solutions that matter. We are passionate about technology, security, and helping businesses grow through reliable, innovative IT infrastructure.

+91 7011868196
New Delhi, India

© 2026 MicrocosmWorks. All rights reserved.

Vector Databases · Published June 18, 2026 · Updated May 25, 2026

Milvus Autoscaling on Kubernetes with EC2 and S3-Backed Persistent Storage

An AI platform with rapidly growing vector data (embeddings for search, recommendations, and RAG) needed its Milvus vector database to scale automatically with query load and data volume, backed by durable, cost-effective storage that would not be lost when pods restarted or nodes were replaced.

Domain: Vector Databases
Technologies: 11
Key Results: 6
Status: Delivered

The Challenge

Running Milvus at scale in production presented several infrastructure challenges:

  • Fixed Capacity: Static Milvus deployments couldn't handle 10x query load spikes during peak hours
  • Data Loss Risk: Pod restarts on ephemeral storage caused index rebuilds taking hours on large collections
  • Cost Inefficiency: Over-provisioning for peak load meant paying for idle compute 70% of the time
  • Storage Costs: Block storage volumes tied to instances were expensive for multi-terabyte vector datasets
  • Index Rebuilds: Re-indexing millions of vectors after a node replacement took hours of downtime
  • Multi-AZ Durability: Single-AZ storage couldn't survive availability zone failures

Our Solution

We deployed Milvus on Kubernetes (EKS) with Horizontal Pod Autoscaling for query nodes, Cluster Autoscaler for compute, and Amazon S3 as the persistent storage backend, eliminating data loss risk and reducing storage costs by ~80%.

Architecture

  • Orchestration: Amazon EKS (Elastic Kubernetes Service)
  • Compute: EC2 instances (mixed instance types) managed by Cluster Autoscaler
  • Vector DB: Milvus deployed via Helm chart in distributed mode
  • Object Storage: Amazon S3 for segment files, index files, and binlog persistence
  • Metadata: etcd cluster for Milvus coordination and metadata
  • Message Queue: Message streaming for the Milvus log pipeline
  • Monitoring: Prometheus + Grafana for Milvus metrics and autoscaling signals

Milvus Distributed Architecture on Kubernetes

Component Deployment

Milvus runs in distributed mode with dedicated node types, each deployed as a Kubernetes workload with independent scaling:

  • Proxy Nodes: Handle client connections and request routing
  • Query Nodes: Execute vector searches and load segments into memory
  • Data Nodes: Handle write paths and flush segments to S3
  • Index Nodes: Build vector indexes and write them to S3
  • Coordinator: Cluster coordination and timestamp allocation
  • etcd: Metadata storage and service discovery
  • Message Queue: Log streaming and write-ahead log
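As a sketch of how this component layout might map onto Helm values, the Python snippet below builds a values file for distributed Milvus with the bundled MinIO disabled and Amazon S3 used as the object store. The key names (`cluster.enabled`, `externalS3.*`, per-component `replicas`) are modeled on the public milvus-helm chart as we understand it and should be checked against the chart version in use; it also assumes a chart version that still deploys separate index nodes. The bucket name, region, replica counts, and the use of IAM credentials (for example via IRSA) instead of static keys are illustrative assumptions, not the project's actual configuration. Message queue settings are left at chart defaults.

```python
import json

BUCKET = "example-milvus-data"  # hypothetical bucket name
REGION = "us-east-1"            # hypothetical region


def milvus_values(bucket: str, region: str) -> dict:
    """Helm values for distributed Milvus backed by Amazon S3.

    Key names follow the milvus-helm chart layout as we understand it;
    verify them against the chart version you deploy.
    """
    return {
        "cluster": {"enabled": True},    # distributed mode: one workload per component
        "proxy": {"replicas": 2},
        "queryNode": {"replicas": 2},    # initial count; the HPA adjusts it afterwards
        "dataNode": {"replicas": 2},
        "indexNode": {"replicas": 1},
        "etcd": {"replicaCount": 3},     # odd-sized quorum for metadata
        "minio": {"enabled": False},     # no in-cluster object store
        "externalS3": {
            "enabled": True,
            "host": f"s3.{region}.amazonaws.com",
            "port": 443,
            "useSSL": True,
            "bucketName": bucket,
            "rootPath": "milvus",        # prefix for segment, index, and binlog objects
            "useIAM": True,              # assumed: IAM role credentials, no static keys
            "cloudProvider": "aws",
            "region": region,
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be passed to `helm install -f values.json`.
    print(json.dumps(milvus_values(BUCKET, REGION), indent=2))
```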

Horizontal Pod Autoscaling (HPA)

Query Node Autoscaling

Query nodes are the primary scaling target: they load vector segments into memory and execute searches. Scaling is driven by several metrics: CPU utilization, memory utilization, query queue depth, and P99 query latency. The HPA is configured with min/max replica bounds (production scaling ranged from 2 to 20 query nodes), fast scale-up for handling spikes, and gradual scale-down to avoid flapping.
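This policy maps naturally onto a Kubernetes `autoscaling/v2` HorizontalPodAutoscaler. The sketch below builds one as a Python dict that can be printed as JSON for `kubectl apply -f`. The 2 to 20 replica bounds come from the results reported in this case study and the 75% memory target from the FAQ; the CPU target, the Deployment name, the two custom metric names (which would need a metrics adapter such as prometheus-adapter before the HPA can see them), and the scale-up/scale-down policy values are all assumptions.

```python
import json


def query_node_hpa(namespace: str = "milvus",
                   deployment: str = "my-release-milvus-querynode",  # hypothetical name
                   min_replicas: int = 2,
                   max_replicas: int = 20) -> dict:
    """autoscaling/v2 HPA for Milvus query nodes: fast scale-up, slow scale-down."""

    def resource_metric(name: str, utilization: int) -> dict:
        # Utilization is measured against the pods' resource requests.
        return {"type": "Resource",
                "resource": {"name": name,
                             "target": {"type": "Utilization", "averageUtilization": utilization}}}

    def pods_metric(name: str, average_value: str) -> dict:
        # Per-pod custom metric; requires a custom metrics adapter.
        return {"type": "Pods",
                "pods": {"metric": {"name": name},
                         "target": {"type": "AverageValue", "averageValue": average_value}}}

    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "milvus-querynode", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                resource_metric("cpu", 70),                              # assumed target
                resource_metric("memory", 75),                           # 75% per the FAQ
                pods_metric("milvus_querynode_queue_depth", "10"),       # hypothetical metric
                pods_metric("milvus_querynode_p99_latency_ms", "150"),   # hypothetical metric
            ],
            "behavior": {
                "scaleUp": {
                    "stabilizationWindowSeconds": 0,        # react to spikes immediately
                    "policies": [
                        {"type": "Percent", "value": 100, "periodSeconds": 15},  # up to double
                        {"type": "Pods", "value": 4, "periodSeconds": 15},
                    ],
                    "selectPolicy": "Max",                  # use whichever policy adds more pods
                },
                "scaleDown": {
                    "stabilizationWindowSeconds": 300,      # 5 minutes of low load first
                    "policies": [{"type": "Pods", "value": 1, "periodSeconds": 60}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(query_node_hpa(), indent=2))
```

With several metrics configured, the HPA computes a desired replica count for each one and acts on the largest, so whichever signal is most stressed drives scale-up.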

Index Node Autoscaling

Index nodes scale based on pending index build jobs, scaling up when the build queue has pending items and scaling back down when idle.
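If the pending-job count is exposed to the HPA as an External metric with an `AverageValue` target, the HPA rule `desiredReplicas = ceil(currentReplicas * currentValue / targetValue)` reduces to `ceil(pendingJobs / jobsPerNode)`. The function below reproduces that arithmetic, including the HPA's default 10% tolerance band and the min/max clamp. The two-jobs-per-node target and the 1 to 8 replica bounds are hypothetical, and the real HPA additionally delays scale-down through its stabilization window, which this sketch omits.

```python
import math


def desired_index_nodes(pending_jobs: int,
                        current_replicas: int,
                        jobs_per_node: int = 2,    # hypothetical AverageValue target
                        min_replicas: int = 1,
                        max_replicas: int = 8,     # hypothetical upper bound
                        tolerance: float = 0.1) -> int:
    """Replica count an HPA would choose for a pending-index-job metric."""
    if current_replicas <= 0:
        raise ValueError("the HPA only acts on workloads with at least one replica")
    ratio = pending_jobs / (jobs_per_node * current_replicas)
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas                     # close enough: no change
    else:
        desired = math.ceil(pending_jobs / jobs_per_node)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_index_nodes(pending_jobs=0, current_replicas=3))   # 1 (idle: back to minimum)
    print(desired_index_nodes(pending_jobs=10, current_replicas=1))  # 5
    print(desired_index_nodes(pending_jobs=40, current_replicas=1))  # 8 (capped)
```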

EC2 Cluster Autoscaler

Instance Strategy

  • Node Groups: Multiple node groups with different instance types for cost optimization
  • Query Workload: Memory-optimized instances for in-memory vector segments
  • Index Workload: Compute-optimized instances for CPU-intensive index building
  • Spot Instances: Index nodes and non-critical data nodes run on spot instances for significant savings
  • On-Demand: Query nodes and coordinators on on-demand instances for stability
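One way to express these placement rules is to give each Milvus component a `nodeSelector` on the `eks.amazonaws.com/capacityType` label, which EKS managed node groups set to `ON_DEMAND` or `SPOT`. The sketch below additionally assumes, hypothetically, that each node group carries a custom `workload` label and that spot node groups are tainted with `spot=true:NoSchedule`; neither detail comes from the original deployment.

```python
# Label set by EKS managed node groups on the nodes they launch.
CAPACITY_LABEL = "eks.amazonaws.com/capacityType"

# component -> (capacity type, hypothetical "workload" label on the node group)
PLACEMENT = {
    "querynode": ("ON_DEMAND", "memory-optimized"),
    "coordinator": ("ON_DEMAND", "general"),
    "indexnode": ("SPOT", "compute-optimized"),
    "datanode": ("SPOT", "general"),   # only the non-critical data nodes
}

# Hypothetical taint on spot node groups so only opted-in pods land there.
SPOT_TOLERATION = {"key": "spot", "operator": "Equal", "value": "true", "effect": "NoSchedule"}


def scheduling_fragment(component: str) -> dict:
    """Pod-spec fields that pin a Milvus component to the intended node group."""
    capacity, workload = PLACEMENT[component]
    fragment: dict = {"nodeSelector": {CAPACITY_LABEL: capacity, "workload": workload}}
    if capacity == "SPOT":
        fragment["tolerations"] = [dict(SPOT_TOLERATION)]
    return fragment


if __name__ == "__main__":
    for name in PLACEMENT:
        print(name, scheduling_fragment(name))
```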

Scaling Behavior

When HPA creates new pods that can't be scheduled, the Cluster Autoscaler provisions new EC2 instances in the appropriate node group. New query nodes then load their assigned segments from S3 into memory and begin serving queries, with the total scale-up process completing in minutes.
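The real Cluster Autoscaler decides how many nodes to add by simulating the scheduler against node-group templates and choosing a group through its configured expander. As a rough back-of-the-envelope model only, the sketch below packs pending pods onto identical new nodes by memory request using first-fit decreasing. The pod sizes and the 58 GiB allocatable figure in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingPod:
    name: str
    memory_gib: float   # the pod's memory request


def nodes_needed(pods: list[PendingPod], node_allocatable_gib: float) -> int:
    """First-fit-decreasing estimate of new nodes for unschedulable pods (memory only)."""
    free: list[float] = []   # remaining allocatable memory on each new node
    for pod in sorted(pods, key=lambda p: p.memory_gib, reverse=True):
        if pod.memory_gib > node_allocatable_gib:
            raise ValueError(f"{pod.name} cannot fit on this instance type")
        for i, remaining in enumerate(free):
            if pod.memory_gib <= remaining:
                free[i] = remaining - pod.memory_gib
                break
        else:
            free.append(node_allocatable_gib - pod.memory_gib)  # open a new node
    return len(free)


if __name__ == "__main__":
    # Hypothetical spike: five new query-node pods requesting 24 GiB each,
    # on an instance type with ~58 GiB allocatable.
    spike = [PendingPod(f"querynode-{i}", 24.0) for i in range(5)]
    print(nodes_needed(spike, 58.0))  # 3
```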

S3-Backed Persistent Storage

Why S3 Instead of Block Storage

S3 provides significant advantages over block storage for Milvus:

  • ~80% lower storage cost for large datasets
  • 11-nines durability with built-in multi-AZ replication
  • Unlimited scaling without manual volume resizing
  • Pod-independent: Data always available regardless of pod or node lifecycle
  • No AZ lock-in โ€” Data accessible from any availability zone

Data Flow with S3

  1. Write Path: Data nodes buffer inserts in memory, then flush sealed segments to S3
  2. Index Build: Index nodes read segments from S3, build indexes, and write index files back to S3
  3. Query Path: Query nodes download segments and indexes from S3, load into memory, and serve queries
  4. Recovery: On pod restart, query nodes re-download assigned segments from S3 (no data loss)
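The four steps above can be illustrated with a deliberately small in-memory model: an `ObjectStore` class stands in for S3, a data node buffers inserts and flushes sealed segments, a toy index step writes a derived object back to the store, and a query node rebuilds its in-memory state from the store after a restart. All class names and object keys are hypothetical and the "index" is just per-vector norms; the sketch shows where durable state lives, not how Milvus is implemented internally.

```python
import json


class ObjectStore:
    """In-memory stand-in for an S3 bucket: object key -> bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))


class DataNode:
    """Step 1: buffer inserts in memory, flush sealed segments to the store."""

    def __init__(self, store: ObjectStore, seal_rows: int) -> None:
        self.store = store
        self.seal_rows = seal_rows
        self.buffer: list[list[float]] = []
        self.next_segment = 0

    def insert(self, vector: list[float]) -> None:
        self.buffer.append(vector)
        if len(self.buffer) >= self.seal_rows:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        key = f"segments/{self.next_segment:06d}.json"
        self.store.put(key, json.dumps(self.buffer).encode())
        self.buffer = []
        self.next_segment += 1


def build_index(store: ObjectStore, segment_key: str) -> None:
    """Step 2 (toy): read a segment, write a derived 'index' object back."""
    rows = json.loads(store.get(segment_key))
    norms = [sum(x * x for x in row) ** 0.5 for row in rows]
    store.put(segment_key.replace("segments/", "indexes/", 1), json.dumps(norms).encode())


class QueryNode:
    """Steps 3-4: serve from memory, rebuild memory from the store on restart."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.segments: dict[str, list[list[float]]] = {}

    def load(self, keys: list[str]) -> None:
        for key in keys:
            self.segments[key] = json.loads(self.store.get(key))

    def restart(self) -> None:
        # The toy remembers its assignment; in Milvus the coordinators track it.
        assigned = list(self.segments)
        self.segments.clear()      # in-memory state is lost with the pod
        self.load(assigned)        # recovery: re-download from the store

    def row_count(self) -> int:
        return sum(len(rows) for rows in self.segments.values())


if __name__ == "__main__":
    store = ObjectStore()
    writer = DataNode(store, seal_rows=2)
    for vec in ([1.0, 0.0], [0.0, 1.0], [3.0, 4.0]):
        writer.insert(vec)
    writer.flush()                 # seal the partially filled last segment
    for key in store.keys("segments/"):
        build_index(store, key)
    reader = QueryNode(store)
    reader.load(store.keys("segments/"))
    reader.restart()
    print(reader.row_count())      # 3
```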

S3 Performance Optimization

  • Segment size tuning balances S3 request costs vs. data freshness
  • Local SSD caching on NVMe instance storage avoids repeated S3 reads for hot segments
  • Parallel downloads enable fast query node startup
  • Lifecycle policies archive old data to cheaper storage tiers
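The caching and parallel-download points above can be sketched as a byte-bounded LRU cache in front of an object-store fetch function, with a thread pool used to warm the cache when a query node starts. Here `fetch` is a placeholder for whatever reads an object from S3, the in-memory `OrderedDict` stands in for segment files on NVMe instance storage, and the capacity and worker counts are assumptions.

```python
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


class SegmentCache:
    """Byte-bounded LRU cache; a stand-in for segment files on local NVMe."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self.fetch = fetch                    # placeholder for an S3 GET
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0
        self.lock = threading.Lock()

    def get(self, key: str) -> bytes:
        with self.lock:
            if key in self.entries:
                self.entries.move_to_end(key)   # hit: mark most recently used
                return self.entries[key]
        data = self.fetch(key)                  # miss: remote read outside the lock
        with self.lock:
            self.remote_reads += 1
            self._admit(key, data)
        return data

    def _admit(self, key: str, data: bytes) -> None:
        """Insert (caller holds the lock), evicting least-recently-used entries."""
        if key in self.entries:                 # another thread fetched it first
            self.entries.move_to_end(key)
            return
        if len(data) > self.capacity:
            return                              # too large to cache at all
        self.entries[key] = data
        self.used += len(data)
        while self.used > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)

    def warm(self, keys: Iterable[str], workers: int = 8) -> None:
        """Download a query node's assigned segments in parallel."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(self.get, keys))      # list() surfaces any fetch errors
```

Fetching outside the lock lets several downloads proceed concurrently during warm-up, at the cost of occasionally fetching the same object twice when two threads miss on it at once.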

Monitoring & Observability

The deployment includes comprehensive monitoring via Prometheus and Grafana:

  • Query Performance: Latency distribution, QPS, cache hit rate
  • Cluster Overview: Node count, pod status, resource utilization
  • Storage Health: S3 usage, segment counts, flush rates
  • Autoscaling Events: HPA events, node scaling, pod scheduling latency
  • Alerting: Automated alerts for high latency, OOM risk, flush failures, and capacity limits
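Latency panels and alerts of this kind are usually built on Prometheus histograms, where PromQL's `histogram_quantile()` estimates a percentile by linear interpolation inside the bucket that contains the target rank. The Python sketch below reproduces that estimate from cumulative `(le, count)` pairs and checks it against the 200 ms P99 target reported in the results. The bucket boundaries and counts in the example are invented for illustration, and the sketch assumes non-negative latency values.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by bound and
    ending with (+inf, total), like the `le` series of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket followed by a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan                      # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0        # latencies start at zero
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the open-ended bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-2][0]  # not reached when the last count equals the total


def p99_exceeds(buckets: list[tuple[float, float]], threshold_s: float = 0.2) -> bool:
    """Alert condition: estimated P99 latency above the threshold (200 ms)."""
    return histogram_quantile(0.99, buckets) > threshold_s


if __name__ == "__main__":
    # Invented example: 1,000 searches bucketed by latency in seconds.
    example = [(0.05, 600.0), (0.1, 900.0), (0.2, 985.0), (0.5, 1000.0), (math.inf, 1000.0)]
    print(round(histogram_quantile(0.99, example), 3), p99_exceeds(example))
```

In Prometheus itself the same estimate would typically come from a query of the form `histogram_quantile(0.99, sum by (le) (rate(<latency_metric>_bucket[5m])))`, with the metric name depending on the Milvus version.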

Key Features

  1. Query Node HPA: Automatic scaling based on CPU, memory, latency, and queue depth
  2. EC2 Cluster Autoscaler: Dynamic node provisioning with mixed instance types
  3. S3 Persistence: 11-nines durability, ~80% cheaper than block storage, survives AZ failures
  4. Spot Instances: Index and data nodes on spot for significant compute savings
  5. Local SSD Cache: NVMe caching eliminates repeated S3 reads for hot segments
  6. Zero-Downtime Recovery: Pod restarts reload segments from S3 without data loss
  7. Multi-AZ: S3 storage + multi-AZ node groups for full AZ failure tolerance
  8. Observability: Prometheus + Grafana with Milvus-specific metrics and autoscaling visibility

Results

Storage Cost: ~80% reduction vs. block-storage-backed deployment
Compute Cost: ~40% reduction via spot instances and right-sized autoscaling
Query Latency: P99 maintained under 200ms during 10x load spikes
Recovery Time: Pod restart to serving queries in 30-90 seconds (S3 segment reload)
Durability: Zero data loss across multiple node replacements and AZ failovers
Scale: Handled 50M+ vectors with automatic scaling from 2 to 20 query nodes

Technology Stack

Milvus, Amazon EKS, Kubernetes HPA, Cluster Autoscaler, Amazon EC2, Amazon S3, etcd, Prometheus, Grafana, Helm, NVMe Instance Storage

More Case Studies

Explore more of our technical implementations.

AI Accounting

AI-Powered Invoice Processing with OCR and QuickBooks Integration

A mid-sized business that processes hundreds of vendor invoices every month needed to eliminate manual data entry by automatically extracting invoice data with AI/OCR and syncing it directly to QuickBooks for bookkeeping and payment tracking.

Video Encoding

Client-Side Ad Insertion (CSAI) with SCTE-35 Marker Parsing and Multi-Platform Player Integration

A video streaming platform needed to implement Client-Side Ad Insertion (CSAI) across its web, mobile, and connected TV apps, enabling personalized, device-level ad experiences with full ad interaction support (clickable overlays, companion banners, skip buttons) that server-side insertion cannot provide.

Web Scraping

AI-Powered Blog Content Scraping and Generation Platform

A media company needed an intelligent content platform that could automate blog content creation by scraping existing web content, analyzing it with AI, and generating original, SEO-optimized blog posts from the collected data.

Frequently Asked Questions

MicrocosmWorks configured horizontal pod autoscaling with custom metrics from Milvus's built-in memory usage exporter, triggering scale-out events when any query node exceeds 75% memory utilization. Collection segments are automatically redistributed across new nodes using Milvus's segment manager, preventing any single node from becoming a bottleneck.

MicrocosmWorks selected S3-backed storage using MinIO as the object storage layer because it decouples storage from compute, allowing query nodes to scale independently without provisioning new EBS volumes. This architecture reduces storage costs by approximately 60% compared to gp3 EBS volumes while maintaining sub-100ms segment load times from S3.

MicrocosmWorks configured the deployment with replica sets for each Milvus component, including query nodes, index nodes, and data nodes, with pod disruption budgets ensuring minimum availability during rolling updates. Since all persistent data resides in S3, a failed node's replacement can immediately access all segments without data migration.

MicrocosmWorks found that r6i.2xlarge instances provide the optimal cost-to-performance ratio for Milvus query workloads, offering 64GB of memory for in-memory segment caching at a competitive spot price. For GPU-accelerated index building, g5.xlarge instances with NVIDIA A10G GPUs reduced index build times by 8x compared to CPU-only builds.

MicrocosmWorks delivers Kubernetes infrastructure projects at rates of $30-$50/hr, with a Milvus autoscaling deployment including Helm chart customization, HPA configuration, S3 integration, and monitoring setup typically requiring 150-250 hours. Ongoing managed support for cluster optimization and upgrades is available at the same hourly rates.

Ready to Transform Your Business?

Let's discuss how we can apply similar solutions to your challenges.