Milvus Autoscaling on Kubernetes with EC2 and S3-Backed Persistent Storage
An AI platform with rapidly growing vector data (embeddings for search, recommendation, and RAG) needed its Milvus vector database to scale automatically with query load and data volume. It also needed durable, cost-effective storage that would survive pod restarts and node replacements.
The Challenge
Running Milvus at scale in production raised several infrastructure challenges:
- Fixed capacity: the static Milvus deployment could not absorb query-load spikes of 10x at peak
- Data-loss risk: pod restarts on ephemeral storage could force index rebuilds that took hours on large collections
- Cost inefficiency: over-provisioning for peak load meant paying for idle compute 70% of the time
- Storage cost: block storage volumes attached to instances were expensive for multi-terabyte vector datasets
- Index rebuilds: re-indexing millions of vectors after a node replacement required hours of downtime
- Multi-AZ durability: single-AZ storage could not survive an Availability Zone failure
Our Solution
We deployed Milvus on Kubernetes (EKS) with Horizontal Pod Autoscaling for the query nodes, Cluster Autoscaler for compute, and Amazon S3 as the persistent storage backend. This eliminated the risk of data loss and cut storage costs by approximately 80%.
Architecture
- Orchestration: Amazon EKS (Elastic Kubernetes Service)
- Compute: EC2 instances (mixed instance types) managed by Cluster Autoscaler
- Vector DB: Milvus deployed in distributed mode via Helm chart
- Object storage: Amazon S3 for persisting segment files, index files, and binary logs
- Metadata: etcd cluster for Milvus coordination and metadata
- Message queue: message streaming for the Milvus log pipeline
- Monitoring: Prometheus + Grafana for Milvus metrics and autoscaling signals
Milvus Distributed Architecture on Kubernetes
Component Deployment
Milvus runs in distributed mode with dedicated node types, each deployed as a Kubernetes workload that scales independently:
- Proxy nodes: handle client connections and request routing
- Query nodes: execute vector searches and load segments into memory
- Data nodes: handle the write path and flush segments to S3
- Index nodes: build vector indexes and write them to S3
- Coordinators: cluster coordination and timestamp allocation
- etcd: metadata storage and service discovery
- Message queue: log streaming and write-ahead log
Horizontal Pod Autoscaling (HPA)
Query Node Autoscaling
Query nodes are the primary scaling target because they load vector segments into memory and execute searches. Scaling is driven by multiple metrics, including CPU utilization, memory utilization, query queue depth, and P99 query latency. The HPA is configured with appropriate minimum and maximum replica counts, fast scale-up to absorb spikes, and gradual scale-down to avoid flapping.
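As a rough illustration of how several metrics combine into one replica count, the sketch below follows the replica formula documented for the Kubernetes HPA: each metric proposes `ceil(current_replicas * current / target)`, the largest proposal wins, and the result is clamped to the configured bounds. The metric names, target values, and replica bounds are hypothetical and are not taken from this deployment.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas,
                     tolerance=0.1):
    """Return the replica count an HPA-style controller would ask for.

    metrics maps a metric name to (current_value, target_value).
    """
    proposals = []
    for current, target in metrics.values():
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric asks for no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposal wins.
    wanted = max(proposals)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical readings and targets, for illustration only.
example_metrics = {
    "cpu_utilization_pct": (90.0, 60.0),
    "memory_utilization_pct": (70.0, 75.0),
    "p99_latency_ms": (400.0, 200.0),
}

if __name__ == "__main__":
    print(desired_replicas(4, example_metrics, min_replicas=2, max_replicas=20))
```

In this example the latency metric is furthest above its target, so it alone determines the result. Queue depth and P99 latency are not built-in HPA resource metrics, so a setup like this would presumably expose them through a custom or external metrics adapter.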
Index Node Autoscaling
Index nodes scale on pending index-build jobs: they scale up when items are waiting in the build queue and scale down when idle.
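A minimal sketch of that queue-driven rule, assuming the number of pending builds is available as a metric. The `jobs_per_node` parameter and the replica bounds are hypothetical tuning knobs, not values from this deployment.

```python
import math


def index_node_replicas(pending_builds, jobs_per_node, min_replicas,
                        max_replicas):
    """Size the index node pool from the build queue length."""
    if pending_builds <= 0:
        # Idle queue: fall back to the floor (which may be zero).
        return min_replicas
    wanted = math.ceil(pending_builds / jobs_per_node)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for queue_length in (0, 5, 100):
        print(queue_length, index_node_replicas(queue_length, 2, 0, 8))
```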
EC2 Cluster Autoscaler
Instance Strategy
- Node groups: multiple node groups with different instance types for cost optimization
- Query workloads: memory-optimized instances for in-memory vector segments
- Index workloads: compute-optimized instances for CPU-intensive index builds
- Spot Instances: index nodes and non-critical data nodes run on Spot Instances for significant cost savings
- On-Demand: query nodes and coordinators run on On-Demand instances for stability
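One way to express this placement is a node selector per Milvus component. The sketch below assumes EKS managed node groups, which label nodes with `eks.amazonaws.com/capacityType` (`SPOT` or `ON_DEMAND`). The `workload` label and its values are hypothetical, as is the instance family assigned to coordinators and data nodes, which the list above does not specify.

```python
# Component placement taken from the strategy above; the "workload" values
# are hypothetical custom node labels, not names used by this deployment.
PLACEMENT = {
    "querynode": {"workload": "memory-optimized", "capacity": "ON_DEMAND"},
    "coordinator": {"workload": "general", "capacity": "ON_DEMAND"},
    "indexnode": {"workload": "compute-optimized", "capacity": "SPOT"},
    "datanode": {"workload": "general", "capacity": "SPOT"},
}


def node_selector(component):
    """Build a Kubernetes nodeSelector mapping for one Milvus component."""
    placement = PLACEMENT[component]
    return {
        "workload": placement["workload"],  # hypothetical custom label
        "eks.amazonaws.com/capacityType": placement["capacity"],
    }


if __name__ == "__main__":
    for name in PLACEMENT:
        print(name, node_selector(name))
```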
Scaling Behavior
When the HPA creates new pods that cannot be scheduled, Cluster Autoscaler provisions new EC2 instances in the appropriate node group. The new query nodes then load their assigned segments from S3 into memory and start serving queries. The entire scale-up process completes within minutes.
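To give a feel for the node-provisioning step, the sketch below estimates how many new nodes a batch of unschedulable pods would need, using first-fit-decreasing packing on memory requests only. This is a deliberate simplification: the real Cluster Autoscaler simulates scheduling against node group templates and considers more than memory. The pod and node sizes are hypothetical.

```python
def nodes_needed(pending_pod_mem_gib, node_allocatable_gib):
    """Estimate new nodes for pending pods (first-fit decreasing, memory only)."""
    free = []  # remaining memory on each node we decide to add
    for request in sorted(pending_pod_mem_gib, reverse=True):
        if request > node_allocatable_gib:
            raise ValueError("pod does not fit on an empty node of this type")
        for i, room in enumerate(free):
            if request <= room:
                free[i] = room - request
                break
        else:
            # No planned node has room, so plan one more.
            free.append(node_allocatable_gib - request)
    return len(free)


if __name__ == "__main__":
    # Three hypothetical query node pods of 24 GiB on 56 GiB nodes.
    print(nodes_needed([24, 24, 24], 56))
```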
S3-Backed Persistent Storage
Why S3 Instead of Block Storage
S3 offers significant advantages over block storage for Milvus:
- Approximately 80% lower storage cost for large datasets
- 11 nines of durability with built-in Multi-AZ replication
- Unlimited scaling with no manual volume resizing
- Independent of pods: data stays available regardless of pod or node lifecycle
- No AZ lock-in: data is accessible from any Availability Zone
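The cost comparison comes down to simple arithmetic on per-GiB prices. In the sketch below, the dataset size and both prices are placeholder inputs chosen so the example lands near the roughly 80% figure quoted above. They are not quoted AWS prices, so substitute the current prices for your region and storage classes.

```python
def monthly_storage_cost(dataset_gib, price_per_gib_month):
    """Monthly cost of storing a dataset at a flat per-GiB price."""
    return dataset_gib * price_per_gib_month


def savings_fraction(block_cost, object_cost):
    """Fraction saved by moving from block storage to object storage."""
    return 1.0 - object_cost / block_cost


if __name__ == "__main__":
    dataset_gib = 5000    # hypothetical dataset size
    block_price = 0.10    # placeholder price per GiB-month, not an AWS quote
    object_price = 0.02   # placeholder price per GiB-month, not an AWS quote
    block = monthly_storage_cost(dataset_gib, block_price)
    obj = monthly_storage_cost(dataset_gib, object_price)
    print(f"block={block:.2f} object={obj:.2f} "
          f"savings={savings_fraction(block, obj):.0%}")
```

A fuller model would also count S3 request charges and any headroom provisioned on block volumes, both of which this sketch ignores.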
Data Flow with S3
- Write path: data nodes buffer inserts in memory, then flush sealed segments to S3
- Index build: index nodes read segments from S3, build the index, and write the index files back to S3
- Query path: query nodes download segments and indexes from S3 and load them into memory to serve queries
- Recovery: on pod restart, query nodes re-download their assigned segments from S3 (no data loss)
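Pointing Milvus at S3 is mostly a matter of Helm values. The sketch below generates an override that disables the bundled MinIO and enables external S3 with IAM-based access. The key names follow the `externalS3` section as I understand it from the Milvus Helm chart documentation, and should be checked against the chart version in use. The bucket name, region, and root path are hypothetical.

```python
import json


def milvus_s3_values(bucket, region, root_path="milvus"):
    """Build Helm value overrides that point Milvus at an S3 bucket."""
    return {
        # Turn off the MinIO instance the chart deploys by default.
        "minio": {"enabled": False},
        "externalS3": {
            "enabled": True,
            "host": f"s3.{region}.amazonaws.com",
            "port": "443",
            "useSSL": True,
            "bucketName": bucket,
            "rootPath": root_path,
            # Assumes pods get S3 access through an IAM role
            # rather than static access keys.
            "useIAM": True,
            "cloudProvider": "aws",
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and region.
    values = milvus_s3_values("example-milvus-bucket", "us-east-1")
    print(json.dumps(values, indent=2))
```

Because JSON is valid YAML, the printed output can be saved to a file and passed to Helm as a values file.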
S3 Performance Optimization
- Segment size tuning balances S3 request cost against data freshness
- Local SSD caching on NVMe instance storage avoids repeated S3 reads for hot segments
- Parallel downloads enable fast query node startup
- Lifecycle policies archive old data to cheaper storage tiers
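The caching and parallel-download ideas can be sketched together: a byte-bounded LRU cache stands in for the local NVMe cache, and misses are fetched concurrently. This is an illustration of the technique, not how Milvus implements its cache. `SegmentCache`, `load_segments`, and the `fetch` callable (which would wrap an S3 GET in practice) are all hypothetical names.

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class SegmentCache:
    """Byte-bounded LRU cache standing in for the local NVMe cache."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self._items = OrderedDict()  # segment_id -> bytes, oldest first
        self._size = 0

    def get(self, segment_id):
        data = self._items.get(segment_id)
        if data is not None:
            self._items.move_to_end(segment_id)  # mark as recently used
        return data

    def put(self, segment_id, data):
        old = self._items.pop(segment_id, None)
        if old is not None:
            self._size -= len(old)
        self._items[segment_id] = data
        self._size += len(data)
        # Evict least recently used segments until the cache fits again.
        while self._size > self.capacity_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)


def load_segments(cache, segment_ids, fetch, workers=8):
    """Fetch uncached segments in parallel; return the ids that were fetched."""
    missing = [s for s in dict.fromkeys(segment_ids) if cache.get(s) is None]
    if missing:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Downloads run concurrently; cache writes stay on this thread.
            for seg_id, data in zip(missing, pool.map(fetch, missing)):
                cache.put(seg_id, data)
    return missing
```

Keeping cache writes on the calling thread avoids the need for a lock around the `OrderedDict`, at the cost of holding all fetched segments in memory until they are inserted.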
Monitoring and Observability
The deployment includes comprehensive monitoring through Prometheus and Grafana:
- Query performance: latency distribution, QPS, cache hit rate
- Cluster overview: node count, pod status, resource utilization
- Storage health: S3 usage, segment count, flush rate
- Autoscaling events: HPA events, node scaling, pod scheduling latency
- Alerts: automated alerts for high latency, OOM risk, flush failures, and capacity limits
Key Features
- Query node HPA: automatic scaling based on CPU, memory, latency, and queue depth
- EC2 Cluster Autoscaler: dynamic node provisioning with mixed instance types
- S3 persistence: 11 nines of durability, approximately 80% cheaper than block storage, survives AZ failures
- Spot Instances: index nodes and data nodes run on Spot Instances for significant compute savings
- Local SSD cache: NVMe caching eliminates repeated S3 reads for hot segments
- Zero-downtime recovery: restarted pods reload segments from S3 with no data loss
- Multi-AZ: S3 storage plus Multi-AZ node groups for full AZ fault tolerance
- Observability: Prometheus + Grafana for Milvus-specific metrics and autoscaling visibility
Frequently Asked Questions
MicrocosmWorks configured horizontal pod autoscaling with custom metrics from Milvus's built-in memory usage exporter, triggering scale-out events when any query node exceeds 75% memory utilization. Collection segments are automatically redistributed across new nodes using Milvus's segment manager, preventing any single node from becoming a bottleneck.
MicrocosmWorks selected S3-backed storage using MinIO as the object storage layer because it decouples storage from compute, allowing query nodes to scale independently without provisioning new EBS volumes. This architecture reduces storage costs by approximately 60% compared to gp3 EBS volumes while maintaining sub-100ms segment load times from S3.
MicrocosmWorks configured the deployment with replica sets for each Milvus component, including query nodes, index nodes, and data nodes, with pod disruption budgets ensuring minimum availability during rolling updates. Since all persistent data resides in S3, a failed node's replacement can immediately access all segments without data migration.
MicrocosmWorks found that r6i.2xlarge instances provide the optimal cost-to-performance ratio for Milvus query workloads, offering 64GB of memory for in-memory segment caching at a competitive spot price. For GPU-accelerated index building, g5.xlarge instances with NVIDIA A10G GPUs reduced index build times by 8x compared to CPU-only builds.
MicrocosmWorks delivers Kubernetes infrastructure projects at rates of $30-$50/hr, with a Milvus autoscaling deployment including Helm chart customization, HPA configuration, S3 integration, and monitoring setup typically requiring 150-250 hours. Ongoing managed support for cluster optimization and upgrades is available at the same hourly rates.