Scalable Vector Database Architecture

Embedding search is easy at 10,000 vectors. At 100 million vectors with a P99 latency under 100ms it becomes an infrastructure problem, and that is the problem this pattern solves.

June 22, 2026

Category: AI / Data
Complexity: Enterprise
Industries: AI/ML, E-Commerce

When You Need This

With a few thousand vectors, your RAG pipeline or recommendation system works well in development. Now you have 50 million embeddings, queries need latency under 100ms, and the index keeps growing and consuming more memory. You need a vector database architecture that scales horizontally, manages memory efficiently (not everything has to live in RAM), handles concurrent writes without degrading query performance, and does not cost $10,000 a month in infrastructure for what is essentially a search index.

Pattern Overview


Frequently Asked Questions

MicrocosmWorks generally recommends pgvector for projects with fewer than 5 to 10 million vectors when the team already runs PostgreSQL: it introduces no new infrastructure component and supports hybrid SQL-plus-vector queries natively. Beyond 10 million vectors, or when a p99 latency under 50ms is required at high concurrency, a dedicated vector database such as Qdrant, Weaviate, or Milvus delivers much better performance through optimized indexing algorithms and, where the engine supports it, GPU-accelerated search. During architecture reviews we help make this decision by benchmarking the customer's actual query patterns and growth projections.

MicrocosmWorks designs vector database clusters around hash-based or metadata-based sharding strategies that distribute vectors across nodes while keeping semantically related data co-located for efficient retrieval. We implement query routing layers that fan search requests out to the relevant shards and merge the results with a global top-K aggregation, holding latency under 100ms even across dozens of shards. Our monitoring dashboards track shard balance, query distribution, and replication lag so that hotspots do not develop as the dataset grows.
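The fan-out and global top-K merge described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MicrocosmWorks' router: `Hit`, `search_shard`, `global_top_k`, and the in-memory shard layout are hypothetical names, and each shard does a brute-force scan where a real shard would consult its ANN index.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Hit(NamedTuple):
    distance: float  # smaller means closer to the query
    vector_id: str


def search_shard(shard: dict[str, list[float]], query: list[float], k: int) -> list[Hit]:
    """Return the k nearest vectors in one shard, sorted by distance.

    Brute force stands in for the shard's ANN index (assumption).
    """
    hits = (Hit(math.dist(query, vec), vid) for vid, vec in shard.items())
    return heapq.nsmallest(k, hits)


def global_top_k(shards: list[dict[str, list[float]]], query: list[float], k: int) -> list[Hit]:
    """Fan the query out to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, query, k), shards))
    # Every partial list is already sorted, so a k-way merge yields the global order.
    merged = heapq.merge(*partials)
    return [hit for _, hit in zip(range(k), merged)]


# Hypothetical three-shard collection of 2-dimensional vectors.
shards = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [1.0, 0.0], "d": [9.0, 9.0]},
    {"e": [0.0, 2.0]},
]
top = global_top_k(shards, query=[0.0, 0.0], k=3)
```

Each shard only has to return its own k best hits, so the router merges at most k × (number of shards) candidates regardless of collection size.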

MicrocosmWorks applies scalar quantization (reducing float32 to int8) and product quantization to compress vector storage by 4-8x, typically with less than 2% loss in recall. We validate this by A/B testing against the customer's actual query workload before deploying to production. We also implement a two-stage retrieval approach: quantized vectors serve the initial candidate retrieval, and full-precision vectors are used only for the final re-ranking of the top results. This hybrid strategy lets customers store hundreds of millions of vectors at a fraction of the cost while keeping search quality indistinguishable from uncompressed operation.
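A minimal sketch of scalar quantization with two-stage retrieval, assuming a single global value range `[lo, hi]` for all dimensions. The function names and the `oversample` parameter are illustrative, not part of any specific engine's API; storing one byte per value instead of four is where the 4x end of the compression range comes from.

```python
import math


def quantize(vec: list[float], lo: float, hi: float) -> list[int]:
    """Map each float in [lo, hi] to an int8 code in [-128, 127]; outliers are clipped."""
    scale = 255.0 / (hi - lo)
    return [max(-128, min(127, round((x - lo) * scale) - 128)) for x in vec]


def dequantize(codes: list[int], lo: float, hi: float) -> list[float]:
    """Approximate inverse of quantize()."""
    step = (hi - lo) / 255.0
    return [(c + 128) * step + lo for c in codes]


def two_stage_search(query, vectors, codes, lo, hi, k, oversample=4):
    """Stage 1 ranks by the int8 approximations; stage 2 re-ranks the survivors exactly."""
    approx_order = sorted(
        vectors,
        key=lambda vid: math.dist(query, dequantize(codes[vid], lo, hi)),
    )
    candidates = approx_order[: k * oversample]
    return sorted(candidates, key=lambda vid: math.dist(query, vectors[vid]))[:k]


# Hypothetical data: full-precision vectors and their int8 codes.
lo, hi = -1.0, 1.0
vectors = {
    "a": [0.10, 0.90],
    "b": [0.12, 0.88],
    "c": [-0.70, 0.30],
    "d": [0.80, -0.50],
}
codes = {vid: quantize(vec, lo, hi) for vid, vec in vectors.items()}
result = two_stage_search([0.10, 0.90], vectors, codes, lo, hi, k=2)
```

In a real deployment only the int8 codes would sit in RAM, and the full-precision vectors for the few re-ranked candidates would be fetched from slower storage.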

MicrocosmWorks deploys vector databases in multi-replica configurations that use synchronous replication for write durability, with read replicas spread across availability zones for fault tolerance and load balancing. We configure automated failover through health-check-based leader election, so that a node failure causes less than 10 seconds of read unavailability and no data loss. Our infrastructure-as-code templates include preconfigured backup schedules, point-in-time recovery, and disaster recovery runbooks tailored to each vector database engine.

MicrocosmWorks designs multi-collection vector database deployments in which each application or embedding model gets its own isolated collection with an appropriate index configuration, while all of them share the underlying cluster infrastructure for cost efficiency. We implement a unified query gateway that routes each request to the correct collection based on application context and applies collection-specific preprocessing, such as embedding the query with the matching model. Compared with running a separate cluster per application, this multi-tenant approach typically cuts infrastructure costs by 40-60%.
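The routing step of such a gateway might look like the sketch below. Everything here is hypothetical: `CollectionConfig`, `QueryGateway`, the application ids, and the toy embedding functions are placeholders for real embedding-model clients and collection names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CollectionConfig:
    collection: str                      # isolated collection for one application
    embed: Callable[[str], list[float]]  # embedding function matching that collection's model
    index_type: str                      # for example "HNSW" or "IVF_PQ"


class QueryGateway:
    def __init__(self) -> None:
        self._routes: dict[str, CollectionConfig] = {}

    def register(self, app_id: str, config: CollectionConfig) -> None:
        self._routes[app_id] = config

    def prepare(self, app_id: str, text: str) -> tuple[str, list[float]]:
        """Resolve the caller's collection and embed the query with the matching model."""
        try:
            config = self._routes[app_id]
        except KeyError:
            raise LookupError(f"no collection registered for application {app_id!r}") from None
        return config.collection, config.embed(text)


# Toy embedders stand in for real models with different output dimensions.
gateway = QueryGateway()
gateway.register("support", CollectionConfig("support_docs", lambda t: [float(len(t))], "HNSW"))
gateway.register("shop", CollectionConfig("products", lambda t: [float(len(t)), 1.0], "IVF_PQ"))

collection, query_vector = gateway.prepare("shop", "red shoes")
```

Keeping the embedding function next to the collection name is what prevents a query embedded with one model from being searched against vectors produced by another.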


The Scalable Vector Database Architecture addresses the challenges of operating vector search at production scale: index partitioning across nodes (sharding), tiered storage (hot segments in memory, warm segments on SSD, cold segments in S3), query routing with load balancing, and autoscaling based on query load and index size. The pattern covers deployment topology, capacity planning, write/read isolation, and cost optimization. It is the infrastructure layer that makes large-scale RAG and recommendation systems possible.

Reference Architecture

The architecture deploys vector database nodes in a clustered topology that separates query nodes (read path) from data nodes (write path). The ingestion pipeline handles embedding generation and batch upserts behind a write buffer so that query latency is not affected. The query router distributes searches across read replicas with shard-level parallelism. Tiered storage moves infrequently accessed segments from memory to SSD and then to S3, loading them transparently at query time. Autoscaling adjusts the replica count based on query QPS and P99 latency.

Key Components
  • Cluster management: Milvus (the default for large-scale deployments) with etcd for metadata coordination, MinIO/S3 for segment storage, and Pulsar/Kafka for write-ahead logging. Alternatively, a managed service (Pinecone, Zilliz Cloud) when operational simplicity matters more than cost
  • Shard & partition strategy: Logical partitions aligned with data boundaries (per tenant, per document collection, per time window). Each partition is independently searchable, which enables filtered queries without scanning the whole index. Shards are distributed across nodes for parallel query execution
  • Tiered storage engine: A hot tier (in-memory HNSW/IVF indexes) for frequently queried collections. A warm tier (memory-mapped SSD) for large collections with moderate query load. A cold tier (S3-backed) for archive collections that stay searchable but tolerate higher latency. Segment-level promotion and demotion based on access patterns
  • Autoscaling controller: A Kubernetes HPA (Horizontal Pod Autoscaler) that scales query nodes on QPS and P99 latency metrics. Scale out when latency exceeds the target, scale in after sustained low utilization. Ingestion workers scale separately so that burst uploads do not affect query performance
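The autoscaling rule can be illustrated with the replica formula that the Kubernetes HPA documents: desired = ceil(current × observed / target), skipped when the ratio is within a tolerance of 1.0 (0.1 by default), with the largest proposal winning when several metrics are configured. The sketch below only reproduces that arithmetic; the function names, targets, and replica bounds are assumptions, and latency rarely falls linearly as replicas are added, so a latency target needs tuning against real measurements.

```python
import math


def desired_replicas(current: int, observed: float, target: float, tolerance: float = 0.1) -> int:
    """HPA rule: ceil(current * observed / target), unchanged when within tolerance."""
    ratio = observed / target
    if abs(ratio - 1.0) <= tolerance:
        return current
    return math.ceil(current * ratio)


def scale_query_nodes(current, qps_per_pod, p99_ms, target_qps_per_pod, target_p99_ms,
                      min_replicas=2, max_replicas=20):
    """Evaluate both metrics, take the larger proposal, and clamp it to the bounds."""
    proposal = max(
        desired_replicas(current, qps_per_pod, target_qps_per_pod),
        desired_replicas(current, p99_ms, target_p99_ms),
    )
    return max(min_replicas, min(max_replicas, proposal))


# Hypothetical readings: 4 query pods at 300 QPS each against a 200 QPS target,
# while P99 latency (80 ms) is still inside its 100 ms target.
busy = scale_query_nodes(4, qps_per_pod=300, p99_ms=80,
                         target_qps_per_pod=200, target_p99_ms=100)
# Overnight lull: both metrics far below target, so the floor of 2 replicas applies.
quiet = scale_query_nodes(4, qps_per_pod=100, p99_ms=40,
                          target_qps_per_pod=200, target_p99_ms=100)
```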

Design Decisions and Trade-offs

Milvus vs. Pinecone vs. Qdrant vs. pgvector
pgvector suits deployments of fewer than 1 million vectors where PostgreSQL is already in use and a latency of roughly 200ms is acceptable. Pinecone is for teams that want no operational burden and can accept the pricing (it scales well but becomes expensive beyond 10 million vectors). Qdrant offers a clean API and excellent single-node performance. Of the open-source options, MW considers Milvus the one that combines a truly distributed architecture, tiered storage, and production-grade sharding, which makes it the fit for large-scale deployments. MW defaults to Milvus above 5 million vectors and to Pinecone for teams that put operational simplicity first.
HNSW vs. IVF_FLAT vs. IVF_PQ
HNSW (Hierarchical Navigable Small World) gives the best recall at low latency but uses the most memory (full vectors in RAM). IVF_FLAT clusters the vectors and searches only the relevant clusters, a good balance of speed and memory. IVF_PQ (Product Quantization) compresses the vectors for large memory savings but lowers recall by 3-8%. MW uses HNSW for collections under 10 million vectors and switches to IVF_PQ with PQ refinement (re-scoring the top candidates against the full vectors) for larger collections where memory cost matters.
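To make the IVF trade-off concrete, here is a toy IVF_FLAT-style search in plain Python. It assumes the centroids are already given (a real index learns them with k-means), and `nearest`, `build_ivf`, and `ivf_search` are illustrative names, not a library API. `nprobe` is the number of clusters scanned per query.

```python
import math


def nearest(centroids: list[list[float]], vec: list[float], n: int = 1) -> list[int]:
    """Indices of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))
    return order[:n]


def build_ivf(centroids, vectors: dict[str, list[float]]) -> dict[int, list[str]]:
    """Assign every vector to the inverted list of its nearest centroid."""
    lists: dict[int, list[str]] = {i: [] for i in range(len(centroids))}
    for vid, vec in vectors.items():
        lists[nearest(centroids, vec)[0]].append(vid)
    return lists


def ivf_search(query, centroids, lists, vectors, k: int, nprobe: int) -> list[str]:
    """Scan only the nprobe closest inverted lists instead of the whole collection."""
    candidates = [vid for c in nearest(centroids, query, nprobe) for vid in lists[c]]
    return sorted(candidates, key=lambda vid: math.dist(query, vectors[vid]))[:k]


# Hypothetical data: two clusters, two vectors each.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = {"a": [0.5, 0.2], "b": [1.0, 1.5], "c": [9.0, 9.5], "d": [10.5, 10.0]}
lists = build_ivf(centroids, vectors)

# A query near the cluster boundary: probing one cluster misses a true neighbour.
boundary_query = [5.2, 5.2]
fast = ivf_search(boundary_query, centroids, lists, vectors, k=2, nprobe=1)
exact = ivf_search(boundary_query, centroids, lists, vectors, k=2, nprobe=2)
```

With `nprobe=1` the query scans half the data but returns `d` in place of the closer `b`; raising `nprobe` recovers recall at the cost of scanning more lists, which is the speed-versus-recall dial the paragraph above describes.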
Write Isolation
In most vector databases, concurrent writes during ingestion degrade query latency. MW isolates the write path: new vectors are buffered in a write-ahead log, flushed periodically into sealed segments, and merged into the searchable index during low-traffic periods. For systems that need real-time ingestion (for example, live document processing), we deploy separate ingestion and query node pools with different resource allocations.
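The buffer, seal, and merge stages of this write path can be sketched with an in-memory model. This is an assumption-laden toy: `SegmentedIndex` and its fields are hypothetical, a Python list stands in for a durable log such as Pulsar or Kafka, and a dict stands in for the searchable index.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentedIndex:
    flush_threshold: int = 3
    wal: list[tuple[str, list[float]]] = field(default_factory=list)    # buffered writes
    sealed: list[dict[str, list[float]]] = field(default_factory=list)  # immutable segments
    searchable: dict[str, list[float]] = field(default_factory=dict)    # what queries read

    def upsert(self, vector_id: str, vector: list[float]) -> None:
        """Append to the log only; the structure that queries read is untouched."""
        self.wal.append((vector_id, vector))
        if len(self.wal) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Seal the buffered writes into one immutable segment."""
        if self.wal:
            self.sealed.append(dict(self.wal))
            self.wal.clear()

    def merge(self) -> None:
        """Fold sealed segments into the searchable index (meant for low-traffic periods)."""
        for segment in self.sealed:
            self.searchable.update(segment)
        self.sealed.clear()


index = SegmentedIndex(flush_threshold=2)
index.upsert("a", [1.0])
index.upsert("b", [2.0])   # reaches the threshold, so "a" and "b" are sealed
index.upsert("c", [3.0])   # stays in the log
visible_before_merge = dict(index.searchable)
index.merge()
visible_after_merge = dict(index.searchable)
```

The cost of this isolation is freshness: a vector is not searchable until its segment has been merged, which is why the real-time case calls for separate ingestion and query pools instead.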
Cost Optimization
Vector databases are memory-hungry. A collection of 100 million vectors with 1536-dimensional embeddings needs roughly 600GB of RAM in HNSW mode. MW optimizes cost through: (a) dimensionality reduction where possible (Matryoshka embeddings, PCA), (b) quantization (scalar or product quantization), (c) tiered storage that pushes cold segments out of RAM, and (d) right-sizing the embedding dimension: 768 dimensions are often enough where 1536 is overkill.
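The 600GB figure follows from simple arithmetic, reproduced below as a rough estimator. The raw-vector term is exact for float32 storage; the HNSW link term is an approximation that assumes about 2 × M four-byte neighbour ids per vector on the base layer with M = 16 and ignores upper layers and allocator overhead.

```python
def raw_vector_bytes(count: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the vectors themselves (float32 by default)."""
    return count * dim * bytes_per_value


def hnsw_link_bytes(count: int, m: int = 16, bytes_per_link: int = 4) -> int:
    """Rough graph overhead: about 2*M neighbour ids per vector (assumption)."""
    return count * 2 * m * bytes_per_link


def to_gb(n_bytes: int) -> float:
    return n_bytes / 1e9


count = 100_000_000

full = raw_vector_bytes(count, 1536) + hnsw_link_bytes(count)    # float32, 1536 dims
halved = raw_vector_bytes(count, 768) + hnsw_link_bytes(count)   # float32, 768 dims
int8 = raw_vector_bytes(count, 1536, bytes_per_value=1) + hnsw_link_bytes(count)

for label, size in [("1536-d float32", full), ("768-d float32", halved), ("1536-d int8", int8)]:
    print(f"{label}: {to_gb(size):.1f} GB")
```

The vectors alone account for 614.4 GB at 1536 dimensions, so the graph links are a small share of the total; that is why options (a), (b), and (d) above, which shrink the vectors, have the largest effect on RAM.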

Technology Choices

Layer | Technology
Vector database | Milvus (distributed), Qdrant (single node / small cluster), Pinecone (managed)
Storage backend | MinIO / S3 (segment storage), SSD (warm tier), RAM (hot tier)
Coordination | etcd (Milvus metadata), Pulsar/Kafka (write-ahead log)
Embedding models | OpenAI text-embedding-3-large, Cohere embed-v4, BGE-M3, E5-large-v2
Infrastructure | Kubernetes (EKS/GKE) with GPU nodes for embedding and memory-optimized nodes for queries
Monitoring | Grafana + Milvus metrics exporter, custom P99/recall dashboards

When to Use / When to Avoid

Use when | Avoid when
The vector count exceeds 5 million and is growing, and horizontal scaling is needed | The vector count is under 1 million: pgvector on an existing PostgreSQL is enough
A P99 query latency under 100ms is a hard requirement | A query latency of 500ms or more is acceptable: simpler options are enough
Several applications or tenants share the vector infrastructure | There is a single application with a single collection: use a managed service
Tiered storage is needed for cost optimization (not everything has to be in RAM) | The budget allows a fully managed service and the vendor's pricing fits the current scale

Our Approach

MW designs vector database infrastructure with a "right-size from day one, scale on measurement" approach. We start capacity planning from vector count, dimensionality, index type, and target latency, not from guesswork. Our Milvus-on-Kubernetes deployments come with Grafana dashboards that track segment count, memory usage, query latency percentiles, and recall estimates. We have implemented autoscaling Milvus clusters that absorb 10x traffic spikes during business hours and scale down overnight, cutting infrastructure costs by 40-60% compared with static provisioning.

Related Blueprints

  • AI Customer Support Agent: vector search powering knowledge retrieval for support responses
  • AI Document Processing Pipeline: embedding and indexing of extracted document content
  • AI-Driven Personalized Learning Platform: vector similarity for content recommendation

Related Case Studies

  • Milvus Autoscaling: a production Milvus cluster with Kubernetes HPA and S3-backed tiered storage
  • Document Intelligence: vector search for local document retrieval and analysis