Video Analysis · Published June 22, 2026 · Updated June 22, 2026

AI-Powered Active Speaker Detection for Multi-Camera Video Production

A media production company that shoots multi-camera interviews and panel discussions needed a way to automatically identify who is speaking at any given moment in complex footage.

Domain: Video Analysis · Technologies: 11 · Key Results: 4 · Status: Delivered

The Challenge

Producing multi-camera content (interviews, podcasts, panel discussions) required editors to review large volumes of footage by hand to identify the active speaker and place cuts. The process was:

  • Extremely time-consuming (manual review took 10-15x real time)
  • Prone to human error in speaker attribution
  • A bottleneck that prevented fast content turnaround

Our Solution

We built an AI-powered video analysis platform with a deep learning pipeline that fuses audio and visual signals to detect the active speaker automatically.

Architecture

  • Backend: Python/Flask REST API backed by MongoDB and Redis
  • ML pipeline: TalkNet audio-visual fusion model, YOLOv8 Nano for face detection, OpenAI Whisper for transcription
  • GPU optimization: PyTorch with CUDA, frame decimation for a 3x speedup, batch processing
  • Infrastructure: multi-instance deployment coordinated with distributed MongoDB-based locking
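The case study does not describe the MongoDB-based locking any further. As a rough illustration of the lease semantics such a lock usually needs, the sketch below uses an in-memory dictionary as a stand-in for a MongoDB collection. The names `LeaseLock`, `acquire`, and `release`, and the 60-second lease, are hypothetical; in a real multi-instance deployment the check-and-set would have to be a single atomic database operation.

```python
import time


class LeaseLock:
    """Lease-style job lock. A dict stands in for a MongoDB collection (assumption)."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self._locks = {}  # job_id -> (owner, expires_at)
        self._lease = lease_seconds
        self._clock = clock

    def acquire(self, job_id, owner):
        """Return True if `owner` now holds the lock for `job_id`."""
        now = self._clock()
        holder = self._locks.get(job_id)
        # The lock is available if nobody holds it, if the previous lease has
        # expired (for example, the holder crashed), or if the caller already
        # owns it and is renewing the lease.
        if holder is None or holder[1] <= now or holder[0] == owner:
            self._locks[job_id] = (owner, now + self._lease)
            return True
        return False

    def release(self, job_id, owner):
        """Release the lock only if `owner` still holds it."""
        holder = self._locks.get(job_id)
        if holder is not None and holder[0] == owner:
            del self._locks[job_id]
            return True
        return False
```

The expiry time is what lets another instance pick up a job whose original worker died without releasing the lock.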

์ฒ˜๋ฆฌ ํŒŒ์ดํ”„๋ผ์ธ

  1. ๋ฏธ๋””์–ด ์ถ”์ถœ - ์˜์ƒ ๋‹ค์šด๋กœ๋“œ ๋ฐ ์˜ค๋””์˜ค/์˜์ƒ ๋ถ„๋ฆฌ
  2. ์žฅ๋ฉด ๊ฐ์ง€ - PySceneDetect๋ฅผ ํ†ตํ•œ ์ฝ˜ํ…์ธ  ๊ธฐ๋ฐ˜ ๊ฒฝ๊ณ„ ๊ฐ์ง€
  3. ์–ผ๊ตด ๊ฐ์ง€ - ํ”„๋ ˆ์ž„ ๋ฐ์‹œ๋ฉ”์ด์…˜์„ ์‚ฌ์šฉํ•œ YOLOv8 Nano ์–ผ๊ตด ๊ฐ์ง€
  4. ์–ผ๊ตด ์ถ”์  - ํ”„๋ ˆ์ž„ ๊ฐ„ IoU ๊ธฐ๋ฐ˜ ์—ฐ๊ฒฐ
  5. TalkNet ์ถ”๋ก  - ๋‹ค์ค‘ ์ง€์† ์‹œ๊ฐ„ ์ ์ˆ˜(1์ดˆ, 2์ดˆ, 4์ดˆ, 6์ดˆ ์œˆ๋„์šฐ)๋ฅผ ์‚ฌ์šฉํ•œ ์˜ค๋””์˜ค-์‹œ๊ฐ ์œตํ•ฉ
  6. ์ „์‚ฌ - ๋‹จ์–ด ์ˆ˜์ค€ ํƒ€์ž„์Šคํƒฌํ”„๋ฅผ ํฌํ•จํ•œ Whisper ๊ธฐ๋ฐ˜ ์Œ์„ฑ-ํ…์ŠคํŠธ ๋ณ€ํ™˜
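The IoU-based face tracking step can be sketched as a greedy frame-to-frame matcher. This is a minimal illustration, not the project's tracker: the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_faces(frames, iou_threshold=0.5):
    """Greedily link per-frame face boxes into tracks.

    `frames` is a list (one entry per frame) of lists of boxes. Returns a
    list of tracks, each a list of (frame_index, box) pairs.
    """
    tracks = []
    active = []  # indices of tracks that were extended in the previous frame
    for frame_idx, boxes in enumerate(frames):
        next_active = []
        unmatched = list(active)
        for box in boxes:
            # Pick the still-unmatched active track whose last box overlaps most.
            best, best_iou = None, iou_threshold
            for t in unmatched:
                score = iou(tracks[t][-1][1], box)
                if score >= best_iou:
                    best, best_iou = t, score
            if best is None:
                tracks.append([(frame_idx, box)])  # start a new track
                next_active.append(len(tracks) - 1)
            else:
                tracks[best].append((frame_idx, box))
                unmatched.remove(best)
                next_active.append(best)
        active = next_active
    return tracks
```

A track ends as soon as no box in the next frame overlaps it enough, which is a simplification. Resetting tracks at the scene boundaries found in step 2 would be a natural addition.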

Key Features

  • Active speaker detection via cross-modal attention (lip movement + audio)
  • Multi-duration confidence scoring for robust speaker identification
  • Automatic transcription with word-level timestamps
  • Background job scheduling with cancellation support
  • Performance monitoring and GPU memory management
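The source lists the window lengths used for multi-duration scoring but not how the scores are combined. One plausible reading, sketched below, scores a face track in chunks of each window length and averages the per-frame results. The averaging, the 25 fps default, and the `score_chunk` callback (a stand-in for the model) are assumptions.

```python
def multi_duration_scores(n_frames, score_chunk, fps=25, durations=(1, 2, 4, 6)):
    """Average per-frame speaking scores over several window lengths.

    `score_chunk(start, end)` is a hypothetical stand-in for the model: it
    must return one score per frame in the half-open range [start, end).
    """
    totals = [0.0] * n_frames
    for seconds in durations:
        window = seconds * fps
        # Walk the track in consecutive chunks of this window length.
        for start in range(0, n_frames, window):
            end = min(start + window, n_frames)
            for offset, value in enumerate(score_chunk(start, end)):
                totals[start + offset] += value
    return [total / len(durations) for total in totals]
```

Short windows react quickly to speaker changes while long windows smooth out noise, so averaging them trades a little responsiveness for stability.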

Results

Processing speed: a 30-minute video analyzed in 10-15 minutes on a GPU with 12 GB or more of memory
Accuracy: high-confidence speaker attribution through multi-duration scoring
Scalability: distributed architecture that supports horizontal scaling across servers
Efficiency: 3x speedup from frame decimation optimization
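Frame decimation, credited above with the 3x speedup, generally means running the detector on only a subset of frames. The source does not say how the skipped frames are filled in, so the sketch below assumes linear interpolation of a single face box between detected key frames. `decimated_boxes`, the `detect` callback, and `step=3` are hypothetical.

```python
def decimated_boxes(n_frames, detect, step=3):
    """Run `detect` on every `step`-th frame and interpolate boxes in between.

    `detect(frame_index)` is a stand-in for the face detector and returns one
    box as (x1, y1, x2, y2). Returns one box per frame.
    """
    if n_frames <= 0:
        return []
    keys = list(range(0, n_frames, step))
    if keys[-1] != n_frames - 1:
        keys.append(n_frames - 1)  # always detect on the final frame
    detected = {k: detect(k) for k in keys}

    boxes = [None] * n_frames
    for left, right in zip(keys, keys[1:]):
        a, b = detected[left], detected[right]
        for i in range(left, right):
            w = (i - left) / (right - left)  # 0.0 at `left`, approaching 1.0
            boxes[i] = tuple(p + w * (q - p) for p, q in zip(a, b))
    boxes[-1] = tuple(float(v) for v in detected[keys[-1]])
    return boxes
```

With `step=3` the detector runs on roughly one frame in three, which is where a speedup of that order would come from if detection dominates the runtime.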

Technology Stack

Python, Flask, PyTorch, TalkNet, YOLOv8, OpenAI Whisper, MongoDB, Redis, FFmpeg, PySceneDetect, CUDA


Frequently Asked Questions

MicrocosmWorks developed a multimodal fusion model that correlates lip movement visual features extracted from each camera feed with the audio signal using cross-attention layers. The model outputs per-frame speaker probability scores for each visible face, achieving 94% accuracy even when multiple participants speak simultaneously.

MicrocosmWorks optimized the inference pipeline to run on NVIDIA T4 GPUs with TensorRT acceleration, achieving under 150ms end-to-end latency from frame capture to speaker identification. This latency is well within the acceptable range for live production switching, where typical cut delays are 300-500ms.

MicrocosmWorks trained the model on diverse occlusion scenarios and implemented a temporal smoothing algorithm that maintains speaker tracking through brief occlusions using audio-only confidence scores. When visual confidence drops below a threshold, the system falls back to audio source localization using beamforming data from multi-microphone arrays.
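The fallback described above can be illustrated with a small sketch. The source does not name the smoothing algorithm, so an exponential moving average is assumed here, and a plain per-frame audio score stands in for the beamforming-based localization. The function name, threshold, and `alpha` are hypothetical.

```python
def smooth_speaker_scores(visual, audio, visual_conf, conf_threshold=0.5, alpha=0.3):
    """Blend per-frame scores for one face, falling back to audio when occluded.

    `visual` and `audio` are per-frame speaking scores; `visual_conf` is the
    per-frame confidence that the face is actually visible.
    """
    smoothed = []
    previous = None
    for v, a, c in zip(visual, audio, visual_conf):
        # Use the audio-only score whenever the face is not reliably visible.
        raw = v if c >= conf_threshold else a
        # Exponential moving average keeps the estimate stable across gaps.
        previous = raw if previous is None else alpha * raw + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed
```

In this form a single occluded frame cannot flip the speaker decision, because its visual score is ignored and the audio score is blended with the preceding frames.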

MicrocosmWorks built a companion control module that translates speaker detection outputs into standard tally/control signals compatible with Blackmagic ATEM via the ATEM SDK and NewTek NDI for TriCaster systems. Production directors can set the system to auto-switch or advisory mode where it suggests cuts without executing them.

MicrocosmWorks builds custom AI video analysis systems at rates of $30-$50/hr, with a multi-camera active speaker detection system including model training, TensorRT optimization, and switcher integration typically requiring 500-750 development hours. The model training phase requires GPU compute resources that usually add $2,000-$5,000 to the project cost.
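Taken together, the figures above imply an overall budget range. The short calculation below assumes the lowest rate pairs with the fewest hours and the highest rate with the most hours, and that GPU compute is the only additional cost. The function name is hypothetical.

```python
def project_cost_range(rate_per_hour, hours, gpu_compute):
    """Return (low, high) total cost in USD from (min, max) input pairs."""
    low = rate_per_hour[0] * hours[0] + gpu_compute[0]
    high = rate_per_hour[1] * hours[1] + gpu_compute[1]
    return low, high


low, high = project_cost_range((30, 50), (500, 750), (2000, 5000))
print(f"Estimated total: ${low:,} to ${high:,}")
```

With the quoted figures this works out to between $17,000 and $42,500.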
