MicrocosmWorks: Innovating and Designing the Digital Cosmos
Video Creation · Published June 18, 2026 · Updated May 25, 2026

AI Face Tracking & Smart Reframing for Vertical Video Conversion

A content repurposing platform needed to automatically convert horizontal (16:9) long-form videos into vertical (9:16) short-form clips while keeping speakers and subjects centered, without any manual cropping or keyframing.

Domain: Video Creation
Technologies: 7
Key Results: 4
Status: Delivered

The Challenge

Converting horizontal video to vertical format was one of the most tedious steps in short-form content production:

  • Manually cropping and repositioning the frame for every clip was time-consuming
  • Multi-person conversations required dynamic reframing as speakers changed
  • Static center-crop cut off speakers who moved or sat off-center
  • Traditional face detection was too slow for real-time reframing decisions across thousands of clips
  • Different content types (interviews, solo vlogs, presentations) required different framing strategies

Our Solution

We built an AI-powered face tracking and smart reframing engine that detects faces in video frames, tracks their movement, and dynamically adjusts the vertical crop region to keep the active subject centered.

Architecture

  • Face Detection: YOLO-based face detection model optimized for speed
  • Face Tracking: IoU-based frame-to-frame tracking with persistent subject IDs
  • Reframing Engine: Dynamic crop region calculation based on face positions and movement
  • Active Speaker Coupling: Integration with speaker detection to prioritize the person talking
  • Rendering: FFmpeg crop filter chain with smooth pan transitions
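
The IoU-based tracking component can be illustrated with a minimal sketch. This is not the project's code: the `IoUTracker` class, the greedy matching strategy, and the 0.3 threshold are assumptions chosen for illustration. It shows how detections in consecutive frames could be linked so that each face keeps a persistent subject ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IoUTracker:
    """Hypothetical tracker: greedy best-IoU matching, no lost-track memory."""
    iou_threshold: float = 0.3  # assumed value, not from the case study
    tracks: Dict[int, Box] = field(default_factory=dict)
    _next_id: int = 0

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        # Score every (track, detection) pair and visit the best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        assigned: Dict[int, Box] = {}
        used = set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little to be the same face
            if tid in assigned or di in used:
                continue
            assigned[tid] = detections[di]
            used.add(di)
        # Any detection left unmatched starts a new subject ID.
        for di, det in enumerate(detections):
            if di not in used:
                assigned[self._next_id] = det
                self._next_id += 1
        self.tracks = assigned  # unmatched old tracks are dropped in this sketch
        return dict(assigned)
```

A production tracker would normally keep unmatched tracks alive for a few frames so that a briefly missed detection does not cause an ID switch; that is omitted here for brevity.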

Reframing Pipeline

  1. Face Detection - Run YOLO face detection across sampled frames
  2. Subject Tracking - Link face detections across frames using IoU-based tracking
  3. Speaker Priority - When coupled with active speaker detection, prioritize the talking subject
  4. Crop Calculation - Determine optimal 9:16 crop region based on primary subject position
  5. Smoothing - Apply easing to crop movement to avoid jarring jumps
  6. Rendering - FFmpeg applies the dynamic crop with smooth pan transitions
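
Steps 4 to 6 can be sketched as follows. The function names, the exponential-moving-average easing, and the 1080x1920 output size are assumptions for illustration, since the case study does not specify the easing curve or output resolution. The filter string uses FFmpeg's standard `crop=w:h:x:y` and `scale=w:h` syntax and describes a static crop for one segment, not the full dynamic pan.

```python
from typing import List, Tuple


def crop_window(face_cx: float, src_w: int, src_h: int) -> Tuple[int, int, int, int]:
    """Return (x, y, w, h) of a full-height 9:16 crop centered on face_cx."""
    crop_w = int(src_h * 9 / 16) // 2 * 2  # round down to an even width
    x = int(round(face_cx - crop_w / 2))
    x = max(0, min(x, src_w - crop_w))     # keep the window inside the frame
    return x, 0, crop_w, src_h


def smooth_positions(xs: List[float], alpha: float = 0.2) -> List[float]:
    """Ease crop movement with an exponential moving average (assumed easing)."""
    smoothed: List[float] = []
    prev = None
    for x in xs:
        prev = x if prev is None else prev + alpha * (x - prev)
        smoothed.append(prev)
    return smoothed


def ffmpeg_crop_filter(x: int, y: int, w: int, h: int,
                       out_w: int = 1080, out_h: int = 1920) -> str:
    """Build an FFmpeg -vf string: crop the window, then scale to the output size."""
    return f"crop={w}:{h}:{x}:{y},scale={out_w}:{out_h}"


if __name__ == "__main__":
    window = crop_window(face_cx=960, src_w=1920, src_h=1080)
    print(ffmpeg_crop_filter(*window))
```

A smaller `alpha` gives slower, smoother pans at the cost of the crop lagging behind fast-moving subjects, so the value would need tuning per content type.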

Key Features

  1. Multi-Subject Handling - Tracks multiple faces and determines the primary subject per segment
  2. Speaker-Aware Framing - Prioritizes the active speaker when integrated with speaker detection
  3. Smooth Transitions - Eased panning between subjects eliminates jarring cuts
  4. Content-Type Adaptation - Different framing strategies for solo, interview, and group content
  5. Batch Processing - Reframe hundreds of clips from a single long-form video
  6. No Manual Intervention - Fully automated from detection to final render

Results

Time Savings: Eliminated 2-5 minutes of manual cropping per clip
Quality: Subjects stayed centered 95%+ of the time across tested content
Scale: Processed thousands of clips daily without human intervention

Tech Stack

YOLO, Python, FFmpeg, OpenCV, IoU Tracking, Node.js, GPU-Accelerated Inference

์ž์ฃผ ๋ฌป๋Š” ์งˆ๋ฌธ

MicrocosmWorks implemented a hybrid tracking approach that combines a lightweight face detector running every 5th frame with a KCF (kernelized correlation filter) tracker for inter-frame predictions. When occlusion is detected via a drop in confidence score, the system maintains the last known trajectory with Kalman filtering and re-acquires the face within 200 ms of it becoming visible again.
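
To make the Kalman step concrete, here is a minimal sketch of a constant-velocity filter over a face's horizontal center. The class name, noise variances, and the 30 fps figure are assumptions, not values from the project. During occlusion the caller would invoke only `predict()`, so the crop keeps following the last known trajectory until a detection returns.

```python
class ConstantVelocityKalman:
    """1-D constant-velocity Kalman filter (hypothetical parameters)."""

    def __init__(self, position: float,
                 process_var: float = 0.01, measurement_var: float = 1.0):
        self.p = float(position)   # estimated position (pixels)
        self.v = 0.0               # estimated velocity (pixels per frame)
        self.P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial uncertainty
        self.q = process_var
        self.r = measurement_var

    def predict(self, dt: float = 1.0) -> float:
        """Advance the state one step: P = F P F^T + Q with F = [[1, dt], [0, 1]]."""
        (p00, p01), (p10, p11) = self.P
        self.p += self.v * dt
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.p

    def update(self, z: float) -> float:
        """Correct the state with a measured position z."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r               # innovation variance
        k0, k1 = p00 / s, p10 / s      # Kalman gain for position and velocity
        y = z - self.p                 # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return self.p


def detector_interval_ms(fps: float, stride: int) -> float:
    """Time between detector runs when detecting every `stride`-th frame."""
    return 1000.0 * stride / fps
```

Assuming 30 fps footage, which the case study does not state, a detector stride of 5 frames means a detection runs roughly every 167 ms, which is consistent with the quoted 200 ms re-acquisition window.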

MicrocosmWorks built a saliency-weighted cropping algorithm that prioritizes detected faces, then text regions, then motion areas when determining the 9:16 crop window position. For multi-person scenes, the system uses a configurable priority ranking, defaulting to the active speaker or the largest face, with smooth interpolation between crop positions to avoid jarring shifts.
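
One way to picture the priority ordering is a weighted centroid over detected regions, plus a primary-subject rule for multi-person scenes. The weights and function names below are hypothetical; the case study only states the ordering (faces, then text, then motion) and the default of active speaker or largest face.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical weights that encode "faces, then text, then motion".
PRIORITY_WEIGHTS: Dict[str, float] = {"face": 1.0, "text": 0.5, "motion": 0.25}

Region = Tuple[str, float, float]  # (kind, center_x in pixels, area in pixels^2)


def weighted_crop_center(regions: List[Region]) -> Optional[float]:
    """Saliency-weighted horizontal center, or None if nothing was detected."""
    total = sum(PRIORITY_WEIGHTS[kind] * area for kind, _, area in regions)
    if total == 0:
        return None
    return sum(PRIORITY_WEIGHTS[kind] * area * cx for kind, cx, area in regions) / total


def pick_primary(face_areas: Dict[int, float],
                 active_speaker_id: Optional[int] = None) -> Optional[int]:
    """Prefer the active speaker if tracked; otherwise fall back to the largest face."""
    if not face_areas:
        return None
    if active_speaker_id in face_areas:
        return active_speaker_id
    return max(face_areas, key=face_areas.get)
```

The returned center would then feed the crop-window calculation, with interpolation between successive centers to avoid abrupt shifts.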

Yes, MicrocosmWorks implemented a fallback saliency detection mode that activates when no faces are present, using a combination of motion detection, visual attention modeling, and mouse cursor tracking for screen recordings. The system intelligently follows the most relevant content region even in purely visual or text-based footage.
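
The motion-detection part of this fallback can be sketched with simple frame differencing. This is an illustrative simplification under assumed names and thresholds: it covers only motion, not visual attention modeling or cursor tracking, and it operates on grayscale frames given as nested lists rather than real video buffers.

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale pixel rows, values 0-255


def motion_centroid_x(prev: Frame, curr: Frame, threshold: int = 25) -> Optional[float]:
    """Horizontal centroid of pixels that changed by at least `threshold`."""
    total = 0
    weighted = 0
    for row_prev, row_curr in zip(prev, curr):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            diff = abs(c - p)
            if diff >= threshold:
                total += diff
                weighted += diff * x
    return weighted / total if total else None


def fallback_center(face_centers: List[float], prev: Frame, curr: Frame,
                    frame_width: int) -> float:
    """Use faces when present, else motion, else the frame center."""
    if face_centers:
        return sum(face_centers) / len(face_centers)
    cx = motion_centroid_x(prev, curr)
    return cx if cx is not None else frame_width / 2
```

In practice the same differencing would be done with vectorized array operations for speed; the nested loops here are only meant to make the logic readable.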

MicrocosmWorks optimized the pipeline for batch workflows, achieving 8x real-time processing speed on a single NVIDIA T4 GPU, meaning a 10-minute video is reframed in approximately 75 seconds. The system supports parallel processing across multiple GPUs, scaling linearly for high-volume content operations.
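
The throughput figures can be checked with a few lines of arithmetic. The helper names are hypothetical, and the daily-capacity estimate assumes full GPU utilization and the linear multi-GPU scaling described above.

```python
def processing_seconds(video_seconds: float, speedup: float = 8.0) -> float:
    """Wall-clock time to reframe a video at `speedup` times real time."""
    return video_seconds / speedup


def daily_capacity_hours(gpus: int, speedup: float = 8.0) -> float:
    """Hours of footage processed per day, assuming linear scaling and no idle time."""
    return 24.0 * speedup * gpus


if __name__ == "__main__":
    print(processing_seconds(10 * 60))   # 10-minute video at 8x real time
    print(daily_capacity_hours(gpus=1))  # footage hours per day on one GPU
```

At 8x real time, a 600-second video takes 600 / 8 = 75 seconds, which matches the figure quoted above.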

MicrocosmWorks develops AI video reframing systems at rates of $25-$45/hr, with a full face tracking and smart reframing solution including model optimization, batch processing support, and API integration typically requiring 350-550 development hours. This investment eliminates the need for manual reframing editors, which typically cost $5-$15 per video.

Creator Satisfaction: Vertical clips looked professionally framed without manual editing