Vertical Video Conversion with AI Face Tracking and Smart Reframing
A content repurposing platform needed to automatically convert horizontal (16:9) long-form videos into vertical (9:16) short-form clips. Speakers and subjects had to stay perfectly centered without any manual cropping or keyframing.
The Challenge
Converting horizontal video into a vertical format was one of the most tedious steps in short-form content production.
- Manually cropping and repositioning the frame for every clip is time-consuming.
- Multi-person conversations need dynamic reframing every time the speaker changes.
- A static center crop cuts off speakers who move around or sit off-center.
- Conventional face detection is too slow to make real-time reframing decisions across thousands of clips.
- Different content types (interviews, vlogs, presentations) each need a different framing strategy.
Our Solution
We built an AI-powered face tracking and smart reframing engine. It detects faces in video frames, tracks their movement, and dynamically adjusts the vertical crop region so the active subject stays centered.
Architecture
- Face detection: a YOLO-based face detection model optimized for speed
- Face tracking: IoU-based frame-to-frame tracking with persistent subject IDs
- Reframing engine: dynamic crop region calculation based on face position and movement
- Active speaker integration: integration with speaker detection to prioritize the person who is talking
- Rendering: an FFmpeg crop filter chain with smooth pan transitions
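The IoU-based tracking listed above can be sketched as a greedy matcher. Each new detection goes to the existing track whose last box overlaps it most. Detections left unmatched start new tracks with fresh, persistent IDs.

This is a minimal illustration, not the engine's actual tracker. `IoUTracker`, `iou_threshold`, and `max_missed` are hypothetical names and values. Boxes are assumed to be `(x1, y1, x2, y2)` pixel tuples.

```python
from dataclasses import dataclass, field
from itertools import count

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IoUTracker:
    iou_threshold: float = 0.3  # hypothetical: minimum overlap to continue a track
    max_missed: int = 10        # hypothetical: frames a track may go unmatched before it is dropped
    tracks: dict[int, Box] = field(default_factory=dict)
    missed: dict[int, int] = field(default_factory=dict)
    _next_id: count = field(default_factory=count)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        """Match this frame's detections to tracks; return {subject_id: box}."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        used_tracks: set[int] = set()
        used_dets: set[int] = set()
        result: dict[int, Box] = {}
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used_tracks or di in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(di)
            self.tracks[tid] = detections[di]
            self.missed[tid] = 0
            result[tid] = detections[di]
        # Age unmatched tracks and drop the stale ones.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                self.missed[tid] += 1
                if self.missed[tid] > self.max_missed:
                    del self.tracks[tid], self.missed[tid]
        # Unmatched detections become new subjects with persistent IDs.
        for di, det in enumerate(detections):
            if di not in used_dets:
                tid = next(self._next_id)
                self.tracks[tid] = det
                self.missed[tid] = 0
                result[tid] = det
        return result
```

Greedy matching keeps the sketch short. A production tracker might use optimal assignment (for example, the Hungarian algorithm) when faces overlap heavily.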
Reframing Pipeline
- Face detection - Run YOLO face detection across the sampled frames
- Subject tracking - Link face detections across frames using IoU-based tracking
- Speaker priority - When integrated with active speaker detection, prioritize the subject who is speaking
- Crop calculation - Determine the optimal 9:16 crop region from the position of the primary subject
- Smoothing - Apply easing to crop movement to avoid unnatural jumps
- Rendering - FFmpeg applies the dynamic crop with smooth pan transitions
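The crop calculation, smoothing, and rendering steps above can be sketched in a few small functions. The sketch rests on three assumptions:

- The crop is full-height and only pans horizontally.
- Smoothing uses an exponential moving average as a stand-in for whatever easing curve the engine actually applies.
- The FFmpeg step encodes linear pans between keyframes as a per-frame `x` expression for the standard `crop` filter.

All function names and the `alpha` default are hypothetical.

```python
def crop_width(frame_h: int) -> int:
    """Width of a full-height 9:16 window, rounded to an even pixel count."""
    return 2 * round(frame_h * 9 / 32)


def crop_x(face_cx: float, frame_w: int, crop_w: int) -> int:
    """Left edge of a window centered on the subject, clamped inside the frame."""
    x = face_cx - crop_w / 2
    return round(min(max(x, 0.0), frame_w - crop_w))


def smooth(xs: list[float], alpha: float = 0.2) -> list[int]:
    """Exponential moving average over per-frame crop positions (lower alpha = slower pans)."""
    out: list[int] = []
    s = None
    for x in xs:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(round(s))
    return out


def ffmpeg_crop_filter(keyframes: list[tuple[float, int]], crop_w: int, crop_h: int) -> str:
    """Build a crop filter whose x position pans linearly between (time_s, x) keyframes."""
    expr = str(keyframes[-1][1])  # hold the last position after the final keyframe
    for (t0, x0), (t1, x1) in reversed(list(zip(keyframes, keyframes[1:]))):
        segment = f"{x0}+({x1}-{x0})*(t-{t0})/({t1}-{t0})"
        expr = f"if(lt(t,{t1}),{segment},{expr})"
    expr = f"if(lt(t,{keyframes[0][0]}),{keyframes[0][1]},{expr})"
    # Single quotes stop FFmpeg's filtergraph parser from splitting on the commas.
    return f"crop={crop_w}:{crop_h}:'{expr}':0"
```

For a 1920x1080 source, `crop_width(1080)` gives 608. The string returned by `ffmpeg_crop_filter` can be passed to `ffmpeg` via `-vf`.

The nested `if()` expression grows with the keyframe count. For long clips, a renderer might instead drive the crop position through FFmpeg's `sendcmd` filter or render per segment.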
Key Features
- Multi-subject support - Tracks multiple faces and determines the primary subject for each segment
- Speaker-aware framing - Prioritizes the active speaker when integrated with speaker detection
- Smooth transitions - Eased pans between subjects eliminate jarring cuts
- Content-type adaptation - Provides different framing strategies for solo, interview, and group content
- Batch processing - Reframes hundreds of clips from a single long-form video
- No manual intervention - Fully automated from detection through final rendering
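Content-type adaptation could be expressed as a small table of framing presets, plus a rule for when to frame a whole group instead of one subject. The preset names, the parameter values, and the `should_switch` hold rule below are illustrative assumptions. The source does not publish its actual strategies.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Strategy:
    smoothing_alpha: float  # lower values give slower, steadier pans
    min_hold_s: float       # minimum time on one subject before switching to another
    frame_group: bool       # try to keep every visible face inside the crop


# Hypothetical presets; the source does not publish its parameters.
STRATEGIES = {
    "solo": Strategy(smoothing_alpha=0.3, min_hold_s=0.0, frame_group=False),
    "interview": Strategy(smoothing_alpha=0.15, min_hold_s=1.5, frame_group=False),
    "group": Strategy(smoothing_alpha=0.1, min_hold_s=2.0, frame_group=True),
}


def target_center(faces: list[Box], primary: Box, crop_w: float, strategy: Strategy) -> float:
    """Horizontal point the crop should center on for this segment."""
    if strategy.frame_group and faces:
        left = min(b[0] for b in faces)
        right = max(b[2] for b in faces)
        if right - left <= crop_w:  # everyone fits: center on the group
            return (left + right) / 2
    return (primary[0] + primary[2]) / 2  # otherwise follow the primary subject


def should_switch(current_id: int, candidate_id: int,
                  seconds_on_current: float, strategy: Strategy) -> bool:
    """Switch subjects only after the current one has held the frame for min_hold_s."""
    return candidate_id != current_id and seconds_on_current >= strategy.min_hold_s
```

The hold time keeps interview framing from flickering during quick back-and-forth exchanges. The group rule falls back to the primary subject when the faces are spread wider than a 9:16 window.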
Results
Tech Stack
More Case Studies
Explore more of our technical implementation case studies.
Cross-Platform Social Media Scheduling & Performance Analytics
Content creators producing dozens of short-form clips each week needed a unified scheduling and analytics system. It had to publish content to TikTok, YouTube Shorts, and Instagram Reels from a single dashboard while surfacing insights to optimize their posting strategy.
Multilingual Caption Translation for Global Content Distribution
Content creators with international audiences needed to expand their reach by translating video captions into 30+ languages while keeping the original audio, so viewers worldwide could watch in their native language.
Frequently Asked Questions
MicrocosmWorks implemented a hybrid tracking approach. A lightweight face detector runs on every 5th frame, and a KCF (kernelized correlation filter) tracker provides predictions for the frames in between. When a drop in confidence score signals occlusion, the system maintains the last known trajectory with Kalman filtering. It then re-acquires the face within 200 ms of the face becoming visible again.
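The occlusion handling described above relies on Kalman filtering to carry the last known trajectory forward. Below is a minimal constant-velocity filter over the face center, written with NumPy.

The noise values, the `conf_threshold` of 0.5, and the function names are assumptions. The detector-every-5th-frame and KCF steps are left out, so each input is simply a per-frame `(x, y, confidence)` measurement or `None`.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter with state (x, y, vx, vy) and position-only measurements."""

    def __init__(self, x: float, y: float, dt: float = 1.0,
                 process_var: float = 1.0, meas_var: float = 25.0):
        self.s = np.array([x, y, 0.0, 0.0])  # start at rest at the first detection
        self.P = np.eye(4) * 100.0           # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self) -> np.ndarray:
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2].copy()

    def update(self, zx: float, zy: float) -> np.ndarray:
        innovation = np.array([zx, zy]) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2].copy()


def track_centers(measurements, conf_threshold: float = 0.5):
    """Per-frame face centers; low-confidence or missing frames coast on the prediction."""
    kf = None
    out = []
    for m in measurements:
        visible = m is not None and m[2] >= conf_threshold
        if kf is None:
            if visible:
                kf = ConstantVelocityKalman(m[0], m[1])
                out.append((float(m[0]), float(m[1])))
            else:
                out.append(None)  # no face acquired yet
            continue
        est = kf.predict()
        if visible:
            est = kf.update(m[0], m[1])
        out.append((float(est[0]), float(est[1])))
    return out
```

While the face is hidden, the estimate keeps moving at the last learned velocity. Once a confident detection returns, `update` pulls the estimate back toward the measurement. How quickly depends on `meas_var` and `process_var`.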
MicrocosmWorks built a saliency-weighted cropping algorithm that prioritizes detected faces, then text regions, then motion areas when determining the 9:16 crop window position. For multi-person scenes, the system uses a configurable priority ranking, defaulting to the active speaker or the largest face, with smooth interpolation between crop positions to avoid jarring shifts.
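The priority order described above can be illustrated as a strict-priority selector. Faces come first, then text regions, then motion areas. Among several faces, the active speaker wins, and otherwise the largest face does.

This is a simplification. The source describes saliency weighting, whereas this sketch just takes the first region kind present. `Region`, `pick_target`, and the `priority` parameter are hypothetical. The `priority` argument stands in for the configurable ranking.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Region:
    kind: str                               # "face", "text", or "motion"
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)
    speaking: bool = False                  # only meaningful for faces

    @property
    def area(self) -> float:
        return (self.box[2] - self.box[0]) * (self.box[3] - self.box[1])


def pick_target(regions: Sequence[Region],
                priority: Sequence[str] = ("face", "text", "motion")) -> Optional[Region]:
    """Return the region the 9:16 window should follow, or None if nothing was detected."""
    for kind in priority:
        candidates = [r for r in regions if r.kind == kind]
        if not candidates:
            continue
        if kind == "face":
            speakers = [r for r in candidates if r.speaking]
            if speakers:  # default rule: the active speaker wins
                candidates = speakers
        return max(candidates, key=lambda r: r.area)  # otherwise the largest region wins
    return None
```

The chosen region's center would then feed the crop and smoothing steps of the reframing pipeline.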
Yes. MicrocosmWorks implemented a fallback saliency detection mode that activates when no faces are present. It combines motion detection, visual attention modeling, and, for screen recordings, mouse cursor tracking. This lets the system follow the most relevant content region even in purely visual or text-based footage.
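For the no-face fallback, the motion-detection part can be approximated by frame differencing. Pixels whose brightness changes by more than a threshold are treated as motion, and the crop follows their column-weighted centroid.

This covers only the motion component. Visual attention modeling and cursor tracking are not shown. The `threshold` value and the function name are assumptions.

```python
from typing import Optional

import numpy as np


def motion_center_x(prev_gray: np.ndarray, cur_gray: np.ndarray,
                    threshold: int = 25) -> Optional[float]:
    """Column of the motion centroid between two grayscale frames, or None if nothing moved."""
    # Widen to int16 so the uint8 subtraction cannot wrap around.
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    moving = np.where(diff > threshold, diff, 0)
    column_energy = moving.sum(axis=0)  # total change per column
    total = column_energy.sum()
    if total == 0:
        return None
    xs = np.arange(column_energy.size)
    return float((column_energy * xs).sum() / total)
```

The returned column can be passed to the same crop-and-smooth logic used for faces. On static slides it returns `None`, and a renderer could then hold the previous crop position.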
MicrocosmWorks optimized the pipeline for batch workflows, achieving 8x real-time processing speed on a single NVIDIA T4 GPU, meaning a 10-minute video is reframed in approximately 75 seconds. The system supports parallel processing across multiple GPUs, scaling linearly for high-volume content operations.
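The 75-second figure follows from dividing the video's duration by the speed-up: 600 s / 8 = 75 s. The helper below makes that arithmetic explicit.

The multi-GPU term assumes the ideal linear scaling stated above, for example with independent videos spread across GPUs. It ignores I/O and encoding overhead. The function name is illustrative.

```python
def reframe_seconds(video_seconds: float, realtime_factor: float = 8.0, gpus: int = 1) -> float:
    """Wall-clock reframing time at the stated 8x real-time rate per GPU, with ideal linear scaling."""
    if gpus < 1:
        raise ValueError("need at least one GPU")
    return video_seconds / (realtime_factor * gpus)
```

Under these assumptions, four T4 GPUs would bring the same 10-minute video down to about 19 seconds.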
MicrocosmWorks develops AI video reframing systems at $25-$45/hr. A full face tracking and smart reframing solution typically requires 350-550 development hours, including model optimization, batch processing support, and API integration. This investment eliminates the need for manual reframing editors, who typically cost $5-$15 per video.
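The quoted ranges imply a development budget of roughly $8,750 to $24,750 (350 h × $25 up to 550 h × $45). The sketch below computes that range and a simple break-even point against the $5-$15 per-video manual cost.

It deliberately ignores GPU compute, hosting, and maintenance, so the result is a lower bound on the number of videos needed. The function names are illustrative.

```python
import math


def build_cost_range(hours=(350, 550), rate=(25, 45)) -> tuple[int, int]:
    """Low and high development cost from the quoted hour and rate ranges."""
    return hours[0] * rate[0], hours[1] * rate[1]


def break_even_videos(build_cost: float, manual_cost_per_video: float) -> int:
    """Videos needed before the build cost is recovered, ignoring running costs."""
    return math.ceil(build_cost / manual_cost_per_video)
```

At the favorable end ($8,750 build, $15 per video), the cost is recovered after 584 videos. At the other end ($24,750 build, $5 per video), it takes 4,950.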
Ready to transform your business?
Let's discuss how a similar solution could be applied to your challenges.