Real-Time Voice AI Assistant with Function Calling and Bidirectional Audio Streaming
A fitness and nutrition platform needed a voice-first AI assistant that could converse naturally with users in real time, run domain-specific calculations (meal adjustments, calorie tracking), and reply in speech, all with sub-second latency so the exchange feels truly conversational.
The Challenge
Building a production-grade voice AI assistant raised a distinct set of real-time engineering challenges:
- Latency: A conventional speech-to-text → LLM → text-to-speech pipeline adds 3 to 5 seconds of delay, which breaks the flow of conversation.
- Function Calling: The assistant must execute domain logic (nutrition calculations, meal plan adjustments) mid-conversation rather than act as a plain chatbot.
- Audio Streaming: Bidirectional audio must flow continuously, without buffering gaps or echo problems.
- Context Awareness: The assistant must keep context across conversation turns while handling interruptions.
- Multi-Language: Users speak different languages and expect replies in the same language.
- Session Isolation: Each voice session needs independent state management with no crosstalk.
Our Solution
We built a real-time voice AI assistant powered by Google's Gemini Live API, with native audio capabilities, custom function calling for domain-specific calculations, and a React frontend with WebSocket-based audio streaming.
Architecture
- AI Model: Gemini with native audio input/output and function calling
- Backend: Python/FastAPI with a WebSocket endpoint for bidirectional audio
- Audio Pipeline: PyAudio for microphone/speaker I/O with real-time streaming
- Frontend: React with Vite and Tailwind CSS for the session control UI
- Communication: WebSocket for low-latency JSON messaging and binary audio transfer
- Multimodal: Optional camera and screen capture for visual context
Real-Time Audio Pipeline
Bidirectional Streaming
The system maintains continuous audio streams in both directions:
- Input: Microphone audio captured at 16kHz mono, split into small frames, and streamed to the AI model in real time
- Output: AI-generated speech received at 24kHz and played through the speaker immediately
- No Batching: Audio chunks are sent as soon as they are captured, so no delay builds up from accumulation
- Interrupt Handling: Users can naturally interrupt the assistant while it is responding
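One way to picture interrupt handling: the backend keeps a queue of model audio waiting for playback and empties it the moment an interruption is detected. The sketch below uses only `asyncio`; `PlaybackBuffer` and its methods are hypothetical names chosen for illustration, not the project's code or any SDK's API.

```python
import asyncio


class PlaybackBuffer:
    """Holds model audio that has been received but not yet played."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def push(self, chunk: bytes) -> None:
        # Called for every audio chunk received from the model.
        self._queue.put_nowait(chunk)

    async def next_chunk(self) -> bytes:
        # A speaker task would await this and play each chunk right away.
        return await self._queue.get()

    def interrupt(self) -> int:
        """Drop everything still queued; return how many chunks were dropped."""
        dropped = 0
        while not self._queue.empty():
            self._queue.get_nowait()
            dropped += 1
        return dropped

    def pending(self) -> int:
        return self._queue.qsize()
```

Flushing the queue is what makes the assistant go quiet quickly: audio that was generated before the user spoke never reaches the speaker.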
Audio Processing
- 16-bit PCM format for both input and output
- Separate sample rates tuned for speech (16kHz capture, 24kHz playback)
- Small buffer sizes to keep latency to a minimum
- Continuous streaming with no start/stop gaps between turns
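The relationship between sample rate, bit depth, and buffer size can be made concrete with a little arithmetic. The sketch below computes frame sizes for the 16-bit PCM rates given above; the 20 ms frame duration used in the example is an assumption for illustration, since the text only says that buffers are small.

```python
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CAPTURE_RATE_HZ = 16_000    # microphone input rate (from the text)
PLAYBACK_RATE_HZ = 24_000   # model output rate (from the text)


def frame_bytes(sample_rate_hz: int, frame_ms: int, channels: int = 1) -> int:
    """Size in bytes of one PCM frame of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * channels


def split_frames(pcm: bytes, size: int):
    """Yield each complete fixed-size frame; a trailing partial frame is left out."""
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

With a 20 ms frame, `frame_bytes(CAPTURE_RATE_HZ, 20)` is 640 bytes and `frame_bytes(PLAYBACK_RATE_HZ, 20)` is 960 bytes, so each chunk on the wire stays well under a kilobyte.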
Function Calling Integration
How It Works
When a domain-specific calculation is needed, the AI model can call local Python functions mid-conversation:
- The user speaks a request (for example, "I skipped lunch today")
- The AI model transcribes and interprets the intent
- The model decides a function call is needed and sends a structured request
- The backend extracts the function name, arguments, and call ID
- The local function runs the domain calculation
- The result is sent back to the model as a structured response
- The model generates a natural-language voice response that incorporates the result
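The backend side of these steps can be sketched as a small dispatcher. This is an illustration under assumptions, not the project's implementation: the registry, the `tool` decorator, the `log_skipped_meal` function, and the dictionary shapes for the request and response are all hypothetical, and a real SDK would define its own message types.

```python
from typing import Any, Callable

# Hypothetical registry: function name -> local Python callable.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {}


def tool(name: str):
    """Register a local function under the name the model will call."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@tool("log_skipped_meal")
def log_skipped_meal(meal: str) -> dict[str, Any]:
    # Placeholder domain logic; the real calculation is not shown in the text.
    return {"meal": meal, "status": "recorded"}


def handle_function_call(call: dict[str, Any]) -> dict[str, Any]:
    """Run the requested function and build the structured reply for the model."""
    name, args, call_id = call["name"], call.get("args", {}), call["id"]
    try:
        payload = {"result": HANDLERS[name](**args)}
    except Exception as exc:  # report failures instead of dropping the turn
        payload = {"error": str(exc)}
    return {"id": call_id, "name": name, "response": payload}
```

Echoing the call ID back lets the model match the response to the request it issued, and returning an error payload keeps the conversation going when a function is unknown or fails.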
Domain Functions
The system supports nutrition-focused function calling for scenarios such as:
- Skipped meals: Redistributes the skipped macronutrients across the remaining meals
- Unplanned meals: Adjusts upcoming meals to offset the unexpected intake
- Meal substitutions: Swaps ingredients while keeping macronutrient targets on track
- Activity tracking: Estimates calories burned and adjusts the nutrition budget
Each function uses a macro database with per-food nutrition profiles and runs dynamic calculations, with slight random variation so responses sound natural.
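As an illustration of the skipped-meal scenario, the sketch below spreads a skipped meal's macros evenly over the remaining meals. The even split, the `Macros` type, and the example numbers are assumptions made for this example; the actual redistribution rule and the random variation mentioned above are not specified in the text and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Macros:
    protein_g: float
    carbs_g: float
    fat_g: float

    def __add__(self, other: "Macros") -> "Macros":
        return Macros(self.protein_g + other.protein_g,
                      self.carbs_g + other.carbs_g,
                      self.fat_g + other.fat_g)

    def scaled(self, factor: float) -> "Macros":
        return Macros(self.protein_g * factor,
                      self.carbs_g * factor,
                      self.fat_g * factor)


def redistribute(skipped: Macros, remaining: dict[str, Macros]) -> dict[str, Macros]:
    """Spread the skipped meal's macros evenly over the remaining meals."""
    if not remaining:
        raise ValueError("no remaining meals to absorb the skipped macros")
    share = skipped.scaled(1 / len(remaining))
    return {name: plan + share for name, plan in remaining.items()}
```

For example, skipping a lunch of `Macros(30, 60, 20)` with a snack and a dinner still to come adds `Macros(15, 30, 10)` to each of them.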
Execution Safety
- Microphone input is paused while a function runs, to prevent overlap
- Pending audio frames are discarded to avoid stale context
- If a function fails, an error response is still sent back to the model
- Normal streaming resumes as soon as the function completes
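These four guarantees map naturally onto a pause, drain, try/except, and `finally` sequence. The sketch below shows that shape with plain `asyncio`; `MicGate` and `run_guarded` are hypothetical names, and the real backend may structure this differently.

```python
import asyncio
from typing import Any, Awaitable, Callable


class MicGate:
    """Tracks whether mic frames may be forwarded and holds frames not yet sent."""

    def __init__(self) -> None:
        self.paused = False
        self.pending: asyncio.Queue = asyncio.Queue()

    def offer(self, frame: bytes) -> bool:
        """Queue a captured frame unless the mic is paused."""
        if self.paused:
            return False
        self.pending.put_nowait(frame)
        return True

    def drain(self) -> int:
        """Discard frames captured before the call so no stale audio is sent."""
        dropped = 0
        while not self.pending.empty():
            self.pending.get_nowait()
            dropped += 1
        return dropped


async def run_guarded(gate: MicGate, fn: Callable[[], Awaitable[Any]]) -> dict:
    gate.paused = True
    gate.drain()
    try:
        return {"result": await fn()}
    except Exception as exc:
        return {"error": str(exc)}  # the model still gets a reply on failure
    finally:
        gate.paused = False  # resume streaming as soon as the call finishes
```

Putting the resume step in `finally` means the microphone comes back whether the function succeeded or raised.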
Backend Architecture
FastAPI WebSocket Server
- A single WebSocket endpoint for all client communication
- Session lifecycle management (start, stop, ping/pong health checks)
- A session lock that allows only one active session at a time
- CORS middleware for development environments
- A health check endpoint for monitoring
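A session lock of this kind can be sketched with `asyncio.Lock`. The `SessionGuard` class and the choice to reject a second session immediately (rather than queue it) are assumptions for illustration; the text only states that one session may be active at a time.

```python
import asyncio


class SessionBusy(Exception):
    """Raised when a client tries to start a session while one is active."""


class SessionGuard:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def start(self) -> None:
        # Reject instead of waiting so the client gets immediate feedback.
        if self._lock.locked():
            raise SessionBusy("another session is already active")
        await self._lock.acquire()

    def stop(self) -> None:
        # Safe to call more than once, e.g. from a disconnect handler.
        if self._lock.locked():
            self._lock.release()

    @property
    def active(self) -> bool:
        return self._lock.locked()
```

A WebSocket handler would call `start()` when a client asks to begin and `stop()` on disconnect or an explicit stop message, so a dropped connection never leaves the lock held.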
Session Management
- A session is created when a client connects, with a selected mode (audio only, camera, or screen)
- Background async tasks handle audio capture, processing, and playback concurrently
- Graceful disconnection with resource cleanup
- API key validation and error propagation
Multimodal Input (Optional)
Beyond voice, the system supports optional visual context:
- Camera mode: Streams webcam frames (1fps) to give the conversation visual context
- Screen mode: Captures screen content so on-screen information can be discussed
- Images are resized and compressed before they are sent
- Visual context improves the AI's ability to give relevant responses
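The two mechanics named here, rate-limiting frames to 1fps and shrinking images before sending, can be sketched without any imaging library. `FrameThrottle`, `fit_within`, and the 1024-pixel limit used in the example are assumptions for illustration; the text does not give the target size or compression settings.

```python
from typing import Optional


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


class FrameThrottle:
    """Lets at most one frame through per interval (1 s gives 1 fps)."""

    def __init__(self, interval_s: float = 1.0) -> None:
        self.interval_s = interval_s
        self._last_sent: Optional[float] = None

    def should_send(self, now_s: float) -> bool:
        if self._last_sent is None or now_s - self._last_sent >= self.interval_s:
            self._last_sent = now_s
            return True
        return False
```

A capture loop would call `should_send(time.monotonic())` for each frame and pass the dimensions from `fit_within` to whatever image library performs the actual resize.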
Frontend Interface
- Session control: Start/stop listening with clear status indicators
- Status display: Real-time connection and session state (idle, connecting, active, error)
- Theme support: Light/dark mode with persistence
- Guided walkthrough: A step-by-step demo for first-time users
- WebSocket management: Automatic reconnection logic
AI Model Configuration
- Native audio modality (no separate STT/TTS pipeline)
- Configurable voice selection from several preset voices
- System instructions that define the assistant's personality, response style, and language handling
- Tool definitions with parameter schemas for every available function
- Automatic language detection with responses in the same language
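A tool definition with a parameter schema might look like the following. This is a hedged sketch: the function name, its parameters, and the small validator are hypothetical, and the field layout follows the common JSON-Schema style for tool declarations rather than quoting any SDK, so check the documentation of the SDK in use for the exact shape.

```python
# Hypothetical declaration for one nutrition function.
SKIPPED_MEAL_TOOL = {
    "name": "handle_skipped_meal",
    "description": "Redistribute a skipped meal's macros across remaining meals.",
    "parameters": {
        "type": "object",
        "properties": {
            "meal": {"type": "string", "enum": ["breakfast", "lunch", "dinner"]},
            "remaining_meals": {"type": "integer"},
        },
        "required": ["meal"],
    },
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_args(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the args fit the schema."""
    schema = tool["parameters"]
    problems = [f"missing: {key}" for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected: {key}")
        elif not isinstance(value, _PY_TYPES[spec["type"]]):
            problems.append(f"wrong type: {key}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"not allowed: {key}={value}")
    return problems
```

Validating arguments on the backend before running a function is a defensive choice: the schema guides the model, but the backend should not assume every call conforms to it.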
Key Features
- Sub-second latency: The native audio model removes the overhead of an STT/TTS pipeline
- Real-time bidirectional audio: Continuous streaming with under 50ms of delay per chunk
- Function Calling: Domain-specific calculations executed mid-conversation
- Natural interruption: Users can interrupt the assistant naturally, with no special commands
- Multi-language support: Automatic language detection with responses in the same language
- Multimodal input: Optional camera and screen context for visual understanding
- Session management: Session lifecycle control with locking and resource cleanup
- Macro calculations: Dynamic nutrition adjustments based on per-food macro profiles
- Error recovery: Graceful handling of function failures and network interruptions
- Extensibility: New capabilities are added by defining a schema and a handler, with no architectural changes
Results
Tech Stack
Frequently Asked Questions
MicrocosmWorks designs bidirectional WebSocket audio pipelines that stream the user's speech to the ASR engine in real-time chunks, start LLM inference from the streaming transcript before the user has finished speaking, and begin text-to-speech synthesis on the first tokens of the response. This pipelined approach keeps response latency, measured from the end of the utterance to the first audio output, under 800ms, which users perceive as natural conversational turn-taking.
MicrocosmWorks integrates structured function calling that lets the LLM, based on conversational context, invoke predefined APIs such as booking an appointment, querying a database, or triggering a workflow, and then relay the result to the caller naturally in speech. The system includes confirmation flows for critical actions such as payments or cancellations: the assistant confirms the details verbally and waits for the caller's explicit approval before executing.
Yes. MicrocosmWorks implements barge-in detection, which lets callers interrupt the assistant mid-response; audio playback stops immediately and the new utterance is processed. The ASR pipeline includes noise-cancellation preprocessing and supports models fine-tuned for diverse accents, achieving transcription accuracy above 90% even in the noisy environments common to calls from cars, offices, and public spaces.
MicrocosmWorks builds voice assistants with SIP trunk integration and Twilio connectivity. This supports deployment to existing business phone numbers, IVR systems, and contact center platforms, without callers having to install an app or use a special interface. The platform handles call routing, queue management, and warm transfers to human agents when the AI determines that the conversation needs human expertise.
MicrocosmWorks develops custom voice AI assistants at rates of $30 to $50 per hour. The initial build costs more than setting up a managed platform, but a custom solution avoids the per-minute usage fees charged by platforms such as Dialogflow CX and Amazon Lex, a difference that becomes significant at high call volumes. A custom build also gives full control over the LLM, the voice persona, and the function calling logic, whereas managed platforms constrain these within a more rigid dialog-flow paradigm.