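The Challenge
Repurposing long-form content into short videos is a manual, time-consuming process:
- Finding the most engaging moments in hours of footage requires manual review
- Caption styles vary by platform and audience, and applying them takes professional editing skills
- There is no automated active speaker detection for multi-person content
- Distributing to multiple platforms requires a separate upload and format for each one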
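Our Solution
We built a full-stack, AI-powered video creation platform that automatically clips, captions, and distributes short-form video content at scale.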
Architecture
- Frontend: React 18 + Vite + TypeScript (with Chakra UI and Tailwind CSS)
- Backend: Node.js/Express (with MongoDB and Redis)
- Video rendering: FFmpeg (with Advanced SubStation Alpha (ASS) subtitles)
- Speaker detection: Python/Flask (with TalkNet, YOLO face detection, and Whisper transcription)
- YouTube downloader: Node.js (with yt-dlp and Mullvad VPN for IP rotation)
- AI/LLM: Claude 3 (primary), Gemini 2.0 Flash, GPT-4o (fallback chain)
- Infrastructure: hybrid deployment (on-premises + Azure cloud), with Cloudflare R2/CDN
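AI Workflow
- Content ingestion: YouTube URL or file upload
- AI clipping: LLM-driven identification of engaging segments
- Transcription: OpenAI Whisper (with word-level timestamps)
- Speaker detection: TalkNet audio-visual fusion (for multi-person content)
- Caption styling: 14+ animated styles (MrBeast, Hormozi, Ali Abdaal, Karaoke, and others)
- Rendering: FFmpeg (with ASS subtitle rendering and batch processing)
- Distribution: direct upload to YouTube, TikTok, and Instagram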
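To illustrate how word-level timestamps can feed an ASS karaoke caption, here is a small sketch. It is an assumption about one plausible approach, not the platform's styling engine: the `Word` dataclass, `ass_time`, and `karaoke_dialogue` are hypothetical names, and the input is assumed to be a list of words with start and end times in seconds. The output uses the standard ASS `Dialogue` event layout and `\k` karaoke tags, whose durations are in centiseconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_dialogue(words: list[Word], style: str = "Default") -> str:
    """Build one ASS Dialogue event whose words highlight in sequence via \\k tags."""
    if not words:
        raise ValueError("at least one word is required")
    start, end = words[0].start, words[-1].end
    cursor_cs = round(start * 100)
    parts = []
    for w in words:
        end_cs = round(w.end * 100)
        # Each \k duration runs from the previous boundary to this word's end,
        # so silent gaps between words are absorbed into the following word.
        parts.append(f"{{\\k{end_cs - cursor_cs}}}{w.text}")
        cursor_cs = end_cs
    text = " ".join(parts)
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{text}"


if __name__ == "__main__":
    line = karaoke_dialogue([Word("Hello", 1.0, 1.5), Word("world", 1.6, 2.25)])
    print(line)
```

A complete subtitle file would also need the `[Script Info]`, `[V4+ Styles]`, and `[Events]` sections around such lines. A file like that can be burned into a clip with FFmpeg's `ass` video filter, for example `ffmpeg -i clip.mp4 -vf ass=captions.ass out.mp4`, provided the FFmpeg build includes libass.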
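Key Features
- AI clip detection: automatically finds the segments most likely to go viral
- 14+ caption styles: professional templates optimized for different platforms
- Active speaker detection: identifies who is speaking in multi-person videos
- Multi-platform publishing: schedule and publish to YouTube, TikTok, and Instagram
- Template system: preset templates (Baby Podcast, App Explainer, Supplement Doctor)
- Credit-based billing: Stripe integration with tiered subscriptions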
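The active speaker detection feature has to turn model output into a per-frame decision about who is talking. The sketch below shows one plausible post-processing step under stated assumptions: an upstream model (such as TalkNet) is assumed to yield one score per frame for each tracked face, with higher meaning "more likely speaking", and a threshold of 0.0 is assumed to separate speaking from silence. The function names, the smoothing window, and the track IDs are all hypothetical.

```python
from typing import Optional


def smooth(values: list[float], window: int) -> list[float]:
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def active_speaker_per_frame(
    scores: dict[str, list[float]],
    window: int = 5,
    threshold: float = 0.0,
) -> list[Optional[str]]:
    """Return the track ID with the highest smoothed score per frame, or None if nobody clears the threshold."""
    if not scores:
        return []
    smoothed = {tid: smooth(vals, window) for tid, vals in scores.items()}
    n_frames = min(len(v) for v in smoothed.values())
    result: list[Optional[str]] = []
    for i in range(n_frames):
        best = max(smoothed, key=lambda tid: smoothed[tid][i])
        result.append(best if smoothed[best][i] > threshold else None)
    return result


if __name__ == "__main__":
    raw = {
        "host": [1.0, 1.0, -0.5, 1.0, 1.0],    # one noisy frame in the middle
        "guest": [-1.0, -1.0, 0.2, -1.0, -1.0],
    }
    print(active_speaker_per_frame(raw, window=1))  # no smoothing: the glitch flips the speaker
    print(active_speaker_per_frame(raw, window=3))  # smoothing keeps "host" throughout
```

Smoothing before taking the maximum avoids rapid camera or crop switches caused by single-frame misdetections, which matters when the result drives automatic reframing.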
Results
Tech Stack
Frequently Asked Questions
MicrocosmWorks trained the generation model on a dataset of viral short-form content to learn structural patterns like hook timing (first 1.5 seconds), pacing cadence, and text overlay placement that correlate with high engagement. The platform generates multiple variants per brief and scores them using a predicted engagement model before presenting the top options.
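The generate-then-rank step described above can be sketched as follows. This is only an illustration: `Variant`, `toy_score`, and `top_variants` are hypothetical names, and `toy_score` is a placeholder heuristic that penalizes hooks longer than 1.5 seconds. It stands in for the predicted engagement model, whose actual features and form are not described here.

```python
import heapq
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Variant:
    variant_id: str
    hook_seconds: float  # time until the hook lands, in seconds


def toy_score(v: Variant) -> float:
    """Placeholder scorer: 1.0 for hooks within 1.5 s, lower the later the hook lands."""
    return 1.0 - max(0.0, v.hook_seconds - 1.5)


def top_variants(
    variants: list[Variant],
    score_fn: Callable[[Variant], float],
    k: int = 3,
) -> list[Variant]:
    """Return the k highest-scoring variants, best first."""
    return heapq.nlargest(k, variants, key=score_fn)


if __name__ == "__main__":
    candidates = [Variant("a", 1.0), Variant("b", 2.5), Variant("c", 1.8)]
    for v in top_variants(candidates, toy_score, k=2):
        print(v.variant_id, round(toy_score(v), 2))
```

Passing the scorer as a function keeps the ranking code unchanged if the placeholder is later replaced by a trained model's prediction call.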
Yes, MicrocosmWorks built an automated content pipeline that accepts a text brief, product URL, or blog post and extracts key messaging, generates a storyboard, selects or creates visuals, applies motion graphics, and adds a voiceover. The end-to-end generation takes approximately 3-5 minutes per 30-second video with no manual editing required.
MicrocosmWorks implemented a brand kit system where clients upload their logos, fonts, color palettes, and approved stock asset libraries. Every generated video is constrained to these brand guidelines, and the text-to-speech voice can be cloned from a 30-second sample to maintain consistent audio branding across all content.
MicrocosmWorks integrated multilingual support covering 25 languages with native text-to-speech voices and automatic subtitle generation. The platform also adapts content pacing and text density for different markets, since Asian social media audiences often prefer faster cuts and denser text overlays compared to Western audiences.
MicrocosmWorks builds AI content creation platforms at rates of $25-$50/hr, with a full short-form video generation system including the storyboard AI, rendering engine, and brand kit management typically requiring 600-900 development hours. Ongoing AI model hosting costs range from $2,000-$8,000/month depending on generation volume.
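For budgeting, the figures above imply a development cost range that is easy to work out. The short sketch below multiplies the stated hour and rate bounds and annualizes the stated hosting range; it assumes the low rate pairs with the low hour estimate and the high rate with the high one, which gives the widest possible range. The function name is hypothetical.

```python
def cost_range(hours: tuple[int, int], rate: tuple[int, int]) -> tuple[int, int]:
    """Return (lowest, highest) cost from (min, max) hours and (min, max) hourly rate."""
    return hours[0] * rate[0], hours[1] * rate[1]


# Figures taken from the paragraph above.
DEV_HOURS = (600, 900)           # development hours
HOURLY_RATE = (25, 50)           # USD per hour
HOSTING_PER_MONTH = (2000, 8000)  # USD per month

if __name__ == "__main__":
    low, high = cost_range(DEV_HOURS, HOURLY_RATE)
    print(f"Development: ${low:,} to ${high:,}")
    yearly = tuple(12 * m for m in HOSTING_PER_MONTH)
    print(f"Hosting per year: ${yearly[0]:,} to ${yearly[1]:,}")
```

Under these assumptions the build comes to between $15,000 and $45,000, and hosting to between $24,000 and $96,000 per year.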
