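The Challenge
Existing video annotation tools are either GUI-heavy with no programmatic API, or command-line tools with poor visual output:
- ML teams need bounding boxes, polygons, and labels for large-scale training data
- Educators need animated overlays (arrows, spotlights, text) for instructional videos
- Traditional annotation tools cannot handle keyframe interpolation or eased animation
- No desktop-native solution combines OpenCV processing with professional video output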
Our Solution
We built a video annotation framework on React and Remotion, with a type-safe annotation system, keyframe interpolation, and a Tauri desktop editor.
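Architecture
- Video engine: Remotion 4.0, for programmatic frame-by-frame rendering
- Frontend: React 18 + TypeScript, built with Vite
- Desktop app: Tauri 2, with OpenCV.js and ONNX Runtime
- Export: FFmpeg, for high-quality video output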
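Annotation Types
- Bounding box - rectangular region with a label and confidence score
- Circle - point annotation with a configurable radius
- Polygon - complex region outline for irregular shapes
- Text label - styled text overlay with positioning
- Arrow - directional indicator for showing flow or directing attention
- Freehand path - custom hand-drawn annotation
- Spotlight - highlighted region with a dimmed background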
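One way to model the seven annotation types in a type-safe manner is a TypeScript discriminated union. The sketch below is illustrative only: every type and field name in it (`Annotation`, `kind`, `startFrame`, `dimOpacity`, and so on) is an assumption made for this example, not the framework's actual API.

```typescript
// Hypothetical type names; the framework's real definitions may differ.
interface Point { x: number; y: number }

interface BaseAnnotation {
  id: string;
  startFrame: number; // first frame on which the annotation is visible
  endFrame: number;   // last visible frame (inclusive)
}

interface BoundingBox extends BaseAnnotation {
  kind: "boundingBox";
  x: number; y: number; width: number; height: number;
  label: string;
  confidence?: number; // expected range 0..1
}
interface Circle extends BaseAnnotation { kind: "circle"; center: Point; radius: number }
interface Polygon extends BaseAnnotation { kind: "polygon"; points: Point[] }
interface TextLabel extends BaseAnnotation { kind: "text"; position: Point; text: string }
interface Arrow extends BaseAnnotation { kind: "arrow"; from: Point; to: Point }
interface FreehandPath extends BaseAnnotation { kind: "freehand"; points: Point[] }
interface Spotlight extends BaseAnnotation {
  kind: "spotlight"; center: Point; radius: number;
  dimOpacity: number; // how dark the area outside the spotlight becomes
}

type Annotation =
  | BoundingBox | Circle | Polygon | TextLabel | Arrow | FreehandPath | Spotlight;

// The `kind` field lets the compiler narrow each branch and
// report an error if a new annotation type is left unhandled.
function describe(a: Annotation): string {
  switch (a.kind) {
    case "boundingBox": return `box "${a.label}" ${a.width}x${a.height}`;
    case "circle": return `circle r=${a.radius}`;
    case "polygon": return `polygon with ${a.points.length} vertices`;
    case "text": return `text "${a.text}"`;
    case "arrow": return `arrow (${a.from.x},${a.from.y}) -> (${a.to.x},${a.to.y})`;
    case "freehand": return `freehand path with ${a.points.length} points`;
    case "spotlight": return `spotlight r=${a.radius}`;
    default: {
      const unreachable: never = a; // compile-time exhaustiveness check
      return unreachable;
    }
  }
}
```

The design choice here is that adding an eighth annotation type without updating `describe` fails at compile time rather than at render time.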
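Animation System
- Keyframe interpolation - smooth transitions between annotation states
- Easing functions - spring, ease-in-out, bounce, and custom curves
- Scene composition - intro, annotation layers, combined timeline, outro
- Fade effects - fade-in and fade-out with configurable duration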
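To make keyframe interpolation, easing, and fades concrete, here is a dependency-free sketch. Remotion ships its own `interpolate` helper, which a real implementation would likely use; the `Keyframe` shape and the `valueAtFrame` and `fadeOpacity` helpers below are hypothetical names introduced for illustration only.

```typescript
type Easing = (t: number) => number; // maps progress 0..1 to eased progress

const linear: Easing = (t) => t;
// Quadratic ease-in-out: slow start, fast middle, slow end.
const easeInOut: Easing = (t) =>
  t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2;

interface Keyframe {
  frame: number;
  value: number;
  easing?: Easing; // applied on the segment that starts at this keyframe
}

// Returns the interpolated value of one numeric property at `frame`.
function valueAtFrame(keyframes: Keyframe[], frame: number): number {
  if (keyframes.length === 0) throw new Error("at least one keyframe is required");
  const sorted = keyframes.slice().sort((a, b) => a.frame - b.frame);
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  if (frame <= first.frame) return first.value; // hold before the first keyframe
  if (frame >= last.frame) return last.value;   // hold after the last keyframe
  for (let i = 0; i < sorted.length - 1; i++) {
    const a = sorted[i];
    const b = sorted[i + 1];
    if (frame >= a.frame && frame < b.frame) {
      const t = (frame - a.frame) / (b.frame - a.frame);
      const eased = (a.easing ?? linear)(t);
      return a.value + (b.value - a.value) * eased;
    }
  }
  return last.value;
}

// Opacity ramps 0 -> 1 over `fadeFrames` after `start`,
// and 1 -> 0 over `fadeFrames` before `end`.
function fadeOpacity(frame: number, start: number, end: number, fadeFrames: number): number {
  if (frame < start || frame > end) return 0;
  if (fadeFrames <= 0) return 1;
  const fadeIn = Math.min(1, (frame - start) / fadeFrames);
  const fadeOut = Math.min(1, (end - frame) / fadeFrames);
  return Math.min(fadeIn, fadeOut);
}
```

For example, animating a bounding box's `x` from 0 to 100 over frames 0 to 20 with `easeInOut` yields 12.5 at frame 5 and 50 at frame 10, so the box accelerates out of its start position instead of moving at constant speed.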
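Key Features
- Type-safe API - comprehensive TypeScript types for all annotation primitives
- Scene system - compose complex videos from scene building blocks
- Keyframe animation - animate any annotation property over time
- Desktop editor - Tauri-based GUI with real-time preview
- Batch export - render annotated videos through FFmpeg
- OpenCV integration - computer vision processing inside the desktop app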
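The scene system can be pictured as laying building blocks end to end on a frame timeline. The sketch below computes each scene's start frame and the total duration; in a Remotion project these values would typically feed the `from` and `durationInFrames` props of `<Sequence>`. The `Scene`, `layoutScenes`, and `sceneAtFrame` names are assumptions made for this example.

```typescript
interface Scene {
  name: string;
  durationInFrames: number;
}

interface PlacedScene extends Scene {
  from: number; // first frame of the scene on the combined timeline
}

// Places scenes back to back and reports the total timeline length.
function layoutScenes(scenes: Scene[]): { placed: PlacedScene[]; totalFrames: number } {
  let cursor = 0;
  const placed: PlacedScene[] = scenes.map((scene) => {
    const entry: PlacedScene = { name: scene.name, durationInFrames: scene.durationInFrames, from: cursor };
    cursor += scene.durationInFrames;
    return entry;
  });
  return { placed, totalFrames: cursor };
}

// Finds which scene is active at a given frame, if any.
function sceneAtFrame(placed: PlacedScene[], frame: number): PlacedScene | undefined {
  for (const scene of placed) {
    if (frame >= scene.from && frame < scene.from + scene.durationInFrames) return scene;
  }
  return undefined;
}

// Example: a 2-second intro, 10 seconds of annotated footage, and a
// 1.5-second outro at 30 fps.
const timeline = layoutScenes([
  { name: "intro", durationInFrames: 60 },
  { name: "annotations", durationInFrames: 300 },
  { name: "outro", durationInFrames: 45 },
]);
```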
Results
Tech Stack
FAQ
MicrocosmWorks built this framework for teams that need to generate annotations at scale using code-driven rules rather than human clicking. It supports writing annotation pipelines as Python scripts that apply pre-trained detectors, temporal logic, and spatial rules to automatically generate training data, then exports in COCO, Pascal VOC, or YOLO formats.
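As a rough illustration of the export step, the sketch below converts a pixel-space bounding box into a YOLO label line (class id followed by center x, center y, width, and height, each normalized by the image size) and into a COCO-style annotation object (`bbox` as `[x, y, width, height]` in pixels from the top-left corner). It is written in TypeScript to match the framework's frontend stack rather than the Python scripts mentioned above, and the `PixelBox`, `toYoloLine`, and `toCocoAnnotation` names are hypothetical.

```typescript
interface PixelBox {
  x: number;      // left edge in pixels
  y: number;      // top edge in pixels
  width: number;
  height: number;
  classId: number;
}

// YOLO label line: "<class> <x_center> <y_center> <width> <height>", normalized to 0..1.
function toYoloLine(box: PixelBox, imageWidth: number, imageHeight: number): string {
  const xCenter = (box.x + box.width / 2) / imageWidth;
  const yCenter = (box.y + box.height / 2) / imageHeight;
  const w = box.width / imageWidth;
  const h = box.height / imageHeight;
  const fields = [xCenter, yCenter, w, h].map((v) => v.toFixed(6));
  return String(box.classId) + " " + fields.join(" ");
}

// COCO-style annotation entry: bbox stays in pixels, measured from the top-left corner.
function toCocoAnnotation(box: PixelBox, id: number, imageId: number) {
  return {
    id: id,
    image_id: imageId,
    category_id: box.classId,
    bbox: [box.x, box.y, box.width, box.height],
    area: box.width * box.height,
    iscrowd: 0,
  };
}
```

For video, an exporter along these lines would run once per frame (or per sampled frame), with `imageId` identifying the frame.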
Yes, MicrocosmWorks implemented a temporal annotation model that supports frame ranges, keyframe interpolation, and event-based labels with start/end timestamps. Annotators can define temporal rules like 'label as running when pose estimation detects both feet off ground for more than 3 consecutive frames' to automate action labeling.
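The "running" rule quoted above can be sketched as a run-length scan over per-frame flags. This is a minimal illustration, assuming an upstream pose estimator has already produced one boolean per frame (both feet off the ground); the `labelRuns` function and `LabeledRange` type are hypothetical names, not the framework's rule-engine API.

```typescript
interface LabeledRange {
  label: string;
  startFrame: number;
  endFrame: number; // inclusive
}

// Emits a labeled range for every run of `true` flags that is
// strictly longer than `minConsecutive` frames.
function labelRuns(flags: boolean[], label: string, minConsecutive: number): LabeledRange[] {
  const ranges: LabeledRange[] = [];
  let runStart = -1; // -1 means "not currently inside a run"
  // Iterate one step past the end so a run touching the last frame is closed.
  for (let i = 0; i <= flags.length; i++) {
    const active = i < flags.length && flags[i];
    if (active && runStart === -1) runStart = i;
    if (!active && runStart !== -1) {
      const length = i - runStart;
      if (length > minConsecutive) {
        ranges.push({ label: label, startFrame: runStart, endFrame: i - 1 });
      }
      runStart = -1;
    }
  }
  return ranges;
}

// Frames 1-3 form a run of exactly 3 (too short); frames 5-8 form a run of 4.
const feetOffGround = [false, true, true, true, false, true, true, true, true, false];
const running = labelRuns(feetOffGround, "running", 3);
```

Here `running` contains a single range covering frames 5 through 8, because only that run exceeds three consecutive frames.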
MicrocosmWorks built a validation pipeline that computes agreement scores between programmatic annotations and a human-reviewed golden set, flagging any annotations that fall below a configurable IoU or temporal overlap threshold. The framework also supports active learning workflows that route low-confidence annotations to human reviewers.
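The IoU check at the heart of such a validation pass can be sketched as follows. The example compares programmatic boxes against a golden set keyed by an annotation id and returns the ids that fall below the threshold; the `Box` type, the `iou` and `flagLowAgreement` helpers, and the id-keyed layout are assumptions made for illustration.

```typescript
interface Box { x: number; y: number; width: number; height: number }

// Intersection over union of two axis-aligned boxes, in the range 0..1.
function iou(a: Box, b: Box): number {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  const intersection = Math.max(0, right - left) * Math.max(0, bottom - top);
  const union = a.width * a.height + b.width * b.height - intersection;
  return union > 0 ? intersection / union : 0;
}

// Returns ids whose predicted box is missing or overlaps the golden box
// by less than `threshold`; these would be routed to human review.
function flagLowAgreement(
  predicted: Record<string, Box>,
  golden: Record<string, Box>,
  threshold: number
): string[] {
  const flagged: string[] = [];
  for (const id of Object.keys(golden)) {
    const pred = predicted[id];
    if (pred === undefined || iou(pred, golden[id]) < threshold) flagged.push(id);
  }
  return flagged;
}
```

The same pattern extends to temporal overlap by replacing boxes with frame ranges and computing intersection over union on range lengths.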
MicrocosmWorks built the framework on top of FFmpeg and OpenCV, supporting all major container formats including MP4, MKV, AVI, and MOV, with codecs from H.264 to ProRes. The framework processes videos at their native resolution but supports configurable downscaling for the annotation pass to accelerate throughput on large datasets.
MicrocosmWorks delivers ML infrastructure projects at rates of $25-$45/hr, with a programmatic video annotation framework including the rule engine, format exporters, and quality validation pipeline typically requiring 300-500 development hours. The framework pays for itself quickly by reducing manual annotation costs that can run $5-$15 per minute of video.
