Twelve Labs
Website:
SIGGRAPH 2024
Twelve Labs provides the most powerful video foundation models for developers and enterprises. Our multimodal foundation models create vector embeddings that enable a variety of use cases across industries. Our Marengo model understands video natively, identifying and interpreting movements, actions, objects, individuals, sounds, and spoken words much like humans do, which enables high-precision semantic search. Our Pegasus model provides state-of-the-art video-to-text generation. Built by developers, for developers, our APIs make it easy to integrate our models into your existing applications and workflows.
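As a rough illustration of how such an integration might look, the sketch below calls a semantic-search endpoint (Marengo-style retrieval) and a video-to-text endpoint (Pegasus-style generation) over REST. The base URL, API version, header name, endpoint paths, and field names are assumptions for illustration only and are not confirmed by this listing; consult the official API documentation for the actual interface.

```python
import os
import requests

# Assumed base URL, version, and auth header -- placeholders, not confirmed.
API_BASE = "https://api.twelvelabs.io/v1.2"
HEADERS = {"x-api-key": os.environ["TWELVE_LABS_API_KEY"]}


def search_clips(index_id: str, query: str) -> list[dict]:
    """Semantic search over an indexed video library (Marengo-style retrieval).

    `index_id`, `query_text`, and `search_options` are assumed field names.
    """
    resp = requests.post(
        f"{API_BASE}/search",
        headers=HEADERS,
        json={
            "index_id": index_id,                    # index of previously uploaded videos
            "query_text": query,                     # natural-language query
            "search_options": ["visual", "audio"],   # assumed option names
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])


def summarize_video(video_id: str, prompt: str) -> str:
    """Video-to-text generation (Pegasus-style) for a single indexed video."""
    resp = requests.post(
        f"{API_BASE}/generate",
        headers=HEADERS,
        json={"video_id": video_id, "prompt": prompt},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("data", "")


if __name__ == "__main__":
    # Hypothetical IDs for illustration.
    for clip in search_clips("my-index-id", "a player scoring a goal"):
        print(clip.get("video_id"), clip.get("start"), clip.get("end"), clip.get("score"))
    print(summarize_video("my-video-id", "Summarize the key events in this video."))
```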