Odyssey AI Pioneers Interactive Video Worlds: A Glimpse into the Future of Entertainment


London-based AI lab Odyssey is pushing the boundaries of interactive media with a new AI model capable of transforming standard video into dynamic, responsive environments. This technology offers a tantalizing glimpse into a future where viewers can directly influence a video world in real time through keyboard, phone, controller, and even voice input.

Dubbed an “early version of the Holodeck” by Odyssey, the AI model generates realistic video frames at a rate of 25 frames per second (40 milliseconds per frame), delivering near-instantaneous feedback to user actions. Early testers describe the experience as a “glitchy dream—raw, unstable, but undeniably new,” hinting at both the current limitations and the immense potential of the technology.

Odyssey’s innovation lies in its “world model” approach, which differs from traditional video generation techniques. Instead of creating entire video clips at once, the AI predicts and generates frames sequentially based on user inputs and the current state of the environment. This method mirrors the way large language models predict words, bringing a new level of interactivity to video content.
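The frame-by-frame loop described above can be sketched in a few lines. This is a purely illustrative toy, not Odyssey's actual system: the names (`WorldModel`, `predict_next_frame`, `run_session`) and the string-based "frames" are assumptions standing in for a learned neural predictor.

```python
from dataclasses import dataclass
from typing import List

FRAME_BUDGET_MS = 40  # 25 fps -> 40 ms per frame, as the article states


@dataclass
class Frame:
    index: int
    description: str


class WorldModel:
    """Toy stand-in for a learned next-frame predictor."""

    def predict_next_frame(self, history: List[Frame], user_input: str) -> Frame:
        # A real model would run a neural network here, within the
        # 40 ms frame budget; this stub only records the conditioning.
        return Frame(
            index=len(history),
            description=f"frame conditioned on input '{user_input}'",
        )


def run_session(model: WorldModel, inputs: List[str]) -> List[Frame]:
    """Generate frames one at a time, conditioned on each user input,
    much like a language model predicts one token at a time."""
    history: List[Frame] = []
    for user_input in inputs:
        history.append(model.predict_next_frame(history, user_input))
    return history


frames = run_session(WorldModel(), ["walk forward", "turn left", "open door"])
for f in frames:
    print(f.index, f.description)
```

The key contrast with clip-at-once generation is visible in the loop: each frame depends on the history so far plus the latest input, so the user can steer the world mid-stream.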

To ensure stability and realism, Odyssey employs a two-stage training process: pre-training on diverse video data followed by fine-tuning on specific environments. While the current infrastructure requires significant computational resources, costing between £0.80 and £1.60 per user-hour on H100 GPUs, Odyssey anticipates substantial cost reductions as its models become more efficient.
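The two-stage recipe can be expressed as a minimal sketch. Everything here is hypothetical scaffolding, not Odyssey's pipeline: `train_step` stands in for a gradient update, and the datasets are placeholder strings.

```python
from typing import Dict, List


def train_step(model_state: Dict, batch: str, lr: float) -> Dict:
    # Stand-in for one gradient update; it just logs what was seen.
    model_state["updates"].append((batch, lr))
    return model_state


def train(model_state: Dict, general_video: List[str],
          target_environment: List[str]) -> Dict:
    # Stage 1: pre-train broadly on diverse video data.
    for batch in general_video:
        model_state = train_step(model_state, batch, lr=1e-4)
    # Stage 2: fine-tune on the specific environment (a lower learning
    # rate is a common choice for fine-tuning; the values are illustrative).
    for batch in target_environment:
        model_state = train_step(model_state, batch, lr=1e-5)
    return model_state


state = train({"updates": []}, ["clip_a", "clip_b"], ["env_clip_1"])
print(len(state["updates"]))  # 3 updates total
```

The point of the split is in the structure: broad pre-training teaches general video dynamics, and the narrower second pass anchors the model to one environment so long interactive rollouts stay coherent.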

Odyssey envisions transformative applications across entertainment, education, and advertising. Imagine interactive training videos where users can practice skills in a simulated environment, or virtual travel experiences that allow users to explore destinations from the comfort of their homes. The research preview is available for testing [here](link).