Genie 3 is a general-purpose world model that can generate an unprecedented diversity of interactive environments in real time, at 24 frames per second and 720p resolution. Unlike its predecessors, Genie 3 allows interaction in real time, making it the first world model to do so, and it also improves on Genie 2 in consistency and realism. Given a text prompt, the system generates a dynamic world that users can navigate and interact with.
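To put the quoted figures in perspective, the short calculation below works out the time available per frame and the raw pixel throughput they imply. It is back-of-the-envelope arithmetic only: the text says just "720p", so the 1280x720 frame size is an assumption, and nothing here reflects how Genie 3 actually renders frames.

```python
FPS = 24                   # frame rate stated in the text
WIDTH, HEIGHT = 1280, 720  # assumption: "720p" means the common 1280x720 raster

frame_budget_ms = 1000 / FPS             # time available to produce one frame
pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FPS

print(f"{frame_budget_ms:.1f} ms per frame")         # 41.7 ms per frame
print(f"{pixels_per_frame:,} pixels per frame")      # 921,600 pixels per frame
print(f"{pixels_per_second:,} pixels per second")    # 22,118,400 pixels per second
```

Under these assumptions, each frame, including any response to the user's latest input, has to be ready in roughly 42 milliseconds.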
The key technical advance that sets Genie 3 apart is that it generates frames auto-regressively while staying responsive in real time. To generate each frame, the model has to take into account the previously generated trajectory, which grows as the session goes on, and it has to do so quickly enough to keep up with the frame rate. Because each frame is conditioned on that history, when you revisit a location after exploring elsewhere, the model can recall and reconstruct the environment from information earlier in the session. Earlier versions, Genie 1 and Genie 2, could generate environments for agents but lacked the real-time interactivity that makes exploration feel natural and responsive.
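The idea of conditioning each new frame on a growing trajectory can be illustrated with a toy loop. This is a minimal sketch under heavy assumptions, not Genie 3's architecture or API: `ToyWorldModel`, its methods, the one-dimensional `position`, and the hash-based "frames" are all hypothetical stand-ins chosen to keep the example runnable.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for an auto-regressive world model (not Genie 3's API)."""
    prompt: str
    position: int = 0
    trajectory: list = field(default_factory=list)  # one (position, frame) pair per step

    def step(self, action: int) -> str:
        """Apply a user action and produce the next frame, conditioned on the history."""
        self.position += action
        frame = self._recall(self.position) or self._generate(self.position)
        self.trajectory.append((self.position, frame))
        return frame

    def _recall(self, position: int):
        # Scan the trajectory, newest first, for a frame already produced at this location.
        for past_position, past_frame in reversed(self.trajectory):
            if past_position == position:
                return past_frame
        return None

    def _generate(self, position: int) -> str:
        # Unseen location: invent content. The result depends on the session length,
        # so a revisit would NOT match the first visit unless the history is consulted.
        seed = f"{self.prompt}|{position}|{len(self.trajectory)}"
        return hashlib.sha256(seed.encode()).hexdigest()[:8]


world = ToyWorldModel(prompt="a beach at sunset")
first_visit = world.step(+1)   # move to position 1
world.step(+1)                 # explore elsewhere (position 2)
revisit = world.step(-1)       # come back to position 1

print(first_visit == revisit)  # True: the frame is reconstructed from the trajectory
```

The linear scan in `_recall` also hints at the cost problem: the longer the session, the more history there is to consult per frame, while the time available per frame stays fixed.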
Another major difference is the introduction of “promptable world events,” which allow users to modify the generated world through text commands. You can change the weather, introduce new objects or characters, and set up “what if” scenarios that weren’t possible with earlier versions. This extends what users can do beyond simple navigation, making Genie 3 more versatile for both research and creative exploration. The system maintains visual consistency throughout these changes, which makes the interactive experience more immersive and believable.
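As a rough illustration of what a promptable world event does, the sketch below applies text commands to a toy scene so that each command changes only the aspect it names and leaves the rest untouched. Everything in it is hypothetical: `ToyWorld`, `apply_event`, and the two-verb command grammar are invented for this example, whereas the real system accepts free-form text and operates on generated video rather than a record of fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyWorld:
    """Hypothetical scene state; not how Genie 3 represents a world."""
    terrain: str
    weather: str = "clear"
    entities: tuple = ()


def apply_event(world: ToyWorld, event: str) -> ToyWorld:
    """Return a new world with one event applied; unrelated fields carry over unchanged."""
    # Made-up mini-grammar for illustration: "weather <value>" or "add <entity>".
    verb, _, argument = event.partition(" ")
    if verb == "weather":
        return replace(world, weather=argument)
    if verb == "add":
        return replace(world, entities=world.entities + (argument,))
    raise ValueError(f"unsupported event: {event!r}")


world = ToyWorld(terrain="alpine meadow")
for event in ["weather heavy snow", "add red fox"]:
    world = apply_event(world, event)

print(world)
```

After both events the terrain is still the alpine meadow, while the weather and the list of entities reflect the commands. That is the toy counterpart of keeping the world consistent while a change is applied.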