Loka-1

Loka-1 studies physical reasoning from video: whether a model that watches a short clip can predict what happens next under ordinary physics. Description is not the bottleneck. A frontier model can narrate a scene in fluent detail, yet still fail to anticipate that a tilted glass spills or that an unsupported object falls. We track that gap directly.

The task

Each item presents a short video and asks the model to predict the physical outcome of an event in the scene. Scenes cover everyday dynamics: contact and collision, support and balance, containment, and the effects of force over time. The questions are designed so that surface description is not enough; answering correctly requires a working model of how the depicted world evolves.
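This page does not specify the item format or the scoring rule, so the following is only a minimal sketch under stated assumptions: items are multiple choice, each carries one of the scene categories listed above, and the score is plain accuracy. The `Item` fields, the `accuracy_by_category` helper, and the clip paths are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical benchmark item; every field name is an assumption."""
    video_path: str            # short clip shown to the model
    question: str              # asks for the physical outcome of an event
    choices: tuple[str, ...]   # assumed multiple-choice format
    answer_index: int          # index of the correct outcome in `choices`
    category: str              # scene category, e.g. "support and balance"


def accuracy_by_category(items, predictions):
    """Return {category: fraction correct}, plus an "overall" entry."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(items, predictions):
        # Count each item once for its category and once for the overall score.
        for key in (item.category, "overall"):
            total[key] += 1
            correct[key] += int(predicted == item.answer_index)
    return {key: correct[key] / total[key] for key in total}


# Illustrative items only; the clip paths are placeholders, not real data.
items = [
    Item("clips/glass.mp4", "What happens to the water as the glass tilts?",
         ("It stays in the glass", "It spills"), 1, "containment"),
    Item("clips/block.mp4", "What happens when the support is removed?",
         ("The block falls", "The block stays in place"), 0,
         "support and balance"),
    Item("clips/stack.mp4", "What happens to the top box after the push?",
         ("It topples", "It stays balanced"), 0, "support and balance"),
]
predictions = [1, 1, 0]  # one predicted choice index per item
scores = accuracy_by_category(items, predictions)
print(scores)
```

Reporting accuracy per category alongside the overall figure shows whether a model is weak on one kind of dynamics, such as support, even when its aggregate score looks reasonable.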

Why it matters

The next advances beyond today's language models are likely to come from systems that build and update an internal model of the world, then use it to predict and plan. Measuring physical prediction from video isolates that ability from language fluency, which makes it a cleaner signal of progress toward grounded reasoning.

Leaderboard

Results and submission details are tracked on the public leaderboard.

View the Physical Reasoning leaderboard