Loka-1
Keon Kim and Krish Chelikavada
Loka-1 studies physical reasoning from video: whether a model that watches a short clip can predict what happens next under ordinary physics. Description is not the bottleneck. A frontier model can narrate a scene in fluent detail, yet still fail to anticipate that a tilted glass spills or that an unsupported object falls. We track that gap directly.
The task
Each item presents a short video and asks the model to predict the physical outcome of an event in the scene. Scenes cover everyday dynamics: contact and collision, support and balance, containment, and the effects of force over time. The questions are designed so that surface description is not enough; answering correctly requires a working model of how the depicted world evolves.
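To make the item format concrete, here is a minimal sketch in Python. It assumes a multiple-choice setup scored by plain accuracy, which this page does not specify. `PredictionItem`, `accuracy`, the file paths, and the example questions are all hypothetical, not the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PredictionItem:
    """Hypothetical item layout; the real schema is not given on this page."""
    video_path: str           # short clip shown to the model
    question: str             # asks for the physical outcome of an event
    choices: tuple[str, ...]  # candidate outcomes (assumed multiple-choice)
    answer_index: int         # index of the correct outcome in `choices`


def accuracy(items: Sequence[PredictionItem],
             predict: Callable[[PredictionItem], int]) -> float:
    """Fraction of items where the predicted choice index matches the answer."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = sum(1 for item in items if predict(item) == item.answer_index)
    return correct / len(items)


# Two made-up items echoing the examples in the introduction.
items = [
    PredictionItem(
        video_path="clips/glass_tilt.mp4",
        question="The glass keeps tilting. What happens to the water?",
        choices=("It stays in the glass", "It spills"),
        answer_index=1,
    ),
    PredictionItem(
        video_path="clips/block_edge.mp4",
        question="The support is removed. What happens to the block?",
        choices=("It falls", "It stays in place"),
        answer_index=0,
    ),
]


def always_first(item: PredictionItem) -> int:
    """Trivial baseline that ignores the video and picks the first choice."""
    return 0


score = accuracy(items, always_first)
```

A baseline like `always_first` is useful as a floor: a model that only describes the scene without predicting its evolution should not be expected to beat it by much under this assumed format.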
Why it matters
The capabilities that come after today's language models are likely to emerge from systems that build and update an internal model of the world, then use it to predict and plan. Measuring physical prediction from video separates that ability from language fluency, which makes it a cleaner signal of progress toward grounded reasoning.
Leaderboard
Results and submission details are tracked on the public leaderboard.
Citation
Please cite this work as:
Kim, Keon and Chelikavada, Krish, "Loka-1", Om Labs, Jun 2026.
Or use the BibTeX citation:
@article{kim2026loka1,
  author = {Keon Kim and Krish Chelikavada},
  title = {Loka-1},
  journal = {Om Labs},
  year = {2026},
  note = {https://omlabs.xyz/research/loka-1},
}