Illustration of outcome-hacking, where the generated video has the correct final state but an incorrect process.
Current video generation models often suffer from outcome-hacking: they may generate a video with the correct final outcome but an incorrect process. Traditional evaluation metrics that inspect only a single frame cannot detect this failure.
VIPER (VIdeo Process Evaluation for Reasoning) is designed to bridge this gap:
- Comprehensive Benchmark: 309 carefully curated samples spanning 6 distinct domains (Temporal, Structural, Symbolic, Spatial, Physics, and Planning).
- New Metric (POC@r): Process-Outcome Consistency. We evaluate correctness at both the process and outcome levels by uniformly sampling frames at rate $r$.
- Failure Pattern: We identify and summarize four common failure patterns in current generative video models.
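
The idea behind POC@r can be illustrated with a minimal sketch. It is not the official implementation: it assumes that $r$ is a sampling rate in frames per second, that a video counts as consistent only when every sampled frame passes a process check and the final outcome is also correct, and that the score is the fraction of consistent videos. The function names `sample_frame_indices`, `is_consistent`, and `poc_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sample_frame_indices(num_frames: int, fps: float, r: float) -> List[int]:
    """Uniformly pick frame indices at roughly r frames per second (assumed meaning of r)."""
    if num_frames <= 0 or fps <= 0 or r <= 0:
        raise ValueError("num_frames, fps, and r must all be positive")
    # Distance between sampled frames; never finer than one frame.
    step = max(fps / r, 1.0)
    count = math.floor((num_frames - 1) / step) + 1
    return [int(i * step) for i in range(count)]


def is_consistent(process_ok: Sequence[bool], outcome_ok: bool) -> bool:
    """Assumed rule: the outcome is correct and every sampled frame passes the process check."""
    return outcome_ok and all(process_ok)


def poc_score(videos: Sequence[Tuple[Sequence[bool], bool]]) -> float:
    """Fraction of videos that are both process- and outcome-correct (assumed aggregation)."""
    if not videos:
        return 0.0
    return sum(is_consistent(p, o) for p, o in videos) / len(videos)


# Illustrative verdicts only; in practice they would come from a judge model or rule checker.
indices = sample_frame_indices(num_frames=24, fps=8, r=2)
score = poc_score([
    ([True, True], True),    # correct process, correct outcome
    ([True, False], True),   # outcome-hacking: correct outcome, broken process
    ([True, True], False),   # correct process, wrong outcome
    ([True], True),          # correct process, correct outcome
])
```

Under these assumptions, the second video is exactly the outcome-hacking case: a last-frame metric would accept it, while the process-aware check rejects it.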
VIPER covers diverse reasoning tasks to ensure a holistic evaluation of video generation capabilities.
| Domain | Samples | Task Types |
|---|---|---|
| Physics | 32 | experiment, game |
| Planning | 44 | navigation, manipulation |
| Spatial | 60 | rotate, restore |
| Structural | 70 | chess, maze, sudoku |
| Symbolic | 60 | math, multimodal |
| Temporal | 43 | obj_move, zoom |
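
As a quick sanity check, the per-domain counts can be summed to confirm that they match the 309 samples stated above. The snippet below simply copies the numbers from the table; the variable names are illustrative.

```python
# Per-domain sample counts copied from the table above.
domain_counts = {
    "Physics": 32,
    "Planning": 44,
    "Spatial": 60,
    "Structural": 70,
    "Symbolic": 60,
    "Temporal": 43,
}

total = sum(domain_counts.values())

# Share of the benchmark taken by each domain, in percent.
shares = {name: round(100 * n / total, 1) for name, n in domain_counts.items()}
largest = max(domain_counts, key=domain_counts.get)
```
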
```python
from datasets import load_dataset

# Load the full VIPER benchmark
dataset = load_dataset("Monosail/VIPER")
```

- `id`: Unique identifier for the sample
- `domain`: The reasoning domain (Physics, Planning, Spatial, Structural, Symbolic, Temporal)
- `task_type`: Specific task category within the domain
- `prompt`: Text prompt describing the task
- `image`: The input image
- `reference_frames`: Ground-truth image frames
- `reference_texts`: Ground-truth text descriptions
- `protocol`: Process-level task constraints
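
Once samples are loaded, a common first step is to group them by the `domain` field. The sketch below works on any iterable of dict-like records carrying the fields listed above. It uses hypothetical stand-in records rather than real data so that it runs offline, and the helper name `group_by_domain` is an assumption, not part of the benchmark.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_domain(samples: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket records by their `domain` field."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for sample in samples:
        grouped[sample["domain"]].append(sample)
    return dict(grouped)


# Hypothetical stand-in records. Real samples also carry `image`,
# `reference_frames`, `reference_texts`, and `protocol`.
mock_samples = [
    {"id": "demo-0", "domain": "Physics", "task_type": "experiment", "prompt": "..."},
    {"id": "demo-1", "domain": "Structural", "task_type": "maze", "prompt": "..."},
    {"id": "demo-2", "domain": "Structural", "task_type": "chess", "prompt": "..."},
]

by_domain = group_by_domain(mock_samples)
structural_tasks = [s["task_type"] for s in by_domain["Structural"]]
```

The same function should apply to records iterated from the loaded dataset, provided you pass it a single split rather than the whole returned object.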
If you find our benchmark useful for your research, please consider citing:
```bibtex
@article{li2026viper,
  title={Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning},
  author={Li, Yifan and Gu, Yukai and Min, Yingqian and Liu, Zikang and Du, Yifan and Zhou, Kun and Yang, Min and Zhao, Wayne Xin and Qiu, Minghui},
  journal={arXiv preprint arXiv:2512.24952},
  year={2025}
}
```