The official GitHub page for ''Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning''


Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning



Illustration of outcome-hacking, where the generated video has the correct final state but an incorrect process.


👀 Overview

Current video generation models often suffer from outcome-hacking: they generate a video that reaches the correct final state through an incorrect process, fooling traditional evaluation metrics that judge only the final frame.

VIPER (VIdeo Process Evaluation for Reasoning) is designed to bridge this gap:

  • πŸ† Comprehensive Benchmark: 309 carefully curated samples spanning 6 distinct domains (Temporal, Structural, Symbolic, Spatial, Physics, and Planning).
  • πŸ“ New Metric (POC@r): Process-Outcome Consistency. We evaluate correctness at both the process and outcome levels by uniformly sampling frames at rate $r$.
  • 🚫 Failure Pattern: We identify and summarize four common failure patterns in current generative video models.
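To make the sampling idea behind POC@r concrete, here is a minimal sketch. The exact scoring protocol is defined in the paper; the helper names (`uniform_sample`, `frame_is_correct`, `outcome_is_correct`) and the all-frames-must-pass aggregation are illustrative assumptions, not the official implementation.

```python
def uniform_sample(frames, r):
    """Uniformly sample roughly a fraction r of frames, always keeping the last one."""
    n = max(1, round(len(frames) * r))
    step = len(frames) / n
    idx = sorted({min(len(frames) - 1, int(i * step)) for i in range(n)} | {len(frames) - 1})
    return [frames[i] for i in idx]

def poc_at_r(frames, frame_is_correct, outcome_is_correct, r=0.5):
    """Illustrative POC@r-style score: 1 only if the final outcome is correct
    AND every uniformly sampled intermediate frame is also correct."""
    sampled = uniform_sample(frames, r)
    process_ok = all(frame_is_correct(f) for f in sampled)
    return int(process_ok and outcome_is_correct(frames[-1]))
```

Under this sketch, a video that "hacks" the outcome (correct last frame, wrong intermediate frames) scores 0, whereas a last-frame-only metric would score it 1.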


Overview of VIPER, which consists of 16 tasks across 6 domains.

📊 Dataset Statistics

VIPER covers diverse reasoning tasks to ensure a holistic evaluation of video generation capabilities.

Domain      Samples  Task Types
Physics        32    experiment, game
Planning       44    navigation, manipulation
Spatial        60    rotate, restore
Structural     70    chess, maze, sudoku
Symbolic       60    math, multimodal
Temporal       43    obj_move, zoom
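The per-domain counts above add up to the 309 samples in the benchmark:

```python
# Per-domain sample counts from the table above.
samples_per_domain = {
    "Physics": 32, "Planning": 44, "Spatial": 60,
    "Structural": 70, "Symbolic": 60, "Temporal": 43,
}
total = sum(samples_per_domain.values())  # 32+44+60+70+60+43 = 309
```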

🚀 Quick Start

Download

from datasets import load_dataset

# Load the full VIPER benchmark
dataset = load_dataset("Monosail/VIPER")

Data Fields

  • id: Unique identifier for the sample
  • domain: The reasoning domain (Physics, Planning, Spatial, Structural, Symbolic, Temporal)
  • task_type: Specific task category within the domain
  • prompt: Text prompt describing the task
  • image: The input image
  • reference_frames: Ground-truth image frames
  • reference_texts: Ground-truth text descriptions
  • protocol: Process-level task constraints
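Each record is a dict-like mapping with the fields above. A minimal sketch of per-domain filtering is shown below; the records here are fabricated placeholders that only mirror the schema (real samples come from `load_dataset("Monosail/VIPER")`, and the ids are made up for illustration):

```python
from collections import defaultdict

def group_by_domain(records):
    """Group benchmark records by their `domain` field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["domain"]].append(rec)
    return dict(groups)

# Hypothetical records mirroring the schema above (ids invented for illustration).
records = [
    {"id": "phys_0001", "domain": "Physics", "task_type": "experiment"},
    {"id": "spat_0001", "domain": "Spatial", "task_type": "rotate"},
]
groups = group_by_domain(records)
```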

πŸ› οΈ Evaluation (Coming Soon)

πŸ“ Citation

If you find our benchmark useful for your research, please consider citing:

@article{li2026viper,
  title={Beyond the Last Frame: Process-aware Evaluation for Generative Video Reasoning},
  author={Li, Yifan and Gu, Yukai and Min, Yingqian and Liu, Zikang and Du, Yifan and Zhou, Kun and Yang, Min and Zhao, Wayne Xin and Qiu, Minghui},
  journal={arXiv preprint arXiv:2512.24952},
  year={2025}
}
