Minecraft LM-Arena Baseline
Left is the reference .mp4; right is the paired WanGame output (_wangame.mp4). The two players are not synchronized. The artifact button uses a manual timestamp fallback because plain Gradio does not reliably expose the live currentTime of both video widgets.
Read-only mode: annotation writes are disabled. Use this for public demo review until the final eval schema is settled.
Sample 1 / 60
1_wasd_only/01
Scenario: 1_wasd_only
Case ID: 01
Pairing: left=data_subset/1_wasd_only/01.mp4, right=data_subset/1_wasd_only/01_wangame.mp4
Action file: data_subset/1_wasd_only/01_action.npy
Preview still: data_subset/1_wasd_only/01.jpg
Inferred control regime: keyboard-only
Resolution: 640x352
FPS: 25.00
Duration: 3.08s
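The file paths above follow a consistent pattern: <id>.mp4 for the reference, <id>_wangame.mp4 for the WanGame output, <id>_action.npy for actions and <id>.jpg for the preview still. A minimal sketch of resolving one case's files from its scenario and case ID, assuming that pattern holds for every sample; the CaseFiles class and case_files helper are hypothetical names, not part of the app.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class CaseFiles:
    """Paths for one sample (hypothetical helper, assumes the naming pattern above)."""
    reference: PurePosixPath  # left player: <id>.mp4
    generated: PurePosixPath  # right player: <id>_wangame.mp4
    actions: PurePosixPath    # per-step actions: <id>_action.npy
    preview: PurePosixPath    # preview still: <id>.jpg


def case_files(root: str, scenario: str, case_id: str) -> CaseFiles:
    """Build the file set for one case from root/scenario/case_id."""
    base = PurePosixPath(root) / scenario
    return CaseFiles(
        reference=base / f"{case_id}.mp4",
        generated=base / f"{case_id}_wangame.mp4",
        actions=base / f"{case_id}_action.npy",
        preview=base / f"{case_id}.jpg",
    )


files = case_files("data_subset", "1_wasd_only", "01")
print(files.generated)  # data_subset/1_wasd_only/01_wangame.mp4
```

Deriving all four paths from one (scenario, case ID) pair keeps the left and right players paired by construction rather than by separate lookups.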
Action summary: 77 action steps | 25.00 FPS | ~3.08s
Inferred control mode: keyboard-only
Keys used: W
Mouse values: pitch=[+0.0] | yaw=[+0.0]
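The summary figures are consistent with one action step per video frame: 77 steps / 25.00 FPS = 3.08 s, and all-zero pitch and yaw values with a key pressed yield "keyboard-only". A minimal sketch of how such a summary could be derived, assuming the action file has already been loaded (for example with numpy.load) into per-step key sets plus per-step pitch and yaw values. The real layout of _action.npy is not shown on this page, so that structure, the summarize_actions name and the labels other than "keyboard-only" are hypothetical.

```python
from typing import FrozenSet, Sequence


def summarize_actions(
    keys_per_step: Sequence[FrozenSet[str]],
    pitch: Sequence[float],
    yaw: Sequence[float],
    fps: float,
) -> dict:
    """Summarize per-step actions (hypothetical layout: one entry per frame)."""
    steps = len(keys_per_step)
    uses_mouse = any(p != 0.0 for p in pitch) or any(y != 0.0 for y in yaw)
    uses_keys = any(keys_per_step)  # a non-empty key set counts as a key press
    if uses_keys and uses_mouse:
        regime = "keyboard+mouse"
    elif uses_mouse:
        regime = "mouse-only"
    elif uses_keys:
        regime = "keyboard-only"
    else:
        regime = "idle"
    return {
        "steps": steps,
        "duration_s": steps / fps,  # e.g. 77 steps / 25 FPS = 3.08 s
        "regime": regime,
        "keys": sorted(set().union(*keys_per_step)),
        "pitch": sorted(set(pitch)),
        "yaw": sorted(set(yaw)),
    }


summary = summarize_actions([frozenset({"W"})] * 77, [0.0] * 77, [0.0] * 77, fps=25.0)
print(f"{summary['steps']} steps | {summary['duration_s']:.2f}s | {summary['regime']}")
# 77 steps | 3.08s | keyboard-only
```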
Timeline
0.00s-3.08s: W
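A timeline like this one can be produced by merging consecutive action steps that share the same key set into a single segment, with segment boundaries at step_index / FPS. A minimal sketch, assuming per-step key sets as input; build_timeline is a hypothetical name, and joining simultaneous keys with "+" and labeling empty steps "(none)" are illustrative choices, not the app's confirmed format.

```python
from typing import FrozenSet, List, Sequence


def build_timeline(keys_per_step: Sequence[FrozenSet[str]], fps: float) -> List[str]:
    """Merge runs of identical key sets into 'start-end: keys' segments."""
    segments: List[str] = []
    start = 0
    for i in range(1, len(keys_per_step) + 1):
        # Close the current run at the end of input or when the key set changes.
        if i == len(keys_per_step) or keys_per_step[i] != keys_per_step[start]:
            label = "+".join(sorted(keys_per_step[start])) or "(none)"
            segments.append(f"{start / fps:.2f}s-{i / fps:.2f}s: {label}")
            start = i
    return segments


print(build_timeline([frozenset({"W"})] * 77, fps=25.0))
# ['0.00s-3.08s: W']
```

For this sample every step holds W, so the whole clip collapses into the single segment shown above.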
Artifact flagging fallback: enter the paused player's time in seconds, then record it.
No artifact flags recorded yet. Pause a player, read the native timestamp, type it below, and click Flag artifact.
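Because the timestamp is typed by hand rather than read from the player, the fallback needs to reject input that is not a number or falls outside the clip. A minimal sketch of that check, assuming flags are kept as a sorted list of seconds and the clip duration is known (3.08 s here); flag_artifact, its messages and the two-decimal rounding are hypothetical, not the app's actual handler.

```python
from typing import List


def flag_artifact(raw: str, duration_s: float, flags: List[float]) -> str:
    """Validate a typed timestamp and record it as an artifact flag (hypothetical)."""
    try:
        t = float(raw.strip())
    except ValueError:
        return f"Not a number: {raw!r}"
    # Rejects negatives, values past the end of the clip, and NaN.
    if not 0.0 <= t <= duration_s:
        return f"Timestamp {t:.2f}s is outside 0.00s-{duration_s:.2f}s"
    flags.append(round(t, 2))
    flags.sort()
    return f"Flagged artifact at {t:.2f}s ({len(flags)} total)"


flags: List[float] = []
print(flag_artifact("1.5", 3.08, flags))  # Flagged artifact at 1.50s (1 total)
print(flag_artifact("4", 3.08, flags))    # Timestamp 4.00s is outside 0.00s-3.08s
```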
Status: Loaded 1_wasd_only/01. Save an annotation, then move to the next sample.