Physics-aware Video Instance Removal

Please download the dataset from GIVE-Challenge-Dataset.

This challenge is jointly organized by Texas A&M University, Visko Platform, and Abaka AI.

📅 Important Dates

2026.02.20 Release of Validation Data (Video + prompt + mask); validation submission opens.
2026.03.25 Submission deadline.
2026.04.03 Technical report deadline for Innovation Award eligibility.
2026.04.06 Competition results released to participants.

🔍 Challenge Overview

The 1st Workshop on Video Generative Models: Benchmarks and Evaluation (VGBE) will be held in June 2026 in conjunction with CVPR 2026.

Recent advances in video generative models, such as Sora, Veo, and Wan, have demonstrated an unprecedented ability to generate high-fidelity content. However, moving toward practical workflows requires pushing video editing beyond simple object deletion. In real-world scenarios, removing an object is complex because objects interact dynamically with their surroundings. A truly realistic removal requires modeling and regenerating these environmental interactions—such as shadows, reflections, water ripples, or secondary motion propagation—to maintain physical plausibility.

This challenge focuses on physics-aware video restoration. Unlike traditional inpainting, participants must ensure that the "void" left by a removed object is filled with content that is not only visually consistent but also physically coherent with the rest of the scene. This requires a deep semantic understanding of how objects influence their environment through lighting, physics, and geometry.

Hosting this challenge accelerates the development of models capable of sophisticated, physically-grounded video manipulation. It provides a standardized benchmark to evaluate how effectively these systems can restore complex environments while maintaining perfect temporal stability.

The top-ranked participants will receive awards and be invited to present their solutions at the associated VGBE workshop at CVPR 2026. The results of the challenge will be published in the VGBE 2026 workshop proceedings (CVPR Proceedings).

📋 Task Definition

Task: Physics-aware Video Instance Removal

Given an Input Video, a Text Prompt describing the target object, and a Segmentation Mask, the model must generate a video that is:

  • Physically Aware: Realistically reflects physical changes (shadows, ripples, etc.) caused by the object's removal.
  • Temporally Coherent: Maintains stability and visual realism across all frames without flickering.
  • Exclusive in Editing: Preserves all unrelated regions of the video perfectly.

Output Specifications

To ensure fairness and standardized evaluation, all submissions must adhere to the following technical constraints:

  • Frames: The generated video sequence must have strictly the same number of frames as the original video.
  • Resolution:
    • Minimum: 480p (e.g., $854 \times 480$).
    • Recommended: 720p (e.g., $1280 \times 720$) or higher.
  • Aspect Ratio: The output video must preserve the aspect ratio of the input video. Cropping or distorting the input aspect ratio will result in significant score deductions.
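Before submitting, it can be worth checking your outputs against these constraints automatically. The sketch below is not an official checker; the metadata dicts and the 1% aspect-ratio tolerance are assumptions, and in practice you would read frame counts and dimensions with a tool such as ffprobe or OpenCV.

```python
def check_submission(inp: dict, out: dict, ar_tol: float = 0.01) -> list[str]:
    """Return a list of specification violations (empty means OK).

    `inp` / `out` hold {"frames", "width", "height"} for the input and
    generated videos; how you extract them is up to your pipeline.
    """
    problems = []
    # Frames: must match the input exactly.
    if out["frames"] != inp["frames"]:
        problems.append(f"frame count {out['frames']} != {inp['frames']}")
    # Resolution: shorter side must be at least 480 pixels.
    if min(out["width"], out["height"]) < 480:
        problems.append("resolution below 480p minimum")
    # Aspect ratio: must be preserved (within a small assumed tolerance).
    ar_in = inp["width"] / inp["height"]
    ar_out = out["width"] / out["height"]
    if abs(ar_in - ar_out) > ar_tol * ar_in:
        problems.append(f"aspect ratio changed ({ar_in:.3f} -> {ar_out:.3f})")
    return problems


src = {"frames": 120, "width": 1280, "height": 720}
good = {"frames": 120, "width": 1280, "height": 720}
bad = {"frames": 119, "width": 854, "height": 854}  # wrong frame count, square crop

print(check_submission(src, good))  # expect no violations: []
print(check_submission(src, bad))
```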

Recommended Baselines / Architectures

We encourage participants to explore or build upon recent efficient architectures, such as:

  1. DiffuEraser: A Diffusion Model for Video Inpainting
  2. ROSE: Remove Objects with Side Effects in Videos
  3. Any other closed-source or open-source model or pipeline is also welcome.

📊 Evaluation

The evaluation process consists of two primary components:

  1. Automated Evaluation (VBench): We utilize VBench to provide an objective assessment of video quality and perceptual fidelity.
  2. Human Evaluation: A panel of experts will score each entry across four key dimensions:
    • Physical Awareness (55%): realism of the physical restoration (shadows, reflections, etc.).
    • Instruction Following (15%): whether the specified object is correctly removed.
    • Rendering Quality (15%): whether the video is visually and temporally coherent.
    • Exclusivity of Edit (15%): whether unrelated regions are preserved without artifacts.

Human Evaluation Score: Calculated as the weighted sum of the four dimensions above.

Final Score Calculation

To balance objective performance with human-centric quality, the final ranking is determined by:

$$\text{Final Score} = 0.2 \times \text{VBench Score} + 0.8 \times \text{Human Evaluation Score}$$
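The two formulas above can be combined into a short scoring sketch. The weights are the ones stated in this section; the sample scores are made up for illustration, and a common 0-100 scale for all dimensions is an assumption.

```python
# Weights of the four human-rated dimensions (from the evaluation section).
HUMAN_WEIGHTS = {
    "physical_awareness": 0.55,
    "instruction_following": 0.15,
    "rendering_quality": 0.15,
    "exclusivity_of_edit": 0.15,
}

def human_eval_score(scores: dict) -> float:
    """Weighted sum of the four human-rated dimensions."""
    return sum(HUMAN_WEIGHTS[k] * scores[k] for k in HUMAN_WEIGHTS)

def final_score(vbench: float, human: float) -> float:
    """Final Score = 0.2 * VBench Score + 0.8 * Human Evaluation Score."""
    return 0.2 * vbench + 0.8 * human

# Hypothetical per-dimension scores on a 0-100 scale.
sample = {
    "physical_awareness": 80.0,
    "instruction_following": 90.0,
    "rendering_quality": 85.0,
    "exclusivity_of_edit": 95.0,
}
h = human_eval_score(sample)          # 0.55*80 + 0.15*(90+85+95) = 84.5
print(final_score(vbench=70.0, human=h))  # 0.2*70 + 0.8*84.5 = 81.6
```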

🏆 Awards

We have established a total prize pool of $1,000 USD:

  • 🏆 Highest Score Award (Champion): $500 USD + Award Certificate.
  • 🌟 Innovation Award: $500 USD + Award Certificate. Recognizes technically novel or methodologically inspiring contributions; a technical report is required.

📧 Issues & Contact

  • Technical Discussions: Please use the community forum on the official challenge page.
  • Inquiries: Contact the organizing committee at tcve-cvpr-2026@googlegroups.com.