Unified Video Editing with Temporal Reasoner

๐Ÿ‘๏ธ See โ†’ ๐Ÿง  Reason โ†’ โœ๏ธ Edit

🚀 A Chain-of-Frames editing method that enables temporal reasoning and 4× video length generalization with just 50k training pairs!

Xiangpeng Yang¹, Ji Xie², Yiyuan Yang¹, Yan Huang¹, Min Xu¹, Qiang Wu¹
¹University of Technology Sydney, ²Zhejiang University

VideoCoF: Unified Video Editing with Temporal Reasoner

VideoCoF is a unified video editing model that bridges the gap between expert models (precise but restricted) and unified in-context models (flexible but spatially inaccurate). By introducing a "See → Reason → Edit" Chain-of-Frames paradigm, VideoCoF predicts reasoning tokens before generating the target video tokens, removing the need for user-provided masks while achieving precise instruction-to-region alignment.

🌟 Key Capabilities

  1. Temporal Reasoning: Adopts a unique approach where the model first identifies where and how to edit (Reasoning) before predicting the target video tokens.
  2. Data Efficiency: Achieves SOTA performance with only 50k training pairs (33 frames each).
  3. Length Extrapolation: Demonstrates robust multi-shot editing and can generalize to videos 4× longer than training samples.
  4. Versatile Editing: Supports:
    • Object Removal
    • Object Addition
    • Object Swap
    • Local Style Transfer
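
The See → Reason → Edit ordering can be pictured as three consecutive segments in one sequence: the source frames, then the reasoning frames, then the target frames. The snippet below is only a minimal sketch of that ordering. It assumes the frame counts from the Quick Start flags (`--source_frames 33`, `--reasoning_frames 4`, `--num_frames 33`) map one-to-one onto segment lengths, which the model card does not state; the actual model works on tokens, and its internal layout may differ. `Segment` and `chain_of_frames_layout` are hypothetical names for illustration and are not part of the VideoCoF code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str   # hypothetical label, not an identifier from the VideoCoF code
    start: int  # first index in the concatenated sequence (inclusive)
    stop: int   # end index in the concatenated sequence (exclusive)


def chain_of_frames_layout(source_frames: int, reasoning_frames: int,
                           target_frames: int) -> list:
    """Place the three stages back to back in See -> Reason -> Edit order."""
    stages = (
        ("see (source)", source_frames),
        ("reason", reasoning_frames),
        ("edit (target)", target_frames),
    )
    segments = []
    cursor = 0
    for name, length in stages:
        if length <= 0:
            raise ValueError(f"{name}: length must be positive, got {length}")
        segments.append(Segment(name, cursor, cursor + length))
        cursor += length
    return segments


# Assumption: counts taken from the Quick Start command-line flags.
layout = chain_of_frames_layout(source_frames=33, reasoning_frames=4,
                                target_frames=33)
for seg in layout:
    print(f"{seg.name:<14} [{seg.start}, {seg.stop})")
```

The point of the sketch is the ordering: everything in the "reason" segment is produced before anything in the "edit" segment, so the target frames can condition on it.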

🔧 Quick Start

To use these weights, please refer to the official GitHub Repository for inference code and environment setup.

Installation

git clone https://github.com/knightyxp/VideoCoF
cd VideoCoF

# 1. Create and activate a conda environment
conda create -n videocof python=3.10
conda activate videocof

# 2. Install PyTorch (Choose version compatible with your CUDA)
# For standard GPUs (CUDA 12.1):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# For Hopper GPUs (e.g., H100/H800) requiring fast inference:
# pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 3. Install other dependencies
pip install -r requirements.txt

Note on Flash Attention: We recommend using FlashAttention-3 (currently beta) for optimal performance, especially on NVIDIA H100/H800 GPUs. If you are using these GPUs, please follow the official FlashAttention-3 installation guide after installing the compatible PyTorch version (e.g., PyTorch 2.8 + CUDA 12.8).
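
After installation, it can help to confirm which PyTorch build and CUDA runtime the environment actually picked up before moving on. The following is a small sanity-check sketch, not part of the VideoCoF repository; `check_environment` is a hypothetical helper name. It only reports what it finds and does not enforce any particular version.

```python
import importlib.util
import platform


def check_environment() -> dict:
    """Collect basic facts about the current Python / PyTorch setup."""
    report = {
        "python": platform.python_version(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": False,
        "cuda_version": None,
        "gpu_name": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        report["cuda_version"] = torch.version.cuda  # None on CPU-only builds
        if report["cuda_available"]:
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `cuda_version` does not match the wheel index used above (for example `cu121` versus `cu128`), the PyTorch install step is the first thing to revisit.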

Download Models

  • Wan-2.1-T2V-14B Pretrained Weights:

    git lfs install
    git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
    
    # Or using the Hugging Face CLI (hf):
    # hf download Wan-AI/Wan2.1-T2V-14B --local-dir Wan2.1-T2V-14B
    
  • VideoCoF Checkpoint:

    git lfs install
    git clone https://huggingface.co/XiangpengYang/VideoCoF videocof_weight
    
    # Or using the Hugging Face CLI (hf):
    # hf download XiangpengYang/VideoCoF --local-dir videocof_weight
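
Because both downloads are large, a quick check that they landed where the inference command expects them can save a failed launch. The sketch below assumes the commands above were run from the repository root, so the base model sits in `Wan2.1-T2V-14B` and the checkpoint at `videocof_weight/videocof.safetensors` (the path used in the Inference section). `missing_assets` is a hypothetical helper, not part of the VideoCoF code.

```python
from pathlib import Path

# Assumption: relative locations produced by the download commands above.
EXPECTED = {
    "Wan2.1 base model directory": Path("Wan2.1-T2V-14B"),
    "VideoCoF checkpoint file": Path("videocof_weight/videocof.safetensors"),
}


def missing_assets(root=".", expected=EXPECTED) -> list:
    """Return the labels of expected files/directories not found under root."""
    root = Path(root)
    return [label for label, rel in expected.items()
            if not (root / rel).exists()]


missing = missing_assets()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected assets found.")
```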
    

Inference

export CUDA_VISIBLE_DEVICES=0
torchrun --nproc_per_node=1 inference.py \
  --video_path assets/two_man.mp4 \
  --prompt "Remove the young man with short black hair wearing black shirt on the left." \
  --output_dir results/obj_rem \
  --model_name Wan2.1-T2V-14B \
  --seed 0 \
  --num_frames 33 \
  --source_frames 33 \
  --reasoning_frames 4 \
  --repeat_rope \
  --videocof_path videocof_weight/videocof.safetensors
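
To run several edits with the same settings, the command above can be assembled programmatically. This is a minimal sketch that only reuses the flags shown in the command; `build_command` and `run_jobs` are hypothetical helper names, and the default `model_name` assumes the base model was cloned into `Wan2.1-T2V-14B` as in Download Models. By default it prints the commands without launching anything.

```python
import shlex
import subprocess


def build_command(video_path, prompt, output_dir,
                  model_name="Wan2.1-T2V-14B",  # assumption: local clone path
                  videocof_path="videocof_weight/videocof.safetensors",
                  seed=0, num_frames=33, source_frames=33,
                  reasoning_frames=4):
    """Build the torchrun argument list for one edit, mirroring the flags above."""
    return [
        "torchrun", "--nproc_per_node=1", "inference.py",
        "--video_path", video_path,
        "--prompt", prompt,
        "--output_dir", output_dir,
        "--model_name", model_name,
        "--seed", str(seed),
        "--num_frames", str(num_frames),
        "--source_frames", str(source_frames),
        "--reasoning_frames", str(reasoning_frames),
        "--repeat_rope",
        "--videocof_path", videocof_path,
    ]


def run_jobs(jobs, dry_run=True):
    """Print each command; launch it only when dry_run is False."""
    for video_path, prompt, output_dir in jobs:
        cmd = build_command(video_path, prompt, output_dir)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


JOBS = [
    ("assets/two_man.mp4",
     "Remove the young man with short black hair wearing black shirt on the left.",
     "results/obj_rem"),
]

run_jobs(JOBS)  # pass dry_run=False to actually start inference
```

Passing arguments as a list avoids shell-quoting problems with prompts that contain quotes or punctuation.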

For parallel inference:

sh scripts/parallel_infer.sh

๐Ÿ™ Acknowledgments

We thank the authors of related works and the open-source community VideoX-Fun and Wan for their contributions.

📜 License

This project is licensed under the Apache License 2.0.

📮 Contact

For any questions, please feel free to reach out to the author Xiangpeng Yang @knightyxp, email: [email protected]/[email protected]

📄 Citation

If you find this work useful for your research, please consider citing:

@article{yang2025videocof,
  title={Unified Video Editing with Temporal Reasoner},
  author={Yang, Xiangpeng and Xie, Ji and Yang, Yiyuan and Huang, Yan and Xu, Min and Wu, Qiang},
  journal={arXiv preprint arXiv:2512.07469},
  year={2025}
}
โค๏ธ **If you find this project helpful, please consider giving it a like!** โค๏ธ