Xiaohan Wang

nicholswang

https://wxh1996.github.io/

XiaohanWang96

AI & ML interests

Video Understanding, Vision-Language Models

Recent Activity

upvoted a paper 22 days ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published 25 days ago • 37

authored 3 papers about 2 months ago

Closing the Modality Gap for Mixed Modality Search

Paper • 2507.19054 • Published Jul 25

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

Paper • 2510.08559 • Published Oct 9 • 8

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20 • 67

upvoted 2 papers about 2 months ago

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20 • 67

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

Paper • 2510.08559 • Published Oct 9 • 8

published an article 5 months ago

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Article • Jul 23

published a dataset 6 months ago

nicholswang/TimeLens

Updated Jun 19 • 13

upvoted 2 papers 8 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

Paper • 2503.23145 • Published Mar 29 • 35

liked a dataset 8 months ago

HuggingFaceFV/finevideo

Viewer • Updated Dec 16, 2024 • 39.5k • 12k • 331

upvoted a paper 9 months ago

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Paper • 2503.13399 • Published Mar 17 • 22

authored a paper 9 months ago

Video Action Differencing

Paper • 2503.07860 • Published Mar 10 • 33

upvoted a paper 9 months ago

Video Action Differencing

Paper • 2503.07860 • Published Mar 10 • 33

upvoted a paper 11 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 23

authored 2 papers 11 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 23

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13 • 55

upvoted 2 papers 11 months ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13 • 55

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Paper • 2501.03225 • Published Jan 6 • 7

upvoted a paper 12 months ago

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Paper • 2412.13180 • Published Dec 17, 2024 • 13
