TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs Paper • 2509.18056 • Published Sep 22, 2025
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation Paper • 2406.00670 • Published Jun 2, 2024
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction Paper • 2412.06244 • Published Dec 9, 2024
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3, 2025
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment Paper • 2508.08811 • Published Aug 12, 2025