TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Paper • 2511.16595 • Published 18 days ago
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published 21 days ago