How about multimodal benchmarks like VideoBench?
#9 opened by LukeAlan
For an omni-like model, performance on those benchmarks is important.
Hi,
Sorry for the late response. That's an insightful question. As a multimodal, or "omni-like," model, Gemma 3n E4B-it is evaluated across a diverse set of benchmarks that measure its ability to reason over text, images, and audio.
Benchmarks like VQA-v2 and OK-VQA are critical, as they test the model's ability to look at an image and answer a question about it. Since the Gemma 3n models also support audio, benchmarks that test the model's ability to understand spoken language together with an image or video are essential.
Thank you.