How about multimodal benchmarks like VideoBench?

#9
by LukeAlan - opened

As an omni-like model, its performance on those benchmarks is important.

Google org

Hi,

Sorry for the late response. That's a great point. As a multimodal, or "omni-like," model, Gemma 3n E4B-it is evaluated across a set of benchmarks that measure its ability to reason over text, images, and audio.

Benchmarks like VQA-v2 and OK-VQA are critical, as they test the model's ability to look at an image and answer a question about it. Since the Gemma 3n models support audio, benchmarks that test the model's ability to understand spoken language in context with an image or video are essential.
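To make the "answer a question about an image" scoring concrete, here is a minimal sketch of the soft accuracy commonly used for VQA-style benchmarks, where a prediction gets full credit if at least three human annotators gave the same answer. This is a simplified illustration, not the official evaluation script (which, as far as I know, also applies more answer normalization and averages over annotator subsets). The function name `vqa_accuracy` and the sample annotations are hypothetical.

```python
from collections import Counter


def vqa_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Simplified VQA-style soft accuracy: min(matching annotators / 3, 1)."""
    # Light normalization only (assumption): lowercase and strip whitespace.
    normalized = [answer.strip().lower() for answer in human_answers]
    matches = Counter(normalized)[prediction.strip().lower()]
    return min(matches / 3.0, 1.0)


# Hypothetical annotations for one image/question pair (10 annotators).
human_answers = ["red"] * 2 + ["maroon"] * 8

score_red = vqa_accuracy("Red", human_answers)        # 2 of 10 annotators agree
score_maroon = vqa_accuracy("maroon", human_answers)  # 8 of 10 annotators agree
score_blue = vqa_accuracy("blue", human_answers)      # no annotator agrees
```

A benchmark score would then be the mean of this value over all questions in the evaluation set.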

Thank you.
