How about multimodal benchmarks like VideoBench?

by LukeAlan - opened Jun 27

Discussion

LukeAlan

Jun 27

For an omni-like model, performance on those benchmarks is important.

lkv

Google org Sep 30

Hi,

Sorry for the late response. That's an insightful question. As a multimodal, or "omni-like," model, Gemma 3n E4B-it is evaluated across a complex set of benchmarks that measure its ability to reason over text, images, and audio.

Benchmarks like VQA-v2 and OK-VQA are critical, as they test the model's ability to look at an image and answer a question about it. Since the Gemma 3n models support audio, benchmarks that test the model's ability to understand spoken language in context with an image or video are essential.

Thank you.
