evalita_llm_leaderboard

rzanoli committed 10 days ago

Commit d789006 (1 parent: a91626f)

Revise the measurement description for MAIA

Files changed (1)

src_maia/tasks.py +3 -3

@@ -143,7 +143,7 @@ SU_DESCRIPTION = """### Summarization (SUM) --- *Generative task*
 | 7   | Riassumi il seguente articolo di giornale: '{{source}}'\\nRiassunto:             |
 | 8   | Devi risolvere un compito di sintesi automatica del testo. Riassumi il seguente articolo di giornale: '{{source}}'\\nRiassunto: |
-<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** = F1 averaged over the 2 prompts. **Best Prompt** = F1 of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
+<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** = Rouge averaged over the 2 prompts. **Best Prompt** = Rouge of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
 """
@@ -184,7 +184,7 @@ MAIA_MC_DESCRIPTION = """### Multimodal AI Assessment (MAIA) — *Multiple-Choic
 | 5   | Dato il video, scegli la descrizione corretta:\\nA. {{A}}\\nB. {{B}}\\nRispondi solo A o B. '{{video}}' |
 | 6   | Devi risolvere un compito di domande su video. Dato il video, scegli la descrizione corretta:\\nA. {{A}}\\nB. {{B}}\\nRispondi solo A o B. '{{video}}' |
-<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** = F1 averaged over the 2 prompts. **Best Prompt** = F1 of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
+<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** Accuracy averaged over the 2 prompts. **Best Prompt** = Accuracy of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
 """
@@ -196,7 +196,7 @@ MAIA_GEN_DESCRIPTION = """### Multimodal AI Assessment (MAIA) — *Generative Ta
 | 7   | '{{video}}' Rispondi alla seguente domanda con una singola frase.\\n'{{text}}' |
 | 8   | '{{video}}' Devi svolgere un compito di Visual Question Answering. Rispondi alla seguente domanda con una singola frase.\\n'{{text}}' |
-<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** = F1 averaged over the 2 prompts. **Best Prompt** = F1 of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
+<small>**Combined Performance** = (1 - (**Best Prompt** - **Prompt Average**) / 100) * **Best Prompt**. **Prompt Average** = Rouge-1 averaged over the 2 prompts. **Best Prompt** = Rouge-1 of the best prompt. **Prompt ID** = ID of the best prompt (see legend above). </small>
 """
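
For readers checking the legend, the Combined Performance formula quoted in these descriptions can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from `src_maia/tasks.py`: the function name `combined_performance` and the sample scores are hypothetical, and scores are assumed to be on a 0-100 scale, as the division by 100 in the legend suggests.

```python
from statistics import mean


def combined_performance(prompt_scores):
    """Return (combined, prompt_average, best_prompt) for scores on a 0-100 scale.

    Hypothetical helper that mirrors the legend's formula; it is not part of the
    leaderboard code base.
    """
    if not prompt_scores:
        raise ValueError("at least one prompt score is required")
    best_prompt = max(prompt_scores)
    prompt_average = mean(prompt_scores)
    # Scale the best score down by its gap to the average, as in the legend:
    # (1 - (Best Prompt - Prompt Average) / 100) * Best Prompt
    combined = (1 - (best_prompt - prompt_average) / 100) * best_prompt
    return combined, prompt_average, best_prompt


# Made-up scores for two prompts (illustrative values, not leaderboard data).
combined, average, best = combined_performance([60.0, 50.0])
print(f"best={best:.2f} average={average:.2f} combined={combined:.2f}")
```

With these made-up scores the gap between the best prompt and the average is 5 points, so the best score of 60 is scaled by 0.95. When both prompts score the same, the gap is zero and Combined Performance equals the best score. The formula is the same whichever metric (Rouge, Accuracy, Rouge-1) feeds it.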