Update README.md

README.md

This model was trained with SFT.
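
The card states only that the model was trained with SFT and that a PEFT adapter was produced (see Framework versions below). As a point of reference, a minimal TRL `SFTTrainer` + LoRA sketch is shown here; the base model, dataset, and hyperparameters are placeholders, not this model's actual training configuration, which is not recorded in this card.

```python
# Hypothetical SFT + LoRA sketch with TRL; all names below are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
    peft_config=LoraConfig(r=16, lora_alpha=32),  # trains a PEFT adapter rather than full weights
)
trainer.train()
```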

## Evaluation

This model was loaded in 4-bit and evaluated with [lighteval](https://github.com/huggingface/lighteval).
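
A minimal sketch of that 4-bit loading setup with `transformers` and `peft` might look as follows; both repository IDs are placeholders, since the card does not name the base model.

```python
# Hypothetical sketch: load the base model in 4-bit and attach the PEFT adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "base-org/base-model",  # placeholder base model ID
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "your-org/this-adapter")  # placeholder adapter ID
tokenizer = AutoTokenizer.from_pretrained("base-org/base-model")
```

Task names in the results table follow lighteval's `suite:task:num_fewshot` convention, so e.g. `leaderboard:mmlu:anatomy:5` is 5-shot MMLU anatomy.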

| Task | Version | Metric | Value |  | Stderr |
|---|---|---|---:|---|---:|
| all | | acc | 0.4450 | ± | 0.1503 |
| | | acc (LogProbCharNorm, ignore_first_space=True) | 0.7000 | ± | 0.1528 |
| | | acc (LogProbCharNorm, ignore_first_space=False) | 0.8000 | ± | 0.1333 |
| | | truthfulqa_mc1 | 0.4000 | ± | 0.1633 |
| | | truthfulqa_mc2 | 0.5256 | ± | 0.1573 |
| | | em (gsm8k_normalizer) | 0.4000 | ± | 0.1633 |
| leaderboard:arc:challenge:25 | | acc | 0.7000 | ± | 0.1528 |
| | | acc (LogProbCharNorm, ignore_first_space=True) | 0.7000 | ± | 0.1528 |
| leaderboard:gsm8k:5 | | em (gsm8k_normalizer) | 0.4000 | ± | 0.1633 |
| leaderboard:hellaswag:10 | | acc | 0.4000 | ± | 0.1633 |
| | | acc (LogProbCharNorm, ignore_first_space=False) | 0.8000 | ± | 0.1333 |
| leaderboard:mmlu:_average:5 | | acc | 0.4386 | ± | 0.1498 |
| leaderboard:mmlu:abstract_algebra:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:anatomy:5 | | acc | 0.2000 | ± | 0.1333 |
| leaderboard:mmlu:astronomy:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:business_ethics:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:clinical_knowledge:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:college_biology:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:college_chemistry:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:college_computer_science:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:college_mathematics:5 | | acc | 0.6000 | ± | 0.1633 |
| leaderboard:mmlu:college_medicine:5 | | acc | 0.6000 | ± | 0.1633 |
| leaderboard:mmlu:college_physics:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:computer_security:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:conceptual_physics:5 | | acc | 0.2000 | ± | 0.1333 |
| leaderboard:mmlu:econometrics:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:electrical_engineering:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:elementary_mathematics:5 | | acc | 0.1000 | ± | 0.1000 |
| leaderboard:mmlu:formal_logic:5 | | acc | 0.2000 | ± | 0.1333 |
| leaderboard:mmlu:global_facts:5 | | acc | 0.6000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_biology:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:high_school_chemistry:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_computer_science:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_european_history:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_geography:5 | | acc | 0.8000 | ± | 0.1333 |
| leaderboard:mmlu:high_school_government_and_politics:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:high_school_macroeconomics:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_mathematics:5 | | acc | 0.1000 | ± | 0.1000 |
| leaderboard:mmlu:high_school_microeconomics:5 | | acc | 0.6000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_physics:5 | | acc | 0.2000 | ± | 0.1333 |
| leaderboard:mmlu:high_school_psychology:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:high_school_statistics:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:high_school_us_history:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:high_school_world_history:5 | | acc | 0.9000 | ± | 0.1000 |
| leaderboard:mmlu:human_aging:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:human_sexuality:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:international_law:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:jurisprudence:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:logical_fallacies:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:machine_learning:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:management:5 | | acc | 0.6000 | ± | 0.1633 |
| leaderboard:mmlu:marketing:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:medical_genetics:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:miscellaneous:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:moral_disputes:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:moral_scenarios:5 | | acc | 0.0000 | ± | 0.0000 |
| leaderboard:mmlu:nutrition:5 | | acc | 0.8000 | ± | 0.1333 |
| leaderboard:mmlu:philosophy:5 | | acc | 0.3000 | ± | 0.1528 |
| leaderboard:mmlu:prehistory:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:professional_accounting:5 | | acc | 0.1000 | ± | 0.1000 |
| leaderboard:mmlu:professional_law:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:professional_medicine:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:professional_psychology:5 | | acc | 0.1000 | ± | 0.1000 |
| leaderboard:mmlu:public_relations:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:security_studies:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:sociology:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:mmlu:us_foreign_policy:5 | | acc | 0.4000 | ± | 0.1633 |
| leaderboard:mmlu:virology:5 | | acc | 0.5000 | ± | 0.1667 |
| leaderboard:mmlu:world_religions:5 | | acc | 0.7000 | ± | 0.1528 |
| leaderboard:truthfulqa:mc:0 | | truthfulqa_mc1 | 0.4000 | ± | 0.1633 |
| | | truthfulqa_mc2 | 0.5256 | ± | 0.1573 |
| leaderboard:winogrande:5 | | acc | 0.6000 | ± | 0.1633 |

### Framework versions

- PEFT 0.17.1
- Tokenizers: 0.22.1

## Citations

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

```bibtex
@misc{lighteval,
    author       = {Habib, Nathan and Fourrier, Clémentine and Kydlíček, Hynek and Wolf, Thomas and Tunstall, Lewis},
    title        = {LightEval: A lightweight framework for LLM evaluation},
    year         = {2023},
    version      = {0.11.0},
    url          = {https://github.com/huggingface/lighteval}
}
```