Tags: Text Generation · Transformers · Safetensors · xlstm · sft · trl · conversational · 🇪🇺 Region: EU
mrs83 committed · verified · Commit 679d842 · 1 Parent(s): be15fc2

Update README.md

Files changed (1):
  1. README.md +91 -2
README.md CHANGED
@@ -132,6 +132,87 @@ You can also try seeking support from a therapist or counselor if you are strugg
 
  This model was trained with SFT.
 
+ ## Evaluation
+
+ This model has been loaded in 4-bit and evaluated with [lighteval](https://github.com/huggingface/lighteval).
+
+
+
+ | Task | Version | Metric | Value |  | Stderr |
+ |---|---|---|---:|---|---:|
+ | all | | acc | 0.4450 | ± | 0.1503 |
+ | | | acc (LogProbCharNorm, ignore_first_space=True) | 0.7000 | ± | 0.1528 |
+ | | | acc (LogProbCharNorm, ignore_first_space=False) | 0.8000 | ± | 0.1333 |
+ | | | truthfulqa_mc1 | 0.4000 | ± | 0.1633 |
+ | | | truthfulqa_mc2 | 0.5256 | ± | 0.1573 |
+ | | | em (gsm8k_normalizer) | 0.4000 | ± | 0.1633 |
+ | leaderboard:arc:challenge:25 | | acc | 0.7000 | ± | 0.1528 |
+ | | | acc (LogProbCharNorm, ignore_first_space=True) | 0.7000 | ± | 0.1528 |
+ | leaderboard:gsm8k:5 | | em (gsm8k_normalizer) | 0.4000 | ± | 0.1633 |
+ | leaderboard:hellaswag:10 | | acc | 0.4000 | ± | 0.1633 |
+ | | | acc (LogProbCharNorm, ignore_first_space=False) | 0.8000 | ± | 0.1333 |
+ | leaderboard:mmlu:_average:5 | | acc | 0.4386 | ± | 0.1498 |
+ | leaderboard:mmlu:abstract_algebra:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:anatomy:5 | | acc | 0.2000 | ± | 0.1333 |
+ | leaderboard:mmlu:astronomy:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:business_ethics:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:clinical_knowledge:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:college_biology:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:college_chemistry:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:college_computer_science:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:college_mathematics:5 | | acc | 0.6000 | ± | 0.1633 |
+ | leaderboard:mmlu:college_medicine:5 | | acc | 0.6000 | ± | 0.1633 |
+ | leaderboard:mmlu:college_physics:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:computer_security:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:conceptual_physics:5 | | acc | 0.2000 | ± | 0.1333 |
+ | leaderboard:mmlu:econometrics:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:electrical_engineering:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:elementary_mathematics:5 | | acc | 0.1000 | ± | 0.1000 |
+ | leaderboard:mmlu:formal_logic:5 | | acc | 0.2000 | ± | 0.1333 |
+ | leaderboard:mmlu:global_facts:5 | | acc | 0.6000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_biology:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:high_school_chemistry:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_computer_science:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_european_history:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_geography:5 | | acc | 0.8000 | ± | 0.1333 |
+ | leaderboard:mmlu:high_school_government_and_politics:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:high_school_macroeconomics:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_mathematics:5 | | acc | 0.1000 | ± | 0.1000 |
+ | leaderboard:mmlu:high_school_microeconomics:5 | | acc | 0.6000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_physics:5 | | acc | 0.2000 | ± | 0.1333 |
+ | leaderboard:mmlu:high_school_psychology:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:high_school_statistics:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:high_school_us_history:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:high_school_world_history:5 | | acc | 0.9000 | ± | 0.1000 |
+ | leaderboard:mmlu:human_aging:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:human_sexuality:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:international_law:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:jurisprudence:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:logical_fallacies:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:machine_learning:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:management:5 | | acc | 0.6000 | ± | 0.1633 |
+ | leaderboard:mmlu:marketing:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:medical_genetics:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:miscellaneous:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:moral_disputes:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:moral_scenarios:5 | | acc | 0.0000 | ± | 0.0000 |
+ | leaderboard:mmlu:nutrition:5 | | acc | 0.8000 | ± | 0.1333 |
+ | leaderboard:mmlu:philosophy:5 | | acc | 0.3000 | ± | 0.1528 |
+ | leaderboard:mmlu:prehistory:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:professional_accounting:5 | | acc | 0.1000 | ± | 0.1000 |
+ | leaderboard:mmlu:professional_law:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:professional_medicine:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:professional_psychology:5 | | acc | 0.1000 | ± | 0.1000 |
+ | leaderboard:mmlu:public_relations:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:security_studies:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:sociology:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:mmlu:us_foreign_policy:5 | | acc | 0.4000 | ± | 0.1633 |
+ | leaderboard:mmlu:virology:5 | | acc | 0.5000 | ± | 0.1667 |
+ | leaderboard:mmlu:world_religions:5 | | acc | 0.7000 | ± | 0.1528 |
+ | leaderboard:truthfulqa:mc:0 | | truthfulqa_mc1 | 0.4000 | ± | 0.1633 |
+ | | | truthfulqa_mc2 | 0.5256 | ± | 0.1573 |
+ | leaderboard:winogrande:5 | | acc | 0.6000 | ± | 0.1633 |
+
  ### Framework versions
 
  - PEFT 0.17.1
@@ -142,8 +223,6 @@ This model was trained with SFT.
  - Tokenizers: 0.22.1
 
  ## Citations
- 
- Cite TRL as:
 
  ```bibtex
  @misc{vonwerra2022trl,
@@ -154,4 +233,14 @@ Cite TRL as:
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
  }
+ ```
+
+ ```bibtex
+ @misc{lighteval,
+   author = {Habib, Nathan and Fourrier, Clémentine and Kydlíček, Hynek and Wolf, Thomas and Tunstall, Lewis},
+   title = {LightEval: A lightweight framework for LLM evaluation},
+   year = {2023},
+   version = {0.11.0},
+   url = {https://github.com/huggingface/lighteval}
+ }
  ```
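As context for the added evaluation section, below is a minimal sketch of how a 4-bit load of this adapter might be set up before running an evaluation harness such as lighteval. The commit does not state the exact setup; the repo id placeholder, the bitsandbytes NF4 settings, and the `trust_remote_code` flag are assumptions, not details taken from the model card.

```python
# Hypothetical sketch: load the PEFT adapter with 4-bit quantization.
# The quantization settings below are assumptions; the commit only says
# "loaded in 4-bit and evaluated with lighteval".
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

model_id = "<this-repo-id>"  # placeholder: replace with the actual Hub repo id

# NF4 4-bit quantization via bitsandbytes (assumed configuration)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # xLSTM backbones may ship custom modeling code
)
model.eval()
```

The task names in the table follow lighteval's convention of appending the few-shot count, so `leaderboard:arc:challenge:25` is ARC-Challenge at 25-shot, `leaderboard:hellaswag:10` is HellaSwag at 10-shot, and the MMLU subtasks are 5-shot.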