simon-clmtd committed on
Commit 37ffd63 · verified · 1 Parent(s): 377d493

Add model specification and test set information

Files changed (1)
  1. README.md +41 -0
README.md CHANGED
@@ -22,8 +22,49 @@ tags:
22
 
23
  This model has been adapted for short, OCR-noisy and fragmentary inputs typical of historical digitized texts.
24
 
 
25
  ## Model Details
26
 
27
  ### Model Description
28
 
29
  - **Developed by:** University of Zurich (UZH) from the [Impresso team](https://impresso-project.ch). The project is an interdisciplinary project focused on historical media analysis across languages, time, and modalities. Funded by the Swiss National Science Foundation ([CRSII5_173719](http://p3.snf.ch/project-173719), [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (grant No. 17498891).
 
22
 
23
  This model has been adapted for short, OCR-noisy and fragmentary inputs typical of historical digitized texts.
24
 
25
+
26
  ## Model Details
27
 
28
+ This model is a supervised [floret model](https://github.com/explosion/floret), trained with the following parameters:
29
+ ```
30
+ {'bucket': 200000,
31
+ 'dimension': 40,
32
+ 'hash_function': 'N/A',
33
+ 'loss': 'softmax',
34
+ 'maxn': 4,
35
+ 'minn': 1,
36
+ 'model_type': 'supervised',
37
+ 'vocab_size': 3}
38
+ ```
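As a rough illustration of what `minn` and `maxn` control: fastText-style models such as floret break each token into character n-grams of lengths `minn` to `maxn` and map them to rows of a table with `bucket` entries. The sketch below is not floret's implementation. `char_ngrams` and `bucket_index` are hypothetical helpers, and `zlib.crc32` stands in for the real hash function purely for illustration.

```python
import zlib

BUCKET = 200000     # 'bucket' from the parameters above
MINN, MAXN = 1, 4   # 'minn' and 'maxn' from the parameters above


def char_ngrams(word, minn=MINN, maxn=MAXN):
    """Return character n-grams of a word wrapped in < and > boundary markers."""
    marked = f"<{word}>"
    grams = []
    for start in range(len(marked)):
        for n in range(minn, maxn + 1):
            if start + n > len(marked):
                break
            gram = marked[start:start + n]
            if gram in ("<", ">"):
                # A lone boundary marker is skipped as uninformative.
                continue
            grams.append(gram)
    return grams


def bucket_index(gram, bucket=BUCKET):
    """Map an n-gram to a table row. crc32 is a stand-in, NOT floret's hash."""
    return zlib.crc32(gram.encode("utf-8")) % bucket


if __name__ == "__main__":
    for gram in char_ngrams("lb"):
        print(gram, bucket_index(gram))
```

With `minn` set to 1, even single characters contribute features, which plausibly helps on short or OCR-damaged fragments.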
39
+
40
+ On the [impresso language identification challenge test set](https://github.com/impresso/dataset-challenge-lid) it achieves the following performance:
41
+
42
+ ```
+         de    en    fr    it    la    lb    nl
+ de    2854     0    79     3     0    38     0
+ en       0   156     1     0     0     0     0
+ fr      14    11  1515     1     7     9     0
+ it       0     0     0   136     0     0     0
+ la       0     0     0     0     0     0     0
+ lb       6     1    20     0     0   775     1
+ nl       0     0     0     0     0     0     0
+
+ Detailed Classification Report:
+
+               precision    recall  f1-score   support
+
+           de       0.99      0.96      0.98      2974
+           en       0.93      0.99      0.96       157
+           fr       0.94      0.97      0.96      1557
+           it       0.97      1.00      0.99       136
+           la       0.00      0.00      0.00         0
+           lb       0.94      0.97      0.95       803
+           nl       0.00      0.00      0.00         0
+
+     accuracy                           0.97      5627
+    macro avg       0.68      0.70      0.69      5627
+ weighted avg       0.97      0.97      0.97      5627
67
+ ```
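The macro average sits far below the accuracy because `la` and `nl` have zero support in this test set, so their 0.00 scores count fully in the unweighted mean. The sketch below recomputes the report from the confusion matrix above, reading rows as true labels and columns as predictions (an assumption, though the row sums do match the support column). The helper name `per_class_metrics` is hypothetical and not part of any library.

```python
LABELS = ["de", "en", "fr", "it", "la", "lb", "nl"]

# Confusion matrix from the report above; rows are assumed to be true labels.
MATRIX = [
    [2854, 0, 79, 3, 0, 38, 0],
    [0, 156, 1, 0, 0, 0, 0],
    [14, 11, 1515, 1, 7, 9, 0],
    [0, 0, 0, 136, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [6, 1, 20, 0, 0, 775, 1],
    [0, 0, 0, 0, 0, 0, 0],
]


def per_class_metrics(matrix):
    """Return (precision, recall, f1, support) for each class index."""
    size = len(matrix)
    results = []
    for i in range(size):
        true_positive = matrix[i][i]
        support = sum(matrix[i])                           # row sum
        predicted = sum(matrix[r][i] for r in range(size))  # column sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, support))
    return results


metrics = per_class_metrics(MATRIX)
total = sum(m[3] for m in metrics)
accuracy = sum(MATRIX[i][i] for i in range(len(MATRIX))) / total

# Macro F1 over all seven labels, as in the report.
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

# Macro F1 restricted to labels that actually occur in the test set.
supported = [m for m in metrics if m[3] > 0]
macro_f1_supported = sum(m[2] for m in supported) / len(supported)

if __name__ == "__main__":
    for label, (p, r, f1, support) in zip(LABELS, metrics):
        print(f"{label:>4} {p:.2f} {r:.2f} {f1:.2f} {support}")
    print(f"accuracy {accuracy:.2f}")
    print(f"macro F1 (all labels) {macro_f1:.2f}")
    print(f"macro F1 (labels with support) {macro_f1_supported:.2f}")
```

Under these assumptions, the macro F1 over the five languages that occur in the test set comes out at about 0.97, in line with the weighted average.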
68
  ### Model Description
69
 
70
  - **Developed by:** University of Zurich (UZH) from the [Impresso team](https://impresso-project.ch). The project is an interdisciplinary project focused on historical media analysis across languages, time, and modalities. Funded by the Swiss National Science Foundation ([CRSII5_173719](http://p3.snf.ch/project-173719), [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (grant No. 17498891).