Simon van Dyk committed
Commit 33b5a57 · 1 Parent(s): ebd7d22
Add: cost calc explanation
README.md CHANGED
@@ -56,6 +56,10 @@ We computed the validation set accuracy based on a sample of 1,000 points from o
 <img src="https://huggingface.co/NOSIBLE/financial-sentiment-v1.1-base/resolve/main/plots/accuracy-vs-cost--nosible-dataset.png"/>
 <p>
 
+Cost per 1M tokens for the LLMs was calculated as a weighted average of input and output token costs using a 10:1 ratio (10× input cost + 1× output cost, divided by 11), based on pricing from OpenRouter. The 10:1 weighting reflects the input-to-output token ratio of the prompt we used to label our dataset.
+
+For the NOSIBLE model, we conservatively used the cost of Qwen-8B on OpenRouter with a 100:1 input-to-output ratio, since the model produces a single output token when used as described in this guide. Despite this conservative estimate, our model is still the cheapest option.
+
 #### Financial PhraseBank Dataset
 
 We computed the Financial PhraseBank accuracies on the entire dataset. The 86% for FinBERT was their reported number in their paper.
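
The weighted-average cost described in the added lines can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `blended_cost_per_million` and all prices are hypothetical placeholders, and the 100:1 case is assumed to follow the same pattern as the 10:1 case (100× input cost + 1× output cost, divided by 101).

```python
def blended_cost_per_million(
    input_cost: float,
    output_cost: float,
    input_weight: float = 10.0,
    output_weight: float = 1.0,
) -> float:
    """Weighted average of input and output prices (USD per 1M tokens).

    The default 10:1 weighting matches the ratio stated in the README:
    (10 * input_cost + 1 * output_cost) / 11.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_cost + output_weight * output_cost) / total_weight


# Hypothetical prices in USD per 1M tokens; these are NOT real OpenRouter prices.
llm_cost = blended_cost_per_million(input_cost=1.00, output_cost=4.00)

# Assumed 100:1 weighting for a model that emits a single output token.
small_model_cost = blended_cost_per_million(
    input_cost=0.05, output_cost=0.20, input_weight=100.0
)

print(f"LLM blended cost:         ${llm_cost:.4f} per 1M tokens")
print(f"Small-model blended cost: ${small_model_cost:.4f} per 1M tokens")
```

Because output tokens are usually priced higher than input tokens, a larger input weight pulls the blended figure toward the input price.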