See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).

## Asymmetric Retrieval Setup

> [!NOTE]
> A version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-ir-asym).

`mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries are encoded faster and more efficiently with the compact `leaf` model:

```python
# Use mdbr-leaf-ir for query encoding (real-time, low latency)
...
```

Retrieval results in asymmetric mode are often superior to the standard mode above.

Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:

```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)

similarities = model.similarity(query_embeds, doc_embeds)

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")

# After MRL:
# * Embeddings dimension: 256
# * Similarities:
# tensor([[0.7136, 0.4989],
#         [0.4567, 0.6022]])
```
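Under MRL, truncation amounts to keeping the leading dimensions of the full embedding and re-normalizing, which is what `truncate_dim=256` does for you. A minimal standalone sketch with random stand-in vectors (illustrative only, no model download required):

```python
import numpy as np

# Random stand-ins for full 768-dim embeddings (illustrative only)
rng = np.random.default_rng(0)
full_embeds = rng.normal(size=(2, 768)).astype(np.float32)

# MRL truncation: keep the leading 256 dimensions, then L2-normalize
truncated = full_embeds[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)

print(truncated.shape)  # (2, 256)
```

After re-normalization, dot products between truncated vectors are again cosine similarities, just in the smaller space.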
```python
# (query_embeds and doc_embeds quantized to int8 above)
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")

# After quantization:
# * Embeddings type: int8
```
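For intuition, scalar int8 quantization can be sketched in plain `numpy`: calibrate a per-dimension range, map values onto `[-128, 127]`, and widen before the dot product. The calibration scheme and random stand-in embeddings below are illustrative assumptions, not the library's exact procedure:

```python
import numpy as np

# Random stand-ins for 256-dim float32 embeddings (illustrative only)
rng = np.random.default_rng(0)
query_embeds = rng.normal(size=(2, 256)).astype(np.float32)
doc_embeds = rng.normal(size=(2, 256)).astype(np.float32)

# Calibrate per-dimension ranges on the documents, then map to [-128, 127]
lo, hi = doc_embeds.min(axis=0), doc_embeds.max(axis=0)
scale = 255.0 / (hi - lo)

def to_int8(x):
    return np.clip((x - lo) * scale - 128.0, -128, 127).round().astype(np.int8)

query_embeds = to_int8(query_embeds)
doc_embeds = to_int8(doc_embeds)

# Widen before the matmul so the int8 dot products cannot overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print(query_embeds.dtype)  # int8
```

This cuts storage 4x versus float32 while keeping similarity rankings largely intact.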