Update README.md
|
---
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- document-parse
base_model:
- deepseek-ai/DeepSeek-OCR
---
> [!NOTE]
> Currently, only [NexaSDK](https://github.com/NexaAI/nexa-sdk) supports this model's GGUF.
## Quickstart

1. **Install [NexaSDK](https://github.com/NexaAI/nexa-sdk)**
2. Run the model locally with one line of code:

```bash
nexa infer NexaAI/DeepSeek-OCR-GGUF
```
## Model Description

**DeepSeek OCR** is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes. It combines vision-language modeling with efficient visual encoders to achieve strong recognition of multilingual, multi-layout text while remaining lightweight enough for edge or on-device deployment.
- **Multilingual OCR** – recognizes printed and handwritten text across major global languages.
- **Document Layout Understanding** – preserves structure such as tables, paragraphs, and titles.
- **Scene Text Recognition** – robust against lighting, distortion, and low-quality captures.
- **Lightweight & Fast** – optimized for CPU and GPU acceleration.
- **End-to-End Pipeline** – supports image-to-text and structured JSON output.
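The structured JSON output mentioned above can be consumed with plain standard-library tooling. The schema below (a list of blocks with `type`, `bbox`, and `text` fields, plus `rows` for tables) is an illustrative assumption for the sketch, not the model's documented format — adapt the keys to the output you actually receive.

```python
import json

# Hypothetical OCR result; the real schema may differ. This layout
# (blocks with a type, a bounding box, and text or table rows) is an assumption.
raw = """
{
  "blocks": [
    {"type": "title",     "bbox": [40, 30, 560, 70],  "text": "Invoice #1042"},
    {"type": "paragraph", "bbox": [40, 90, 560, 140], "text": "Billed to: ACME Corp."},
    {"type": "table",     "bbox": [40, 160, 560, 320],
     "rows": [["Item", "Qty", "Price"], ["Widget", "2", "$9.99"]]}
  ]
}
"""

result = json.loads(raw)

def plain_text(blocks):
    """Flatten OCR blocks into reading-order text, rendering tables as TSV rows."""
    lines = []
    for block in blocks:
        if block["type"] == "table":
            lines.extend("\t".join(row) for row in block["rows"])
        else:
            lines.append(block["text"])
    return "\n".join(lines)

print(plain_text(result["blocks"]))
```

Keeping the bounding boxes around (rather than discarding them as this sketch does) lets downstream code re-sort blocks by position when reading order matters.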
## Use Cases

DeepSeek OCR can be integrated through:
- Python API (`pip install deepseek-ocr`)
- REST or gRPC endpoints for server deployment
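For the server-deployment path, a client typically sends the image as base64 inside a JSON body. The helper below is a minimal sketch of building such a request; the field names (`image`, `languages`, `output`) are placeholders, not a documented DeepSeek OCR contract — match them to whatever your endpoint actually expects.

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, langs=("en",)) -> str:
    """Build a JSON request body for a self-hosted OCR endpoint.

    The field names here are assumptions for illustration only.
    """
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "languages": list(langs),
        "output": "json",  # request structured blocks rather than raw text
    }
    return json.dumps(payload)

# Fake bytes stand in for a real image file read with open(path, "rb").read()
body = build_ocr_request(b"fake-image-bytes", langs=("en", "zh"))
print(json.loads(body)["languages"])
```

Base64 inflates the payload by roughly a third, so for large documents a multipart upload is usually the better transport than embedding the image in JSON.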
## License

This model is released under the **Apache 2.0 License**, allowing commercial use, modification, and redistribution with attribution.