---
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- document-parse
base_model:
- deepseek-ai/DeepSeek-OCR
---
# DeepSeek OCR

> [!NOTE]  
> Currently, only [NexaSDK](https://github.com/NexaAI/nexa-sdk) supports this model's GGUF.

## Quickstart

1. **Install [NexaSDK](https://github.com/NexaAI/nexa-sdk)**
2. Run the model locally with one line of code:

   ```bash
   nexa infer NexaAI/DeepSeek-OCR-GGUF
   ```
3. Drag your image into the terminal or type the image path, followed by a prompt:


Case 1: extract text
```bash
<your-image-path> Free OCR.
```


Case 2: extract bounding boxes
```bash
<your-image-path> <|grounding|>Convert the document to markdown. 
```
> Note: If the model fails to run on Windows, install the latest [Vulkan driver](https://www.amd.com/en/support/download/drivers.html).
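
The two prompt formats above follow the same pattern: the image path, a space, then the instruction. As a minimal sketch, the helper below assembles that line in Python so it can be pasted into the `nexa infer` session. The function name `build_prompt` and the mode names are hypothetical and not part of NexaSDK.

```python
# Hypothetical helper: builds the text typed into the `nexa infer` session.
# The two prompt strings are copied from the Quickstart cases; the mode
# names ("text", "markdown") are illustrative labels, not SDK options.
PROMPTS = {
    "text": "Free OCR.",
    "markdown": "<|grounding|>Convert the document to markdown.",
}


def build_prompt(image_path: str, mode: str = "text") -> str:
    """Return '<image-path> <prompt>' for the chosen mode."""
    if mode not in PROMPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(PROMPTS)}")
    return f"{image_path} {PROMPTS[mode]}"


if __name__ == "__main__":
    print(build_prompt("invoice.png", "text"))
    print(build_prompt("invoice.png", "markdown"))
```

Keeping the prompts in one dictionary makes it easy to reuse the exact strings the model expects instead of retyping them.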

## Model Description
**DeepSeek OCR** is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes.  
It pairs vision-language modeling with an efficient visual encoder to recognize multilingual, multi-layout text while remaining lightweight enough for edge or on-device deployment.

## Features
- **Multilingual OCR** — recognizes printed and handwritten text across major global languages.  
- **Document Layout Understanding** — preserves structure such as tables, paragraphs, and titles.  
- **Scene Text Recognition** — robust against lighting, distortion, and low-quality captures.  
- **Lightweight & Fast** — optimized for CPU and GPU acceleration.  
- **End-to-End Pipeline** — supports image-to-text and structured JSON output.  

## Use Cases
- Digitizing scanned documents or PDFs  
- Extracting text from mobile camera inputs or screenshots  
- Invoice and receipt parsing  
- OCR-based search and indexing systems  
- Visual question answering or document agents  

## Inputs and Outputs
**Input:**  
- Image (JPEG or PNG file, or a tensor array)  
- Optional parameters for language hints or layout detection  

**Output:**  
- Extracted text (plain text or structured format with bounding boxes)  
- Confidence scores per word or region  
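
To work with the structured output, the region tags need to be separated from the text. The sketch below assumes the grounding output marks each region as `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`; that tag layout is an assumption and is not stated in this card, so check it against the actual output before relying on it. The function name `parse_regions` is hypothetical.

```python
import re

# ASSUMPTION: regions look like
#   <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
# This layout is not documented in this card; adjust the patterns if the
# real output differs.
REGION = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)
BOX = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def parse_regions(output: str) -> list[dict]:
    """Collect each labelled region and its integer boxes from model output."""
    regions = []
    for label, det in REGION.findall(output):
        # A region may carry more than one box, so gather all of them.
        boxes = [tuple(int(v) for v in match) for match in BOX.findall(det)]
        regions.append({"label": label.strip(), "boxes": boxes})
    return regions


if __name__ == "__main__":
    sample = "<|ref|>title<|/ref|><|det|>[[10, 20, 300, 60]]<|/det|>\n# Invoice"
    for region in parse_regions(sample):
        print(region["label"], region["boxes"])
```

The parser returns the coordinates exactly as they appear in the output; whether they are pixels or normalized values depends on the model and is not decided here.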

## Integration
DeepSeek OCR can be integrated through:  
- Python API (`pip install deepseek-ocr`)  
- REST or gRPC endpoints for server deployment  

## License
This model is released under the **Apache 2.0 License**, allowing commercial use, modification, and redistribution with attribution.