moazx committed
Commit cc053e8 · verified · 1 Parent(s): 443e99e

Update README.md

Files changed (1):
  1. README.md +7 -125

README.md CHANGED
@@ -1,128 +1,10 @@
- # PDF Layout Extraction Companion
-
- A streamlined workflow for extracting figures, tables, annotated layouts, and markdown text from scientific PDFs using [DocLayout-YOLO](https://github.com/juliozhao/DocLayout-YOLO), PyMuPDF, and Flask. The project exposes a command-line pipeline (`main.py`) and a modern Flask web UI (`app.py`).
-
- ---
-
- ## Features
- - **Layout-aware extraction** of figures and tables with YOLO-based detection
- - **Cross-page stitching** for multi-page tables, captions, titles, and body text
- - **Annotated PDF output** with bounding boxes for detected regions
- - **Markdown export** powered by `pymupdf4llm` / `pymupdf-layout`
- - **Flask Web UI** with modern design, dark/light theme, GPU/CPU status, and individual PDF viewing
- - Unified `output/<PDF stem>/` directory structure for CLI + UI runs
-
  ---
-
- ## Requirements
- - Python 3.12+
- - [uv](https://docs.astral.sh/uv/latest/) (recommended) or `pip`
- - GPU optional (DocLayout-YOLO runs on CPU as well)
-
- Install dependencies:
- ```bash
- uv pip install
- ```
-
- > If you prefer a virtualenv, create/activate it first, then run `uv pip install` inside.
-
  ---
 
- ## Quick Start
-
- ### Command Line Pipeline
- Process all PDFs in `./pdfs` and write outputs to `./output/<PDF stem>/`:
- ```bash
- uv run python main.py
- ```
-
- Each subdirectory contains:
- - `* _content_list.json` – metadata for extracted figures/tables
- - `*_layout.pdf` – annotated PDF with layout boxes
- - `*.md` – markdown export (if `pymupdf4llm` is installed)
- - `figures/` & `tables/` – cropped PNGs with stitched captions/titles
-
- ### Flask Web App (Recommended)
- Launch the modern Flask web interface locally:
- ```bash
- python run_flask_gpu.py
- ```
- Then open your browser to `http://localhost:5000`
-
- **Features:**
- - Clean, modern UI with dark/light theme support
- - Multiple PDF upload and processing
- - Individual PDF output viewing with sidebar navigation
- - Real-time GPU/CPU status display
- - Image gallery for figures and tables
- - Markdown preview and download
- - Responsive design for mobile and desktop
-
- All Flask app runs also write into `./output/<PDF stem>/` using the same structure as the CLI.
-
- ### Deploy to Modal.com (Cloud with GPU)
- Deploy your Flask app online with GPU support using Modal:
- ```bash
- # Install Modal CLI
- pip install modal
-
- # Authenticate with Modal
- modal token new
-
- # Deploy to Modal
- modal deploy modal_app.py
- ```
-
- See [MODAL_DEPLOYMENT.md](MODAL_DEPLOYMENT.md) for detailed instructions.
-
- **Benefits:**
- - GPU support (T4, A10G, or A100)
- - Pay-per-use pricing
- - Automatic HTTPS
- - Auto-scaling
- - Global deployment
-
- ---
-
- ## Configuration Highlights
- - **Detection model:** DocLayout-YOLO (`doclayout_yolo_docstructbench_imgsz1024.pt`)
- - **Detection thresholds:** configurable in `main.py`
- - **Layout stitching:** tables, captions, titles, body text
- - **Markdown extraction:** defaults to enabled (`pymupdf4llm.to_markdown`); falls back gracefully if the package is missing
- - **Output directory:** `./output` (configurable near the bottom of `main.py`)
-
- ---
-
- ## File Overview
- | Path | Description |
- |------|-------------|
- | `main.py` | CLI pipeline for batch PDF processing |
- | `app.py` | Flask web application (recommended UI) |
- | `run_flask_gpu.py` | Local Flask runner with GPU support |
- | `modal_app.py` | Modal.com deployment configuration (cloud GPU) |
- | `MODAL_DEPLOYMENT.md` | Modal.com deployment guide |
- | `templates/` | Flask HTML templates |
- | `static/` | Flask static files (CSS, JS) |
- | `pdfs/` | Source PDFs (gitignored) |
- | `output/` | Generated outputs per PDF |
- | `pyproject.toml` | Project metadata & dependency list |
- | `uv.lock` | Locked dependency versions (auto-maintained by `uv`) |
-
- ---
-
- ## Troubleshooting
- - **`ModuleNotFoundError: pymupdf4llm`** – install it via `uv pip install pymupdf4llm` (already listed in `pyproject.toml`).
- - **Slow performance** – ensure GPU CUDA drivers are available or reduce concurrency by toggling `USE_MULTIPROCESSING` in `main.py`.
- - **Large outputs** – clean the `output/` directory before reruns to avoid confusing duplicates.
-
- For additional logging, set `LOG_LEVEL` or edit the `logger` configuration in `main.py`.
-
- ---
-
- ## Acknowledgements
- - [DocLayout-YOLO](https://github.com/juliozhao/DocLayout-YOLO)
- - [PyMuPDF](https://pymupdf.readthedocs.io/)
- - [PyMuPDF4LLM](https://github.com/pymupdf/RAG/blob/main/pymupdf4llm.md)
- - [Flask](https://flask.palletsprojects.com/)
-
- Happy extracting! 🎉

  ---
+ title: AI PDF Tool
+ emoji: 🌖
+ colorFrom: gray
+ colorTo: green
+ sdk: docker
+ pinned: false
  ---
 
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
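
The new README consists of a `---` delimited front-matter block of flat `key: value` pairs followed by one line of text. As a rough illustration of how such a block can be read, here is a minimal standard-library sketch. The function name `parse_front_matter` is hypothetical, and the parser is an assumption made for this example only: it handles the flat scalar pairs shown in this commit, not general YAML, and it is not how Hugging Face itself processes the file.

```python
def parse_front_matter(text: str) -> dict[str, object]:
    """Read a leading '---' delimited block of flat `key: value` lines.

    Illustrative only: this is not a YAML parser. It handles plain string
    values and the literals true/false, which is all the README in this
    commit uses.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file

    config: dict[str, object] = {}
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return config  # closing delimiter reached
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a key/value line; ignored in this sketch
        value = value.strip()
        if value in ("true", "false"):
            config[key.strip()] = value == "true"
        else:
            config[key.strip()] = value
    return {}  # block was never closed, so treat it as absent


# The README content added by this commit.
README = """---
title: AI PDF Tool
emoji: 🌖
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
"""

config = parse_front_matter(README)
print(config["sdk"], config["pinned"])
```

For anything beyond flat pairs (lists, nested mappings, quoted strings), a real YAML library would be the safer choice.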