Monkey OCR GUI is a cross-platform graphical tool built on the Monkey OCR API that provides a clean, efficient OCR interface. It supports text recognition, formula extraction, table parsing, and full-document recognition, along with multi-threaded concurrent processing, real-time progress tracking, result preview, and export.
- 🎯 Multi-Mode Recognition: Supports four recognition modes: text, formula (LaTeX), table (HTML), and document (Markdown)
- 📄 Multi-Format Support: Accepts PDF, PNG, JPG, and JPEG files
- 🔄 Page Navigation: Browse multi-page PDFs and process only the pages you select
- ⚡ High Performance: Multi-threaded concurrent processing that automatically adjusts the thread count based on system load
- 📊 Real-time Progress: Live progress bar and detailed log output
- 🎨 Dual Display Mode: Switch between preview mode and source mode
- 💾 Flexible Export: Export a single page or all pages
- 🌐 API Management: Built-in API health check and connection testing
- 🎭 Theme Switching: Dark, light, and system themes
- 🌍 Internationalization: Switch the interface between Chinese and English
- 📝 Marked PDF: Automatically loads and displays the marked PDF returned by the API
- Operating System: Windows 10+, macOS 10.14+, Linux
- Python Version: 3.11 or higher
- Memory: 4GB+ recommended
- Disk Space: At least 500MB available space
- Clone Repository

      git clone https://github.com/yourusername/Dev-Monkey-OCR-GUI.git
      cd Dev-Monkey-OCR-GUI

- Install Dependencies

      pip install -r requirements.txt

  Main dependencies include:
  - customtkinter - Modern GUI framework
  - PyMuPDF - PDF processing
  - Pillow - Image processing
  - requests - HTTP requests
  - markdown2 - Markdown rendering
  - matplotlib - LaTeX rendering support

- Run Application

      python main.py
- API Configuration
  - The first launch prompts for the API address
  - Enter the base URL of the Monkey OCR API (e.g., http://localhost:8000)
  - Click the "Test" button to verify the connection
- Upload File
  - Click the "Select File" button in the left panel
  - Or drag and drop files onto the upload area
  - Supported formats: PDF, PNG, JPG, JPEG
- Select Processing Range
  - Process Current Page: Only process the currently displayed page
  - Process All Pages: Process all pages of the file
  - Custom Range: Specify in the page range input box (e.g., 1-5)
- Start Recognition
  - Select a recognition mode (text/formula/table/document)
  - Click the "Start Processing" button
  - View real-time progress and logs in the right panel
- View Results
  - The middle panel displays recognition results
  - Preview Mode: Rendered output (Markdown/LaTeX/HTML)
  - Source Mode: Raw code content
  - Use the page navigation buttons to switch between the results of different pages
- Export Results
  - Export Current: Export the results of the currently viewed page
  - Export All: Batch export the results of all recognized pages
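
The Custom Range option above takes input such as 1-5. As a rough illustration of how such input might be turned into page numbers, here is a minimal sketch. The function name `parse_page_range` is hypothetical and not taken from the project, and support for comma-separated entries such as `1-3,7` is an assumption; the README only shows the `1-5` form.

```python
def parse_page_range(spec: str, total_pages: int) -> list[int]:
    """Parse a spec such as "1-5" or "1-3,7" into sorted 1-based page numbers.

    Hypothetical helper: comma-separated entries are an assumption,
    the README itself only shows a single range like "1-5".
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page PDF, `parse_page_range("1-5", 10)` yields pages 1 through 5, while a range that runs past the last page raises `ValueError` so the GUI could show a prompt instead of sending a bad request.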
| Mode | Output Format | Use Case |
|---|---|---|
| Text Recognition | Markdown | Plain text content, article paragraphs |
| Formula Extraction | LaTeX | Mathematical formulas, scientific papers |
| Table Extraction | HTML | Tabular data, statistical reports |
| Document Parsing | Markdown + Marked PDF | Complete documents, mixed content |
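
The table above could be mirrored in code as a lookup from recognition mode to export file type. The sketch below is illustrative only: the `Mode` enum, the `EXPORT_EXTENSION` mapping, the `export_filename` helper, and the specific file extensions are all assumptions, not names or values taken from the project.

```python
from enum import Enum


class Mode(Enum):
    """The four recognition modes listed in the table (names are hypothetical)."""
    TEXT = "text"
    FORMULA = "formula"
    TABLE = "table"
    DOCUMENT = "document"


# Assumed file extensions for each mode's output format.
EXPORT_EXTENSION = {
    Mode.TEXT: ".md",       # Markdown
    Mode.FORMULA: ".tex",   # LaTeX
    Mode.TABLE: ".html",    # HTML
    Mode.DOCUMENT: ".md",   # Markdown (the marked PDF is a separate artifact)
}


def export_filename(stem: str, page: int, mode: Mode) -> str:
    """Build a per-page export file name such as 'report_page3.html'."""
    return f"{stem}_page{page}{EXPORT_EXTENSION[mode]}"
```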
- File upload and preview
- PDF page navigation
- Marked PDF loading and comparison
- Dual mode switching (preview/source)
- Page result navigation
- Content copy and export
- API configuration and status
- Page selection control
- Processing progress tracking
- Real-time log output
In the right panel "API Configuration" area:
- API Address: Base URL of Monkey OCR API
- Test Button: Verify API connection status
- Status Indicator: Shows current connection status (healthy/error/not tested)
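
A connection test along these lines could look like the following sketch. It uses only the standard library so it stays self-contained (the application itself lists requests as a dependency). The `/health` path, the function names, and the rule that only HTTP 200 counts as healthy are assumptions; check the Monkey OCR API documentation for the actual health endpoint.

```python
import urllib.request
from typing import Callable
from urllib.parse import urlparse


def normalize_base_url(raw: str) -> str:
    """Trim whitespace and trailing slashes; require an http(s) URL with a host."""
    url = raw.strip().rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid http(s) base URL: {raw!r}")
    return url


def _http_status(url: str) -> int:
    """Fetch the URL and return its HTTP status code (5-second timeout)."""
    with urllib.request.urlopen(url, timeout=5.0) as response:
        return response.status


def check_api(raw_url: str, fetch_status: Callable[[str], int] = _http_status) -> str:
    """Return "healthy" or "error" for the status indicator.

    The "/health" path is an assumed endpoint, not confirmed by the README.
    """
    try:
        base = normalize_base_url(raw_url)
        status = fetch_status(f"{base}/health")
    except (ValueError, OSError):  # URLError and HTTPError are OSError subclasses
        return "error"
    return "healthy" if status == 200 else "error"
```

Passing `fetch_status` as a parameter keeps the check testable without a running server; the "not tested" state would simply be the indicator's value before `check_api` is first called.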
The application automatically detects system resources and optimizes concurrent performance:
- CPU Load > 80%: Reduce thread count by 50%
- Available Memory < 1GB: Reduce thread count by 50%
- Available Memory < 2GB: Reduce thread count by 25%
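
The thresholds above can be expressed as a small pure function. This is a sketch under stated assumptions rather than the project's actual logic: the name `adjust_worker_count` is hypothetical, the CPU and memory reductions are assumed to compound when both apply, and the result is assumed to be clamped to the configured minimum and maximum worker counts.

```python
def adjust_worker_count(
    base_workers: int,
    cpu_percent: float,
    available_memory_gb: float,
    min_workers: int = 2,
    max_workers: int = 16,
) -> int:
    """Scale the worker count down under load (hypothetical helper).

    Assumes CPU and memory reductions multiply when both conditions hold.
    """
    workers = float(base_workers)
    if cpu_percent > 80:
        workers *= 0.5   # CPU load > 80%: reduce by 50%
    if available_memory_gb < 1:
        workers *= 0.5   # available memory < 1GB: reduce by 50%
    elif available_memory_gb < 2:
        workers *= 0.75  # available memory < 2GB: reduce by 25%
    # Clamp to the configured bounds (assumed behaviour).
    return max(min_workers, min(max_workers, int(workers)))
```

With 8 base workers, a CPU load of 90% and plenty of memory gives 4 workers, while 1.5GB of free memory at low CPU load gives 6. The inputs could come from a library such as psutil (`psutil.cpu_percent()` and `psutil.virtual_memory().available`), though psutil is not among the dependencies listed in this README.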
You can adjust the default configuration by modifying src/config/settings.py:

    "performance": {
        "concurrency": {
            "ocr_processing": 4,
            "min_workers": 2,
            "max_workers": 16
        }
    }

Theme switching is available at the bottom of the right panel:
- System: Automatically adapts to system theme
- Dark Mode: Dark interface
- Light Mode: Light interface
    Dev-Monkey-OCR-GUI/
    ├── main.py                  # Application entry
    ├── requirements.txt         # Dependency list
    ├── version.json             # Version info
    ├── src/
    │   ├── api/                 # API client
    │   │   └── monkey_ocr_client.py
    │   ├── config/              # Configuration management
    │   │   └── settings.py
    │   ├── gui/                 # GUI components
    │   │   ├── main_window.py
    │   │   ├── panels/          # Panel components
    │   │   ├── dialogs/         # Dialogs
    │   │   ├── renderers/       # Content renderers
    │   │   └── styles/          # Style definitions
    │   └── utils/               # Utility functions
    │       ├── file_utils.py
    │       └── i18n.py
    └── locales/                 # Internationalization resources
        ├── zh_CN.json
        └── en_US.json
Package the application as a standalone executable using PyInstaller:

    pyinstaller monkey_ocr.spec

The generated executable is located in the dist/ directory.
- Automatically detects system CPU and memory status
- Dynamically adjusts worker thread count
- Prevents system overload
- HTTP connection pool reuse
- Reduces connection establishment overhead
- Improves concurrent performance
- Automatically retries network errors
- Exponential backoff strategy
- Maximum 3 retry attempts
- Fine-grained exception classification
- Detailed error logging
- User-friendly prompts
- Automatic temporary file cleanup
- Startup cleanup of expired cache
- Exit cleanup of all temporary resources
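
The retry behaviour listed above (automatic retries on network errors, exponential backoff, at most 3 retries) can be sketched as follows. The function name `retry_with_backoff`, the 1-second base delay, and the choice of which exceptions count as network errors are assumptions, not details from the project.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying network errors with exponential backoff.

    Hypothetical helper: one initial attempt plus up to max_retries retries,
    waiting base_delay * 2**attempt seconds before each retry.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
            attempt += 1
```

The `sleep` parameter is injectable so the backoff schedule can be verified without actually waiting. Errors other than the listed network exceptions propagate immediately, which fits the fine-grained exception classification mentioned above.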
A: Check the following:
- Confirm API service is running
- Check that the URL format is correct (e.g., http://localhost:8000)
- Confirm the firewall is not blocking the connection
- Check log output for detailed error information
A: Possible reasons:
- High network latency
- Insufficient system resources (CPU/memory)
- Processing large files or many pages
- High API server load
Try:
- Reduce concurrent page count
- Process large files in batches
- Upgrade system hardware
A: Currently supported:
- Images: PNG, JPG, JPEG
- Documents: PDF (multi-page support)
A: The application automatically detects the system language. You can switch manually by modifying the configuration file:

    "ui": {
        "language": "zh_CN"  # or "en_US"
    }

A: A file selection dialog will pop up during export, allowing you to freely choose the save location.
A:

    git pull origin main
    pip install -r requirements.txt --upgrade

This project is licensed under the MIT License.
- Monkey OCR API - MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
- CustomTkinter - Modern Tkinter interface library
- PyMuPDF - High-performance PDF processing library



