Monkey OCR GUI

📖 Project Introduction

Monkey OCR GUI is a cross-platform graphical client for the Monkey OCR API, designed to provide a clean, efficient, and user-friendly OCR interface. It supports text recognition, formula extraction, table parsing, and full-document recognition, and adds multi-threaded concurrent processing, real-time progress tracking, result preview, and export.

Core Features

🎯 Multi-Mode Recognition: Four recognition modes: text, formula (LaTeX), table (HTML), and document (Markdown)
📄 Multi-Format Support: PDF, PNG, JPG, and JPEG files
🔄 Page Navigation: Browse multi-page PDFs and process selected pages
⚡ High Performance: Multi-threaded concurrent processing that adjusts the thread count to system load
📊 Real-time Progress: Live progress bar and detailed log output
🎨 Dual Display Mode: Switch between rendered preview and source code views
💾 Flexible Export: Export a single page or all pages
🌐 API Management: Built-in API health check and connection testing
🎭 Theme Switching: Dark, light, or system theme
🌍 Internationalization: Switch the interface between Chinese and English
📝 Marked PDF: Automatically loads and displays the marked PDFs returned by the API

📸 Screenshots

Interface Overview: Light Theme, Dark Theme

OCR Recognition: Light Theme OCR, Dark Theme OCR

🚀 Installation Guide

System Requirements

Operating System: Windows 10+, macOS 10.14+, Linux
Python Version: 3.11 or higher
Memory: 4GB+ recommended
Disk Space: At least 500MB available space

Dependency Installation

Clone Repository

git clone https://github.com/yourusername/Dev-Monkey-OCR-GUI.git
cd Dev-Monkey-OCR-GUI

Install Dependencies

pip install -r requirements.txt

Main dependencies include:

customtkinter - Modern GUI framework
PyMuPDF - PDF processing
Pillow - Image processing
requests - HTTP requests
markdown2 - Markdown rendering
matplotlib - LaTeX rendering support

Run Application

python main.py

📋 Usage Instructions

First Launch

1. API Configuration
- On first launch, you will be prompted to configure the API address
- Enter the base URL of the Monkey OCR API (e.g., http://localhost:8000)
- Click the "Test" button to verify the connection
2. Upload File
- Click the "Select File" button in the left panel
- Or drag and drop files onto the upload area
- Supported formats: PDF, PNG, JPG, JPEG
3. Select Processing Range
- Process Current Page: Process only the currently displayed page
- Process All Pages: Process every page of the file
- Custom Range: Enter a page range in the range input box (e.g., 1-5)
4. Start Recognition
- Select a recognition mode (text/formula/table/document)
- Click the "Start Processing" button
- Follow real-time progress and logs in the right panel
5. View Results
- The middle panel displays the recognition results
- Preview Mode: Rendered output (Markdown/LaTeX/HTML)
- Source Mode: Raw source content
- Use the page navigation buttons to switch between the results of different pages
6. Export Results
- Export Current: Export the results of the currently viewed page
- Export All: Export the results of all recognized pages in one batch
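The Custom Range field accepts input such as 1-5. A minimal sketch of how that field could be parsed follows; the function name parse_page_range and the support for comma-separated lists such as 1,3,6-8 are assumptions for illustration, not confirmed behavior of the application.

```python
def parse_page_range(spec: str, total_pages: int) -> list[int]:
    """Turn input such as "1-5" or "1,3,6-8" into sorted 1-based page numbers.

    Hypothetical helper. Raises ValueError for malformed input
    or for pages outside 1..total_pages.
    """
    pages: set[int] = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue  # tolerate stray commas, e.g. "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int("") raises ValueError
        else:
            start = end = int(part)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"invalid page range {part!r} for a {total_pages}-page file")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page PDF, parse_page_range("1-5", 10) yields pages 1 through 5, and a range such as 8-12 is rejected instead of being silently truncated.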

Recognition Mode Description

Mode	Output Format	Use Case
Text Recognition	Markdown	Plain text content, article paragraphs
Formula Extraction	LaTeX	Mathematical formulas, scientific papers
Table Extraction	HTML	Tabular data, statistical reports
Document Parsing	Markdown + Marked PDF	Complete documents, mixed content
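Each mode's output format suggests a natural file type for export. The hypothetical lookup table below pairs each mode with a format and an extension; the extensions and the export_filename naming pattern are assumptions rather than documented behavior, and the marked PDF that document parsing also returns is not covered.

```python
from pathlib import Path
from typing import NamedTuple


class ModeInfo(NamedTuple):
    output_format: str
    extension: str  # assumed extension for exported files


# Hypothetical table mirroring the recognition modes above.
MODES: dict[str, ModeInfo] = {
    "text": ModeInfo("Markdown", ".md"),
    "formula": ModeInfo("LaTeX", ".tex"),
    "table": ModeInfo("HTML", ".html"),
    "document": ModeInfo("Markdown", ".md"),
}


def export_filename(source_file: str, mode: str, page: int) -> str:
    """Suggest a default name for one page's result, e.g. 'report_p3.tex'."""
    return f"{Path(source_file).stem}_p{page}{MODES[mode].extension}"
```

A plain dict is used rather than an Enum because two modes share the same (format, extension) value, and an Enum would silently turn the second one into an alias of the first.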

Function Panels

Left Panel - File Management

File upload and preview
PDF page navigation
Marked PDF loading and comparison

Middle Panel - Result Display

Dual mode switching (preview/source)
Page result navigation
Content copy and export

Right Panel - Control Center

API configuration and status
Page selection control
Processing progress tracking
Real-time log output

⚙️ Configuration

API Configuration

In the right panel "API Configuration" area:

API Address: Base URL of Monkey OCR API
Test Button: Verify API connection status
Status Indicator: Shows current connection status (healthy/error/not tested)
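The Test button verifies that the API is reachable. A minimal standard-library sketch of such a check follows. It assumes the server exposes a GET /health endpoint (confirm this against your Monkey OCR API deployment), and check_api_health is a hypothetical name. The "not tested" state would be the value shown before the first check runs.

```python
import http.client
import urllib.parse
import urllib.request

HEALTH_PATH = "/health"  # assumed endpoint; adjust for your server


def check_api_health(base_url: str, timeout: float = 5.0) -> str:
    """Return "healthy" or "error" for the API at base_url (hypothetical helper)."""
    parsed = urllib.parse.urlparse(base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "error"  # e.g. "localhost:8000" without the http:// prefix
    url = base_url.rstrip("/") + HEALTH_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "healthy" if response.status == 200 else "error"
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return "error"
```

The URL format check up front catches the most common mistake listed in the FAQ (a missing scheme) before any network call is made.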

Performance Tuning

The application detects available system resources and adjusts its concurrency automatically:

CPU Load > 80%: Reduce thread count by 50%
Available Memory < 1GB: Reduce thread count by 50%
Available Memory < 2GB: Reduce thread count by 25%

To change the defaults, edit src/config/settings.py:

"performance": {
    "concurrency": {
        "ocr_processing": 4,
        "min_workers": 2,
        "max_workers": 16
    }
}
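The load rules and the concurrency settings above can be combined into a single worker-count calculation. This is a sketch under stated assumptions: the reductions are assumed to compound (high CPU and low memory together cut the count twice), the result is assumed to be clamped to min_workers..max_workers, and ConcurrencyConfig and adjust_worker_count are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyConfig:
    # Mirrors the "performance.concurrency" keys shown above (assumed structure).
    ocr_processing: int = 4
    min_workers: int = 2
    max_workers: int = 16


def adjust_worker_count(config: ConcurrencyConfig,
                        cpu_percent: float,
                        available_memory_gb: float) -> int:
    """Scale the default worker count down under load (hypothetical helper)."""
    factor = 1.0
    if cpu_percent > 80:
        factor *= 0.5  # CPU load > 80%: reduce by 50%
    if available_memory_gb < 1:
        factor *= 0.5  # under 1 GB free: reduce by 50%
    elif available_memory_gb < 2:
        factor *= 0.75  # under 2 GB free: reduce by 25%
    workers = int(config.ocr_processing * factor)
    # Keep the result within the configured bounds.
    return max(config.min_workers, min(config.max_workers, workers))
```

With psutil installed, the inputs could come from psutil.cpu_percent(interval=0.5) and psutil.virtual_memory().available / 1024**3. psutil does not appear in the dependency list above, so using it is also an assumption.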

Theme Settings

Theme switching is available at the bottom of the right panel:

System: Automatically adapts to system theme
Dark Mode: Dark interface
Light Mode: Light interface

🛠️ Development Guide

Project Structure

Dev-Monkey-OCR-GUI/
├── main.py                 # Application entry
├── requirements.txt        # Dependency list
├── version.json           # Version info
├── src/
│   ├── api/              # API client
│   │   └── monkey_ocr_client.py
│   ├── config/           # Configuration management
│   │   └── settings.py
│   ├── gui/              # GUI components
│   │   ├── main_window.py
│   │   ├── panels/       # Panel components
│   │   ├── dialogs/      # Dialogs
│   │   ├── renderers/    # Content renderers
│   │   └── styles/       # Style definitions
│   └── utils/            # Utility functions
│       ├── file_utils.py
│       └── i18n.py
└── locales/              # Internationalization resources
    ├── zh_CN.json
    └── en_US.json

Build from Source

To package the application as a standalone executable with PyInstaller, run:

pyinstaller monkey_ocr.spec

The generated executable is placed in the dist/ directory.

Technical Highlights

1. Intelligent Concurrency Control

Automatically detects system CPU and memory status
Dynamically adjusts worker thread count
Prevents system overload

2. Connection Pool Management

HTTP connection pool reuse
Reduces connection establishment overhead
Improves concurrent performance

3. Retry Mechanism

Automatically retries on network errors
Exponential backoff strategy
Maximum 3 retry attempts
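Below is a plain-Python sketch of the retry policy described above: exponential backoff (1 s, 2 s, 4 s with the defaults) and at most three retries after the first attempt. The text does not say whether "3 retry attempts" includes the first call, so that reading is an assumption, as are the function name and defaults.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on a retryable error wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: re-raise the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

When wrapping requests calls, pass retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout), because those classes do not derive from the built-in ConnectionError and TimeoutError. The injectable sleep keeps the function testable without real delays.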

4. Error Handling

Fine-grained exception classification
Detailed error logging
User-friendly prompts
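One way to combine fine-grained classification with user-friendly prompts is a small exception hierarchy whose classes carry a display message, while the full details go to the log. The class names, messages, logger name, and the report_error helper below are hypothetical.

```python
import logging

logger = logging.getLogger("monkey_ocr_gui")  # hypothetical logger name


class OCRError(Exception):
    """Base class for client-side OCR errors (hypothetical hierarchy)."""
    user_message = "Recognition failed. See the log for details."


class APIConnectionError(OCRError):
    user_message = "Cannot reach the OCR API. Check the API address and that the service is running."


class APIResponseError(OCRError):
    def __init__(self, status_code: int, detail: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.user_message = f"The OCR API returned an error (HTTP {status_code})."


class UnsupportedFileError(OCRError):
    user_message = "Unsupported file type. Use PDF, PNG, JPG, or JPEG."


def report_error(exc: Exception) -> str:
    """Log the full error and return a short message suitable for a dialog."""
    logger.error("Processing failed: %r", exc, exc_info=exc)
    if isinstance(exc, OCRError):
        return exc.user_message
    return "An unexpected error occurred. See the log for details."
```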

5. Resource Management

Automatic cleanup of temporary files
Removal of expired cache files at startup
Removal of all temporary resources on exit
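The cleanup behavior above maps onto two standard-library pieces: an age-based sweep of a cache directory at startup, and an atexit hook that removes the per-run temporary directory. The function names, the directory prefix, and any age threshold you pass in are assumptions.

```python
import atexit
import shutil
import tempfile
import time
from pathlib import Path


def cleanup_expired(cache_dir: Path, max_age_seconds: float) -> int:
    """Delete files in cache_dir older than max_age_seconds; return how many were removed."""
    if not cache_dir.is_dir():
        return 0
    cutoff = time.time() - max_age_seconds
    removed = 0
    for path in cache_dir.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink(missing_ok=True)
            removed += 1
    return removed


def create_session_dir(prefix: str = "monkey_ocr_") -> Path:
    """Create a per-run temp directory that is deleted when the interpreter exits."""
    session_dir = Path(tempfile.mkdtemp(prefix=prefix))
    atexit.register(shutil.rmtree, session_dir, ignore_errors=True)
    return session_dir
```

atexit hooks run on normal interpreter shutdown but not after a hard kill, which is why a startup sweep of stale files is still useful.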

❓ FAQ

Q1: What to do if API connection fails?

A: Check the following:

Confirm that the API service is running
Check that the URL format is correct (e.g., http://localhost:8000)
Confirm that a firewall is not blocking the connection
Check the log output for detailed error information

Q2: Why is processing slow?

A: Possible reasons:

High network latency
Insufficient system resources (CPU/memory)
Processing large files or many pages
High API server load

Try:

Reduce the number of pages processed concurrently
Process large files in batches
Upgrade system hardware

Q3: What file formats are supported?

A: Currently supported:

Images: PNG, JPG, JPEG
Documents: PDF (multi-page support)

Q4: How to change interface language?

A: The application detects the system language automatically. To switch manually, edit the configuration file:

"ui": {
    "language": "zh_CN"  # or "en_US"
}

Q5: Where are exported files saved?

A: When you export, a file dialog opens so you can choose where to save the results.

Q6: How to update to the latest version?

A:

git pull origin main
pip install -r requirements.txt --upgrade

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

Monkey OCR API - MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
CustomTkinter - Modern Tkinter interface library
PyMuPDF - High-performance PDF processing library

Repository: bdim404/Monkey-OCR-GUI