From 40d5dbdc332814cc08092f8aba6f657c130aa23a Mon Sep 17 00:00:00 2001
From: Deepak <ds954642@gmail.com>
Date: Fri, 30 Jan 2026 13:05:09 +0530
Subject: [PATCH] docs: document OCR support and required pymupdf.layout import
 for PyMuPDF4LLM

---
 docs/images/layout-ocr-flow.png | Bin 59396 -> 59398 bytes
 docs/pymupdf-layout/index.rst   |  49 ++++++++++++++++++++++++--------
 2 files changed, 37 insertions(+), 12 deletions(-)
diff --git a/docs/images/layout-ocr-flow.png b/docs/images/layout-ocr-flow.png
index 3b9a2e1ecc8e31958221e369999ebd056b7ba9bb..8f84016f5018bdc89b8c56acf42bfe4cba9798ac 100644
GIT binary patch
delta 12
TcmZp<z}$9$c>~J}CSEQ8B9jDW

delta 9
QcmZp>z}#|yc>~J}02PP?O#lD@

diff --git a/docs/pymupdf-layout/index.rst b/docs/pymupdf-layout/index.rst
index bd89e2501..6ecda0df1 100644
--- a/docs/pymupdf-layout/index.rst
+++ b/docs/pymupdf-layout/index.rst
@@ -138,28 +138,53 @@ Now we can happily load Office files and convert them as follows::
 OCR support
 ~~~~~~~~~~~~~~~~~
 
-The new layout-sensitive |PyMuPDF4LLM| version also evaluates whether a page would benefit from applying OCR to it. If its heuristics come to this conclusion, the built-in Tesseract-OCR module is automatically invoked. Its results are then handled like normal page content.
- 
-If a page contains (roughly) no text at all, but is covered with images or many character-sized vectors, a check is made using `OpenCV <https://pypi.org/project/opencv-python/>`_ whether text is *probably* detectable on the page at all. This is done to tell apart image-based text from ordinary pictures (like photographs).
+**Critical: Import pymupdf.layout FIRST**
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-If the page does contain text but too many characters are unreadable (like "�����"), OCR is also executed, but **for the affected text areas only** -- not the full page. This way, we avoid losing already existing text and other content like images and vectors.
+.. code-block:: python
+   :emphasize-lines: 1
 
-For these heuristics to work we need both, an existing :ref:`Tesseract installation <installation_ocr>` and the availability of `OpenCV <https://pypi.org/project/opencv-python/>`_ in the Python environment. If either is missing, no OCR is attempted at all.
+   import pymupdf.layout  # REQUIRED FIRST - enables OCR decision tree
+   import pymupdf4llm     # Now OCR heuristics are active
 
-The decision tree for whether OCR is actually used or not depends on the following:
+   md_text = pymupdf4llm.to_markdown("scanned.pdf")
+   # Pages that need OCR are detected automatically, OCR'd, then converted to Markdown
 
-1. :ref:`PyMuPDF Layout is imported <pymupdf_layout_using>`
+.. warning::
+   Without ``import pymupdf.layout``, OCR is **never** attempted,
+   even if Tesseract and OpenCV are installed.
 
-2. In the :ref:`PyMuPDF4LLM API <pymupdf4llm-api>` you have `use_ocr` enabled (this is set to `True` by default)
+**Complete Requirements** (all must be satisfied)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-3. :ref:`Tesseract is correctly installed <installation_ocr>`
+.. list-table:: OCR Decision Prerequisites
+   :widths: 15 85
+   :header-rows: 1
 
-4. `OpenCV <https://pypi.org/project/opencv-python/>`_ is available in your Python environment
+   * - Check
+     - Requirement
+   * - 1. Layout
+     - :ref:`PyMuPDF Layout is imported <pymupdf_layout_using>`
+   * - 2. OCR API
+     - ``use_ocr`` is enabled in the :ref:`PyMuPDF4LLM API <pymupdf4llm-api>` (it is ``True`` by default)
+   * - 3. Tesseract
+     - :ref:`Tesseract OCR is correctly installed <installation_ocr>`
+   * - 4. OpenCV
+     - Available in the Python environment (``pip install opencv-python``)
 
+**Smart OCR Heuristics** (Detailed)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-.. image:: ../images/layout-ocr-flow.png
+The new layout-sensitive |PyMuPDF4LLM| version evaluates whether a page would benefit from OCR. If its heuristics indicate that it would, the built-in Tesseract-OCR module is invoked automatically, and its results are then handled like normal page content.
 
-----
+If a page contains (roughly) **no text at all**, but is covered with **images or many character-sized vectors**, `OpenCV <https://pypi.org/project/opencv-python/>`_ is used to check whether text is *probably* detectable on the page at all. This distinguishes **image-based text** from ordinary pictures (such as photographs).
+
+If the page **does contain text** but **too many characters are unreadable** (like "�����"), OCR is also executed, but **for the affected text areas only** – not the full page. This way, we avoid losing already existing text and other content like images and vectors.
+
+**OCR Decision Tree**
+^^^^^^^^^^^^^^^^^^^^^
+
+.. image:: ../images/layout-ocr-flow.png
 
 .. _pymupdf_layout_and_pymupdf4llm_api: