Commit 227df90

Docs: Explain main content node detection in visual diff (#12586)
Adds a new section to the visual diff documentation explaining how Read the Docs detects the main content area of HTML pages for comparison.

## Changes

- Added "How main content is detected" section explaining the 4-step priority order
- Reorganized limitations section to separate content detection explanation
- Provided recommendations for improving detection in non-standard documentation

This helps users understand why certain changes may or may not appear in the visual diff and what they can do to improve detection accuracy.

---

*Generated by Copilot*

---------

Co-authored-by: Manuel Kaufmann <humitos@gmail.com>
1 parent 4e3c82a commit 227df90

5 files changed

+173
-7
lines changed


docs/user/index.rst

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ Read the Docs: documentation simplified
7979
/explanation/documentation-structure
8080
/guides/best-practice/links
8181
/security-implications
82+
/reference/main-content-detection
8283

8384
.. toctree::
8485
:maxdepth: 1

docs/user/link-previews.rst

Lines changed: 3 additions & 2 deletions
@@ -29,7 +29,8 @@ Troubleshooting link previews
2929

3030
We use heuristics to detect the documentation tool used to generate the page based on its HTML structure.
3131
This auto-detection may fail, resulting in the content rendered inside the popup being incorrect.
32-
If you are experiencing this, you can specify the CSS selector for the main content in :guilabel:`Settings > Addons > Advanced`,
33-
or you can `open an issue in the addons repository <https://github.com/readthedocs/addons>`_ so we improve our heuristic.
32+
If you are experiencing this, you can specify the CSS selector for the main content in :guilabel:`Settings > Addons > Advanced`.
33+
See :ref:`reference/main-content-detection:detection logic` for how this content is detected and guidance on choosing a good selector.
34+
You can also `open an issue in the addons repository <https://github.com/readthedocs/addons>`_ so we improve our heuristic.
3435

3536
Link previews won't be generated if JavaScript is not enabled in your web browser or if all cookies are blocked.
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
1+
Main content detection
2+
======================
3+
4+
Read the Docs detects the main content area of HTML pages
5+
to focus on the documentation content itself,
6+
ignoring headers, footers, navigation, and other page elements.
7+
8+
Feature usage
9+
-------------
10+
11+
Different features use this detection in different ways:
12+
13+
* :doc:`/visual-diff`: Uses a documentation-tool-specific heuristic first. If you configure a custom selector, it overrides that heuristic.
14+
* :doc:`/link-previews`: Uses the custom selector (if set) to scope links; otherwise falls back to the heuristics described on this page.
15+
* :doc:`/server-side-search/index`: Always uses heuristics; it ignores any configured custom selector.
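
The list above can be read as a small lookup: given a feature and an optional custom selector, which selector (if any) applies? The following is a minimal sketch of that rule only. The feature keys and the ``effective_selector`` helper are hypothetical names chosen for illustration, not part of Read the Docs.

```python
from typing import Optional

# Hypothetical feature keys. Whether each one honors the custom selector
# follows the list above; this is not taken from the Read the Docs codebase.
HONORS_CUSTOM_SELECTOR = {
    "visual-diff": True,
    "link-previews": True,
    "server-side-search": False,
}


def effective_selector(feature: str, custom_selector: Optional[str]) -> Optional[str]:
    """Return the selector a feature would use, or None to mean 'use heuristics'."""
    if custom_selector and HONORS_CUSTOM_SELECTOR[feature]:
        return custom_selector
    return None
```

Under these assumptions, ``effective_selector("server-side-search", "div#content")`` returns ``None``, because search always relies on the heuristics.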
16+
17+
Detection logic
18+
---------------
19+
20+
The main content node is detected using the following logic, in order of priority:
21+
22+
#. **Elements with a** ``role="main"`` **attribute**: This ARIA role is used by many static site generators and themes to indicate the main content area.
23+
#. **The** ``<main>`` **HTML tag**: The semantic HTML5 element for main content.
24+
#. **Parent of the first** ``<h1>`` **tag**: If no explicit main content markers are found, the system assumes all sections are siblings under a common parent, and uses the parent of the first heading as the main content container.
25+
#. **The** ``<body>`` **tag**: As a last resort, if none of the above are found, the entire body is used.
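
The priority order above can be sketched with Python's standard library. This is an illustration of the four steps only, not the actual Read the Docs implementation: the ``Node``, ``TreeBuilder``, and ``find_main_node`` names are hypothetical, and the tree builder is deliberately simplistic about malformed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become "current".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a minimal element tree (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag and close it.
        # Stray closing tags with no matching open element are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def walk(node):
    """Yield nodes in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_main_node(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes = list(walk(builder.root))
    # One check per step, in priority order. Each returns the main node or None.
    checks = [
        lambda n: n if n.attrs.get("role") == "main" else None,  # 1. role="main"
        lambda n: n if n.tag == "main" else None,                # 2. <main>
        lambda n: n.parent if n.tag == "h1" else None,           # 3. parent of first <h1>
        lambda n: n if n.tag == "body" else None,                # 4. <body>
    ]
    for check in checks:
        for node in nodes:
            found = check(node)
            if found is not None:
                return found
    return None
```

Each step scans the whole document before the next one is tried, so a ``role="main"`` element wins over a ``<main>`` tag even if the ``<main>`` tag appears first in the page.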
26+
27+
.. tip::
28+
29+
Following the ARIA_ conventions will also improve the accessibility of your site.
30+
See also https://webaim.org/techniques/semanticstructure/.
31+
32+
.. _ARIA: https://www.w3.org/TR/wai-aria/
33+
34+
Improving detection
35+
-------------------
36+
37+
If your documentation uses a non-standard structure,
38+
Read the Docs may not correctly identify the main content area.
39+
40+
To improve detection, consider:
41+
42+
- Adding a ``role="main"`` attribute to your main content container
43+
- Using a ``<main>`` HTML tag in your theme
44+
- Ensuring your main content has at least one ``<h1>`` heading
45+
46+
Configuring a custom selector
47+
-----------------------------
48+
49+
If the automatic detection does not work for your project, you can explicitly set the CSS selector for the main content node in your project settings:
50+
51+
#. Go to your project's :guilabel:`Settings`.
52+
#. Click :guilabel:`Addons`.
53+
#. Open :guilabel:`Advanced`.
54+
#. Fill in the :guilabel:`CSS main content selector` field (for example: ``div#main`` or ``.my-content``). Leave it blank to use automatic detection.
55+
#. Save the settings.
56+
57+
When this selector is configured, it overrides the heuristic detection only for addons that honor it (currently Visual Diff and Link Previews). Choose a stable container whose structure does not change between builds to avoid spurious diffs or missed link previews. It will not affect search indexing unless we add support in the future.
58+
59+
Examples of good selectors:
60+
61+
- ``div#content`` (an id that wraps all page content)
62+
- ``div[role="main"]`` (ARIA role usage)
63+
64+
.. warning::
65+
66+
Avoid overly broad selectors like ``body`` or ones matching multiple nodes (e.g. a class applied to multiple elements).
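
One way to sanity-check a candidate selector before saving it is to count how many elements on a built page it would match. The sketch below only handles a bare id or a bare class rather than full CSS selectors, and ``MatchCounter`` and ``count_matches`` are hypothetical helper names used for illustration.

```python
from html.parser import HTMLParser


class MatchCounter(HTMLParser):
    """Count elements carrying a given id or class (hypothetical helper)."""

    def __init__(self, element_id=None, css_class=None):
        super().__init__()
        self.element_id = element_id
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The class attribute may be missing or valueless, hence the "or ''".
        classes = (attrs.get("class") or "").split()
        matches_id = self.element_id is not None and attrs.get("id") == self.element_id
        matches_class = self.css_class is not None and self.css_class in classes
        if matches_id or matches_class:
            self.count += 1


def count_matches(html, element_id=None, css_class=None):
    counter = MatchCounter(element_id=element_id, css_class=css_class)
    counter.feed(html)
    counter.close()
    return counter.count
```

A count of exactly one suggests the selector identifies a single container. A higher count means the selector is ambiguous, which is the situation the warning above describes.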
67+
68+
Example structures
69+
------------------
70+
71+
Using role="main" or <main> tag
72+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
73+
74+
.. code-block:: html
75+
:emphasize-lines: 10-12
76+
77+
<html>
78+
<head>
79+
...
80+
</head>
81+
<body>
82+
<div>
83+
This content isn't processed
84+
</div>
85+
86+
<div role="main">
87+
All content inside the main node is processed
88+
</div>
89+
90+
<footer>
91+
This content isn't processed
92+
</footer>
93+
</body>
94+
</html>
95+
96+
Inferring from first h1 tag
97+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
98+
99+
If a main node isn't found,
100+
we try to infer the main node from the parent of the first section with an ``h1`` tag.
101+
Example:
102+
103+
.. code-block:: html
104+
:emphasize-lines: 10-20
105+
106+
<html>
107+
<head>
108+
...
109+
</head>
110+
<body>
111+
<div>
112+
This content isn't processed
113+
</div>
114+
115+
<div id="parent">
116+
<h1>First title</h1>
117+
<p>
118+
The parent of the h1 title will
119+
be taken as the main node,
120+
this is the div tag.
121+
</p>
122+
123+
<h2>Second title</h2>
124+
<p>More content</p>
125+
</div>
126+
</body>
127+
</html>
128+
129+
Fallback to body tag
130+
~~~~~~~~~~~~~~~~~~~~
131+
132+
If a section title isn't found, we default to the ``body`` tag.
133+
Example:
134+
135+
.. code-block:: html
136+
:emphasize-lines: 5-7
137+
138+
<html>
139+
<head>
140+
...
141+
</head>
142+
<body>
143+
<p>Content</p>
144+
</body>
145+
</html>

docs/user/server-side-search/index.rst

Lines changed: 10 additions & 1 deletion
@@ -74,11 +74,20 @@ Analytics
7474

7575
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
7676

77-
7877
Search as you type
7978
------------------
8079

8180
Search as-you-type allows users to quickly find exactly what they are looking for while typing.
8281
It also saves recent searches, for future reference.
8382

8483
Try it by pressing :guilabel:`/` (forward slash) and typing.
84+
85+
How main content is detected
86+
----------------------------
87+
88+
Server Side Search indexes the "main content" of HTML pages,
89+
ignoring headers, footers, navigation, and other page elements that aren't part of the documentation content itself.
90+
This keeps results focused and prevents repeated elements like navigation menus from polluting relevance.
91+
92+
For details on how the main content area is detected,
93+
see :ref:`reference/main-content-detection:detection logic`.

docs/user/visual-diff.rst

Lines changed: 14 additions & 4 deletions
@@ -82,16 +82,26 @@ This is useful if your ``latest`` version doesn't point the default branch of yo
8282

8383
This option can be changed by contacting :doc:`/support`.
8484

85+
How main content is detected
86+
----------------------------
87+
88+
The visual diff compares the "main content" of HTML pages,
89+
ignoring headers, footers, navigation, and other page elements that aren't part of the documentation content itself.
90+
This helps avoid false positives, like all pages being marked as changed because of a date or commit hash being updated in the footer.
91+
92+
For details on how the main content area is detected,
93+
see :ref:`reference/main-content-detection:detection logic`.
94+
95+
.. tip::
96+
97+
If the heuristic root element picked by Visual Diff is wrong for your project theme, set the :guilabel:`CSS main content selector` under :guilabel:`Settings > Addons > Advanced`. Visual Diff honors this override; other features like Server Side Search do not.
98+
8599
Limitations and known issues
86100
----------------------------
87101

88102
- The diff considers HTML files only.
89103
- The diff is done between the files from the latest successful build of the pull request and the default base version (latest by default).
90104
If your pull request gets out of sync with its base branch, the diff may not be accurate, and may show unrelated files and sections as changed.
91-
- The diff is done by comparing the "main content" of the HTML files.
92-
This means that some changes outside the main content, like header or footer, may not be detected.
93-
This is done to avoid showing changes that are not relevant to the documentation content itself.
94-
Like all pages being marked as changed because of a date or commit hash being updated in the footer.
95105
- Invisible changes. Some sections may be highlighted as changed, even when they haven't actually visually changed.
96106
This can happen when the underlying HTML changes without a corresponding visual change, for example, if a link's URL is updated.
97107
- Tables may be shown to have changes when they have not actually changed.
