experiment: enable QUARTO_PDF_STANDARD and run latex/typst tests #14097

Draft
gordonwoodhull wants to merge 2 commits into main from experiment/pdf-standard-env-var


gordonwoodhull commented Feb 23, 2026

Here are results of adding a QUARTO_PDF_STANDARD environment variable and running the LaTeX and Typst tests.

Idk if we want to keep the vibe-coded tools, so I'll leave this as an experimental branch

  • quarto run --dev tools/find-tests.ts <format> <directory> to find all qmd tests that list or test the format (perhaps a filter in run-tests.sh would be better but this was easy)
  • quarto run tools/filter-pdf-errors.ts - takes the logs from rendering all the qmds and creates the report below
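
For illustration only, here is a rough sketch of the kind of check a format finder could make. The function name mentionsFormat and the front-matter matching rule are assumptions, not what tools/find-tests.ts actually does.

```typescript
// Hypothetical helper: the real tools/find-tests.ts may work differently.
// Returns true when a .qmd source mentions `format` in its YAML front matter.
function mentionsFormat(qmdSource: string, format: string): boolean {
  // Front matter is the block between the leading `---` line and the next `---`.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(qmdSource);
  if (match === null) return false;
  const frontMatter = match[1] ?? "";
  // Escape regex metacharacters in the format name.
  const escaped = format.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match the format as a whole token, e.g. `format: typst` or a `typst:` key.
  const token = new RegExp(`(^|[^\\w-])${escaped}([^\\w-]|$)`, "m");
  return token.test(frontMatter);
}
```

A textual match like this is cheap but approximate; parsing the YAML would be more precise at the cost of a dependency.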

Typst Results

Typst has only 3 kinds of error, only one unexpected:


    56  PDF/UA-1 error: missing alt text
    30  PDF/UA-1 error: missing document title
    13  PDF/UA-1 error: invalid document structure, this element's PDF tag would be split up

We do not have any tests that get past Typst's built-in validation that do not pass veraPDF.

The invalid document structure is orange-book, filed upstream:

LaTeX Results

These are much more varied, so I'll include the report in full for now (and go shovel more snow):

Total files with errors: 25
Total files rendered:    92

    16  ua-2: The Metadata stream as specified in ISO 32000-2:2020, 14.3 in the document catalog dictionary shall contain a dc:title entry
     3  ua-2: <Document> shall not contain <Caption>
     3  ua-2: The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-2:2020, 9.9
     3  ua-2: StructTreeRoot shall not contain <P>
     3  ua-2: The structure tree root shall contain a single Document structure element as its only child, as specified in ISO 32000-2:2020, Annex L and ISO/TS 32005
     2  ua-2: Content that is not considered real shall be an artifact
     2  ua-2: StructTreeRoot shall not contain <Caption>
     2  ua-2: <P> shall not contain <Part>
     2  ua-2: StructTreeRoot shall not contain <Part>
     2  ua-2: StructTreeRoot shall not contain <Div>
     2  ua-2: <P> shall not contain <Aside>
     2  ua-2: <P> shall not contain <P>
     1  ua-2: <Sect> shall not contain content items
     1  ua-2: StructTreeRoot shall not contain <Figure>
     1  ua-2: A file in conformance with PDF/UA-2 shall not contain a reference to the .notdef glyph from any of the text showing operators, regardless of text rendering mode, in any content stream
     1  ua-2: StructTreeRoot shall not contain <Table>
     1  ua-2: <TOCI> shall not contain <Formula>
     1  ua-2: <Link> shall not contain <Link>

ua-2: The Metadata stream as specified in ISO 32000-2:2020, 14.3 in the document catalog dictionary shall contain a dc:title entry (16 files):
  - docs/smoke-all/2022/09/30/caption-footnotes/test.qmd
  - docs/smoke-all/2025/03/21/issue-12344.qmd
  - docs/smoke-all/2024/07/18/10324.qmd
  - docs/smoke-all/2024/09/02/10655.qmd
  - docs/smoke-all/2024/08/30/10291/latex-hyphen-lang-es-no-install.qmd
  - docs/smoke-all/2024/08/30/10291/latex-hyphen-lang-es.qmd
  - docs/smoke-all/2024/08/30/10291/latex-hyphen-lang-zh.qmd
  - docs/smoke-all/2023/03/03/article-layout/table-endnotes-4324.qmd
  - docs/smoke-all/2023/04/24/format-links.qmd
  - docs/smoke-all/2023/11/02/latex-quarto-markdown-base64.qmd
  - docs/smoke-all/2023/11/02/7262.qmd
  - docs/smoke-all/2023/11/15/4370.qmd
  - docs/smoke-all/2023/11/14/7568.qmd
  - docs/smoke-all/2023/07/24/code-annotation-false.qmd
  - docs/smoke-all/2023/01/17/format-variants.qmd
  - docs/smoke-all/article-layout/tables/compute-table-screen.qmd

ua-2: <Document> shall not contain <Caption> (3 files):
  - docs/smoke-all/2022/09/30/caption-footnotes/test.qmd
  - docs/smoke-all/2023/09/19/issue-6907.qmd
  - docs/smoke-all/2023/01/17/online-image-mediabag.qmd

ua-2: The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-2:2020, 9.9 (3 files):
  - docs/smoke-all/2022/12/9/jats/example.qmd
  - docs/smoke-all/2023/11/02/7262.qmd
  - docs/smoke-all/2023/11/15/4370.qmd

ua-2: StructTreeRoot shall not contain <P> (3 files):
  - docs/smoke-all/2024/05/06/9582.qmd
  - docs/smoke-all/typst/margin-layout/margin-figure-crossref-interleaved.qmd
  - docs/smoke-all/article-layout/tables/compute-table-margin.qmd

ua-2: The structure tree root shall contain a single Document structure element as its only child, as specified in ISO 32000-2:2020, Annex L and ISO/TS 32005 (3 files):
  - docs/smoke-all/2024/05/06/9582.qmd
  - docs/smoke-all/2024/06/24/10112.qmd
  - docs/smoke-all/article-layout/tables/compute-table-margin.qmd

ua-2: Content that is not considered real shall be an artifact (2 files):
  - docs/smoke-all/2022/11/17/3359a.qmd
  - docs/smoke-all/2023/09/19/issue-6907.qmd

ua-2: StructTreeRoot shall not contain <Caption> (2 files):
  - docs/smoke-all/2024/05/06/9582.qmd
  - docs/smoke-all/typst/margin-layout/margin-figure-crossref-interleaved.qmd

ua-2: <P> shall not contain <Part> (2 files):
  - docs/smoke-all/2024/05/06/9582.qmd
  - docs/smoke-all/article-layout/tables/compute-table-margin.qmd

ua-2: StructTreeRoot shall not contain <Part> (2 files):
  - docs/smoke-all/2024/06/24/10112.qmd
  - docs/smoke-all/typst/margin-layout/margin-figure-crossref-interleaved.qmd

ua-2: StructTreeRoot shall not contain <Div> (2 files):
  - docs/smoke-all/2024/06/24/10112.qmd
  - docs/smoke-all/typst/margin-layout/margin-figure-crossref-interleaved.qmd

ua-2: <P> shall not contain <Aside> (2 files):
  - docs/smoke-all/2024/06/24/10112.qmd
  - docs/smoke-all/article-layout/tables/compute-table-margin.qmd

ua-2: <P> shall not contain <P> (2 files):
  - docs/smoke-all/typst/margin-layout/margin-figure-crossref-interleaved.qmd
  - docs/smoke-all/article-layout/tables/compute-table-margin.qmd

ua-2: <Sect> shall not contain content items (1 files):
  - docs/smoke-all/2022/12/9/jats/example.qmd

ua-2: StructTreeRoot shall not contain <Figure> (1 files):
  - docs/smoke-all/2024/05/06/9582.qmd

ua-2: A file in conformance with PDF/UA-2 shall not contain a reference to the .notdef glyph from any of the text showing operators, regardless of text rendering mode, in any content stream (1 files):
  - docs/smoke-all/2024/08/30/10291/latex-hyphen-lang-zh.qmd

ua-2: StructTreeRoot shall not contain <Table> (1 files):
  - docs/smoke-all/2024/06/24/10112.qmd

ua-2: <TOCI> shall not contain <Formula> (1 files):
  - docs/smoke-all/2023/09/19/issue-6907.qmd

ua-2: <Link> shall not contain <Link> (1 files):
  - docs/smoke-all/crossrefs/float/latex/latex-custom-categories.qmd
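
The per-error counts in a report like this one could be tallied roughly as follows. This is a sketch under an assumed input shape ("<file>: <error>" per line); the actual log format read by tools/filter-pdf-errors.ts is not shown in this thread, and summarizeErrors is a hypothetical name.

```typescript
// Hypothetical sketch: the "<file>: <error>" line shape is an assumption,
// not the actual format read by tools/filter-pdf-errors.ts.
type ErrorCount = { message: string; count: number; files: string[] };

function summarizeErrors(lines: string[]): ErrorCount[] {
  const byMessage = new Map<string, Set<string>>();
  for (const line of lines) {
    const sep = line.indexOf(": ");
    if (sep < 0) continue; // skip lines that are not "<file>: <error>"
    const file = line.slice(0, sep);
    const message = line.slice(sep + 2).trim();
    const files = byMessage.get(message) ?? new Set<string>();
    files.add(file); // a Set counts each file once per message
    byMessage.set(message, files);
  }
  // Most frequent message first; ties broken alphabetically.
  return Array.from(byMessage.entries())
    .map(([message, files]) => ({
      message,
      count: files.size,
      files: Array.from(files),
    }))
    .sort((a, b) => b.count - a.count || a.message.localeCompare(b.message));
}
```

Counting distinct files per message, rather than raw occurrences, matches how the report above lists each file once under each error.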

gordonwoodhull and others added 2 commits February 23, 2026 14:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add environment variable fallback for pdf-standard option so any
document without an explicit pdf-standard setting inherits from
QUARTO_PDF_STANDARD (comma-separated, e.g. "ua-1" or "a-2b,ua-1").

Also add tools/find-tests.ts to find test documents by format
and tools/filter-pdf-errors.ts to extract and summarize PDF
validation errors from render logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
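
The fallback described in the commit message could look something like this. It is a minimal sketch, not the code in this branch: resolvePdfStandards is a hypothetical name, and the environment value is passed in as a parameter so the function stays pure.

```typescript
// Sketch only: in Quarto the second argument would come from the
// QUARTO_PDF_STANDARD environment variable.
function resolvePdfStandards(
  explicit: string[] | undefined,
  envValue: string | undefined,
): string[] {
  // An explicit pdf-standard setting in the document always wins.
  if (explicit !== undefined) return explicit;
  if (envValue === undefined) return [];
  // Split "a-2b,ua-1" into ["a-2b", "ua-1"], dropping blanks.
  return envValue
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```

For example, a document with no pdf-standard setting and QUARTO_PDF_STANDARD set to "a-2b,ua-1" would resolve to both standards, while a document that sets pdf-standard itself ignores the variable.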
posit-snyk-bot commented Feb 23, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| | Licenses | 0 | 0 | 0 | 0 | 0 issues |



gordonwoodhull commented Feb 23, 2026

Here is Claude's analysis of the LaTeX failures. The big structural one is margin layout.

There's one knitr problem where we could embed fonts, but otherwise everything is upstream.

PDF/UA-2 LaTeX Failure Analysis

22 files fail PDF/UA-2 validation (excluding 3 tests that use non-LuaTeX engines or
lack CJK fonts, which are separate issues). The failures cluster into 6 root causes.

1. Missing dc:title in XMP metadata (15 files)

Error: The Metadata stream as specified in ISO 32000-2:2020, 14.3 in the document catalog dictionary shall contain a dc:title entry

Cause: These test files have no title: in their YAML frontmatter. The LaTeX
template (src/resources/formats/pdf/pandoc/hypersetup.latex) only sets pdftitle
when the $title-meta$ variable exists, which derives from title. Without it, the
XMP metadata stream (enabled by xmp=true in \DocumentMetadata) has no dc:title
entry, which PDF/UA-2 requires.

Files:

| File | Notes |
| --- | --- |
| 2025/03/21/issue-12344.qmd | No title. Tests column-page-right layout. |
| 2024/07/18/10324.qmd | No title. Tests R subcaptions with tinytable. |
| 2024/09/02/10655.qmd | No title. Tests font auto-install. |
| 2024/08/30/10291/latex-hyphen-lang-es-no-install.qmd | No title. Tests Spanish hyphenation without install. |
| 2024/08/30/10291/latex-hyphen-lang-es.qmd | No title. Tests Spanish hyphenation. |
| 2023/03/03/article-layout/table-endnotes-4324.qmd | No title. Tests table endnotes with margin references. |
| 2023/04/24/format-links.qmd | Has Title: Test 123 (capital T). Pandoc requires lowercase title:. |
| 2023/11/02/latex-quarto-markdown-base64.qmd | No title. Tests base64-encoded markdown in LaTeX. |
| 2023/11/02/7262.qmd | No title. Tests layout-ncol with figure + table. |
| 2023/11/15/4370.qmd | No title. Tests R figure layout. |
| 2023/11/14/7568.qmd | No title. Tests code annotations in LaTeX. |
| 2023/07/24/code-annotation-false.qmd | No title. Tests code-annotations: false. |
| 2023/01/17/format-variants.qmd | No title. Tests format variant syntax. |
| article-layout/tables/compute-table-screen.qmd | No title. Tests screen-width table. |
| 2022/09/30/caption-footnotes/test.qmd | No title. Tests caption footnotes. (Also has Caption error, see below.) |

Problem in: Test files. These tests predate PDF/UA-2 and simply lack a document
title. One special case: format-links.qmd uses Title: (capital T) which Pandoc
treats as a custom metadata key, not the document title.

Quarto could help: Quarto could auto-generate a dc:title fallback (e.g. from the
filename) when tagging is on and no title is set. But fundamentally, real documents
should have titles.
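
A filename-based fallback of this kind might be sketched as below. Both function names are hypothetical and nothing here reflects existing Quarto code; the front-matter check is a simple line match, so it only recognizes a top-level, lowercase title: key, which is also why Title: falls through to the fallback.

```typescript
// Hypothetical sketch of a dc:title fallback; not existing Quarto code.
function documentTitle(frontMatter: string): string | undefined {
  // Case-sensitive on purpose: `Title:` is not the document title for Pandoc.
  const m = /^title[ \t]*:[ \t]*(.+)$/m.exec(frontMatter);
  if (m === null) return undefined;
  const raw = (m[1] ?? "").trim();
  // Strip one pair of matching surrounding quotes, if present.
  const unquoted = raw.replace(/^(["'])(.*)\1$/, "$2");
  return unquoted.length > 0 ? unquoted : undefined;
}

function titleOrFilename(frontMatter: string, inputPath: string): string {
  const title = documentTitle(frontMatter);
  if (title !== undefined) return title;
  // Last path segment, accepting both / and \ separators.
  const base = inputPath.split(/[\\/]/).pop() ?? inputPath;
  // Drop the final extension (".qmd"); keep the name if nothing is left.
  const stem = base.replace(/\.[^.]+$/, "");
  return stem.length > 0 ? stem : base;
}
```

With this sketch, 2023/04/24/format-links.qmd would get the title "format-links", which satisfies the validator but is hardly a meaningful title for a real document.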


2. Margin layout breaks PDF structure tree (5 files)

Errors: StructTreeRoot shall not contain <P>/<Caption>/<Figure>/<Table>/<Div>/<Part>,
The structure tree root shall contain a single Document structure element,
<P> shall not contain <Aside>/<Part>/<P>

Cause: All these tests use margin layout (.column-margin or column: margin).
Quarto's filters generate:

  • \marginnote{\begin{footnotesize}...\end{footnotesize}} for text
    (src/resources/filters/layout/latex.lua:215-238)
  • \begin{marginfigure}...\end{marginfigure} for figures
    (src/resources/filters/layout/latex.lua:554)
  • \begin{margintable}...\end{margintable} for tables
    (src/resources/filters/layout/latex.lua:643-663)

These come from the sidenotes and marginnote LaTeX packages (injected at
src/resources/filters/layout/meta.lua:106-116). These packages predate PDF
tagging and don't cooperate with tagpdf:

  • \marginnote creates content in a separate output stream that escapes the
    Document structure element entirely, placing children at StructTreeRoot.
  • tagpdf assigns an <Aside> role to margin content but nests it inside an
    active <P> context, which is invalid per PDF/UA-2.
  • The result is structure elements (<P>, <Caption>, <Figure>, etc.) appearing
    directly under StructTreeRoot instead of inside <Document>.

Files:

| File | Margin content |
| --- | --- |
| 2024/05/06/9582.qmd | Multiple .column-margin divs with figures and text |
| 2024/06/24/10112.qmd | R table with column: margin |
| typst/margin-layout/margin-figure-crossref-interleaved.qmd | Figures with .column-margin class |
| article-layout/tables/compute-table-margin.qmd | R table with column: margin |
| 2022/09/30/caption-footnotes/test.qmd | Also affected (uses figure captions; see section 3) |

Problem in: LaTeX packages (sidenotes/marginnote). These packages are not
tag-aware. Fixing this requires either:

  • Upstream patches to sidenotes/marginnote for tagpdf compatibility
  • A different LaTeX approach for margin content when tagging is enabled
  • Skipping UA-2 validation for margin-layout tests until the ecosystem catches up
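
For the third option, a test harness would need some way to recognize margin-layout documents. One rough, purely textual way to do that is sketched here; usesMarginLayout is a hypothetical name, and it only covers the two forms named above (.column-margin and column: margin).

```typescript
// Hypothetical predicate for deciding whether to skip UA-2 validation.
// Only the two forms mentioned in this analysis are detected.
function usesMarginLayout(qmdSource: string): boolean {
  // `.column-margin` used as a div or span class.
  const hasClass = /\.column-margin(?![\w-])/.test(qmdSource);
  // `column: margin` as a YAML or `#|` cell option on its own line.
  const hasOption = /^\s*(#\|\s*)?column\s*:\s*["']?margin\b/m.test(qmdSource);
  return hasClass || hasOption;
}
```

A text scan like this can misfire on code listings that merely mention these strings, so an explicit per-test opt-out may be the more robust choice.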

Upstream status (as of Feb 2026):

The LaTeX kernel's built-in \marginpar command does have tagging support (tags as
<Aside>), but Quarto doesn't use \marginpar directly. It uses the marginnote
and sidenotes packages, both of which are tracked as currently incompatible in
the LaTeX tagging project:

  • marginnote: latex3/tagging-project#165
    (opened Jul 2024, still open). Errors with paragraph hook counting mismatches.
    The LaTeX team has a commented-out namespace entry mapping marginnote -> Aside
    in latex-lab-namespace.dtx, indicating intent but no implementation. Package
    maintainer (Markus Kohm) is listed as inactive. No comments, no assignees,
    no PRs on the issue.

  • sidenotes: latex3/tagging-project#555
    (opened Aug 2024, still open). Basic \sidenote{} works (uses \marginpar
    internally), but \sidenote[][offset]{text}, \sidecaption, marginfigure,
    and margintable environments all fail because they depend on marginnote.
    No public repo or issue tracker for the package. Priority 7 (low) in the
    tagging status tracker.

  • All 10 area: marginpars issues in latex3/tagging-project are open with
    zero closed. No milestones, no active PRs, minimal discussion.

There is no timeline for resolution. Margin layout and PDF/UA-2 are incompatible
in the current LaTeX ecosystem.


3. <Document> shall not contain <Caption> (2 files)

Error: <Document> shall not contain <Caption>

Cause: When a figure caption contains complex content (footnotes, citations),
LaTeX's footnote processing extracts content out of the figure environment. tagpdf
loses the parent-child relationship and places <Caption> directly under <Document>
instead of inside <Figure>.

Files:

| File | Caption content |
| --- | --- |
| 2022/09/30/caption-footnotes/test.qmd | Figures with footnotes and citations in captions |
| 2023/01/17/online-image-mediabag.qmd | Online image; may relate to implicit figure wrapping |

Problem in: Pandoc + LaTeX tagging. Pandoc's generated LaTeX for captions with
footnotes doesn't maintain proper structural nesting for tagpdf. This is an
upstream issue in how pandoc emits LaTeX figure environments with complex captions.


4. Fonts not embedded (3 files)

Error: The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-2:2020, 9.9

Cause: R's graphics devices (used by knitr to generate plot PDFs) don't always
embed all fonts. When these plot PDFs are included in the final document, unembedded
font references carry over.

Files:

| File | R content |
| --- | --- |
| 2022/12/9/jats/example.qmd | plot(cars), knitr::kable(head(mtcars)), embedded notebook |
| 2023/11/02/7262.qmd | plot(cars) via knitr in layout-ncol |
| 2023/11/15/4370.qmd | plot(1), plot(2) via knitr |

Problem in: R/knitr (external tool). R's default pdf() device doesn't always
embed fonts. Quarto could mitigate this when PDF standards are active by setting
knitr's default graphics device to cairo_pdf, or by post-processing the plot PDFs
with grDevices::embedFonts() (which requires Ghostscript).
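
The decision itself is small. As a sketch only, assuming Quarto has some point where it chooses a default knitr dev option (knitrDevFor is a hypothetical name, not an existing function):

```typescript
// Hypothetical: picks a default knitr `dev` when a PDF standard is requested.
// "cairo_pdf" is a device name knitr accepts; where Quarto would apply this
// default is an assumption.
function knitrDevFor(
  pdfStandards: string[],
  currentDev?: string,
): string | undefined {
  // Never override a device the document author chose explicitly.
  if (currentDev !== undefined) return currentDev;
  // Only switch devices when at least one PDF standard is active.
  return pdfStandards.length > 0 ? "cairo_pdf" : undefined;
}
```

Returning undefined when no standard is active leaves knitr's own default untouched, so ordinary PDF renders would be unaffected.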


5. <Link> shall not contain <Link> (1 file)

Error: <Link> shall not contain <Link>

Cause: This test defines custom crossref types with list-of commands
(\listofsupptbls, \listofdiagrams in raw LaTeX). The list-of entries are
hyperlinked (clickable navigation), and the cross-references inside them also
generate \hyperref links. This creates nested <Link> elements in the structure
tree.

File: crossrefs/float/latex/latex-custom-categories.qmd

Problem in: LaTeX tagging (tagpdf + hyperref interaction). When hyperref
generates links inside list-of entries that are themselves linked, the structure
tree gets nested <Link> elements. This is an upstream LaTeX issue.


6. <Sect> shall not contain content items (1 file)

Error: <Sect> shall not contain content items

Cause: This document has complex metadata (multiple authors with affiliations,
funding, citations, licenses) and toc: true. The author block or TOC generation
produces text content directly inside a <Sect> structure element without a <P>
wrapper.

File: 2022/12/9/jats/example.qmd

Problem in: Pandoc + LaTeX tagging. Pandoc's LaTeX output has loose text in
section contexts that tagpdf doesn't automatically wrap in <P> elements.


Summary

| Root cause | Files | Owner | Effort |
| --- | --- | --- | --- |
| Missing title: in YAML | 15 | Test files | Low -- add titles |
| Margin layout breaks structure tree | 5 | LaTeX packages | High -- needs upstream work |
| Caption with footnotes misplaced | 2 | Pandoc + LaTeX | Medium -- pandoc change needed |
| R plots don't embed fonts | 3 | R/knitr | Medium -- configure knitr defaults |
| Nested hyperlinks in list-of | 1 | LaTeX (tagpdf/hyperref) | High -- upstream fix |
| Loose content in Sect | 1 | Pandoc + LaTeX | Medium -- pandoc change needed |
