MarkdownDocumentReader does not parse/retain links in Markdown content #5144

@madhukargunda

Bug description

The MarkdownDocumentReader in Spring AI appears to ignore or fail to read hyperlinks in Markdown documents. When a Markdown file contains links ([text](url)) or images (![text](url)), the extracted Document content does not include the expected URL or link text, so link information is dropped from the processed output. Image links inside tables are also ignored by the reader.

#Syntax Issue 1: "[![Alt text for image](image-url)](link-url)"
#Syntax Issue 2: "[![Alt text for image](link-url)"

This affects applications where preserving link references is important (e.g., ingestion for RAG, documentation analysis, link-aware agents, etc).

Environment

  • Spring AI version: (e.g., 1.1.x or 1.0.x)
  • MarkdownDocumentReader: org.springframework.ai.reader.markdown.MarkdownDocumentReader (from spring-ai-markdown-document-reader)
  • Java version: (e.g., 17, 21)
  • Build tool: (Maven)
  • Operating System: (MacBook Pro M2)

Steps to reproduce

Create a Markdown file with content that includes one or more links, e.g.:

# Welcome to the Docling Java Project!

![Docling Java](docs/src/doc/docs/assets/img/docling-java.png)

This is the repository for Docling Java, a Java API for using [Docling](https://github.com/docling-project).

Instantiate MarkdownDocumentReader with the Markdown as a resource:

Resource resource = new ClassPathResource("example.md");
MarkdownDocumentReader reader = new MarkdownDocumentReader(resource, MarkdownDocumentReaderConfig.defaultConfig());
List<Document> docs = reader.get();

Inspect the resulting Document contents for the link. The link URL/text is missing or the link is stripped out.
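
The inspection step can be made mechanical with a small check like the sketch below. It is only an illustration: `LinkRetentionCheck` and `missingTargets` are hypothetical names, not part of Spring AI, and the regular expression is a simplification that ignores link titles, reference-style links, and parentheses inside URLs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring AI.
final class LinkRetentionCheck {

    // Matches the "(target)" that follows "]" in inline links and images.
    // Simplified: ignores titles, reference-style links, and nested parentheses.
    private static final Pattern TARGET = Pattern.compile("\\]\\(([^)\\s]+)\\)");

    // Returns every link or image target found in the Markdown source that
    // does not appear in any of the extracted texts (duplicates removed).
    static List<String> missingTargets(String markdown, List<String> extractedTexts) {
        List<String> missing = new ArrayList<>();
        Matcher m = TARGET.matcher(markdown);
        while (m.find()) {
            String target = m.group(1);
            boolean present = extractedTexts.stream().anyMatch(t -> t.contains(target));
            if (!present && !missing.contains(target)) {
                missing.add(target);
            }
        }
        return missing;
    }
}
```

In the reproduction above, `extractedTexts` would be the `getText()` value of each returned Document. A non-empty result lists the targets that were dropped.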

Expected behavior
Links in Markdown content (both link text and URL) should either be preserved in the parsed Document output or be included in a structured property/metadata field (if configurable).

The current behavior should be clarified (documented), or a change should be made so that links are not silently dropped.
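
Until the reader handles this, one possible workaround is a pre-pass over the raw Markdown that collects links and images into a structure suitable for metadata. The sketch below is an assumption-laden illustration, not the reader's implementation. `MarkdownLinkExtractor` and the `kind`/`text`/`url` keys are hypothetical, and the patterns cover only inline links, inline images, and the linked-image form `[![alt](image-url)](link-url)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-pass, not part of Spring AI: collects links and images
// from raw Markdown so they can be kept alongside the parsed text.
final class MarkdownLinkExtractor {

    // Inline link "[text](url)" or image "![alt](url)".
    // Simplified: no titles, reference-style links, or parentheses in URLs.
    private static final Pattern INLINE =
            Pattern.compile("(!?)\\[([^\\]\\[]*)\\]\\(([^)\\s]+)\\)");

    // Linked image "[![alt](image-url)](link-url)"; captures alt and link-url.
    private static final Pattern LINKED_IMAGE =
            Pattern.compile("\\[!\\[([^\\]\\[]*)\\]\\([^)\\s]+\\)\\]\\(([^)\\s]+)\\)");

    static List<Map<String, String>> extract(String markdown) {
        List<Map<String, String>> found = new ArrayList<>();
        Matcher inline = INLINE.matcher(markdown);
        while (inline.find()) {
            String kind = inline.group(1).isEmpty() ? "link" : "image";
            found.add(entry(kind, inline.group(2), inline.group(3)));
        }
        // The outer link of a linked image is not matched by INLINE, so add it here.
        Matcher linked = LINKED_IMAGE.matcher(markdown);
        while (linked.find()) {
            found.add(entry("link", linked.group(1), linked.group(2)));
        }
        return found;
    }

    private static Map<String, String> entry(String kind, String text, String url) {
        Map<String, String> e = new LinkedHashMap<>();
        e.put("kind", kind);
        e.put("text", text);
        e.put("url", url);
        return e;
    }
}
```

The resulting list could then be attached through `withAdditionalMetadata`, assuming that builder method accepts non-string values. This would attach the whole file's links to every Document rather than per section, so it is a stopgap rather than the structured behavior requested here.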

Minimal Complete Reproducible example

    @Order(4)
    public CommandLineRunner springAIMarkdownReader() {
        return args -> {
            MarkdownDocumentReaderConfig config = MarkdownDocumentReaderConfig.builder()
                    .withHorizontalRuleCreateDocument(true)
                    .withIncludeCodeBlock(true)
                    .withIncludeBlockquote(true)
                    .withAdditionalMetadata("filename", "Test.md")
                    .build();
            MarkdownDocumentReader reader = new MarkdownDocumentReader("classpath:mark-down/Test.md", config);
            List<Document> documents = reader.get();
            log.info("The size of the documents list is {}", documents.size());
            System.out.println("");
            documents.forEach(document ->
                    {
                        log.info("{} {}", document.getText(), document.getMetadata());
                        System.out.println("");
                    }
            );
        };
    }
