Skip to content

Conversation

@sebastien-rosset
Copy link
Contributor

Description

This PR fixes the read_multiple_files function to properly handle empty arrays and improves documentation to make the parameter structure self-documenting for AI assistants.

Server Details

  • Server: filesystem
  • Changes to: tools (read_multiple_files function schema and validation)

Motivation and Context

Addresses issue #2032 where the read_multiple_files function lacked clear documentation about its parameter structure and behavior, causing AI assistants to use it incorrectly without explicit instruction. The function would accept empty arrays and return empty results without helpful feedback, making it difficult for AI assistants to understand proper usage.

How Has This Been Tested?

  • Schema validation tested with empty arrays (now properly rejected with clear error message)
  • Verified that non-empty arrays continue to work as expected
  • Confirmed that the JSON schema generation includes the new descriptive text
  • Tested backward compatibility with existing valid use cases

Breaking Changes

No breaking changes for valid usage. The only change is that empty arrays (which previously returned empty results) now return a clear error message explaining that at least one file path is required.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follows MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have documented all environment variables and configuration options

Additional context

Changes made:

  1. Schema validation: Added .min(1, "At least one file path must be provided") to reject empty arrays with a clear error message
  2. Parameter documentation: Added .describe() to the paths parameter explaining the expected format and requirements
  3. Code formatting: Reformatted the schema definition for better readability

Before this fix:

  • Empty arrays would be processed and return empty results
  • AI assistants required explicit instruction about parameter structure
  • No clear feedback about minimum requirements

After this fix:

  • Empty arrays are rejected with descriptive error: "At least one file path must be provided"
  • Schema is self-documenting with clear parameter descriptions
  • AI assistants can better understand correct usage without explicit instruction

The fix maintains full backward compatibility for all valid use cases while providing better guidance for incorrect usage.

Copy link
Member

@olaservo olaservo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! In order to fully address this issue, should the tool description also be updated? #2032

@olaservo olaservo added the server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem label Aug 23, 2025
@sebastien-rosset
Copy link
Contributor Author

sebastien-rosset commented Aug 23, 2025

#2032

I'm assuming you are referring to enhancing this description:

      {
        name: "read_multiple_files",
        description:
          "Read the contents of multiple files simultaneously. This is more " +
          "efficient than reading files one by one when you need to analyze " +
          "or compare multiple files. Each file's content is returned with its " +
          "path as a reference. Failed reads for individual files won't stop " +
          "the entire operation. Only works within allowed directories. " +
          "Requires at least one file path in the 'paths' parameter array.",

Are you suggesting the input parameters should be documented in the tool description field?

The MCP tool specification defines three distinct documentation layers for the description fields:

  1. Tool description: "Human-readable description of functionality"
  2. Input inputSchema: "JSON Schema defining expected parameters" (with property-level descriptions)
  3. Output outputSchema: "Optional JSON Schema defining expected output structure" (with property-level descriptions)

Established Patterns for description fields

Looking at established patterns for the description field, I see inconsistencies. Sometimes the tool description is high-level without parameter descriptions. And sometimes the tool description includes parameter descriptions.

Filesystem Server:

  • Most tools: high-level, e.g. "Create a new file or completely overwrite an existing file"
  • Inconsistent: read_text_file mentions 'head' and 'tail' parameters in description

Memory Server: Tool descriptions are consistently high-level:

  • "Create multiple new entities in the knowledge graph"
  • "Add new observations to existing entities"

NO parameter names or structures mentioned in tool descriptions

Git Server: Tool descriptions are brief and functional:

  • "Shows the working tree status"
  • "Records changes to the repository"

NO parameter details in tool descriptions

How LLMs Use Tool Documentation in Practice

A key consideration is how LLMs Use Tool Documentation in Practice. Looking at VS Code, Claude desktop and Claude code, I see the LLMs use the following pattern:

  1. Read Tool descriptions → Help LLM understand when and why to use a tool
  2. Read Schema descriptions → Help LLM understand how to structure the parameters

Both description fields are read before invoking the tool, as they serve complementary, not redundant purposes. Note this is just anecdotal evidence, I'm not sure if this pattern is widespread across all LLMs.

I've also noticed the following behavior:

  1. When the input schema is not documented or poorly documented, the LLM creates a wrong input payload for the tool action.
  2. The above problem is resolved after writing proper descriptions to the input schema. In that case, there is no need to document the input schema in the high-level tool description.

Problems with Including Parameters in Tool Descriptions

IMHO, it makes sense to keep the tool description high-level otherwise it becomes redundant with the input schema description, potentially leading to maintenance issues and contradictions/ambiguities. Including Parameters in Tool Descriptions can cause problems:

  1. Redundancy: Parameter details are already captured in schema descriptions where they belong
  2. Maintenance burden: Changes to parameters require updates in multiple places
  3. Inconsistency: Creates different documentation patterns within the same codebase
  4. Semantic confusion: Mixes high-level purpose with implementation details
  5. Verbosity: Makes tool descriptions unnecessarily long and harder to scan (i.e. more tokens consumed if the parameters are described both in the input schema and tool description)

Benefits of Separation

The separation of the tool description and input schema descriptions allows for:

  • Quick tool selection based on high-level descriptions
  • Detailed parameter construction using schema specifications
  • Clear separation of concerns in documentation

Potential Enhancements to the MCP Spec

I think there is a genuine ambiguity in the MCP spec that could benefit from clarification. For example, there could be documentation best practices. I have created modelcontextprotocol/modelcontextprotocol#1382

@olaservo
Copy link
Member

Thanks @sebastien-rosset , looking at the tool description again, I think you're correct that there isn't a need to specify anything differently there. I agree that we shouldn't be repeating the information given by tool params given that the model should already see those (and tbh the name of the function already implies that a list or array of files should be passed in, so not sure why models are struggling with using it correctly in the first place).

@olaservo olaservo merged commit 338d8af into modelcontextprotocol:main Aug 23, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

server-filesystem Reference implementation for the Filesystem MCP server - src/filesystem

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants