
[Bug]: Error when using a generic datamodel with structured output #20324

@schelv

Bug Description

Using a generic BaseModel with as_structured_llm causes a 400 BadRequest from OpenAI.
This is probably due to the generated JSON schema name: for a parametrized generic, Pydantic's schema title (e.g. GenericDataModel[int]) contains brackets, which the response_format.json_schema.name pattern ^[a-zA-Z0-9_-]+$ rejects.
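
A quick check of why the name is rejected (a small sketch, assuming Python 3.12+ and Pydantic v2 as in the repro below):

import re

from pydantic import BaseModel

class GenericDataModel[T](BaseModel):
    answer: T

# Pydantic keeps the brackets in the title of a parametrized generic.
name = GenericDataModel[int].model_json_schema()["title"]
print(name)  # GenericDataModel[int]

# The OpenAI error below says the name must match ^[a-zA-Z0-9_-]+$,
# so this title is rejected with a 400 before the model is ever called.
print(bool(re.fullmatch(r"[a-zA-Z0-9_-]+", name)))  # False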

Version

0.14.8

Steps to Reproduce

The following code works correctly with a non-generic model (DataModel) but fails when using the generic version (GenericDataModel[int]):

from typing import Sequence
from llama_index.core.base.llms.types import ChatMessage
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel

class DataModel(BaseModel):
    answer: int

class GenericDataModel[T](BaseModel):
    answer: T

def structured_output_existential_chat(datamodel):
    llm = OpenAI(model="gpt-4o")
    sllm = llm.as_structured_llm(output_cls=datamodel)

    messages: Sequence[ChatMessage] = [ChatMessage(content="What is the meaning of life?")]
    response = sllm.chat(messages)
    print(response)

structured_output_existential_chat(DataModel)  # succeeds

print(GenericDataModel[int].model_json_schema())
structured_output_existential_chat(GenericDataModel[int])  # raises openai.BadRequestError

Observed behavior

The call with DataModel succeeds:

assistant: {"answer":42}

The printed schema shows the problematic title:

{'properties': {'answer': {'title': 'Answer', 'type': 'integer'}}, 'required': ['answer'], 'title': 'GenericDataModel[int]', 'type': 'object'}

The call with GenericDataModel[int] then fails:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'response_format.json_schema.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'response_format.json_schema.name', 'code': 'invalid_value'}}
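
A possible workaround (an assumption on my part, not a confirmed fix in llama-index): wrap the parametrized generic in a concrete subclass so the schema title becomes a plain class name that satisfies the pattern:

# Concrete subclass of the parametrized generic; its schema title is just
# the class name, which matches ^[a-zA-Z0-9_-]+$.
class IntDataModel(GenericDataModel[int]):
    pass

print(IntDataModel.model_json_schema()["title"])  # IntDataModel

# This call should no longer trip the invalid-name check, assuming the
# schema name was the only problem.
structured_output_existential_chat(IntDataModel)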

Labels

bug (Something isn't working), triage (Issue needs to be triaged/prioritized)
