@Olocool17 (Contributor) commented Dec 11, 2025

Closes #9122. See the linked issue for an extended problem description + minimal reproducible example.

Structured outputs fix

My proposed fix mutates the `response_format` request argument into the proper form when it is passed as a Pydantic model.
The documentation around this is somewhat unreliable: the relevant litellm docs are outdated/incorrect, and the OpenAI docs seem to imply that passing a Pydantic model into `text.format` is valid, but doing so causes JSON serialization errors down the line.

So far, the specific formulation presented in this PR has caused no issues in our testing.
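
For illustration, here is a minimal sketch of the kind of conversion involved. The helper name is invented, and the exact `text.format` payload shape follows the Responses API's `json_schema` format; this is not the PR's literal diff:

```python
import pydantic


def convert_response_format(request: dict) -> None:
    """Illustrative helper: rewrite a Pydantic `response_format` into the
    `text.format` dict the Responses API expects. Name and exact payload
    shape are for demonstration only."""
    response_format = request.pop("response_format", None)
    if isinstance(response_format, type) and issubclass(response_format, pydantic.BaseModel):
        request["text"] = {
            "format": {
                "type": "json_schema",
                "name": response_format.__name__,
                "schema": response_format.model_json_schema(),
                "strict": True,
            }
        }
    elif response_format is not None:
        # Non-Pydantic formats (e.g. plain dicts) are passed through untouched.
        request["response_format"] = response_format
```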

Cache retrieval fix

While implementing the fix above, I additionally stumbled on an error pertaining to responses models when caching is used.
When an item is successfully retrieved from the cache, an ad-hoc attribute `.cache_hit` is created and set to `True` on the response object.

Unfortunately, this is only possible for `litellm.ModelResponse` (the response type for chat models) and not for `litellm.ResponsesAPIResponse` (the response type for responses models), because the latter is a Pydantic model without `extra="allow"` set in its config.

My proposed fix is to simply remove this ad-hoc attribute altogether, since it is actually superfluous: the `response.usage` attribute is cleared on a cache hit anyway, which makes `settings.usage_tracker.add_usage(self.model, dict(results.usage))` a no-op.
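
For context, a minimal illustration of the Pydantic behavior at play (class names invented for demonstration):

```python
import pydantic


class Strict(pydantic.BaseModel):
    value: int


class Lenient(pydantic.BaseModel, extra="allow"):
    value: int


Lenient(value=1).cache_hit = True  # fine: extra attributes are stored on the instance
Strict(value=1).cache_hit = True   # raises ValueError: "Strict" object has no field "cache_hit"
```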

@chenmoneygithub (Collaborator) left a comment

Thanks for the PR! Both fixes look good to me, but could you please separate them into two PRs? Ideally we want to keep a clean commit history, with each PR doing only one thing.

Please also add a unit test for the responses API conversion. Thank you again!
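
For example, such a unit test could look roughly like this (assuming a helper along the lines of the `convert_response_format` sketch above; all names are illustrative):

```python
import pydantic


class Answer(pydantic.BaseModel):
    text: str
    score: float


def test_pydantic_response_format_is_converted():
    request = {"model": "openai/gpt-4o", "response_format": Answer}
    convert_response_format(request)  # hypothetical helper sketched above
    assert "response_format" not in request
    fmt = request["text"]["format"]
    assert fmt["type"] == "json_schema"
    assert fmt["schema"] == Answer.model_json_schema()
```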

```python
# Convert `response_format` to `text.format` for Responses API
if "response_format" in request:
    response_format = request.pop("response_format")
    if isinstance(response_format, type) and issubclass(response_format, pydantic.BaseModel):
        ...  # conversion elided in this excerpt
```
@chenmoneygithub (Collaborator):

Instead of one more serialization, can we use the `text_format` arg?

```python
response_format = request.pop("response_format")
request["text_format"] = response_format
```

@Olocool17 (Contributor, Author):

Unfortunately not: I tried just the `text_format` arg at first, but litellm refused to play nice with it.
Serializing it ourselves might be a bit more verbose, but being explicit is warranted here, since we already modify the Pydantic model's `model_json_schema` manually in `JSONAdapter` anyway.

@Olocool17 (Contributor, Author) commented Dec 15, 2025

Thanks! I'll close this PR and spin up two new ones (see #9130 and #9131).

Closes: [Bug] Structured outputs always fail when using responses model (#9122)