Fix responses structured outputs + cache retrieval error #9123
Closes #9122. See the linked issue for an extended problem description + minimal reproducible example.
## Structured outputs fix
My proposed fix mutates the `response_format` request argument into the proper form if it is passed as a Pydantic model. Documentation around this is somewhat unreliable: the relevant litellm docs are outdated/incorrect, and the OpenAI docs seem to imply that passing a Pydantic model into `text.format` is valid, but doing so causes JSON serialization errors down the line. So far, the specific formulation presented in this PR has caused no issues in our testing.
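For illustration, a minimal sketch of the kind of coercion involved (the helper name and the exact target dict shape are my assumptions, not the PR's verbatim code; the `type`/`name`/`schema`/`strict` keys follow the Responses API's `text.format` structure):

```python
from pydantic import BaseModel


def coerce_response_format(response_format):
    """Hypothetical sketch: turn a Pydantic model class into the plain
    JSON-serializable dict the Responses API expects under text.format."""
    if isinstance(response_format, type) and issubclass(response_format, BaseModel):
        return {
            "type": "json_schema",
            "name": response_format.__name__,
            # Note: strict mode requires additionalProperties: false
            # throughout the schema; a real fix may need to post-process
            # the output of model_json_schema() accordingly.
            "schema": response_format.model_json_schema(),
            "strict": True,
        }
    # Already a dict (or None): pass through unchanged.
    return response_format
```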
## Cache retrieval fix
While implementing the fix above, I additionally stumbled on an error pertaining to responses models when using caching.
When an item is successfully retrieved from the cache, an ad-hoc attribute `.cache_hit` is created and set to `True` on the `response` object. Unfortunately, this is only possible for `litellm.ModelResponse` (the response type for chat models) and not for `litellm.ResponsesAPIResponse` (the response type for responses models), because the latter is a Pydantic model without `extra="allow"` set in its config.

My proposed fix is to simply remove this ad-hoc attribute altogether, since it is actually superfluous: the `response.usage` attribute is cleared on a cache hit anyway, which makes `settings.usage_tracker.add_usage(self.model, dict(results.usage))` a no-op.
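To illustrate why the attribute assignment fails, here is a standalone sketch (the model classes below are stand-ins mirroring the relevant config difference, not litellm's actual definitions):

```python
from pydantic import BaseModel, ConfigDict


class ChatResponseLike(BaseModel):
    # Stand-in for litellm.ModelResponse: extra attributes are tolerated.
    model_config = ConfigDict(extra="allow")
    id: str


class ResponsesAPIResponseLike(BaseModel):
    # Stand-in for litellm.ResponsesAPIResponse: no extra="allow" in its config.
    id: str


chat = ChatResponseLike(id="chatcmpl-123")
chat.cache_hit = True  # fine: stored via __pydantic_extra__

resp = ResponsesAPIResponseLike(id="resp_123")
try:
    resp.cache_hit = True
except ValueError as e:
    print(e)  # "ResponsesAPIResponseLike" object has no field "cache_hit"
```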