Environment details
Programming language: Python
OS: Ubuntu 24.04
Language runtime version: Python 3.12
Package version: google-genai 1.56.0
Steps to reproduce
- Initiate a batch-mode request using Gemini to generate a JSON object.
- Observe the model entering a repetitive hallucination loop.
- The model hits the max_output_tokens limit due to the loop, resulting in a truncated and corrupted JSON string.
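The reproduction steps above can be sketched as follows. This is a hypothetical, minimal sketch: the schema, key, prompt, and token limit are placeholders (the schema in the actual report is considerably more complex), and the JSONL field names reflect my recollection of the Gemini batch request format, so they should be checked against the official docs.

```python
import json

# Hypothetical, simplified response schema standing in for the complex one
# used in the actual report.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "tags"],
}

def build_batch_line(key: str, prompt: str) -> str:
    """Serialize one batch request (one JSONL line) asking for structured JSON output."""
    request = {
        "key": key,
        "request": {
            "contents": [{"parts": [{"text": prompt}]}],
            "generationConfig": {
                "responseMimeType": "application/json",
                "responseSchema": RESPONSE_SCHEMA,
                "maxOutputTokens": 2048,  # the limit the looping responses run into
            },
        },
    }
    return json.dumps(request)

line = build_batch_line("request-1", "Extract title and tags from the document.")
```

Each such line goes into the batch input file; the failures below show up when the batch job's responses are read back.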
Description of the Issue
When processing batch-mode requests, Gemini frequently enters a repetitive hallucination loop (roughly 70% of the requests in the JSON file are affected). This behavior leads to the following sequence of failures:
- Token Exhaustion: The model continues repeating text until it reaches the maximum output token limit.
- Data Corruption: Because the generation is cut off at the token limit, the resulting JSON object is incomplete and cannot be parsed.
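The data-corruption failure mode is easy to demonstrate in isolation. The string below is illustrative, not taken from an actual response: generation cut off at the token limit ends mid-structure, so the closing braces and brackets are missing and the standard parser rejects the output.

```python
import json

# Illustrative (not an actual model response): a generation cut off at
# max_output_tokens stops mid-structure, leaving brackets unclosed.
truncated = '{"title": "Example", "tags": ["a", "b"'

def is_parseable(text: str) -> bool:
    """Return True if text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# A truncated response fails to parse, so the entire batch item is lost.
```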
Crucially, this only happens in batch mode; it never occurs for single requests via aio.models.generate_content. For context, I use a fairly complicated JSON schema for structured output, though I doubt the schema is relevant, since the same schema works without problems in single requests.
Attempted Fixes
In an effort to mitigate this, I have tried the following without success:
- Increasing Temperature: I raised the temperature to encourage more varied output and help the model "break out" of repetitive loops.
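For concreteness, the mitigation attempt amounts to a per-request generation config like the following. The specific temperature value is hypothetical; the point is only where the override lives relative to the structured-output settings.

```python
# Hypothetical per-request generation config for the mitigation attempt:
# temperature raised in the hope of breaking the repetition loop, while
# structured JSON output and the token limit stay as before.
generation_config = {
    "temperature": 1.2,                      # raised from the original value
    "responseMimeType": "application/json",  # structured output still requested
    "maxOutputTokens": 2048,
}
```

The higher temperature had no observable effect on the looping behavior.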
Expected Behavior
The model should generate a valid, well-formed JSON object within a reasonable number of tokens, without entering repetitive loops.