add processing agentstate #4518
base: main
    # reset the `created_at` to the start time of the tool execution
    fnc_call.created_at = time.time()
    speech_handle._item_added([fnc_call])
    self._session._update_agent_state("processing")
What if the tool call has a text message alongside it, or there is a session.say inside the tool call? The state may go thinking -> speaking -> processing (while the agent is still speaking), or thinking -> processing -> speaking (while the function tool is still running).
The main problem is that function call execution can run in parallel with other states. I am not sure what the original purpose of adding this state is, but we already have a function_tools_executed event. What about adding a function_tools_started event? Would that solve the issue?
Thanks for the comment.
I mentioned in #4460 that I can already fire that event from server to worker, and simulate that. So event-based handling isn't an issue.
The problem is that our client side also communicates with LiveKit Cloud for the agent state, and that state will still be thinking while a tool is being used. Sure, I could communicate between the client and my backend/worker, but that would circumvent the entire LiveKit agent state management.
@MonkeyLeeT What do you think of supporting something like a ToolState, which would switch between executing and idle? Perhaps this could be an AgentSession property? Let me know if this could address your use case!
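To make the ToolState idea concrete, here is a minimal sketch of a tracker that flips between executing and idle independently of the agent state. Everything in it (`ToolStateTracker`, `on_change`, the `executing()` context manager) is hypothetical and not part of livekit-agents; it only illustrates how parallel tool calls could be counted so that idle is reported once the last one finishes.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Literal

ToolState = Literal["idle", "executing"]


class ToolStateTracker:
    """Hypothetical helper, not a livekit-agents API."""

    def __init__(self, on_change: Callable[[str], None]) -> None:
        self._on_change = on_change
        self._running = 0  # number of tool calls currently in flight

    @property
    def state(self) -> ToolState:
        return "executing" if self._running > 0 else "idle"

    @contextmanager
    def executing(self) -> Iterator[None]:
        # Notify only on the idle -> executing edge.
        self._running += 1
        if self._running == 1:
            self._on_change("executing")
        try:
            yield
        finally:
            # Notify only when the last running tool finishes.
            self._running -= 1
            if self._running == 0:
                self._on_change("idle")
```

Because the tracker is separate from the agent state, a tool that runs while the agent is speaking does not need to overwrite speaking with processing.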
That could work! As long as that's synced via livekit cloud so any client connecting to that can get this.
@MonkeyLeeT you can sync the tool state to the client via room.local_participant.set_attributes; for example, the agent_state is updated in https://github.com/livekit/agents/blob/[email protected]/livekit-agents/livekit/agents/voice/room_io/room_io.py#L425-L429.
I think you can track the tool state in the function tool itself and sync it to the client via the set_attributes API.
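A rough sketch of that suggestion, assuming the participant object exposes an async `set_attributes(dict)` method as `room.local_participant` does in the Python SDK. The attribute key `tool_state`, the wrapper `run_with_tool_state`, and the `FakeParticipant` stand-in are all hypothetical names used for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Protocol, TypeVar

T = TypeVar("T")

# Hypothetical attribute key; use whatever name your client expects.
TOOL_STATE_ATTRIBUTE = "tool_state"


class SupportsSetAttributes(Protocol):
    async def set_attributes(self, attributes: Dict[str, str]) -> None: ...


async def run_with_tool_state(
    participant: SupportsSetAttributes,
    tool: Callable[[], Awaitable[T]],
) -> T:
    """Publish "executing" before the tool body runs and "idle" afterwards."""
    await participant.set_attributes({TOOL_STATE_ATTRIBUTE: "executing"})
    try:
        return await tool()
    finally:
        # Runs even if the tool raises, so clients never stay on "executing".
        await participant.set_attributes({TOOL_STATE_ATTRIBUTE: "idle"})


class FakeParticipant:
    """Stand-in for room.local_participant so the sketch runs offline."""

    def __init__(self) -> None:
        self.updates: List[Dict[str, str]] = []

    async def set_attributes(self, attributes: Dict[str, str]) -> None:
        self.updates.append(dict(attributes))


async def lookup_weather() -> str:
    await asyncio.sleep(0)  # placeholder for real tool work
    return "sunny"
```

Inside a real function tool you would pass the job's `room.local_participant` instead of `FakeParticipant`.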
Would the client side get an event when an attribute is updated?
Maybe you can listen for the "participant_attributes_changed" event?
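A sketch of what such a listener could look like on a Python client. It assumes the handler receives the changed attributes as a dict followed by the participant, which is my understanding of the Python SDK's callback shape; other client SDKs may differ. The key `tool_state` and the factory `make_tool_state_handler` are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical attribute key; it must match whatever the agent publishes.
TOOL_STATE_ATTRIBUTE = "tool_state"


def make_tool_state_handler(
    on_tool_state: Callable[[str], None],
) -> Callable[[Dict[str, str], object], None]:
    """Build a handler that forwards only tool-state changes."""

    def handler(changed_attributes: Dict[str, str], participant: object) -> None:
        # Ignore updates to unrelated attributes such as the agent state.
        if TOOL_STATE_ATTRIBUTE in changed_attributes:
            on_tool_state(changed_attributes[TOOL_STATE_ATTRIBUTE])

    return handler


# Assumed registration on a connected room object:
# room.on("participant_attributes_changed", make_tool_state_handler(print))
```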
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e68501139
    if exe_task.done():
        self._session._update_agent_state("listening")
Restore listening state after long tool runs
In _realtime_generation_task_impl, the state flips back to listening only when exe_task is already done at audio playout time. If a tool execution outlasts playout and does not require a follow‑up reply, exe_task completes later but there is no subsequent state update in this function, so the agent can remain stuck in processing/thinking until the next turn. This makes state consumers (e.g., room attributes or UI) believe the agent is still busy even though tool execution finished; consider updating the state after await exe_task when no reply is generated.
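The suggested fix can be modelled in isolation as follows. This is a simplified sketch, not the actual livekit-agents code: `FakeSession`, `finish_turn`, and the `reply_required` flag are hypothetical stand-ins for the session, the tail of the generation task, and the "no follow-up reply" condition.

```python
import asyncio
from typing import List


class FakeSession:
    """Hypothetical stand-in that records agent state changes."""

    def __init__(self) -> None:
        self.state = "processing"
        self.history: List[str] = []

    def update_agent_state(self, state: str) -> None:
        self.state = state
        self.history.append(state)


async def finish_turn(
    session: FakeSession, exe_task: "asyncio.Task[None]", reply_required: bool
) -> None:
    """Called after audio playout; waits for tools before restoring state."""
    if not exe_task.done():
        # The tool outlasted playout, so wait instead of skipping the update.
        await exe_task
    if not reply_required:
        session.update_agent_state("listening")


async def demo() -> FakeSession:
    session = FakeSession()

    async def slow_tool() -> None:
        await asyncio.sleep(0.01)  # still running when playout ends

    exe_task = asyncio.create_task(slow_tool())
    await finish_turn(session, exe_task, reply_required=False)
    return session
```

Checking `exe_task.done()` only once at playout time is what leaves the state stuck; awaiting the task first means the update happens regardless of which finishes last.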
The flow for a tool call goes:
listening -> thinking -> processing -> thinking -> speaking -> listening
The first thinking state change is for making the tool call, and the second is for processing the tool call output.