add AgentConfigUpdate & initial judges
#4547
base: main
Conversation
📝 Walkthrough
This PR introduces a structured evaluation framework for agent conversations through a new `evals` module (a `Judge` class, `JudgmentResult`, and built-in judges) and a new `AgentConfigUpdate` chat item for recording instruction and tool changes.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Judge
    participant LLM
    Client->>Judge: evaluate(chat_ctx, reference)
    Judge->>Judge: format_chat_ctx(chat_ctx)
    Judge->>Judge: build_composite_prompt(instructions, formatted_ctx, reference)
    Judge->>LLM: stream(prompt)
    LLM-->>Judge: response chunks
    Judge->>Judge: detect PASS/FAIL in response
    Judge->>Judge: create JudgmentResult(passed, reasoning)
    Judge-->>Client: JudgmentResult
```
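A rough sketch of how a caller might drive this flow. The `evaluate(chat_ctx, reference)` call is taken from the diagram and `task_completion_judge` from the review below; whether the factory needs an LLM instance, and the exact constructors and signatures, are assumptions to verify against `judge.py`:

```python
import asyncio

from livekit.agents.evals import task_completion_judge
from livekit.agents.llm import ChatContext


async def main() -> None:
    # Build a tiny conversation to evaluate.
    chat_ctx = ChatContext.empty()
    chat_ctx.add_message(role="user", content="Book me a table for two at 7pm.")
    chat_ctx.add_message(role="assistant", content="Done, your table for two is booked at 7pm.")

    # Factory from the new evals module; it may require an LLM instance
    # (e.g. task_completion_judge(llm=...)) - check judge.py for the real signature.
    judge = task_completion_judge()

    # Per the diagram: format the context, prompt the LLM, scan the streamed
    # response for a PASS/FAIL verdict, and wrap it in a JudgmentResult.
    result = await judge.evaluate(chat_ctx, reference="The booking must be confirmed.")
    print(result.passed, result.reasoning)


asyncio.run(main())
```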
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/llm/chat_context.py`:
- Around line 213-229: AgentConfigUpdate is missing the agent_id field, so
callers that set agent_id and the formatter _format_chat_ctx that reads it can
fail; add an agent_id: str | None = None (or appropriate type) to the
AgentConfigUpdate model declaration so the value is preserved and safe to
access, and ensure the new field is included before PrivateAttr/_tools in the
AgentConfigUpdate class so ChatItem (which unions AgentConfigUpdate) will carry
agent_id correctly.
♻️ Duplicate comments (2)
livekit-agents/livekit/agents/voice/agent_activity.py (1)
314-323: `agent_id` field missing in `AgentConfigUpdate` (covered in model)
This block passes `agent_id`; ensure the model defines it so the value isn’t lost.
livekit-agents/livekit/agents/evals/judge.py (1)
16-40: Guard `agent_id` access for config updates
`agent_id` is referenced here; ensure the model defines it (see `AgentConfigUpdate`).
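One defensive pattern the formatter could use until the model declares the field. Everything below is illustrative only: the stand-in dataclass and function name are not the real implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class _FakeConfigUpdate:
    """Stand-in for AgentConfigUpdate, for this illustration only."""

    instructions: Optional[str] = None
    # note: no agent_id field, mirroring the current model


def format_config_update(item: object) -> str:
    # getattr with a default keeps the formatter safe whether or not the
    # model declares agent_id.
    agent_id = getattr(item, "agent_id", None)
    instructions = getattr(item, "instructions", None)
    line = "[agent config updated" + (f" by {agent_id}" if agent_id else "") + "]"
    if instructions:
        line += f" new instructions: {instructions}"
    return line


print(format_config_update(_FakeConfigUpdate(instructions="Be brief.")))
# -> [agent config updated] new instructions: Be brief.
```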
🧹 Nitpick comments (3)
livekit-agents/livekit/agents/voice/agent_activity.py (1)
331-349: Stabilize tool diff ordering for deterministic updates
`set` → `list` produces nondeterministic ordering; sorting keeps logs/tests stable.
🔧 Proposed fix

```diff
-        tools_added = list(new_tool_names - old_tool_names) or None
-        tools_removed = list(old_tool_names - new_tool_names) or None
+        tools_added = sorted(new_tool_names - old_tool_names) or None
+        tools_removed = sorted(old_tool_names - new_tool_names) or None
```
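A standalone illustration of the ordering issue, with made-up tool names:

```python
old_tool_names = {"search_web", "get_weather", "book_table"}
new_tool_names = {"get_weather", "book_table", "send_email"}

# list(set - set) has no guaranteed order, so repeated runs can log the same
# diff in different orders and break snapshot-style tests.
tools_added = sorted(new_tool_names - old_tool_names) or None    # ["send_email"]
tools_removed = sorted(old_tool_names - new_tool_names) or None  # ["search_web"]
print(tools_added, tools_removed)
```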
livekit-agents/livekit/agents/evals/judge.py (2)
8-13: Add a Google-style class docstring for `JudgmentResult`
As per coding guidelines, please add Google-style docstrings.
🔧 Proposed fix

```diff
 @dataclass
 class JudgmentResult:
+    """Result of a judge evaluation.
+
+    Attributes:
+        passed: Whether the evaluation passed.
+        reasoning: Model reasoning for the judgment.
+    """
     passed: bool
     """Whether the evaluation passed."""
     reasoning: str
     """Chain-of-thought reasoning for the judgment."""
```
43-87: Make PASS/FAIL parsing deterministic
`rfind` can be tripped by "PASS"/"FAIL" appearing in the reasoning. Require a final verdict line and parse only that.
🔧 Proposed fix

```diff
         prompt_parts.extend(
             [
                 "",
                 "Does the conversation meet the criteria? Don't overthink it.",
-                "Explain your reasoning step by step, then answer Pass or Fail.",
+                "Provide a brief justification, then output a final line with exactly PASS or FAIL.",
             ]
         )
@@
-        response = "".join(response_chunks)
-
-        response_upper = response.upper()
-        pass_pos = response_upper.rfind("PASS")
-        fail_pos = response_upper.rfind("FAIL")
-        passed = pass_pos > fail_pos if pass_pos != -1 else False
+        response = "".join(response_chunks).strip()
+        last_line = response.splitlines()[-1].strip().upper() if response else ""
+        passed = last_line.startswith("PASS")
```
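To see the failure mode, compare the two parsing strategies on a made-up model response whose verdict line also mentions "pass":

```python
response = (
    "The task was never completed, so this conversation should fail.\n"
    "FAIL - the agent did not pass the completion criteria."
)

# rfind-based parsing (current behavior): the later "pass" in the explanation
# outranks the FAIL verdict, so the judge wrongly reports a pass.
upper = response.upper()
pass_pos = upper.rfind("PASS")
fail_pos = upper.rfind("FAIL")
rfind_passed = pass_pos > fail_pos if pass_pos != -1 else False  # True (wrong)

# last-line parsing (proposed): only the final verdict line is consulted.
last_line = response.strip().splitlines()[-1].strip().upper()
line_passed = last_line.startswith("PASS")  # False (intended)

print(rfind_passed, line_passed)
```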
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- livekit-agents/livekit/agents/__init__.py
- livekit-agents/livekit/agents/evals/__init__.py
- livekit-agents/livekit/agents/evals/judge.py
- livekit-agents/livekit/agents/llm/__init__.py
- livekit-agents/livekit/agents/llm/chat_context.py
- livekit-agents/livekit/agents/voice/agent_activity.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
- livekit-agents/livekit/agents/evals/__init__.py
- livekit-agents/livekit/agents/llm/__init__.py
- livekit-agents/livekit/agents/__init__.py
- livekit-agents/livekit/agents/voice/agent_activity.py
- livekit-agents/livekit/agents/evals/judge.py
- livekit-agents/livekit/agents/llm/chat_context.py
🧬 Code graph analysis (6)
livekit-agents/livekit/agents/evals/__init__.py (1)
livekit-agents/livekit/agents/evals/judge.py (6)
Judge(43-87), JudgmentResult(9-13), accuracy_judge(112-128), safety_judge(151-168), task_completion_judge(90-109), tool_use_judge(131-148)
livekit-agents/livekit/agents/llm/__init__.py (1)
livekit-agents/livekit/agents/llm/chat_context.py (1)
AgentConfigUpdate(213-224)
livekit-agents/livekit/agents/__init__.py (1)
livekit-agents/livekit/agents/llm/chat_context.py (2)
AgentConfigUpdate(213-224), AgentHandoff(205-210)
livekit-agents/livekit/agents/voice/agent_activity.py (2)
livekit-agents/livekit/agents/llm/tool_context.py (4)
get_fnc_tool_names(283-292), tools(44-46), ToolContext(295-418), flatten(320-325)
livekit-agents/livekit/agents/llm/chat_context.py (1)
AgentConfigUpdate(213-224)
livekit-agents/livekit/agents/evals/judge.py (4)
livekit-agents/livekit/agents/voice/agent_activity.py (1)
llm(2815-2819)
livekit-agents/livekit/agents/llm/chat_context.py (7)
ChatContext(232-670), items(241-242), items(245-246), text_content(164-173), copy(297-354), copy(690-691), add_message(248-281)
livekit-agents/livekit/agents/voice/agent_session.py (1)
output(394-395)
livekit-agents/livekit/agents/voice/agent.py (1)
instructions(99-104)
livekit-agents/livekit/agents/llm/chat_context.py (2)
livekit-agents/livekit/agents/utils/misc.py (1)
shortuuid(21-22)
livekit-agents/livekit/agents/llm/tool_context.py (1)
Tool(31-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: livekit-plugins-cartesia
- GitHub Check: livekit-plugins-deepgram
- GitHub Check: livekit-plugins-inworld
- GitHub Check: livekit-plugins-openai
- GitHub Check: livekit-plugins-elevenlabs
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
🔇 Additional comments (11)
livekit-agents/livekit/agents/llm/__init__.py (2)
1-15: LGTM: `AgentConfigUpdate` re-exported from `llm`
55-69: LGTM: `__all__` updated to include `AgentConfigUpdate`
livekit-agents/livekit/agents/__init__.py (2)
39-49: LGTM: top-level imports updated
117-156: LGTM: `__all__` export list updated
livekit-agents/livekit/agents/voice/agent_activity.py (2)
18-25: LGTM: tool diff helpers wired in
603-611: LGTM: initial config snapshot recorded
livekit-agents/livekit/agents/evals/judge.py (4)
90-109: LGTM: task completion judge instructions are clear
112-128: LGTM: accuracy judge instructions look solid
131-148: LGTM: tool-use judge instructions look solid
151-167: LGTM: safety judge instructions look solid
livekit-agents/livekit/agents/evals/__init__.py (1)
1-17: LGTM: judge APIs re-exported
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```diff
 class AgentConfigUpdate(BaseModel):
     id: str = Field(default_factory=lambda: utils.shortuuid("item_"))
     type: Literal["agent_config_update"] = Field(default="agent_config_update")

     instructions: str | None = None
     tools_added: list[str] | None = None
     tools_removed: list[str] | None = None

     created_at: float = Field(default_factory=time.time)

     _tools: list[Tool] = PrivateAttr(default_factory=list)
     """Full tool definitions (in-memory only, not serialized)."""


 ChatItem = Annotated[
-    Union[ChatMessage, FunctionCall, FunctionCallOutput, AgentHandoff], Field(discriminator="type")
+    Union[ChatMessage, FunctionCall, FunctionCallOutput, AgentHandoff, AgentConfigUpdate],
+    Field(discriminator="type"),
 ]
```
Add agent_id to AgentConfigUpdate to match callers and formatter
Call sites set agent_id and _format_chat_ctx reads it, but the model doesn’t declare it. Add the field so the value is preserved and attribute access is safe.
🔧 Proposed fix

```diff
 class AgentConfigUpdate(BaseModel):
     id: str = Field(default_factory=lambda: utils.shortuuid("item_"))
     type: Literal["agent_config_update"] = Field(default="agent_config_update")
+    agent_id: str | None = None
     instructions: str | None = None
     tools_added: list[str] | None = None
     tools_removed: list[str] | None = None
```

🤖 Prompt for AI Agents
In `@livekit-agents/livekit/agents/llm/chat_context.py` around lines 213 - 229,
AgentConfigUpdate is missing the agent_id field, so callers that set agent_id
and the formatter _format_chat_ctx that reads it can fail; add an agent_id: str
| None = None (or appropriate type) to the AgentConfigUpdate model declaration
so the value is preserved and safe to access, and ensure the new field is
included before PrivateAttr/_tools in the AgentConfigUpdate class so ChatItem
(which unions AgentConfigUpdate) will carry agent_id correctly.
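Once the field lands, call sites can carry the owning agent through the chat item. A sketch assuming the proposed declaration above; the agent id value is made up:

```python
from livekit.agents.llm import AgentConfigUpdate

# agent_id is only preserved once the model declares it (see the fix above);
# without the field, Pydantic would silently drop the value.
update = AgentConfigUpdate(
    agent_id="booking-agent",
    instructions="Handle reservations only.",
    tools_added=["book_table"],
)

# The formatter can then rely on the attribute instead of getattr fallbacks.
assert update.agent_id == "booking-agent"
```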
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.