Red Teaming functionality using function middleware #1228

kshiarl1 · 2025-12-05T14:08:48Z

Description

Closes

Red Teaming functionality using function middleware.

This PR introduces the necessary ingredients to perform AI red teaming on an agent workflow.
These are:

Intercepting function and adding attack payloads (prompt injections)
Evaluating the effect of these payloads on other parts of the workflow.
Attack scenario definition and orchestration.

Key Features

Implements RedTeamingMiddleware.

Special middleware that can replace inputs and output strings and floats to functions.
Configurable payloads to replace i/o with.
Can search within function schema to find the right fields to introduce the payload.

Implements RedTeamingEvaluationRunner.

Allows the user to run red teaming using different threat scenarios.
Threat scenarios are implemented via the RedTeamingEvaluationRunnerConfig.
Processes, summarizes and saves results for ease of use post evaluation.

Implements a RedTeamingEvaluator.

Can filter intermediate steps to find specific function inputs / outputs.
Uses LLM as a judge to determine whether an attack is successful.
Accepts scenario specific instructions for evaluation.

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

copy-pr-bot · 2025-12-05T14:08:52Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-12-05T14:09:05Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (2)

develop
release/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Closes - Brought out agents to emphasize on what they offer and how users can use them - Removed any repeated details in their overview sections / features sections - Rephrased verbiage for clarity - Ran docs through cursor to ensure it meets style guide requirements ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Added docs for new agents: ReAct, Reasoning, ReWOO, Router, Tool Calling, and Responses API & Agent * Added Sequential Executor documentation * Consolidated workflow navigation to a single About page and updated related links across docs and examples * Rewrote many workflow pages to emphasize configuration-first guides with YAML examples and clearer installation instructions * Updated quick-start link labels and multiple example README links for consistency <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> Authors: - https://github.com/lvojtku - David Gardner (https://github.com/dagardner-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: NVIDIA#1173

This PR: 1. Adds an a2a subpackage with support for a2a client and servers. This is base support which models the remote-agent as a tool. 2. Adds examples demonstrating NAT workflows with NAT A2A servers (math assistant) and external A2A servers (currency agent) 3. Adds a2a CLI commands for troubleshooting 4. Adds docs 5. Adds unit and end2end tests Pending items that will be addressed in a separate PR: 1. Data part and file part support 2. Auth 3. Unified Telemetry and Logging **Sever CLI** 1. Start server using the NAT a2a frontend: ``` nat a2a serve --config_file examples/getting_started/simple_calculator/configs/config.yml ``` **Client CLI:** 1. agent card discovery ``` nat a2a client discover --url http://localhost:10000 ``` <img width="1114" height="611" alt="image" src="https://github.com/user-attachments/assets/d7edca10-5bf9-4804-b1b0-41a337580b2c" /> 2. high level agent call ``` nat a2a client call --url http://localhost:10000 --message "Is the product of 2 and 4 greater than the hour of the day" ``` ``` Query: Is the product of 2 and 4 greater than the hour of the day No, the product of 2 and 4 is less than the hour of the day. ``` ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Agent-to-Agent (A2A) support: client & server integrations, workflow publishing, and CLI commands (serve, discover, info, skills, call). * **Documentation** * Comprehensive A2A docs: protocol overview, client/server/CLI guides, installation, configuration, examples, usage samples, and troubleshooting. * **Examples** * Two end-to-end A2A examples with READMEs, configs, sample queries, and project setup. * **Tests** * Client, server, agent-card generation, and end-to-end workflow tests. * **Chores** * Packaging, license, CLI wiring, and project configuration for new A2A package. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> Authors: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - David Gardner (https://github.com/dagardner-nv) - Will Killian (https://github.com/willkill07) - Yuchen Zhang (https://github.com/yczhang-nv) URL: NVIDIA#1147

src/nat/cli/cli_utils/config_override.py

src/nat/cli/commands/red_teaming/red_teaming_utils.py

ericevans-nv · 2025-12-07T21:26:34Z

src/nat/eval/evaluator/evaluator_model.py

    score: typing.Any  # float or any serializable type
    reasoning: typing.Any

+EvaluatorTemplateItem = TypeVar('EvaluatorTemplateItem', bound=EvalOutputItem)


Please revert the TypeVar and Generic[EvaluatorTemplateItem] changes to EvalOutput. This added complexity is unused as the simpler non-generic class works fine since subclasses like RedTeamingEvalOutputItem already inherit from EvalOutputItem.

This does not work. Here is the problem. I am happy to discuss a different solution but the problem is there and it prevent extensibility of the whole evaluation framework so it needs to be solved:

If EvalOutput is constructed with RedTeamingEvalOutputItem instead of EvalOutputItem and then model_dump_json is called on EvalOutput (in order to save the result) it will dump only the fields in EvalOutputItem. This model_dump_json operation happens in EvaluationRun.

NeMo-Agent-Toolkit/src/nat/eval/evaluate.py

Line 351 in 161c463

# create json content using the evaluation results

The simplest solution is to change the typing inside EvalOutput to be list[Any] instead of list[EvalOutputItem]

If you would like to verify yourself that there is a problem try running this standalone script:

from pydantic import BaseModel
from typing import List

class EvalOutputItem(BaseModel):
base_field: str

class NewEvalOutputItem(EvalOutputItem):
new_field: str # This gets lost during serialization

class EvalOutput(BaseModel):
items: List[EvalOutputItem] # Type annotation limits serialization

#This works fine
new_item = NewEvalOutputItem(base_field="test", new_field="extra")
eval_output = EvalOutput(items=[new_item])

#But this loses the new_field
json_str = eval_output.model_dump_json()
#Only serializes EvalOutputItem fields, not NewEvalOutputItem fields
print(json_str)

I think what we are looking for here is:

eval_output_items: list[SerializeAsAny[EvalOutputItem]]

Section: Serializing with duck typing 🦆¶

src/nat/eval/red_teaming_evaluator/data_models.py

ericevans-nv · 2025-12-07T21:35:03Z

src/nat/eval/red_teaming_evaluator/register.py

+    '''Register red teaming evaluator'''
+    from .evaluate import RedTeamingEvaluator
+
+    llm = await builder.get_llm(config.llm_name, wrapper_type=LLMFrameworkEnum.LANGCHAIN)


The hard-coded wrapper_type=LLMFrameworkEnum.LANGCHAIN is fragile, the register function shouldn't need to know what framework the evaluator uses internally. Consider following the pattern from the DynamicFunctionMiddleware implementation, which passes the builder instance and patches get_llm to discover and wrap components at runtime. This decouples the registration from the implementation details and allows the evaluator to request whatever framework it needs without the register function making assumptions.

I took this from here:

NeMo-Agent-Toolkit/src/nat/eval/tunable_rag_evaluator/register.py

Line 40 in 161c463

async def register_tunable_rag_evaluator(config: TunableRagEvaluatorConfig, builder: EvalBuilder):

and it is also the case here:

NeMo-Agent-Toolkit/src/nat/eval/rag_evaluator/register.py

Line 122 in 161c463

llm = await builder.get_llm(config.llm_name, wrapper_type=LLMFrameworkEnum.LANGCHAIN)

so basically all LLMJudge evaluators in NAT use the same wrapper.

could you point me out to what you mean exactly?

src/nat/eval/runners/red_teaming_runner/__init__.py

ericevans-nv · 2025-12-07T21:50:49Z

src/nat/eval/runners/red_teaming_runner/config.py

+logger = logging.getLogger(__name__)
+
+# Fixed LLM name for red teaming evaluator
+RED_TEAMING_EVALUATOR_LLM_NAME = "red_teaming_evaluator_llm"


Avoid injecting hardcoded names into already defined components. It's acceptable to add a new component with its full configuration (e.g., adding a new evaluator or LLM), but if a component is already configured by the user, the runner should only modify its parameters—not override its name or type.

This is not yet in the full configuration. The RedTeamingEvaluationRunner constructs the full configuration it just forces the name.

ericevans-nv · 2025-12-07T22:04:27Z

src/nat/eval/runners/red_teaming_runner/config.py

+        return self
+
+    @classmethod
+    def rebuild_annotations(cls) -> bool:


The base workflow file should remain the source of truth for type information. The current RedTeamingRunnerConfig creates a parallel type system by defining fields like evaluator_llm: LLMBaseConfig and requiring rebuild_annotations() to stay in sync with the workflow's type registry—this is fragile and overcomplicated. Instead, the scenarios schema should work with the base workflow through mutations (path-value changes) rather than duplicating its type definitions. If a user has an LLM configured in their workflow, the scenario should reference it by name instead of injecting a new one with a hardcoded key. This would eliminate the need for rebuild_annotations(), remove type coupling between scenarios and workflows, respect user-configured values, and avoid duplicating type registration efforts. While this approach works for the red teaming evaluator since you own that component, we should not couple concerns this way for other components. Let's discuss a better pattern for converting scenarios to workflow configurations.

Happy do discuss a different solution. I feel like the current solution is best for the user as it makes the components of red teaming much more explicit and explicitly ties the evaluator with the scenarios (as is the case). I don't see how this overcomplicates things as these operations are strictly performed for the red teaming workflow that can only be done using the red-team command.

ericevans-nv · 2025-12-07T22:08:14Z

src/nat/eval/runners/red_teaming_runner/runner.py

+logger = logging.getLogger(__name__)
+
+
+class RedTeamingRunner:


RedTeamingRunner should inherit from MultiEvaluationRunner rather than being a standalone class that wraps it internally. The current design instantiates MultiEvaluationRunner inside run() and delegates to run_all(), which is essentially inheritance without the benefits. By extending MultiEvaluationRunner, the red teaming runner can override init to handle scenario-based config transformation, call super().run_all() for execution, and add pre/post-processing in an overridden run_all() method. This makes the relationship explicit, reduces indirection, and follows standard patterns for specialized runners.

I could make the inheritance for the RedTeamingEvaluationRunner work. But because injeriting the config is not recommended in my opinion it might be better to not do this. See below.

ericevans-nv · 2025-12-07T22:10:30Z

src/nat/eval/runners/red_teaming_runner/runner.py

+
+    def __init__(
+        self,
+        config: RedTeamingRunnerConfig | None,


Similarly, RedTeamingRunnerConfig should extend MultiEvaluationRunConfig. Currently, the runner manually transforms RedTeamingRunnerConfig into a MultiEvaluationRunConfig before passing it to MultiEvaluationRunner. If the config class extended MultiEvaluationRunConfig, the inheritance hierarchy would be consistent between the configs and runners, and the transformation could be simplified or handled through inheritance rather than manual conversion.

Are you sure about this? The whole point of RedTeamingRunnerConfig is that it can be loaded and instantiated from yaml / json...If i inherit then I will also inherit the mandatory configs field that does not exist in the stored config. The two configs have nothing to do with each other.

- Add RedTeamingEvaluator class with LLM judge support - Add data models (ConditionEvaluationResult, RedTeamingEvalOutputItem) - Add filter conditions for selecting intermediate steps - Add evaluator registration and configuration - Add comprehensive README documentation

- Add RedTeamingEvaluationRunner for orchestrating red team scenarios - Add RedTeamingEvaluationConfig and RedTeamScenarioEntry data models - Add support for loading scenarios from JSON and applying middleware - Export runner classes in runners/__init__.py

- Add complete red teaming example for simple calculator workflow - Add calculator middleware for demonstrating middleware pattern - Add workflow configuration and test datasets - Add red team scenarios JSON with middleware test cases - Add run_redteam_eval.py script for executing evaluations - Add comprehensive README with usage instructions and examples

ericevans-nv

Please make the update I mentioned to use SerializeAsAny instead of the generics, and we can address the LLM-wrapper related changes later. The current changes are fine to merge just don’t pull dev in yet. Once everyone gets their first PR in, we can update the epic branch. Comment /merge no squash to retain your commits.

kshiarl1 requested a review from a team as a code owner December 5, 2025 14:08

lvojtku and others added 2 commits December 5, 2025 17:07

ericevans-nv requested changes Dec 7, 2025

View reviewed changes

kshiarl1 added 11 commits December 8, 2025 14:05

Generic red teaming middleware

705b311

Move all RedTeamingEvalRunner to a simpler red-team cli commnad.

0666e94

addressing the first round of reviews.

d87cbec

force PR fix

898468b

force PR fix

6786a4b

adds logging after the red teaming run

31e0a1a

saving flattened results

15af193

adressing second round of comments

cb873dd

kshiarl1 force-pushed the ks/red_team_1 branch from d58d14f to cb873dd Compare December 8, 2025 13:05

kshiarl1 requested a review from a team as a code owner December 8, 2025 13:05

full scenario definition in red teaming config

3aba281

kshiarl1 force-pushed the ks/red_team_1 branch from e129ad6 to 3aba281 Compare December 9, 2025 19:33

ericevans-nv self-assigned this Dec 9, 2025

ericevans-nv added feature request New feature or request non-breaking Non-breaking change labels Dec 9, 2025

ericevans-nv approved these changes Dec 10, 2025

View reviewed changes

Red Teaming functionality using function middleware #1228

Are you sure you want to change the base?

Red Teaming functionality using function middleware #1228

Uh oh!

Conversation

kshiarl1 commented Dec 5, 2025

Description

Key Features

Implements RedTeamingMiddleware.

Implements RedTeamingEvaluationRunner.

Implements a RedTeamingEvaluator.

By Submitting this PR I confirm:

Uh oh!

copy-pr-bot bot commented Dec 5, 2025

Uh oh!

coderabbitai bot commented Dec 5, 2025

Review skipped

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ericevans-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants