Introduce Finetuning Harness for In-Situ Reinforcement Learning of Agentic Workflows #1221
base: develop
Conversation
This commit updates the `uv.lock` file to increase the revision number and introduce new package overrides for `litellm` and `openai`. Additionally, metadata like upload times for files has been added, providing a more detailed package resolution record. Signed-off-by: dnandakumar-nv <[email protected]>
Deleted the entire OpenPipe ART finetuning backend, including configurations, runner, and registration modules. This removal simplifies the codebase by eliminating components related to the ART backend that are no longer required. Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a `FinetuningConfig` model and integrates fine-tuning support across the codebase. It includes new components such as `Trainer`, `TrainerAdapter`, and `TrajectoryBuilder`, along with methods to manage and bind these components. Fine-tuning is now fully configurable and enabled through the updated workflow builder and configuration interfaces. Signed-off-by: dnandakumar-nv <[email protected]>
Replaced all instances of FinetuningRunner with Trainer to reflect updated naming conventions. This change ensures consistency in naming and improves clarity in describing the finetuning workflow orchestration. Updated relevant imports, class definitions, method signatures, and type hints. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced structures and methods to register, retrieve, and handle fine-tuning trainers, trainer adapters, and trajectory builders. Updated type registry and workflows to support these components, ensuring seamless integration into the fine-tuning process. Signed-off-by: dnandakumar-nv <[email protected]>
Separated `TRAINING` into `TRAINERS`, `TRAINER_ADAPTERS`, and `TRAJECTORY_BUILDERS` for better modularity. Adjusted related models, workflow logic, and configurations to align with the new structure. This improves clarity and supports more granular component handling. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced Trainer, TrainerAdapter, and TrajectoryBuilder components with configurations, mocking methods, and dependency tracking. Enhanced testing and registration workflows to validate the integration of these new components into the builder and workflow lifecycle. Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a reward field to critical configuration models, including `TrainerConfig` and `TrajectoryBuilderConfig`, enabling enhanced reward-based customization. Additionally, it implements extensive unit tests for `Trainer`, `TrajectoryBuilder`, and `TrainerAdapter`, covering initialization, run workflows, and error handling for robust validation. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced comprehensive unit tests for `llama_index_parser`, `langchain_parser`, and `base_parser` under `tests/nat/finetuning/utils/parsers`. These tests cover various edge cases, ensuring proper functionality of content extraction, message parsing, and sequence validation. Added `__init__.py` files to relevant directories for module structure. Signed-off-by: dnandakumar-nv <[email protected]>
Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a new `finetune` command to the CLI, allowing users to run finetuning on workflows using collected trajectories. This includes options for specifying a configuration file and overriding config values. Additionally, the command integrates with existing CLI infrastructure and logs progress during execution. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a unified `FinetuneConfig` model to replace `FinetuningConfig` and `FinetuneRunConfig`, simplifying finetuning-related data handling. Updated runtime, interfaces, and CLI to leverage the new configuration model, including enhanced validation and improved CLI parameter handling. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a `num_epochs` field to the `FinetuneConfig` model for managing training iterations. Updated `Trainer` and `TrainerAdapter` to simplify initialization, validation, and cleanup processes. Refactored logic for managing configuration and validation datasets, streamlining the finetuning execution. Signed-off-by: dnandakumar-nv <[email protected]>
Reorganized imports to adhere to consistent style, prioritizing standard and external libraries first, followed by internal ones. Removed unused imports and redundant whitespace for better readability. Simplified code structure and formatting across multiple files to enhance maintainability and clarity. Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a new subpackage, `nvidia-nat-openpipe-art`, for OpenPipe ART integration in the NVIDIA NeMo Agent Toolkit. It includes configuration files such as `pyproject.toml`, licensing information, and placeholder plugin files for future expansion. Signed-off-by: dnandakumar-nv <[email protected]>
Integrates the nvidia-nat-openpipe-art package into the project, including its dependency on openpipe-art version 0.5.1. Updates uv.lock and pyproject.toml to reflect the new package and its associated configurations. Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces the ARTTrajectoryBuilder and its configuration, enabling trajectory generation for the OpenPipe backend. It includes registration logic, main functionalities, and associated dependency updates. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced ARTTrainerAdapter for integrating with the ART backend, enabling advanced training workflows. Updated configuration classes, registration logic, and initialization to support new functionality, including reward functions in both trajectory builder and trainer adapter. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced ARTTrainer for OpenPipe ART backend, enabling finetuning with curriculum learning and trajectory-based rewards. Updated configuration, registration, and dependencies to support these functionalities, including matplotlib for visualizations. Signed-off-by: dnandakumar-nv <[email protected]>
Changed the default value of the `name` field in the configuration from "trainer_run_4" to "trainer_run". This simplifies the default naming convention for trainer runs. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced comprehensive unit tests for `ARTTrainer` and `ARTTrainerAdapter` to validate initialization, training, trajectory handling, status management, and error handling. These tests ensure robust functionality and resilience of training components. Signed-off-by: dnandakumar-nv <[email protected]>
Introduces a Tic-Tac-Toe game workflow utilizing LangChain for LLM-based agents. Includes game logic, XML move parsing, heuristic evaluation, and a configurable function for LLM interaction. Adds dependency management, configuration, and registration setup for OpenPipe ART integration. Signed-off-by: dnandakumar-nv <[email protected]>
Rolled back `uv.lock` file from revision 3 to revision 1. Removed redundant `upload-time` metadata for sdist and wheel dependencies to simplify the file structure and ensure consistency. Signed-off-by: dnandakumar-nv <[email protected]>
This ensures that the `rl_with_openpipe_art_function` is properly imported, triggering its registration as expected. This change improves the reliability of the registration process. Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a `target_model` attribute to allow filtering of intermediate steps by model name during trajectory processing. Updated the trajectory builder logic to skip steps not matching the specified target model, ensuring more precise fine-tuning when multiple models are involved. Signed-off-by: dnandakumar-nv <[email protected]>
Added `noqa` flags to suppress unused import warnings for ruff and flake8. Introduced a new placeholder data file, `data.json`, with sample question-answer pairs for testing purposes. Signed-off-by: dnandakumar-nv <[email protected]>
Introduce `AccuracyEvaluator` class to evaluate RL outputs based on specific scoring logic. Register the evaluator for use in workflows and include its configuration. Also, update logging in the Tic-Tac-Toe workflow from `info` to `debug` for reduced verbosity. Signed-off-by: dnandakumar-nv <[email protected]>
Updated the RL workflow to handle player model configurations more flexibly, simplified move retries, and refined game termination handling. Added failure penalty in evaluation, adjusted default learning rates, and updated dataset answers to reflect new scoring outputs. Signed-off-by: dnandakumar-nv <[email protected]>
Modified parsing logic to prevent duplicate messages, improved evaluation logic for scoring aborted games, and adjusted configurations for training, evaluation, and model setup. Additional enhancements include refining message history handling and making the board display more user-friendly. Signed-off-by: dnandakumar-nv <[email protected]>
Adjusted "finish_reason" in trainer adapter to "stop" for consistency and updated training beta default to 0 for improved behavior. Additionally, set max_parser_retries default to 0 in RL example to align with updated retry strategy. Signed-off-by: dnandakumar-nv <[email protected]>
Updated configuration files to adjust temperature, concurrency, and dataset paths. Refined game logic to account for player roles. Simplified trajectory message handling and expanded evaluation dataset with additional examples. Signed-off-by: dnandakumar-nv <[email protected]>
Updated the RL evaluator to incorporate intermediate step value computation for richer insights during evaluation. Refactored game logic to enable step-level state tracking with value assignments, improving training feedback. Adjusted configurations for model, evaluator, and workflow parameters to align with the updated logic. Signed-off-by: dnandakumar-nv <[email protected]>
Updated training parameters, including learning rate, temperature, and number of epochs. Adjusted naming conventions for clarity and added `max_parser_retries` to configs. Expanded the dataset with additional entries to improve model training coverage. Signed-off-by: dnandakumar-nv <[email protected]>
This update introduces documentation on the finetuning harness, covering key concepts, reinforcement learning approaches, and curriculum learning. Additionally, it includes a reference for integrating OpenPipe ART for training LLMs using GRPO optimization. Configuration and CLI usage details are also provided. Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a detailed example demonstrating reinforcement learning finetuning with OpenPipe ART to enhance an LLM's Tic-Tac-Toe performance. Includes a comprehensive README explaining setup, workflow, reward function, and best practices, alongside necessary configs, scripts, and assets. Signed-off-by: dnandakumar-nv <[email protected]>
Clarified the purpose of overriding dependencies to prevent regressions caused by nvidia-nat-openpipe-art. Added additional context to emphasize maintaining the status quo for other packages. Signed-off-by: dnandakumar-nv <[email protected]>
Walkthrough

Adds a finetuning subsystem and OpenPipe ART integration: new finetuning data models, interfaces, runtime and CLI command, builder/registry/workflow extensions, parsers and utilities, an OpenPipe ART plugin (builder/adapter/trainer), a Tic‑Tac‑Toe RL example with evaluators, packaging, docs, and extensive tests. Also updates small tooling and vocabulary.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant Builder as WorkflowBuilder
    participant TrajBuilder as TrajectoryBuilder
    participant Adapter as TrainerAdapter
    participant Trainer as Trainer
    participant Backend as OpenPipe ART
    participant Eval as Eval subsystem
    CLI->>Builder: finetuning_main(run_config)
    Builder->>TrajBuilder: get_trajectory_builder(config)
    Builder->>Adapter: get_trainer_adapter(config)
    Builder->>Trainer: get_trainer(config, traj_builder, adapter)
    Trainer->>Adapter: initialize(run_config)
    Adapter->>Backend: register model / health check
    loop per epoch
        Trainer->>TrajBuilder: start_run(run_id, meta)
        TrajBuilder->>Eval: run_eval() (parallel generations)
        Eval->>TrajBuilder: evaluation outputs (intermediate steps & payloads)
        TrajBuilder->>TrajBuilder: finalize -> TrajectoryCollection
        Trainer->>Trainer: apply_curriculum_learning(collection, epoch)
        Trainer->>Adapter: submit(filtered_collection)
        Adapter->>Backend: submit training job (async)
        Trainer->>Adapter: wait_until_complete(job_ref)
        Trainer->>Trainer: log_progress(epoch, metrics)
        Trainer->>TrajBuilder: run validation eval (optional)
    end
    Trainer->>Trainer: cleanup()
    CLI->>CLI: report results/artifacts
```
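To make the flow concrete, here is a minimal Python sketch of the per-epoch loop implied by the diagram. Method names mirror the diagram labels; the exact signatures and the `output_dir` argument are assumptions rather than the PR's actual implementation.

```python
# Illustrative sketch of the epoch loop from the diagram above.
# Method names follow the diagram labels; signatures are assumptions.
async def finetune(trainer, trajectory_builder, adapter, run_config, num_epochs: int) -> None:
    await adapter.initialize(run_config)  # register model / health check with the backend

    for epoch in range(num_epochs):
        run_id = f"epoch-{epoch}"
        # Collect trajectories by running evaluation generations.
        await trajectory_builder.start_run(run_id, meta={"epoch": epoch})
        collection = await trajectory_builder.finalize(run_id)  # -> TrajectoryCollection

        # Optionally filter by difficulty, then train on the remote backend.
        filtered = trainer.apply_curriculum_learning(collection, epoch)
        job_ref = await adapter.submit(filtered)
        metrics = await adapter.wait_until_complete(job_ref)

        trainer.log_progress(epoch, metrics, output_dir="finetune_logs")

    await trainer.cleanup()
```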
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✨ Finishing touches
🧪 Generate unit tests (beta)

📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (8)**/*.{md,rst,py}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
**/*.{md,rst}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
docs/**/*.{md,rst}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
**/*.{py,js,ts,yaml,yml,json,md,rst}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
docs/source/**/*.{md,rst}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
**/*.{py,md,rst}📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Files:
**/*⚙️ CodeRabbit configuration file
Files:
docs/source/**/*⚙️ CodeRabbit configuration file
Files:
🪛 LanguageTool
docs/source/reference/cli.md
[typographical] ~486-~486: The conjunction “so that” does not have a comma in front. (SO_THAT_UNNECESSARY_COMMA)

🔇 Additional comments (3)
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)
170-195: Defensively handle `response.additional_kwargs` when building `TokenUsageBaseModel` to prevent profiling failures.

While `ChatResponse.additional_kwargs` defaults to an empty dict (not `None`), not all LlamaIndex integrations populate token count fields (e.g., `prompt_tokens`, `completion_tokens`, `total_tokens`). Attempting to instantiate `TokenUsageBaseModel(**response.additional_kwargs)` will fail with a validation error if required fields are missing or unexpected keys are present. Wrap the instantiation in a try-except block and fall back to a default `TokenUsageBaseModel()` so LLM_END events are still recorded even when token usage data is incomplete:

```diff
     # Append usage data to NAT usage stats
+    additional_kwargs = response.additional_kwargs or {}
+    try:
+        token_usage = TokenUsageBaseModel(**additional_kwargs)
+    except (TypeError, ValueError):
+        logger.exception("Error parsing token usage from additional_kwargs")
+        token_usage = TokenUsageBaseModel()
+
     with self._lock:
         stats = IntermediateStepPayload(
             event_type=IntermediateStepType.LLM_END,
             span_event_timestamp=self._run_id_to_timestamp.get(event_id),
             framework=LLMFrameworkEnum.LLAMA_INDEX,
             name=model_name,
             UUID=event_id,
-            data=StreamEventData(input=self._run_id_to_llm_input.get(event_id),
-                                 output=llm_text_output,
-                                 payload=response),
+            data=StreamEventData(
+                input=self._run_id_to_llm_input.get(event_id),
+                output=llm_text_output,
+                payload=response,
+            ),
             metadata=TraceMetadata(chat_responses=response.message if response.message else None),
-            usage_info=UsageInfo(token_usage=TokenUsageBaseModel(**response.additional_kwargs)))
+            usage_info=UsageInfo(token_usage=token_usage),
+        )
```

Also apply similar defensive handling to the `TOOL_END` event if `TokenUsageBaseModel()` is called there with incomplete data.
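As a sketch of what that defensive handling could look like for the `TOOL_END` path, assuming the handler also builds `TokenUsageBaseModel` from an `additional_kwargs`-style dict; the helper name and import path below are hypothetical.

```python
import logging

# Hypothetical import path; TokenUsageBaseModel is the toolkit's profiler token-usage model.
from nat.profiler.callbacks.token_usage_base_model import TokenUsageBaseModel

logger = logging.getLogger(__name__)


def safe_token_usage(additional_kwargs: dict | None) -> TokenUsageBaseModel:
    """Build a TokenUsageBaseModel, falling back to defaults on missing or unexpected fields."""
    try:
        return TokenUsageBaseModel(**(additional_kwargs or {}))
    except (TypeError, ValueError):
        logger.exception("Error parsing token usage from additional_kwargs")
        return TokenUsageBaseModel()
```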
🟡 Minor comments (13)
tests/nat/tools/test_tool_test_runner.py-101-127 (1)
101-127: Fix unused `runner` variable in mocked training-components test.

The new test correctly exercises `mock_trainer`, `mock_trainer_adapter`, and `mock_trajectory_builder`, but `runner` from `with_mocked_dependencies()` is never used, which ruff flags (RUF059). You can keep the behavior and silence the warning by ignoring the first element:

```diff
-    async with with_mocked_dependencies() as (runner, mock_builder):
+    async with with_mocked_dependencies() as (_, mock_builder):
```

Everything else in the test looks good and matches the patterns in `ToolTestRunner`'s mock helpers.

docs/source/reference/finetuning/extending.md (1)
20-50: Expand NAT on first use and add a language to the ASCII code block.

To align with naming and markdown-style guidelines:

- Spell out “NVIDIA NeMo Agent toolkit (NAT)” on first mention.
- Tag the ASCII diagram’s fenced block with a language (e.g., `text`) to satisfy MD040.

````diff
-This guide covers how to create custom components for the NAT finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
+This guide covers how to create custom components for the NVIDIA NeMo Agent toolkit (NAT) finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
@@
-```
+```text
 ┌────────────────────────────────────────────────────────────────────────┐
 │                          Your Implementation                           │
@@
 └─────────────────────────────────────────────────────────────────────────┘
````

docs/source/reference/finetuning/index.md (1)

24-58: Align toolkit naming and consider adding a fenced-code language.

Body text should use the full product name on first mention and the lowercase “toolkit” spelling per guidelines. Also, specifying a language for the ASCII diagram helps markdown linters and tooling.

- Update the first mention of the toolkit.
- Optionally, tag the ASCII diagram block as `text`.

````diff
- The NeMo Agent Toolkit provides a powerful finetuning harness designed for **in-situ reinforcement learning** of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.
+ The NVIDIA NeMo Agent toolkit provides a powerful finetuning harness designed for **in-situ reinforcement learning** of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.
@@
-```
+```text
 ┌────────────────────────────────────────────────────────────────────────┐
 │                                Trainer                                 │
@@
 └─────────────────────────┘
````

docs/source/reference/finetuning/concepts.md (1)

90-91: Correct trajectory sequence notation.

The trajectory example currently shows `StateₙAction, ₙ, Rewardₙ`, which appears to be a formatting typo. It should likely read `Stateₙ, Actionₙ, Rewardₙ` to match the earlier pattern.

src/nat/finetuning/utils/parsers/base_parser.py (1)

77-91: Potential `KeyError` if parsed message lacks `content` key.

The hashing logic assumes all messages have a `"content"` key. If a tool call message or other special message type lacks this key, a `KeyError` will be raised. Consider using `.get()` with a default.

```diff
     if isinstance(parsed_msg, list):
         for msg in parsed_msg:
-            content_hash = hash(msg["role"] + ": " + msg["content"])
+            content_hash = hash(msg["role"] + ": " + msg.get("content", ""))
             if content_hash not in message_content_hashes:
                 messages.append(msg)
                 message_content_hashes.add(content_hash)
     else:
-        content_hash = hash(parsed_msg["role"] + ": " + parsed_msg["content"])
+        content_hash = hash(parsed_msg["role"] + ": " + parsed_msg.get("content", ""))
         messages.append(parsed_msg)
         message_content_hashes.add(content_hash)
 else:
     assert not isinstance(parsed_msg, list), "TOOL_END or LLM_END should not produce multiple messages"
-    content_hash = hash(parsed_msg["role"] + ": " + parsed_msg["content"])
+    content_hash = hash(parsed_msg["role"] + ": " + parsed_msg.get("content", ""))
     message_content_hashes.add(content_hash)
     messages.append(parsed_msg)
```

examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py (1)
192-201: Update misleading docstring for `_echo`.

The docstring describes echoing text with a prefix, but the function actually plays a Tic-Tac-Toe game. Update to reflect actual behavior.

```diff
 async def _echo(role: str) -> str:
     """
-    Takes a text input and echoes back with a pre-defined prefix.
+    Play a Tic-Tac-Toe game and return the outcome.

     Args:
-        role (str): If smaller model will be X or O
+        role (str): The symbol ('X' or 'O') assigned to the player model.

     Returns:
-        str: The text with the prefix.
+        str: Game outcome - 'Win!', 'Lose!', 'Draw!', or step count if aborted.
     """
```

examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py (1)
162-168: Update placeholder docstring.

The docstring "NAT function template. Please update the description." is a placeholder that should be replaced with an actual description of the function's purpose.

```diff
 class RlWithOpenpipeArtFunctionConfig(FunctionBaseConfig, name="rl_with_openpipe_art"):
     """
-    NAT function template. Please update the description.
+    Configuration for the Tic-Tac-Toe RL workflow function.
+
+    This function plays a Tic-Tac-Toe game between two LLM-powered players
+    to generate training trajectories for reinforcement learning.
     """

     player_model: LLMRef = Field(description="LLMRef for the player model to use.")
```

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1)
1-30: Remove duplicate SPDX license header.

The license header appears twice (lines 1-14 and lines 16-30). Remove the duplicate block.

```diff
 # SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
-# All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
 import asyncio
```

examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/accuracy_evaluator.py (1)
27-33: Class docstring doesn't match implementation.

The docstring describes scoring based on `expected_answer == workflow_output`, but the actual implementation (lines 109-120) scores based on fixed strings: "Win!" → 1.0, "Draw!" → 0.5, "Lose!" → 0.0. Update the docstring to reflect the actual scoring logic.

```diff
 class AccuracyEvaluator(BaseEvaluator):
-    """Custom evaluator for RL with OpenPipe ART workflow outputs.
-
-    Scoring logic:
-    - Score 1: if expected_answer == workflow_output
-    - Score 0.5: if expected_answer != workflow_output AND expected_answer == "0"
-    - Score 0: if expected_answer != workflow_output AND expected_answer != "0"
-    """
+    """Custom evaluator for RL with OpenPipe ART workflow outputs.
+
+    Scoring logic:
+    - Score 1.0: if workflow_output == "Win!"
+    - Score 0.5: if workflow_output == "Draw!"
+    - Score 0.0: if workflow_output == "Lose!" or aborted
+    """
```
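For reference, a minimal sketch of the outcome-to-score mapping described above; the standalone helper name `score_outcome` is hypothetical, since the evaluator implements this inline around lines 109-120.

```python
def score_outcome(workflow_output: str) -> float:
    # Maps the Tic-Tac-Toe outcome strings to reward scores, per the cited implementation.
    if workflow_output == "Win!":
        return 1.0
    if workflow_output == "Draw!":
        return 0.5
    return 0.0  # "Lose!" or an aborted game
```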
src/nat/finetuning/utils/parsers/langchain_parser.py (1)

31-41: Remove stale docstring reference to non-existent parameter.

The docstring mentions `previous_message: Previous message for context (reserved for future)` but this parameter doesn't exist in the function signature.

```diff
 def parse_to_openai_message(message: IntermediateStep) -> dict | list[dict]:
     """
     Convert IntermediateStep to OpenAI-compatible message dictionary.

     Args:
         message: An IntermediateStep object representing a single message.
-        previous_message: Previous message for context (reserved for future).

     Returns:
         A dictionary formatted for OpenAI API consumption.
     """
```

src/nat/builder/builder.py (1)
312-315: Remove unnecessary parentheses in return type annotation.

```diff
 @abstractmethod
 async def get_trajectory_builder_config(
-        self, trajectory_builder_name: str | TrajectoryBuilderRef) -> (TrajectoryBuilderConfig):
+        self, trajectory_builder_name: str | TrajectoryBuilderRef) -> TrajectoryBuilderConfig:
     pass
```

src/nat/finetuning/interfaces/trainer_adapter.py (1)
53-64: Docstring parameter type mismatch.

The docstring states `trajectories (list[Trajectory])` but the actual parameter type is `TrajectoryCollection`.

```diff
 @abstractmethod
 async def submit(self, trajectories: TrajectoryCollection) -> TrainingJobRef:
     """
     Submit trajectories to remote training backend.

     Args:
-        trajectories (list[Trajectory]): The list of trajectories to submit.
+        trajectories (TrajectoryCollection): The collection of trajectories to submit.

     Returns:
         TrainingJobRef: Reference to the submitted training job.
     """
     raise NotImplementedError
```

src/nat/finetuning/interfaces/trainer_adapter.py (1)
32-34: Type hint mismatch: `run_config` can be `None`.

The type annotation claims `FinetuneConfig` but the initial value is `None`. This will cause type checker errors.

```diff
 def __init__(self, adapter_config: TrainerAdapterConfig):
     self.adapter_config = adapter_config
-    self.run_config: FinetuneConfig = None
+    self.run_config: FinetuneConfig | None = None
```
🧹 Nitpick comments (63)
src/nat/finetuning/utils/__init__.py (1)
1-15: Add module docstring after the license header.The file contains the required Apache-2.0 license header but is missing a module-level docstring. Per coding guidelines, every public module should have a Google-style docstring with a concise description ending with a period.
Add a module docstring after the license header:
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. # All rights reserved. # SPDX-License-Identifier: Apache-2.0 # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. + +"""Utilities for finetuning operations in the NeMo Agent toolkit."""tests/conftest.py (1)
58-60: Good environment setup; consideros.pathsepfor portability.The additions make sense for tests (propagate
SRC_DIRviaPYTHONPATHand pin the Dask worker interpreter), but you can make thePYTHONPATHhandling cross-platform by usingos.pathsepinstead of a hard-coded colon:-os.environ["PYTHONPATH"] = (f"{SRC_DIR}:{os.environ['PYTHONPATH']}" if "PYTHONPATH" in os.environ else SRC_DIR) +sep = os.pathsep +os.environ["PYTHONPATH"] = (f"{SRC_DIR}{sep}{os.environ['PYTHONPATH']}" if "PYTHONPATH" in os.environ else SRC_DIR)The current code is fine on POSIX; this change just improves Windows compatibility.
pyproject.toml (2)
105-105: Optional dependencyopenpipe-artis not in alphabetical order.Per the "Keep sorted!!!" comment at line 74, optional dependencies should be alphabetically ordered.
openpipe-artshould appear beforeredis(betweenmysqlandphoenix).Apply this diff to fix the ordering:
mysql = ["nvidia-nat-mysql"] +openpipe-art = ["nvidia-nat-openpipe-art"] redis = ["nvidia-nat-redis"] ... zep-cloud = ["nvidia-nat-zep-cloud"] -openpipe-art = ["nvidia-nat-openpipe-art"]
203-203: Workspace source is not alphabetically sorted.The workspace sources appear to be sorted alphabetically.
nvidia-nat-openpipe-artshould be placed betweennvidia-nat-opentelemetryandnvidia-nat-phoenix.nvidia-nat-opentelemetry = { workspace = true } +nvidia-nat-openpipe-art = { workspace = true } nvidia-nat-phoenix = { workspace = true } ... nvidia-nat-zep-cloud = { workspace = true } -nvidia-nat-openpipe-art = { workspace = true }examples/finetuning/rl_with_openpipe_art/README.md (4)
169-174: Add language specifier to fenced code block.Static analysis flags this code block as missing a language specifier.
-``` +```text INFO: Started server process INFO: Waiting for application startup.
314-329: Add language specifier to fenced code block.Static analysis flags this code block as missing a language specifier.
-``` +```text INFO - Starting finetuning with config: src/rl_with_openpipe_art/configs/config.yml
337-337: Minor grammar fix: use hyphenation for compound adjective.Per LanguageTool hint, "alpha-beta pruning based reward" should be hyphenated when used as a compound modifier.
-The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning based reward** instead of simple win/loss signals. +The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta-pruning-based reward** instead of simple win/loss signals.
517-522: Add language specifier to fenced code block.Static analysis flags this code block as missing a language specifier.
-``` +```text .tmp/nat/finetuning/tic_tac_toe/ ├── training_metrics.jsonl # Per-epoch metricssrc/nat/data_models/component.py (2)
44-46: Enum members should be alphabetically sorted.Per the "Keep sorted!!!" comment, these new entries should be reordered.
TRAINER_ADAPTERshould come betweenTRAINERandTRAJECTORY_BUILDER.TRAINER = "trainer" + TRAINER_ADAPTER = "trainer_adapter" TRAJECTORY_BUILDER = "trajectory_builder" - TRAINER_ADAPTER = "trainer_adapter" UNDEFINED = "undefined"
62-64: Enum members should be alphabetically sorted.Per the "Keep sorted!!!" comment, these new entries should be reordered alphabetically.
RETRIEVERS = "retrievers" + TRAINER_ADAPTERS = "trainer_adapters" TRAINERS = "trainers" - TRAINER_ADAPTERS = "trainer_adapters" TRAJECTORY_BUILDERS = "trajectory_builders"examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/llm_agents.py (2)
62-65: Misleading docstring: legacy parsing not implemented.The docstring states "Try XML first, then legacy 'row,col'" but the function only delegates to
parse_move_xmlwithout any fallback parsing.Either update the docstring to match the implementation, or implement the legacy fallback:
def parse_move_any(text: str) -> tuple[int, int] | None: - """Try XML first, then legacy 'row,col'.""" + """Parse move from XML format.""" mv = parse_move_xml(text) return mv
100-100: Remove debug comment artifact.The comment
# ruffappears to be a leftover debug artifact and should be removed.- # ruff for attempt in range(0, self.max_retries + 1):src/nat/data_models/finetuning.py (2)
118-118: Typo in comment."logprobs can't be none of role is assistant" should be "logprobs can't be none if role is assistant".
- # Add model validator after construction that checks that logprobs can't be none of role is assistant + # Add model validator after construction that checks that logprobs can't be none if role is assistant
212-212: UseField()for consistency with other field declarations.This field uses a bare default value instead of
Field(), which is inconsistent with the other fields in the class and loses the ability to add a description.- target_functions: list[str] = ["<workflow>"] + target_functions: list[str] = Field( + default=["<workflow>"], + description="List of target function names to fine-tune.")examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_post_train.yml (1)
1-26: Config wiring looks correct; consider 4‑space indentation for YAMLThe config structure (LLM, workflow, eval, evaluator) is coherent for the RL with OpenPipe ART example, and the
api_keyis a non‑secret placeholder, which is good for checked‑in examples. For consistency with the repo guidelines (**/*.{yaml,yml}→ 4‑space indentation), consider reindenting nested mappings to 4 spaces in a follow‑up pass.examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_pre_train.yml (1)
1-23: Pre‑train config is coherent; consider matching YAML indentation conventionThe pre‑train config’s structure (LLM, workflow, eval, evaluator) is consistent with the post‑train setup and the described RL workflow. As with
config_post_train.yml, you may want to switch nested keys to 4‑space indentation to match the repository’s YAML style guideline.src/nat/cli/commands/finetune.py (1)
84-132: Refine CLI signature and exception handling to match runtime and linting expectations.A few small tweaks will make this command align with the project’s typing and error‑handling guidance and resolve the Ruff hints.
- Mark optional options as
Path | None/str | None.- Use a context parameter for
@click.pass_contextand silence its unused status explicitly.- Wrap exceptions with
ClickExceptionwhile preserving the cause and logging the stack trace.-@click.pass_context -def finetune_command( - processors, # pylint: disable=unused-argument - *, - config_file: Path, - dataset: Path, - result_json_path: str, - endpoint: str, - endpoint_timeout: int, - override: tuple[tuple[str, str], ...], - validation_dataset: Path, - validation_interval: int, - validation_config_file: Path, -): +@click.pass_context +def finetune_command( + ctx: click.Context, # noqa: ARG001 + *, + config_file: Path, + dataset: Path | None, + result_json_path: str, + endpoint: str | None, + endpoint_timeout: int, + override: tuple[tuple[str, str], ...], + validation_dataset: Path | None, + validation_interval: int, + validation_config_file: Path | None, +): @@ - except Exception as e: - logger.error("Finetuning failed: %s", e) - raise click.ClickException(str(e)) + except Exception as exc: # noqa: BLE001 + logger.exception("Finetuning failed") + raise click.ClickException(str(exc)) from excThis keeps the UX of wrapping all errors in a
ClickExceptionwhile satisfying the project’s linting and exception‑handling conventions.examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/register.py (1)
15-19: Tighten Ruff suppression for side‑effect imports.Ruff reports the
noqa: F401directive as unused here. Since this module intentionally relies on import side effects, a file‑levelruff: noqais simpler and avoids RUF100.-# Import the generated workflow function to trigger registration -# Add flags to ignore unused import with ruff and flak8 -# ruff: noqa: F401 -from .evaluator_register import register_accuracy_evaluator -from .rl_with_openpipe_art import rl_with_openpipe_art_function +# Import the generated workflow function to trigger registration. +# Suppress unused-import warnings for registration side effects. +# ruff: noqa +from .evaluator_register import register_accuracy_evaluator +from .rl_with_openpipe_art import rl_with_openpipe_art_functiondocs/source/reference/finetuning/concepts.md (1)
24-24: Fix NeMo toolkit naming on first mention.Per docs guidelines, the first textual mention should be “NVIDIA NeMo Agent toolkit”, with subsequent mentions as “NeMo Agent toolkit”. Please adjust the sentence starting with “The NeMo Agent Toolkit...” accordingly.
tests/nat/finetuning/interfaces/test_trajectory_builder.py (1)
91-96: Avoid passing unsupported `target_functions` into `FinetuneRunConfig`.

`FinetuneRunConfig` doesn't declare a `target_functions` field; pydantic will silently ignore it. To keep tests aligned with the schema, either drop this argument from `FinetuneRunConfig(...)` or, if you need to control target functions, set them on `FinetuneConfig` instead.

Also applies to: 240-245
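A minimal sketch of the suggested fix, assuming `FinetuneConfig` exposes the `target_functions` field described elsewhere in this PR; the import path is an assumption and any other required fields are omitted for brevity.

```python
# Import path inferred from the module layout shown in this PR; treat as an assumption.
from nat.data_models.finetuning import FinetuneConfig

# Set target functions on FinetuneConfig, which declares the field,
# rather than passing them to FinetuneRunConfig, which silently ignores them.
finetune_config = FinetuneConfig(
    target_functions=["<workflow>"],  # default value shown in the data model
)
```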
src/nat/finetuning/__init__.py (1)
20-24: Sort__all__to satisfy ruff style check.Ruff flags
__all__as unsorted. Reorder entries alphabetically, e.g.:-__all__ = [ - "Trainer", - "TrajectoryBuilder", - "TrainerAdapter", -] +__all__ = [ + "Trainer", + "TrainerAdapter", + "TrajectoryBuilder", +]docs/source/reference/finetuning/rl_with_openpipe.md (2)
42-65: Add explicit languages to fenced code blocks with diagrams.Markdown lint flags multiple fenced blocks without a language (the architecture/flow diagrams and console output). Consider annotating them as
textfor clarity and to satisfy MD040, for example:```text ┌─────────────────────────────────────────────────────────────────────────┐ ...Also applies to: 237-273, 289-400, 464-472 --- `189-220`: **Convert emphasized section labels into proper headings.** Labels like `**Backend Configuration**`, `**Training Arguments**`, and the troubleshooting titles are effectively sub‑headings. To appease MD036 and improve structure, convert them to markdown headings (e.g., `### Backend Configuration`) instead of bold text. Also applies to: 201-213, 214-220, 536-582 </blockquote></details> <details> <summary>src/nat/profiler/callbacks/langchain_callback_handler.py (1)</summary><blockquote> `46-53`: **Profiler changes align with updated models; watch for invalid tool schemas.** The token‑usage simplification and inclusion of `payload` on LLM/TOOL end events match the current `TokenUsageBaseModel` and `StreamEventData`/`TraceMetadata` structures and look good. One thing to be aware of: `_extract_tools_schema` now does a bare `ToolSchema(**tool)` for each entry in `invocation_params["tools"]`; if any tool dict is malformed, the callback will raise and break tracing. If you expect untrusted tool payloads, consider wrapping that construction in a small `try/except ValidationError` and skipping invalid tools. Also applies to: 90-101, 197-248, 279-298 </blockquote></details> <details> <summary>packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py (1)</summary><blockquote> `171-172`: **Tidy regex literals and unused mock argument to satisfy ruff.** - The `match=` patterns in `pytest.raises(..., match="first message.*must be from 'user' or 'system'")` and the duplicate‑job assertion both contain regex metacharacters. To address RUF043 and avoid accidental mis‑escaping, make them raw strings, e.g. `match=r"first message.*must be from 'user' or 'system'"`. - `mock_traj_group` in `test_construct_trajectory_with_tool_calls` is never used; to clear ARG002 either rename it to `_mock_traj_group` or drop the corresponding `@patch` decorator if that patch isn’t needed. Also applies to: 281-282, 428-444 </blockquote></details> <details> <summary>tests/nat/finetuning/utils/parsers/test_base_parser.py (1)</summary><blockquote> `27-35`: **Docstring first line should end with a period.** Per coding guidelines, the first line of docstrings must be a concise description ending with a period. ```diff def create_mock_step(event_type, event_state, framework=None, data=None, name=None): - """Helper function to create mock IntermediateStep objects.""" + """Helper function to create mock `IntermediateStep` objects.""" step = MagicMock()packages/nvidia_nat_openpipe_art/tests/test_trainer.py (1)
41-68: Consider using fixture naming conventions per coding guidelines.Per project guidelines, pytest fixtures should define the
nameargument in the decorator, and the decorated function should use thefixture_prefix. This is a minor stylistic suggestion that could be deferred.Example for one fixture:
- @pytest.fixture - def trainer_config(self): + @pytest.fixture(name="trainer_config") + def fixture_trainer_config(self): """Create test trainer configuration.""" return ARTTrainerConfig(reward=RewardFunctionConfig(name="test_reward"))src/nat/finetuning/utils/parsers/llama_index_parser.py (2)
26-37: Remove unusednoqaand fix stale docstring parameter.
messageis used, so# noqa: ARG001is no longer needed, and the docstring still documents aprevious_messageargument that isn’t in the signature.Consider:
-def parse_to_openai_message(message: IntermediateStep) -> dict: # noqa: ARG001 - """ - Convert IntermediateStep to OpenAI-compatible message dictionary. - ... - message: An IntermediateStep object representing a single message. - previous_message: Previous message for context (reserved for future). +def parse_to_openai_message(message: IntermediateStep) -> dict: + """ + Convert an `IntermediateStep` to an OpenAI-compatible message dictionary. + ... + message: An `IntermediateStep` object representing a single message.
111-130: Align TOOL_END parsing with existing OpenAI/tool message conventions.The current implementation uses
role: "function"and omits atool_call_id, while the LangChain parser usesrole: "tool"plus an ID field. For consistency with the rest of the finetuning parsers and typical OpenAI-style tool messages, consider:-def _parse_tool_message(message: IntermediateStep) -> dict: - """Parse a tool/function response message from TOOL_END event.""" - result = {"role": "function"} +def _parse_tool_message(message: IntermediateStep) -> dict: + """Parse a tool/function response message from a `TOOL_END` event.""" + result = {"role": "tool"} @@ - # Add function name if available - if message.name: - result["name"] = message.name + # Add function/tool name and a stable ID if available + if message.name: + result["name"] = message.name + # Mirror LangChain parser behavior by providing a tool_call_id + result["tool_call_id"] = getattr(message.metadata, "tool_call_id", message.UUID if hasattr(message, "UUID") else 0)This keeps the LlamaIndex parser’s output shape closer to the existing LangChain parser and typical OpenAI tool-calling messages.
src/nat/finetuning/finetuning_runtime.py (1)
27-69: Reconcile Ruff TRY hints with project exception‑handling guidelines.
run_finetuningcorrectly useslogger.errorand re‑raises, which matches the project rule to avoidlogger.exceptionwhen re‑raising, but Ruff suggests otherwise (TRY400). Similarly, the explicitValueErrormessage infinetuning_maintriggers TRY003.To keep behavior and project guidelines while silencing Ruff, you can annotate the specific lines:
- except Exception as e: - logger.error("Finetuning failed: %s", e) + except Exception as e: + logger.error("Finetuning failed: %s", e) # noqa: TRY400 @@ - if not config.finetuning.enabled: - raise ValueError("Finetuning is not enabled in the provided configuration.") + if not config.finetuning.enabled: + raise ValueError("Finetuning is not enabled in the provided configuration.") # noqa: TRY003Also worth documenting (e.g., in the docstring) that
run_finetuningexpectsrunner.initialize()to have been called so thatrunner.run_configis populated.Also applies to: 83-88
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/register.py (1)
29-41: Mark unusedbuilderparameter to satisfy Ruff without changing behavior.The
builderargument is required by theregister_*decorators but isn’t used in these implementations, which Ruff flags as ARG001. Renaming it to_buildermakes the intent clear and silences the warning:-@register_trajectory_builder(config_type=ARTTrajectoryBuilderConfig) -async def register_art_trajectory_builder(config: ARTTrajectoryBuilderConfig, builder: Builder): +@register_trajectory_builder(config_type=ARTTrajectoryBuilderConfig) +async def register_art_trajectory_builder(config: ARTTrajectoryBuilderConfig, _builder: Builder): @@ -@register_trainer_adapter(config_type=ARTTrainerAdapterConfig) -async def register_art_trainer_adapter(config: ARTTrainerAdapterConfig, builder: Builder): +@register_trainer_adapter(config_type=ARTTrainerAdapterConfig) +async def register_art_trainer_adapter(config: ARTTrainerAdapterConfig, _builder: Builder): @@ -@register_trainer(config_type=ARTTrainerConfig) -async def register_art_trainer(config: ARTTrainerConfig, builder: Builder): +@register_trainer(config_type=ARTTrainerConfig) +async def register_art_trainer(config: ARTTrainerConfig, _builder: Builder):Also applies to: 44-57, 59-71
src/nat/finetuning/interfaces/trajectory_builder.py (1)
33-36: TightenTrajectoryBuildertyping and docstrings, and usemetato satisfy Ruff.A few small adjustments will improve clarity and avoid ARG002:
run_configcan beNoneuntilinitializeis called.compute_rewardshould declare its return type and referencemeta.- The
finalizedocstring should match theTrajectoryCollectionreturn type.Suggested changes:
class TrajectoryBuilder(ABC): @@ - def __init__(self, trajectory_builder_config: TrajectoryBuilderConfig): - self.trajectory_builder_config = trajectory_builder_config - self.run_config: FinetuneConfig = None + def __init__(self, trajectory_builder_config: TrajectoryBuilderConfig): + self.trajectory_builder_config = trajectory_builder_config + self.run_config: FinetuneConfig | None = None @@ - async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection: + async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection: @@ - Returns: - list[Trajectory]: The list of constructed trajectories. + Returns: + TrajectoryCollection: The collection of constructed trajectories. @@ - async def compute_reward(self, output_item: EvalOutputItem, meta: dict | None = None): + async def compute_reward(self, output_item: EvalOutputItem, meta: dict | None = None) -> float: @@ - Returns: - float: The computed reward. - """ - return float(output_item.score) if output_item.score is not None else 0.0 + Returns: + float: The computed reward. + """ + # `meta` is accepted for extensibility in subclasses; unused in the default implementation. + _ = meta + return float(output_item.score) if output_item.score is not None else 0.0Also applies to: 78-88, 91-103
tests/nat/finetuning/interfaces/test_trainer.py (1)
179-187: Usetmp_pathinstead of a hard‑coded/tmppath in the log‑progress test.The hard‑coded
"/tmp/logs"path triggers Ruff’s S108 and ties the test to a specific filesystem layout. You can keep the intent while avoiding this by usingtmp_path:- def test_trainer_log_progress(self, trainer): + def test_trainer_log_progress(self, trainer, tmp_path): @@ - metrics = {"loss": 0.5, "accuracy": 0.95} - trainer.log_progress(epoch=1, metrics=metrics, output_dir="/tmp/logs") + metrics = {"loss": 0.5, "accuracy": 0.95} + output_dir = str(tmp_path / "logs") + trainer.log_progress(epoch=1, metrics=metrics, output_dir=output_dir) @@ - assert trainer.logged_progress[0]["output_dir"] == "/tmp/logs" + assert trainer.logged_progress[0]["output_dir"] == output_direxamples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/core.py (2)
55-60: Use unpacking for list concatenation.Per static analysis hint (RUF005), prefer unpacking syntax for clarity.
- return "\n".join([header] + rows) + return "\n".join([header, *rows])
130-136: Replacesetattrwith direct attribute assignment.Using
setattrwith a constant attribute name provides no benefit over direct assignment. Per static analysis hint (B010).outcome_cache: dict[tuple[tuple[int, ...], int], float] = getattr(evaluate_board_for_player, "_outcome_cache", None) if outcome_cache is None: outcome_cache = {} - setattr(evaluate_board_for_player, "_outcome_cache", outcome_cache) + evaluate_board_for_player._outcome_cache = outcome_cachepackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer.py (3)
400-410: Remove redundantint()cast.
math.ceil()already returns anintin Python 3. The cast is unnecessary.if self.curriculum_config.random_subsample is not None: import random fraction = self.curriculum_config.random_subsample trajectory_group = trajectory_collection.trajectories[0] - max_required_trajectories = int(math.ceil(len(trajectory_group) * fraction)) + max_required_trajectories = math.ceil(len(trajectory_group) * fraction)
465-468: Remove redundantint()cast.Same issue—
math.ceil()returnsintin Python 3.# Calculate number of groups to include num_groups_to_include = max( 1, # Always include at least one group - int(math.ceil(len(group_stats) * self._curriculum_state["current_percentile"]))) + math.ceil(len(group_stats) * self._curriculum_state["current_percentile"]))
497-506: Remove redundantint()cast and hoistrandomimport.The
int()cast is unnecessary. Additionally, consider movingimport randomto the module level to avoid repeated imports during curriculum learning.if self.curriculum_config.random_subsample is not None: # Randomly select only a fraction of trajectory groups to use - import random fraction = self.curriculum_config.random_subsample # Max required groups is the theoretical max based on fraction - max_required_groups = int(math.ceil(len(group_stats) * fraction)) + max_required_groups = math.ceil(len(group_stats) * fraction)Add
import randomat the top of the file alongside other imports.src/nat/finetuning/utils/parsers/base_parser.py (1)
88-88: Replaceassertwith explicit exception for production safety.Assertions can be disabled with
python -O. For invariant checks that should always run, use an explicit exception.- assert not isinstance(parsed_msg, list), "TOOL_END or LLM_END should not produce multiple messages" + if isinstance(parsed_msg, list): + raise ValueError("TOOL_END or LLM_END should not produce multiple messages")src/nat/finetuning/interfaces/finetuning_runner.py (2)
54-61: Type annotations should allowNonefor initially unset attributes.These attributes are initialized to
Nonebut typed as non-optional. This can cause type checker warnings.def __init__(self, trainer_config: TrainerConfig, **kwargs) -> None: ... self.trainer_config = trainer_config - self.run_config: FinetuneConfig = None - self.curriculum_config = None - self.trajectory_builder: TrajectoryBuilder = None - self.trainer_adapter: TrainerAdapter = None + self.run_config: FinetuneConfig | None = None + self.curriculum_config: CurriculumLearningConfig | None = None + self.trajectory_builder: TrajectoryBuilder | None = None + self.trainer_adapter: TrainerAdapter | None = None # Curriculum learning state - self._curriculum_state = None + self._curriculum_state: dict[str, Any] | None = NoneYou'll need to import
CurriculumLearningConfigfrom the data_models module.
156-170: Unusedrun_idparameter.The
run_idparameter is documented but never used in the method body. Consider using it for logging context or removing it from the signature if not needed.async def run_validation_evaluation(self, epoch: int, run_id: str) -> dict[str, Any]: ... - logger.info("Running validation evaluation for epoch %d", epoch + 1) + logger.info("Running validation evaluation for epoch %d, run_id=%s", epoch + 1, run_id)src/nat/cli/register_workflow.py (3)
488-507: Add docstring toregister_trainerfunction.Per coding guidelines, all public APIs require Google-style docstrings. Other
register_*functions in this file have docstrings (e.g.,register_telemetry_exporter,register_front_end). Add a docstring describing the decorator's purpose.def register_trainer(config_type: type[TrainerConfigT]): + """ + Register a trainer component for fine-tuning LLMs. + + Args: + config_type: The trainer configuration type to register. + + Returns: + A decorator that wraps the build function as an async context manager. + """ def register_trainer_inner(fn: TrainerBuildCallableT[TrainerConfigT]) -> TrainerBuildCallableT[TrainerConfigT]:
510-531: Add docstring toregister_trainer_adapterfunction.Consistent with
register_trainer, this public registration helper should include a Google-style docstring.def register_trainer_adapter(config_type: type[TrainerAdapterConfigT]): + """ + Register a trainer adapter component. + + Trainer adapters are responsible for adapting the training process to different frameworks. + + Args: + config_type: The trainer adapter configuration type to register. + + Returns: + A decorator that wraps the build function as an async context manager. + """ def register_trainer_adapter_inner(
534-555: Add docstring toregister_trajectory_builderfunction.Same pattern as above—add a docstring for consistency and discoverability.
def register_trajectory_builder(config_type: type[TrajectoryBuilderConfigT]): + """ + Register a trajectory builder component. + + Trajectory builders are responsible for building trajectories for fine-tuning. + + Args: + config_type: The trajectory builder configuration type to register. + + Returns: + A decorator that wraps the build function as an async context manager. + """ def register_trajectory_builder_inner(examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/accuracy_evaluator.py (4)
45-47: Replace assertion with explicit error handling.Using
assertfor validation can be disabled with-Oflag in production. For robustness, use an explicit check with a raised exception.s = np.asarray(state_values, dtype=float) T = len(s) - 1 - assert T >= 0 + if T < 0: + raise ValueError("state_values must not be empty")
87-88: Remove commented-out code.Dead code should be removed to keep the codebase clean. If the alternative calculation is needed for reference, document it elsewhere.
- #average_score = score_sum / max(1, len(item.trajectory)) average_score = AccuracyEvaluator.episode_value_from_states(scores)
74-96: Add docstring to_eval_with_stepsmethod.This method contains non-trivial logic for step-based evaluation. A docstring would improve maintainability.
@staticmethod async def _eval_with_steps(item: EvalInputItem) -> EvalOutputItem: + """Evaluate item using intermediate step values. + + Collects 'value' metadata from CUSTOM_END steps and computes + an episode score using time-discounted rewards. + + Args: + item: The evaluation input item with trajectory steps. + + Returns: + EvalOutputItem with computed average score and reasoning. + """ score_sum = 0.0
77-85: Unused variablescore_sum.The variable
score_sumis accumulated but never used (onlyscoreslist is passed toepisode_value_from_states). Remove it to avoid confusion.- score_sum = 0.0 scores = [] for step in item.trajectory: if step.event_type == IntermediateStepType.CUSTOM_END: payload = step.payload if payload.metadata and "value" in payload.metadata: step_score = float(payload.metadata["value"]) scores.append(step_score) - score_sum += step_scorepackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)
143-144: Use%-style formatting in logger calls instead of f-strings.Per Python logging best practices, use lazy
%-formatting to avoid string interpolation when the log level is disabled.if not reward_results: - logger.warning(f"No reward results found for run_id: {run_id}, generation: {gen_idx}") + logger.warning("No reward results found for run_id: %s, generation: %d", run_id, gen_idx) continue
71-81: Consider usingmetaparameter or document why it's reserved.The
metaparameter is unused (flagged by static analysis). If it's intended for future use, add a docstring note or prefix with underscore to indicate it's intentionally unused.- async def start_run(self, run_id: str, meta: dict | None = None) -> None: + async def start_run(self, run_id: str, _meta: dict | None = None) -> None: """ Start multiple evaluation runs to collect trajectories. Args: run_id (str): The ID of the run. - meta (dict): Metadata for the run. + _meta (dict): Reserved for future use. Currently unused. """examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py (2)
220-221: Simplify redundant boolean expression. `True if condition else False` is equivalent to just `condition`.

```diff
     max_retries=max_retries,
-    choose_random=True if config.opponent_model is None else False,
+    choose_random=config.opponent_model is None,
 )
```

Apply the same fix at line 236:

```diff
     max_retries=max_retries,
-    choose_random=True if config.opponent_model is None else False,
+    choose_random=config.opponent_model is None,
 )
```
157-159: Clarify the return value when the game is aborted. When `RuntimeError` is caught, `current_player.steps` (an integer) is returned instead of a game outcome. The subsequent logic (lines 242-258) handles this by converting to `str(winner)`. Consider adding a comment to clarify this behavior or using a more explicit sentinel value.

```diff
 except RuntimeError as _:
     logger.debug("*** Game aborted due to too many invalid moves. ***")
-    return current_player.steps
+    # Return step count as abort indicator; handled specially in outcome logic
+    return current_player.steps
```
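For what the "more explicit sentinel value" option could look like, here is a small, self-contained sketch. `GameAborted` and `outcome_to_str` are illustrative names invented for this sketch, not part of the example's actual API:

```python
# Illustrative sketch of an explicit abort sentinel instead of returning a bare int.
from dataclasses import dataclass


@dataclass(frozen=True)
class GameAborted:
    """Sentinel outcome returned when a game is abandoned after too many invalid moves."""
    steps_played: int


def outcome_to_str(outcome: object) -> str:
    # Downstream logic can match on the sentinel type instead of guessing
    # whether an int means "aborted".
    if isinstance(outcome, GameAborted):
        return f"aborted after {outcome.steps_played} steps"
    return str(outcome)


print(outcome_to_str(GameAborted(steps_played=7)))  # "aborted after 7 steps"
print(outcome_to_str("X"))                          # "X"
```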
src/nat/finetuning/utils/parsers/langchain_parser.py (3)

70-78: Consider narrowing exception types for payload parsing. Catching bare `Exception` obscures potential issues. Consider catching more specific exceptions like `TypeError` or `ValueError`.

```diff
 if isinstance(payload, dict) and "message" in payload:
     # Handle dict payloads
     try:
         msg = AIMessage(**payload["message"])
-    except Exception as _:
+    except (TypeError, ValueError, KeyError):
         try:
             msg = AIMessageChunk(**payload["message"])
-        except Exception as _:
+        except (TypeError, ValueError, KeyError):
             msg = None
```
229-247: Remove commented-out code. Line 234 contains commented-out code that should be removed to maintain code cleanliness.

```diff
 def _parse_dict_message(msg_dict: dict) -> dict:
     """Parse a dictionary-based message."""
     result = {}

     # Extract role
-    #result["role"] = msg_dict.get("role", "user")
     if "role" in msg_dict:
         role = msg_dict["role"]
     elif "type" in msg_dict:
         role = msg_dict["type"]
     else:
         role = "user"
```
286-306: Add type hint for `data` parameter. Adding an `Any` type hint improves code clarity and IDE support.

```diff
-def _extract_content(data) -> str:
+def _extract_content(data: Any) -> str:
     """Extract string content from various data formats."""
```

Note: You'll need to add `Any` to the imports from `typing`.

tests/nat/finetuning/interfaces/test_trainer_adapter.py (2)
88-122: Fixtures should follow naming convention per coding guidelines. Per coding guidelines, pytest fixtures should define the `name` argument and use the `fixture_` prefix for the function.

```diff
-    @pytest.fixture
-    def adapter_config(self):
+    @pytest.fixture(name="adapter_config")
+    def fixture_adapter_config(self):
         """Create a test adapter config."""
         return TrainerAdapterConfig(type="test_adapter", reward=RewardFunctionConfig(name="test_reward"))

-    @pytest.fixture
-    def finetune_config(self, tmp_path):
+    @pytest.fixture(name="finetune_config")
+    def fixture_finetune_config(self, tmp_path):
         """Create a test finetune config."""
         # ... rest of fixture

-    @pytest.fixture
-    def adapter(self, adapter_config):
+    @pytest.fixture(name="adapter")
+    def fixture_adapter(self, adapter_config):
         """Create a concrete adapter instance."""
         return ConcreteTrainerAdapter(adapter_config=adapter_config)

-    @pytest.fixture
-    def sample_trajectories(self):
+    @pytest.fixture(name="sample_trajectories")
+    def fixture_sample_trajectories(self):
         """Create sample trajectories for testing."""
```

As per coding guidelines.
205-224: Apply the same fixture naming convention here.

```diff
-    @pytest.fixture
-    def failing_adapter_config(self):
+    @pytest.fixture(name="failing_adapter_config")
+    def fixture_failing_adapter_config(self):
         """Create an adapter config that might fail."""

-    @pytest.fixture
-    def finetune_config(self, tmp_path):
+    @pytest.fixture(name="finetune_config")
+    def fixture_finetune_config(self, tmp_path):
         """Create a test finetune config."""
```

As per coding guidelines.
src/nat/builder/builder.py (1)
273-315: Consider consistency for the `@experimental` decorator on finetuning methods. The `add_*` methods are marked `@experimental(feature_name="Finetuning")` but the corresponding `get_*` methods are not. If finetuning is experimental, consider marking all related methods consistently, or document the distinction.
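To make the suggestion concrete, here is a minimal, self-contained sketch of consistent marking. The `experimental` implementation and the `BuilderSketch` methods below are stand-ins invented for illustration; only the `@experimental(feature_name="Finetuning")` usage itself comes from the PR:

```python
from collections.abc import Callable
from functools import wraps
from typing import Any


def experimental(feature_name: str) -> Callable[[Callable], Callable]:
    """Toy stand-in for the toolkit decorator, used only to illustrate consistent marking."""

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # The real decorator presumably emits an experimental-feature warning here.
            return func(*args, **kwargs)

        return wrapper

    return decorator


class BuilderSketch:
    @experimental(feature_name="Finetuning")
    def add_trainer(self, name: str) -> None:  # already marked in the PR
        ...

    @experimental(feature_name="Finetuning")  # suggestion: mark the getter the same way
    def get_trainer_config(self, name: str) -> dict:
        return {}
```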
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py (3)

117-118: Remove or document commented-out validation code. If the last-message-must-be-assistant constraint was intentionally relaxed, add a comment explaining why. Otherwise, remove the dead code.

```diff
-    # if traj.episode[-1].role != EpisodeItemRole.ASSISTANT:
-    #     raise ValueError("The last message in the trajectory must be from 'assistant'.")
+    # Note: The last message does not need to be from 'assistant' to allow
+    # for partial trajectories or user-initiated conversation endings.
```
187-189: Use `logger.exception` to capture the full stack trace. When catching and logging exceptions without re-raising, use `logger.exception()` to capture the full stack trace. Per coding guidelines.

```diff
 except Exception as e:
-    logger.error(f"Error constructing trajectory: {e}. Skipping.")
+    logger.exception("Error constructing trajectory. Skipping.")
     continue
```
98-122: `_validate_episode_order` could be synchronous. This method performs only synchronous validation and doesn't await anything. Consider making it a regular method to avoid unnecessary coroutine overhead.

```diff
-    async def _validate_episode_order(self, traj: Trajectory):
+    def _validate_episode_order(self, traj: Trajectory) -> None:
```

And update the call site at line 168:

```diff
-        await self._validate_episode_order(traj)
+        self._validate_episode_order(traj)
```

packages/nvidia_nat_openpipe_art/tests/test_trajectory_builder.py (1)
40-71: Consider renaming fixtures to use the `fixture_` prefix. As per coding guidelines, pytest fixtures should be named using the `fixture_` prefix and define the `name` argument in the decorator. This helps distinguish fixtures from regular values.

```diff
 @pytest.fixture
-def builder_config(self):
+def fixture_builder_config(self):
     """Create test trajectory builder configuration."""
     return ARTTrajectoryBuilderConfig(num_generations=2, reward=RewardFunctionConfig(name="test_reward"))

 @pytest.fixture
-def finetune_config(self, tmp_path):
+def fixture_finetune_config(self, tmp_path):
     """Create test finetune configuration."""
     ...

 @pytest.fixture
-def builder(self, builder_config):
+def fixture_builder(self, fixture_builder_config):
     """Create ARTTrajectoryBuilder instance."""
-    return ARTTrajectoryBuilder(trajectory_builder_config=builder_config)
+    return ARTTrajectoryBuilder(trajectory_builder_config=fixture_builder_config)
```

src/nat/builder/workflow_builder.py (1)
1063-1083: Consider making config getter methods synchronous. The new `get_trainer_config`, `get_trainer_adapter_config`, and `get_trajectory_builder_config` methods are marked as `async`, but they only perform synchronous dictionary lookups. Most other config getters in this file (e.g., `get_llm_config`, `get_embedder_config`, `get_function_config`) are synchronous.

```diff
     @override
-    async def get_trainer_config(self, trainer_name: str | TrainerRef) -> TrainerConfig:
+    def get_trainer_config(self, trainer_name: str | TrainerRef) -> TrainerConfig:
         if trainer_name not in self._trainers:
             raise ValueError(f"Trainer '{trainer_name}' not found")
         return self._trainers[trainer_name].config

     @override
-    async def get_trainer_adapter_config(self, trainer_adapter_name: str | TrainerAdapterRef) -> TrainerAdapterConfig:
+    def get_trainer_adapter_config(self, trainer_adapter_name: str | TrainerAdapterRef) -> TrainerAdapterConfig:
         if trainer_adapter_name not in self._trainer_adapters:
             raise ValueError(f"Trainer adapter '{trainer_adapter_name}' not found")
         return self._trainer_adapters[trainer_adapter_name].config

     @override
-    async def get_trajectory_builder_config(
+    def get_trajectory_builder_config(
             self, trajectory_builder_name: str | TrajectoryBuilderRef) -> (TrajectoryBuilderConfig):
         if trajectory_builder_name not in self._trajectory_builders:
             raise ValueError(f"Trajectory builder '{trajectory_builder_name}' not found")
         return self._trajectory_builders[trajectory_builder_name].config
```

Note: This would require corresponding changes to the `Builder` interface and `ChildBuilder` delegation methods.
Updated the relative paths to the Custom Evaluators page to ensure proper navigation in the documentation. This resolves incorrect linking issues in the finetuning reference guides. Signed-off-by: dnandakumar-nv <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (17)
docs/source/reference/finetuning/rl_with_openpipe.md (4)
20-20: Clarify NAT naming convention on first use. Line 20 references "NAT's finetuning harness" but should introduce the full product name first. Per coding guidelines, use "NVIDIA NeMo Agent Toolkit" on first mention, then "NeMo Agent Toolkit" for subsequent uses.

Apply this change:

```diff
-This guide covers the integration between NAT's finetuning harness and [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer), an open-source framework for teaching LLMs through reinforcement learning.
+This guide covers the integration between the NVIDIA NeMo Agent Toolkit's finetuning harness and [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer), an open-source framework for teaching LLMs through reinforcement learning.
```
31-39: Reduce repetitive phrasing for better readability. The sentences at lines 35 and 38 both start with "You want to," which creates repetition and impacts flow. Consider rewriting to vary sentence structure.

Apply this change:

```diff
 - You want to improve agent reliability on specific tasks
 - You have a way to score agent performance (even if you don't have "correct" answers)
 - You're working with agentic workflows that make decisions or take actions
-- You want to iterate quickly with online training
+- Online training enables rapid iteration and deployment
```
401-410: Clarify the phrasing for better readability. Line 404 uses a split infinitive ("to progressively include"). Consider restructuring for clearer expression while maintaining technical accuracy.

Apply this change:

```diff
-1. **Curriculum Learning**: The trainer implements curriculum learning to progressively include harder examples:
+1. **Curriculum Learning**: The trainer implements curriculum learning by progressively including harder examples:
```
1-600: Verify ASCII diagram code blocks are properly formatted. The architectural diagrams at lines 42–65, 238–273, 290–330, and 354–400 are in fenced code blocks without language specification. While these are ASCII art rather than executable code, markdownlint recommends adding a language tag (e.g., `text` or `plaintext`) for clarity and linting compliance.

Apply changes to all diagram blocks (example shown for lines 42–65):

````diff
-```
+```text
 ┌─────────────────────────────────────────────────────────────────────────┐
````

Repeat this change for the diagrams at lines 238, 290, and 354.
docs/source/reference/finetuning/extending.md (13)
18-50: Introduce full product name on first reference. The document introduces the finetuning harness without first using the full product name. Per coding guidelines, use "NVIDIA NeMo Agent Toolkit" on first mention.

Apply this change:

```diff
-# Extending the Finetuning Harness
-
-This guide covers how to create custom components for the NAT finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
+# Extending the Finetuning Harness
+
+This guide covers how to create custom components for the NVIDIA NeMo Agent Toolkit finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
```
26-50: Add language specification to architecture diagram code block. The ASCII art diagram lacks a language tag. Add `text` or `plaintext` for consistency with markdown linting standards.

Apply this change:

````diff
-```
+```text
 ┌────────────────────────────────────────────────────────────────────────┐
````
66-111: Add Python language specification to code block. The Python code example demonstrating the `TrajectoryBuilder` interface lacks a language tag. Apply this change:

````diff
-```python
+```python
 from abc import ABC, abstractmethod
````

135-158: Add Python language specification to code block. The configuration class example lacks a language tag. Apply this change:

````diff
-```python
+```python
 from pydantic import Field
````

170-190: Add Python language specification to code block. The registration module example lacks a language tag. Apply this change:

````diff
-```python
+```python
 from nat.builder.builder import Builder
````

198-247: Add Python language specification to code block. The `TrainerAdapter` interface definition lacks a language tag. Apply this change:

````diff
-```python
+```python
 from abc import ABC, abstractmethod
````

262-288: Add Python language specification to code block. The trainer adapter configuration example lacks a language tag. Apply this change:

````diff
-```python
+```python
 from pydantic import BaseModel, Field
````

296-307: Add Python language specification to code block. The trainer adapter registration example lacks a language tag. Apply this change:

````diff
-```python
+```python
 from nat.builder.builder import Builder
````

315-401: Add Python language specification to code block. The `Trainer` interface definition lacks a language tag. Apply this change:

````diff
-```python
+```python
 from abc import ABC, abstractmethod
````

430-437: Add Python language specification to code block. The error handling example lacks a language tag. Apply this change:

````diff
-```python
+```python
 async def run_epoch(self, epoch: int, run_id: str) -> TrainingJobRef | None:
````

443-447: Add Python language specification to code block. The logging example lacks a language tag. Apply this change:

````diff
-```python
+```python
 logger.info("Starting epoch %d with %d examples", epoch, num_examples)
````

472-491: Add Python language specification to code block. The testing example lacks a language tag. Apply this change:

````diff
-```python
+```python
 import pytest
````

497-556: Add YAML language specification to code block. The configuration example lacks a language tag. Apply this change:

````diff
-```yaml
+```yaml
 llms:
````
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/source/reference/finetuning/extending.md(1 hunks)docs/source/reference/finetuning/rl_with_openpipe.md(1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{md,rst,py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
**/*.{md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
docs/**/*.{md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never use deprecated names: Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq in documentation (unless intentionally referring to deprecated versions)
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
**/*.{py,js,ts,yaml,yml,json,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces, never tabs, and ensure every file ends with a single newline
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
docs/source/**/*.{md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
docs/source/**/*.{md,rst}: Documentation must be clear and comprehensive, without TODOs, FIXMEs, or placeholder text
Ensure documentation is free of offensive or outdated terms
Ensure documentation is free of spelling mistakes and do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
**/*.{py,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
docs/source/_staticdirectory.
Files:
docs/source/reference/finetuning/rl_with_openpipe.mddocs/source/reference/finetuning/extending.md
🪛 LanguageTool
docs/source/reference/finetuning/rl_with_openpipe.md
[style] ~38-~38: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...at make decisions or take actions - You want to iterate quickly with online training #...
(REP_WANT_TO_VB)
[style] ~404-~404: Try moving the adverb to make the sentence clearer.
Context: ...ng**: The trainer implements curriculum learning to progressively include harder examples: - Groups trajectories by average re...
(SPLIT_INFINITIVE)
🪛 markdownlint-cli2 (0.18.1)
docs/source/reference/finetuning/extending.md
42-42: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
189-189: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
214-214: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
237-237: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
289-289: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
353-353: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
464-464: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
536-536: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
549-549: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
557-557: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
Corrected the case-insensitivity pattern for 'logprob' to ensure proper matching in the Vale style vocabulary. This update aligns with consistent handling of vocabulary terms. Signed-off-by: dnandakumar-nv <[email protected]>
Updated module and project naming to align with the "nat" prefix convention. Adjusted logging suppression prefix and enhanced setup instructions in README for better clarity, including specifying virtual environment and dependencies. Signed-off-by: dnandakumar-nv <[email protected]>
Updated module and project naming to align with the "nat" prefix convention. Adjusted logging suppression prefix and enhanced setup instructions in README for better clarity, including specifying virtual environment and dependencies. Signed-off-by: dnandakumar-nv <[email protected]>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pyproject.toml (2)
100-105: Move `openpipe-art` to maintain alphabetical ordering in optional-dependencies. The section header at line 74 specifies "Keep sorted!!!" Reorder `openpipe-art` to its alphabetically correct position between `mem0ai` and `opentelemetry`, not at the end after `zep-cloud`.

```diff
 opentelemetry = ["nvidia-nat-opentelemetry"]
-phoenix = ["nvidia-nat-phoenix"]
+openpipe-art = ["nvidia-nat-openpipe-art"]
+phoenix = ["nvidia-nat-phoenix"]
```
200-204: Move `nvidia-nat-openpipe-art` to maintain alphabetical ordering in workspace members. Reorder `nvidia-nat-openpipe-art` to its alphabetically correct position. It should appear between `nvidia-nat-mem0ai` and `nvidia-nat-mysql` (or after `nvidia-nat-mysql` and before `nvidia-nat-opentelemetry`, depending on exact sort rules), not at the end after `nvidia-nat-zep-cloud`.

```diff
-nvidia-nat-mem0ai = { workspace = true }
+nvidia-nat-mem0ai = { workspace = true }
+nvidia-nat-mysql = { workspace = true }
+nvidia-nat-openpipe-art = { workspace = true }
```

Then remove from line 204 (the current end-of-list location).
♻️ Duplicate comments (2)
src/nat/utils/io/supress_logs.py (2)
1-33: Consider renaming the module to `suppress_logs.py` to avoid a permanent typo in the API. The filename `supress_logs.py` is misspelled. If this utility becomes part of the public `nat.utils.io` surface, the typo will be baked into the import path. While this PR is still in flight, it's safer to:

- Rename the file to `suppress_logs.py`.
- Update all imports in the PR to use `nat.utils.io.suppress_logs`.

This was already raised on a previous revision; addressing it now will prevent long-term API awkwardness.
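For illustration, a minimal usage sketch under the assumption that the rename has been applied; the `"nat"` prefix and `logging.CRITICAL` level are example values, and the import path reflects the suggested (not yet merged) module name:

```python
# Minimal sketch assuming the suppress_logs.py rename from the comment above.
import asyncio
import logging

from nat.utils.io.suppress_logs import suppress_logs  # suggested (renamed) module path


async def main() -> None:
    noisy = logging.getLogger("nat.some.plugin")
    noisy.warning("visible before suppression")
    async with suppress_logs("nat", level=logging.CRITICAL):
        noisy.warning("suppressed while inside the context")
    noisy.warning("visible again after restoration")


if __name__ == "__main__":
    asyncio.run(main())
```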
20-33: Add type hints/docstring and correctly restore `propagate` state for each logger. `supress_logs.suppress_logs` is a public async context manager and should follow project typing/docstring rules, and it still does not restore `logger.propagate` to its original value:

- No type hints on `prefix`/`level` or the async context manager return type.
- No Google-style docstring explaining behavior and parameters.
- You snapshot only `lg.level` (line 24) but in `finally` you unconditionally set `lg.propagate = True` (line 33), which can permanently change logging behavior for loggers that previously had `propagate = False`.

A minimal refactor that addresses all three points:

```diff
-import logging
-from contextlib import asynccontextmanager
+import logging
+from collections.abc import AsyncIterator
+from contextlib import asynccontextmanager
@@
-@asynccontextmanager
-async def suppress_logs(prefix, level=logging.ERROR):
-    # gather every logger created so far whose name starts with "nat"
-    loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict if name.startswith(prefix)]
-    old = {lg: lg.level for lg in loggers}
+@asynccontextmanager
+async def suppress_logs(prefix: str, level: int = logging.ERROR) -> AsyncIterator[None]:
+    """Temporarily raise log level and disable propagation for matching loggers.
+
+    This adjusts existing ``logging.Logger`` instances whose names start with
+    ``prefix``, then restores their original configuration on exit.
+
+    Args:
+        prefix: Logger name prefix to match (e.g., ``"nat"``).
+        level: Temporary log level to apply while inside the context.
+    """
+    # Gather every logger created so far whose name starts with ``prefix``.
+    loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict if name.startswith(prefix)]
+    old = {lg: (lg.level, lg.propagate) for lg in loggers}
@@
-    finally:
-        for lg, lvl in old.items():
-            lg.setLevel(lvl)
-            lg.propagate = True
+    finally:
+        for lg, (lvl, propagate) in old.items():
+            lg.setLevel(lvl)
+            lg.propagate = propagate
```

Also add a short module-level Google-style docstring after the license header to describe this utility.
🧹 Nitpick comments (3)
examples/finetuning/rl_with_openpipe_art/README.md (1)
1-10: Use full "NVIDIA NeMo Agent Toolkit" name on first reference.Per coding guidelines, the first mention of the toolkit should use the full name "NVIDIA NeMo Agent Toolkit" before using the shortened form "NeMo Agent Toolkit".
Apply this diff:
```diff
-This example demonstrates how to use the NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
+This example demonstrates how to use the NVIDIA NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
```

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)
79-80: Consider extracting the exception message to a constant. The exception message could be extracted to improve maintainability, though this is a minor style preference.
Example refactor:
```diff
+    _RUN_IN_PROGRESS_ERROR = "Run {run_id} is already in progress."
+
     async def start_run(self, run_id: str, meta: dict | None = None) -> None:
         ...
         if run_id in self.evaluation_runs:
-            raise ValueError(f"Run {run_id} is already in progress.")
+            raise ValueError(self._RUN_IN_PROGRESS_ERROR.format(run_id=run_id))
```
124-125: Consider extracting the exception message to a constant. Similar to the earlier suggestion, this exception message could be extracted for consistency.
Example refactor:
```diff
+    _NO_EVAL_RUNS_ERROR = "No evaluation runs found for run_id: {run_id}"
+
     async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection:
         ...
         if run_id not in self.evaluation_runs:
-            raise ValueError(f"No evaluation runs found for run_id: {run_id}")
+            raise ValueError(self._NO_EVAL_RUNS_ERROR.format(run_id=run_id))
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
examples/finetuning/rl_with_openpipe_art/README.md(1 hunks)examples/finetuning/rl_with_openpipe_art/pyproject.toml(1 hunks)packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py(1 hunks)pyproject.toml(5 hunks)src/nat/utils/io/supress_logs.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/finetuning/rl_with_openpipe_art/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (14)
**/*.{md,rst,py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pyexamples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,toml,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pypyproject.toml
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*.{py,js,ts,yaml,yml,json,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces, never tabs, and ensure every file ends with a single newline
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pyexamples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,env,toml,yaml,yml,json}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pypyproject.toml
**/*.{py,js,ts,java,cpp,c,go,rb,php}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Every file must start with the standard SPDX Apache-2.0 header
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*.{py,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pyexamples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
src/nat/utils/io/supress_logs.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pypyproject.tomlexamples/finetuning/rl_with_openpipe_art/README.md
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/utils/io/supress_logs.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain apyproject.tomlfile. - Thepyproject.tomlfile should declare a dependency onnvidia-nator another package with a name starting
withnvidia-nat-. This dependency should be declared using~=<version>, and the version should be a two
digit version (ex:~=1.0).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/directory at the same level as thepyproject.tomlfile.
Files:
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/pyproject.toml
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/pyproject.toml: Dependencies should use ~= format with two-digit versions (e.g., ~=1.0)
New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via uv pip install --sync
Files:
pyproject.toml
**/*.{md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings
Files:
examples/finetuning/rl_with_openpipe_art/README.md
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named
src/and should
contain apyproject.tomlfile. Optionally, it might also contain scripts in ascripts/directory.- If an example contains YAML files, they should be placed in a subdirectory named
configs/. - If an example contains sample data files, they should be placed in a subdirectory nameddata/, and should
be checked into git-lfs.
Files:
examples/finetuning/rl_with_openpipe_art/README.md
🧠 Learnings (4)
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to packages/*/pyproject.toml : The pyproject.toml should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/pyproject.toml : New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via uv pip install <pkg> --sync
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,toml,yaml,yml} : Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments
Applied to files:
pyproject.toml
🧬 Code graph analysis (1)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (4)
src/nat/data_models/finetuning.py (4)
EpisodeItem(109-123)EpisodeItemRole(99-106)Trajectory(126-134)TrajectoryCollection(137-143)src/nat/data_models/intermediate_step.py (2)
IntermediateStep(236-311)IntermediateStepCategory(32-39)src/nat/finetuning/interfaces/trajectory_builder.py (6)
TrajectoryBuilder(28-114)start_run(67-75)run_eval(44-64)finalize(78-89)compute_reward(91-102)log_progress(105-114)src/nat/finetuning/utils/parsers/base_parser.py (1)
parse_to_openai_messages(29-102)
🪛 LanguageTool
examples/finetuning/rl_with_openpipe_art/README.md
[grammar] ~335-~335: Use a hyphen to join words.
Context: ...ses a sophisticated alpha-beta pruning based reward instead of simple win/los...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
examples/finetuning/rl_with_openpipe_art/README.md
166-166: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
312-312: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
515-515: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.7)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
71-71: Unused method argument: meta
(ARG002)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
125-125: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (3)
examples/finetuning/rl_with_openpipe_art/README.md (1)
101-123: Specify language identifiers on all fenced code blocks. Several code blocks lack language specifiers, which is flagged by markdownlint (MD040). Add the appropriate language identifier after the opening triple backticks.
Apply these diffs for clarity:
````diff
-```
+```python
 @register_function(config_type=RlWithOpenpipeArtFunctionConfig)
 async def rl_with_openpipe_art_function(config, builder):
````

````diff
-```
+```bash
 INFO: Started server process [3671624]
 INFO: Waiting for application startup.
 INFO: Application startup complete.
 INFO: Uvicorn running on http://0.0.0.0:7623 (Press CTRL+C to quit)
````

````diff
-```
+```python
 def evaluate_board_for_player(board: np.ndarray, player_val: int) -> float:
````

````diff
-```
+```python
 def static_eval(b: np.ndarray) -> float:
````

````diff
-```
+```python
 def solve_outcome(b: np.ndarray, side_to_move: int, alpha=-1.0, beta=1.0) -> float:
````

````diff
-```
+```python
 @staticmethod
 def episode_value_from_states(
````

````diff
-```
+```python
 # In rl_with_openpipe_art.py
 if current_player.name == self.role:
````

````diff
-```
+```text
 .tmp/nat/finetuning/tic_tac_toe/
 ├── training_metrics.jsonl  # Per-epoch metrics
 ├── reward_history.json     # Reward progression
````

Also applies to: 222-227, 369-380, 387-398, 403-425, 443-472, 490-503, 515-520
⛔ Skipped due to learnings
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.py : Surround code entities with backticks in docstrings to avoid Vale false-positives

pyproject.toml (1)
171-176: Verify the override-dependencies version constraints align with project guidelines. The new overrides at lines 174–175 (`litellm==1.74.9` and `openai>=1.106,<2.dev0`) use exact pins and range constraints, while the coding guidelines recommend the `~=<version>` format with two-digit versions. The comment explains these overrides prevent regressions from the OpenPipe ART dependency. However, please confirm:

- Is this deviation from the `~=` format intentional to maintain exact compatibility?
- Have these version overrides been synchronized with `uv.lock` via `uv pip install <pkg> --sync`?

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1)
229-231: The type mismatch issue has been correctly resolved. The past review comment flagged a type mismatch where `item.event_type` (which returns `IntermediateStepType`) was compared to `IntermediateStepCategory.LLM`. The code now correctly uses `item.event_category`, which returns `IntermediateStepCategory` and matches the expected enum type.
Changed the Python version to 3.11 and specified the exact OpenPipe ART package version for consistent setup. This ensures compatibility and avoids potential installation issues. Signed-off-by: dnandakumar-nv <[email protected]>
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/finetuning/rl_with_openpipe_art/README.md (1)
334-334: Fix hyphenation in compound modifier. The phrase "alpha-beta pruning based reward" should use proper hyphenation for the compound modifier: "alpha-beta pruning-based reward".

```diff
-The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning based reward** instead of simple win/loss signals.
+The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning-based reward** instead of simple win/loss signals.
```
🧹 Nitpick comments (4)
examples/finetuning/rl_with_openpipe_art/README.md (4)
3-3: Correct toolkit name on first use. Per coding guidelines, the first reference should be "NVIDIA NeMo Agent Toolkit" rather than "NeMo Agent Toolkit".

```diff
-This example demonstrates how to use the NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
+This example demonstrates how to use the NVIDIA NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
```
165-170: Add language specifier to curl command block. Markdown requires code blocks to have a language specified. This curl command should be labeled as `bash`.

````diff
 Verify the server is running:

-```
+```bash
 curl http://localhost:8000/v1/models
-```
+```
````
311-326: Add language specifier to console output block. This block showing training progress should specify a language (e.g., `bash` or `text` for console output).

````diff
 Training progress is logged to the console and saved to files:

-```
+```bash
 INFO - Starting finetuning with config: src/rl_with_openpipe_art/configs/config.yml
 INFO - Initializing OpenPipe ART Runner
 INFO - Successfully registered with ART backend.
````
514-519: Add language specifier to directory structure block. The file tree output should specify a language identifier for markdown compliance.

````diff
 After training, check the output directory:

-```
+```text
 .tmp/nat/finetuning/tic_tac_toe/
 ├── training_metrics.jsonl  # Per-epoch metrics
 ├── reward_history.json     # Reward progression
 ├── reward_plot.png         # Visual reward chart
-```
+```
````
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/finetuning/rl_with_openpipe_art/README.md(1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,rst,py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references
Files:
examples/finetuning/rl_with_openpipe_art/README.md
**/*.{md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings
Files:
examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,js,ts,yaml,yml,json,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces, never tabs, and ensure every file ends with a single newline
Files:
examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs
Files:
examples/finetuning/rl_with_openpipe_art/README.md
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to date whenever a file is changed.
Files:
examples/finetuning/rl_with_openpipe_art/README.md
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.
- If an example contains Python code, it should be placed in a subdirectory named
src/and should
contain apyproject.tomlfile. Optionally, it might also contain scripts in ascripts/directory.- If an example contains YAML files, they should be placed in a subdirectory named
configs/. - If an example contains sample data files, they should be placed in a subdirectory nameddata/, and should
be checked into git-lfs.
Files:
examples/finetuning/rl_with_openpipe_art/README.md
🪛 LanguageTool
examples/finetuning/rl_with_openpipe_art/README.md
[grammar] ~334-~334: Use a hyphen to join words.
Context: ...ses a sophisticated alpha-beta pruning based reward instead of simple win/los...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
examples/finetuning/rl_with_openpipe_art/README.md
165-165: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
311-311: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
514-514: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Refactor token usage extraction to capture additional details like cached and reasoning tokens. Add robust handling of tool outputs in callbacks, improving support for server-side tool usage and error resilience during schema parsing. Signed-off-by: dnandakumar-nv <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)
211-213: LGTM! Payload field correctly propagated. The addition of `payload=response` to `StreamEventData` aligns with the finetuning architecture for capturing raw generation data.

Optional consistency improvement: Consider whether `response` should be deep copied for consistency with the tool event handling at line 227, which uses `copy.deepcopy`. If the `ChatResponse` object won't be mutated elsewhere, the current approach is fine.
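To show the difference the optional deep copy makes, here is a small, self-contained sketch. `StreamEventDataSketch` and the `response` dict are toy stand-ins for the real `StreamEventData` model and `ChatResponse` object, not the handler's actual code:

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class StreamEventDataSketch:
    """Toy stand-in for `StreamEventData`; only the `payload` field is modeled."""
    payload: Any = None


response = {"message": {"content": "hello"}}  # stand-in for a ChatResponse

shallow = StreamEventDataSketch(payload=response)               # current approach
deep = StreamEventDataSketch(payload=copy.deepcopy(response))   # deep-copied variant

response["message"]["content"] = "mutated later"

print(shallow.payload["message"]["content"])  # "mutated later": the event sees the mutation
print(deep.payload["message"]["content"])     # "hello": the recorded payload is isolated
```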
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)

56-67: Unused `meta` parameter in `start_run`. `meta` is accepted but never used in `start_run`, which triggers ARG002 and can confuse readers. If the base `TrajectoryBuilder` interface requires the parameter, consider renaming it to `_meta` (or similar) to indicate intentional non-use and satisfy linters.
288-298: Verify intended shape of `TrajectoryCollection.trajectories` for a single generation. The comments say "Flatten the trajectories list into a 1 d list of trajectories" and "If only one generation, return flat list", but the code still wraps the flattened list in another list:

```python
flat_trajectories = [traj for sublist in trajectories_list for traj in sublist]
return TrajectoryCollection(trajectories=[flat_trajectories], run_id=run_id)
```

If `TrajectoryCollection.trajectories` is meant to be flat when `num_generations == 1`, this should likely be:

```python
return TrajectoryCollection(trajectories=flat_trajectories, run_id=run_id)
```

If the outer list dimension is required by the broader finetuning API, then the code is correct and the comments should be updated to reflect that it remains a nested structure even for a single generation.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py(1 hunks)src/nat/profiler/callbacks/langchain_callback_handler.py(2 hunks)src/nat/profiler/callbacks/llama_index_callback_handler.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{md,rst,py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,toml,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,yaml,yml,json,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces, never tabs, and ensure every file ends with a single newline
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,env,toml,yaml,yml,json}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,java,cpp,c,go,rb,php}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Every file must start with the standard SPDX Apache-2.0 header
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,md,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.pypackages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.pysrc/nat/profiler/callbacks/langchain_callback_handler.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values (except for return values of None; in that situation no return type hint is needed). Example: def my_function(param1: int, param2: str) -> bool: pass
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.
Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.
Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.py
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
src/nat/profiler/callbacks/langchain_callback_handler.py
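A minimal sketch of the type-hint and exception-handling conventions above, with hypothetical function names:

```python
import logging

logger = logging.getLogger(__name__)


def parse_score(raw: str) -> float:
    """Type hints on every parameter and on the return value."""
    try:
        return float(raw)
    except ValueError:
        # Re-raising: log with logger.error() and use a bare raise to keep the original stack trace.
        logger.error("Could not parse score from %r", raw)
        raise


def parse_score_or_default(raw: str, default: float = 0.0) -> float:
    try:
        return float(raw)
    except ValueError:
        # Not re-raising: logger.exception() captures the full stack trace.
        logger.exception("Falling back to default score for %r", raw)
        return default
```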
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/profiler/callbacks/llama_index_callback_handler.py
src/nat/profiler/callbacks/langchain_callback_handler.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*:
- This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
- The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two digit version (ex: ~=1.0).
- Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.
Files:
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
🧠 Learnings (3)
📚 Learning: 2025-12-03T18:42:23.494Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).
Applied to files:
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php} : Every file must start with the standard SPDX Apache-2.0 header
Applied to files:
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php,sh} : All source files must include the SPDX Apache-2.0 header template
Applied to files:
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
🧬 Code graph analysis (1)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)
src/nat/data_models/intermediate_step.py (13)
data (294-295), StreamEventData (67-78), metadata (290-291), TraceMetadata (115-128), usage_info (298-299), UsageInfo (81-84), event_type (266-267), IntermediateStepPayload (131-233), IntermediateStepType (42-58), span_event_timestamp (274-275), framework (278-279), name (282-283), UUID (302-303)
🪛 Ruff (0.14.7)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
56-56: Unused method argument: meta
(ARG002)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
110-110: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (4)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)
220-228: LGTM! Tool event payload correctly added. The payload field is properly propagated in the tool event data, with both output and payload set to deep copies of the function output. This duplication likely supports backward compatibility while enabling new finetuning functionality.
src/nat/profiler/callbacks/langchain_callback_handler.py (2)
268-269: LGTM! Payload field correctly added to LLM end event. The addition of payload=generation properly captures the raw generation object alongside the extracted text output, supporting the finetuning harness's trajectory collection requirements.
321-322: LGTM! Tool event payload correctly propagated. The payload field is properly added to the tool end event data. Both output and payload reference the same tool output, which maintains backward compatibility while enabling the new finetuning trajectory collection functionality.
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1)
300-327: log_progress implementation looks solid. Synchronous JSONL logging with a stable default path, timestamped entries, and num_generations included is clear and easy to consume. No issues from a correctness or style perspective.
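The reviewed handlers are not reproduced here; purely as a sketch of the pattern the comments describe (the function name is hypothetical), keeping output for backward compatibility while adding payload for trajectory collection looks like:

```python
from copy import deepcopy
from typing import Any


def build_tool_end_event_data(function_output: Any) -> dict[str, Any]:
    """Illustrates the reviewed pattern: preserve the existing `output` field for
    backward compatibility while adding a `payload` field for trajectory collection."""
    return {
        "output": deepcopy(function_output),
        "payload": deepcopy(function_output),
    }
```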
This update introduces the `nat finetune` command, providing detailed documentation for its usage, options, and examples. The finetune command allows in-situ reinforcement learning for iterative agent improvement, incorporating experimental features and extensive configuration options. Signed-off-by: dnandakumar-nv <[email protected]>
Don't forget to add this new package as a dependency in packages/nvidia_nat_all/pyproject.toml
Summary
This PR introduces a comprehensive finetuning harness to the NeMo Agent Toolkit, enabling in-situ reinforcement learning (RL) of agentic LLM workflows. The harness is designed with a decoupled architecture that separates training logic from backends, allowing users to train agents with any RL framework while collecting trajectories from live workflow executions.
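The authoritative interface definitions live under src/nat/finetuning/interfaces/; the sketch below only illustrates how such a decoupled split can be expressed, and the method names here are hypothetical rather than the PR's actual signatures:

```python
from abc import ABC, abstractmethod
from typing import Any


class TrajectoryBuilder(ABC):
    """Turns intermediate steps from live workflow runs into training trajectories."""

    @abstractmethod
    async def build(self, run_output: Any) -> list[dict[str, Any]]:
        ...


class TrainerAdapter(ABC):
    """Translates toolkit trajectories into whatever format the RL backend expects."""

    @abstractmethod
    async def submit(self, trajectories: list[dict[str, Any]]) -> str:
        """Returns a backend-specific training job reference."""
        ...


class Trainer(ABC):
    """Orchestrates the loop: run the workflow, build trajectories, hand them to the adapter."""

    @abstractmethod
    async def train(self, num_epochs: int) -> None:
        ...
```

Because each piece is implemented independently, swapping the RL backend only means providing a new TrainerAdapter and Trainer; trajectory collection from the live workflow stays untouched.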
Key Features
TrajectoryBuilder, TrainerAdapter, and Trainer interfaces that can be implemented independently
What's Included
1. Core Finetuning Harness (src/nat/finetuning/)
New Interfaces:
TrajectoryBuilder
TrainerAdapter
Trainer
Data Models (src/nat/data_models/finetuning.py):
Trajectory: Episode sequence with reward and metadata
EpisodeItem: Individual conversation turns with role, content, and logprobs
TrajectoryCollection: Grouped trajectories for GRPO optimization
FinetuneConfig: Complete finetuning configuration
CurriculumLearningConfig: Progressive difficulty settings
TrainingJobRef/TrainingJobStatus: Job tracking primitives
Runtime (src/nat/finetuning/finetuning_runtime.py):
CLI Command (src/nat/cli/commands/finetune.py):
nat finetune --config_file=config.yml \
  --validation_dataset=val.json \
  --validation_interval=5
2. OpenPipe ART Plugin (packages/nvidia_nat_openpipe_art/)
A complete implementation of the finetuning interfaces for OpenPipe ART:
ARTTrajectoryBuilder
ARTTrainerAdapter
ARTTrainer
Configuration types:
openpipe_art_traj_builder
openpipe_art_trainer_adapter
openpipe_art_trainer
3. Tic-Tac-Toe RL Example (examples/finetuning/rl_with_openpipe_art/)
A complete working example demonstrating GRPO training:
The example includes a sophisticated reward function that:
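The example's actual reward logic lives in the example package and is not reproduced here. Purely as an illustration of the idea, with made-up shaping rules and hypothetical names, a tic-tac-toe reward plus the group-relative normalization GRPO relies on could look like:

```python
import statistics


def tic_tac_toe_reward(won: bool, drew: bool, invalid_moves: int, num_moves: int) -> float:
    """Hypothetical shaping: reward wins, tolerate draws, penalize invalid moves and long games."""
    reward = 1.0 if won else (0.25 if drew else -1.0)
    reward -= 0.2 * invalid_moves
    reward -= 0.01 * num_moves
    return reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO scores each trajectory relative to the other generations in its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero when all rewards match
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    print(group_relative_advantages([1.0, -1.0, 0.25, 1.0]))
```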
4. Documentation (docs/source/reference/finetuning/)
Comprehensive documentation split into focused guides:
index.md
concepts.md
extending.md
rl_with_openpipe.md
5. Tests
Architecture
Configuration Example
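The PR's actual YAML example is not reproduced above. As a purely hypothetical sketch, with placeholder keys rather than the toolkit's real schema, the finetuning section wiring the three OpenPipe ART component types together might resemble:

```python
# Hypothetical structure only; consult docs/source/reference/finetuning/ for the real schema.
finetuning_config = {
    "finetuning": {
        "trajectory_builder": {"_type": "openpipe_art_traj_builder"},
        "trainer_adapter": {"_type": "openpipe_art_trainer_adapter"},
        "trainer": {"_type": "openpipe_art_trainer", "reward": "tic_tac_toe_reward"},
    }
}
```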
Design Decisions
Why Decoupled Architecture?
Why In-Situ Training?
Breaking Changes
None. This is a new feature addition.
Migration Guide
N/A - New feature.
Testing
Usage
Running the Example
Start the ART server (requires GPU):
Run pre-training baseline:
nat eval --config_file=examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_pre_train.yml
Run finetuning:
Run post-training evaluation:
nat eval --config_file=examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_post_train.yml
Checklist
Future Work in other PRs
Reviewers: Please pay particular attention to:
src/nat/finetuning/interfaces/
src/nat/data_models/finetuning.py
docs/source/reference/finetuning/
Summary by CodeRabbit
New Features
Documentation
Tests
Misc