@dnandakumar-nv (Contributor) commented Dec 3, 2025

Summary

This PR introduces a comprehensive finetuning harness to the NeMo Agent Toolkit, enabling in-situ reinforcement learning (RL) of agentic LLM workflows. The harness is designed with a decoupled architecture that separates training logic from backends, allowing users to train agents with any RL framework while collecting trajectories from live workflow executions.

Key Features

  • Three-component architecture: TrajectoryBuilder, TrainerAdapter, and Trainer interfaces that can be implemented independently
  • Backend-agnostic design: Train with OpenPipe ART, NeMo Aligner, or custom implementations
  • In-situ training: Collect trajectories during actual workflow execution, not from offline datasets
  • Curriculum learning: Progressively introduce harder examples during training
  • GRPO support: Group trajectories by example for Group Relative Policy Optimization
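The GRPO support above hinges on comparing multiple rollouts of the same input example against each other rather than against a learned value baseline. As a rough illustration (not code from this PR; the function name and reward representation are assumptions), the group-relative advantage for each trajectory in a group can be computed like this:

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Compute group-relative advantages for one example's trajectory group.

    In GRPO, each trajectory's advantage is its reward normalized against
    the statistics of the other rollouts for the same input example, so no
    separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All rollouts scored identically -> no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

This is why the `num_generations` setting in the example configuration matters: each example needs several rollouts before a group statistic is meaningful.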

What's Included

1. Core Finetuning Harness (src/nat/finetuning/)

New Interfaces:

| Interface | Purpose |
| --- | --- |
| `TrajectoryBuilder` | Runs evaluations, collects episodes, computes rewards, groups trajectories |
| `TrainerAdapter` | Bridges NAT to external training backends, handles job submission and monitoring |
| `Trainer` | Orchestrates the complete finetuning loop across epochs |
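A rough sketch of how three independently implementable interfaces like these might fit together (method names and signatures are assumptions for illustration; the real contracts are defined in `src/nat/finetuning/interfaces/`):

```python
from abc import ABC, abstractmethod
from typing import Any


class TrajectoryBuilder(ABC):
    """Runs evaluations and turns them into reward-annotated trajectories."""

    @abstractmethod
    async def build(self, epoch: int) -> list[dict[str, Any]]:
        """Collect episodes from live workflow runs and attach rewards."""


class TrainerAdapter(ABC):
    """Bridges the toolkit to an external training backend."""

    @abstractmethod
    async def submit(self, trajectories: list[dict[str, Any]]) -> str:
        """Submit a training job; return a job reference."""

    @abstractmethod
    async def wait_until_complete(self, job_ref: str) -> None:
        """Block until the backend reports the job finished."""


class Trainer(ABC):
    """Orchestrates the finetuning loop across epochs."""

    def __init__(self, builder: TrajectoryBuilder, adapter: TrainerAdapter):
        self.builder = builder
        self.adapter = adapter

    @abstractmethod
    async def run(self, num_epochs: int) -> None:
        """Alternate trajectory collection and training job submission."""
```

Because each piece only sees the others through these abstract methods, a backend swap (ART vs. a custom GRPO service) touches only the adapter implementation.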

Data Models (src/nat/data_models/finetuning.py):

  • Trajectory: Episode sequence with reward and metadata
  • EpisodeItem: Individual conversation turns with role, content, and logprobs
  • TrajectoryCollection: Grouped trajectories for GRPO optimization
  • FinetuneConfig: Complete finetuning configuration
  • CurriculumLearningConfig: Progressive difficulty settings
  • TrainingJobRef / TrainingJobStatus: Job tracking primitives

Runtime (src/nat/finetuning/finetuning_runtime.py):

  • Async finetuning execution
  • Component initialization and binding
  • Epoch orchestration with validation support

CLI Command (src/nat/cli/commands/finetune.py):

```bash
nat finetune --config_file=config.yml \
    --validation_dataset=val.json \
    --validation_interval=5
```

2. OpenPipe ART Plugin (packages/nvidia_nat_openpipe_art/)

A complete implementation of the finetuning interfaces for OpenPipe ART:

| Component | Description |
| --- | --- |
| `ARTTrajectoryBuilder` | Parallel evaluation runs, logprob extraction, reward computation |
| `ARTTrainerAdapter` | ART client management, trajectory format conversion, async job submission |
| `ARTTrainer` | Epoch orchestration, curriculum learning, validation, progress visualization |

Configuration types:

  • openpipe_art_traj_builder
  • openpipe_art_trainer_adapter
  • openpipe_art_trainer

3. Tic-Tac-Toe RL Example (examples/finetuning/rl_with_openpipe_art/)

A complete working example demonstrating GRPO training:

  • Task: LLM plays Tic-Tac-Toe against a random opponent
  • Reward function: Alpha-beta pruning based position evaluation (continuous, dense rewards)
  • Workflow: Full game loop with intermediate step tracking
  • Evaluators: Win percentage and step-value computation
  • Configs: Pre-training, training, and post-training evaluation

The example includes a sophisticated reward function that:

  • Uses game-theoretic minimax with alpha-beta pruning to identify forced wins/losses
  • Provides continuous rewards in [0, 1] for non-terminal positions
  • Applies temporal discounting to weight earlier moves more heavily
  • Adds bonus rewards for winning positions

4. Documentation (docs/source/reference/finetuning/)

Comprehensive documentation split into focused guides:

| Document | Content |
| --- | --- |
| `index.md` | Overview, architecture diagram, quick start |
| `concepts.md` | RL fundamentals, trajectories, curriculum learning, credit assignment, GRPO/PPO/DPO |
| `extending.md` | Step-by-step guide to implementing custom components |
| `rl_with_openpipe.md` | OpenPipe ART integration details, configuration reference |

5. Tests

  • Unit tests for finetuning data models
  • Integration tests for trajectory building
  • End-to-end tests for the ART integration
  • Example workflow tests

Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                               Trainer                               │
│  (Orchestrates the finetuning loop across epochs)                   │
│                                                                     │
│  ┌───────────────────────┐         ┌───────────────────────────┐    │
│  │  TrajectoryBuilder    │         │    TrainerAdapter         │    │
│  │                       │         │                           │    │
│  │  - Runs evaluations   │ ──────► │  - Validates trajectories │    │
│  │  - Collects episodes  │         │  - Submits to backend     │    │
│  │  - Computes rewards   │         │  - Monitors training      │    │
│  │  - Groups trajectories│         │  - Reports status         │    │
│  └───────────────────────┘         └───────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
                     ┌─────────────────────────┐
                     │   Remote Training       │
                     │      Backend            │
                     │  (OpenPipe ART, etc.)   │
                     └─────────────────────────┘
```

Configuration Example

```yaml
llms:
  training_llm:
    _type: openai
    model_name: Qwen/Qwen2.5-3B-Instruct
    base_url: http://localhost:8000/v1

workflow:
  _type: my_workflow
  llm: training_llm

eval:
  general:
    max_concurrency: 16
    dataset:
      _type: json
      file_path: data/training.json
  evaluators:
    reward:
      _type: my_evaluator

trajectory_builders:
  builder:
    _type: openpipe_art_traj_builder
    num_generations: 2

trainer_adapters:
  adapter:
    _type: openpipe_art_trainer_adapter
    backend:
      ip: localhost
      port: 7623
      base_model: Qwen/Qwen2.5-3B-Instruct
    training:
      learning_rate: 1e-6

trainers:
  trainer:
    _type: openpipe_art_trainer

finetuning:
  enabled: true
  trainer: trainer
  trajectory_builder: builder
  trainer_adapter: adapter
  reward_function:
    name: reward
  num_epochs: 20
  curriculum_learning:
    enabled: true
    initial_percentile: 0.3
    expansion_interval: 5
```
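The `curriculum_learning` settings above (`initial_percentile`, `expansion_interval`) suggest a schedule along these lines. The `expansion_step` value and the exact expansion rule are assumptions made for illustration, not taken from this PR:

```python
def curriculum_fraction(epoch: int, initial_percentile: float = 0.3,
                        expansion_interval: int = 5,
                        expansion_step: float = 0.1) -> float:
    """Fraction of the dataset available at a given epoch (0-indexed).

    Starts at `initial_percentile` and grows by `expansion_step` every
    `expansion_interval` epochs, capped at 1.0.
    """
    expansions = epoch // expansion_interval
    return min(1.0, initial_percentile + expansions * expansion_step)


def select_examples(examples_by_difficulty: list, epoch: int) -> list:
    """Pick the easiest slice of a difficulty-sorted dataset for this epoch."""
    frac = curriculum_fraction(epoch)
    cutoff = max(1, round(frac * len(examples_by_difficulty)))
    return examples_by_difficulty[:cutoff]
```

With the configured values, training would begin on roughly the easiest 30% of examples and progressively admit harder ones every 5 epochs.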

Design Decisions

Why Decoupled Architecture?

  1. Backend flexibility: Organizations may have existing training infrastructure (NeMo Customizer, custom GRPO implementations) they want to use
  2. Algorithm agnosticism: Support GRPO, PPO, DPO, or SFT without code changes
  3. Testability: Each component can be tested independently with mocks
  4. Future-proofing: New training backends can be added without modifying core logic

Why In-Situ Training?

  1. Real data distribution: Training data reflects actual use cases and edge conditions
  2. Continuous improvement: Models improve while the system remains operational
  3. Tighter feedback loops: Hours instead of weeks between training iterations
  4. Adaptive: Naturally handles distribution shift

Breaking Changes

None. This is a new feature addition.

Migration Guide

N/A - New feature.

Testing

```bash
# Run finetuning harness tests
pytest tests/nat/finetuning/

# Run OpenPipe ART plugin tests
pytest packages/nvidia_nat_openpipe_art/tests/

# Run example tests
pytest examples/finetuning/rl_with_openpipe_art/tests/
```

Usage

Running the Example

  1. Start the ART server (requires GPU):

     ```bash
     source ~/.venvs/art-env/bin/activate
     uv run art --host 0.0.0.0 --port 7623
     ```

  2. Run the pre-training baseline:

     ```bash
     nat eval --config_file=examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_pre_train.yml
     ```

  3. Run finetuning:

     ```bash
     nat finetune --config_file=examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config.yml
     ```

  4. Run the post-training evaluation:

     ```bash
     nat eval --config_file=examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_post_train.yml
     ```

Checklist

  • Core finetuning interfaces implemented
  • Finetuning runtime and CLI command
  • Data models for trajectories, episodes, and configurations
  • OpenPipe ART plugin package
  • Tic-Tac-Toe RL example with alpha-beta reward function
  • Comprehensive documentation (concepts, extending, ART integration)
  • Unit and integration tests
  • Example README with step-by-step instructions

Future Work (in Other PRs)

  • Additional training backends (NeMo Customizer)
  • DPO/SFT trajectory builder variants
  • Checkpoint export utilities

Reviewers: Please pay particular attention to:

  1. Interface design in src/nat/finetuning/interfaces/
  2. Data model structure in src/nat/data_models/finetuning.py
  3. Documentation completeness in docs/source/reference/finetuning/

Summary by CodeRabbit

  • New Features

    • Full finetuning workflow: CLI command, runtime APIs, pluggable trainers/adapters/trajectory builders, OpenPipe ART backend, curriculum learning, and an RL Tic‑Tac‑Toe example with LLM agents and evaluators.
  • Documentation

    • New Finetuning reference pages (Concepts, Extending, OpenPipe integration), example README, and added nav entries.
  • Tests

    • Extensive test coverage for trainers, adapters, trajectory builders, parsers, and end-to-end finetuning flows.
  • Misc

    • Packaging for optional OpenPipe plugin, small vocabulary additions, added stream payload field, and async log-suppression utility.

Commit messages

This commit updates the `uv.lock` file to increase the revision number and introduce new package overrides for `litellm` and `openai`. Additionally, metadata like upload times for files has been added, providing a more detailed package resolution record.

Signed-off-by: dnandakumar-nv <[email protected]>
Deleted the entire OpenPipe ART finetuning backend, including configurations, runner, and registration modules. This removal simplifies the codebase by eliminating components related to the ART backend that are no longer required.

Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a `FinetuningConfig` model and integrates fine-tuning support across the codebase. It includes new components such as `Trainer`, `TrainerAdapter`, and `TrajectoryBuilder`, along with methods to manage and bind these components. Fine-tuning is now fully configurable and enabled through the updated workflow builder and configuration interfaces.

Signed-off-by: dnandakumar-nv <[email protected]>
Replaced all instances of FinetuningRunner with Trainer to reflect updated naming conventions. This change ensures consistency in naming and improves clarity in describing the finetuning workflow orchestration. Updated relevant imports, class definitions, method signatures, and type hints.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced structures and methods to register, retrieve, and handle fine-tuning trainers, trainer adapters, and trajectory builders. Updated type registry and workflows to support these components, ensuring seamless integration into the fine-tuning process.

Signed-off-by: dnandakumar-nv <[email protected]>
Separated `TRAINING` into `TRAINERS`, `TRAINER_ADAPTERS`, and `TRAJECTORY_BUILDERS` for better modularity. Adjusted related models, workflow logic, and configurations to align with the new structure. This improves clarity and supports more granular component handling.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced Trainer, TrainerAdapter, and TrajectoryBuilder components with configurations, mocking methods, and dependency tracking. Enhanced testing and registration workflows to validate the integration of these new components into the builder and workflow lifecycle.

Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a reward field to critical configuration models, including `TrainerConfig` and `TrajectoryBuilderConfig`, enabling enhanced reward-based customization. Additionally, it implements extensive unit tests for `Trainer`, `TrajectoryBuilder`, and `TrainerAdapter`, covering initialization, run workflows, and error handling for robust validation.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced comprehensive unit tests for `llama_index_parser`, `langchain_parser`, and `base_parser` under `tests/nat/finetuning/utils/parsers`. These tests cover various edge cases, ensuring proper functionality of content extraction, message parsing, and sequence validation. Added `__init__.py` files to relevant directories for module structure.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a new `finetune` command to the CLI, allowing users to run finetuning on workflows using collected trajectories. This includes options for specifying a configuration file and overriding config values. Additionally, the command integrates with existing CLI infrastructure and logs progress during execution.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a unified `FinetuneConfig` model to replace `FinetuningConfig` and `FinetuneRunConfig`, simplifying finetuning-related data handling. Updated runtime, interfaces, and CLI to leverage the new configuration model, including enhanced validation and improved CLI parameter handling.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a `num_epochs` field to the `FinetuneConfig` model for managing training iterations. Updated `Trainer` and `TrainerAdapter` to simplify initialization, validation, and cleanup processes. Refactored logic for managing configuration and validation datasets, streamlining the finetuning execution.

Signed-off-by: dnandakumar-nv <[email protected]>
Reorganized imports to adhere to consistent style, prioritizing standard and external libraries first, followed by internal ones. Removed unused imports and redundant whitespace for better readability. Simplified code structure and formatting across multiple files to enhance maintainability and clarity.

Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a new subpackage, `nvidia-nat-openpipe-art`, for OpenPipe ART integration in the NVIDIA NeMo Agent Toolkit. It includes configuration files such as `pyproject.toml`, licensing information, and placeholder plugin files for future expansion.

Signed-off-by: dnandakumar-nv <[email protected]>
Integrates the nvidia-nat-openpipe-art package into the project, including its dependency on openpipe-art version 0.5.1. Updates uv.lock and pyproject.toml to reflect the new package and its associated configurations.

Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces the ARTTrajectoryBuilder and its configuration, enabling trajectory generation for the OpenPipe backend. It includes registration logic, main functionalities, and associated dependency updates.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced ARTTrainerAdapter for integrating with the ART backend, enabling advanced training workflows. Updated configuration classes, registration logic, and initialization to support new functionality, including reward functions in both trajectory builder and trainer adapter.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced ARTTrainer for OpenPipe ART backend, enabling finetuning with curriculum learning and trajectory-based rewards. Updated configuration, registration, and dependencies to support these functionalities, including matplotlib for visualizations.

Signed-off-by: dnandakumar-nv <[email protected]>
Changed the default value of the `name` field in the configuration from "trainer_run_4" to "trainer_run". This simplifies the default naming convention for trainer runs.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced comprehensive unit tests for `ARTTrainer` and `ARTTrainerAdapter` to validate initialization, training, trajectory handling, status management, and error handling. These tests ensure robust functionality and resilience of training components.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduces a Tic-Tac-Toe game workflow utilizing LangChain for LLM-based agents. Includes game logic, XML move parsing, heuristic evaluation, and a configurable function for LLM interaction. Adds dependency management, configuration, and registration setup for OpenPipe ART integration.

Signed-off-by: dnandakumar-nv <[email protected]>
Rolled back `uv.lock` file from revision 3 to revision 1. Removed redundant `upload-time` metadata for sdist and wheel dependencies to simplify the file structure and ensure consistency.

Signed-off-by: dnandakumar-nv <[email protected]>
This ensures that the `rl_with_openpipe_art_function` is properly imported, triggering its registration as expected. This change improves the reliability of the registration process.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduced a `target_model` attribute to allow filtering of intermediate steps by model name during trajectory processing. Updated the trajectory builder logic to skip steps not matching the specified target model, ensuring more precise fine-tuning when multiple models are involved.

Signed-off-by: dnandakumar-nv <[email protected]>
Added `noqa` flags to suppress unused import warnings for ruff and flake8. Introduced a new placeholder data file, `data.json`, with sample question-answer pairs for testing purposes.

Signed-off-by: dnandakumar-nv <[email protected]>
Introduce `AccuracyEvaluator` class to evaluate RL outputs based on specific scoring logic. Register the evaluator for use in workflows and include its configuration. Also, update logging in the Tic-Tac-Toe workflow from `info` to `debug` for reduced verbosity.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated the RL workflow to handle player model configurations more flexibly, simplified move retries, and refined game termination handling. Added failure penalty in evaluation, adjusted default learning rates, and updated dataset answers to reflect new scoring outputs.

Signed-off-by: dnandakumar-nv <[email protected]>
Modified parsing logic to prevent duplicate messages, improved evaluation logic for scoring aborted games, and adjusted configurations for training, evaluation, and model setup. Additional enhancements include refining message history handling and making the board display more user-friendly.

Signed-off-by: dnandakumar-nv <[email protected]>
Adjusted "finish_reason" in trainer adapter to "stop" for consistency and updated training beta default to 0 for improved behavior. Additionally, set max_parser_retries default to 0 in RL example to align with updated retry strategy.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated configuration files to adjust temperature, concurrency, and dataset paths. Refined game logic to account for player roles. Simplified trajectory message handling and expanded evaluation dataset with additional examples.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated the RL evaluator to incorporate intermediate step value computation for richer insights during evaluation. Refactored game logic to enable step-level state tracking with value assignments, improving training feedback. Adjusted configurations for model, evaluator, and workflow parameters to align with the updated logic.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated training parameters, including learning rate, temperature, and number of epochs. Adjusted naming conventions for clarity and added `max_parser_retries` to configs. Expanded the dataset with additional entries to improve model training coverage.

Signed-off-by: dnandakumar-nv <[email protected]>
This update introduces documentation on the finetuning harness, covering key concepts, reinforcement learning approaches, and curriculum learning. Additionally, it includes a reference for integrating OpenPipe ART for training LLMs using GRPO optimization. Configuration and CLI usage details are also provided.

Signed-off-by: dnandakumar-nv <[email protected]>
This commit introduces a detailed example demonstrating reinforcement learning finetuning with OpenPipe ART to enhance an LLM's Tic-Tac-Toe performance. Includes a comprehensive README explaining setup, workflow, reward function, and best practices, alongside necessary configs, scripts, and assets.

Signed-off-by: dnandakumar-nv <[email protected]>
Clarified the purpose of overriding dependencies to prevent regressions caused by nvidia-nat-openpipe-art. Added additional context to emphasize maintaining the status quo for other packages.

Signed-off-by: dnandakumar-nv <[email protected]>
@dnandakumar-nv dnandakumar-nv requested review from a team as code owners December 3, 2025 21:31

copy-pr-bot bot commented Dec 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Dec 3, 2025

Walkthrough

Adds a finetuning subsystem and OpenPipe ART integration: new finetuning data models, interfaces, runtime and CLI command, builder/registry/workflow extensions, parsers and utilities, an OpenPipe ART plugin (builder/adapter/trainer), a Tic‑Tac‑Toe RL example with evaluators, packaging, docs, and extensive tests. Also updates small tooling and vocabulary.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`docs/source/index.md`, `docs/source/reference/finetuning/*` | Adds Finetuning docs, toctree entries, conceptual and extension guides, and OpenPipe integration reference. |
| **Finetuning interfaces & runtime**<br>`src/nat/finetuning/interfaces/*`, `src/nat/finetuning/finetuning_runtime.py`, `src/nat/finetuning/__init__.py` | New abstract interfaces (Trainer, TrainerAdapter, TrajectoryBuilder), runtime orchestration (sync/async entrypoints), and package exports. |
| **Data models & config integration**<br>`src/nat/data_models/finetuning.py`, `src/nat/data_models/config.py`, `src/nat/data_models/component.py`, `src/nat/data_models/component_ref.py`, `src/nat/data_models/intermediate_step.py` | Adds finetuning models/configs (TrainerConfig, TrainerAdapterConfig, TrajectoryBuilderConfig, FinetuneConfig, Trajectory, Curriculum, TrainingJob types), integrates fields into global Config, adds component enums/refs, and adds StreamEventData.payload. |
| **Builder & workflow extensions**<br>`src/nat/builder/builder.py`, `src/nat/builder/workflow_builder.py`, `src/nat/builder/component_utils.py` | Extends Builder/WorkflowBuilder APIs to add/get trainers, trainer adapters, trajectory builders; adds configured component containers and dependency sequencing updates. |
| **CLI & type registry**<br>`src/nat/cli/commands/finetune.py`, `src/nat/cli/entrypoint.py`, `src/nat/cli/register_workflow.py`, `src/nat/cli/type_registry.py` | Adds finetune CLI command, register_* helpers, and type registry support for trainer/adapter/trajectory builder components. |
| **Finetuning parsers**<br>`src/nat/finetuning/utils/parsers/base_parser.py`, `.../langchain_parser.py`, `.../llama_index_parser.py` | New parsers converting IntermediateStep → OpenAI-style messages, framework-aware parsing, strict sequence validation, and metadata extraction (logprobs, tool/function info). |
| **Profiler callbacks**<br>`src/nat/profiler/callbacks/langchain_callback_handler.py`, `src/nat/profiler/callbacks/llama_index_callback_handler.py` | Propagates raw generation payload into StreamEventData payload at LLM/tool end events. |
| **OpenPipe ART plugin**<br>`packages/nvidia_nat_openpipe_art/*` | New package integrating OpenPipe ART: Pydantic configs, ARTTrajectoryBuilder, ARTTrainerAdapter, ARTTrainer, registration hooks, packaging metadata, and docs/meta files. |
| **OpenPipe ART tests**<br>`packages/nvidia_nat_openpipe_art/tests/*` | Adds fixtures and extensive unit tests for ARTTrainer, ARTTrainerAdapter, and ARTTrajectoryBuilder. |
| **Tic‑Tac‑Toe RL example**<br>`examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/*`, `examples/.../README.md`, `examples/.../pyproject.toml`, `examples/.../configs/*`, `examples/.../data/*` | New example package: LLM agents, game core (alpha‑beta solver + heuristics), accuracy evaluator, function registration, configs, datasets, README and packaging. |
| **Evaluator registration**<br>`examples/.../evaluator_register.py`, `examples/.../register.py` | Registers two AccuracyEvaluator variants (percent_games_won, step_value_computation) and re-exports registrations. |
| **Utilities & small helpers**<br>`src/nat/utils/io/supress_logs.py`, `src/nat/finetuning/utils/__init__.py`, parser package inits | Adds async suppress_logs context manager and license/init headers. |
| **Test infra & updates**<br>`tests/*`, `tests/nat/finetuning/*`, `tests/nat/builder/*`, `tests/nat/cli/*`, `tests/_utils/configs.py` | Adds test helpers, finetuning test configs, threaded dask conftest change, and many new tests for interfaces, parsers, builder/registry and plugins. |
| **Tooling & workspace**<br>`pyproject.toml`, `src/nat/builder/tool_test_runner.py`, `ci/vale/styles/config/vocabularies/nat/accept.txt` | Adds optional openpipe-art workspace/member and dependency overrides, ToolTestRunner helpers, and new accepted vocabulary patterns (ART, [Ll]ogprob(s?), OpenPipe). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as User/CLI
    participant Builder as WorkflowBuilder
    participant TrajBuilder as TrajectoryBuilder
    participant Adapter as TrainerAdapter
    participant Trainer as Trainer
    participant Backend as OpenPipe ART
    participant Eval as Eval subsystem

    CLI->>Builder: finetuning_main(run_config)
    Builder->>TrajBuilder: get_trajectory_builder(config)
    Builder->>Adapter: get_trainer_adapter(config)
    Builder->>Trainer: get_trainer(config, traj_builder, adapter)
    Trainer->>Adapter: initialize(run_config)
    Adapter->>Backend: register model / health check

    loop per epoch
        Trainer->>TrajBuilder: start_run(run_id, meta)
        TrajBuilder->>Eval: run_eval() (parallel generations)
        Eval->>TrajBuilder: evaluation outputs (intermediate steps & payloads)
        TrajBuilder->>TrajBuilder: finalize -> TrajectoryCollection
        Trainer->>Trainer: apply_curriculum_learning(collection, epoch)
        Trainer->>Adapter: submit(filtered_collection)
        Adapter->>Backend: submit training job (async)
        Trainer->>Adapter: wait_until_complete(job_ref)
        Trainer->>Trainer: log_progress(epoch, metrics)
        Trainer->>TrajBuilder: run validation eval (optional)
    end

    Trainer->>Trainer: cleanup()
    CLI->>CLI: report results/artifacts
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Heterogeneous, large-surface changes affecting public APIs, builder/registry, CLI, parsers, plugin backend, runtime orchestration, example code, packaging, and many tests.
  • Focus areas for review:
    • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer.py — curriculum filtering, plotting, persistence, failure modes
    • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py — finalize logic, logprob extraction, grouping, reward computation
    • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py — trajectory translation, episode ordering validation, submit/status lifecycle and health checks
    • src/nat/finetuning/utils/parsers/* — role alternation and message-sequence validation across frameworks
    • src/nat/builder/workflow_builder.py and src/nat/cli/type_registry.py — registration mechanics, component lifecycle, dependency ordering and annotation rebuilding
    • src/nat/finetuning/interfaces/* and finetuning_runtime.py — lifecycle contracts, error handling, and resource cleanup semantics
    • examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/core.py — alpha‑beta solver correctness, scoring mapping and caching
    • tests and conftest mocks — ensure mocks reflect realistic behavior and new optional dependencies don’t leak into core test runs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 65.71%, below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title is fully related to the main change in the changeset, clearly summarizing the introduction of a finetuning harness for RL of agentic workflows. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06f2d8e and 48c3459.

📒 Files selected for processing (1)
  • docs/source/reference/cli.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

  • docs/source/reference/cli.md
**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

  • docs/source/reference/cli.md
docs/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never use deprecated names: Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq in documentation (unless intentionally referring to deprecated versions)

Files:

  • docs/source/reference/cli.md
**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

  • docs/source/reference/cli.md
docs/source/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.{md,rst}: Documentation must be clear and comprehensive, without TODOs, FIXMEs, or placeholder text
Ensure documentation is free of offensive or outdated terms
Ensure documentation is free of spelling mistakes and do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt

Files:

  • docs/source/reference/cli.md
**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

  • docs/source/reference/cli.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values (except for return values of None;
    in that situation no return type hint is needed).
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
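The re-raise guideline above can be sketched as a pair of helpers (hypothetical names, not code from this PR):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_and_reraise(task):
    """Re-raising path: a bare `raise` keeps the original traceback."""
    try:
        return task()
    except Exception as exc:
        # logger.error (not logger.exception) avoids a duplicate stack trace,
        # since the caller will see the traceback from the bare re-raise.
        logger.error("Task failed: %s", exc)
        raise


def run_and_log(task):
    """Catch-and-log path: logger.exception records the full stack trace."""
    try:
        return task()
    except Exception:
        logger.exception("Task failed")
        return None
```

The two paths differ only in whether the exception escapes; the logging call is chosen so the stack trace is recorded exactly once either way.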

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any
    words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be
    spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,
    and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • docs/source/reference/cli.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/reference/cli.md
🪛 LanguageTool
docs/source/reference/cli.md

[typographical] ~486-~486: The conjunction “so that” does not have a comma in front.
Context: ...The finetuning process: 1. Loads the configuration with finetuning settings 2. Initialize...

(SO_THAT_UNNECESSARY_COMMA)

🔇 Additional comments (3)
docs/source/reference/cli.md (3)

36-36: Command hierarchy integration looks good.

The finetune subcommand is properly positioned in the CLI tree between eval and info, maintaining logical grouping of operational commands.


477-492: Finetune section introduction is well-structured and properly marks experimental status.

The experimental warning is clearly communicated, the purpose is well-articulated for RL workflows, and the numbered process steps provide a helpful mental model. The cross-reference to detailed finetuning documentation is appropriately placed.


494-584: Comprehensive command documentation with practical examples.

The command help, options descriptions, and examples are thorough and well-organized. All nine options are clearly documented with their purposes and use cases. The four examples progress logically from basic usage to advanced scenarios (validation monitoring, remote endpoints), making it easy for users to find relevant patterns. Path markers are correctly placed to prevent false static analysis warnings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)

170-195: Defensively handle response.additional_kwargs when building TokenUsageBaseModel to prevent profiling failures.

While ChatResponse.additional_kwargs defaults to an empty dict (not None), not all LlamaIndex integrations populate token count fields (e.g., prompt_tokens, completion_tokens, total_tokens). Attempting to instantiate TokenUsageBaseModel(**response.additional_kwargs) will fail with a validation error if required fields are missing or unexpected keys are present.

Wrap the instantiation in a try-except block and fall back to a default TokenUsageBaseModel() so LLM_END events are still recorded even when token usage data is incomplete:

                 # Append usage data to NAT usage stats
+                additional_kwargs = response.additional_kwargs or {}
+                try:
+                    token_usage = TokenUsageBaseModel(**additional_kwargs)
+                except (TypeError, ValueError):
+                    logger.exception("Error parsing token usage from additional_kwargs")
+                    token_usage = TokenUsageBaseModel()
+
                 with self._lock:
                     stats = IntermediateStepPayload(
                         event_type=IntermediateStepType.LLM_END,
                         span_event_timestamp=self._run_id_to_timestamp.get(event_id),
                         framework=LLMFrameworkEnum.LLAMA_INDEX,
                         name=model_name,
                         UUID=event_id,
-                        data=StreamEventData(input=self._run_id_to_llm_input.get(event_id),
-                                             output=llm_text_output,
-                                             payload=response),
+                        data=StreamEventData(
+                            input=self._run_id_to_llm_input.get(event_id),
+                            output=llm_text_output,
+                            payload=response,
+                        ),
                         metadata=TraceMetadata(chat_responses=response.message if response.message else None),
-                        usage_info=UsageInfo(token_usage=TokenUsageBaseModel(**response.additional_kwargs)))
+                        usage_info=UsageInfo(token_usage=token_usage),
+                    )

Also apply similar defensive handling to the TOOL_END event if TokenUsageBaseModel() is called there with incomplete data.

🟡 Minor comments (13)
tests/nat/tools/test_tool_test_runner.py-101-127 (1)

101-127: Fix unused runner variable in mocked training-components test.

The new test correctly exercises mock_trainer, mock_trainer_adapter, and mock_trajectory_builder, but runner from with_mocked_dependencies() is never used, which ruff flags (RUF059).

You can keep the behavior and silence the warning by ignoring the first element:

-    async with with_mocked_dependencies() as (runner, mock_builder):
+    async with with_mocked_dependencies() as (_, mock_builder):

Everything else in the test looks good and matches the patterns in ToolTestRunner’s mock helpers.

docs/source/reference/finetuning/extending.md-20-50 (1)

20-50: Expand NAT on first use and add a language to the ASCII code block.

To align with naming and markdown-style guidelines:

  • Spell out “NVIDIA NeMo Agent toolkit (NAT)” on first mention.
  • Tag the ASCII diagram’s fenced block with a language (e.g., text) to satisfy MD040.
-This guide covers how to create custom components for the NAT finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
+This guide covers how to create custom components for the NVIDIA NeMo Agent toolkit (NAT) finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
@@
-```
+```text
 ┌────────────────────────────────────────────────────────────────────────┐
 │                         Your Implementation                            │
@@
 └─────────────────────────────────────────────────────────────────────────┘

docs/source/reference/finetuning/index.md-24-58 (1)

24-58: Align toolkit naming and consider adding a fenced-code language.

Body text should use the full product name on first mention and the lowercase “toolkit” spelling per guidelines. Also, specifying a language for the ASCII diagram helps markdown linters and tooling.

- Update the first mention of the toolkit.
- Optionally, tag the ASCII diagram block as `text`.




```diff
- The NeMo Agent Toolkit provides a powerful finetuning harness designed for **in-situ reinforcement learning** of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.
+ The NVIDIA NeMo Agent toolkit provides a powerful finetuning harness designed for **in-situ reinforcement learning** of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.
@@
-```
+```text
 ┌────────────────────────────────────────────────────────────────────────┐
 │                              Trainer                                   │
@@
                             └─────────────────────────┘

docs/source/reference/finetuning/concepts.md-90-91 (1)

90-91: Correct trajectory sequence notation.

The trajectory example currently shows `StateₙAction, ₙ, Rewardₙ`, which appears to be a formatting typo. It should likely read `Stateₙ, Actionₙ, Rewardₙ` to match the earlier pattern.
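For reference, a consistent version of the sequence (standard RL trajectory form, assuming the surrounding text uses subscripted state/action/reward triples) is:

```latex
\text{Trajectory} = (S_0, A_0, R_0,\; S_1, A_1, R_1,\; \dots,\; S_n, A_n, R_n)
```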

src/nat/finetuning/utils/parsers/base_parser.py-77-91 (1)

77-91: Potential KeyError if parsed message lacks content key.

The hashing logic assumes all messages have a `"content"` key. If a tool call message or other special message type lacks this key, a `KeyError` will be raised. Consider using `.get()` with a default.

```diff
         if isinstance(parsed_msg, list):
             for msg in parsed_msg:
-                content_hash = hash(msg["role"] + ": " + msg["content"])
+                content_hash = hash(msg["role"] + ": " + msg.get("content", ""))
                 if content_hash not in message_content_hashes:
                     messages.append(msg)
                     message_content_hashes.add(content_hash)
         else:
-            content_hash = hash(parsed_msg["role"] + ": " + parsed_msg["content"])
+            content_hash = hash(parsed_msg["role"] + ": " + parsed_msg.get("content", ""))
             messages.append(parsed_msg)
             message_content_hashes.add(content_hash)
     else:
         assert not isinstance(parsed_msg, list), "TOOL_END or LLM_END should not produce multiple messages"
-        content_hash = hash(parsed_msg["role"] + ": " + parsed_msg["content"])
+        content_hash = hash(parsed_msg["role"] + ": " + parsed_msg.get("content", ""))
         message_content_hashes.add(content_hash)
         messages.append(parsed_msg)
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py-192-201 (1)

192-201: Update misleading docstring for _echo.

The docstring describes echoing text with a prefix, but the function actually plays a Tic-Tac-Toe game. Update to reflect actual behavior.

     async def _echo(role: str) -> str:
         """
-        Takes a text input and echoes back with a pre-defined prefix.
+        Play a Tic-Tac-Toe game and return the outcome.
 
         Args:
-            role (str): If smaller model will be X or O
+            role (str): The symbol ('X' or 'O') assigned to the player model.
 
         Returns:
-            str: The text with the prefix.
+            str: Game outcome - 'Win!', 'Lose!', 'Draw!', or step count if aborted.
         """
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py-162-168 (1)

162-168: Update placeholder docstring.

The docstring "NAT function template. Please update the description." is a placeholder that should be replaced with an actual description of the function's purpose.

 class RlWithOpenpipeArtFunctionConfig(FunctionBaseConfig, name="rl_with_openpipe_art"):
     """
-    NAT function template. Please update the description.
+    Configuration for the Tic-Tac-Toe RL workflow function.
+
+    This function plays a Tic-Tac-Toe game between two LLM-powered players
+    to generate training trajectories for reinforcement learning.
     """
     player_model: LLMRef = Field(description="LLMRef for the player model to use.")
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py-1-30 (1)

1-30: Remove duplicate SPDX license header.

The license header appears twice (lines 1-14 and lines 16-30). Remove the duplicate block.

 # SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
-# All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
 import asyncio
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/accuracy_evaluator.py-27-33 (1)

27-33: Class docstring doesn't match implementation.

The docstring describes scoring based on expected_answer == workflow_output, but the actual implementation (lines 109-120) scores based on fixed strings: "Win!" → 1.0, "Draw!" → 0.5, "Lose!" → 0.0. Update the docstring to reflect the actual scoring logic.

 class AccuracyEvaluator(BaseEvaluator):
-    """Custom evaluator for RL with OpenPipe ART workflow outputs.
-
-    Scoring logic:
-    - Score 1: if expected_answer == workflow_output
-    - Score 0.5: if expected_answer != workflow_output AND expected_answer == "0"
-    - Score 0: if expected_answer != workflow_output AND expected_answer != "0"
-    """
+    """Custom evaluator for RL with OpenPipe ART workflow outputs.
+
+    Scoring logic:
+    - Score 1.0: if workflow_output == "Win!"
+    - Score 0.5: if workflow_output == "Draw!"
+    - Score 0.0: if workflow_output == "Lose!" or aborted
+    """
src/nat/finetuning/utils/parsers/langchain_parser.py-31-41 (1)

31-41: Remove stale docstring reference to non-existent parameter.

The docstring mentions previous_message: Previous message for context (reserved for future) but this parameter doesn't exist in the function signature.

 def parse_to_openai_message(message: IntermediateStep) -> dict | list[dict]:
     """
     Convert IntermediateStep to OpenAI-compatible message dictionary.

     Args:
         message: An IntermediateStep object representing a single message.
-        previous_message: Previous message for context (reserved for future).

     Returns:
         A dictionary formatted for OpenAI API consumption.
     """
src/nat/builder/builder.py-312-315 (1)

312-315: Remove unnecessary parentheses in return type annotation.

     @abstractmethod
     async def get_trajectory_builder_config(
-            self, trajectory_builder_name: str | TrajectoryBuilderRef) -> (TrajectoryBuilderConfig):
+            self, trajectory_builder_name: str | TrajectoryBuilderRef) -> TrajectoryBuilderConfig:
         pass
src/nat/finetuning/interfaces/trainer_adapter.py-53-64 (1)

53-64: Docstring parameter type mismatch.

The docstring states trajectories (list[Trajectory]) but the actual parameter type is TrajectoryCollection.

     @abstractmethod
     async def submit(self, trajectories: TrajectoryCollection) -> TrainingJobRef:
         """
         Submit trajectories to remote training backend.

         Args:
-            trajectories (list[Trajectory]): The list of trajectories to submit.
+            trajectories (TrajectoryCollection): The collection of trajectories to submit.

         Returns:
             TrainingJobRef: Reference to the submitted training job.
         """
         raise NotImplementedError
src/nat/finetuning/interfaces/trainer_adapter.py-32-34 (1)

32-34: Type hint mismatch: run_config can be None.

The type annotation claims FinetuneConfig but the initial value is None. This will cause type checker errors.

     def __init__(self, adapter_config: TrainerAdapterConfig):
         self.adapter_config = adapter_config
-        self.run_config: FinetuneConfig = None
+        self.run_config: FinetuneConfig | None = None
🧹 Nitpick comments (63)
src/nat/finetuning/utils/__init__.py (1)

1-15: Add module docstring after the license header.

The file contains the required Apache-2.0 license header but is missing a module-level docstring. Per coding guidelines, every public module should have a Google-style docstring with a concise description ending with a period.

Add a module docstring after the license header:

# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
+
+"""Utilities for finetuning operations in the NeMo Agent toolkit."""
tests/conftest.py (1)

58-60: Good environment setup; consider os.pathsep for portability.

The additions make sense for tests (propagate SRC_DIR via PYTHONPATH and pin the Dask worker interpreter), but you can make the PYTHONPATH handling cross-platform by using os.pathsep instead of a hard-coded colon:

-os.environ["PYTHONPATH"] = (f"{SRC_DIR}:{os.environ['PYTHONPATH']}" if "PYTHONPATH" in os.environ else SRC_DIR)
+sep = os.pathsep
+os.environ["PYTHONPATH"] = (f"{SRC_DIR}{sep}{os.environ['PYTHONPATH']}" if "PYTHONPATH" in os.environ else SRC_DIR)

The current code is fine on POSIX; this change just improves Windows compatibility.
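The difference is easy to see in isolation (a minimal sketch; the entries are made up):

```python
import os

# PATH-like variables are joined with os.pathsep (":" on POSIX, ";" on Windows),
# which is what makes the joined string portable; a hard-coded ":" is not.
entries = ["src", os.path.join("opt", "lib")]
pythonpath = os.pathsep.join(entries)

# Splitting on the same separator recovers the original entries.
assert pythonpath.split(os.pathsep) == entries
```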

pyproject.toml (2)

105-105: Optional dependency openpipe-art is not in alphabetical order.

Per the "Keep sorted!!!" comment at line 74, optional dependencies should be alphabetically ordered. openpipe-art should appear before redis (between mysql and phoenix).

Apply this diff to fix the ordering:

 mysql = ["nvidia-nat-mysql"]
+openpipe-art = ["nvidia-nat-openpipe-art"]
 redis = ["nvidia-nat-redis"]
 ...
 zep-cloud = ["nvidia-nat-zep-cloud"]
-openpipe-art = ["nvidia-nat-openpipe-art"]

203-203: Workspace source is not alphabetically sorted.

The workspace sources appear to be sorted alphabetically. nvidia-nat-openpipe-art sorts before nvidia-nat-opentelemetry, so it should be placed immediately before that entry.

+nvidia-nat-openpipe-art = { workspace = true }
 nvidia-nat-opentelemetry = { workspace = true }
 nvidia-nat-phoenix = { workspace = true }
 ...
 nvidia-nat-zep-cloud = { workspace = true }
-nvidia-nat-openpipe-art = { workspace = true }
examples/finetuning/rl_with_openpipe_art/README.md (4)

169-174: Add language specifier to fenced code block.

Static analysis flags this code block as missing a language specifier.

-```
+```text
 INFO:     Started server process
 INFO:     Waiting for application startup.

314-329: Add language specifier to fenced code block.

Static analysis flags this code block as missing a language specifier.

-```
+```text
 INFO - Starting finetuning with config: src/rl_with_openpipe_art/configs/config.yml

337-337: Minor grammar fix: use hyphenation for compound adjective.

Per LanguageTool hint, "alpha-beta pruning based reward" should be hyphenated when used as a compound modifier.

-The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning based reward** instead of simple win/loss signals.
+The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta-pruning-based reward** instead of simple win/loss signals.

517-522: Add language specifier to fenced code block.

Static analysis flags this code block as missing a language specifier.

-```
+```text
 .tmp/nat/finetuning/tic_tac_toe/
 ├── training_metrics.jsonl      # Per-epoch metrics
src/nat/data_models/component.py (2)

44-46: Enum members should be alphabetically sorted.

Per the "Keep sorted!!!" comment, these new entries should be reordered. TRAINER_ADAPTER should come between TRAINER and TRAJECTORY_BUILDER.

     TRAINER = "trainer"
+    TRAINER_ADAPTER = "trainer_adapter"
     TRAJECTORY_BUILDER = "trajectory_builder"
-    TRAINER_ADAPTER = "trainer_adapter"
     UNDEFINED = "undefined"

62-64: Enum members should be alphabetically sorted.

Per the "Keep sorted!!!" comment, these new entries should be reordered alphabetically.

     RETRIEVERS = "retrievers"
+    TRAINER_ADAPTERS = "trainer_adapters"
     TRAINERS = "trainers"
-    TRAINER_ADAPTERS = "trainer_adapters"
     TRAJECTORY_BUILDERS = "trajectory_builders"
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/llm_agents.py (2)

62-65: Misleading docstring: legacy parsing not implemented.

The docstring states "Try XML first, then legacy 'row,col'" but the function only delegates to parse_move_xml without any fallback parsing.

Either update the docstring to match the implementation, or implement the legacy fallback:

 def parse_move_any(text: str) -> tuple[int, int] | None:
-    """Try XML first, then legacy 'row,col'."""
+    """Parse move from XML format."""
     mv = parse_move_xml(text)
     return mv

100-100: Remove debug comment artifact.

The comment # ruff appears to be a leftover debug artifact and should be removed.

-        # ruff
         for attempt in range(0, self.max_retries + 1):
src/nat/data_models/finetuning.py (2)

118-118: Typo in comment.

"logprobs can't be none of role is assistant" should be "logprobs can't be none if role is assistant".

-    # Add model validator after construction that checks that logprobs can't be none of role is assistant
+    # Add model validator after construction that checks that logprobs can't be none if role is assistant

212-212: Use Field() for consistency with other field declarations.

This field uses a bare default value instead of Field(), which is inconsistent with the other fields in the class and loses the ability to add a description.

-    target_functions: list[str] = ["<workflow>"]
+    target_functions: list[str] = Field(
+        default=["<workflow>"],
+        description="List of target function names to fine-tune.")
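Using Field() here is safe for mutable defaults as well; pydantic copies the default per model instance (a sketch with a hypothetical model, not the actual config class):

```python
from pydantic import BaseModel, Field


class FinetuneSettings(BaseModel):
    """Hypothetical stand-in for the config class under review."""
    target_functions: list[str] = Field(
        default=["<workflow>"],
        description="List of target function names to fine-tune.")


a = FinetuneSettings()
b = FinetuneSettings()
a.target_functions.append("extra")

# Each instance received its own copy of the default list.
assert b.target_functions == ["<workflow>"]
```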
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_post_train.yml (1)

1-26: Config wiring looks correct; consider 4‑space indentation for YAML

The config structure (LLM, workflow, eval, evaluator) is coherent for the RL with OpenPipe ART example, and the api_key is a non‑secret placeholder, which is good for checked‑in examples. For consistency with the repo guidelines (**/*.{yaml,yml} → 4‑space indentation), consider reindenting nested mappings to 4 spaces in a follow‑up pass.

examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config_pre_train.yml (1)

1-23: Pre‑train config is coherent; consider matching YAML indentation convention

The pre‑train config’s structure (LLM, workflow, eval, evaluator) is consistent with the post‑train setup and the described RL workflow. As with config_post_train.yml, you may want to switch nested keys to 4‑space indentation to match the repository’s YAML style guideline.

src/nat/cli/commands/finetune.py (1)

84-132: Refine CLI signature and exception handling to match runtime and linting expectations.

A few small tweaks will make this command align with the project’s typing and error‑handling guidance and resolve the Ruff hints.

  • Mark optional options as Path | None / str | None.
  • Use a context parameter for @click.pass_context and silence its unused status explicitly.
  • Wrap exceptions with ClickException while preserving the cause and logging the stack trace.
-@click.pass_context
-def finetune_command(
-    processors,  # pylint: disable=unused-argument
-    *,
-    config_file: Path,
-    dataset: Path,
-    result_json_path: str,
-    endpoint: str,
-    endpoint_timeout: int,
-    override: tuple[tuple[str, str], ...],
-    validation_dataset: Path,
-    validation_interval: int,
-    validation_config_file: Path,
-):
+@click.pass_context
+def finetune_command(
+    ctx: click.Context,  # noqa: ARG001
+    *,
+    config_file: Path,
+    dataset: Path | None,
+    result_json_path: str,
+    endpoint: str | None,
+    endpoint_timeout: int,
+    override: tuple[tuple[str, str], ...],
+    validation_dataset: Path | None,
+    validation_interval: int,
+    validation_config_file: Path | None,
+):
@@
-    except Exception as e:
-        logger.error("Finetuning failed: %s", e)
-        raise click.ClickException(str(e))
+    except Exception as exc:  # noqa: BLE001
+        logger.exception("Finetuning failed")
+        raise click.ClickException(str(exc)) from exc

This keeps the UX of wrapping all errors in a ClickException while satisfying the project’s linting and exception‑handling conventions.

examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/register.py (1)

15-19: Tighten Ruff suppression for side‑effect imports.

Ruff reports the noqa: F401 directive as unused here. Since this module intentionally relies on import side effects, a file‑level ruff: noqa is simpler and avoids RUF100.

-# Import the generated workflow function to trigger registration
-# Add flags to ignore unused import with ruff and flak8
-# ruff: noqa: F401
-from .evaluator_register import register_accuracy_evaluator
-from .rl_with_openpipe_art import rl_with_openpipe_art_function
+# Import the generated workflow function to trigger registration.
+# Suppress unused-import warnings for registration side effects.
+# ruff: noqa
+from .evaluator_register import register_accuracy_evaluator
+from .rl_with_openpipe_art import rl_with_openpipe_art_function
docs/source/reference/finetuning/concepts.md (1)

24-24: Fix NeMo toolkit naming on first mention.

Per docs guidelines, the first textual mention should be “NVIDIA NeMo Agent toolkit”, with subsequent mentions as “NeMo Agent toolkit”. Please adjust the sentence starting with “The NeMo Agent Toolkit...” accordingly.

tests/nat/finetuning/interfaces/test_trajectory_builder.py (1)

91-96: Avoid passing unsupported target_functions into FinetuneRunConfig.

FinetuneRunConfig doesn’t declare a target_functions field; pydantic will silently ignore it. To keep tests aligned with the schema, either drop this argument from FinetuneRunConfig(...) or, if you need to control target functions, set them on FinetuneConfig instead.

Also applies to: 240-245
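The silent-ignore behavior is pydantic's default for unknown fields, which is why this mismatch is easy to miss (a minimal sketch with a hypothetical model, not the real FinetuneRunConfig):

```python
from pydantic import BaseModel


class RunConfig(BaseModel):
    """Hypothetical stand-in for FinetuneRunConfig."""
    epochs: int = 1


# The unknown keyword is dropped without an error or warning.
cfg = RunConfig(epochs=2, target_functions=["<workflow>"])
assert cfg.epochs == 2
assert not hasattr(cfg, "target_functions")
```

Setting extra="forbid" in the model config would turn such typos into validation errors instead.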

src/nat/finetuning/__init__.py (1)

20-24: Sort __all__ to satisfy ruff style check.

Ruff flags __all__ as unsorted. Reorder entries alphabetically, e.g.:

-__all__ = [
-    "Trainer",
-    "TrajectoryBuilder",
-    "TrainerAdapter",
-]
+__all__ = [
+    "Trainer",
+    "TrainerAdapter",
+    "TrajectoryBuilder",
+]
docs/source/reference/finetuning/rl_with_openpipe.md (2)

42-65: Add explicit languages to fenced code blocks with diagrams.

Markdown lint flags multiple fenced blocks without a language (the architecture/flow diagrams and console output). Consider annotating them as text for clarity and to satisfy MD040, for example:

```text
┌─────────────────────────────────────────────────────────────────────────┐
...



Also applies to: 237-273, 289-400, 464-472

---

189-220: Convert emphasized section labels into proper headings.

Labels like `**Backend Configuration**`, `**Training Arguments**`, and the troubleshooting titles are effectively sub‑headings. To appease MD036 and improve structure, convert them to markdown headings (e.g., `### Backend Configuration`) instead of bold text.  



Also applies to: 201-213, 214-220, 536-582

src/nat/profiler/callbacks/langchain_callback_handler.py (1)

46-53: Profiler changes align with updated models; watch for invalid tool schemas.

The token‑usage simplification and inclusion of `payload` on LLM/TOOL end events match the current `TokenUsageBaseModel` and `StreamEventData`/`TraceMetadata` structures and look good. One thing to be aware of: `_extract_tools_schema` now does a bare `ToolSchema(**tool)` for each entry in `invocation_params["tools"]`; if any tool dict is malformed, the callback will raise and break tracing. If you expect untrusted tool payloads, consider wrapping that construction in a small `try/except ValidationError` and skipping invalid tools.  




Also applies to: 90-101, 197-248, 279-298

packages/nvidia_nat_openpipe_art/tests/test_trainer_adapter.py-171-172 (1)

171-172: Tidy regex literals and unused mock argument to satisfy ruff.

- The `match=` patterns in `pytest.raises(..., match="first message.*must be from 'user' or 'system'")` and the duplicate‑job assertion both contain regex metacharacters. To address RUF043 and avoid accidental mis‑escaping, make them raw strings, e.g. `match=r"first message.*must be from 'user' or 'system'"`.
- `mock_traj_group` in `test_construct_trajectory_with_tool_calls` is never used; to clear ARG002 either rename it to `_mock_traj_group` or drop the corresponding `@patch` decorator if that patch isn’t needed.




Also applies to: 281-282, 428-444

tests/nat/finetuning/utils/parsers/test_base_parser.py-27-35 (1)

27-35: Docstring first line should end with a period.

Per coding guidelines, the first line of docstrings must be a concise description ending with a period.


```diff
 def create_mock_step(event_type, event_state, framework=None, data=None, name=None):
-    """Helper function to create mock IntermediateStep objects."""
+    """Helper function to create mock `IntermediateStep` objects."""
     step = MagicMock()
packages/nvidia_nat_openpipe_art/tests/test_trainer.py (1)

41-68: Consider using fixture naming conventions per coding guidelines.

Per project guidelines, pytest fixtures should define the name argument in the decorator, and the decorated function should use the fixture_ prefix. This is a minor stylistic suggestion that could be deferred.

Example for one fixture:

-    @pytest.fixture
-    def trainer_config(self):
+    @pytest.fixture(name="trainer_config")
+    def fixture_trainer_config(self):
         """Create test trainer configuration."""
         return ARTTrainerConfig(reward=RewardFunctionConfig(name="test_reward"))
src/nat/finetuning/utils/parsers/llama_index_parser.py (2)

26-37: Remove unused noqa and fix stale docstring parameter.

message is used, so # noqa: ARG001 is no longer needed, and the docstring still documents a previous_message argument that isn’t in the signature.

Consider:

```diff
-def parse_to_openai_message(message: IntermediateStep) -> dict:  # noqa: ARG001
-    """
-    Convert IntermediateStep to OpenAI-compatible message dictionary.
-    ...
-        message: An IntermediateStep object representing a single message.
-        previous_message: Previous message for context (reserved for future).
+def parse_to_openai_message(message: IntermediateStep) -> dict:
+    """
+    Convert an `IntermediateStep` to an OpenAI-compatible message dictionary.
+    ...
+        message: An `IntermediateStep` object representing a single message.
```

111-130: Align TOOL_END parsing with existing OpenAI/tool message conventions.

The current implementation uses role: "function" and omits a tool_call_id, while the LangChain parser uses role: "tool" plus an ID field. For consistency with the rest of the finetuning parsers and typical OpenAI-style tool messages, consider:

```diff
-def _parse_tool_message(message: IntermediateStep) -> dict:
-    """Parse a tool/function response message from TOOL_END event."""
-    result = {"role": "function"}
+def _parse_tool_message(message: IntermediateStep) -> dict:
+    """Parse a tool/function response message from a `TOOL_END` event."""
+    result = {"role": "tool"}
@@
-    # Add function name if available
-    if message.name:
-        result["name"] = message.name
+    # Add function/tool name and a stable ID if available
+    if message.name:
+        result["name"] = message.name
+    # Mirror LangChain parser behavior by providing a tool_call_id
+    result["tool_call_id"] = getattr(message.metadata, "tool_call_id", message.UUID if hasattr(message, "UUID") else 0)
```

This keeps the LlamaIndex parser’s output shape closer to the existing LangChain parser and typical OpenAI tool-calling messages.
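For reference, a minimal sketch of the OpenAI-style tool-result message shape the suggestion converges on. The helper name and the example values here are illustrative, not from the codebase:

```python
def build_tool_message(name: str, content: str, tool_call_id: str) -> dict:
    """Assemble an OpenAI-style tool-result message.

    OpenAI tool calling expects role "tool" and a tool_call_id that matches
    the id of the assistant tool call that produced this result.
    """
    return {
        "role": "tool",
        "name": name,
        "content": content,
        "tool_call_id": tool_call_id,
    }


# Hypothetical usage: "call_abc123" would come from the assistant message
# that issued the tool call.
msg = build_tool_message("search", '{"results": []}', "call_abc123")
```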

src/nat/finetuning/finetuning_runtime.py (1)

27-69: Reconcile Ruff TRY hints with project exception‑handling guidelines.

run_finetuning correctly uses logger.error and re‑raises, which matches the project rule to avoid logger.exception when re‑raising, but Ruff suggests otherwise (TRY400). Similarly, the explicit ValueError message in finetuning_main triggers TRY003.

To keep behavior and project guidelines while silencing Ruff, you can annotate the specific lines:

```diff
-    except Exception as e:
-        logger.error("Finetuning failed: %s", e)
+    except Exception as e:
+        logger.error("Finetuning failed: %s", e)  # noqa: TRY400
@@
-    if not config.finetuning.enabled:
-        raise ValueError("Finetuning is not enabled in the provided configuration.")
+    if not config.finetuning.enabled:
+        raise ValueError("Finetuning is not enabled in the provided configuration.")  # noqa: TRY003
```

Also worth documenting (e.g., in the docstring) that run_finetuning expects runner.initialize() to have been called so that runner.run_config is populated.

Also applies to: 83-88
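A minimal, hypothetical sketch of that precondition guard and docstring note; the names mirror the discussion above, not the actual module:

```python
class Runner:
    """Stand-in for the finetuning runner; only the relevant state is modeled."""

    def __init__(self):
        self.run_config = None  # populated by initialize()

    def initialize(self):
        self.run_config = {"enabled": True}


def run_finetuning(runner: Runner) -> dict:
    """Run finetuning.

    Note:
        Expects ``runner.initialize()`` to have been called so that
        ``runner.run_config`` is populated.
    """
    if runner.run_config is None:
        raise RuntimeError("run_finetuning called before runner.initialize()")
    return runner.run_config
```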

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/register.py (1)

29-41: Mark unused builder parameter to satisfy Ruff without changing behavior.

The builder argument is required by the register_* decorators but isn’t used in these implementations, which Ruff flags as ARG001. Renaming it to _builder makes the intent clear and silences the warning:

```diff
-@register_trajectory_builder(config_type=ARTTrajectoryBuilderConfig)
-async def register_art_trajectory_builder(config: ARTTrajectoryBuilderConfig, builder: Builder):
+@register_trajectory_builder(config_type=ARTTrajectoryBuilderConfig)
+async def register_art_trajectory_builder(config: ARTTrajectoryBuilderConfig, _builder: Builder):
@@
-@register_trainer_adapter(config_type=ARTTrainerAdapterConfig)
-async def register_art_trainer_adapter(config: ARTTrainerAdapterConfig, builder: Builder):
+@register_trainer_adapter(config_type=ARTTrainerAdapterConfig)
+async def register_art_trainer_adapter(config: ARTTrainerAdapterConfig, _builder: Builder):
@@
-@register_trainer(config_type=ARTTrainerConfig)
-async def register_art_trainer(config: ARTTrainerConfig, builder: Builder):
+@register_trainer(config_type=ARTTrainerConfig)
+async def register_art_trainer(config: ARTTrainerConfig, _builder: Builder):
```

Also applies to: 44-57, 59-71

src/nat/finetuning/interfaces/trajectory_builder.py (1)

33-36: Tighten TrajectoryBuilder typing and docstrings, and use meta to satisfy Ruff.

A few small adjustments will improve clarity and avoid ARG002:

  • run_config can be None until initialize is called.
  • compute_reward should declare its return type and reference meta.
  • The finalize docstring should match the TrajectoryCollection return type.

Suggested changes:

```diff
 class TrajectoryBuilder(ABC):
@@
-    def __init__(self, trajectory_builder_config: TrajectoryBuilderConfig):
-        self.trajectory_builder_config = trajectory_builder_config
-        self.run_config: FinetuneConfig = None
+    def __init__(self, trajectory_builder_config: TrajectoryBuilderConfig):
+        self.trajectory_builder_config = trajectory_builder_config
+        self.run_config: FinetuneConfig | None = None
@@
-    async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection:
+    async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection:
@@
-        Returns:
-            list[Trajectory]: The list of constructed trajectories.
+        Returns:
+            TrajectoryCollection: The collection of constructed trajectories.
@@
-    async def compute_reward(self, output_item: EvalOutputItem, meta: dict | None = None):
+    async def compute_reward(self, output_item: EvalOutputItem, meta: dict | None = None) -> float:
@@
-        Returns:
-            float: The computed reward.
-        """
-        return float(output_item.score) if output_item.score is not None else 0.0
+        Returns:
+            float: The computed reward.
+        """
+        # `meta` is accepted for extensibility in subclasses; unused in the default implementation.
+        _ = meta
+        return float(output_item.score) if output_item.score is not None else 0.0
```

Also applies to: 78-88, 91-103

tests/nat/finetuning/interfaces/test_trainer.py (1)

179-187: Use tmp_path instead of a hard‑coded /tmp path in the log‑progress test.

The hard‑coded "/tmp/logs" path triggers Ruff’s S108 and ties the test to a specific filesystem layout. You can keep the intent while avoiding this by using tmp_path:

```diff
-    def test_trainer_log_progress(self, trainer):
+    def test_trainer_log_progress(self, trainer, tmp_path):
@@
-        metrics = {"loss": 0.5, "accuracy": 0.95}
-        trainer.log_progress(epoch=1, metrics=metrics, output_dir="/tmp/logs")
+        metrics = {"loss": 0.5, "accuracy": 0.95}
+        output_dir = str(tmp_path / "logs")
+        trainer.log_progress(epoch=1, metrics=metrics, output_dir=output_dir)
@@
-        assert trainer.logged_progress[0]["output_dir"] == "/tmp/logs"
+        assert trainer.logged_progress[0]["output_dir"] == output_dir
```
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/core.py (2)

55-60: Use unpacking for list concatenation.

Per static analysis hint (RUF005), prefer unpacking syntax for clarity.

-    return "\n".join([header] + rows)
+    return "\n".join([header, *rows])

130-136: Replace setattr with direct attribute assignment.

Using setattr with a constant attribute name provides no benefit over direct assignment. Per static analysis hint (B010).

```diff
     outcome_cache: dict[tuple[tuple[int, ...], int], float] = getattr(evaluate_board_for_player, "_outcome_cache", None)
     if outcome_cache is None:
         outcome_cache = {}
-        setattr(evaluate_board_for_player, "_outcome_cache", outcome_cache)
+        evaluate_board_for_player._outcome_cache = outcome_cache
```
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer.py (3)

400-410: Remove redundant int() cast.

math.ceil() already returns an int in Python 3. The cast is unnecessary.

```diff
             if self.curriculum_config.random_subsample is not None:
                 import random
                 fraction = self.curriculum_config.random_subsample
                 trajectory_group = trajectory_collection.trajectories[0]
-                max_required_trajectories = int(math.ceil(len(trajectory_group) * fraction))
+                max_required_trajectories = math.ceil(len(trajectory_group) * fraction)
```

465-468: Remove redundant int() cast.

Same issue—math.ceil() returns int in Python 3.

```diff
         # Calculate number of groups to include
         num_groups_to_include = max(
             1,  # Always include at least one group
-            int(math.ceil(len(group_stats) * self._curriculum_state["current_percentile"])))
+            math.ceil(len(group_stats) * self._curriculum_state["current_percentile"]))
```

497-506: Remove redundant int() cast and hoist random import.

The int() cast is unnecessary. Additionally, consider moving import random to the module level to avoid repeated imports during curriculum learning.

```diff
         if self.curriculum_config.random_subsample is not None:
             # Randomly select only a fraction of trajectory groups to use
-            import random
             fraction = self.curriculum_config.random_subsample
             # Max required groups is the theoretical max based on fraction
-            max_required_groups = int(math.ceil(len(group_stats) * fraction))
+            max_required_groups = math.ceil(len(group_stats) * fraction)
```

Add import random at the top of the file alongside other imports.

src/nat/finetuning/utils/parsers/base_parser.py (1)

88-88: Replace assert with explicit exception for production safety.

Assertions can be disabled with python -O. For invariant checks that should always run, use an explicit exception.

```diff
-        assert not isinstance(parsed_msg, list), "TOOL_END or LLM_END should not produce multiple messages"
+        if isinstance(parsed_msg, list):
+            raise ValueError("TOOL_END or LLM_END should not produce multiple messages")
```
src/nat/finetuning/interfaces/finetuning_runner.py (2)

54-61: Type annotations should allow None for initially unset attributes.

These attributes are initialized to None but typed as non-optional. This can cause type checker warnings.

```diff
     def __init__(self, trainer_config: TrainerConfig, **kwargs) -> None:
         ...
         self.trainer_config = trainer_config
-        self.run_config: FinetuneConfig = None
-        self.curriculum_config = None
-        self.trajectory_builder: TrajectoryBuilder = None
-        self.trainer_adapter: TrainerAdapter = None
+        self.run_config: FinetuneConfig | None = None
+        self.curriculum_config: CurriculumLearningConfig | None = None
+        self.trajectory_builder: TrajectoryBuilder | None = None
+        self.trainer_adapter: TrainerAdapter | None = None

         # Curriculum learning state
-        self._curriculum_state = None
+        self._curriculum_state: dict[str, Any] | None = None
```

You'll need to import CurriculumLearningConfig from the data_models module.


156-170: Unused run_id parameter.

The run_id parameter is documented but never used in the method body. Consider using it for logging context or removing it from the signature if not needed.

```diff
     async def run_validation_evaluation(self, epoch: int, run_id: str) -> dict[str, Any]:
         ...
-        logger.info("Running validation evaluation for epoch %d", epoch + 1)
+        logger.info("Running validation evaluation for epoch %d, run_id=%s", epoch + 1, run_id)
```
src/nat/cli/register_workflow.py (3)

488-507: Add docstring to register_trainer function.

Per coding guidelines, all public APIs require Google-style docstrings. Other register_* functions in this file have docstrings (e.g., register_telemetry_exporter, register_front_end). Add a docstring describing the decorator's purpose.

```diff
 def register_trainer(config_type: type[TrainerConfigT]):
+    """
+    Register a trainer component for fine-tuning LLMs.
+
+    Args:
+        config_type: The trainer configuration type to register.
+
+    Returns:
+        A decorator that wraps the build function as an async context manager.
+    """

     def register_trainer_inner(fn: TrainerBuildCallableT[TrainerConfigT]) -> TrainerBuildCallableT[TrainerConfigT]:
```

510-531: Add docstring to register_trainer_adapter function.

Consistent with register_trainer, this public registration helper should include a Google-style docstring.

```diff
 def register_trainer_adapter(config_type: type[TrainerAdapterConfigT]):
+    """
+    Register a trainer adapter component.
+
+    Trainer adapters are responsible for adapting the training process to different frameworks.
+
+    Args:
+        config_type: The trainer adapter configuration type to register.
+
+    Returns:
+        A decorator that wraps the build function as an async context manager.
+    """

     def register_trainer_adapter_inner(
```

534-555: Add docstring to register_trajectory_builder function.

Same pattern as above—add a docstring for consistency and discoverability.

```diff
 def register_trajectory_builder(config_type: type[TrajectoryBuilderConfigT]):
+    """
+    Register a trajectory builder component.
+
+    Trajectory builders are responsible for building trajectories for fine-tuning.
+
+    Args:
+        config_type: The trajectory builder configuration type to register.
+
+    Returns:
+        A decorator that wraps the build function as an async context manager.
+    """

     def register_trajectory_builder_inner(
```
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/accuracy_evaluator.py (4)

45-47: Replace assertion with explicit error handling.

Using assert for validation can be disabled with -O flag in production. For robustness, use an explicit check with a raised exception.

```diff
         s = np.asarray(state_values, dtype=float)
         T = len(s) - 1
-        assert T >= 0
+        if T < 0:
+            raise ValueError("state_values must not be empty")
```

87-88: Remove commented-out code.

Dead code should be removed to keep the codebase clean. If the alternative calculation is needed for reference, document it elsewhere.

```diff
-        #average_score = score_sum / max(1, len(item.trajectory))
         average_score = AccuracyEvaluator.episode_value_from_states(scores)
```

74-96: Add docstring to _eval_with_steps method.

This method contains non-trivial logic for step-based evaluation. A docstring would improve maintainability.

```diff
     @staticmethod
     async def _eval_with_steps(item: EvalInputItem) -> EvalOutputItem:
+        """Evaluate item using intermediate step values.
+
+        Collects 'value' metadata from CUSTOM_END steps and computes
+        an episode score using time-discounted rewards.
+
+        Args:
+            item: The evaluation input item with trajectory steps.
+
+        Returns:
+            EvalOutputItem with computed average score and reasoning.
+        """

         score_sum = 0.0
```

77-85: Unused variable score_sum.

The variable score_sum is accumulated but never used (only scores list is passed to episode_value_from_states). Remove it to avoid confusion.

```diff
-        score_sum = 0.0
         scores = []
         for step in item.trajectory:
             if step.event_type == IntermediateStepType.CUSTOM_END:
                 payload = step.payload
                 if payload.metadata and "value" in payload.metadata:
                     step_score = float(payload.metadata["value"])
                     scores.append(step_score)
-                    score_sum += step_score
```
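The "time-discounted rewards" mentioned in the suggested docstring can be sketched as a discounted weighting over the collected per-step values. This stand-in borrows the helper name from the evaluator, but the gamma default and the exact weighting scheme are assumptions, not the actual implementation:

```python
import numpy as np


def episode_value_from_states(state_values: list[float], gamma: float = 0.9) -> float:
    """Collapse per-step state values into one episode score via a discounted average."""
    s = np.asarray(state_values, dtype=float)
    if s.size == 0:
        raise ValueError("state_values must not be empty")
    # Weight step t by gamma ** t, then normalize so the result stays in the
    # same range as the inputs.
    weights = gamma ** np.arange(s.size)
    return float((s * weights).sum() / weights.sum())


score = episode_value_from_states([0.0, 0.5, 1.0])
```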
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)

143-144: Use %-style formatting in logger calls instead of f-strings.

Per Python logging best practices, use lazy %-formatting to avoid string interpolation when the log level is disabled.

```diff
             if not reward_results:
-                logger.warning(f"No reward results found for run_id: {run_id}, generation: {gen_idx}")
+                logger.warning("No reward results found for run_id: %s, generation: %d", run_id, gen_idx)
                 continue
```

71-81: Consider using meta parameter or document why it's reserved.

The meta parameter is unused (flagged by static analysis). If it's intended for future use, add a docstring note or prefix with underscore to indicate it's intentionally unused.

```diff
-    async def start_run(self, run_id: str, meta: dict | None = None) -> None:
+    async def start_run(self, run_id: str, _meta: dict | None = None) -> None:
         """
         Start multiple evaluation runs to collect trajectories.
         Args:
             run_id (str): The ID of the run.
-            meta (dict): Metadata for the run.
+            _meta (dict): Reserved for future use. Currently unused.
         """
```
examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/rl_with_openpipe_art.py (2)

220-221: Simplify redundant boolean expression.

True if condition else False is equivalent to just condition.

                 max_retries=max_retries,
-                choose_random=True if config.opponent_model is None else False,
+                choose_random=config.opponent_model is None,
             )

Apply the same fix at line 236:

                 max_retries=max_retries,
-                choose_random=True if config.opponent_model is None else False,
+                choose_random=config.opponent_model is None,
             )

157-159: Clarify the return value when game is aborted.

When RuntimeError is caught, current_player.steps (an integer) is returned instead of a game outcome. The subsequent logic (lines 242-258) handles this by converting to str(winner). Consider adding a comment to clarify this behavior or using a more explicit sentinel value.

```diff
         except RuntimeError as _:
             logger.debug("*** Game aborted due to too many invalid moves. ***")
-            return current_player.steps
+            # Return step count as abort indicator; handled specially in outcome logic
+            return current_player.steps
```
src/nat/finetuning/utils/parsers/langchain_parser.py (3)

70-78: Consider narrowing exception types for payload parsing.

Catching bare Exception obscures potential issues. Consider catching more specific exceptions like TypeError or ValueError.

```diff
         if isinstance(payload, dict) and "message" in payload:
             # Handle dict payloads
             try:
                 msg = AIMessage(**payload["message"])
-            except Exception as _:
+            except (TypeError, ValueError, KeyError):
                 try:
                     msg = AIMessageChunk(**payload["message"])
-                except Exception as _:
+                except (TypeError, ValueError, KeyError):
                     msg = None
```

229-247: Remove commented-out code.

Line 234 contains commented-out code that should be removed to maintain code cleanliness.

```diff
 def _parse_dict_message(msg_dict: dict) -> dict:
     """Parse a dictionary-based message."""
     result = {}

     # Extract role
-    #result["role"] = msg_dict.get("role", "user")
     if "role" in msg_dict:
         role = msg_dict["role"]
     elif "type" in msg_dict:
         role = msg_dict["type"]
     else:
         role = "user"
```

286-306: Add type hint for data parameter.

Adding Any type hint improves code clarity and IDE support.

```diff
-def _extract_content(data) -> str:
+def _extract_content(data: Any) -> str:
     """Extract string content from various data formats."""
```
     """Extract string content from various data formats."""

Note: You'll need to add Any to the imports from typing.

tests/nat/finetuning/interfaces/test_trainer_adapter.py (2)

88-122: Fixtures should follow naming convention per coding guidelines.

Per coding guidelines, pytest fixtures should define the name argument and use fixture_ prefix for the function.

```diff
-    @pytest.fixture
-    def adapter_config(self):
+    @pytest.fixture(name="adapter_config")
+    def fixture_adapter_config(self):
         """Create a test adapter config."""
         return TrainerAdapterConfig(type="test_adapter", reward=RewardFunctionConfig(name="test_reward"))

-    @pytest.fixture
-    def finetune_config(self, tmp_path):
+    @pytest.fixture(name="finetune_config")
+    def fixture_finetune_config(self, tmp_path):
         """Create a test finetune config."""
         # ... rest of fixture

-    @pytest.fixture
-    def adapter(self, adapter_config):
+    @pytest.fixture(name="adapter")
+    def fixture_adapter(self, adapter_config):
         """Create a concrete adapter instance."""
         return ConcreteTrainerAdapter(adapter_config=adapter_config)

-    @pytest.fixture
-    def sample_trajectories(self):
+    @pytest.fixture(name="sample_trajectories")
+    def fixture_sample_trajectories(self):
         """Create sample trajectories for testing."""
```

As per coding guidelines.


205-224: Apply same fixture naming convention here.

```diff
-    @pytest.fixture
-    def failing_adapter_config(self):
+    @pytest.fixture(name="failing_adapter_config")
+    def fixture_failing_adapter_config(self):
         """Create an adapter config that might fail."""

-    @pytest.fixture
-    def finetune_config(self, tmp_path):
+    @pytest.fixture(name="finetune_config")
+    def fixture_finetune_config(self, tmp_path):
         """Create a test finetune config."""
```

As per coding guidelines.

src/nat/builder/builder.py (1)

273-315: Consider consistency for @experimental decorator on finetuning methods.

The add_* methods are marked @experimental(feature_name="Finetuning") but the corresponding get_* methods are not. If finetuning is experimental, consider marking all related methods consistently, or document the distinction.
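To illustrate the consistency point, here is a minimal stand-in for such a decorator applied to both the `add_*` and `get_*` sides. The real `@experimental` lives in the toolkit and almost certainly differs; this sketch only shows the shape of consistent marking:

```python
import functools
import warnings


def experimental(feature_name: str):
    """Minimal stand-in for a feature-flag decorator like @experimental."""

    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            warnings.warn(f"{feature_name} is experimental and may change", stacklevel=2)
            return fn(*args, **kwargs)

        # Record the feature name so tooling can discover experimental APIs.
        inner._experimental_feature = feature_name
        return inner

    return wrap


class Builder:
    @experimental(feature_name="Finetuning")
    def add_trainer(self, name: str) -> str:
        return name

    # Marking the getter too keeps the experimental surface consistent.
    @experimental(feature_name="Finetuning")
    def get_trainer(self, name: str) -> str:
        return name
```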

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trainer_adapter.py (3)

117-118: Remove or document commented-out validation code.

If the last-message-must-be-assistant constraint was intentionally relaxed, add a comment explaining why. Otherwise, remove the dead code.

```diff
-        # if traj.episode[-1].role != EpisodeItemRole.ASSISTANT:
-        #     raise ValueError("The last message in the trajectory must be from 'assistant'.")
+        # Note: The last message does not need to be from 'assistant' to allow
+        # for partial trajectories or user-initiated conversation endings.
```

187-189: Use logger.exception to capture full stack trace.

When catching and logging exceptions without re-raising, use logger.exception() to capture the full stack trace. Per coding guidelines.

```diff
                 except Exception as e:
-                    logger.error(f"Error constructing trajectory: {e}. Skipping.")
+                    logger.exception("Error constructing trajectory. Skipping.")
                     continue
```

98-122: _validate_episode_order could be synchronous.

This method performs only synchronous validation and doesn't await anything. Consider making it a regular method to avoid unnecessary coroutine overhead.

-    async def _validate_episode_order(self, traj: Trajectory):
+    def _validate_episode_order(self, traj: Trajectory) -> None:

And update the call site at line 168:

-                await self._validate_episode_order(traj)
+                self._validate_episode_order(traj)
packages/nvidia_nat_openpipe_art/tests/test_trajectory_builder.py (1)

40-71: Consider renaming fixtures to use fixture_ prefix.

As per coding guidelines, pytest fixtures should be named using the fixture_ prefix and define the name argument in the decorator. This helps distinguish fixtures from regular values.

```diff
-    @pytest.fixture
-    def builder_config(self):
+    @pytest.fixture(name="builder_config")
+    def fixture_builder_config(self):
         """Create test trajectory builder configuration."""
         return ARTTrajectoryBuilderConfig(num_generations=2, reward=RewardFunctionConfig(name="test_reward"))

-    @pytest.fixture
-    def finetune_config(self, tmp_path):
+    @pytest.fixture(name="finetune_config")
+    def fixture_finetune_config(self, tmp_path):
         """Create test finetune configuration."""
         ...

-    @pytest.fixture
-    def builder(self, builder_config):
+    @pytest.fixture(name="builder")
+    def fixture_builder(self, builder_config):
         """Create ARTTrajectoryBuilder instance."""
         return ARTTrajectoryBuilder(trajectory_builder_config=builder_config)
```

With the `name=` argument, the fixture names and their dependency parameters stay unchanged, so only the function definitions need renaming.
src/nat/builder/workflow_builder.py (1)

1063-1083: Consider making config getter methods synchronous.

The new get_trainer_config, get_trainer_adapter_config, and get_trajectory_builder_config methods are marked as async, but they only perform synchronous dictionary lookups. Most other config getters in this file (e.g., get_llm_config, get_embedder_config, get_function_config) are synchronous.

```diff
     @override
-    async def get_trainer_config(self, trainer_name: str | TrainerRef) -> TrainerConfig:
+    def get_trainer_config(self, trainer_name: str | TrainerRef) -> TrainerConfig:
         if trainer_name not in self._trainers:
             raise ValueError(f"Trainer '{trainer_name}' not found")

         return self._trainers[trainer_name].config

     @override
-    async def get_trainer_adapter_config(self, trainer_adapter_name: str | TrainerAdapterRef) -> TrainerAdapterConfig:
+    def get_trainer_adapter_config(self, trainer_adapter_name: str | TrainerAdapterRef) -> TrainerAdapterConfig:
         if trainer_adapter_name not in self._trainer_adapters:
             raise ValueError(f"Trainer adapter '{trainer_adapter_name}' not found")

         return self._trainer_adapters[trainer_adapter_name].config

     @override
-    async def get_trajectory_builder_config(
+    def get_trajectory_builder_config(
             self, trajectory_builder_name: str | TrajectoryBuilderRef) -> (TrajectoryBuilderConfig):
         if trajectory_builder_name not in self._trajectory_builders:
             raise ValueError(f"Trajectory builder '{trajectory_builder_name}' not found")

         return self._trajectory_builders[trajectory_builder_name].config
```

Note: This would require corresponding changes to the Builder interface and ChildBuilder delegation methods.

Updated the relative paths to the Custom Evaluators page to ensure proper navigation in the documentation. This resolves incorrect linking issues in the finetuning reference guides.

Signed-off-by: dnandakumar-nv <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (17)
docs/source/reference/finetuning/rl_with_openpipe.md (4)

20-20: Clarify NAT naming convention on first use.

Line 20 references "NAT's finetuning harness" but should introduce the full product name first. Per coding guidelines, use "NVIDIA NeMo Agent Toolkit" on first mention, then "NeMo Agent Toolkit" for subsequent uses.

Apply this change:

-This guide covers the integration between NAT's finetuning harness and [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer), an open-source framework for teaching LLMs through reinforcement learning.
+This guide covers the integration between the NVIDIA NeMo Agent Toolkit's finetuning harness and [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer), an open-source framework for teaching LLMs through reinforcement learning.

31-39: Reduce repetitive phrasing for better readability.

The sentences at lines 35 and 38 both start with "You want to," which creates repetition and impacts flow. Consider rewriting to vary sentence structure.

Apply this change:

```diff
 - You want to improve agent reliability on specific tasks
 - You have a way to score agent performance (even if you don't have "correct" answers)
 - You're working with agentic workflows that make decisions or take actions
-- You want to iterate quickly with online training
+- Online training enables rapid iteration and deployment
```

401-410: Clarify the phrasing for better readability.

Line 404 uses a split infinitive ("to progressively include"). Consider restructuring for clearer expression while maintaining technical accuracy.

Apply this change:

-1. **Curriculum Learning**: The trainer implements curriculum learning to progressively include harder examples:
+1. **Curriculum Learning**: The trainer implements curriculum learning by progressively including harder examples:

1-600: Verify ASCII diagram code blocks are properly formatted.

The architectural diagrams at lines 42–65, 238–273, 290–330, and 354–400 are in fenced code blocks without language specification. While these are ASCII art rather than executable code, markdownlint recommends adding a language tag (e.g., ```text or ```plaintext) for clarity and linting compliance.

Apply changes to all diagram blocks (example shown for lines 42–65):

-```
+```text
┌─────────────────────────────────────────────────────────────────────────┐

Repeat this change for the diagrams at lines 238, 290, and 354.

docs/source/reference/finetuning/extending.md (13)

18-50: Introduce full product name on first reference.

The document introduces the finetuning harness without first using the full product name. Per coding guidelines, use "NVIDIA NeMo Agent Toolkit" on first mention.

Apply this change:

-# Extending the Finetuning Harness
-
-This guide covers how to create custom components for the NAT finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.
+# Extending the Finetuning Harness
+
+This guide covers how to create custom components for the NVIDIA NeMo Agent Toolkit finetuning harness. You'll learn about the three core interfaces, how to implement them, and best practices for creating robust, reusable components.

26-50: Add language specification to architecture diagram code block.

The ASCII art diagram lacks a language tag. Add text or plaintext for consistency with markdown linting standards.

Apply this change:

-```
+```text
┌────────────────────────────────────────────────────────────────────────┐

66-111: Add Python language specification to code block.

The Python code example demonstrating the TrajectoryBuilder interface lacks a language tag.

Apply this change:

-```
+```python
 from abc import ABC, abstractmethod

135-158: Add Python language specification to code block.

The configuration class example lacks a language tag.

Apply this change:

-```
+```python
 from pydantic import Field

170-190: Add Python language specification to code block.

The registration module example lacks a language tag.

Apply this change:

-```
+```python
 from nat.builder.builder import Builder

198-247: Add Python language specification to code block.

The TrainerAdapter interface definition lacks a language tag.

Apply this change:

-```
+```python
 from abc import ABC, abstractmethod

262-288: Add Python language specification to code block.

The trainer adapter configuration example lacks a language tag.

Apply this change:

-```
+```python
 from pydantic import BaseModel, Field

296-307: Add Python language specification to code block.

The trainer adapter registration example lacks a language tag.

Apply this change:

-```
+```python
 from nat.builder.builder import Builder

315-401: Add Python language specification to code block.

The Trainer interface definition lacks a language tag.

Apply this change:

-```
+```python
 from abc import ABC, abstractmethod

430-437: Add Python language specification to code block.

The error handling example lacks a language tag.

Apply this change:

-```
+```python
 async def run_epoch(self, epoch: int, run_id: str) -> TrainingJobRef | None:

443-447: Add Python language specification to code block.

The logging example lacks a language tag.

Apply this change:

-```
+```python
 logger.info("Starting epoch %d with %d examples", epoch, num_examples)

472-491: Add Python language specification to code block.

The testing example lacks a language tag.

Apply this change:

-```
+```python
 import pytest

497-556: Add YAML language specification to code block.

The configuration example lacks a language tag.

Apply this change:

-```
+```yaml
 llms:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f13e178 and 1595c87.

📒 Files selected for processing (2)
  • docs/source/reference/finetuning/extending.md (1 hunks)
  • docs/source/reference/finetuning/rl_with_openpipe.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
docs/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never use deprecated names: Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq in documentation (unless intentionally referring to deprecated versions)

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
docs/source/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.{md,rst}: Documentation must be clear and comprehensive, without TODOs, FIXMEs, or placeholder text
Ensure documentation is free of offensive or outdated terms
Ensure documentation is free of spelling mistakes and do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values (except for
    return values of None; in that situation no return type hint is needed). Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation
    doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file.
    Words that might appear to be spelling mistakes but are listed in the
    ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the
    Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/reference/finetuning/rl_with_openpipe.md
  • docs/source/reference/finetuning/extending.md
🪛 LanguageTool
docs/source/reference/finetuning/rl_with_openpipe.md

[style] ~38-~38: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...at make decisions or take actions - You want to iterate quickly with online training #...

(REP_WANT_TO_VB)


[style] ~404-~404: Try moving the adverb to make the sentence clearer.
Context: ...ng**: The trainer implements curriculum learning to progressively include harder examples: - Groups trajectories by average re...

(SPLIT_INFINITIVE)

🪛 markdownlint-cli2 (0.18.1)
docs/source/reference/finetuning/extending.md

42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


189-189: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


214-214: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


237-237: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


289-289: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


353-353: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


464-464: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


536-536: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


549-549: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


557-557: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

@dnandakumar-nv added the feature request (New feature or request) and non-breaking (Non-breaking change) labels on Dec 3, 2025
Corrected the case-insensitivity pattern for 'logprob' to ensure proper matching in the Vale style vocabulary. This update aligns with consistent handling of vocabulary terms.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated module and project naming to align with the "nat" prefix convention. Adjusted logging suppression prefix and enhanced setup instructions in README for better clarity, including specifying virtual environment and dependencies.

Signed-off-by: dnandakumar-nv <[email protected]>
Updated module and project naming to align with the "nat" prefix convention. Adjusted logging suppression prefix and enhanced setup instructions in README for better clarity, including specifying virtual environment and dependencies.

Signed-off-by: dnandakumar-nv <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pyproject.toml (2)

100-105: Move openpipe-art to maintain alphabetical ordering in optional-dependencies.

The section header at line 74 specifies "Keep sorted!!!" Reorder openpipe-art to its alphabetically correct position between mem0ai and opentelemetry, not at the end after zep-cloud.

opentelemetry = ["nvidia-nat-opentelemetry"]
-phoenix = ["nvidia-nat-phoenix"]
+openpipe-art = ["nvidia-nat-openpipe-art"]
+phoenix = ["nvidia-nat-phoenix"]

200-204: Move nvidia-nat-openpipe-art to maintain alphabetical ordering in workspace members.

Reorder nvidia-nat-openpipe-art to its alphabetically correct position. It should appear between nvidia-nat-mem0ai and nvidia-nat-mysql (or after nvidia-nat-mysql and before nvidia-nat-opentelemetry, depending on exact sort rules), not at the end after nvidia-nat-zep-cloud.

-nvidia-nat-mem0ai = { workspace = true }
+nvidia-nat-mem0ai = { workspace = true }
+nvidia-nat-mysql = { workspace = true }
+nvidia-nat-openpipe-art = { workspace = true }

Then remove from line 204 (the current end-of-list location).

♻️ Duplicate comments (2)
src/nat/utils/io/supress_logs.py (2)

1-33: Consider renaming the module to suppress_logs.py to avoid a permanent typo in the API.

The filename supress_logs.py is misspelled. If this utility becomes part of the public nat.utils.io surface, the typo will be baked into the import path. While this PR is still in flight, it’s safer to:

  • Rename the file to suppress_logs.py.
  • Update all imports in the PR to use nat.utils.io.suppress_logs.

This was already raised on a previous revision; addressing it now will prevent long-term API awkwardness.


20-33: Add type hints/docstring and correctly restore propagate state for each logger.

supress_logs.suppress_logs is a public async context manager and should follow project typing/docstring rules, and it still does not restore logger.propagate to its original value:

  • No type hints on prefix / level or the async context manager return type.
  • No Google-style docstring explaining behavior and parameters.
  • You snapshot only lg.level (line 24) but in finally you unconditionally set lg.propagate = True (line 33), which can permanently change logging behavior for loggers that previously had propagate = False.

A minimal refactor that addresses all three points:

-import logging
-from contextlib import asynccontextmanager
+import logging
+from collections.abc import AsyncIterator
+from contextlib import asynccontextmanager
@@
-@asynccontextmanager
-async def suppress_logs(prefix, level=logging.ERROR):
-    # gather every logger created so far whose name starts with "nat"
-    loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict if name.startswith(prefix)]
-    old = {lg: lg.level for lg in loggers}
+@asynccontextmanager
+async def suppress_logs(prefix: str, level: int = logging.ERROR) -> AsyncIterator[None]:
+    """Temporarily raise log level and disable propagation for matching loggers.
+
+    This adjusts existing ``logging.Logger`` instances whose names start with
+    ``prefix``, then restores their original configuration on exit.
+
+    Args:
+        prefix: Logger name prefix to match (e.g., ``"nat"``).
+        level: Temporary log level to apply while inside the context.
+    """
+    # Gather every logger created so far whose name starts with ``prefix``.
+    loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict if name.startswith(prefix)]
+    old = {lg: (lg.level, lg.propagate) for lg in loggers}
@@
-    finally:
-        for lg, lvl in old.items():
-            lg.setLevel(lvl)
-            lg.propagate = True
+    finally:
+        for lg, (lvl, propagate) in old.items():
+            lg.setLevel(lvl)
+            lg.propagate = propagate

Also add a short module-level Google-style docstring after the license header to describe this utility.
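A standalone sketch of the restore-both-attributes pattern the diff above proposes, with a usage example (the `nat.demo` logger name is illustrative, not part of the toolkit):

```python
import asyncio
import logging
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


@asynccontextmanager
async def suppress_logs(prefix: str, level: int = logging.ERROR) -> AsyncIterator[None]:
    """Temporarily raise the level of existing loggers whose names start with `prefix`."""
    loggers = [logging.getLogger(name)
               for name in logging.root.manager.loggerDict
               if name.startswith(prefix)]
    old = {lg: (lg.level, lg.propagate) for lg in loggers}  # snapshot BOTH attributes
    try:
        for lg in loggers:
            lg.setLevel(level)
            lg.propagate = False
        yield
    finally:
        for lg, (lvl, propagate) in old.items():
            lg.setLevel(lvl)          # restore the recorded level...
            lg.propagate = propagate  # ...and the recorded flag, not a hard-coded True


async def main() -> None:
    noisy = logging.getLogger("nat.demo")
    noisy.setLevel(logging.DEBUG)
    noisy.propagate = False  # would be clobbered by an unconditional `propagate = True`
    async with suppress_logs("nat"):
        assert noisy.level == logging.ERROR
    assert noisy.level == logging.DEBUG
    assert noisy.propagate is False  # original state survives the context manager


asyncio.run(main())
```

Snapshotting `(level, propagate)` as a tuple is what prevents the permanent-state-change bug the comment describes for loggers that started with `propagate = False`.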

🧹 Nitpick comments (3)
examples/finetuning/rl_with_openpipe_art/README.md (1)

1-10: Use full "NVIDIA NeMo Agent Toolkit" name on first reference.

Per coding guidelines, the first mention of the toolkit should use the full name "NVIDIA NeMo Agent Toolkit" before using the shortened form "NeMo Agent Toolkit".

Apply this diff:

-This example demonstrates how to use the NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
+This example demonstrates how to use the NVIDIA NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)

79-80: Consider extracting the exception message to a constant.

The exception message could be extracted to improve maintainability, though this is a minor style preference.

Example refactor:

+    _RUN_IN_PROGRESS_ERROR = "Run {run_id} is already in progress."
+
     async def start_run(self, run_id: str, meta: dict | None = None) -> None:
         ...
         if run_id in self.evaluation_runs:
-            raise ValueError(f"Run {run_id} is already in progress.")
+            raise ValueError(self._RUN_IN_PROGRESS_ERROR.format(run_id=run_id))

124-125: Consider extracting the exception message to a constant.

Similar to the earlier suggestion, this exception message could be extracted for consistency.

Example refactor:

+    _NO_EVAL_RUNS_ERROR = "No evaluation runs found for run_id: {run_id}"
+
     async def finalize(self, run_id: str, meta: dict | None = None) -> TrajectoryCollection:
         ...
         if run_id not in self.evaluation_runs:
-            raise ValueError(f"No evaluation runs found for run_id: {run_id}")
+            raise ValueError(self._NO_EVAL_RUNS_ERROR.format(run_id=run_id))
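An alternative to message-format constants that also satisfies Ruff's TRY003 rule is a small custom exception class that owns its message; `RunStateError` and `start_run` below are hypothetical names sketching the idea, not the PR's actual API:

```python
class RunStateError(ValueError):
    """Raised when a finetuning run is in an unexpected state."""

    def __init__(self, run_id: str, state: str):
        # The long message lives inside the exception class, so raise sites stay short.
        super().__init__(f"Run {run_id} is {state}.")
        self.run_id = run_id


def start_run(active_runs: set[str], run_id: str) -> None:
    """Register a run, rejecting duplicates with a structured error."""
    if run_id in active_runs:
        raise RunStateError(run_id, "already in progress")
    active_runs.add(run_id)
```

Beyond silencing the lint rule, callers can catch `RunStateError` specifically and inspect `run_id` rather than parsing the message string.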
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 170b569 and 39a4a39.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • examples/finetuning/rl_with_openpipe_art/README.md (1 hunks)
  • examples/finetuning/rl_with_openpipe_art/pyproject.toml (1 hunks)
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1 hunks)
  • pyproject.toml (5 hunks)
  • src/nat/utils/io/supress_logs.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/finetuning/rl_with_openpipe_art/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (14)
**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • pyproject.toml
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • pyproject.toml
**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values (except for
    return values of None; in that situation no return type hint is needed). Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation
    doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file.
    Words that might appear to be spelling mistakes but are listed in the
    ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the
    Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • src/nat/utils/io/supress_logs.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • pyproject.toml
  • examples/finetuning/rl_with_openpipe_art/README.md
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/utils/io/supress_logs.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
    with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a
    two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
**/pyproject.toml

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/pyproject.toml: Dependencies should use ~= format with two-digit versions (e.g., ~=1.0)
New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via uv pip install --sync

Files:

  • pyproject.toml
**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit; at a minimum an example should
    contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should
    contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should
    be checked into git-lfs.

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
🧠 Learnings (4)
📚 Learning: 2025-11-10T21:26:35.059Z
Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to packages/*/pyproject.toml : The pyproject.toml should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/pyproject.toml : New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via uv pip install <pkg> --sync

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,toml,yaml,yml} : Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Applied to files:

  • pyproject.toml
🧬 Code graph analysis (1)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (4)
src/nat/data_models/finetuning.py (4)
  • EpisodeItem (109-123)
  • EpisodeItemRole (99-106)
  • Trajectory (126-134)
  • TrajectoryCollection (137-143)
src/nat/data_models/intermediate_step.py (2)
  • IntermediateStep (236-311)
  • IntermediateStepCategory (32-39)
src/nat/finetuning/interfaces/trajectory_builder.py (6)
  • TrajectoryBuilder (28-114)
  • start_run (67-75)
  • run_eval (44-64)
  • finalize (78-89)
  • compute_reward (91-102)
  • log_progress (105-114)
src/nat/finetuning/utils/parsers/base_parser.py (1)
  • parse_to_openai_messages (29-102)
🪛 LanguageTool
examples/finetuning/rl_with_openpipe_art/README.md

[grammar] ~335-~335: Use a hyphen to join words.
Context: ...ses a sophisticated alpha-beta pruning based reward instead of simple win/los...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
examples/finetuning/rl_with_openpipe_art/README.md

166-166: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


312-312: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


515-515: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py

71-71: Unused method argument: meta

(ARG002)


80-80: Avoid specifying long messages outside the exception class

(TRY003)


125-125: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)
examples/finetuning/rl_with_openpipe_art/README.md (1)

101-123: Specify language identifiers on all fenced code blocks.

Several code blocks lack language specifiers, which is flagged by markdownlint (MD040). Add the appropriate language identifier after the opening triple backticks.

Apply these diffs for clarity:

-```
+```python
 @register_function(config_type=RlWithOpenpipeArtFunctionConfig)
 async def rl_with_openpipe_art_function(config, builder):
-```
+```bash
 INFO:     Started server process [3671624]
 INFO:     Waiting for application startup.
 INFO:     Application startup complete.
 INFO:     Uvicorn running on http://0.0.0.0:7623 (Press CTRL+C to quit)
-```
+```python
 def evaluate_board_for_player(board: np.ndarray, player_val: int) -> float:
-```
+```python
 def static_eval(b: np.ndarray) -> float:
-```
+```python
 def solve_outcome(b: np.ndarray, side_to_move: int, alpha=-1.0, beta=1.0) -> float:
-```
+```python
 @staticmethod
 def episode_value_from_states(
-```
+```python
 # In rl_with_openpipe_art.py
 if current_player.name == self.role:
-```
+```text
 .tmp/nat/finetuning/tic_tac_toe/
 ├── training_metrics.jsonl      # Per-epoch metrics
 ├── reward_history.json         # Reward progression

Also applies to: 222-227, 369-380, 387-398, 403-425, 443-472, 490-503, 515-520

⛔ Skipped due to learnings
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.py : Surround code entities with backticks in docstrings to avoid Vale false-positives
pyproject.toml (1)

171-176: Verify the override-dependencies version constraints align with project guidelines.

The new overrides at lines 174–175 (litellm==1.74.9 and openai>=1.106,<2.dev0) use exact pins and range constraints, while the coding guidelines recommend ~=<version> format with two-digit versions. The comment explains these overrides prevent regressions from the OpenPipe ART dependency. However, please confirm:

  • Is this deviation from the ~= format intentional to maintain exact compatibility?
  • Have these version overrides been synchronized with uv.lock via uv pip install <pkg> --sync?
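For contrast with the exact pins discussed above, the `~=` compatible-release convention the guidelines recommend looks like this in a plugin's pyproject.toml (the package name below is illustrative, not a real workspace member):

```toml
[project]
name = "nvidia-nat-example-plugin"
dependencies = [
    # Compatible-release pin: ~=1.0 is equivalent to >=1.0, <2.0
    "nvidia-nat~=1.0",
]
```

Exact pins such as `litellm==1.74.9` forgo that forward-compatibility window on purpose, which is why they warrant the explanatory comment and an explicit `uv.lock` sync.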
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1)

229-231: The type mismatch issue has been correctly resolved.

The past review comment flagged a type mismatch where item.event_type (which returns IntermediateStepType) was compared to IntermediateStepCategory.LLM. The code now correctly uses item.event_category, which returns IntermediateStepCategory and matches the expected enum type.

Changed the Python version to 3.11 and specified the exact OpenPipe ART package version for consistent setup. This ensures compatibility and avoids potential installation issues.

Signed-off-by: dnandakumar-nv <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
examples/finetuning/rl_with_openpipe_art/README.md (1)

334-334: Fix hyphenation in compound modifier.

The phrase "alpha-beta pruning based reward" should use proper hyphenation for the compound modifier: "alpha-beta pruning-based reward".

-The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning based reward** instead of simple win/loss signals.
+The reward function is the key to effective RL training. This example uses a sophisticated **alpha-beta pruning-based reward** instead of simple win/loss signals.
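For context on what an alpha-beta pruning-based reward computes, here is a minimal, self-contained sketch of exact tic-tac-toe evaluation via alpha-beta search. This is an illustrative reconstruction, not the example's actual `solve_outcome` implementation: boards are 3x3 NumPy arrays with 1 for X, -1 for O, 0 for empty, and the returned value lies in [-1, 1] from X's perspective.

```python
import numpy as np


def winner(board: np.ndarray) -> int:
    """Return 1 if X has three in a row, -1 if O does, 0 otherwise."""
    lines = list(board) + list(board.T) + [board.diagonal(), np.fliplr(board).diagonal()]
    for line in lines:
        total = int(line.sum())
        if total == 3:
            return 1
        if total == -3:
            return -1
    return 0


def solve_outcome(board: np.ndarray, side_to_move: int,
                  alpha: float = -1.0, beta: float = 1.0) -> float:
    """Exact game value under perfect play: 1.0 X win, -1.0 O win, 0.0 draw."""
    result = winner(board)
    if result != 0:
        return float(result)
    empty = np.argwhere(board == 0)
    if len(empty) == 0:
        return 0.0  # full board, no winner: draw
    best = -1.0 if side_to_move == 1 else 1.0
    for row, col in empty:
        board[row, col] = side_to_move
        value = solve_outcome(board, -side_to_move, alpha, beta)
        board[row, col] = 0  # undo the move (search mutates the board in place)
        if side_to_move == 1:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # prune: the opponent would never allow this line
            break
    return best
```

Scoring intermediate board states with an exact solver like this yields a denser, position-by-position training signal than a single win/loss reward at episode end, which is the motivation behind the README's reward design.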
🧹 Nitpick comments (4)
examples/finetuning/rl_with_openpipe_art/README.md (4)

3-3: Correct toolkit name on first use.

Per coding guidelines, the first reference should be "NVIDIA NeMo Agent Toolkit" rather than "NeMo Agent Toolkit".

-This example demonstrates how to use the NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.
+This example demonstrates how to use the NVIDIA NeMo Agent Toolkit finetuning harness with [OpenPipe ART](https://art.openpipe.ai/) (Agent Reinforcement Trainer) to improve an LLM's performance at playing Tic-Tac-Toe through reinforcement learning.

165-170: Add language specifier to curl command block.

Markdown requires code blocks to have a language specified. This curl command should be labeled as bash.

 Verify the server is running:
-```
+```bash
 curl http://localhost:8000/v1/models
-```
+```

311-326: Add language specifier to console output block.

This block showing training progress should specify a language (e.g., bash or text for console output).

 Training progress is logged to the console and saved to files:

-```
+```bash
 INFO - Starting finetuning with config: src/rl_with_openpipe_art/configs/config.yml
 INFO - Initializing OpenPipe ART Runner
 INFO - Successfully registered with ART backend.

514-519: Add language specifier to directory structure block.

The file tree output should specify a language identifier for markdown compliance.

 After training, check the output directory:

-```
+```text
 .tmp/nat/finetuning/tic_tac_toe/
 ├── training_metrics.jsonl      # Per-epoch metrics
 ├── reward_history.json         # Reward progression
 ├── reward_plot.png             # Visual reward chart
-```
+```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39a4a39 and c98714a.

📒 Files selected for processing (1)
  • examples/finetuning/rl_with_openpipe_art/README.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values (except for
    return values of None; in that situation no return type hint is needed).
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
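The re-raise versus log-and-continue rules above can be sketched as follows; `risky` and the logger setup are illustrative stand-ins, not toolkit code:

```python
import logging

logger = logging.getLogger(__name__)


def risky() -> None:
    # Stand-in for any operation that may fail.
    raise ValueError("boom")


def reraise_example() -> None:
    try:
        risky()
    except ValueError:
        # Re-raising: logger.error() logs without a stack trace here, then a
        # bare `raise` preserves the original traceback exactly once.
        logger.error("risky() failed; re-raising")
        raise


def swallow_example() -> None:
    try:
        risky()
    except ValueError:
        # Not re-raising: logger.exception() records the full stack trace.
        logger.exception("risky() failed; continuing")
```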

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation
    doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file.
    Words that might appear to be spelling mistakes but are listed in the
    ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the
    Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit; at a minimum an
    example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should
    contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/,
    and should be checked into git-lfs.

Files:

  • examples/finetuning/rl_with_openpipe_art/README.md
🪛 LanguageTool
examples/finetuning/rl_with_openpipe_art/README.md

[grammar] ~334-~334: Use a hyphen to join words.
Context: ...ses a sophisticated alpha-beta pruning based reward instead of simple win/los...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
examples/finetuning/rl_with_openpipe_art/README.md

165-165: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


311-311: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


514-514: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Refactor token usage extraction to capture additional details like cached and reasoning tokens. Add robust handling of tool outputs in callbacks, improving support for server-side tool usage and error resilience during schema parsing.

Signed-off-by: dnandakumar-nv <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)

211-213: LGTM! Payload field correctly propagated.

The addition of payload=response to StreamEventData aligns with the finetuning architecture for capturing raw generation data.

Optional consistency improvement: Consider whether response should be deep copied for consistency with the tool event handling at line 227, which uses copy.deepcopy. If the ChatResponse object won't be mutated elsewhere, the current approach is fine.
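A minimal sketch of the deep-copy option, using a simplified stand-in for `StreamEventData` (the real model lives in `src/nat/data_models/intermediate_step.py` and differs):

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class StreamEventData:
    # Simplified stand-in: the real model carries more fields.
    payload: Any = None


response = {"text": "hello", "logprobs": [0.1, 0.2]}

# Deep copy so later mutation of `response` cannot alter the recorded event,
# mirroring the tool-event handling the review mentions at line 227.
event = StreamEventData(payload=copy.deepcopy(response))
response["text"] = "mutated"
```

After the mutation, `event.payload["text"]` still reads `"hello"`, which is the consistency the deep copy buys.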

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (2)

56-67: Unused meta parameter in start_run.

meta is accepted but never used in start_run, which triggers ARG002 and can confuse readers. If the base TrajectoryBuilder interface requires the parameter, consider renaming it to _meta (or similar) to indicate intentional non-use and satisfy linters.


288-298: Verify intended shape of TrajectoryCollection.trajectories for a single generation.

The comments say “Flatten the trajectories list into a 1-d list of trajectories” and “If only one generation, return flat list”, but the code still wraps the flattened list in another list:

flat_trajectories = [traj for sublist in trajectories_list for traj in sublist]
return TrajectoryCollection(trajectories=[flat_trajectories], run_id=run_id)

If TrajectoryCollection.trajectories is meant to be flat when num_generations == 1, this should likely be:

return TrajectoryCollection(trajectories=flat_trajectories, run_id=run_id)

If the outer list dimension is required by the broader finetuning API, then the code is correct and the comments should be updated to reflect that it remains a nested structure even for a single generation.
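The two candidate shapes can be seen with plain lists (strings stand in for `Trajectory` objects here):

```python
# Per-example sublists, as produced before flattening.
trajectories_list = [["t1", "t2"], ["t3"]]

flat_trajectories = [traj for sublist in trajectories_list for traj in sublist]

# What the code under review returns: the flat list wrapped one level deep.
nested = [flat_trajectories]
# What a truly flat `trajectories` field would hold.
flat = flat_trajectories

print(nested)  # [['t1', 't2', 't3']]
print(flat)    # ['t1', 't2', 't3']
```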

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c98714a and 06f2d8e.

📒 Files selected for processing (3)
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1 hunks)
  • src/nat/profiler/callbacks/langchain_callback_handler.py (2 hunks)
  • src/nat/profiler/callbacks/llama_index_callback_handler.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values (except for
    return values of None; in that situation no return type hint is needed).
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation
    doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file.
    Words that might appear to be spelling mistakes but are listed in the
    ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the
    Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/profiler/callbacks/llama_index_callback_handler.py
  • src/nat/profiler/callbacks/langchain_callback_handler.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a
    pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a
    name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the
    version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of
    tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
🧠 Learnings (3)
📚 Learning: 2025-12-03T18:42:23.494Z
Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).

Applied to files:

  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php} : Every file must start with the standard SPDX Apache-2.0 header

Applied to files:

  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
📚 Learning: 2025-11-24T18:56:53.109Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php,sh} : All source files must include the SPDX Apache-2.0 header template

Applied to files:

  • packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py
🧬 Code graph analysis (1)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)
src/nat/data_models/intermediate_step.py (13)
  • data (294-295)
  • StreamEventData (67-78)
  • metadata (290-291)
  • TraceMetadata (115-128)
  • usage_info (298-299)
  • UsageInfo (81-84)
  • event_type (266-267)
  • IntermediateStepPayload (131-233)
  • IntermediateStepType (42-58)
  • span_event_timestamp (274-275)
  • framework (278-279)
  • name (282-283)
  • UUID (302-303)
🪛 Ruff (0.14.7)
packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py

56-56: Unused method argument: meta

(ARG002)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
src/nat/profiler/callbacks/llama_index_callback_handler.py (1)

220-228: LGTM! Tool event payload correctly added.

The payload field is properly propagated in the tool event data, with both output and payload set to deep copies of the function output. This duplication likely supports backward compatibility while enabling new finetuning functionality.

src/nat/profiler/callbacks/langchain_callback_handler.py (2)

268-269: LGTM! Payload field correctly added to LLM end event.

The addition of payload=generation properly captures the raw generation object alongside the extracted text output, supporting the finetuning harness's trajectory collection requirements.


321-322: LGTM! Tool event payload correctly propagated.

The payload field is properly added to the tool end event data. Both output and payload reference the same tool output, which maintains backward compatibility while enabling the new finetuning trajectory collection functionality.

packages/nvidia_nat_openpipe_art/src/nat/plugins/openpipe/trajectory_builder.py (1)

300-327: log_progress implementation looks solid.

Synchronous JSONL logging with a stable default path, timestamped entries, and num_generations included is clear and easy to consume. No issues from a correctness or style perspective.

This update introduces the `nat finetune` command, providing detailed documentation for its usage, options, and examples. The finetune command allows in-situ reinforcement learning for iterative agent improvement, incorporating experimental features and extensive configuration options.

Signed-off-by: dnandakumar-nv <[email protected]>
Contributor


Don't forget to add this new package as a dependency in packages/nvidia_nat_all/pyproject.toml


Labels

feature request New feature or request non-breaking Non-breaking change
