Merged
73 commits
19feb79
Update `uv.lock` to revise dependencies and add overrides
dnandakumar-nv Nov 24, 2025
149afb7
Remove OpenPipe ART finetuning backend implementation
dnandakumar-nv Nov 24, 2025
56d3ae6
Remove OpenPipe ART finetuning backend implementation
dnandakumar-nv Nov 25, 2025
06c4ade
Add fine-tuning support with configurable trainer components
dnandakumar-nv Nov 25, 2025
c2a5806
Rename FinetuningRunner to Trainer across the codebase
dnandakumar-nv Nov 25, 2025
217e897
Add fine-tuning trainer registration and type handling support
dnandakumar-nv Nov 25, 2025
d4173cd
Refactor training components into distinct groups
dnandakumar-nv Nov 25, 2025
823e374
Add support for fine-tuning components in workflow builder
dnandakumar-nv Nov 25, 2025
9bdbc0c
Add reward config and comprehensive tests for finetuning modules.
dnandakumar-nv Nov 25, 2025
a3db41f
Add unit tests for NAT finetuning parsers
dnandakumar-nv Nov 25, 2025
1efdafc
Merge remote-tracking branch 'upstream/develop' into finetuning-harness
dnandakumar-nv Nov 25, 2025
0230d23
Pre-commit checks
dnandakumar-nv Nov 25, 2025
2269e51
Add `finetune` CLI command for workflow finetuning
dnandakumar-nv Nov 25, 2025
99595d3
`Refactor finetuning configuration handling`
dnandakumar-nv Nov 25, 2025
b6bc094
Add num_epochs to Trainer and enhance finetuning workflow
dnandakumar-nv Nov 25, 2025
42c7f82
Refactor imports and clean up redundant code across finetuning modules
dnandakumar-nv Nov 25, 2025
4da1755
Refactor imports and clean up redundant code across finetuning modules
dnandakumar-nv Nov 25, 2025
631a992
Add OpenPipe ART integration subpackage to NeMo toolkit
dnandakumar-nv Nov 25, 2025
c78c03d
Add support for nvidia-nat-openpipe-art package
dnandakumar-nv Nov 25, 2025
2305d1b
Add ART trajectory builder implementation for OpenPipe backend
dnandakumar-nv Nov 25, 2025
8f727db
Add ARTTrainerAdapter and enhance OpenPipe ART framework
dnandakumar-nv Nov 26, 2025
52ef804
Add ARTTrainer implementation and curriculum learning support
dnandakumar-nv Nov 26, 2025
a76207e
Update default trainer run name in configuration
dnandakumar-nv Nov 26, 2025
0db52b0
Add tests for ARTTrainer and ARTTrainerAdapter functionality
dnandakumar-nv Nov 26, 2025
3d2c1d7
Add RL with OpenPipe ART Tic-Tac-Toe workflow and components
dnandakumar-nv Nov 26, 2025
7a14fd9
Revert uv.lock file to revision 1 and clean up metadata.
dnandakumar-nv Nov 26, 2025
0a1901d
Add missing import for workflow function registration
dnandakumar-nv Nov 26, 2025
5d5a503
Add target_model filter to trajectory processing
dnandakumar-nv Nov 26, 2025
2004207
Add ruff ignore flags and placeholder data file
dnandakumar-nv Nov 26, 2025
26a4ec2
Add custom accuracy evaluator for RL workflow outputs
dnandakumar-nv Nov 26, 2025
bfd8ce9
Add randomized move option and logging improvements
dnandakumar-nv Nov 27, 2025
210d2eb
Add finetuning configuration and update annotation handling
dnandakumar-nv Nov 27, 2025
b755dea
Add support for new trainer-related keys and handle TypeError
dnandakumar-nv Nov 27, 2025
0502bb6
Enable fine-tuning and update required configuration fields.
dnandakumar-nv Nov 27, 2025
0421b7b
Refactor payload handling and clean up unused code.
dnandakumar-nv Nov 27, 2025
9dffced
Update temperature settings, add component registrations, and fix tests
dnandakumar-nv Nov 27, 2025
a522b8c
Refactor: Remove heuristic evaluation logic and update configs
dnandakumar-nv Nov 27, 2025
d40119c
Refactor RL configs into modular files for clarity.
dnandakumar-nv Nov 27, 2025
c147bc5
Add heuristic board evaluation and cleanup config files
dnandakumar-nv Nov 27, 2025
3f972b0
Add configs and refine RL workflow for OpenPipe ART
dnandakumar-nv Nov 27, 2025
5c54c1b
Simplify config structures and update default parameters.
dnandakumar-nv Nov 27, 2025
09f4a15
Refactor player logic and evaluation in RL workflow.
dnandakumar-nv Nov 28, 2025
7f8207c
Simplify message tracking and improve evaluation logic
dnandakumar-nv Nov 29, 2025
c29b78b
Update trainer logic and configuration defaults
dnandakumar-nv Dec 2, 2025
c46d8f6
Refactor configurations and logic for improved evaluation.
dnandakumar-nv Dec 2, 2025
34be985
Refactor RL evaluator and game logic for improved traceability
dnandakumar-nv Dec 3, 2025
6f47d4b
Adjust training configs and expand dataset for RL module
dnandakumar-nv Dec 3, 2025
1520ea8
Add finetuning documentation and OpenPipe ART integration
dnandakumar-nv Dec 3, 2025
1812e01
Add RL Tic-Tac-Toe example using OpenPipe ART
dnandakumar-nv Dec 3, 2025
534cec5
Merge remote-tracking branch 'upstream/develop' into finetuning-harness
dnandakumar-nv Dec 3, 2025
f13e178
Update comment for dependency overrides in pyproject.toml
dnandakumar-nv Dec 3, 2025
1595c87
Fix relative link paths to Custom Evaluators in docs
dnandakumar-nv Dec 3, 2025
170b569
Fix typo in 'logprob' vocabulary entry in Vale config
dnandakumar-nv Dec 3, 2025
acaf955
Refactor module naming and improve setup instructions
dnandakumar-nv Dec 3, 2025
39a4a39
Refactor module naming and improve setup instructions
dnandakumar-nv Dec 3, 2025
c98714a
Update README for ART setup with corrected Python version and package
dnandakumar-nv Dec 4, 2025
06f2d8e
Enhance token usage tracking and tool output handling.
dnandakumar-nv Dec 8, 2025
48c3459
Add finetune command documentation to CLI reference
dnandakumar-nv Dec 8, 2025
5cd8303
Merge remote-tracking branch 'upstream/develop' into finetuning-harness
dnandakumar-nv Dec 11, 2025
3773737
Add support for `openpipe-art` package and related dependencies
dnandakumar-nv Dec 11, 2025
f777b5b
Update OpenPipe ART integration and dependencies
dnandakumar-nv Dec 11, 2025
f98d607
Add guideline to update dependencies in `nvidia-nat-all`
dnandakumar-nv Dec 11, 2025
9649568
Add ADK framework support and shared parser utilities
dnandakumar-nv Dec 11, 2025
5f3d8f6
Refine formatting, comments, and docstrings for consistency.
dnandakumar-nv Dec 11, 2025
3a433e6
Merge remote-tracking branch 'upstream/develop' into finetuning-harness
dnandakumar-nv Dec 11, 2025
c6d8565
Add licensing files for NVIDIA package dependencies
dnandakumar-nv Dec 11, 2025
5e88eb9
Remove outdated test README and add missing SPDX headers
dnandakumar-nv Dec 12, 2025
1e69989
Add new path patterns for openpipe_art to path checks
dnandakumar-nv Dec 12, 2025
b765ef0
Fix typo in vocabulary list for "finetuning" and "finetunable"
dnandakumar-nv Dec 12, 2025
a0dc798
Refine path checks and update vocabulary/guide.
dnandakumar-nv Dec 12, 2025
e4d8eb8
Update vocab and refine headers in RL fine-tuning docs
dnandakumar-nv Dec 12, 2025
a7940d6
Expand path checks with additional config patterns
dnandakumar-nv Dec 12, 2025
a293ed9
Add 'Qwen/Qwen.*' pattern to path checks
dnandakumar-nv Dec 12, 2025
2 changes: 2 additions & 0 deletions .coderabbit.yaml
@@ -105,6 +105,8 @@ reviews:
digit version (ex: `~=1.0`).
- Not all packages contain Python code, if they do they should also contain their own set of tests, in a
`tests/` directory at the same level as the `pyproject.toml` file.
- When adding a new package, that new package name (as defined in the `pyproject.toml` file) should
be added as a dependency to the nvidia-nat-all package in `packages/nvidia_nat_all/pyproject.toml`

- path: "tests/**/*.py"
instructions: >-
9 changes: 9 additions & 0 deletions ci/scripts/path_checks.py
@@ -97,6 +97,14 @@
r"^docs/source/workflows/mcp/.*\.md$",
r"^ghcr\.io/github/github-mcp-server",
),
(
r"^examples/finetuning/rl_with_openpipe_art/.*/configs/config.*\.yml$",
r"^examples/finetuning/rl_with_openpipe_art/.*/data/.*",
),
(
r"^examples/finetuning/rl_with_openpipe_art/src/rl_with_openpipe_art/configs/config\.yml$",
r"^examples/finetuning/rl_with_openpipe_art/.*/data/.*",
),
}

ALLOWLISTED_WORDS: set[str] = {
@@ -155,6 +163,7 @@
"mistralai/[Mm]ixtral.*",
"microsoft/[Pp]hi.*",
"ssmits/[Qq]wen.*",
"Qwen/Qwen.*",
"deepseek-ai/deepseek-.*", #
# MIME types
"(application|text|image|video|audio|model|dataset|token|other)/.*", #
13 changes: 13 additions & 0 deletions ci/vale/styles/config/vocabularies/nat/accept.txt
@@ -6,6 +6,7 @@
[Aa]gno
AIQ
API(s?)
ART
Arize
arXiv
[Aa]sync
@@ -61,6 +62,10 @@ etcd
[Ee]xfiltration
[Ee]xplainability
Faiss
[Ff]inetune(d?)
[Ff]inetune(r|rs|ing)
[Ff]inetuning
[Ff]inetunable
Gantt
[Gg]eneratable
GitHub
@@ -86,6 +91,8 @@ LangSmith
[Ll]aunchable(s?)
# libcudf isn't styled in the way that cuDF is https://docs.rapids.ai/api/libcudf/stable/
libcudf
[Ll]earnable
[Ll]ogprob(s?)
LLM(s?)
# https://github.com/logpai/loghub/
Loghub
@@ -96,6 +103,7 @@ Milvus
MLflow
MLOps
Morpheus
Minimax
[Mm]ultimodal
[Nn]amespac(e|ed|es|ing)
NeMo
@@ -104,12 +112,14 @@ NIC
NIM(s?)
npm
NumPy
NAT's
NVIDIA
Nemotron
OAuth
URIs
OTel
onboarding
OpenPipe
[Oo]verfitting
pandas
[Pp]arallelization
@@ -120,6 +130,7 @@ PCIe
PDF(s?)
[Pp]ostprocess
[Pp]ostprocessing
[Pp]luggable
[Pp]reprocess
[Pp]retrained
[Pp]rofiler
@@ -133,13 +144,15 @@ pytest
[Rr]epo
[Rr]etarget(ed?)
[Rr]eusability
[Rr]ollout(s?)
[Rr]untime(s?)
[Ss]andboxing
[Ss]erializable
[Ss]treamable
[Ss]ubclassing
[Ss]ubcard(s?)
[Ss]ubgraph(s?)
[Ss]ubsampl(e|ing)
[Ss]ubpackage(s?)
[Ss]ubword(s?)
[Ss]uperset(s?)
1 change: 1 addition & 0 deletions docs/source/index.md
@@ -152,6 +152,7 @@ Middleware <./reference/middleware.md>
Optimizer <./reference/optimizer.md>
Test Time Compute <./reference/test-time-compute.md>
Troubleshooting <./troubleshooting.md>
Finetuning <./reference/finetuning/index.md>
```

```{toctree}
110 changes: 110 additions & 0 deletions docs/source/reference/cli.md
@@ -40,6 +40,7 @@ nat
│ ├── remove
│ └── update
├── eval
├── finetune
├── info
│ ├── channels
│ └── components
@@ -547,6 +548,115 @@
--help Show this message and exit.
```

## Finetune

:::{warning}
**Experimental Feature**: The Finetuning Harness is experimental; future releases may introduce breaking changes without notice.
:::

The `nat finetune` command provides access to the finetuning harness for **in-situ reinforcement learning** of agentic LLM workflows. This enables iterative improvement of agents through experience, allowing models to learn from their interactions with environments, tools, and users.

The finetuning process:
1. Loads the configuration with finetuning settings
2. Initializes the finetuning runner
3. Runs evaluation to collect trajectories
4. Submits trajectories for training
5. Monitors training progress

For detailed information on finetuning concepts, configuration, and extending the harness, see the [Finetuning Harness](../reference/finetuning/index.md) documentation.

The `nat finetune --help` utility provides a brief overview of the command and its available options:

```console
$ nat finetune --help
Usage: nat finetune [OPTIONS]

Run finetuning on a workflow using collected trajectories.

Options:
--config_file FILE Path to the configuration file containing
finetuning settings [required]
--dataset FILE A json file with questions and ground truth
answers. This will override the dataset path
in the config file.
--result_json_path TEXT A JSON path to extract the result from the
workflow. Use this when the workflow returns
multiple objects or a dictionary. For
example, '$.output' will extract the 'output'
field from the result. [default: $]
--endpoint TEXT Use endpoint for running the workflow.
Example: http://localhost:8000/generate
--endpoint_timeout INTEGER HTTP response timeout in seconds. Only
relevant if endpoint is specified.
[default: 300]
-o, --override <TEXT TEXT>... Override config values (e.g., -o
finetuning.num_epochs 5)
--validation_dataset FILE Validation dataset file path for periodic
validation
--validation_interval INTEGER Run validation every N epochs [default: 5]
--validation_config_file FILE Optional separate config file for validation
runs
--help Show this message and exit.
```

### Options Description

- **`--config_file`**: The main configuration file containing both the workflow configuration and finetuning settings. The file must include a `finetuning` section that defines the training parameters, trajectory builder, trainer adapter, and reward function (see the configuration sketch after this list).

- **`--dataset`**: Path to a JSON file containing the training dataset with questions and ground truth answers. If provided, this will override the dataset path specified in the configuration file.

- **`--result_json_path`**: A JSON path expression to extract the relevant result from the workflow output. This is useful when your workflow returns complex objects or dictionaries. The default value `$` uses the entire output.

- **`--endpoint`**: Instead of running the workflow locally, you can specify an HTTP endpoint where the workflow is deployed. This is useful for distributed training scenarios.

- **`--endpoint_timeout`**: When using the `--endpoint` option, this sets the maximum time (in seconds) to wait for a response from the remote service.

- **`-o, --override`**: Override configuration values using dot notation. Multiple overrides can be specified.

- **`--validation_dataset`**: Path to a separate validation dataset for periodic evaluation during training. This helps monitor generalization and detect overfitting.

- **`--validation_interval`**: How often (in epochs) to run validation. Default is every 5 epochs.

- **`--validation_config_file`**: An optional separate configuration file for validation runs. If not specified, the main config file is used for both training and validation.
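
For orientation, the following is a minimal, hypothetical sketch of what the `finetuning` section of a configuration file might look like. Only `finetuning.num_epochs` is taken from the override example above; the remaining key names are illustrative placeholders rather than the toolkit's exact schema. See the [Finetuning Harness](../reference/finetuning/index.md) documentation for the authoritative field names.

```yaml
# Hypothetical sketch only -- key names below (other than num_epochs) are
# illustrative placeholders; consult the Finetuning Harness reference for
# the exact schema.
finetuning:
  num_epochs: 10              # overridable at the CLI, e.g. `-o finetuning.num_epochs 20`
  trajectory_builder:         # component that converts workflow runs into training trajectories
    _type: example_trajectory_builder
  trainer_adapter:            # backend adapter that receives trajectories for training
    _type: example_trainer_adapter
  reward_function:            # scores each collected trajectory
    _type: example_reward_function
```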

### Examples

Basic finetuning with a configuration file:

<!-- path-check-skip-begin -->
```bash
nat finetune --config_file=configs/finetune.yml
```
<!-- path-check-skip-end -->

Override the number of training epochs:

<!-- path-check-skip-begin -->
```bash
nat finetune --config_file=configs/finetune.yml -o finetuning.num_epochs 20
```
<!-- path-check-skip-end -->

Run finetuning with validation monitoring:

<!-- path-check-skip-begin -->
```bash
nat finetune --config_file=configs/finetune.yml \
--validation_dataset=data/validation.json \
--validation_interval=3
```
<!-- path-check-skip-end -->

Use a remote endpoint for workflow execution:

<!-- path-check-skip-begin -->
```bash
nat finetune --config_file=configs/finetune.yml \
--endpoint=http://localhost:8000/generate \
--endpoint_timeout=600
```
<!-- path-check-skip-end -->

## Optimize

The `nat optimize` command provides automated hyperparameter tuning and prompt engineering for NeMo Agent toolkit workflows. It intelligently searches for the best combination of parameters based on the evaluation metrics you specify. The optimizer uses [Optuna](https://optuna.org/) for numerical hyperparameter optimization and a genetic algorithm (GA) for prompt optimization. Please reference the [NeMo Agent toolkit Optimizer Guide](../reference/optimizer.md) for a comprehensive overview of the optimizer capabilities and configuration.