Introduce Finetuning Harness for In-Situ Reinforcement Learning of Agentic Workflows #1221
Merged: rapids-bot merged 73 commits into NVIDIA:develop from dnandakumar-nv:finetuning-harness on Dec 12, 2025.
Commits (all authored by dnandakumar-nv):
19feb79 Update `uv.lock` to revise dependencies and add overrides
149afb7 Remove OpenPipe ART finetuning backend implementation
56d3ae6 Remove OpenPipe ART finetuning backend implementation
06c4ade Add fine-tuning support with configurable trainer components
c2a5806 Rename FinetuningRunner to Trainer across the codebase
217e897 Add fine-tuning trainer registration and type handling support
d4173cd Refactor training components into distinct groups
823e374 Add support for fine-tuning components in workflow builder
9bdbc0c Add reward config and comprehensive tests for finetuning modules.
a3db41f Add unit tests for NAT finetuning parsers
1efdafc Merge remote-tracking branch 'upstream/develop' into finetuning-harness
0230d23 Pre-commit checks
2269e51 Add `finetune` CLI command for workflow finetuning
99595d3 `Refactor finetuning configuration handling`
b6bc094 Add num_epochs to Trainer and enhance finetuning workflow
42c7f82 Refactor imports and clean up redundant code across finetuning modules
4da1755 Refactor imports and clean up redundant code across finetuning modules
631a992 Add OpenPipe ART integration subpackage to NeMo toolkit
c78c03d Add support for nvidia-nat-openpipe-art package
2305d1b Add ART trajectory builder implementation for OpenPipe backend
8f727db Add ARTTrainerAdapter and enhance OpenPipe ART framework
52ef804 Add ARTTrainer implementation and curriculum learning support
a76207e Update default trainer run name in configuration
0db52b0 Add tests for ARTTrainer and ARTTrainerAdapter functionality
3d2c1d7 Add RL with OpenPipe ART Tic-Tac-Toe workflow and components
7a14fd9 Revert uv.lock file to revision 1 and clean up metadata.
0a1901d Add missing import for workflow function registration
5d5a503 Add target_model filter to trajectory processing
2004207 Add ruff ignore flags and placeholder data file
26a4ec2 Add custom accuracy evaluator for RL workflow outputs
bfd8ce9 Add randomized move option and logging improvements
210d2eb Add finetuning configuration and update annotation handling
b755dea Add support for new trainer-related keys and handle TypeError
0502bb6 Enable fine-tuning and update required configuration fields.
0421b7b Refactor payload handling and clean up unused code.
9dffced Update temperature settings, add component registrations, and fix tests
a522b8c Refactor: Remove heuristic evaluation logic and update configs
d40119c Refactor RL configs into modular files for clarity.
c147bc5 Add heuristic board evaluation and cleanup config files
3f972b0 Add configs and refine RL workflow for OpenPipe ART
5c54c1b Simplify config structures and update default parameters.
09f4a15 Refactor player logic and evaluation in RL workflow.
7f8207c Simplify message tracking and improve evaluation logic
c29b78b Update trainer logic and configuration defaults
c46d8f6 Refactor configurations and logic for improved evaluation.
34be985 Refactor RL evaluator and game logic for improved traceability
6f47d4b Adjust training configs and expand dataset for RL module
1520ea8 Add finetuning documentation and OpenPipe ART integration
1812e01 Add RL Tic-Tac-Toe example using OpenPipe ART
534cec5 Merge remote-tracking branch 'upstream/develop' into finetuning-harness
f13e178 Update comment for dependency overrides in pyproject.toml
1595c87 Fix relative link paths to Custom Evaluators in docs
170b569 Fix typo in 'logprob' vocabulary entry in Vale config
acaf955 Refactor module naming and improve setup instructions
39a4a39 Refactor module naming and improve setup instructions
c98714a Update README for ART setup with corrected Python version and package
06f2d8e Enhance token usage tracking and tool output handling.
48c3459 Add finetune command documentation to CLI reference
5cd8303 Merge remote-tracking branch 'upstream/develop' into finetuning-harness
3773737 Add support for `openpipe-art` package and related dependencies
f777b5b Update OpenPipe ART integration and dependencies
f98d607 Add guideline to update dependencies in `nvidia-nat-all`
9649568 Add ADK framework support and shared parser utilities
5f3d8f6 Refine formatting, comments, and docstrings for consistency.
3a433e6 Merge remote-tracking branch 'upstream/develop' into finetuning-harness
c6d8565 Add licensing files for NVIDIA package dependencies
5e88eb9 Remove outdated test README and add missing SPDX headers
1e69989 Add new path patterns for openpipe_art to path checks
b765ef0 Fix typo in vocabulary list for "finetuning" and "finetunable"
a0dc798 Refine path checks and update vocabulary/guide.
e4d8eb8 Update vocab and refine headers in RL fine-tuning docs
a7940d6 Expand path checks with additional config patterns
a293ed9 Add 'Qwen/Qwen.*' pattern to path checks