
Conversation

@abmazitov (Contributor) commented Dec 12, 2025

This PR adds an option to choose the Muon optimizer while training the PET model

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • [ ] Issue referenced (for PRs that solve an issue)?

Maintainer/Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?
  • GPU tests passed (maintainer comment: "cscs-ci run")?

📚 Documentation preview 📚: https://metatrain--977.org.readthedocs.build/en/977/

@abmazitov changed the title from "Added an experimetal Muon optimizer to PET" to "Added an experimental Muon optimizer to PET" on Dec 12, 2025
@jwa7 (Member) left a comment


Looking great! I'll add the min_lr as discussed, but in the meantime just a comment on the creation of parameter groups :)

@frostedoyster (Collaborator) commented Dec 13, 2025

Are we sure the minimum learning rate is worth a new hyperparameter? New hyperparameters make the code harder to maintain and are confusing for users when they see a very large default hyperparameter file. I would rather 1) keep the code as is without a minimum learning rate, at the cost of a tiny inefficiency or 2) choose a minimum learning rate ourselves (e.g., 1/1000 of the initial learning rate) without it being configurable

pyproject.toml Outdated
Comment on lines 71 to 73
pet = [
"torch >= 2.9.1",
]
@frostedoyster (Collaborator) Dec 13, 2025

This might be too aggressive at the moment. For example, the torch version is fixed on many HPC clusters and users can't change it unless they're ready to perform a custom installation. The correct pattern would be to raise an error in case the user chooses Muon and their torch is too old for it

@abmazitov (Contributor, author)

I agree, but the main issue is that Muon is only available in torch 2.9.1 and higher. I'm also sceptical about raising the requirement. Maybe we can add a check for the Muon optimizer selection and suggest updating torch manually if an old version is detected and one really wants to use Muon.

Member

Yes, I think this is a good idea: remove the constraint and raise an error if Muon is requested.
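
A minimal sketch of the kind of check discussed here, assuming it lives wherever the optimizer is instantiated (the function name, constant, and error message are hypothetical, not the PR's actual implementation):

from packaging.version import Version

import torch

MIN_TORCH_FOR_MUON = "2.9.1"

def check_muon_requirements() -> None:
    # Instead of pinning torch >= 2.9.1 for every PET user, only raise
    # when Muon is actually selected and the installed torch is too old.
    installed = Version(torch.__version__.split("+")[0])
    if installed < Version(MIN_TORCH_FOR_MUON):
        raise RuntimeError(
            f"The Muon optimizer requires torch >= {MIN_TORCH_FOR_MUON}, "
            f"but torch {torch.__version__} is installed. "
            "Please upgrade torch manually to use Muon."
        )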

@abmazitov (Contributor, author)

Are we sure the minimum learning rate is worth a new hyperparameter? New hyperparameters make the code harder to maintain and are confusing for users when they see a very large default hyperparameter file. I would rather 1) keep the code as is without a minimum learning rate, at the cost of a tiny inefficiency or 2) choose a minimum learning rate ourselves (e.g., 1/1000 of the initial learning rate) without it being configurable

I tend to agree with that. Maybe we can even just hardcode it to be 1e-7

@jwa7 (Member) commented Dec 13, 2025

It's difficult to know whether it should be hardcoded as a ratio relative to the base/maximum learning rate, or as an absolute value. In different training runs I've seen benefit in being able to change it, though I agree that having an extra parameter might be over-engineering

        muon_params.append(p)
    else:
        adam_params.append(p)
adam_group = dict(params=adam_params, use_muon=False)
Member

I think we want two separate learning rates for the Adam and Muon parameter groups.

If you look at the example from the README of https://github.com/KellerJordan/Muon:

from muon import MuonWithAuxAdam
hidden_weights = [p for p in model.body.parameters() if p.ndim >= 2]
hidden_gains_biases = [p for p in model.body.parameters() if p.ndim < 2]
nonhidden_params = [*model.head.parameters(), *model.embed.parameters()]
param_groups = [
    dict(params=hidden_weights, use_muon=True,
         lr=0.02, weight_decay=0.01),
    dict(params=hidden_gains_biases+nonhidden_params, use_muon=False,
         lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01),
]
optimizer = MuonWithAuxAdam(param_groups)

the Adam LR is more what we'd normally expect but the Muon one can be pushed much higher.

@abmazitov (Contributor, author) Dec 15, 2025

I'm a bit sceptical about setting the LR values like this. I mean, they would be highly architecture-dependent, right? At the same time, @sirmarcel has tested Muon for PET and noticed that it works nicely even with a common LR of ~1e-3 for both Adam and Muon parameters

Member

Hmm, ok. I noticed in my tests that I could push the Muon LR even to 1e-1 and it was still stable, but as soon as the Adam LR went above 1e-3, training diverged. But again, an extra hyperparameter is more complexity, so let's keep it simple and use a single LR, as you say, for now
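
For reference, a minimal sketch of the single-shared-LR variant agreed on here, following the MuonWithAuxAdam pattern from the README quoted above (the toy model and the concrete LR value are illustrative, not the PR's actual code):

import torch
from muon import MuonWithAuxAdam

model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.GELU(), torch.nn.Linear(16, 1)
)

# Same split as in the PR snippet above: matrices (ndim >= 2) go to Muon,
# gains/biases and other 1-D tensors go to the auxiliary Adam.
muon_params = [p for p in model.parameters() if p.ndim >= 2]
adam_params = [p for p in model.parameters() if p.ndim < 2]

lr = 1e-3  # single learning rate shared by both groups
param_groups = [
    dict(params=muon_params, use_muon=True, lr=lr, weight_decay=0.01),
    dict(params=adam_params, use_muon=False, lr=lr,
         betas=(0.9, 0.95), weight_decay=0.01),
]
optimizer = MuonWithAuxAdam(param_groups)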

@abmazitov (Contributor, author) commented Dec 15, 2025

It's difficult to know whether it should be hardcoded as a ratio relative to the base/maximum learning rate, or as an absolute value. In different training runs I've seen benefit in being able to change it, though I agree that having an extra parameter might be over-engineering

Optionally, we can add a hard-coded minimum LR which is always equal to the initial LR * 1e-3. So for an initial LR of 1e-4, the min_lr will be 1e-7, and so on.

@jwa7 jwa7 marked this pull request as ready for review December 16, 2025 09:53
Contributor

Can this be regenerated with different hyperparameters?

Member

Sure - just the PET one or the others too?

Contributor

It would be great if you could do it for all the newly generated checkpoints!

@jwa7 (Member) commented Dec 16, 2025

@abmazitov @frostedoyster I've hardcoded the minimum LR ratio to 1e-4. If we use high LRs with Muon such as 1e-3, training will finish with LR 1e-7. I think this is a reasonable balance of being low but not too low. You both ok with this?

@abmazitov (Contributor, author)

@abmazitov @frostedoyster I've hardcoded the minimum LR ratio to 1e-4. If we use high LRs with Muon such as 1e-3, training will finish with LR 1e-7. I think this is a reasonable balance of being low but not too low. You both ok with this?

I think it should be good
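
A minimal sketch of what the hardcoded minimum-LR ratio amounts to, assuming a cosine-annealing schedule purely for illustration (the PR's actual scheduler, optimizer, and variable names may differ):

import torch

MIN_LR_RATIO = 1e-4  # hardcoded ratio agreed on above

initial_lr = 1e-3  # a high, Muon-friendly base LR
# Toy optimizer just to attach the scheduler to.
params = [torch.nn.Parameter(torch.zeros(4, 4))]
optimizer = torch.optim.AdamW(params, lr=initial_lr)

# Decays from initial_lr down to initial_lr * MIN_LR_RATIO,
# i.e. 1e-3 -> 1e-7 in this example.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=initial_lr * MIN_LR_RATIO
)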

@jwa7 (Member) commented Dec 16, 2025

cscs-ci run

@jwa7 jwa7 requested a review from frostedoyster December 16, 2025 11:40