
Conversation


@9LLPPLL6 commented May 8, 2025

Fixed some issues when running GPT and Llama pre-training.

```python
num_experts = num_experts * (num_layers // expert_interval)
experts_per_layer = []
for i in range(num_layers):
    layer_num = i + 1 + offset
```


Hi, why do we need to delete this offset?

Author


Keeping the offset would cause the index into num_experts to go out of bounds.
See this issue for details: issue
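
For context, here is a minimal sketch of why a non-zero offset can push the expert-layer index past the end of num_experts. The concrete values, the modulo check, and the index formula are assumptions chosen only to illustrate the failure mode; they are not the actual pre-training code.

```python
# Minimal sketch (assumed values and indexing, not the real code):
# with a non-zero offset, layer_num can exceed num_layers, so the
# derived index into num_experts can reach len(num_experts).
num_layers = 4
expert_interval = 2
offset = 2  # e.g. a later pipeline stage starting past layer 2

# One entry per expert layer, as in the quoted snippet.
num_experts = [8] * (num_layers // expert_interval)  # len(num_experts) == 2

for i in range(num_layers):
    layer_num = i + 1 + offset                  # 3, 4, 5, 6 instead of 1..4
    if layer_num % expert_interval == 0:        # hypothetical expert-layer check
        idx = layer_num // expert_interval - 1  # hypothetical index into num_experts
        in_bounds = idx < len(num_experts)
        print(f"layer_num={layer_num} idx={idx} in_bounds={in_bounds}")
        # layer_num=6 gives idx=2, which would raise IndexError on num_experts[idx]
```

With the offset removed, layer_num stays within 1..num_layers and the derived index stays in range, which appears to be the point of the fix.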

