
Conversation


@9LLPPLL6 commented May 8, 2025

Fixed some issues when running GPT and Llama pre-training.

```python
num_experts = num_experts * (num_layers // expert_interval)
experts_per_layer = []
for i in range(num_layers):
    layer_num = i + 1 + offset
```


Hi, why do we need to delete this offset?

Author


Keeping the offset would cause the index into num_experts to go out of bounds.
See this issue for details: issue
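
For context, here is a minimal sketch of why a non-zero offset can push the expert-layer index past the end of num_experts. The concrete values, the modulo check, and the index formula are assumptions chosen only to illustrate the failure mode; they are not the actual pre-training code.

```python
# Minimal sketch (assumed values and indexing, not the real code):
# with a non-zero offset, layer_num can exceed num_layers, so the
# derived index into num_experts can reach len(num_experts).
num_layers = 4
expert_interval = 2
offset = 2  # e.g. a later pipeline stage starting past layer 2

# One entry per expert layer, as in the quoted snippet.
num_experts = [8] * (num_layers // expert_interval)  # len(num_experts) == 2

for i in range(num_layers):
    layer_num = i + 1 + offset                  # 3, 4, 5, 6 instead of 1..4
    if layer_num % expert_interval == 0:        # hypothetical expert-layer check
        idx = layer_num // expert_interval - 1  # hypothetical index into num_experts
        in_bounds = idx < len(num_experts)
        print(f"layer_num={layer_num} idx={idx} in_bounds={in_bounds}")
        # layer_num=6 gives idx=2, which would raise IndexError on num_experts[idx]
```

With the offset removed, layer_num stays within 1..num_layers and the derived index stays in range, which appears to be the point of the fix.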

