Throughput degradation on Qwen3-30B-A3B with EAGLE3

I observed a throughput degradation when using EAGLE3 speculative decoding to speed up Qwen3-30B-A3B (on 2x H100).

I suspect the speculative decoding overhead is outweighing the gains. A profiling analysis would help pinpoint exactly where the cost is coming from.
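
To make that suspicion concrete, here is a rough back-of-envelope model, not a measurement. It assumes each draft token is accepted independently with probability `alpha`, and it expresses the draft and verification costs relative to one normal decode step of the target model. The names `alpha`, `k`, `draft_cost`, and `verify_cost` are my own illustrative parameters, not anything taken from the serving stack.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step.

    Assumes k drafted tokens, each accepted independently with
    probability alpha, plus one token from the target model itself.
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float, verify_cost: float) -> float:
    """Estimated speedup over plain decoding (values below 1.0 mean a slowdown).

    draft_cost:  cost of one draft step, relative to one target decode step.
    verify_cost: cost of verifying k + 1 tokens at once, relative to one
                 target decode step (1.0 would mean verification is free
                 compared to decoding a single token).
    """
    step_cost = k * draft_cost + verify_cost
    return expected_tokens_per_step(alpha, k) / step_cost


# Illustrative numbers only: a modest acceptance rate combined with a
# verification pass that costs 1.5x a normal decode step.
print(estimated_speedup(alpha=0.5, k=3, draft_cost=0.2, verify_cost=1.5))
```

Under these assumptions the speedup falls below 1.0 whenever `k * draft_cost + verify_cost` exceeds the expected number of accepted tokens. One possible reason this bites harder on MoE models is that verifying several tokens at once may activate more experts than a single-token decode step, which would raise `verify_cost`; profiling would be needed to confirm that.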

Also, tuning parameters for MoE models feels much more difficult than for dense models. Would it be possible to provide some guidance or a micro-benchmarking script? This would help users quickly identify the optimal parameters for their specific hardware.
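
As a sketch of what I have in mind for such a script: a small grid sweep that ranks configurations by measured throughput. Everything here is hypothetical. The knob names (`num_steps`, `topk`) are placeholders for whatever speculative decoding parameters the server exposes, and `measure` stands in for a function that launches the server with a given config and returns tokens per second.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def sweep(
    grid: Dict[str, Sequence[int]],
    measure: Callable[[Config], float],
) -> List[Tuple[Config, float]]:
    """Measure every combination in the grid and sort by throughput, best first."""
    keys = list(grid)
    results: List[Tuple[Config, float]] = []
    for values in itertools.product(*(grid[key] for key in keys)):
        config = dict(zip(keys, values))
        results.append((config, measure(config)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def toy_measure(config: Config) -> float:
    """Stand-in for a real benchmark run; returns made-up numbers."""
    return 100.0 - 10.0 * abs(config["num_steps"] - 3) - config["topk"]


# Placeholder grid; a real run would replace toy_measure with a function
# that benchmarks the server under each config.
ranked = sweep({"num_steps": [1, 3, 5], "topk": [1, 4]}, toy_measure)
best_config, best_throughput = ranked[0]
print(best_config, best_throughput)
```

A real version would also need a baseline run without speculative decoding, so that each config can be reported as a speedup or slowdown rather than as a raw number.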

(For reference, the related issue is [sgl-project/SpecForge#339](https://github.com/sgl-project/SpecForge/issues/339).)

Two quick questions:

1. Why does EAGLE3 seem less effective on Qwen3 compared to other models?
2. Are there any specific tricks for training a high-quality EAGLE3 draft model for this architecture?

Thanks! 🥹🥹


Throughput degradation on Qwen3-30B-A3B with EAGLE3 #14824
