Throughput degradation on Qwen3-30B-A3B with EAGLE3 #14824

@Zzsf11

I observed a throughput degradation when trying to use EAGLE3 to speed up Qwen3-30B-A3B (on 2x H100).

I suspect the overhead might be overshadowing the gains. It would be great if we could have some profiling analysis to pinpoint exactly where the cost is coming from.
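To make that suspicion concrete, here is a back-of-the-envelope cost model, not a measurement. It assumes each speculative round costs a few draft passes plus one verify pass, all expressed relative to one ordinary decode step of the target model. The function name and all four parameters are hypothetical and do not correspond to any framework's options. One plausible (unverified) reason a MoE target could suffer is that verifying several tokens at once may activate more experts than a single-token decode, which would push `verify_cost_ratio` above 1.

```python
def estimated_speedup(mean_accept_len, draft_steps, draft_cost_ratio,
                      verify_cost_ratio=1.0):
    """Rough speculative-decoding speedup relative to plain decoding.

    All arguments are illustrative assumptions, not measured values:
      mean_accept_len   -- average tokens emitted per round (>= 1)
      draft_steps       -- draft-model forward passes per round
      draft_cost_ratio  -- cost of one draft pass / one baseline decode step
      verify_cost_ratio -- cost of one verify pass / one baseline decode step
    """
    # Cost of one round, in units of baseline decode steps.
    cost_per_round = draft_steps * draft_cost_ratio + verify_cost_ratio
    # Baseline emits 1 token per unit cost, so this ratio is the speedup.
    return mean_accept_len / cost_per_round


# Illustrative inputs only: a cheap verify pass versus an expensive one.
cheap_verify = estimated_speedup(3.0, draft_steps=3, draft_cost_ratio=0.1)
costly_verify = estimated_speedup(2.0, draft_steps=3, draft_cost_ratio=0.1,
                                  verify_cost_ratio=1.8)
```

Under this model, a value below 1 means speculation is a net loss. Profiling would need to supply real numbers for the acceptance length and the two cost ratios.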

Also, tuning parameters for MoE models feels much harder than for dense models. Would it be possible to provide some guidance or a micro-benchmarking script? This would help users quickly identify the optimal parameters for their specific hardware.
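As a rough idea of what such a script could look like, here is a minimal grid-sweep sketch. It assumes the user supplies a `measure` callable that launches a run with the given parameters and returns throughput in tokens per second. `sweep`, `measure`, and the parameter names in the usage note are hypothetical placeholders, not an existing API.

```python
import itertools
import statistics


def sweep(measure, grid, repeats=3):
    """Evaluate measure(**params) over every combination in grid.

    measure -- user-supplied callable returning throughput (tokens/s);
               how it launches and times the server is up to the user.
    grid    -- dict mapping parameter name -> list of candidate values.
    Returns a list of (median_throughput, params), best first.
    """
    keys = list(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # Repeat each configuration and take the median to damp noise.
        samples = [measure(**params) for _ in range(repeats)]
        results.append((statistics.median(samples), params))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

A call might look like `sweep(my_measure, {"num_steps": [1, 2, 3], "topk": [1, 4]})`, where `my_measure` and both parameter names are stand-ins for whatever the serving framework actually exposes.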

(For reference, the related issue is this.)

Two quick questions:

Why does EAGLE3 seem less effective on Qwen3 than on other models?

Are there any specific tricks for training a high-quality EAGLE3 draft model for this architecture?

Thanks! 🥹🥹
