Description
I observed a throughput degradation, rather than a speedup, when using EAGLE3 to accelerate Qwen3-30B-A3B on 2x H100.
I suspect the speculative-decoding overhead may be outweighing the gains. It would be great to have some profiling analysis that pinpoints exactly where the cost comes from.
Also, tuning parameters for MoE models feels much harder than for dense models. Would it be possible to provide some guidance or a micro-benchmarking script? That would help users quickly find the optimal parameters for their specific hardware.
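To frame the question, here is a rough back-of-envelope model I have been using. It is only a sketch under stated assumptions: it uses the standard expected-accepted-tokens formula from Leviathan et al. (2023), which assumes an i.i.d. per-token acceptance rate `alpha`, and it models one sequence at small batch size. The inputs `draft_cost` and `verify_cost`, and the linear MoE verify-cost curve, are my own hypothetical assumptions, not measurements and not any framework's API. The idea is that for an MoE target, verifying more draft tokens may activate more distinct experts, so verification is not "free" the way it roughly is for a dense model at small batch.

```python
# Rough model of speculative-decoding speedup for a single sequence (small batch).
# Assumes an i.i.d. per-token acceptance rate `alpha` (Leviathan et al., 2023).
# `draft_cost`, `verify_cost`, and the MoE verify-cost curve are hypothetical inputs.
from typing import Callable, Dict, Tuple


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per verify step when `gamma` tokens are drafted."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float, verify_cost: float) -> float:
    """Speedup over plain decoding; costs are relative to one target decode step.

    draft_cost:  time of one draft forward pass (run `gamma` times per step).
    verify_cost: time of one target forward pass over gamma + 1 tokens.
    """
    step_time = gamma * draft_cost + verify_cost
    return expected_tokens_per_step(alpha, gamma) / step_time


def best_gamma(
    alpha: float,
    draft_cost: float,
    verify_cost_fn: Callable[[int], float],
    max_gamma: int = 8,
) -> Tuple[int, Dict[int, float]]:
    """Sweep gamma = 1..max_gamma and return the best gamma plus all estimates."""
    results = {
        g: estimated_speedup(alpha, g, draft_cost, verify_cost_fn(g))
        for g in range(1, max_gamma + 1)
    }
    return max(results, key=results.get), results


if __name__ == "__main__":
    # Hypothetical numbers, not measurements.
    scenarios = [
        ("dense", lambda g: 1.0),            # verify cost roughly flat in gamma
        ("moe", lambda g: 1.0 + 0.15 * g),   # assumed: more tokens touch more experts
    ]
    for name, fn in scenarios:
        g, res = best_gamma(alpha=0.7, draft_cost=0.05, verify_cost_fn=fn)
        print(f"{name}: best gamma={g}, est. speedup={res[g]:.2f}x")
```

With these made-up numbers, a verify cost that grows with `gamma` both lowers the achievable speedup and pushes the best `gamma` down, and with a low enough acceptance rate the estimate falls below 1x. The model ignores batching, so it says little about throughput at high concurrency, which is why real profiling data would help.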
(For reference, the related issue is this.)
Two quick questions:
Why does EAGLE3 seem less effective on Qwen3 than on other models?
Are there any specific tricks for training a high-quality EAGLE3 draft model for this (MoE) architecture?
Thanks! 🥹🥹