Is your feature request related to a problem? Please describe.
While Megatron Core supports Multi-head Latent Attention (MLA) for KV cache compression (as used in DeepSeek-V2/V3 and Kimi-K2), it lacks support for Kimi Delta Attention (KDA), the linear attention mechanism introduced in Kimi Linear, which replaces softmax attention with a fine-grained, channel-wise gated delta-rule recurrence and therefore needs no KV cache in the layers that use it.
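For orientation, here is a minimal single-token sketch of the gated delta-rule state update that KDA-style linear attention builds on. This is an assumption-laden illustration distilled from the Kimi Linear paper, not Megatron Core code; the function name and the exact placement of the gate are mine, and production kernels compute this as a chunked parallel scan rather than a per-token loop.

```python
import torch

def delta_rule_step(S, q, k, v, beta, g):
    """One recurrent step of a gated delta rule (simplified sketch).

    S    : (d_k, d_v) fast-weight state carried across the sequence
    q, k : (d_k,) query/key for the current token
    v    : (d_v,) value for the current token
    beta : scalar write strength in (0, 1)
    g    : (d_k,) per-channel decay gate in (0, 1) -- hypothetical shape
    """
    S = g.unsqueeze(-1) * S                    # channel-wise gated forgetting
    v_pred = k @ S                             # value the state currently stores for k
    S = S + beta * torch.outer(k, v - v_pred)  # delta-rule write: move k's value toward v
    o = q @ S                                  # read-out for the current query
    return S, o
```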
Describe the solution you'd like
Add support for Kimi Delta Attention including the following components:
- New configuration options in `TransformerConfig`
- New attention module: a `DeltaAttention` or `KDASelfAttention` class extending the existing `Attention` base class
- Layer spec support: add KDA layer specs similar to how MLA is handled
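To make the three pieces concrete, here is a rough sketch of how they could fit together. Everything KDA-specific below (the config fields, the `KDASelfAttention` class, the `get_kda_module_spec` helper) is hypothetical; only `TransformerConfig`, `Attention`, and `ModuleSpec` are existing Megatron Core names.

```python
from dataclasses import dataclass

from megatron.core.transformer.attention import Attention
from megatron.core.transformer.spec_utils import ModuleSpec
from megatron.core.transformer.transformer_config import TransformerConfig


@dataclass
class KDATransformerConfig(TransformerConfig):
    # Hypothetical KDA knobs, analogous to the existing MLA config fields.
    use_kda: bool = False
    kda_head_dim: int = 128
    kda_conv_kernel_size: int = 4  # short convs are often paired with linear attention


class KDASelfAttention(Attention):
    """Illustrative skeleton of a KDA module on the existing Attention base class."""

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        # 1. project hidden_states to q/k/v plus per-channel gates and beta
        # 2. run a chunked gated delta-rule scan instead of softmax attention
        # 3. project the read-out back to the hidden size
        raise NotImplementedError("sketch only; not a working implementation")


def get_kda_module_spec() -> ModuleSpec:
    # Mirrors how MLA is wired in: the layer spec swaps in the attention module.
    return ModuleSpec(module=KDASelfAttention)
```

The spec-based swap is the natural integration point, since that is how MLA is already selected in the GPT layer specs; a KDA spec would slot in the same way without touching the core transformer block.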
Describe alternatives you've considered
Additional context
Kimi Delta Attention arXiv paper