Implement ATT-INPUT method for computing alignments from cross-attention #261

Description

@ddaspit

The HuggingFaceNmtEngine class currently implements the ATT-OUTPUT approach from this paper. The ATT-INPUT method would produce higher-quality alignments. To implement ATT-INPUT, the class would need to shift the attentions one step to the left, which can be done by not prepending a zero matrix to the attentions. The attentions would also need to be retrieved from a different layer (the bottom layers rather than the top). With ATT-INPUT, the last token may not get aligned if the translation hits the maximum generation length; this edge case should be handled properly.
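A minimal sketch of the difference between the two strategies, assuming `step_attentions[i]` holds the cross-attention weights over source tokens recorded at decoding step `i` (from a bottom decoder layer, averaged over heads). This is hypothetical illustration code, not the actual `HuggingFaceNmtEngine` implementation, and the max-length fallback shown is just one possible way to handle the edge case:

```python
def _argmax(row):
    return max(range(len(row)), key=row.__getitem__)

def align_att_output(step_attentions):
    # ATT-OUTPUT: target token i is aligned using the attention from the
    # step that *produced* it (step i).
    return [_argmax(a) for a in step_attentions]

def align_att_input(step_attentions):
    # ATT-INPUT: target token i is aligned using the attention from step
    # i + 1, when token i is fed back as *input*. This is the left shift
    # by one step: no zero matrix is prepended to the attentions.
    alignments = []
    for i in range(len(step_attentions)):
        if i + 1 < len(step_attentions):
            alignments.append(_argmax(step_attentions[i + 1]))
        else:
            # Edge case: generation hit the max length, so the last token
            # was never fed back in and has no next-step attention.
            # Hypothetical fallback: reuse its own step's attention.
            alignments.append(_argmax(step_attentions[i]))
    return alignments

attn = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(align_att_output(attn))  # [0, 1, 0]
print(align_att_input(attn))   # [1, 0, 0]
```

The left shift means every target token except the last gets its alignment from one step later than ATT-OUTPUT would use, which is why the final token needs special handling when no further step exists.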

Metadata
Assignees

No one assigned

    Labels

    No labels

    Projects

    Status

    🆕 New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests