The HuggingFaceNmtEngine class currently implements the ATT-OUTPUT approach from this paper; the ATT-INPUT method would produce better-quality alignments. Implementing ATT-INPUT would require two changes: first, shift the attentions one step to the left, which can be done by not prepending a zero matrix to the attention sequence; second, retrieve the attentions from a different layer (the bottom layers rather than the current one). Note that with ATT-INPUT the last token can be left unaligned when the translation hits the maximum generation length, because there is no following decoding step to supply its attention. This edge case should be handled properly.
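The left shift described above can be sketched as follows. This is a minimal illustration, not the actual HuggingFaceNmtEngine code: the helper name, the list-of-rows representation (one cross-attention row per decoding step), and the abstraction away of the prepended zero matrix are all assumptions made for clarity.

```python
import numpy as np

def arrange_alignment_rows(step_attentions, att_input=False):
    """Arrange per-step cross-attention rows into a target-by-source matrix.

    step_attentions: list of 1-D arrays, one per decoding step, each of
    length src_len (illustrative shapes, not the real engine's API).
    """
    rows = [np.asarray(a, dtype=float) for a in step_attentions]
    if att_input:
        # ATT-INPUT: shift one step to the left, so target token t is
        # aligned using the attention computed when it is *fed back in*
        # at step t + 1. The final token gets a zero row: if generation
        # stopped at the max length, no later step exists to align it --
        # this is the edge case noted above.
        rows = rows[1:] + [np.zeros_like(rows[0])]
    # ATT-OUTPUT: use each step's attention for the token produced at
    # that step (no shift in this simplified sketch).
    return np.stack(rows)
```

A caller could detect the unaligned final token by checking whether the last row is all zeros and either dropping that token from the alignment or falling back to the ATT-OUTPUT row for it.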