PyTorch implementation of the ICLR 2021 paper "Is Attention Better Than Matrix Decomposition?". I have tested this implementation on IWSLT for correctness and efficacy. Note that the existing hamburger-pytorch implementation is not correct.
```python
import torch
from krabbypatty_pytorch import KrabbyPatty

x = torch.randn(42, 64, 512)  # [sequence length, batch size, hidden dimension]
krabbypatty = KrabbyPatty(input_dim=512)
output = krabbypatty(x) + x   # residual connection around the module
```
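For intuition, the core of the paper's Hamburger module is a matrix-decomposition "ham": the input features are modeled as an (approximately) low-rank matrix, and the global context is recovered by solving a decomposition such as NMF with multiplicative updates, whose iterations are unrolled inside the forward pass. Below is a minimal sketch of that idea only; the function name, rank, and iteration count are my own assumptions, not this repo's API:

```python
import torch

def nmf_ham(x, rank=8, steps=6, eps=1e-6):
    """Sketch of an NMF-based 'ham': factor x ~= d @ c with non-negative
    dictionary d [batch, channels, rank] and codes c [batch, rank, n],
    using standard multiplicative updates, then return the low-rank
    reconstruction as the global context."""
    b, ch, n = x.shape
    d = torch.rand(b, ch, rank)   # random non-negative initialization
    c = torch.rand(b, rank, n)
    for _ in range(steps):
        # Multiplicative update for the codes: c <- c * (d^T x) / (d^T d c)
        c = c * (d.transpose(1, 2) @ x) / (d.transpose(1, 2) @ d @ c + eps)
        # Multiplicative update for the dictionary: d <- d * (x c^T) / (d c c^T)
        d = d * (x @ c.transpose(1, 2)) / (d @ (c @ c.transpose(1, 2)) + eps)
    return d @ c  # low-rank reconstruction used as global context

x = torch.rand(2, 64, 128)  # non-negative features (e.g. after a ReLU)
ctx = nmf_ham(x)
print(ctx.shape)            # same shape as the input
```

In the paper this decomposition is wrapped between two linear "bread" layers, and a residual connection is added around the whole module, as in the usage example above.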
```bibtex
@inproceedings{geng2021attention,
    title={Is Attention Better Than Matrix Decomposition?},
    author={Geng, Zhengyang and Guo, Meng-Hao and Chen, Hongxu and Li, Xia and Wei, Ke and Lin, Zhouchen},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=1FvkSpWosOl},
    note={ICLR 2021}
}
```