Hello lim, thanks for sharing the source code of your impressive work! However, I found several places where the code does not match the description in your paper.
In the paper, to obtain a low-rank approximation of the attention matrix, a weight matrix Wdown is left-multiplied with the K matrix, and the resulting Adown matrix is then up-projected. But in Sparsifiner.py, both Q and K are down-projected by right-multiplying them with their weight matrices to form Adown, which is then up-projected. Why does this mismatch happen?
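To make the mismatch concrete, here is a rough shape sketch of the two formulations as I understand them. All names (W_down, Wq, Wk) and sizes are hypothetical, chosen only to illustrate the two projection directions; they are not the actual values from the paper or the repo:

```python
import numpy as np

N, d, r = 16, 64, 8        # tokens, embedding dim, low rank (illustrative sizes)
Q = np.random.randn(N, d)
K = np.random.randn(N, d)

# Paper's description as I read it: a learned W_down is LEFT-multiplied
# onto K, shrinking the token dimension from N down to r.
W_down = np.random.randn(r, N)                    # hypothetical shape
A_down_paper = Q @ (W_down @ K).T / np.sqrt(d)    # (N, r) low-rank attention

# Sparsifiner.py's behavior as I read it: Q and K are each RIGHT-multiplied
# by a weight matrix, shrinking the feature dimension d instead.
Wq = np.random.randn(d, r)                        # hypothetical shapes
Wk = np.random.randn(d, r)
A_down_code = (Q @ Wq) @ (K @ Wk).T / np.sqrt(r)  # (N, N), rank <= r

print(A_down_paper.shape, A_down_code.shape)      # (16, 8) (16, 16)
```

The two versions produce Adown matrices of different shapes, so the subsequent up-projection cannot mean the same thing in both, which is why I am asking which formulation is intended.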
Besides, there is a bug at line 633 of main.py. At line 92 of default_config.py, each item in C.SPAR.TRAIN_STRATEGY.INDICATOR has only 3 elements, but line 633 of main.py unpacks each item of cfg.SPAR.TRAIN_STRATEGY.INDICATOR into 4 variables, so an error will occur.
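A minimal reproduction of the failure, with placeholder values standing in for the real config entries (the element names and contents are hypothetical; only the 3-vs-4 length mismatch matters):

```python
# Stand-in for C.SPAR.TRAIN_STRATEGY.INDICATOR: each item has 3 elements,
# as in default_config.py line 92 (the values here are made up).
INDICATOR = [("indicator", 0.5, 2)]

try:
    # Stand-in for main.py line 633, which unpacks each item into 4 variables.
    for name, ratio, rank, extra in INDICATOR:
        pass
except ValueError as e:
    print(e)  # not enough values to unpack (expected 4, got 3)
```

Either the config items need a fourth element or the unpacking at line 633 should take only three variables.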