Question about training paradigm

Hi, this is very interesting work! One thing I don't understand: does self-distillation use Llama2-chat to rewrite the responses and then further fine-tune Llama2-chat as well, or does it just fine-tune Llama2?

Issue Replicating Paper Results with scripts/gsm8k/


I've been trying to replicate the results presented in your paper using the provided script located at scripts/gsm8k/. Despite following all the instructions and making sure that my setup matches the recommended configuration, I'm unable to reproduce the results reported in the paper.

Fine-tuning using sdft

Evaluation on gsm8k:
Accuracy for math: 387 / 1319 = 29.34%

Evaluation on multiarith:
Accuracy for math: 146 / 180 = 81.11%

Evaluation on OpenFunctions:
Accuracy for openfunction: 25 / 112 = 22.32%
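
To rule out a reporting mistake, the percentages above can be recomputed from the correct/total counts. This is only a minimal sketch: the `accuracy` helper and the `results` dictionary are hypothetical names used for illustration and are not part of the repository's evaluation scripts.

```python
def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


# Counts copied from the evaluation logs above (hypothetical container).
results = {
    "gsm8k": (387, 1319),
    "multiarith": (146, 180),
    "openfunction": (25, 112),
}

for name, (correct, total) in results.items():
    print(f"{name}: {correct} / {total} = {accuracy(correct, total):.2f}%")
```

The recomputed values agree with the logged ones, so the gap to the paper is not an arithmetic issue in the reporting.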

Could you please provide any insights or suggestions that might help in correctly replicating the results? Am I missing an update or a crucial step in the process?

Thank you for your assistance.

Best regards





Evaluation on gsm8k:
Accuracy for math: 26 / 1319 = 1.97%

Evaluation on multiarith:
Accuracy for math: 4 / 180 = 2.22%


Evaluation on gsm8k:
Accuracy for math: 18 / 1319 = 1.36%

Evaluation on multiarith:
Accuracy for math: 3 / 180 = 1.67%

Evaluation on OpenFunctions:
Accuracy for openfunction: 8 / 112 = 7.14%


Hello, I followed the instructions to download LLaMA-Factory and bigcode into a directory and then ran the file under the alpha directory, but an error occurred. Could you please tell me what the problem is and how to solve it?
Screenshot 2024-03-19 090858

