Trying to understand LLMs. This is my journey so far:
- A failed experiment with LISA: "Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning", code, paper
- 🛠️ Memory-efficient LLM training with GaLore, a gradient low-rank projection approach, code
- ⚖️ Evaluating LLMs with Semantic Similarity, code
- 🛠️ Finetune TinyLlama and StableLM 2, code
- 🛠️ Finetune Microsoft's Phi-2, code
- 🛠️ Finetune Mamba, code
- 🛠️ Finetune Llama 2 and Mistral using QLoRA, code
- ⚖️ Evaluate LLM language capabilities with Meta's Belebele benchmark, code
- ⚖️ Evaluate LLM language capabilities with BLEU, code
- ⚖️ Llama-2-70B as a judge of LLMs performs almost as well as GPT-4, code
- ⚖️ Validation loss is not a good metric for chatbot quality
- ⚖️ Use GPT-3.5 as a judge of open-source LLMs, code
- 🛠️ Finetune Llama on podcast transcripts with QLoRA, code
- Use Stable Diffusion for sketch-guided image generation, code
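To give a flavor of the semantic-similarity evaluation idea from the list above: an answer is scored against a reference by the cosine similarity of their embeddings. This is a minimal sketch, not the linked project's code; the `embed` function here is a hypothetical stand-in (a bag-of-words count vector), whereas a real setup would use a sentence-embedding model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-embedding model:
    # a simple bag-of-words count vector over lowercased tokens.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

reference = "Paris is the capital of France"
answer = "The capital of France is Paris"
score = cosine_similarity(embed(reference), embed(answer))
```

With a real embedding model the score captures meaning rather than word overlap, so a paraphrase with no shared words can still score highly; the bag-of-words stand-in above only captures overlap.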