aahouzi / llama2-chatbot-cpu
A LLaMA2-7b chatbot with memory running on CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
License: MIT License