talk-llama-fast

Early pre-beta!

Based on talk-llama from whisper.cpp: https://github.com/ggerganov/whisper.cpp

I added:

  • xTTSv2 support
  • UTF-8 and Russian language support
  • Speed-ups: streaming for generation, streaming for xtts, aggressive VAD
  • Voice commands: Google, stop, regenerate, delete, reset
  • Generation/TTS interruption when the user is speaking

I used:

  • whisper.cpp ggml-medium-q5_0.bin
  • mistral-7b-instruct-v0.2.Q6_K.gguf
  • xTTSv2 server in streaming-mode
  • langchain google-serper

News

  • 2024.02.25 - Added the --vad-start-thold param for tuning stop-on-speech detection (default: 0.000270, 0 turns it off). VAD checks the current noise level; if it is loud, xtts and llama stop. Turn it up if you are in a noisy room, and also check --print-energy.
  • 2024.02.22 - initial public release
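
The stop-on-speech check behind --vad-start-thold can be pictured as a threshold on the energy of recent microphone samples. The sketch below is only an illustration, not the code in talk-llama.cpp: the function names and the mean-absolute-amplitude energy measure are assumptions. Only the default value 0.000270 and the "0 turns it off" rule come from the note above.

```python
from typing import Sequence

DEFAULT_VAD_START_THOLD = 0.000270  # default quoted in the news entry above


def average_energy(samples: Sequence[float]) -> float:
    """Mean absolute amplitude of samples in [-1.0, 1.0] (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def should_interrupt(samples: Sequence[float],
                     thold: float = DEFAULT_VAD_START_THOLD) -> bool:
    """Return True when TTS and generation should stop because the input is loud."""
    if thold <= 0:  # 0 disables the check
        return False
    return average_energy(samples) > thold


# Hypothetical sample windows: near-silence and speech-level input
quiet = [0.0001, -0.0001, 0.0002, -0.0002]
loud = [0.05, -0.04, 0.06, -0.05]
print(should_interrupt(quiet), should_interrupt(loud))
```

In a noisy room the background level alone can exceed a low threshold, which is why raising --vad-start-thold helps there.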

Requirements

  • Windows 10/11 x64
  • Python, CUDA
  • Nvidia GPU with 12 GB VRAM, but I guess you can try with 8 GB. You can also try the CPU instead of the GPU, but it will be slow (you need to build the CPU version yourself).
  • For AMD, macOS, Linux, Android: you first need to compile everything yourself. I don't know if it works.
  • Android version is TODO.

Installation

For Windows 10/11 x64 with CUDA

Optional

Stop xtts when the user is speaking

  • To stop XTTS playback: in talk-llama.bat, change the --xtts-control-path param to the full path of your xtts_play_allowed.txt
  • Then modify c:\Users\[USERNAME]\miniconda3\Lib\site-packages\xtts_api_server\RealtimeTTS\text_to_stream.py
  • Download /xtts/text_to_stream.py from my repo, compare its contents with the original file (e.g. using the Notepad++ Compare plugin), and apply the changes. I will make an automatic patcher later.
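
The idea of a control file can be sketched as follows: one process writes a flag into xtts_play_allowed.txt, and the player checks it before each audio chunk. This is a hedged illustration only. The "1"/"0" file format and all function names are assumptions, not taken from the patched text_to_stream.py.

```python
import os
import tempfile


def set_play_allowed(path: str, allowed: bool) -> None:
    """Write the flag the player polls ('1' or '0' is an assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("1" if allowed else "0")


def is_play_allowed(path: str) -> bool:
    """Return False only when the file holds '0'; a missing file allows playback."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read().strip() != "0"
    except FileNotFoundError:
        return True


def play_chunks(chunks, path):
    """Yield audio chunks until the control file says to stop."""
    for chunk in chunks:
        if not is_play_allowed(path):
            break
        yield chunk


# Demo: a temporary file stands in for xtts_play_allowed.txt
demo_path = os.path.join(tempfile.mkdtemp(), "xtts_play_allowed.txt")
set_play_allowed(demo_path, True)
played = []
for i, chunk in enumerate(play_chunks(["a", "b", "c", "d"], demo_path)):
    played.append(chunk)
    if i == 1:
        set_play_allowed(demo_path, False)  # simulates the user starting to speak
print(played)
```

Polling a file is slow compared with shared memory or sockets, but it needs no changes to how the two programs are started, which suits a patch to an installed package.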

Optional: better comma handling for xtts

Gives better speech, but is a little slower for the first sentence. In c:\Users\[USERNAME]\miniconda3\Lib\site-packages\stream2sentence\stream2sentence.py, line 191, replace

sentence_delimiters = '.?!;:,\n…)]}。'

with

sentence_delimiters = '.?!;:\n…)]}。'
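
To see why removing the comma changes the result, here is a minimal sketch of delimiter-based splitting. The split_sentences function is a simplified stand-in written for this example, not the actual stream2sentence implementation. Only the two delimiter strings come from the text above.

```python
from typing import List


def split_sentences(text: str, delimiters: str) -> List[str]:
    """Cut the text after every delimiter character and drop empty pieces."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch in delimiters:
            if current.strip():
                sentences.append(current.strip())
            current = ""
    if current.strip():
        sentences.append(current.strip())
    return sentences


WITH_COMMA = '.?!;:,\n…)]}。'     # original delimiters
WITHOUT_COMMA = '.?!;:\n…)]}。'   # delimiters after the change

text = "Well, I think so. Do you agree, or not?"
print(split_sentences(text, WITH_COMMA))
print(split_sentences(text, WITHOUT_COMMA))
```

With the comma, the first chunk ("Well,") is ready for TTS sooner, but every clause is synthesized separately. Without it, whole sentences are sent, so intonation is more natural and the first audio arrives a little later.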

Optional: Google search plugin

  • Download search_server.py from my repo
  • Install langchain: pip install langchain
  • Sign up at https://serper.dev/api-key to get an API key. It is free and fast, and gives you 2500 free searches. Paste the key into search_server.py at line 15: os.environ["SERPER_API_KEY"] = "your_key"
  • Start the search server by double-clicking it. Now you can use voice commands like these: Please google who is Barack Obama, or Пожалуйста погугли погоду в Москве (Please google the weather in Moscow).
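
For orientation, a local search server of this kind could look roughly like the sketch below. It is not the contents of search_server.py: the "q" query parameter, the plain-text response, and every function name are assumptions, and the search backend is a stub where the real script would call google-serper using SERPER_API_KEY. Port 8003 matches the default --google-url.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def stub_search(query: str) -> str:
    """Placeholder backend; a real one would query google-serper."""
    return f"(no backend configured) results for: {query}"


def answer_for_path(path: str, search_fn=stub_search) -> str:
    """Extract the hypothetical 'q' parameter from a request path and run the search."""
    params = parse_qs(urlparse(path).query)
    query = params.get("q", [""])[0].strip()
    if not query:
        return "empty query"
    return search_fn(query)


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = answer_for_path(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_server(port: int = 8003) -> None:
    """Serve until interrupted; call this to actually start listening."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()


# Exercise the request logic without opening a socket
print(answer_for_path("/?q=who+is+Barack+Obama"))
```

Keeping the query handling in a plain function makes it easy to check without a network connection or an API key.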

Building, optional

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
.\bootstrap-vcpkg.bat
.\vcpkg integrate install
.\vcpkg install curl[tool]
  • Modify the path c:\\DATA\\Soft\\vcpkg\\scripts\\buildsystems\\vcpkg.cmake in the cmake command below so that it points to the folder where you installed vcpkg. Then build:
git clone https://github.com/Mozer/talk-llama-fast
cd talk-llama-fast
set SDL2_DIR=SDL2\cmake
cmake.exe -DWHISPER_SDL2=ON -DWHISPER_CUBLAS=1 -DCMAKE_TOOLCHAIN_FILE="c:\\DATA\\Soft\\vcpkg\\scripts\\buildsystems\\vcpkg.cmake" -B build
cmake.exe --build build --config release --target clean
del build\bin\Release\talk-llama.exe & cmake.exe --build build --config release

talk-llama.exe params

  -h,       --help           [default] show this help message and exit
  -t N,     --threads N      [4      ] number of threads to use during computation
  -vms N,   --voice-ms N     [10000  ] voice duration in milliseconds
  -c ID,    --capture ID     [-1     ] capture device ID
  -mt N,    --max-tokens N   [32     ] maximum number of tokens per audio chunk
  -ac N,    --audio-ctx N    [0      ] audio context size (0 - all)
  -ngl N,   --n-gpu-layers N [999    ] number of layers to store in VRAM
  -vth N,   --vad-thold N    [0.60   ] voice avg activity detection threshold
  -vths N,  --vad-start-thold N [0.000270] vad min level to stop tts, 0: off, 0.000270: default
  -vlm N,   --vad-last-ms N  [0      ] vad min silence after speech, ms
  -fth N,   --freq-thold N   [100.00 ] high-pass frequency cutoff
  -su,      --speed-up       [false  ] speed up audio by x2 (reduced accuracy)
  -tr,      --translate      [false  ] translate from source language to english
  -ps,      --print-special  [false  ] print special tokens
  -pe,      --print-energy   [false  ] print sound energy (for debugging)
  -vp,      --verbose-prompt [false  ] print prompt at start
  -ng,      --no-gpu         [false  ] disable GPU
  -p NAME,  --person NAME    [Georgi ] person name (for prompt selection)
  -bn NAME, --bot-name NAME  [LLaMA  ] bot name (to display)
  -w TEXT,  --wake-command T [       ] wake-up command to listen for
  -ho TEXT, --heard-ok TEXT  [       ] said by TTS before generating reply
  -l LANG,  --language LANG  [en     ] spoken language
  -mw FILE, --model-whisper  [models/ggml-base.en.bin] whisper model file
  -ml FILE, --model-llama    [models/ggml-llama-7B.bin] llama model file
  -s FILE,  --speak TEXT     [./examples/talk-llama/speak] command for TTS
  --prompt-file FNAME        [       ] file with custom prompt to start dialog
  --session FNAME                   file to cache model state in (may be large!) (default: none)
  -f FNAME, --file FNAME     [       ] text output file name
  --ctx_size N               [2048   ] Size of the prompt context
  -n N, --n_predict N        [64     ] Number of tokens to predict
  --temp N                   [0.90   ] Temperature
  --top_k N                  [40.00  ] top_k
  --top_p N                  [1.00   ] top_p
  --repeat_penalty N         [1.10   ] repeat_penalty
  --xtts-voice NAME          [emma_1 ] xtts voice without .wav
  --xtts-url TEXT            [http://localhost:8020/] xtts/silero server URL, with trailing slash
  --xtts-control-path FNAME  [c:\DATA\LLM\xtts\xtts_play_allowed.txt] path to xtts_play_allowed.txt  
  --google-url TEXT          [http://localhost:8003/] langchain google-serper server URL, with /
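
As a usage illustration, the sketch below assembles a talk-llama.exe command line from a dictionary of options. The flags are taken from the list above and the model file names from the "I used" section. The models/ directory, the chosen values, and the build_command helper itself are assumptions made for this example.

```python
from typing import Dict, List, Union

OptionValue = Union[str, int, float, bool]


def build_command(exe: str, options: Dict[str, OptionValue]) -> List[str]:
    """Turn {'--flag': value} into an argv list; True means a bare switch, False is skipped."""
    argv = [exe]
    for flag, value in options.items():
        if value is True:
            argv.append(flag)
        elif value is not False:
            argv.extend([flag, str(value)])
    return argv


options = {
    "--model-whisper": "models/ggml-medium-q5_0.bin",              # assumed location
    "--model-llama": "models/mistral-7b-instruct-v0.2.Q6_K.gguf",  # assumed location
    "--language": "en",
    "--xtts-voice": "emma_1",
    "--vad-start-thold": 0.00027,
    "--print-energy": True,   # bare switch, useful while tuning the threshold
    "--no-gpu": False,        # skipped, so the GPU stays enabled
}
command = build_command("talk-llama.exe", options)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command) from the folder that contains talk-llama.exe, or the printed line can be pasted into talk-llama.bat.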

Voice commands:

The full list of commands and their variations is in talk-llama.cpp; search for user_command.

  • Stop (остановись)
  • Regenerate (переделай) - regenerates the llama answer
  • Delete (удали) - deletes the user question and the llama answer
  • Delete 3 messages (удали 3 сообщения)
  • Reset (удали все) - deletes all context except for the initial prompt
  • Google something (погугли что-то)
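
The way a transcribed phrase could be mapped to one of these commands is sketched below. This is a simplified guess for illustration, not the logic in talk-llama.cpp: the trigger table, the substring matching, and the parse_command name are all assumptions.

```python
import re
from typing import Optional, Tuple

# Hypothetical trigger table; the real variations live in talk-llama.cpp.
# "reset" comes before "delete" so that "удали все" is not read as a plain delete.
COMMANDS = {
    "stop": ["stop", "остановись"],
    "regenerate": ["regenerate", "переделай"],
    "reset": ["reset", "удали все"],
    "delete": ["delete", "удали"],
    "google": ["google", "погугли"],
}


def parse_command(text: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument) if the phrase contains a trigger, else None."""
    lowered = text.lower().strip(" .!?")
    for name, triggers in COMMANDS.items():
        for trigger in triggers:
            pos = lowered.find(trigger)
            if pos == -1:
                continue
            rest = lowered[pos + len(trigger):].strip()
            if name == "delete":
                match = re.search(r"\d+", rest)  # optional message count
                return name, (match.group() if match else "")
            return name, (rest if name == "google" else "")
    return None


for phrase in ["Please google who is Barack Obama",
               "Удали 3 сообщения",
               "Удали все",
               "How are you?"]:
    print(phrase, "->", parse_command(phrase))
```

Plain substring matching is fragile (a word like "stopwatch" would trigger "stop"), so a real implementation needs stricter matching.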

Bugs

  • The Reset voice command doesn't work well if the current context length is over --ctx_size
  • RoPE context scaling doesn't work well
  • Whisper sometimes hallucinates; the hallucinated phrases need to be added to the stop-words. Check the misheard text in talk-llama.cpp
  • Don't put Cyrillic (Russian) letters in character names or paths in .bat files; they may not work because of encoding issues. Use cmd instead if you need Cyrillic letters.
