
talk-llama-fast's People

Contributors

0cc4m, abhilash1910, agray3, aidanbeltons, asmaloney, bobqianic, cherts, didzis, digipom, fitzsim, ggerganov, ibob, ikawrakow, jart, jhen0409, johannesgaessler, katsu560, marmistrz, nalbion, neozhangjianyu, pprobst, przemoc, ptsochantaris, rgerganov, sandrohanea, slaren, tamo, ulatekh, xarbirus, zhouwg


talk-llama-fast's Issues

Are online AI models a possibility via OpenRouter?

Your work is absolutely amazing, I love it. The problem is that 8B models are too limited. Would it be possible to add the option to connect to any online model, for example via OpenRouter? Llama 70B answers very fast and is a lot better than the 8B model...

Thanks for the great work, it is very much appreciated.
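For reference, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a hypothetical integration would boil down to an HTTP request like the sketch below (the model slug and the OPENROUTER_API_KEY variable are illustrative; talk-llama-fast does not support remote backends today):

    curl https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "meta-llama/llama-3-70b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'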

Failed building wheel for TTS

Hi,
I'm doing a clean installation of talk-llama-fast. I ran pip install git+https://github.com/Mozer/tts but the result I get is this:

C:\Users\miniconda3\envs\xtts\include\pyconfig.h(59): fatal error C1083: Cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\HostX86\x64\cl.exe' failed with exit code 2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for TTS
Failed to build TTS
ERROR: Could not build wheels for TTS, which is required to install pyproject.toml-based projects

I've installed Visual Studio Build Tools. Here is what I've selected:

[screenshot]

[screenshot]

audio.init() failed! (No audio device found (0x88780078))

I installed and started all modules according to the instructions. silly_extras and xtts_wav2lip start without errors, but when starting talk-llama-wav2lip-ru I get a microphone connection error (No audio device found (0x88780078)). I tried different microphones (an external USB one and the laptop's built-in one), updated the drivers on both, manually selected the device via "-c 1" in the talk-llama.exe startup parameters, and changed the default devices in the Windows settings. The error did not disappear.
[screenshot]

I tried to fix it by changing the driver to DirectSound, but got the same thing. I even updated DirectX; the problem did not go away.

[screenshot]

Windows 10 Pro 64

Please learn how to use Git

Dear @Mozer ,

Your work is valuable, important and greatly appreciated! I just don't want it to be misrepresented due to really bad repository management practices here. Git is not a file server for uploading source files; an important part of a repository's purpose is the history of changes. By not giving others a chance to understand your changes to the original llama.cpp, you increase the community effort needed to review and improve your project. If you are interested in this side of Git, please do not hesitate to ask how to use it; it would be my pleasure to help!
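As a minimal sketch of the workflow being suggested (the upstream URL is only a placeholder for whichever ggerganov repo the sources were taken from, and the clone path assumes github.com/Mozer/talk-llama-fast), keeping the upstream history is what makes the changes reviewable:

    git clone https://github.com/Mozer/talk-llama-fast
    cd talk-llama-fast
    git remote add upstream https://github.com/ggerganov/whisper.cpp
    git fetch upstream
    git log --oneline upstream/master..HEAD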

need help

1. So it won't work on a GT 730 + Ryzen 3600 with 16 GB RAM? (See the note below.)
Device 0: GeForce GT 730, compute capability 3.5, VMM: yes
llm_load_tensors: ggml ctx size = 0.27 MiB
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4679.55 MiB on device 0: cudaMalloc failed: out of memory
llama_model_load: error loading model: unable to allocate backend buffer
llama_load_model_from_file: failed to load model
llama_new_context_with_model: model cannot be NULL


2. I also clicked talk-llama.exe by mistake. Do I need to restart the install?

thank you so much!
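On the out-of-memory part: the -ngl / --n-gpu-layers flag listed in the usage text further down this page controls how many layers go to VRAM, so a hedged thing to try on a card this small is keeping the model on the CPU, for example with the line below (model file names taken from the .bat examples on this page; note also that a compute-3.5 GPU may simply not be supported by the CUDA build at all):

    talk-llama.exe -mw ggml-medium.en-q5_0.bin -ml zephyr-7b-beta.Q4_K_S.gguf -ngl 0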

Does not work with microphone

So I followed the basic installation process. I can start the server for extras as well as the xtts one, but the final talk.exe does not seem to be working: when I speak into the microphone, nothing goes through to the app. I checked my microphone in Discord and OBS and it seems to be working fine, please advise.

Once again, apologies for the noob question, but even with the mic on, nothing happens in the program. Kindly advise.

here is the terminal log for silly_extras.bat:
`C:\Users\USER\Desktop\coding\python\realtime\talk-llama-fast-v0.1.3\SillyTavern-Extras>call conda activate extras
Using torch device: cpu
Initializing wav2lip module
wav2lip: running init generation with default and silence.wav
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 0, chunk_needed: 0, reply: 0
speech detected, wav2lip_server won't generate
Deleting old temporary wavs and mp4s.
No API key given because you are running locally.

  • Serving Flask app 'server'
  • Debug mode: off

Wav2lip videos can be played now.

WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

here is the start for the xtts_wav2lip.bat
`C:\Users\USER\Desktop\coding\python\realtime\talk-llama-fast-v0.1.3\xtts>call conda activate xtts
2024-04-14 16:53:52.789 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-14 16:53:52.789 | INFO | xtts_api_server.server::76 - Model: 'v2.0.2' starts to load,wait until it loads
[2024-04-14 16:54:04,084] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-14 16:54:04,414] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-14 16:54:04,600] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-14 16:54:04,601] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-14 16:54:04,601] [WARNING] [config_utils.py:69:process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-14 16:54:04,602] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-14 16:54:04,775] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-14 16:54:05.268 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-14 16:54:05.268 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Anna: speakers/Anna.wav
2024-04-14 16:54:08.066 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for default: speakers/default.wav
2024-04-14 16:54:08.109 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Google: speakers/Google.wav
2024-04-14 16:54:08.169 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Kurt Cobain: speakers/Kurt Cobain.wav
2024-04-14 16:54:08.234 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 4 speakers.
2024-04-14 16:54:08.235 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\USER\Miniconda3\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
INFO: Started server process [13136]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)`

and here is the one for talk-llama-wav2Lip.bat
`C:\Users\USER\Desktop\coding\python\realtime\talk-llama-fast-v0.1.3>talk-llama.exe -mw ggml-medium.en-q5_0.bin -ml zephyr-7b-beta.Q4_K_S.gguf -p "Alex" --speak speak --vad-last-ms 200 --vad-start-thold 0.000270 --bot-name "Anna" --prompt-file assistant.txt --temp 1.15 --ctx_size 3548 --multi-chars --allow-newline --seqrep --stop-words Aleks:;alex:;---;ALex -ngl 99 -n 60 --threads 4 --split-after 5 --sleep-before-xtts 1000
Warning: c:\DATA\LLM\xtts\xtts_play_allowed.txt file not found, xtts wont stop on user speech without it
whisper_init_from_file_with_params_no_state: loading model from 'ggml-medium.en-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 2
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA0 total size = 793.41 MB
whisper_model_load: model size = 793.41 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 11.01 MB
whisper_init_state: kv cross size = 12.29 MB
whisper_init_state: compute buffer (conv) = 28.68 MB
whisper_init_state: compute buffer (encode) = 594.22 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.31 MB
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from zephyr-7b-beta.Q4_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 14
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 217 tensors
llama_model_loader: - type q5_K: 8 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Small
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 3.86 GiB (4.57 BPW)
llm_load_print_meta: general.name = huggingfaceh4_zephyr-7b-beta
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.31 MiB
llm_load_tensors: CUDA0 buffer size = 3877.55 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 3548
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 443.50 MiB
llama_new_context_with_model: KV self size = 443.50 MiB, K (f16): 221.75 MiB, V (f16): 221.75 MiB
llama_new_context_with_model: CUDA_Host input buffer size = 29.88 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 521.38 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 16.00 MiB
llama_new_context_with_model: graph splits (measure): 3

WARNING: model is not multilingual
run: processing, 4 threads, lang = en, task = transcribe, timestamps = 0 ...

init: found 2 capture devices:
init: - Capture device #0: 'CABLE Output (VB-Audio Virtual Cable)'
init: - Capture device #1: 'Microphone (Logi C270 HD WebCam)'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024

run : initializing - please wait ...
run : done! start speaking in the microphone
Llama stop words: 'Alex:', 'Aleks:', 'alex:', '---', 'ALex',

Alex:`

[screenshot]
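One detail visible in the log above: two capture devices are found and the default one is opened, which is likely the VB-Audio virtual cable rather than the webcam microphone. A hedged check is to select the microphone explicitly with the -c / --capture flag documented in the usage text near the end of this page, keeping the rest of the original arguments unchanged:

    talk-llama.exe -c 1 -mw ggml-medium.en-q5_0.bin -ml zephyr-7b-beta.Q4_K_S.gguf [remaining talk-llama-wav2lip.bat arguments]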

llama text generation stops too early or other problems?

It seems that the llama text generation stops too early. I managed to get a long answer, audio and video only one time.

from talk-llama-wav2lip.bat:
run : initializing - please wait ...

Llama start prompt: 536/3548 tokens in 1.312 s at 408 t/s
Llama stop words: 'Alex:', 'Alex :', 'Aleks:', 'alex:', '---', 'ALex',
Voice commands: Stop(Ctrl+Space), Regenerate(Ctrl+Right), Delete(Ctrl+Delete), Reset(Ctrl+R)
Start speaking or typing:

Alex: hello
Anna: helloo [Speech/Stop!]

Alex: Tell me a joke
Anna: Why have [Speech/Stop!]

[t: 560]

Alex: Tell me a joke
Anna: Why did [Speech/Stop!]

from xtts_wav2lip.bat:
xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Why did' speaker_wav='Anna' language='en' reply_part=0
speech detected, xtts won't generate
INFO: ::1:56219 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in call
await super().call(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in call
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in call
await self.app(scope, receive, _send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in call
await self.app(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
await route.handle(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
await self.app(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
output_file_path = XTTS.process_tts_to_file(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
raise e # Propagate exceptions for endpoint handling.
^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
self.local_generation(clear_text,speaker_name_or_path,speaker_wav,language,output_file)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
out = self.model.inference(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 608, in inference
"wav": torch.cat(wavs, dim=0).numpy(),
^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
1715128015.1753356 in server request
2024-05-08 03:26:55.176 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Fuck youuu' speaker_wav='Anna' language='en' reply_part=0
speech detected, xtts won't generate
INFO: ::1:56232 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in call
await super().call(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in call
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in call
await self.app(scope, receive, _send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in call
await self.app(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
await route.handle(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
await self.app(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
output_file_path = XTTS.process_tts_to_file(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
raise e # Propagate exceptions for endpoint handling.
^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
self.local_generation(clear_text,speaker_name_or_path,speaker_wav,language,output_file)
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
out = self.model.inference(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 608, in inference
"wav": torch.cat(wavs, dim=0).numpy(),
^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors
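The repeated [Speech/Stop!] markers and the "speech detected, xtts won't generate" lines above suggest the voice-activity detection keeps firing while the bot is replying, which interrupts both text generation and TTS (and is likely why torch.cat() ends up with an empty list). A hedged adjustment is to make the VAD less sensitive through the flags that talk-llama-wav2lip.bat already passes; the values below are illustrative, not tuned:

    talk-llama.exe [original arguments] --vad-start-thold 0.000500 --vad-last-ms 500 --sleep-before-xtts 1000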

incomplete video tutorial and errors

greetings,

1. I don't understand how to set environment variables. This isn't in the written description, only in the YouTube video, exact time up to 9:33:
https://youtu.be/0MEZ84uH4-E?t=542
Unfortunately your Windows is in Russian and it's 11, not 10, so there may be differences.
Do I need to go to System Properties > Advanced > Environment Variables > New and add the path to C:\talk-llama-fast-0.2.0\xtts\SillyTavern-Extras\tts_out? (See the sketch after this list.)

2. You also skipped the H.264 codec. These are currently available, and I don't know which one to get:
http://ciscobinary.openh264.org/openh264-2.4.1-win32.dll.bz2
http://ciscobinary.openh264.org/openh264-2.4.1-win32.dll.signed.md5.txt
http://ciscobinary.openh264.org/openh264-2.4.1-win64.dll.bz2
http://ciscobinary.openh264.org/openh264-2.4.1-win64.dll.signed.md5.txt

3. Where do I put the models ggml-medium.en-q5_0 and mistral-7b-instruct-v0.2.Q5_0? Straight into the \talk-llama-fast-0.2.0 folder?
4. When running xtts_wav2lip.bat I got this error: C:\Users\xxx\miniconda3\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:161: UserWarning: Field "model_name" has conflict with protected namespace "model_".

I can't thank you enough for your work!
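Regarding question 1, a minimal cmd sketch of creating a user environment variable is below; the variable name TTS_OUT is purely hypothetical, and the real name and value should be taken from the video. setx writes to the user environment, so a new terminal has to be opened afterwards:

    setx TTS_OUT "C:\talk-llama-fast-0.2.0\xtts\SillyTavern-Extras\tts_out"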

No module named xtts_api_server

I'm trying to run xtts_streaming_audio.bat but it can't find xtts_api_server. I'm a bit confused because when I start up xtts_wav2lip.bat that works fine despite it also using the xtts_api_server module. When I check for that module in C:\Users\<USER>\miniconda3\envs\xtts\Lib\site-packages it's right there.

The complete error:
C:\Users\<USER>\miniconda3\envs\extras\python.exe: No module named xtts_api_server

So basically I can do video chat with Emma Watson alright but not simple audio chat.

I'm running Windows 10 with an RTX 3060 12 GB. I've also replaced the LLM with Meta-Llama-3-8B-Instruct.Q3_K_M.gguf and deleted Emma Watson's character prompt, if any of that info is useful.
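One thing the error path shows is that it is the extras environment's python (envs\extras\python.exe) doing the import, while the module lives under envs\xtts. A hedged way to confirm that is to activate the xtts env and start the server module by hand, mirroring the xtts_wav2lip.bat invocation quoted elsewhere on this page (the streaming-specific flags may differ):

    call conda activate xtts
    python -m xtts_api_server -d cuda --deepspeed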

Error reading file xtts_play_allowed.txt

Hello, I did everything as shown in the YouTube instructions. But when I open silly_extras.bat I get an error:
[screenshot]

nothing happens

Hello,
I did every step, then when I launch talk-llama-wav2lip.bat it finishes with the following statement:

G:\talk-llama>pause
Press any key to continue . . .

Edit : It's working now, sorry :) It looks great, thx !

Google Colab Notebook // Enhancement

Please create a demo / Google Colaboratory Pro notebook to quickly run and test the project. This would let us easily experiment with the code and see the results without having to set up a full development environment. Looking forward to getting this set up, and thank you for this superb project ✨

Forgotten unicode.h ?

Upon building from source, I'm getting the following error:

CMake Error at CMakeLists.txt:88 (add_library):
  Cannot find source file:

    unicode.h

It looks like the unicode.h file was not added to the repo; could you please check?

talk-llama.exe crashes

Good afternoon!

When launching talk-llama-audio-ru.bat, talk-llama.exe crashes as shown in the attachment. What could the problem be?

GeForce 4060 16 GB; the CUDA toolkit and the latest drivers are installed.

Thanks in advance for the reply,
Best regards, Konst

can't compile

error: use of ‘auto’ in lambda parameter declaration only available with ‘-std=c++14’ or ‘-std=gnu++14’
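The message itself names the fix: the lambda needs C++14 or newer. Assuming the project is configured with CMake, as in the other build reports on this page, a hedged way to force the standard is:

    cmake -B build -DCMAKE_CXX_STANDARD=14 -DCMAKE_CXX_STANDARD_REQUIRED=ON
    cmake --build build --config Release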

the video window may close unexpectedly in the middle of a sentence

Hello.

Sometimes the video window may close unexpectedly in the middle of a sentence when the assistant is saying something. If you start talking yourself, so that the assistant begins to regenerate the text and voice it, the window will appear again.

I found out that this is a bug in silly_extras.bat. It wants to open a wav file, but can't. Any ideas why?

cv2: missing video frame
Exception in thread Thread-4 (wav2lip_server_play_init):
Traceback (most recent call last):
  File "D:\miniconda\envs\extras\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "D:\miniconda\envs\extras\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "D:\talk-llama-fast\xtts\SillyTavern-Extras\modules/wav2lip\server_wav2lip.py", line 342, in wav2lip_server_play_init
    next_video_chunk_global = play_video_with_audio(video_file_path, audio_file_path, True, next_video_chunk_global, rand_caption)
  File "D:\talk-llama-fast\xtts\SillyTavern-Extras\modules/wav2lip\server_wav2lip.py", line 475, in play_video_with_audio
    wf = wave.open(audio_file, 'rb') # Open the next audio file
  File "D:\miniconda\envs\extras\Lib\wave.py", line 631, in open
    return Wave_read(f)
  File "D:\miniconda\envs\extras\Lib\wave.py", line 283, in __init__
    self.initfp(f)
  File "D:\miniconda\envs\extras\lib\wave.py", line 274, in initfp
    raise Error('fmt chunk and/or data chunk missing')
wave.Error: fmt chunk and/or data chunk missing
1718026936.074945 in wav2lip gen server chunk:23_2
in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 23, chunk_needed: 23, reply: 2

No audio or video.

I launched talk-llama-wav2lip-ru.bat and only the text output worked. I tried reloading SillyTavern and xtts and this doesn't seem to help; it says there are no speakers.
Help, please.
I use Windows 10 on a PC with a 4070 Ti and 16 GB RAM.
Here is output of xtts:
(xtts) C:\Windows\system32>python -m xtts_api_server --bat-dir %~dp0 -d=cuda --deepspeed --stream-to-wavs --call-wav2lip --output C:\Windows\System32\SillyTavern-Extras\tts_out\ --extras-url http://127.0.0.1:5100/ --wav-chunk-sizes=10,20,40,100,200,300,400,9999
2024-04-15 13:57:48.282 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-15 13:57:48.283 | INFO | xtts_api_server.server::76 - Model: 'v2.0.2' starts to load,wait until it loads
[2024-04-15 13:58:01,165] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-15 13:58:01,457] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-15 13:58:01,647] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-15 13:58:01,649] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-15 13:58:01,855] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-15 13:58:02.325 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 0 speakers.
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
INFO: Started server process [2164]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178894.8860521 in server request
2024-04-15 14:01:34.886 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Что ты говоришь' speaker_wav='default' language='ru' reply_part=0
INFO: ::1:58595 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in call
await super().call(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in call
raise exc
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in call
await self.app(scope, receive, _send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in call
await self.app(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
await route.handle(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
await self.app(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
output_file_path = XTTS.process_tts_to_file(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
raise e # Propagate exceptions for endpoint handling.
^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 548, in process_tts_to_file
speaker_wav = self.get_speaker_wav(speaker_name_or_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Hyena.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 540, in get_speaker_wav
raise ValueError(f"Speaker {speaker_name_or_path} not found.")
ValueError: Speaker default not found.
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178895.9273908 in server request
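The log hints at the cause: the .bat was started from C:\Windows\system32, so --output resolves to a System32 path and the server reports "Latents created for all 0 speakers", after which neither Anna nor the default fallback voice can be found. A hedged check is to run the .bat from its own folder and make sure the reference wav exists there (the install path below is hypothetical; the working log earlier on this page creates latents from speakers/Anna.wav):

    cd C:\talk-llama-fast\xtts
    dir speakers\Anna.wav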

I refactored this repo for Linux, but am a noob at running gguf and need help...

https://github.com/purplishdev/talk-llama-fast-linux/

I forked the repo and got it compiled on Linux, but when downloading the models, the names don't match up to what it expects. Here is what I did.

The instructions say to download these files.


Download whisper model to folder with talk-llama.exe: https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-medium.en-q5_0.bin (for English) or https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-medium-q5_0.bin (for Russian, or even ggml-large-v3-q5_0.bin it is larger but better). You can try small-q5 if you don't have much VRAM.

Download LLM to same folder https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q6_K.gguf , you can try q4_K_S if you don't have much VRAM.


Which I did, but when I run the program, I get this error, because it's expecting a .bin file with a different name


purpledev@amethyst  ~/talk-llama-fast/build/bin   master ±  ./talk-llama

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'

whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no

ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes

ggml_init_cublas: found 1 CUDA devices:

Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes

llama_model_load: error loading model: failed to open models/ggml-llama-7B.bin: No such file or directory

llama_load_model_from_file: failed to load model

[1] 571898 segmentation fault (core dumped) ./talk-llama


At first I tried renaming the .bin file I downloaded to ggml-llama-7B.bin, thinking since they were the same file type that it was what was needed, but that returned this error.


✘ purpledev@amethyst  ~/talk-llama-fast/build/bin   master ±  ./talk-llama

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'

whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no

ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes

ggml_init_cublas: found 1 CUDA devices:

Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes

gguf_init_from_file: invalid magic characters 'lmgg'

llama_model_load: error loading model: llama_model_loader: failed to load model from models/ggml-llama-7B.bin

llama_load_model_from_file: failed to load model

[1] 582242 segmentation fault (core dumped) ./talk-llama

✘ purpledev@amethyst  ~/talk-llama-fast/build/bin   master ± 


Then I tried renaming the gguf file to ggml-llama-7B.bin, not sure if it would work to just change the file type...

That actually seemed to load the model, but led to this error.


✘ purpledev@amethyst  ~/talk-llama-fast/build/bin   master ±  ./talk-llama

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'

whisper_init_from_file_with_params_no_state: failed to open 'models/ggml-base.en.bin'

ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no

ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes

ggml_init_cublas: found 1 CUDA devices:

Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from models/ggml-llama-7B.bin (version GGUF V3 (latest))

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.

llama_model_loader: - kv 0: general.architecture str = llama

llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2

llama_model_loader: - kv 2: llama.context_length u32 = 32768

llama_model_loader: - kv 3: llama.embedding_length u32 = 4096

llama_model_loader: - kv 4: llama.block_count u32 = 32

llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336

llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128

llama_model_loader: - kv 7: llama.attention.head_count u32 = 32

llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8

llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010

llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000

llama_model_loader: - kv 11: general.file_type u32 = 18

llama_model_loader: - kv 12: tokenizer.ggml.model str = llama

llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...

llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...

llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...

llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1

llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2

llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0

llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0

llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true

llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false

llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...

llama_model_loader: - kv 23: general.quantization_version u32 = 2

llama_model_loader: - type f32: 65 tensors

llama_model_loader: - type q6_K: 226 tensors

llm_load_vocab: special tokens definition check successful ( 259/32000 ).

llm_load_print_meta: format = GGUF V3 (latest)

llm_load_print_meta: arch = llama

llm_load_print_meta: vocab type = SPM

llm_load_print_meta: n_vocab = 32000

llm_load_print_meta: n_merges = 0

llm_load_print_meta: n_ctx_train = 32768

llm_load_print_meta: n_embd = 4096

llm_load_print_meta: n_head = 32

llm_load_print_meta: n_head_kv = 8

llm_load_print_meta: n_layer = 32

llm_load_print_meta: n_rot = 128

llm_load_print_meta: n_embd_head_k = 128

llm_load_print_meta: n_embd_head_v = 128

llm_load_print_meta: n_gqa = 4

llm_load_print_meta: n_embd_k_gqa = 1024

llm_load_print_meta: n_embd_v_gqa = 1024

llm_load_print_meta: f_norm_eps = 0.0e+00

llm_load_print_meta: f_norm_rms_eps = 1.0e-05

llm_load_print_meta: f_clamp_kqv = 0.0e+00

llm_load_print_meta: f_max_alibi_bias = 0.0e+00

llm_load_print_meta: n_ff = 14336

llm_load_print_meta: n_expert = 0

llm_load_print_meta: n_expert_used = 0

llm_load_print_meta: rope scaling = linear

llm_load_print_meta: freq_base_train = 1000000.0

llm_load_print_meta: freq_scale_train = 1

llm_load_print_meta: n_yarn_orig_ctx = 32768

llm_load_print_meta: rope_finetuned = unknown

llm_load_print_meta: model type = 7B

llm_load_print_meta: model ftype = Q6_K

llm_load_print_meta: model params = 7.24 B

llm_load_print_meta: model size = 5.53 GiB (6.56 BPW)

llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2

llm_load_print_meta: BOS token = 1 '<s>'

llm_load_print_meta: EOS token = 2 '</s>'

llm_load_print_meta: UNK token = 0 '<unk>'

llm_load_print_meta: PAD token = 0 '<unk>'

llm_load_print_meta: LF token = 13 '<0x0A>'

llm_load_tensors: ggml ctx size = 0.22 MiB

llm_load_tensors: offloading 32 repeating layers to GPU

llm_load_tensors: offloading non-repeating layers to GPU

llm_load_tensors: offloaded 33/33 layers to GPU

llm_load_tensors: CPU buffer size = 102.54 MiB

llm_load_tensors: CUDA0 buffer size = 5563.55 MiB

...................................................................................................

llama_new_context_with_model: n_ctx = 2048

llama_new_context_with_model: freq_base = 1000000.0

llama_new_context_with_model: freq_scale = 1

llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB

llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB

llama_new_context_with_model: CUDA_Host input buffer size = 24.02 MiB

GGML_ASSERT: /home/purpledev/talk-llama-fast/ggml.c:5155: pos

[1] 584047 IOT instruction (core dumped) ./talk-llama

✘ purpledev@amethyst  ~/talk-llama-fast/build/bin   master ± 
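Rather than renaming files to match the hard-coded defaults (models/ggml-base.en.bin and models/ggml-llama-7B.bin), the usage text near the end of this page shows -mw/--model-whisper and -ml/--model-llama flags, so a hedged invocation pointing at the downloaded files directly would be (paths relative to wherever the files were saved; this addresses the file-name mismatch, not necessarily the GGML_ASSERT at the end):

    ./talk-llama -mw ./ggml-medium.en-q5_0.bin -ml ./mistral-7b-instruct-v0.2.Q6_K.gguf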

text as input

Hello,
Thank you very much for your work.
It's not really an issue, but would it be possible to add an option to type text as user input in talk-llama?
Another thing: llama.cpp has an argument to set how much of the LLM is loaded on each GPU in a multi-GPU setup, --tensor-split or -ts. Would it be possible to add this argument to talk-llama?
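For reference, in upstream llama.cpp the flag takes a comma-separated list of proportions per GPU; an (illustrative) 3:1 split across two cards looks like the line below, and talk-llama would need the same option plumbed through:

    ./main -m mistral-7b-instruct-v0.2.Q6_K.gguf -ngl 99 --tensor-split 3,1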

Screen recognition

I hope it's OK to create issues in Russian.
Would it be possible to add screen recognition, so that the neural net could interact with the PC?
Like, you tell it "launch PyCharm", it sees it on the desktop and launches it.
Yes, most likely another neural net that works with images would have to be added, and it would probably need to be told that at some moments it should look at the screen, and at others just answer questions.
But it looks doable.
If not, can you explain why?
Thanks in advance for the reply.

whisper.cpp

Rofl?
Why not faster-whisper?
Faster, smaller, better.

And streaming mode is terrible, sure.
But, for speed…

Software is usual and simple, but all-in-one.

Try faster-whisper. =)

docker/docker-compose

Can you prepare a Dockerfile and docker-compose? It's a must-have for fast deployment, and passing the GPU through is not a problem on either Windows or Linux.

Where should the video output be?

I can't figure out where the video output is supposed to appear. I launched all parts of the application, but I still don't understand.

I looked at the localhost pages that open in the console, but they are empty.

[screenshot]

Incoming text via API or some other way

Greetings!
Huge thanks for this implementation, everything works great (apart from one thing with audio input: while llama is listening to audio, there is no video from her; if I set --vad-start-thold to 0 the video appears, but then there is no audio input. But that's beside the point :)
A different question: is there currently any functionality in all of this so that a user's request gets into the talk-llama module not through typing text in the dialog window (or through audio), but, for example, through an API, or maybe the command line, or some other way (picked up from a text file, for instance)?
I'm making a Telegram bot. Video-message output is done, but I still can't figure out how to feed an incoming user prompt into the running talk-llama module.

Whisper Transcription not working

So I've done all the steps, but when I ran talk-llama-wav2lip.bat it said that cudart64_110.dll, cublasLt64_11.dll and cublas64_11.dll were missing. So I went online, downloaded those DLLs, dropped them in the main directory, and it worked, with no errors after that. I then tweaked the parameters to my liking, along with the audio input device. However, when I speak, nothing is transcribed. I feel like this may be an issue with Whisper.

Build for mac (Сборка под мак)

Modern ARM-based MacBooks are very powerful and can run LLM inference at acceptable speed without a discrete GPU. Can you create a build for macOS, without CUDA, or is that not possible?

Use mic always on and interrupt in SillyTavern

Hi. I would like to have the mic always on and allow interruption in ST. Is it a matter of setting up XTTSv2 with your patch? A brief explanation would be appreciated. I'm very impressed with your video demonstrations and with talk-llama-fast; it's fun to use and to get such fast responses.

Does it really work with extended ASCII used for Cyrillic symbols?

The Cyrillic support added by this project raises an interesting problem.

If the source files are saved in UTF-8 (which is usual on Linux), the '«' and '»' symbols are encoded as multi-byte sequences, which do not fit into a 1-byte char literal, as Clang points out quite strongly:

talk-llama-fast.cpp:1248:55: error: character too large for enclosing character literal type
 1248 |                                 text_heard = RemoveTrailingCharacters(text_heard, '«');
      |                                                                                   ^
talk-llama-fast.cpp:1249:55: error: character too large for enclosing character literal type
 1249 |                                 text_heard = RemoveTrailingCharacters(text_heard, '»');

I think what you actually meant were the same symbols from the extended ASCII table, given by the 0xab and 0xbb codes:

--- a/src/talk-llama-fast.cpp
+++ b/src/talk-llama-fast.cpp
@@ -1245,8 +1245,8 @@ int run(int argc, const char ** argv) {
                                text_heard = RemoveTrailingCharacters(text_heard, '!');
                                text_heard = RemoveTrailingCharacters(text_heard, ',');
                                text_heard = RemoveTrailingCharacters(text_heard, '.');
-                               text_heard = RemoveTrailingCharacters(text_heard, '«');
-                               text_heard = RemoveTrailingCharacters(text_heard, '»');
+                               text_heard = RemoveTrailingCharacters(text_heard, 0xab); // '«'
+                               text_heard = RemoveTrailingCharacters(text_heard, 0xbb); // '»'
                                if (text_heard[0] == '.') text_heard.erase(0, 1);
                                if (text_heard[0] == '!') text_heard.erase(0, 1);
                                trim(text_heard);

I'm afraid this is not all, though. The other Cyrillic symbols in the source file also use Unicode, so they are all likely to be mishandled in the same way. On the other hand, does LLaMA really output extended ASCII rather than Unicode (or both)? This code clearly assumes it is always extended ASCII.
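A quick shell illustration of the encoding point (this only inspects the bytes, it does not change the project): when the source is UTF-8, '«' is the two-byte sequence 0xC2 0xAB, so a single-char comparison against 0xAB would only match the second byte, and the same caveat applies to the proposed 0xab/0xbb patch whenever the model's output is UTF-8 rather than a single-byte codepage:

    $ printf '«' | xxd
    00000000: c2ab                                     ..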

error: unknown argument: --vad-start-thold

  • branch: master
  • OS: Ubuntu Server 22.04
  • Compilers: gcc 11.4.0 and g++ 11.4.0

I tried compiling your code. I was able to generate talk-llama and downloaded your script talk-llama.bat, but when I run it I get the following error, followed by instructions on how to use talk-llama:

error: unknown argument: --vad-start-thold

usage: ./talk-llama [options]

options:
  -h,       --help           [default] show this help message and exit
  -t N,     --threads N      [4      ] number of threads to use during computation
  -vms N,   --voice-ms N     [10000  ] voice duration in milliseconds
  -c ID,    --capture ID     [-1     ] capture device ID
  -mt N,    --max-tokens N   [32     ] maximum number of tokens per audio chunk
  -ac N,    --audio-ctx N    [0      ] audio context size (0 - all)
  -ngl N,   --n-gpu-layers N [999    ] number of layers to store in VRAM
  -vth N,   --vad-thold N    [0.60   ] voice activity detection threshold
  -vlm N,   --vad-last-ms N  [500    ] vad min silence after speech, ms
  -fth N,   --freq-thold N   [100.00 ] high-pass frequency cutoff
  -su,      --speed-up       [false  ] speed up audio by x2 (reduced accuracy)
  -tr,      --translate      [false  ] translate from source language to english
  -ps,      --print-special  [false  ] print special tokens
  -pe,      --print-energy   [false  ] print sound energy (for debugging)
  -vp,      --verbose-prompt [false  ] print prompt at start
  -ng,      --no-gpu         [false  ] disable GPU
  -p NAME,  --person NAME    [Alex   ] person name (for prompt selection)
  -bn NAME, --bot-name NAME  [LLaMA  ] bot name (to display)
  -w TEXT,  --wake-command T [       ] wake-up command to listen for
  -ho TEXT, --heard-ok TEXT  [       ] said by TTS before generating reply
  -l LANG,  --language LANG  [en     ] spoken language
  -mw FILE, --model-whisper  [./ggml-medium.en-q5_0.bin] whisper model file
  -ml FILE, --model-llama    [./mistral-7b-instruct-v0.2.Q6_K.gguf] llama model file
  -s FILE,  --speak TEXT     [speak  ] command for TTS
  --prompt-file FNAME        [       ] file with custom prompt to start dialog
  --session FNAME                   file to cache model state in (may be large!) (default: none)
  -f FNAME, --file FNAME     [       ] text output file name
   --ctx_size N              [2048   ] Size of the prompt context
  -n N, --n_predict N        [64     ] Number of tokens to predict
  --temp N                   [0.90   ] Temperature 
  --top_k N                  [40.00  ] top_k 
  --top_p N                  [1.00   ] top_p 
  --repeat_penalty N         [1.10   ] repeat_penalty 
  --xtts-voice NAME          [emma_1 ] xtts voice without .wav
  --xtts-url TEXT            [http://localhost:8020/] xtts/silero server URL, with trailing slash
  --xtts-control-path FNAME  [./talk-llama-fast/xtts/xtts_play_allowed.txt] path to xtts_play_allowed.txt
  --google-url TEXT          [http://localhost:8003/] langchain google-serper server URL, with /

Have you used a branch that you haven't pushed yet to build the demo version? If not, can you tell me what it is that I am missing?
