
llamagptj-chat's Introduction


LlamaGPTJ-chat

A simple command-line chat program for GPT-J, LLaMA, and MPT models, written in C++. Based on llama.cpp and uses the gpt4all-backend for full compatibility.

LlamaGPTJ-chat demo

Warning: Very early progress; might have bugs.

Installation

Since the program is written in C++, it should build and run on most Linux, macOS, and Windows systems. The Releases link has ready-made binaries. AVX2 is faster and works on most newer computers. When you run the program, it will check and print whether your computer has AVX2 support.

Download

git clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat
cd LlamaGPTJ-chat

You also need to download a model file; see supported models for details and links.

Build

On most systems, you only need the following to build:

mkdir build
cd build
cmake ..
cmake --build . --parallel

Note

If you have an old processor, you can turn AVX2 instructions OFF in the build step with the -DAVX2=OFF flag.

If you have a new processor, you can turn AVX512 instructions ON in the build step with the -DAVX512=ON flag.

On old macOS, set -DBUILD_UNIVERSAL=OFF to make the build x86 only instead of the universal Intel/ARM64 binary. On really old macOS, set -DOLD_MACOS=ON. This disables /save and /load but compiles on old Xcode.

On Windows you can now use Visual Studio (MSVC) or MinGW. If you want a MinGW build instead, add -G "MinGW Makefiles" to the cmake command.

On ARM64 Linux there are no ready-made binaries, but you can now build it from source.
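
For example, a build with AVX2 turned off would look like this (just an illustration of the flags above; pick the ones that match your hardware):

cmake .. -DAVX2=OFF
cmake --build . --parallel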

Usage

After compiling, the binary is located at:

build/bin/chat

But you're free to move it anywhere. A simple command using 4 threads to get started:

./chat -m "/path/to/modelfile/ggml-vicuna-13b-1.1-q4_2.bin" -t 4

or

./chat -m "/path/to/modelfile/ggml-gpt4all-j-v1.3-groovy.bin" -t 4

Happy chatting!

GPT-J, LLaMA, and MPT models

The current backend supports GPT-J, LLaMA, and MPT models.

GPT-J model

You need to download a GPT-J model first. Here are direct links to models:

They're around 3.8 GB each. The chat program loads the model into RAM at runtime, so you need enough memory to run it. You can get more details on GPT-J models from gpt4all.io or the nomic-ai/gpt4all GitHub repository.

LLaMA model

Alternatively, you can download a LLaMA model. The original weights are for research purposes and you can apply for access here. Below are direct links to derived models:

The LLaMA models are quite large: the 7B parameter versions are around 4.2 GB and the 13B parameter versions around 8.2 GB each. The chat program loads the model into RAM at runtime, so you need enough memory to run it. You can get more details on LLaMA models from the whitepaper or the Meta AI website.

MPT model

You can also download and use an MPT model instead. Here are direct links to MPT-7B models:

They're around 4.9 GB each. The chat program loads the model into RAM at runtime, so you need enough memory to run it. You can get more details on MPT models from the MosaicML website or the mosaicml/llm-foundry GitHub repository.

Detailed command list

You can view the help and full parameter list with: ./chat -h

usage: ./bin/chat [options]

A simple chat program for GPT-J, LLaMA, and MPT models.
You can set specific initial prompt with the -p flag.
Runs default in interactive and continuous mode.
Type '/reset' to reset the chat context.
Type '/save','/load' to save network state into a binary file.
Type '/save NAME','/load NAME' to rename saves. Default: --save_name NAME.
Type '/help' to show this help dialog.
Type 'quit', 'exit' or, 'Ctrl+C' to quit.

options:
  -h, --help            show this help message and exit
  -v, --version         show version and license information
  --run-once            disable continuous mode
  --no-interactive      disable interactive mode altogether (uses given prompt only)
  --no-animation        disable chat animation
  --no-saves            disable '/save','/load' functionality
  -s SEED, --seed SEED  RNG seed for --random-prompt (default: -1)
  -t N, --threads    N  number of threads to use during computation (default: 4)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: empty)
  --random-prompt       start with a randomized prompt.
  -n N, --n_predict  N  number of tokens to predict (default: 200)
  --top_k            N  top-k sampling (default: 40)
  --top_p            N  top-p sampling (default: 0.9)
  --temp             N  temperature (default: 0.9)
  --n_ctx            N  number of tokens in context window (default: 0)
  -b N, --batch_size N  batch size for prompt processing (default: 20)
  --repeat_penalty   N  repeat_penalty (default: 1.1)
  --repeat_last_n    N  last n tokens to penalize  (default: 64)
  --context_erase    N  percent of context to erase  (default: 0.8)
  --b_token             optional beginning wrap token for response (default: empty)
  --e_token             optional end wrap token for response (default: empty)
  -j,   --load_json FNAME
                        load options instead from json at FNAME (default: empty/no)
  --load_template   FNAME
                        load prompt template from a txt file at FNAME (default: empty/no)
  --save_log        FNAME
                        save chat log to a file at FNAME (default: empty/no)
  --load_log        FNAME
                        load chat log from a file at FNAME (default: empty/no)
  --save_dir        DIR
                        directory for saves (default: ./saves)
  --save_name       NAME
                        save/load model state binary at save_dir/NAME.bin (current: model_state)
                        context is saved to save_dir/NAME.ctx (current: model_state)
  -m FNAME, --model FNAME
                        model path (current: ./models/ggml-vicuna-13b-1.1-q4_2.bin)
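
As an illustration of combining these options (the paths and values here are made up, not defaults):

./chat -m "./models/ggml-gpt4all-j-v1.3-groovy.bin" -t 8 -n 256 --temp 0.7 --save_log chat.log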

Useful features

Here are some handy features and details on how to achieve them using command line options.

Save/load chat log and read output from other apps

By default, the program prints the chat to standard output (stdout), so if you're integrating the program into your own app, your app only needs to read stdout. You can also save the whole chat log to a text file with the --save_log option. There's an elementary way to remember your past conversation: simply load the saved chat log with the --load_log option when you start a new session.
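
As a rough sketch of reading that stdout from another program, here is a minimal C++ example. It assumes a POSIX system and placeholder paths; popen is just one of many ways to capture the output, not something this project provides:

    // Minimal sketch: run the chat binary once and collect its reply from stdout.
    // The binary path, model path, and prompt below are placeholders.
    #include <cstdio>
    #include <iostream>
    #include <string>

    int main() {
        const std::string cmd =
            "./chat -m \"./models/ggml-gpt4all-j-v1.3-groovy.bin\" "
            "-p \"Hello\" --no-interactive --no-animation";

        FILE* pipe = popen(cmd.c_str(), "r"); // POSIX; use _popen on Windows
        if (!pipe) return 1;

        std::string reply;
        char buffer[256];
        while (fgets(buffer, sizeof(buffer), pipe) != nullptr)
            reply += buffer;                  // accumulate everything chat prints
        pclose(pipe);

        std::cout << reply;
        return 0;
    }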

Run the program once without user interaction

If you only need the program to run once without any user interaction, one way is to set a prompt with -p "prompt" and use the --no-interactive and --no-animation flags. The program will read the prompt, print the answer, and close.
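
For example (the model path and prompt are placeholders):

./chat -m "/path/to/modelfile/ggml-gpt4all-j-v1.3-groovy.bin" -p "Summarize what a llama is in one sentence." --no-interactive --no-animation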

Add AI personalities and characters

If you want a personality for your AI, you can change prompt_template_sample.txt and use --load_template to load the modified file. The only constant is that your input during chat will be placed on the %1 line. Instructions, prompt, response, and everything else can be replaced however you want. Having different personality_template.txt files is an easy way to add different AI characters. With some models, giving both the AI and the user names instead of Prompt: and Response: can make the conversation flow more naturally, as the AI tries to mimic a conversation between two people.
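
As an illustrative sketch of such a template file (this is an assumed layout, not the shipped prompt_template_sample.txt; the %1 line marks where your input goes):

    The text below is a conversation between Captain Nemo and a passenger. Nemo answers briefly and stays in character.
    Passenger: %1
    Nemo: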

Ability to reset chat context

You can reset the chat at any time during chatting by typing /reset in the input field. This will clear the AI's memory of past conversations, logits, and tokens. You can then start the chat from a blank slate without having to reload the whole model again.

Load all parameters using JSON

You can also fetch parameters from a JSON file with the --load_json "/path/to/file.json" flag. Different models might perform better or worse with different input parameters, so using JSON files is a handy way to store and load all the settings at once. The JSON file loader is designed to be simple in order to avoid external dependencies, and as a result the JSON file must follow a specific format. Here is a simple example:

{"top_p": 1.0, "top_k": 50400, "temp": 0.9, "n_batch": 9}

This is useful when you want to store different temperature and sampling settings.

And a more detailed one:

{
"top_p": 1.0,
"top_k": 50400,
"temp": 0.9,
"n_batch": 20,
"threads": 12,
"prompt": "Once upon a time",
"load_template": "/path/to/prompt_template_sample.txt",
"model": "/path/to/ggml-gpt4all-j-v1.3-groovy.bin",
"no-interactive": "true"
}

This one loads the prompt from the JSON, uses a specific template, and runs the program once in non-interactive mode so the user does not have to type any input.
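
For example, you could then start the program with nothing more than (path illustrative):

./chat --load_json "/path/to/file.json"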

License

This project is licensed under the MIT License

llamagptj-chat's People

Contributors

itz-coffee, kuvaus, mjholub


llamagptj-chat's Issues

cmake .. error

On Windows (I had no problems compiling llama.cpp), I get the following upon running cmake ..:

H:\LlamaGPTJ-chat\build>cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.29.30037.0
-- The CXX compiler identification is MSVC 19.29.30037.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30037/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
CMake Error at CMakeLists.txt:94 (add_subdirectory):
The source directory

H:/LlamaGPTJ-chat/llmodel/llama.cpp

does not contain a CMakeLists.txt file.

-- Configuring incomplete, errors occurred!

How to give context

How can I add a context? For example, I would like to create a chatbot which would ask me to select between 3 products.

  1. Apple
  2. Strawberry
  3. Orange

And then ask what product I want and how much. How do I create that context?

Diego.

Here's how to compile and run under MINGW64 from Msys2

Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ mkdir build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ cd build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ mkdir models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cp ../../MODELS/ggml-vicuna-13b-1.1-q4_2.bin models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /mingw64/bin/gcc.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /mingw64/bin/g++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: unknown
-- Unknown architecture
-- Configuring done (25.8s)
-- Generating done (0.5s)
-- Build files have been written to: /home/Fixit/LlamaGPTJ-chat/build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.obj
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_0':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:781:15: warning: unused variable 'nb' [-Wunused-variable]
781 | const int nb = k / QK4_0;

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
1129 | block_q4_1 * restrict y = vy;
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
1127 | const int nb = k / QK4_1;

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q8_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1507:15: warning: unused variable 'nb' [-Wunused-variable]
1507 | const int nb = k / QK8_1;

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
9357 | const int ne2_ne3 = n/ne1; // ne2*ne3
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
9419 | const int ne2 = src0->ne[2]; // n_head -> this is k
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
9468 | switch (src0->type) {
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.obj
In file included from C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama.cpp:8:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool)':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:233:94: note: '#pragma message: warning: You are building for pre-Windows 8; prefetch not supported'
233 | #pragma message("warning: You are building for pre-Windows 8; prefetch not supported")
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
201 | llama_mmap(struct llama_file * file, bool prefetch = true) {
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.obj
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.obj
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.obj
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.obj
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.obj
[ 83%] Linking CXX static library libllmodel.a
/mingw64/bin/ar.exe qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.obj CMakeFiles/llmodel.dir/llamamodel.cpp.obj CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj CMakeFiles/llmodel.dir/llmodel_c.cpp.obj CMakeFiles/llmodel.dir/mpt.cpp.obj CMakeFiles/llmodel.dir/utils.cpp.obj
/mingw64/bin/ranlib.exe libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.obj
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
LlamaGPTJ-chat: done loading!

hello
Hello! How can I help you today?
/quit
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$

Took 5 minutes to respond with just saying hello

UPDATE #1
$ ./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 4
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: .............................................. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!

hello
Hi! How can I assist you today?
/quit
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$
Just over 2 minutes to respond to my hello

Error report: Terminated when loading configuration from a json file

I put some options into a json file, and run the chat program by "chat.exe -j chat.json", then an error occurred:

LlamaGPTJ-chat (v. 0.2.3)
Your computer supports AVX2
LlamaGPTJ-chat: parsing options from json: chat.json
LlamaGPTJ-chat: loading models\ggml-mpt-7b-chat.bin
mpt_model_load: loading model from 'models\ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab = 50432
mpt_model_load: n_ctx = 2048
mpt_model_load: n_embd = 4096
mpt_model_load: n_head = 32
mpt_model_load: n_layer = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv = 0.000000
mpt_model_load: ftype = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size = 1024.00 MB
mpt_model_load: .............................. done
mpt_model_load: model size = 4629.02 MB / num tensors = 194
LlamaGPTJ-chat: done loading!

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

The content of chat.json is:

{
"top_p": 0.9,
"top_k": 40,
"temp": 0.9,
"threads": 4,
"save_log": "chat.log",
"load_template": "prompt_template.txt",
"model": "models\ggml-mpt-7b-chat.bin"
}

However, when I use the same options via a Windows batch file, it runs OK.

The batch file contains:

@echo off
chat.exe --top_p 0.9 --top_k 40 --temp 0.9 --threads 4 --save_log chat.log --load_template prompt_template.txt --model models\ggml-mpt-7b-chat.bin

Cannot build the app

I got some errors when building the app on AlmaLinux 8.8. Can you help?

$ ls /usr/lib/libpthread.*
/usr/lib/libpthread.so  /usr/lib/libpthread.so.0
$ ls /usr/lib/libstdc++.*
/usr/lib/libstdc++.so  /usr/lib/libstdc++.so.6  /usr/lib/libstdc++.so.6.0.25
$ ls /usr/lib/libm.*
/usr/lib/libm.so  /usr/lib/libm.so.6
$ ls /usr/lib/libc.*
/usr/lib/libc.so  /usr/lib/libc.so.6

$ '/mnt/Archive/Downloads/cmake-3.28.0-rc5-linux-x86_64/bin/cmake' --build . --parallel
[  8%] Built target ggml
[ 25%] Built target llama
[ 83%] Built target llmodel
[ 91%] Linking CXX executable ../bin/chat
/usr/bin/ld: cannot find -lpthread
/usr/bin/ld: cannot find -lstdc++
/usr/bin/ld: cannot find -lm
/usr/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
gmake[2]: *** [src/CMakeFiles/chat.dir/build.make:99: bin/chat] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:217: src/CMakeFiles/chat.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

Can't compile, llamamodel errors.

Hi, I really like your project because I need to run mpt-7b on Windows. I might be doing this wrong, but I can't get it to compile.

The gpt4all-backend\llama.cpp folder is empty, so I assume I'm supposed to put a copy in there. I've put the latest llama.cpp project in there. cmake runs fine and sets up the build.

I know llama.cpp keeps changing, and when I try to compile I get the following errors (I've skipped most of the output and warnings):

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\gptj.cpp(223,25): error C2065: 'GGML_TYPE_Q4_2': undeclared identifier

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\llamamodel.cpp(52,19): error C2039: 'n_parts': is not a member of 'llama_context_params'

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\llamamodel.cpp(52,39): error C2039: 'n_parts': is not a member of 'gpt_params'

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\llamamodel.cpp(100,12): error C2664: 'size_t llama_set_state_data(llama_context *,uint8_t *)': cannot convert argument 2 from 'const uint8_t *' to 'uint8_t *'

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\llamamodel.cpp(184,26): error C3861: 'llama_sample_top_p_top_k': identifier not found

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\mpt.cpp(230,25): error C2065: 'GGML_TYPE_Q4_2': undeclared identifier

4>C:\Projects\LlamaGPTJ_chat_00\gpt4all-backend\mpt.cpp(553,53): error C2660: 'ggml_alibi': function does not take 4 arguments

4>Done building project "llmodel.vcxproj" -- FAILED.

5>LINK : warning LNK4044: unrecognized option '/static-libgcc'; ignored
5>LINK : warning LNK4044: unrecognized option '/static-libstdc++'; ignored
5>LINK : warning LNK4044: unrecognized option '/static'; ignored
5>LINK : fatal error LNK1181: cannot open input file '..\gpt4all-backend\Release\llmodel.lib'
5>Done building project "chat.vcxproj" -- FAILED.
6>------ Skipped Rebuild All: Project: ALL_BUILD, Configuration: Release x64 ------
6>Project not selected to build for this solution configuration
========== Rebuild All: 3 succeeded, 2 failed, 1 skipped ==========


Do you have any plans to update the project, and if not, could you give me any tips for fixing the code? I'm really not familiar with how any of this stuff works, so I'm fumbling around in the dark at the moment.

Illegal instruction

$ ./chat -m "E:/Program Files (x86)/ChatGPT/Models/ggml-vicuna-13b-1.1-q4_2.bin" -t 2
LlamaGPTJ-chat (v. 0.3.0)
LlamaGPTJ-chat: loading E:/Program Files (x86)/ChatGPT/Models/ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from E:/Program Files (x86)/ChatGPT/Models/ggml-vicuna-13b-1.1-q4_2.bin
Illegal instruction

What is this? Can you help me?

nm asdf

Somehow this got crossed from another project. I'm checking you guys out too. :)

Windows Defender Treats it as Virus

Hello! I wanted to download the latest version of the programme. But when I do, Windows Defender detects it as a trojan. This is kind of worrying since this did not happen before. Here's the picture of it blocking and removing it.
Threat detected

Also, I noticed that you also remove all the older version downloads. May I ask why you did so?

Way to reset the chat?

My chats often get stuck in repetitive loops, or there are situations where I want to issue a new command with no memory of the previous prompts and responses. On GPT4ALL you can press the button to clear the chat to do that.

In LlamaGPTJ-chat I've been shutting it down and restarting to achieve the same, but this comes with the time overhead and potential memory issues of re-loading the model. I know it's not much, but shaving 5 seconds off the time by not having to reload would be handy.

Is there already a way to reset the chat (especially for the GPT-J models, since that's what I'm using), and if not, can you suggest how I might do this? As a wild guess I did a quick experiment that used a modified version of the code where the context gets cleared when it runs out of room, but it didn't work; it would crash or get stuck in loops. So I was looking at this bit of code as a possible hint at how to do it, but don't know if that's really relevant at all.

    // Check if the context has run out...
    if (promptCtx.n_past + batch.size() > promptCtx.n_ctx) {
        const int32_t erasePoint = promptCtx.n_ctx * promptCtx.contextErase;
        // Erase the first percentage of context from the tokens...
        std::cerr << "GPTJ: reached the end of the context window so resizing\n";
        promptCtx.tokens.erase(promptCtx.tokens.begin(), promptCtx.tokens.begin() + erasePoint);
        promptCtx.n_past = promptCtx.tokens.size();
        recalculateContext(promptCtx, recalculateCallback);
        assert(promptCtx.n_past + batch.size() <= promptCtx.n_ctx);
    }

Functionality to have load history from a chat log file

Right now, every time you load the program and the model, it's like starting a brand-new chat. Considering that it's possible to save the chat to a log file, it would be quite awesome to be able to load that log and continue the conversation from there. Kind of like how the ChatGPT API works. (You can pass a string of messages to the "history" parameter and the assistant will "remember" the previous conversation.)

High cpu when launching as a process, and parent app quits

I have an app and I added a feature to launch this 'chat' tool as a separate process and I send it stdin and parse stdout.

It works great, but if I quit my app, the chat process keeps running and the cpu goes to 80%.

It seems to get stuck in an endless loop when stdin goes down, but I haven't looked at it that carefully.

Converting Models from .pth to .ggml

Hello,
I wanted to ask how you are able to convert the MPT-7B weights from the HuggingFace checkpoints to .ggml. Is there a guide to this? Does the llama.cpp repo conversion code work out of the box?

Thanks!

What does save/load do?

Type '/save','/load' to save network state into a binary file.

After I run /save and then /load in a new session, it seems to have forgotten my previous prompts. What exactly do these commands do, and what are they used for?

how to get rid of AVX? -DAVX2=OFF doesn't work

user@gpt4all:~$ uname -a
Linux gpt4all 6.2.16-5-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z) x86_64 x86_64 x86_64 GNU/Linux
user@gpt4all:~$ git clone --recurse-submodules https://github.com/kuvaus/LlamaGPTJ-chat
cd LlamaGPTJ-chat
Cloning into 'LlamaGPTJ-chat'...
remote: Enumerating objects: 1191, done.
remote: Counting objects: 100% (300/300), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 1191 (delta 258), reused 250 (delta 233), pack-reused 891
Receiving objects: 100% (1191/1191), 1.09 MiB | 9.24 MiB/s, done.
Resolving deltas: 100% (736/736), done.
Submodule 'llama.cpp' (https://github.com/manyoso/llama.cpp) registered for path 'gpt4all-backend/llama.cpp'
Cloning into '/home/user/LlamaGPTJ-chat/gpt4all-backend/llama.cpp'...
remote: Enumerating objects: 1977, done.
remote: Counting objects: 100% (542/542), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 1977 (delta 516), reused 511 (delta 511), pack-reused 1435
Receiving objects: 100% (1977/1977), 2.03 MiB | 6.20 MiB/s, done.
Resolving deltas: 100% (1277/1277), done.
Submodule path 'gpt4all-backend/llama.cpp': checked out '03ceb39c1e729bed4ad1dfa16638a72f1843bf0c'
user@gpt4all:~/LlamaGPTJ-chat$ mkdir buid
user@gpt4all:~/LlamaGPTJ-chat$ rmdir buid
user@gpt4all:~/LlamaGPTJ-chat$ mkdir build
user@gpt4all:~/LlamaGPTJ-chat$ cd build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake -DAVX2=OFF ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build -DAVX2=OFF . --parallel
Unknown argument .
Usage: cmake --build <dir> [options] [-- [native-options]]
       cmake --build --preset <preset> [options] [-- [native-options]]
...
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
...
[ 25%] Built target llama
...
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.o
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
.Segmentation fault
user@gpt4all:~/LlamaGPTJ-chat/build$ cd ..
user@gpt4all:~/LlamaGPTJ-chat$ rm -fr build
user@gpt4all:~/LlamaGPTJ-chat$ mkdir build
user@gpt4all:~/LlamaGPTJ-chat$ cd build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake -D AVX2=OFF ..
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
...
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/LlamaGPTJ-chat/build
user@gpt4all:~/LlamaGPTJ-chat/build$ cmake --build . --parallel
...
[100%] Built target chat
user@gpt4all:~/LlamaGPTJ-chat/build$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer does not support AVX1 or AVX2
The program will likely not run.
.Segmentation fault
user@gpt4all:~/LlamaGPTJ-chat/build$

I tried -DAVX2=OFF first and then -D AVX2=OFF second

What am I doing wrong?

Build fails on aarch64 linux

After cloning recursively I inspected the cmake config with ccmake ..

Not having an Intel CPU, I disabled all *AVX options:

  • AVX2 OFF
  • CMAKE_BUILD_TYPE Release
  • CMAKE_INSTALL_PREFIX /usr/local
  • LLAMA_ACCELERATE ON
  • LLAMA_ALL_WARNINGS ON
  • LLAMA_ALL_WARNINGS_3RD_PARTY OFF
  • LLAMA_AVX OFF
  • LLAMA_AVX2 OFF
  • LLAMA_AVX512 OFF
  • LLAMA_AVX512_VBMI OFF
  • LLAMA_AVX512_VNNI OFF
  • LLAMA_BUILD_EXAMPLES OFF
  • LLAMA_BUILD_TESTS OFF
  • LLAMA_CLBLAST OFF
  • LLAMA_CUBLAS OFF
  • LLAMA_F16C ON
  • LLAMA_FMA ON
  • LLAMA_GPROF OFF
  • LLAMA_LTO OFF
  • LLAMA_NATIVE OFF
  • LLAMA_OPENBLAS OFF
  • LLAMA_SANITIZE_ADDRESS OFF
  • LLAMA_SANITIZE_THREAD OFF
  • LLAMA_SANITIZE_UNDEFINED OFF
  • LLAMA_STATIC OFF

then reconfigured and generated new cmake files.

cmake --build . --parallel 2>&1> errors.txt

/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `parse_json_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
chat.cpp:(.text+0x555c): undefined reference to `__aarch64_ldadd4_acq_rel'
/usr/bin/ld: chat.cpp:(.text+0x556c): undefined reference to `__aarch64_ldadd4_acq_rel'
/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `std::__future_base::_Deferred_state<std::thread::_Invoker<std::tuple<void (*)()> >, void>::_M_complete_async()': undefined reference to `__aarch64_swp4_rel'
/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::~basic_regex()': undefined reference to `__aarch64_ldadd4_acq_rel'
/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `std::__basic_future<void>::wait() const': undefined reference to `__aarch64_ldset4_relax'
/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()': undefined reference to `__aarch64_ldadd4_acq_rel'
/usr/bin/ld: CMakeFiles/chat.dir/chat.cpp.o: in function `std::async<void (&)()>(std::launch, void (&)())': undefined references to `__aarch64_ldadd4_acq_rel', `__aarch64_swp1_acq_rel'
...
If anyone has ideas for getting it built on aarch64, please share. Thanks!

bug: Seed parameter doesn't work.

It appears the seed parameter doesn't work:

./chat -m ggml-gpt4all-j-v1.3-groovy.bin -s 1

I can provide it with the same prompt (What is a dog?) on two different clean sessions and the response may be different. I would have expected the exact same result for model + template + context + seed + prompt combination.

Have I missed something else that might cause different results or is seed not working?

Update GPT4All/llama.cpp

GPT4All uses a newer version of llama.cpp which can handle the new ggml formats. Currently this throws an error similar to the following if you attempt to load a model of a newer version:

error loading model: unknown (magic, version) combination: 67676a74, 00000002; is this really a GGML file?

feat: Better template formatting options.

It would be nice to have more formatting options like being able to wrap the output in certain tokens to assist with reading from stdout.

The following is an example format, altering the user's input and the model's output:

### Instruction:
The prompt below is a question to answer, write an appropriate response.
### Prompt:
How many words does the sentence '%1' have?
### Response:
<START>%2<FINISH>

This sort of behavior may be possible by asking the model to format its output in a specific way, but that is error-prone and may not work on all models.

Furthermore, it would be nice to have more information about the template format and how it is parsed and what parts are important. (E.g. are the # symbols before the headers important? Can they be replaced with other symbols or removed?)

Not enough space in the context's memory pool (needed 865484240, available 863773936)

I'm attempting to use ggml-mpt-7b-chat.bin to summarize the following article and am consistently getting the error:

Jamie Komoroski’s blood alcohol level was over three times the legal limit when she allegedly drove her car into a golf-cart style vehicle carrying a newly married couple away from their wedding reception last month, killing the bride, according to a South Carolina Law Enforcement Division toxicology report. In the report shared with CNN by the Folly Beach Police Department, Komoroski, 25, was found to have had a blood alcohol content of 0.261%. South Carolina law prohibits driving with a blood alcohol content of 0.08% or higher. The bride, Samantha Hutchinson, 34, from Charlotte, North Carolina, died of blunt force injuries according to the Charleston County Coroner’s Office. Her husband, Aric Hutchinson, and two others were also injured in the crash. Komoroski is charged with one count of reckless homicide and three counts of felony DUI resulting in great bodily harm, according to online court records. Her vehicle was traveling 65 mph in a 25-mph zone, according to Police Chief Andrew Gilreath. Komoroski refused a field sobriety test after the incident on April 28 and a warrant was issued for her blood to be taken for testing, according to an affidavit. “We cannot fathom what the families are going through and offer our deepest sympathies. We simply ask that there not be a rush to judgment. Our court system is founded upon principles of justice and mercy and that is where all facts will come to light,” Christopher Gramiccioni, an attorney for Komoroski, told CNN in a statement. Please summarize the article.

Here are the software details:
LlamaGPTJ-chat (v. 0.1.8)
LlamaGPTJ-chat: parsing options from json: chat.json
LlamaGPTJ-chat: loading ggml-mpt-7b-chat.bin
mpt_model_load: loading model from 'ggml-mpt-7b-chat.bin' - please wait ...
mpt_model_load: n_vocab = 50432
mpt_model_load: n_ctx = 2048
mpt_model_load: n_embd = 4096
mpt_model_load: n_head = 32
mpt_model_load: n_layer = 32
mpt_model_load: alibi_bias_max = 8.000000
mpt_model_load: clip_qkv = 0.000000
mpt_model_load: ftype = 2
mpt_model_load: ggml ctx size = 5653.09 MB
mpt_model_load: kv self size = 1024.00 MB
mpt_model_load: ................................ done
mpt_model_load: model size = 4629.02 MB / num tensors = 194
LlamaGPTJ-chat: done loading!

chat.json:
{
"top_p": 0.9,
"top_k": 50432,
"temp": 0.3,
"n_batch": 56,
"model": "ggml-mpt-7b-chat.bin",
"threads": 4,
"n_predict": 64,
"n_ctx": 512
}

The hardware specifications (Windows 11):
Processor AMD Ryzen 7 1700X Eight-Core Processor 3.40 GHz
Installed RAM 64.0 GB
System type 64-bit operating system, x64-based processor

Is there something I don't have configured correctly?
