pabannier / biogpt.cpp
Port of Microsoft's BioGPT in C/C++ using ggml
Compiling with:
make CC=gcc-11 CPP=g++-11 CXX=g++-11 LD=g++-1
fails to produce a biogpt executable; only the file main is created.
Build log:
I biogpt.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native
I CXXFLAGS: -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native
I LDFLAGS:
I CC:
I CXX:
gcc-11 -I. -O3 -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c mosestokenizer.cpp -o mosestokenizer.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c bpe.cpp -o bpe.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c biogpt.cpp -o biogpt.o
biogpt.cpp: In function ‘bool biogpt_model_load(const string&, biogpt_model&, biogpt_vocab&, uint8_t)’:
biogpt.cpp:210:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
210 | .mem_size = ctx_size,
| ^
biogpt.cpp:211:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
211 | .mem_buffer = NULL,
| ^
biogpt.cpp:212:13: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
212 | .no_alloc = false,
| ^
biogpt.cpp:364:89: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
364 | fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
365 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
biogpt.cpp:364:95: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
364 | fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld], expected [%d, %d]\n",
| ~~~^
| |
| long long int
| %ld
365 | func, name.data(), tensor->ne[0], tensor->ne[1], ne[0], ne[1]);
| ~~~~~~~~~~~~~
| |
| int64_t {aka long int}
biogpt.cpp: In function ‘bool biogpt_eval(const biogpt_model&, int, int, const std::vector&, std::vector&, size_t&)’:
biogpt.cpp:596:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
596 | .mem_size = buf_size,
| ^
biogpt.cpp:597:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
597 | .mem_buffer = buf,
| ^
biogpt.cpp:598:9: warning: C++ designated initializers only available with ‘-std=c++20’ or ‘-std=gnu++20’ [-Wpedantic]
598 | .no_alloc = false,
| ^
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c main.cpp -o main.o
g++-11 -I. -O3 -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native main.o biogpt.o mosestokenizer.o bpe.o ggml.o -o main
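The log above shows two kinds of warnings, and neither stops the link step: the last command still produces main. Both could be avoided under -std=c++11. The following is a minimal sketch, not the project's code: init_params is a hypothetical stand-in struct that only mirrors the three field names from the log, and make_params and shape_message are illustrative names. It assigns fields after value-initialization instead of using designated initializers, and prints int64_t with the PRId64 macro from <cinttypes> instead of %lld.

```cpp
#include <cinttypes>  // PRId64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the parameter struct seen in the log
// (fields mem_size, mem_buffer, no_alloc); not the real ggml definition.
struct init_params {
    std::size_t mem_size;
    void *      mem_buffer;
    bool        no_alloc;
};

// C++11-friendly alternative to designated initializers:
// value-initialize the struct, then assign each field by name.
init_params make_params(std::size_t size, void * buffer) {
    init_params params = init_params();
    params.mem_size   = size;
    params.mem_buffer = buffer;
    params.no_alloc   = false;
    return params;
}

// Builds a shape-mismatch message using PRId64, which expands to the
// correct conversion specifier for int64_t on the current platform.
std::string shape_message(const char * name, int64_t got0, int64_t got1,
                          int want0, int want1) {
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "tensor '%s' has wrong shape: got [%" PRId64 ", %" PRId64
                  "], expected [%d, %d]",
                  name, got0, got1, want0, want1);
    return std::string(buf);
}
```

The space between the string literal and PRId64 matters in C++11 and later, because a macro written directly against a closing quote is parsed as a user-defined literal suffix.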
Decode the entire sequence at once, instead of on a per-token basis.
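As a rough illustration of the difference, the sketch below joins all tokens first and only then resolves word boundaries, which a token-by-token decoder cannot do cleanly. It assumes a fastBPE-style vocabulary in which a token that ends a word carries a "</w>" suffix. That convention, the toy vocabulary, and the name decode_sequence are assumptions for illustration and are not taken from the biogpt.cpp source.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Decodes a whole sequence of token ids in one pass.
// Assumption: tokens that end a word carry a "</w>" suffix.
std::string decode_sequence(const std::map<int, std::string> & id_to_token,
                            const std::vector<int> & ids) {
    // Step 1: concatenate every known token; unknown ids are skipped.
    std::string joined;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        std::map<int, std::string>::const_iterator it = id_to_token.find(ids[i]);
        if (it != id_to_token.end()) {
            joined += it->second;
        }
    }

    // Step 2: turn each end-of-word marker into a single space.
    const std::string marker = "</w>";
    std::string text;
    std::size_t pos = 0;
    while (true) {
        const std::size_t hit = joined.find(marker, pos);
        if (hit == std::string::npos) {
            text += joined.substr(pos);
            break;
        }
        text += joined.substr(pos, hit - pos);
        text += ' ';
        pos = hit + marker.size();
    }

    // Step 3: drop the trailing space left by the final marker.
    while (!text.empty() && text.back() == ' ') {
        text.pop_back();
    }
    return text;
}
```

With a toy vocabulary such as 0 = "tras", 1 = "tuzumab</w>", 2 = "is</w>", the ids 0, 1, 2 are joined into one string before the markers are replaced, so the two sub-word pieces of the first word end up adjacent with no space between them.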
Thanks for your great tool.
I've compiled biogpt and converted the model to ggml successfully, but I cannot run it. When I run ./bin/biogpt -m path/to/model or just ./bin/biogpt -h, it throws an error:
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Perl Uniprops file not available.
I checked that the perluniprops folder is in the data directory.
I am using macOS 10.15 on an Intel CPU. What should I do to fix this?
Thanks for your help.
Hi Pierre,
Awesome idea to do this project. I am trying to get it running, but there is no biogpt executable being generated after running make; there is just the main executable. If I try to run ./main -p "trastuzumab" I get an error:
libc++abi: terminating with uncaught exception of type char const*
zsh: abort ./main -p "trastuzumab"
Was wondering what the right way to run this is.
Thanks!
How can I run the large model? https://huggingface.co/microsoft/BioGPT-Large/tree/main
Thanks for taking the time to build this! Awesome initiative.
So I'm stuck here because I did this:
mkdir build && cd build
cmake ..
cmake --build . --config Release
I go back to the project root folder, download the weights into a weights folder, and run the convert script:
python convert.py --dir-model ./weights/ --out-dir ./ggml_weights
and all is well. I get the ggml_weights folder.
This is now my directory structure:
.
├── biogpt.cpp
├── biogpt.h
├── bpe.cpp
├── bpe.h
├── build
│   ├── bin
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   ├── cmake_install.cmake
│   ├── compile_commands.json
│   ├── examples
│   ├── ggml
│   └── Makefile
├── CMakeLists.txt
├── convert.py
├── data
│   ├── nonbreaking_prefixes
│   └── perluniprops
├── examples
│   ├── CMakeLists.txt
│   ├── main
│   └── quantize
├── ggml
│   ├── build.zig
│   ├── ci
│   ├── cmake
│   ├── CMakeLists.txt
│   ├── examples
│   ├── ggml.pc.in
│   ├── include
│   ├── LICENSE
│   ├── README.md
│   ├── requirements.txt
│   ├── scripts
│   ├── src
│   └── tests
├── ggml_weights
│   └── ggml-model.bin
├── mosestokenizer.cpp
├── mosestokenizer.h
├── README.md
└── weights
    ├── config.json
    ├── merges.txt
    ├── pytorch_model.bin
    ├── README.md
    └── vocab.json
Then I go to build/bin and run the executable:
cd build/bin
./main -p "trastuzumab"
terminate called after throwing an instance of 'std::runtime_error'
what(): Perl Uniprops file not available.
fish: Job 1, './main -p "trastuzumab"' terminated by signal SIGABRT (Abort)
So for some reason the executable doesn't run: it reports that the Perl Uniprops file is missing, even though perluniprops is already in your data folder.
What am I doing wrong?
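One possible cause, not verified against the biogpt.cpp source, is that the data directory is looked up relative to the current working directory. In that case running from build/bin would miss data/perluniprops, which sits in the project root in the tree above. The sketch below is a small diagnostic under that assumption: is_directory and first_existing_dir are hypothetical helper names, and stat() is the POSIX call, so it targets Linux and macOS.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/stat.h>  // POSIX stat()

// Returns true if `path` names an existing directory.
bool is_directory(const std::string & path) {
    struct stat info;
    return stat(path.c_str(), &info) == 0 && S_ISDIR(info.st_mode);
}

// Returns the first candidate that is an existing directory,
// or an empty string if none of them is.
std::string first_existing_dir(const std::vector<std::string> & candidates) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (is_directory(candidates[i])) {
            return candidates[i];
        }
    }
    return std::string();
}
```

Called from build/bin with the candidates "data/perluniprops" and "../../data/perluniprops", it would show which of the two relative paths resolves from that directory. If only the second one does, running the executable from the project root is a quick way to test the hypothesis.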