tiktoken_ruby's Introduction

tiktoken_ruby

Tiktoken is a BPE tokenizer from OpenAI, used with their GPT models. This gem is a wrapper around it, aimed primarily at enabling accurate counts of the GPT model tokens used.
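To make "BPE tokenizer" concrete, here is a toy byte-pair encoding in plain Ruby. It is only a sketch of the general technique, not tiktoken's implementation: the ranks table is invented for illustration, the sketch starts from characters rather than bytes, and real encodings such as cl100k_base use large learned rank tables plus a regex pre-split. Each step merges the adjacent pair with the lowest rank until no listed pair remains.

```ruby
# Toy byte-pair encoding (BPE), for illustration only. The ranks table used
# below is an invented example; it is not tiktoken's data or implementation.
def toy_bpe(word, ranks)
  # Start from single characters (real byte-level BPE starts from bytes).
  parts = word.chars
  loop do
    # Find the adjacent pair whose merged string has the lowest rank.
    best = nil
    parts.each_cons(2).with_index do |(a, b), i|
      rank = ranks[a + b]
      best = [rank, i] if rank && (best.nil? || rank < best[0])
    end
    break if best.nil? # no mergeable pair left

    i = best[1]
    parts[i, 2] = [parts[i] + parts[i + 1]] # replace the pair with its merge
  end
  parts
end

ranks = { "ll" => 0, "he" => 1, "hell" => 2, "hello" => 3 } # hypothetical
p toy_bpe("hello", ranks)  # => ["hello"]
p toy_bpe("yellow", ranks) # => ["y", "e", "ll", "o", "w"]
```

A word covered by the merge table collapses into a single token, while an unseen word falls apart into smaller pieces, which is why token counts depend on the exact encoding.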

Request for maintainers

I can't really put substantial time into maintaining this, probably no more than a couple of hours every few months. If you have experience maintaining Ruby gems and would like to lend a hand, please send me an email or reply to this issue.

Installation

Install the gem and add to the application's Gemfile by executing:

$ bundle add tiktoken_ruby

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install tiktoken_ruby

Usage

Usage should be very similar to the Python library. Here's a simple example:

Encode and decode text

require 'tiktoken_ruby'
enc = Tiktoken.get_encoding("cl100k_base")
enc.decode(enc.encode("hello world")) #=> "hello world"

Encoders can also be retrieved by model name:

require 'tiktoken_ruby'

enc = Tiktoken.encoding_for_model("gpt-4")
enc.encode("hello world").length #=> 2
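Since the gem's main purpose is counting tokens, a couple of small helpers are often handy. The sketch below is hypothetical (token_count and truncate_to_tokens are not part of tiktoken_ruby); it works with any object that has encode and decode methods, such as the encoder returned by Tiktoken.encoding_for_model. A whitespace-splitting WordEncoder stands in for the real encoder so the snippet runs without the gem; its counts will not match GPT tokens.

```ruby
# Hypothetical helpers (not part of tiktoken_ruby). They accept any encoder
# responding to #encode(text) -> token array and #decode(tokens) -> text.
def token_count(encoder, text)
  encoder.encode(text).length
end

# Keep at most max_tokens tokens of text. With a real BPE encoder, cutting
# between tokens can split a multi-byte character at the end.
def truncate_to_tokens(encoder, text, max_tokens)
  tokens = encoder.encode(text)
  return text if tokens.length <= max_tokens

  encoder.decode(tokens.first(max_tokens))
end

# Stand-in encoder so the example runs without the gem: one token per
# space-separated word. Real GPT token counts will differ.
class WordEncoder
  def initialize
    @vocab = []
  end

  def encode(text)
    text.split(" ").map do |word|
      @vocab.index(word) || (@vocab << word).size - 1
    end
  end

  def decode(tokens)
    tokens.map { |id| @vocab.fetch(id) }.join(" ")
  end
end

enc = WordEncoder.new # with the gem: Tiktoken.encoding_for_model("gpt-4")
puts token_count(enc, "the quick brown fox")           # prints: 4
puts truncate_to_tokens(enc, "the quick brown fox", 2) # prints: the quick
```

Taking the encoder as an argument keeps the helpers independent of any one model, so the same code can be reused with whichever encoding a request needs.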

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/iapark/tiktoken_ruby.

To get started with development:

git clone https://github.com/IAPark/tiktoken_ruby.git
cd tiktoken_ruby
bundle install
bundle exec rake compile
bundle exec rake spec

License

The gem is available as open source under the terms of the MIT License.

tiktoken_ruby's Issues

LoadError: /lib/x86_64-linux-gnu/libc.so: invalid ELF header

I'm receiving the following error when loading this gem in GitHub Actions with the ubuntu-latest image. Any ideas?

LoadError: /lib/x86_64-linux-gnu/libc.so: invalid ELF header - /home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/tiktoken_ruby-0.0.5-x86_64-linux-musl/lib/tiktoken_ruby/3.0/tiktoken_ruby.so
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/tiktoken_ruby-0.0.5-x86_64-linux-musl/lib/tiktoken_ruby.rb:8:in `require_relative'
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/tiktoken_ruby-0.0.5-x86_64-linux-musl/lib/tiktoken_ruby.rb:8:in `<top (required)>'
/home/runner/work/payscore/payscore/payscore/config/application.rb:7:in `<top (required)>'
/home/runner/work/payscore/payscore/payscore/Rakefile:4:in `require_relative'
/home/runner/work/payscore/payscore/payscore/Rakefile:4:in `<top (required)>'
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/commands/rake/rake_command.rb:20:in `block in perform'
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/commands/rake/rake_command.rb:18:in `perform'
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/command.rb:50:in `invoke'
/home/runner/work/payscore/payscore/payscore/vendor/ruby/3.0.0/gems/railties-6.1.4.1/lib/rails/commands.rb:18:in `<top (required)>'
bin/rails:4:in `require'
bin/rails:4:in `<main>'
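The gem path in this trace ends in x86_64-linux-musl, the build meant for musl-based systems such as Alpine, while Ubuntu runners use glibc. That points to a platform mismatch, possibly from a Gemfile.lock that records only the musl platform; the issue itself does not confirm the cause. The check below is a hypothetical sketch (musl_mismatch? is not part of any library) that compares installed tiktoken_ruby variants against RUBY_PLATFORM.

```ruby
require "rubygems"

# Hypothetical diagnostic: true when a gem's platform suffix asks for musl
# but the running Ruby is not a musl build. Plain string checks only.
def musl_mismatch?(gem_full_name, ruby_platform = RUBY_PLATFORM)
  gem_wants_musl = gem_full_name.end_with?("-linux-musl")
  ruby_is_musl = ruby_platform.include?("musl")
  gem_wants_musl && !ruby_is_musl
end

puts "RUBY_PLATFORM: #{RUBY_PLATFORM}"

# full_name includes the platform for native builds,
# e.g. "tiktoken_ruby-0.0.5-x86_64-linux-musl".
Gem::Specification.find_all_by_name("tiktoken_ruby").each do |spec|
  status = musl_mismatch?(spec.full_name) ? "musl build on non-musl Ruby" : "ok"
  puts "#{spec.full_name}: #{status}"
end
```

If a mismatch shows up, adding the glibc platform to the lockfile with bundle lock --add-platform x86_64-linux and reinstalling is one remedy worth trying.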

Failed to build on Ventura 13.4.1 M1 and Ruby 2.7.6

When trying to build the gem under Ventura 13.4.1 M1 and Ruby 2.7.6, I get the following error:

checking for gcc... yes
checking for g++... yes
checking for libtool -static... no
checking for cargo... yes

current directory: /Users/dev/gems/tiktoken_ruby-0.0.4/ext/tiktoken_ruby
make "DESTDIR=" clean

current directory: /Users/dev/gems/tiktoken_ruby-0.0.4/ext/tiktoken_ruby
make "DESTDIR="
generating target/release/libtiktoken_ruby.dylib (release)
cargo rustc  --manifest-path ./Cargo.toml --target-dir target --lib --profile release -- -C linker=gcc -L native=/Users/dev/rubies/ruby-2.7.6/lib -L native=/opt/homebrew/opt/postgresql@11/lib -L native=/opt/homebrew/opt/libffi/lib -C link-arg=-Wl,-undefined,dynamic_lookup -C
link-arg=-Wl,-multiply_defined,suppress -L native=/opt/homebrew/opt/libyaml/lib -L native=/opt/homebrew/opt/libksba/lib -L native=/opt/homebrew/opt/readline/lib -L native=/opt/homebrew/opt/zlib/lib -L native=/opt/homebrew/opt/[email protected]/lib
error: failed to get `tiktoken-rs` as a dependency of package `tiktoken_ruby v0.1.0 (/Users/dev/gems/tiktoken_ruby-0.0.4/ext/tiktoken_ruby)`

Caused by:
  failed to load source for dependency `tiktoken-rs`

Caused by:
  Unable to update https://github.com/IAPark/tiktoken-rs.git#5231fbf4

Caused by:
  failed to parse manifest at `/Users/dev/.cargo/git/checkouts/tiktoken-rs-c5fa179ec3aafdfe/5231fbf/tiktoken-rs/Cargo.toml`

Caused by:
  invalid type: map, expected a sequence for key `package.authors`
make: *** [target/release/libtiktoken_ruby.dylib] Error 101

Any idea what could be the issue?
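The last "Caused by" line suggests the local Cargo is too old to read the tiktoken-rs manifest. "invalid type: map, expected a sequence" for package.authors is what Cargo versions before 1.64 typically report when a manifest uses workspace inheritance (authors.workspace = true), a feature stabilized in Rust 1.64. That is an inference from the message, not something the issue confirms. A small check, with hypothetical helper names:

```ruby
require "rubygems" # for Gem::Version

# Assumption: inherited manifest fields need Cargo 1.64 or newer.
MIN_CARGO_FOR_INHERITANCE = Gem::Version.new("1.64.0")

# Extract the version from `cargo --version` output,
# e.g. "cargo 1.74.1 (<commit> <date>)". Returns nil if it does not parse.
def cargo_version(output)
  number = output[/\Acargo (\d+\.\d+\.\d+)/, 1]
  number && Gem::Version.new(number)
end

def cargo_supports_inheritance?(output)
  version = cargo_version(output)
  !version.nil? && version >= MIN_CARGO_FOR_INHERITANCE
end

# Prints false if cargo is missing, unparseable, or older than 1.64.
output = `cargo --version` rescue ""
puts cargo_supports_inheritance?(output.to_s)
```

If it prints false, updating the toolchain (for example with rustup update) is the first thing to try.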

Can't compile on Ruby 3.0: No builder for extension 'ext/tiktoken_ruby/Cargo.toml'

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

No builder for extension 'ext/tiktoken_ruby/Cargo.toml'

Gem files will remain installed in /Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/tiktoken_ruby-0.0.3 for inspection.
Results logged to /Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/extensions/arm64-darwin-22/3.0.0/tiktoken_ruby-0.0.3/gem_make.out

/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:144:in `build_error'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:125:in `builder_for'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:150:in `build_extension'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:193:in `block in build_extensions'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:190:in `each'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/ext/builder.rb:190:in `build_extensions'
/Users/taf2/.rvm/rubies/ruby-3.0.5/lib/ruby/3.0.0/rubygems/installer.rb:837:in `build_extensions'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/rubygems_gem_installer.rb:72:in `build_extensions'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/rubygems_gem_installer.rb:28:in `install'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/source/rubygems.rb:200:in `install'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/installer/gem_installer.rb:54:in `install'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/installer/gem_installer.rb:16:in `install_from_spec'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/installer/parallel_installer.rb:155:in `do_install'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/installer/parallel_installer.rb:146:in `block in worker_pool'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/worker.rb:62:in `apply_func'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/worker.rb:57:in `block in process_queue'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/worker.rb:54:in `loop'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/worker.rb:54:in `process_queue'
/Users/taf2/.rvm/gems/ruby-3.0.5@ctm3/gems/bundler-2.4.4/lib/bundler/worker.rb:90:in `block (2 levels) in create_threads'

An error occurred while installing tiktoken_ruby (0.0.3), and Bundler cannot continue.

In Gemfile:
tiktoken_ruby
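RubyGems can only build an extension declared as a Cargo.toml if it ships a Cargo builder (the Gem::Ext::CargoBuilder class in newer RubyGems releases); "No builder for extension" means the RubyGems used with this Ruby 3.0.5 has none. Two options that may help are updating RubyGems with gem update --system, or installing a newer tiktoken_ruby, since the later reports on this page (0.0.4 and 0.0.6) build through extconf.rb instead. The snippet below reports whether the running RubyGems has the builder; cargo_builder_available? is a hypothetical helper name.

```ruby
require "rubygems"
require "rubygems/ext"

# True when this RubyGems defines a builder for Cargo.toml extensions.
def cargo_builder_available?
  defined?(Gem::Ext::CargoBuilder) ? true : false
end

puts "RubyGems #{Gem::VERSION}; Cargo builder available: #{cargo_builder_available?}"
```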

"Failed to build gem native extension" on Linux (Ubuntu)

Thanks for the great gem.
I got the following error trace when trying to install the gem:

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
....
--- stderr
Using bindgen with clang args:
["-I/home/bdegomme/.rbenv/versions/3.1.3/include/ruby-3.1.0",
"-I/home/bdegomme/.rbenv/versions/3.1.3/include/ruby-3.1.0/x86_64-linux",
"-fms-extensions", "-O3", "-fno-fast-math", "-ggdb3", "-Wall", "-Wextra",
"-Wdeprecated-declarations", "-Wduplicated-cond", "-Wimplicit-function-declaration",
"-Wimplicit-int", "-Wmisleading-indentation", "-Wpointer-arith", "-Wwrite-strings",
"-Wold-style-definition", "-Wimplicit-fallthrough=0", "-Wmissing-noreturn",
"-Wno-cast-function-type", "-Wno-constant-logical-operand", "-Wno-long-long",
"-Wno-missing-field-initializers", "-Wno-overlength-strings",
"-Wno-packed-bitfield-compat", "-Wno-parentheses-equality", "-Wno-self-assign",
"-Wno-tautological-compare", "-Wno-unused-parameter", "-Wno-unused-value",
"-Wsuggest-attribute=format", "-Wsuggest-attribute=noreturn", "-Wunused-variable",
"-Wundef", "-I/home/bdegomme/.rbenv/versions/3.1.3/include"]
  warning: unknown warning option '-Wduplicated-cond' [-Wunknown-warning-option]
warning: unknown warning option '-Wimplicit-fallthrough=0'; did you mean
'-Wimplicit-fallthrough'? [-Wunknown-warning-option]
warning: unknown warning option '-Wno-packed-bitfield-compat'
[-Wunknown-warning-option]
warning: unknown warning option '-Wsuggest-attribute=format'; did you mean
'-Wproperty-attribute-mismatch'? [-Wunknown-warning-option]
warning: unknown warning option '-Wsuggest-attribute=noreturn'
[-Wunknown-warning-option]
/home/bdegomme/.rbenv/versions/3.1.3/include/ruby-3.1.0/ruby/ruby.h:23:10: fatal
error: 'stdarg.h' file not found
clang diag: warning: unknown warning option '-Wduplicated-cond'
[-Wunknown-warning-option]
clang diag: warning: unknown warning option '-Wimplicit-fallthrough=0'; did you
mean '-Wimplicit-fallthrough'? [-Wunknown-warning-option]
clang diag: warning: unknown warning option '-Wno-packed-bitfield-compat'
[-Wunknown-warning-option]
clang diag: warning: unknown warning option '-Wsuggest-attribute=format'; did you
mean '-Wproperty-attribute-mismatch'? [-Wunknown-warning-option]
clang diag: warning: unknown warning option '-Wsuggest-attribute=noreturn'
[-Wunknown-warning-option]
thread 'main' panicked at 'generate bindings:
ClangDiagnostic("/home/xxx/.rbenv/versions/3.1.3/include/ruby-3.1.0/ruby/ruby.h:23:10:
fatal error: 'stdarg.h' file not found\n")'

I resolved it by installing the latest version of clang: sudo apt install clang

Putting this here for anyone who might have the same issue.
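Both this error and the later "Unable to find libclang" failure on Heroku come from bindgen, which the rb-sys build step uses to generate Ruby bindings and which needs a working libclang. The sketch below looks for libclang using the filename patterns quoted in that Heroku error, checking LIBCLANG_PATH first. find_libclang is a hypothetical helper, and the default directory list is an assumption about common Linux layouts.

```ruby
# Filename patterns quoted in bindgen's "Unable to find libclang" error.
LIBCLANG_PATTERNS = ["libclang.so", "libclang-*.so", "libclang.so.*", "libclang-*.so.*"].freeze

# Assumed search locations; adjust for your distribution.
DEFAULT_DIRS = ["/usr/lib", "/usr/lib64", "/usr/lib/x86_64-linux-gnu", "/usr/lib/llvm-*/lib"].freeze

# Return matching library paths, looking in LIBCLANG_PATH (if set) first.
def find_libclang(dirs = DEFAULT_DIRS)
  search = [ENV["LIBCLANG_PATH"], *dirs].compact
  candidate_dirs = search.flat_map { |d| Dir.glob(d) } # expands llvm-*
  found = candidate_dirs.flat_map do |dir|
    LIBCLANG_PATTERNS.flat_map { |pat| Dir.glob(File.join(dir, pat)) }
  end
  found.uniq
end

matches = find_libclang
if matches.empty?
  puts "libclang not found: install clang or set LIBCLANG_PATH"
else
  puts matches
end
```

If a library turns up outside the usual paths, pointing LIBCLANG_PATH at its directory before running bundle install is what the error message itself recommends.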

Error installing gem on Linux Xenial Ruby 3.0/1 AMD64

When running the tests on Travis CI using Linux Xenial, Ruby 3.0/1, AMD64, I get:

Could not find gems matching 'tiktoken_ruby' valid for all resolution platforms
(aarch64-linux, arm-linux, arm64-darwin, x86_64-darwin, x86_64-linux-musl) in
rubygems repository 
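The parenthesized list is the set of platforms recorded in the project's Gemfile.lock, and the message says Bundler could not find a tiktoken_ruby release valid for all of them. Note that plain x86_64-linux, the platform of a glibc-based Xenial runner, is not in the list. The sketch below (lockfile_platforms is a hypothetical helper) prints that PLATFORMS section so you can decide which entries to keep; bundle lock --add-platform and bundle lock --remove-platform edit the list.

```ruby
# Return the entries of the PLATFORMS section of a Gemfile.lock string.
# Section headers are unindented; their entries are indented by two spaces.
def lockfile_platforms(lockfile_text)
  platforms = []
  in_section = false
  lockfile_text.each_line do |line|
    if line.start_with?("PLATFORMS")
      in_section = true
    elsif in_section && line.start_with?("  ")
      platforms << line.strip
    elsif in_section
      break # a blank line or the next header ends the section
    end
  end
  platforms
end

puts lockfile_platforms(File.read("Gemfile.lock")) if File.exist?("Gemfile.lock")
```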

[Proposal] Add model token limits to tiktoken_ruby

Would you be opposed to adding token limits to https://github.com/IAPark/tiktoken_ruby/blob/main/lib/tiktoken_ruby.rb and a get_token_limit method?
It seems a better fit here than in the project I'm helping with (https://github.com/andreibondarev/langchainrb).

TOKEN_LIMITS = {
# Source:
# https://platform.openai.com/docs/api-reference/embeddings
# https://platform.openai.com/docs/models/gpt-4
"text-embedding-ada-002" => 8191,
"gpt-3.5-turbo" => 4096,
"gpt-3.5-turbo-0301" => 4096,
"text-davinci-003" => 4097,
"text-davinci-002" => 4097,
"code-davinci-002" => 8001,
"gpt-4" => 8192,
"gpt-4-0314" => 8192,
"gpt-4-32k" => 32768,
"gpt-4-32k-0314" => 32768,
"text-curie-001" => 2049,
"text-babbage-001" => 2049,
"text-ada-001" => 2049,
"davinci" => 2049,
"curie" => 2049,
"babbage" => 2049,
"ada" => 2049
}
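A minimal sketch of the proposed get_token_limit, assuming a TOKEN_LIMITS table like the one above lived in the gem. This is only the proposal, not tiktoken_ruby's API; a few entries are copied from the table to keep the snippet short, and remaining_tokens is an extra hypothetical helper showing how the limit might combine with a token count.

```ruby
# Hypothetical sketch of the proposal; not part of tiktoken_ruby.
# A subset of the proposed TOKEN_LIMITS table.
TOKEN_LIMITS = {
  "text-embedding-ada-002" => 8191,
  "gpt-3.5-turbo" => 4096,
  "gpt-4" => 8192,
  "gpt-4-32k" => 32768
}.freeze

# Context size for an exact model name, or nil if the model is unknown.
def get_token_limit(model)
  TOKEN_LIMITS[model]
end

# Tokens left after the prompt (never negative), or nil for unknown models.
def remaining_tokens(model, prompt_tokens)
  limit = get_token_limit(model)
  limit && [limit - prompt_tokens, 0].max
end

puts get_token_limit("gpt-4")        # prints: 8192
puts remaining_tokens("gpt-4", 8000) # prints: 192
```

Returning nil for unknown names, rather than guessing from a prefix, avoids silently applying the wrong limit to a newer model variant.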

Install failing on Ruby 3.3.0 and Heroku

I am using Ruby 3.3.0-rc1; installation works fine with v3.2.2.

Error:

Fetching tiktoken_ruby 0.0.6
       Installing tiktoken_ruby 0.0.6 with native extensions
       Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
       
       current directory:
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/bin/ruby extconf.rb
       checking for gcc... yes
       checking for g++... yes
       checking for gcc-ar... yes
       checking for cargo... no
       
       current directory:
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby
       make DESTDIR\= sitearchdir\=./.gem.20231212-4291-rgiys4
       sitelibdir\=./.gem.20231212-4291-rgiys4 clean
       
       current directory:
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby
       make DESTDIR\= sitearchdir\=./.gem.20231212-4291-rgiys4
       sitelibdir\=./.gem.20231212-4291-rgiys4
       info: downloading installer
       info: profile set to 'minimal'
       info: default host triple is x86_64-unknown-linux-gnu
       info: skipping toolchain installation
       
       
       Rust is installed now. Great!
       
       To get started you need Cargo's bin directory 
       (/tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/.rb-sys/stable/cargo/bin)
       in your PATH
       environment variable. This has not been done automatically.
       
       To configure your current shell, run:
       source 
       "/tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/.rb-sys/stable/cargo/env"
       info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'
       info: latest update on 2023-12-07, rust version 1.74.1 (a28077b28 2023-12-04)
       info: downloading component 'cargo'
       info: downloading component 'rust-std'
       info: downloading component 'rustc'
       info: installing component 'cargo'
       info: installing component 'rust-std'
       info: installing component 'rustc'
       
       stable-x86_64-unknown-linux-gnu installed - rustc 1.74.1 (a28077b28
       2023-12-04)
       
       info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'
       info: checking for self-update
       info: using existing install for 'stable-x86_64-unknown-linux-gnu'
       info: default toolchain set to 'stable-x86_64-unknown-linux-gnu'
       
       stable-x86_64-unknown-linux-gnu unchanged - rustc 1.74.1 (a28077b28
       2023-12-04)
       
       info: note that the toolchain 'stable-x86_64-unknown-linux-gnu' is currently in
       use (environment override by RUSTUP_TOOLCHAIN)
       generating target/release/libtiktoken_ruby.so (release)
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/.rb-sys/stable/cargo/bin/cargo
       rustc  --manifest-path ./Cargo.toml --target-dir target --lib --profile release
       -- -C linker=gcc -L native=/tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib -C
       link-arg=-lm -l pthread
           Updating crates.io index
           Updating git repository `https://github.com/IAPark/tiktoken-rs.git`
           Updating git submodule `https://github.com/zurawiki/tiktoken`
        Downloading crates ...
         Downloaded bit-set v0.5.3
         Downloaded parking_lot v0.12.1
         Downloaded parking_lot_core v0.9.7
         Downloaded bindgen v0.66.1
         Downloaded proc-macro2 v1.0.66
         Downloaded lock_api v0.4.9
         Downloaded once_cell v1.17.1
         Downloaded memchr v2.5.0
         Downloaded scopeguard v1.1.0
         Downloaded magnus-macros v0.6.0
         Downloaded serde v1.0.157
         Downloaded bitflags v2.4.0
         Downloaded shell-words v1.1.0
         Downloaded shlex v1.1.0
         Downloaded peeking_take_while v0.1.2
         Downloaded regex-automata v0.1.10
         Downloaded libloading v0.7.4
         Downloaded fancy-regex v0.11.0
         Downloaded cfg-if v1.0.0
         Downloaded cexpr v0.6.0
         Downloaded libc v0.2.140
         Downloaded seq-macro v0.3.5
         Downloaded rustc-hash v1.1.0
         Downloaded rb-sys-env v0.1.2
         Downloaded rb-sys-build v0.9.81
         Downloaded rb-sys v0.9.81
         Downloaded lazycell v1.3.0
         Downloaded clang-sys v1.6.0
         Downloaded smallvec v1.10.0
         Downloaded regex-syntax v0.6.28
         Downloaded magnus v0.6.1
         Downloaded regex v1.7.1
         Downloaded syn v2.0.31
         Downloaded nom v7.1.3
         Downloaded unicode-ident v1.0.8
         Downloaded minimal-lexical v0.2.1
         Downloaded glob v0.3.1
         Downloaded bstr v1.4.0
         Downloaded lazy_static v1.4.0
         Downloaded autocfg v1.1.0
         Downloaded base64 v0.21.0
         Downloaded quote v1.0.33
         Downloaded bit-vec v0.6.3
         Downloaded anyhow v1.0.70
         Downloaded aho-corasick v0.7.20
          Compiling memchr v2.5.0
          Compiling proc-macro2 v1.0.66
          Compiling unicode-ident v1.0.8
          Compiling glob v0.3.1
          Compiling clang-sys v1.6.0
          Compiling libc v0.2.140
          Compiling quote v1.0.33
          Compiling cfg-if v1.0.0
          Compiling minimal-lexical v0.2.1
          Compiling syn v2.0.31
          Compiling nom v7.1.3
          Compiling libloading v0.7.4
          Compiling aho-corasick v0.7.20
          Compiling regex-syntax v0.6.28
          Compiling bindgen v0.66.1
          Compiling regex v1.7.1
          Compiling cexpr v0.6.0
          Compiling lazy_static v1.4.0
          Compiling bitflags v2.4.0
          Compiling lazycell v1.3.0
          Compiling peeking_take_while v0.1.2
          Compiling shlex v1.1.0
          Compiling rustc-hash v1.1.0
          Compiling shell-words v1.1.0
          Compiling autocfg v1.1.0
          Compiling lock_api v0.4.9
          Compiling parking_lot_core v0.9.7
          Compiling rb-sys-build v0.9.81
          Compiling rb-sys v0.9.81
          Compiling smallvec v1.10.0
          Compiling scopeguard v1.1.0
          Compiling rb-sys-env v0.1.2
          Compiling anyhow v1.0.70
       error: failed to run custom build command for `rb-sys v0.9.81`
       
       Caused by:
       process didn't exit successfully:
       `/tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/target/release/build/rb-sys-f895743fce8c911c/build-script-main`
       (exit status: 101)
         --- stdout
         cargo:rerun-if-env-changed=RUBY
         cargo:rerun-if-env-changed=RBCONFIG_CROSS_COMPILING
         cargo:rerun-if-env-changed=RBCONFIG_RUBY_PROGRAM_VERSION
         cargo:rerun-if-env-changed=RBCONFIG_platform
         cargo:rerun-if-env-changed=RUBY_ROOT
         cargo:rerun-if-env-changed=RUBY_VERSION
         cargo:rerun-if-env-changed=RUBY
         cargo:rerun-if-changed=build/main.rs
         cargo:rerun-if-changed=build/stable_api_config.rs
         cargo:rerun-if-changed=build/version.rs
         cargo:rerun-if-changed=build/features.rs
         cargo:rerun-if-env-changed=RUBY_STATIC
         cargo:rerun-if-env-changed=RBCONFIG_ENABLE_SHARED
         cargo:rerun-if-env-changed=RBCONFIG_rubyhdrdir
         cargo:rerun-if-env-changed=RBCONFIG_rubyarchhdrdir
         cargo:rerun-if-env-changed=RBCONFIG_CPPFLAGS
         cargo:rerun-if-env-changed=RBCONFIG_rubyhdrdir
         cargo:rerun-if-env-changed=TARGET
         cargo:rerun-if-env-changed=TARGET
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64-unknown-linux-gnu
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64-unknown-linux-gnu
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64_unknown_linux_gnu
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_x86_64_unknown_linux_gnu
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
         cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
       
         --- stderr
       INFO: using bindgen with clang args:
       ["-I/tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/include/ruby-3.3.0+0",
       "-I/tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/include/ruby-3.3.0+0/x86_64-linux",
       "-fms-extensions", "-O3", "-fno-fast-math", "-g", "-Wall", "-Wextra",
       "-Wdeprecated-declarations", "-Wdiv-by-zero", "-Wduplicated-cond",
       "-Wimplicit-function-declaration", "-Wimplicit-int", "-Wpointer-arith",
       "-Wwrite-strings", "-Wold-style-definition", "-Wimplicit-fallthrough=0",
       "-Wmissing-noreturn", "-Wno-cast-function-type",
       "-Wno-constant-logical-operand", "-Wno-long-long",
       "-Wno-missing-field-initializers", "-Wno-overlength-strings",
       "-Wno-packed-bitfield-compat", "-Wno-parentheses-equality", "-Wno-self-assign",
       "-Wno-tautological-compare", "-Wno-unused-parameter", "-Wno-unused-value",
       "-Wsuggest-attribute=format", "-Wsuggest-attribute=noreturn",
       "-Wunused-variable", "-Wmisleading-indentation", "-Wundef"]
       thread 'main' panicked at
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/.rb-sys/stable/cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen-0.66.1/lib.rs:604:31:
       Unable to find libclang: "couldn't find any valid shared libraries matching:
       ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the
       `LIBCLANG_PATH` environment variable to a path where one of these files can be
       found (invalid: [])"
         note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       warning: build failed, waiting for other jobs to finish...
       make: *** [Makefile:566: target/release/libtiktoken_ruby.so] Error 101
       
       make failed, exit code 2
       
       Gem files will remain installed in
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6 for
       inspection.
       Results logged to
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/extensions/x86_64-linux/3.3.0+0/tiktoken_ruby-0.0.6/gem_make.out
       
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:118:in
       `run'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:51:in
       `block in make'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:43:in
       `each'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:43:in
       `make'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/ext_conf_builder.rb:42:in
       `build'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:186:in
       `build_extension'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:220:in
       `block in build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:217:in
       `each'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:217:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/installer.rb:861:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/rubygems_gem_installer.rb:76:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/rubygems_gem_installer.rb:28:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/source/rubygems.rb:205:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/gem_installer.rb:54:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/gem_installer.rb:16:in
       `install_from_spec'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/parallel_installer.rb:129:in
       `do_install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/parallel_installer.rb:120:in
       `block in worker_pool'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:62:in
       `apply_func'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:57:in
       `block in process_queue'
         <internal:kernel>:187:in `loop'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:54:in
       `process_queue'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:90:in
       `block (2 levels) in create_threads'
       
       An error occurred while installing tiktoken_ruby (0.0.6), and Bundler cannot
       continue.
       
       In Gemfile:
         tiktoken_ruby
       Bundler Output: Fetching gem metadata from https://rubygems.org/........
       Fetching gem metadata from https://gems.contribsys.com/..
       Fetching rb_sys 0.9.83
       Installing rb_sys 0.9.83
       "-Wdeprecated-declarations", "-Wdiv-by-zero", "-Wduplicated-cond",
       "-Wimplicit-function-declaration", "-Wimplicit-int", "-Wpointer-arith",
       "-Wwrite-strings", "-Wold-style-definition", "-Wimplicit-fallthrough=0",
       "-Wmissing-noreturn", "-Wno-cast-function-type",
       "-Wno-constant-logical-operand", "-Wno-long-long",
       "-Wno-missing-field-initializers", "-Wno-overlength-strings",
       "-Wno-packed-bitfield-compat", "-Wno-parentheses-equality", "-Wno-self-assign",
       "-Wno-tautological-compare", "-Wno-unused-parameter", "-Wno-unused-value",
       "-Wsuggest-attribute=format", "-Wsuggest-attribute=noreturn",
       "-Wunused-variable", "-Wmisleading-indentation", "-Wundef"]
       thread 'main' panicked at
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6/ext/tiktoken_ruby/.rb-sys/stable/cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen-0.66.1/lib.rs:604:31:
       Unable to find libclang: "couldn't find any valid shared libraries matching:
       ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the
       `LIBCLANG_PATH` environment variable to a path where one of these files can be
       found (invalid: [])"
         note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
       warning: build failed, waiting for other jobs to finish...
       make: *** [Makefile:566: target/release/libtiktoken_ruby.so] Error 101
       
       make failed, exit code 2
       
       Gem files will remain installed in
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/gems/tiktoken_ruby-0.0.6 for
       inspection.
       Results logged to
       /tmp/build_6a57c05e/vendor/bundle/ruby/3.3.0+0/extensions/x86_64-linux/3.3.0+0/tiktoken_ruby-0.0.6/gem_make.out
       
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:118:in
       `run'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:51:in
       `block in make'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:43:in
       `each'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:43:in
       `make'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/ext_conf_builder.rb:42:in
       `build'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:186:in
       `build_extension'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:220:in
       `block in build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:217:in
       `each'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/ext/builder.rb:217:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/rubygems/installer.rb:861:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/rubygems_gem_installer.rb:76:in
       `build_extensions'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/rubygems_gem_installer.rb:28:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/source/rubygems.rb:205:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/gem_installer.rb:54:in
       `install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/gem_installer.rb:16:in
       `install_from_spec'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/parallel_installer.rb:129:in
       `do_install'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/installer/parallel_installer.rb:120:in
       `block in worker_pool'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:62:in
       `apply_func'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:57:in
       `block in process_queue'
         <internal:kernel>:187:in `loop'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:54:in
       `process_queue'
       /tmp/build_6a57c05e/vendor/ruby-3.3.0.rc1/lib/ruby/3.3.0+0/bundler/worker.rb:90:in
       `block (2 levels) in create_threads'
       
       An error occurred while installing tiktoken_ruby (0.0.6), and Bundler cannot
       continue.
       
       In Gemfile:
         tiktoken_ruby

 !
 !     Failed to install gems via Bundler.
 !
 !     Push rejected, failed to compile Ruby app.
 !     Push failed

Thread safe?

Is it possible that Tiktoken.encoding_for_model is not thread safe?
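The thread does not establish whether Tiktoken.encoding_for_model is thread safe. One defensive pattern is to build each encoder once behind a Mutex and then reuse it. The EncoderCache class below is hypothetical and assumes nothing about the gem's internals beyond the loader block you pass in. It only serializes *construction*. If the encoder objects themselves turn out not to be safe to share across threads, a per-thread cache would be the more conservative choice.

```ruby
# Hypothetical helper, not part of tiktoken_ruby: loads each encoder at most
# once and serializes loads with a Mutex. This is a defensive sketch; whether
# the gem actually needs it is the open question above.
class EncoderCache
  def initialize(&loader)
    @loader = loader # block that builds an encoder for a model name
    @cache = {}
    @mutex = Mutex.new
  end

  # Returns the cached encoder for model_name, building it on first use.
  # Only one thread at a time can run the loader.
  def fetch(model_name)
    @mutex.synchronize do
      @cache[model_name] ||= @loader.call(model_name)
    end
  end
end

# Usage (assumes tiktoken_ruby is installed):
#   require "tiktoken_ruby"
#   ENCODERS = EncoderCache.new { |model| Tiktoken.encoding_for_model(model) }
#   ENCODERS.fetch("gpt-4").encode("hello world").length
```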

Error installing gem on macOS M1

Installing tiktoken_ruby 0.0.8 with native extensions
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby
/usr/local/bin/ruby -I /usr/local/lib/ruby/3.0.0 -r ./siteconf20240418-10855-35ug4t.rb extconf.rb
checking for gcc... yes
checking for g++... yes
checking for gcc-ar... yes
checking for cargo... no

current directory: /usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby
make DESTDIR\= clean

current directory: /usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby
make DESTDIR\=
info: downloading installer
info: profile set to 'minimal'
info: default host triple is aarch64-unknown-linux-gnu
info: skipping toolchain installation


Rust is installed now. Great!

To get started you need Cargo's bin directory 
(/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo/bin)
in your PATH
environment variable. This has not been done automatically.

To configure your current shell, you need to source
the corresponding env file under 
/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo.

This is usually done by running one of the following (note the leading DOT):
. 
"/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo/env"
# For sh/bash/zsh/ash/dash/pdksh
source 
"/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo/env.fish"
# For fish
info: syncing channel updates for 'stable-aarch64-unknown-linux-gnu'
info: latest update on 2024-04-09, rust version 1.77.2 (25ef9e3d8 2024-04-09)
info: downloading component 'cargo'
info: downloading component 'rust-std'
info: downloading component 'rustc'
info: installing component 'cargo'
info: installing component 'rust-std'
info: installing component 'rustc'

  stable-aarch64-unknown-linux-gnu installed - rustc 1.77.2 (25ef9e3d8 2024-04-09)

info: default toolchain set to 'stable-aarch64-unknown-linux-gnu'
info: checking for self-update
info: using existing install for 'stable-aarch64-unknown-linux-gnu'
info: default toolchain set to 'stable-aarch64-unknown-linux-gnu'

  stable-aarch64-unknown-linux-gnu unchanged - rustc 1.77.2 (25ef9e3d8 2024-04-09)

info: note that the toolchain 'stable-aarch64-unknown-linux-gnu' is currently in use (environment override by RUSTUP_TOOLCHAIN)
generating target/release/libtiktoken_ruby.so (release)
/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo/bin/cargo rustc  --manifest-path ./Cargo.toml --target-dir target --lib --profile release -- -C linker=gcc -L native=/usr/local/lib -C link-arg=-lm
    Updating crates.io index
    Updating git repository `https://github.com/IAPark/tiktoken-rs.git`
    Updating git submodule `https://github.com/zurawiki/tiktoken`
 Downloading crates ...
  Downloaded aho-corasick v0.7.20
  Downloaded magnus-macros v0.6.0
  Downloaded shlex v1.1.0
  Downloaded autocfg v1.1.0
  Downloaded bit-set v0.5.3
  Downloaded either v1.10.0
  Downloaded cfg-if v1.0.0
  Downloaded glob v0.3.1
  Downloaded rb-sys-env v0.1.2
  Downloaded seq-macro v0.3.5
  Downloaded rustc-hash v1.1.0
  Downloaded scopeguard v1.1.0
  Downloaded once_cell v1.17.1
  Downloaded smallvec v1.10.0
  Downloaded quote v1.0.33
  Downloaded parking_lot_core v0.9.7
  Downloaded parking_lot v0.12.1
  Downloaded unicode-ident v1.0.8
  Downloaded serde v1.0.157
  Downloaded minimal-lexical v0.2.1
  Downloaded regex-automata v0.1.10
  Downloaded syn v2.0.31
  Downloaded bstr v1.4.0
  Downloaded libc v0.2.140
  Downloaded regex-syntax v0.6.28
  Downloaded regex v1.7.1
  Downloaded nom v7.1.3
  Downloaded magnus v0.6.1
  Downloaded itertools v0.12.1
  Downloaded bindgen v0.69.4
  Downloaded fancy-regex v0.11.0
  Downloaded base64 v0.21.0
  Downloaded proc-macro2 v1.0.66
  Downloaded memchr v2.5.0
  Downloaded lock_api v0.4.9
  Downloaded clang-sys v1.6.0
  Downloaded anyhow v1.0.70
  Downloaded shell-words v1.1.0
  Downloaded libloading v0.7.4
  Downloaded bitflags v2.4.0
  Downloaded rb-sys-build v0.9.87
  Downloaded cexpr v0.6.0
  Downloaded rb-sys v0.9.87
  Downloaded lazycell v1.3.0
  Downloaded lazy_static v1.4.0
  Downloaded bit-vec v0.6.3
   Compiling memchr v2.5.0
   Compiling proc-macro2 v1.0.66
   Compiling unicode-ident v1.0.8
   Compiling glob v0.3.1
   Compiling libc v0.2.140
   Compiling cfg-if v1.0.0
   Compiling minimal-lexical v0.2.1
   Compiling libloading v0.7.4
   Compiling clang-sys v1.6.0
   Compiling aho-corasick v0.7.20
   Compiling nom v7.1.3
   Compiling quote v1.0.33
   Compiling syn v2.0.31
   Compiling regex-syntax v0.6.28
   Compiling either v1.10.0
   Compiling bindgen v0.69.4
   Compiling itertools v0.12.1
   Compiling cexpr v0.6.0
   Compiling lazycell v1.3.0
   Compiling rustc-hash v1.1.0
   Compiling shlex v1.1.0
   Compiling regex v1.7.1
   Compiling bitflags v2.4.0
   Compiling lazy_static v1.4.0
   Compiling shell-words v1.1.0
   Compiling autocfg v1.1.0
   Compiling lock_api v0.4.9
   Compiling parking_lot_core v0.9.7
   Compiling anyhow v1.0.70
   Compiling scopeguard v1.1.0
   Compiling bit-vec v0.6.3
   Compiling smallvec v1.10.0
   Compiling rb-sys-env v0.1.2
   Compiling magnus v0.6.1
   Compiling bit-set v0.5.3
   Compiling regex-automata v0.1.10
   Compiling once_cell v1.17.1
   Compiling magnus-macros v0.6.0
   Compiling bstr v1.4.0
   Compiling rb-sys-build v0.9.87
   Compiling fancy-regex v0.11.0
   Compiling parking_lot v0.12.1
   Compiling base64 v0.21.0
   Compiling rb-sys v0.9.87
   Compiling seq-macro v0.3.5
   Compiling tiktoken-rs v0.3.2 (https://github.com/IAPark/tiktoken-rs.git#5231fbf4)
error: failed to run custom build command for `rb-sys v0.9.87`

Caused by:
  process didn't exit successfully: `/usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/target/release/build/rb-sys-c65a4afa838c9b06/build-script-main` (exit status: 101)
  --- stdout
  cargo:rerun-if-env-changed=RUBY
  cargo:rerun-if-env-changed=RBCONFIG_CROSS_COMPILING
  cargo:rerun-if-env-changed=RBCONFIG_RUBY_PROGRAM_VERSION
  cargo:rerun-if-env-changed=RBCONFIG_platform
  cargo:rerun-if-env-changed=RBCONFIG_arch
  cargo:rerun-if-env-changed=RUBY_ROOT
  cargo:rerun-if-env-changed=RUBY_VERSION
  cargo:rerun-if-env-changed=RUBY
  cargo:rerun-if-changed=build/version.rs
  cargo:rerun-if-changed=build/features.rs
  cargo:rerun-if-changed=build/stable_api_config.rs
  cargo:rerun-if-changed=build/main.rs
  cargo:rerun-if-env-changed=RUBY_STATIC
  cargo:rerun-if-env-changed=RBCONFIG_ENABLE_SHARED
  cargo:rerun-if-env-changed=RBCONFIG_rubyhdrdir
  cargo:rerun-if-env-changed=RBCONFIG_rubyarchhdrdir
  cargo:rerun-if-env-changed=RBCONFIG_CPPFLAGS
  cargo:rerun-if-env-changed=RBCONFIG_rubyhdrdir
  cargo:rerun-if-env-changed=RBCONFIG_MAJOR
  cargo:rerun-if-env-changed=RBCONFIG_MINOR
  cargo:rerun-if-env-changed=TARGET
  cargo:rerun-if-env-changed=TARGET
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64-unknown-linux-gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64-unknown-linux-gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS_aarch64_unknown_linux_gnu
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS
  cargo:rerun-if-env-changed=BINDGEN_EXTRA_CLANG_ARGS

  --- stderr
INFO: using bindgen with clang args: ["-I/usr/local/include/ruby-3.0.0", "-I/usr/local/include/ruby-3.0.0/aarch64-linux", "-fms-extensions", "-O3", "-ggdb3", "-Wall", "-Wextra", "-Wdeprecated-declarations", "-Wduplicated-cond", "-Wimplicit-function-declaration", "-Wimplicit-int",
"-Wmisleading-indentation", "-Wpointer-arith", "-Wwrite-strings", "-Wimplicit-fallthrough=0", "-Wmissing-noreturn", "-Wno-cast-function-type", "-Wno-constant-logical-operand", "-Wno-long-long", "-Wno-missing-field-initializers", "-Wno-overlength-strings", "-Wno-packed-bitfield-compat",
"-Wno-parentheses-equality", "-Wno-self-assign", "-Wno-tautological-compare", "-Wno-unused-parameter", "-Wno-unused-value", "-Wsuggest-attribute=format", "-Wsuggest-attribute=noreturn", "-Wunused-variable"]
  #include "ruby.h"

  #ifdef HAVE_RUBY_DEBUG_H
  #include "ruby/debug.h"
  #endif
  #ifdef HAVE_RUBY_DEFINES_H
  #include "ruby/defines.h"
  #endif
  #ifdef HAVE_RUBY_ENCODING_H
  #include "ruby/encoding.h"
  #endif
  #ifdef HAVE_RUBY_FIBER_SCHEDULER_H
  #include "ruby/fiber/scheduler.h"
  #endif
  #ifdef HAVE_RUBY_INTERN_H
  #include "ruby/intern.h"
  #endif
  #ifdef HAVE_RUBY_IO_H
  #include "ruby/io.h"
  #endif
  #ifdef HAVE_RUBY_MEMORY_VIEW_H
  #include "ruby/memory_view.h"
  #endif
  #ifdef HAVE_RUBY_MISSING_H
  #include "ruby/missing.h"
  #endif
  #ifdef HAVE_RUBY_ONIGMO_H
  #include "ruby/onigmo.h"
  #endif
  #ifdef HAVE_RUBY_ONIGURUMA_H
  #include "ruby/oniguruma.h"
  #endif
  #ifdef HAVE_RUBY_RACTOR_H
  #include "ruby/ractor.h"
  #endif
  #ifdef HAVE_RUBY_RANDOM_H
  #include "ruby/random.h"
  #endif
  #ifdef HAVE_RUBY_RE_H
  #include "ruby/re.h"
  #endif
  #ifdef HAVE_RUBY_REGEX_H
  #include "ruby/regex.h"
  #endif
  #ifdef HAVE_RUBY_RUBY_H
  #include "ruby/ruby.h"
  #endif
  #ifdef HAVE_RUBY_ST_H
  #include "ruby/st.h"
  #endif
  #ifdef HAVE_RUBY_THREAD_H
  #include "ruby/thread.h"
  #endif
  #ifdef HAVE_RUBY_THREAD_NATIVE_H
  #include "ruby/thread_native.h"
  #endif
  #ifdef HAVE_RUBY_UTIL_H
  #include "ruby/util.h"
  #endif
  #ifdef HAVE_RUBY_VERSION_H
  #include "ruby/version.h"
  #endif
  #ifdef HAVE_RUBY_VM_H
  #include "ruby/vm.h"
  #endif
  #ifdef HAVE_RUBY_WIN32_H
  #include "ruby/win32.h"
  #endif
  #ifdef HAVE_RUBY_IO_BUFFER_H
  #include "ruby/io/buffer.h"
  #endif
  #ifdef HAVE_RUBY_ATOMIC_H
  #include "ruby/atomic.h"
  #endif
  struct rb_sys__Opaque__RString { struct RString dummy; };
  struct rb_sys__Opaque__RArray { struct RArray dummy; };
  thread 'main' panicked at /usr/local/bundle/gems/tiktoken_ruby-0.0.8/ext/tiktoken_ruby/.rb-sys/stable/cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen-0.69.4/lib.rs:622:31:
  Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
make: *** [Makefile:552: target/release/libtiktoken_ruby.so] Error 101

make failed, exit code 2

Gem files will remain installed in /usr/local/bundle/gems/tiktoken_ruby-0.0.8 for inspection.
Results logged to /usr/local/bundle/extensions/aarch64-linux/3.0.0/tiktoken_ruby-0.0.8/gem_make.out

  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:93:in `run'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:44:in `block in make'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:36:in `each'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:36:in `make'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/ext_conf_builder.rb:63:in `block in build'
  /usr/local/lib/ruby/3.0.0/tempfile.rb:317:in `open'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/ext_conf_builder.rb:26:in `build'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:159:in `build_extension'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:193:in `block in build_extensions'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:190:in `each'
  /usr/local/lib/ruby/3.0.0/rubygems/ext/builder.rb:190:in `build_extensions'
  /usr/local/lib/ruby/3.0.0/rubygems/installer.rb:837:in `build_extensions'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/rubygems_gem_installer.rb:76:in `build_extensions'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/rubygems_gem_installer.rb:28:in `install'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/source/rubygems.rb:205:in `install'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/installer/gem_installer.rb:54:in `install'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/installer/gem_installer.rb:16:in `install_from_spec'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/installer/parallel_installer.rb:132:in `do_install'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/installer/parallel_installer.rb:123:in `block in worker_pool'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/worker.rb:62:in `apply_func'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/worker.rb:57:in `block in process_queue'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/worker.rb:54:in `loop'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/worker.rb:54:in `process_queue'
  /usr/local/bundle/gems/bundler-2.5.7/lib/bundler/worker.rb:90:in `block (2 levels) in create_threads'

An error occurred while installing tiktoken_ruby (0.0.8), and Bundler cannot continue.
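Both build logs above fail at the same point. bindgen, which rb-sys pulls in, cannot find libclang. The panic message lists the file patterns it searched for and names the LIBCLANG_PATH variable it honors. The Ruby sketch below assumes libclang is already installed somewhere on the system. It searches a few candidate directories for files matching those same patterns and reports a directory you could export as LIBCLANG_PATH before rerunning the install. The default search roots are assumptions, so adjust them for your distribution. If nothing is found, libclang itself probably needs to be installed through the system package manager.

```ruby
# File patterns copied from the bindgen panic message in the logs above.
LIBCLANG_PATTERNS = ["libclang.so", "libclang-*.so", "libclang.so.*", "libclang-*.so.*"].freeze

# Hypothetical default search roots; these are guesses, not an exhaustive list.
DEFAULT_ROOTS = ["/usr/lib", "/usr/lib64", "/usr/local/lib"].freeze

# Returns the sorted, unique directories under `roots` (searched recursively)
# that contain a shared library matching one of the libclang patterns.
# Roots that do not exist simply contribute no matches.
def find_libclang_dirs(roots = DEFAULT_ROOTS)
  roots.flat_map do |root|
    LIBCLANG_PATTERNS.flat_map { |pattern| Dir.glob(File.join(root, "**", pattern)) }
  end.map { |path| File.dirname(path) }.uniq.sort
end

# Usage:
#   dirs = find_libclang_dirs
#   puts dirs.empty? ? "no libclang found" : "export LIBCLANG_PATH=#{dirs.first}"
```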

Error loading shared library ld-linux-x86-64.so.2 in Alpine Docker image

I am running a Rails App in a Docker image:

FROM ruby:3.2.4-alpine3.19

I was seeing this error when loading:

/usr/local/bundle/gems/tiktoken_ruby-0.0.9-x86_64-linux/lib/tiktoken_ruby.rb:10:in `require_relative': cannot load such file -- /usr/local/bundle/gems/tiktoken_ruby-0.0.9-x86_64-linux/lib/tiktoken_ruby/tiktoken_ruby (LoadError)
	from /usr/local/bundle/gems/tiktoken_ruby-0.0.9-x86_64-linux/lib/tiktoken_ruby.rb:10

I added this line to the script to see the original error:

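begin
  # First try the extension precompiled for this Ruby's major.minor version (e.g. 3.2)
  RUBY_VERSION =~ /(\d+\.\d+)/
  require_relative "tiktoken_ruby/#{$1}/tiktoken_ruby"
rescue LoadError => e
  puts ">>>> e: #{e.inspect}" # This line: print the original LoadError
  # Fall back to the unversioned extension path
  require_relative "tiktoken_ruby/tiktoken_ruby"
end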

And the original error is:

Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /usr/local/bundle/gems/tiktoken_ruby-0.0.9-x86_64-linux/lib/tiktoken_ruby/3.2/tiktoken_ruby.so)

I don't know how to solve the issue :/
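The inner error points to the likely cause. ld-linux-x86-64.so.2 is the glibc dynamic loader, and Alpine is a musl-based distribution that does not ship it, so the prebuilt binary inside tiktoken_ruby-0.0.9-x86_64-linux cannot be loaded. The hypothetical check below is a sketch, not part of the gem. It compares the running Ruby's platform string with the installed gem's platform to flag that mismatch. It assumes that musl builds of Ruby report a platform containing "linux-musl", as Alpine's do, and that a plain "<cpu>-linux" gem is a glibc build, which the ld-linux error suggests is the case here.

```ruby
# Hypothetical diagnostic: returns true when a glibc-targeted native gem is
# installed on a musl-based Ruby (e.g. Alpine).
# ruby_platform: typically RUBY_PLATFORM, e.g. "x86_64-linux-musl"
# gem_platform:  the installed gem's platform string, e.g. "x86_64-linux"
def glibc_gem_on_musl?(ruby_platform, gem_platform)
  musl_ruby = ruby_platform.include?("linux-musl")
  # Assumption: a plain "<cpu>-linux" platform (no "-musl" suffix) means a glibc build.
  glibc_gem = gem_platform.end_with?("-linux")
  musl_ruby && glibc_gem
end

# Usage (assumes tiktoken_ruby is installed):
#   spec = Gem::Specification.find_by_name("tiktoken_ruby")
#   glibc_gem_on_musl?(RUBY_PLATFORM, spec.platform.to_s)
```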

Unable to build on OpenBSD

$ gem install --user-install tiktoken_ruby
Building native extensions. This could take a while...
ERROR:  Error installing tiktoken_ruby:
        ERROR: Failed to build gem native extension.

    current directory: /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5/ext/tiktoken_ruby
/usr/local/bin/ruby32 extconf.rb
checking for /usr/bin/clang... yes
checking for /usr/bin/clang++... yes
checking for ar... yes
checking for cargo... yes

current directory: /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5/ext/tiktoken_ruby
make INSTALL_PROGRAM\=/usr/bin/install\ -c\ \ -s\ -m\ 755 INSTALL_SCRIPT\=/usr/bin/install\ -c\ \ -m\ 755 INSTALL_DATA\=/usr/bin/install\ -c\ \ -m\ 644 INSTALL\=/usr/bin/install\ -c\  DESTDIR\= sitearchdir\=./.gem.20230718-28529-91ilur sitelibdir\=./.gem.20230718-28529-91ilur clean
*** Parse error in /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5/ext/tiktoken_ruby: Missing dependency operator (Makefile:8)
*** Parse error: Need an operator in 'endif' (Makefile:10)
*** Parse error: Missing dependency operator (Makefile:20)
*** Parse error: Need an operator in 'else' (Makefile:22)
*** Parse error: Need an operator in 'endif' (Makefile:24)
*** Parse error: Missing dependency operator (Makefile:31)
*** Parse error: Need an operator in 'else' (Makefile:33)
*** Parse error: Need an operator in 'endif' (Makefile:35)
*** Parse error: Missing dependency operator (Makefile:265)
*** Parse error: Need an operator in 'endif' (Makefile:267)
*** Parse error: Missing dependency operator (Makefile:523)
*** Parse error: Missing dependency operator (Makefile:528)
*** Parse error: Need an operator in 'endif' (Makefile:530)
*** Parse error: Missing dependency operator (Makefile:534)
*** Parse error: Need an operator in 'else' (Makefile:536)
*** Parse error: Need an operator in 'endif' (Makefile:538)
*** Parse error: Need an operator in 'endif' (Makefile:557)

current directory: /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5/ext/tiktoken_ruby
make INSTALL_PROGRAM\=/usr/bin/install\ -c\ \ -s\ -m\ 755 INSTALL_SCRIPT\=/usr/bin/install\ -c\ \ -m\ 755 INSTALL_DATA\=/usr/bin/install\ -c\ \ -m\ 644 INSTALL\=/usr/bin/install\ -c\  DESTDIR\= sitearchdir\=./.gem.20230718-28529-91ilur sitelibdir\=./.gem.20230718-28529-91ilur
*** Parse error in /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5/ext/tiktoken_ruby: Missing dependency operator (Makefile:8)
*** Parse error: Need an operator in 'endif' (Makefile:10)
*** Parse error: Missing dependency operator (Makefile:20)
*** Parse error: Need an operator in 'else' (Makefile:22)
*** Parse error: Need an operator in 'endif' (Makefile:24)
*** Parse error: Missing dependency operator (Makefile:31)
*** Parse error: Need an operator in 'else' (Makefile:33)
*** Parse error: Need an operator in 'endif' (Makefile:35)
*** Parse error: Missing dependency operator (Makefile:265)
*** Parse error: Need an operator in 'endif' (Makefile:267)
*** Parse error: Missing dependency operator (Makefile:523)
*** Parse error: Missing dependency operator (Makefile:528)
*** Parse error: Need an operator in 'endif' (Makefile:530)
*** Parse error: Missing dependency operator (Makefile:534)
*** Parse error: Need an operator in 'else' (Makefile:536)
*** Parse error: Need an operator in 'endif' (Makefile:538)
*** Parse error: Need an operator in 'endif' (Makefile:557)

make failed, exit code 1

Gem files will remain installed in /home/dev/.local/share/gem/ruby/3.2/gems/tiktoken_ruby-0.0.5 for inspection.
Results logged to /home/dev/.local/share/gem/ruby/3.2/extensions/x86_64-openbsd/3.2/tiktoken_ruby-0.0.5/gem_make.out

I see it's aware of OpenBSD's clang and clang++, which is great. For those who don't know, OpenBSD has its own rewrite of almost everything, including make.

$ env
_=/usr/bin/env
LOGNAME=dev
HOME=/home/dev
SSH_TTY=/dev/ttyp0
GEM_PATH=/home/dev/.bundle/ruby/3.2:/home/dev/.local/share/gem/ruby/3.2:/usr/local/lib/ruby/gems/3.2
CXX=/usr/bin/clang++
SHELL=/bin/ksh
GEM_HOME=/home/dev/.bundle/ruby/3.2:/home/dev/.local/share/gem/ruby/3.2
PWD=/home/dev
CC=/usr/bin/clang
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/home/dev/.bundle/ruby/3.2/bin:/home/dev/.local/share/gem/ruby/3.2/bin
$ gem env
RubyGems Environment:
  - RUBYGEMS VERSION: 3.4.10
  - RUBY VERSION: 3.2.2 (2023-03-30 patchlevel 53) [x86_64-openbsd]
  - INSTALLATION DIRECTORY: /home/dev/.bundle/ruby/3.2:/home/dev/.local/share/gem/ruby/3.2
  - USER INSTALLATION DIRECTORY: /home/dev/.local/share/gem/ruby/3.2
  - RUBY EXECUTABLE: /usr/local/bin/ruby32
  - GIT EXECUTABLE: /usr/local/bin/git
  - EXECUTABLE DIRECTORY: /home/dev/.bundle/ruby/3.2:/home/dev/.local/share/gem/ruby/3.2/bin
  - SPEC CACHE DIRECTORY: /home/dev/.local/share/gem/specs
  - SYSTEM CONFIGURATION DIRECTORY: /etc
  - RUBYGEMS PLATFORMS:
     - ruby
     - x86_64-openbsd
  - GEM PATHS:
     - /home/dev/.bundle/ruby/3.2:/home/dev/.local/share/gem/ruby/3.2
     - /home/dev/.bundle/ruby/3.2
     - /home/dev/.local/share/gem/ruby/3.2
     - /usr/local/lib/ruby/gems/3.2
  - GEM CONFIGURATION:
     - :update_sources => true
     - :verbose => true
     - :backtrace => true
     - :bulk_threshold => 1000
  - REMOTE SOURCES:
     - https://rubygems.org/
  - SHELL PATH:
     - /bin
     - /sbin
     - /usr/bin
     - /usr/sbin
     - /usr/X11R6/bin
     - /usr/local/bin
     - /usr/local/sbin
     - /home/dev/.bundle/ruby/3.2/bin
     - /home/dev/.local/share/gem/ruby/3.2/bin

Anybody know what to do?
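Every parse error falls on an if/else/endif-style conditional ("Need an operator in 'endif'"). This suggests that the generated Makefile uses GNU make syntax, which OpenBSD's own make does not understand. One option is to retry the build with GNU make. This assumes GNU make is installed from packages as `gmake`, and that RubyGems' extension builder picks the make program from the MAKE environment variable (my understanding, not verified in this thread). The hypothetical `which` helper below only checks that gmake is actually on PATH before you retry.

```ruby
# Hypothetical helper: returns the full path of the first executable named
# `name` on the given PATH-style string, or nil if none is found.
def which(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

gmake = which("gmake")
if gmake
  # Assumption: RubyGems honors MAKE when building native extensions.
  puts "GNU make found at #{gmake}; try: MAKE=gmake gem install --user-install tiktoken_ruby"
else
  puts "gmake not found on PATH; install GNU make first (assumed package name: gmake)."
end
```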

Tiktoken.encoding_for_model has bad return value on unrecognized model name

When you pass an unrecognized model name to Tiktoken.encoding_for_model, it returns an unexpected Hash instead of nil.

irb(main):001:0> require 'tiktoken_ruby'
=> true
irb(main):002:0> Tiktoken.encoding_for_model 'foo'
=> 
{:"gpt-4-"=>"cl100k_base",
 :"gpt-3.5-turbo-"=>"cl100k_base",
 :"gpt-35-turbo-"=>"cl100k_base",
 :"ft:gpt-4"=>"cl100k_base",
 :"ft:gpt-3.5-turbo"=>"cl100k_base",
 :"ft:davinci-002"=>"cl100k_base",
 :"ft:babbage-002"=>"cl100k_base"}

The bug seems to have been introduced by 0c1a45b. Hash#each returns its receiver (self), so when no prefix matches, the method falls through and returns the prefix Hash instead of nil.
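The sketch below illustrates the failure mode. A Ruby method whose last expression is Hash#each returns the Hash itself when no iteration returns early. It also shows a nil-returning lookup that uses Enumerable#find instead. MODEL_PREFIX_TO_ENCODING is a hypothetical constant that mirrors the Hash printed in the irb session. It is not the gem's actual code, and the gem may also consult an exact-name table first.

```ruby
# Hypothetical prefix table mirroring the Hash shown in the irb output above.
MODEL_PREFIX_TO_ENCODING = {
  "gpt-4-": "cl100k_base",
  "gpt-3.5-turbo-": "cl100k_base",
  "gpt-35-turbo-": "cl100k_base",
  "ft:gpt-4": "cl100k_base",
  "ft:gpt-3.5-turbo": "cl100k_base",
  "ft:davinci-002": "cl100k_base",
  "ft:babbage-002": "cl100k_base"
}.freeze

# Buggy shape: when no prefix matches, `each` returns its receiver, so the
# caller gets the whole Hash instead of nil.
def buggy_encoding_name_for(model_name)
  MODEL_PREFIX_TO_ENCODING.each do |prefix, encoding|
    return encoding if model_name.start_with?(prefix.to_s)
  end
end

# Fixed shape: Hash#find returns the first matching [key, value] pair, or nil
# when nothing matches; &.last extracts the encoding name.
def encoding_name_for(model_name)
  MODEL_PREFIX_TO_ENCODING.find { |prefix, _| model_name.start_with?(prefix.to_s) }&.last
end
```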

[Proposal] Claude Tokenizer

The tiktoken_ruby gem currently supports four encoders:

  • r50k_base
  • p50k_base
  • p50k_edit
  • cl100k_base

Claude appears to use tiktoken parameters outlined here and implemented here.

The BPE rankings are in an alternate format. However, by reverse engineering the JavaScript tiktoken implementation here, I was able to write the following Python code to create a tiktoken encoder for Claude. Note that claude.json was sourced from the referenced JavaScript tiktoken library, which is part of the official Anthropic account.

import tiktoken
import json
import base64


def decode_claude_bpe(claude_configs):
    # bpe_ranks is "<ignored> <offset> <token> <token> ...", with base64-encoded tokens
    _, offset, *tokens = claude_configs['bpe_ranks'].split(" ")
    offset = int(offset)

    # This starts at 5 (offset) for some reason; this is what the original JS code does
    rank_map = {base64.b64decode(token): offset + idx for idx, token in enumerate(tokens)}

    return rank_map

if __name__ == "__main__":
    with open("claude.json") as f:
        claude_configs = json.load(f)
        bpe_ranks = decode_claude_bpe(claude_configs)

    enc = tiktoken.Encoding(
        name="claude_tokenizer",
        pat_str=claude_configs['pat_str'],
        mergeable_ranks=bpe_ranks,
        special_tokens=claude_configs['special_tokens'],
    )
    print(enc.encode("hello world"))

Alternatively, an option to create a tiktoken encoder from custom BPE ranks and related parameters, as the Python library allows, would be a more general solution.
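For anyone considering this in Ruby, the rank-decoding step of the Python snippet above translates roughly as follows. This sketch ports only decode_claude_bpe. It assumes the same space-separated "bpe_ranks" layout: an ignored first field, an integer offset, then base64-encoded tokens. It does not imply that tiktoken_ruby can consume the result. As the proposal notes, the gem would first need a way to build an encoder from custom ranks.

```ruby
require "json"

# Port of decode_claude_bpe from the Python snippet above.
# Assumes bpe_ranks is "<ignored> <offset> <token> <token> ...", where each
# token is base64-encoded bytes. Returns a Hash of binary String => rank.
# Note: Ruby's split(" ") splits on runs of whitespace, which is slightly more
# lenient than Python's str.split(" ").
def decode_claude_bpe(claude_configs)
  _, offset, *tokens = claude_configs.fetch("bpe_ranks").split(" ")
  offset = Integer(offset)
  # Ranks start at the offset, mirroring the original JS/Python code.
  tokens.each_with_index.map do |token, idx|
    # unpack1("m0") is strict base64 decoding; it yields a binary (ASCII-8BIT) String.
    [token.unpack1("m0"), offset + idx]
  end.to_h
end

# Usage (assumes claude.json is in the working directory):
#   configs = JSON.parse(File.read("claude.json"))
#   ranks = decode_claude_bpe(configs)
```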

Getting "LoadError" when trying to use the gem as dependency

I'm trying to use the langchainrb gem, which has tiktoken_ruby as a dependency. I get a LoadError when trying to use the gem:

LoadError - cannot load such file -- /gems/ruby/3.0.0/gems/tiktoken_ruby-0.0.5-x86_64-linux-musl/lib/tiktoken_ruby/tiktoken_ruby:
 |   tiktoken_ruby-0.0.5-x86_64-linux (musl) lib/tiktoken_ruby.rb:10:in `require_relative'
 |   tiktoken_ruby-0.0.5-x86_64-linux (musl) lib/tiktoken_ruby.rb:10:in `rescue in <top (required)>'
 |   tiktoken_ruby-0.0.5-x86_64-linux (musl) lib/tiktoken_ruby.rb:6:in `<top (required)>'

Is this something I can get help with? Thanks!
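Judging by the traceback (a `rescue in <top (required)>` at line 10 after line 6), the loader in lib/tiktoken_ruby.rb appears to follow the same try-then-fallback pattern quoted in the Alpine report above. It first tries a version-specific path, then the unversioned one. Both failed here, which suggests the gem shipped no extension usable by this Ruby. The hypothetical helper below lists the compiled extension files that the installed gem actually contains, so you can compare them with the running Ruby version and platform. The `.so`/`.bundle` extensions it looks for are an assumption about how the files are named.

```ruby
# Hypothetical diagnostic: lists compiled extension files shipped under
# <gem_dir>/lib/tiktoken_ruby, relative to that directory
# (e.g. "3.0/tiktoken_ruby.so").
def shipped_extensions(gem_dir)
  ext_root = File.join(gem_dir, "lib", "tiktoken_ruby")
  Dir.glob(File.join(ext_root, "**", "*.{so,bundle}"))
     .map { |path| path.delete_prefix("#{ext_root}/") }
     .sort
end

# Usage (assumes tiktoken_ruby is installed):
#   spec = Gem::Specification.find_by_name("tiktoken_ruby")
#   puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"
#   puts shipped_extensions(spec.gem_dir)
```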
