Comments (11)
Good example of how to count tokens:
- https://github.com/pkoukk/tiktoken-go#counting-tokens-for-chat-api-calls
- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
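For readers who want the gist without opening the links, here is a minimal sketch of the chat-message counting scheme the cookbook describes. The per-message overhead of 3 tokens, the extra 1 token for a name, and the 3 tokens of reply priming are the values the cookbook gives for the gpt-3.5-turbo-0613 and gpt-4 model families; treat them as assumptions for any other model. The Tokenizer interface, the Message struct and the wordTokenizer stand-in are hypothetical and exist only for illustration; a real BPE tokenizer would be plugged in instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokenizer is a hypothetical interface; any BPE tokenizer can be adapted to it.
type Tokenizer interface {
	Encode(text string) []int
}

// Message mirrors the role/name/content fields of a chat message.
type Message struct {
	Role    string
	Name    string
	Content string
}

// Overheads as described in the OpenAI cookbook for gpt-3.5-turbo-0613 and
// gpt-4 style models. Assumption: other models may use different values.
const (
	tokensPerMessage = 3
	tokensPerName    = 1
	replyPriming     = 3
)

// CountChatTokens estimates the prompt tokens consumed by a list of messages.
func CountChatTokens(tok Tokenizer, msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += tokensPerMessage
		total += len(tok.Encode(m.Role))
		total += len(tok.Encode(m.Content))
		if m.Name != "" {
			total += tokensPerName + len(tok.Encode(m.Name))
		}
	}
	// Every reply is primed with a fixed number of tokens.
	return total + replyPriming
}

// wordTokenizer is a stand-in that counts whitespace-separated words.
// It is NOT a BPE tokenizer and only keeps the sketch self-contained.
type wordTokenizer struct{}

func (wordTokenizer) Encode(text string) []int {
	return make([]int, len(strings.Fields(text)))
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "Hello there"},
	}
	fmt.Println(CountChatTokens(wordTokenizer{}, msgs))
}
```

With the word-counting stand-in the numbers are only illustrative; swapping in a real encoder behind the same interface should be enough to get counts that track the API's usage figures.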
from go-openai.
Related: https://github.com/openai/tiktoken
from go-openai.
Related: https://github.com/openai/tiktoken
Thanks, but I think I need a library that can be called through golang.
from go-openai.
@OneSeven sure, I mean, we either would need to be able to embed this library (via cgo or otherwise) or would need to translate it from Rust to Go.
from go-openai.
Since the original issue was opened, there has been some progress!
The documentation in the official OpenAI repository currently points to pkoukk/tiktoken-go as the Go library for tokenizing (no endorsement, just a link).
You can see from the test script that it deals with tokens in different languages and alphabets. It might still get things wrong, but at least it is as wrong as the official OpenAI Python version!
Dependencies currently listed in its go.mod:

module github.com/pkoukk/tiktoken-go

go 1.19

require (
	github.com/dlclark/regexp2 v1.10.0
	github.com/google/uuid v1.3.0
	github.com/stretchr/testify v1.8.2
)

require (
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/pmezard/go-difflib v1.0.0 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)

It's not "zero" dependencies as you'd prefer, but close! I haven't looked into the code very deeply.
The dependency on google/uuid is pretty standard; one wonders why the Go core developers haven't incorporated it into the Go standard library yet (it does have a few quirks, but because it comes from Google itself, I guess it's OK to use).
The inclusion of dlclark/regexp2, as opposed to the standard regexp package built on top of Google's RE2 engine, is very likely because the former closely follows the regex engine used by .NET, which might be a requirement for the tokenizer to come up with the same results as tiktoken.
And stretchr/testify is evidently only used for the testing bits; it has no relevance to the overall tokenizer code itself.
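On the regexp2 point, one concrete difference that may be relevant: the split patterns used by tiktoken's encodings contain a negative lookahead, \s+(?!\S), and Go's standard regexp package does not support lookaheads. The small check below only demonstrates that limitation with the standard library; it does not show how tiktoken-go uses regexp2, and supportsPattern is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// supportsPattern reports whether Go's standard RE2-based regexp package
// can compile the given pattern.
func supportsPattern(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Fragment of the tiktoken split pattern: a whitespace run that is
	// not followed by a non-whitespace character (negative lookahead).
	lookahead := `\s+(?!\S)`
	// The same fragment without the lookahead.
	plain := `\s+`

	fmt.Println(supportsPattern(lookahead)) // false
	fmt.Println(supportsPattern(plain))     // true
}
```

A backtracking engine accepts the lookahead, which would explain reaching for a third-party package here instead of the standard one.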
Performance, according to the published benchmarks (e.g., those included in its test suite), seems to be the same as the original Python code.
I think you've got your tiktokenizer candidate! 😀
from go-openai.
@OneSeven sure, I mean, we either would need to be able to embed this library (via cgo or otherwise) or would need to translate it from Rust to Go.
Do you have plans to add this functionality to the current SDK?
I would love to contribute, but my skills are far from sufficient, sorry.
from go-openai.
There's no plan for that right now, but we are open to contributions 😄
I guess you can also call github.com/openai/tiktoken as a separate binary from Go.
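A rough sketch of what that separate-process approach could look like from the Go side. Everything about the helper is an assumption: count_tokens.py is a hypothetical script that would read text on stdin and print a single integer (for example the length of the encoded token list), and CountViaCommand and parseCount are illustrative names, not part of any library.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// parseCount converts helper output such as "42\n" into an int.
func parseCount(out []byte) (int, error) {
	return strconv.Atoi(strings.TrimSpace(string(out)))
}

// CountViaCommand runs an external helper, feeds it the text on stdin,
// and expects a single integer on stdout.
func CountViaCommand(ctx context.Context, text, name string, args ...string) (int, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Stdin = strings.NewReader(text)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	out, err := cmd.Output()
	if err != nil {
		return 0, fmt.Errorf("running %s: %w (stderr: %s)", name, err, stderr.String())
	}
	return parseCount(out)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// count_tokens.py is hypothetical; it is assumed to print the token count.
	n, err := CountViaCommand(ctx, "hello world", "python3", "count_tokens.py")
	if err != nil {
		fmt.Println("token count unavailable:", err)
		return
	}
	fmt.Println(n)
}
```

Starting a process per call is slow, so a long-lived helper or batching several texts per invocation would probably be needed for anything beyond occasional use.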
from go-openai.
@OneSeven sure, I mean, we either would need to be able to embed this library (via cgo or otherwise) or would need to translate it from Rust to Go.
Isn't this library in Python? And if it were ported, how would you prefer the port to be scaffolded relative to your repo? Would it be a separate repo that you then import into go-gpt3, etc.? In other words, I am trying to understand your vision, assuming that porting it from Python to Go is feasible.
from go-openai.
There's a go library already: https://github.com/samber/go-gpt-3-encoder
from go-openai.
There's a go library already: https://github.com/samber/go-gpt-3-encoder
This library only works for English text; it does not produce correct results for other languages.
from go-openai.
@ealvar3z It's Rust wrapped in Python https://github.com/openai/tiktoken/blob/main/src/lib.rs
If it would be possible to bring tokenization with zero (or minimal) dependencies — I'm all for merging it. Otherwise, I think it makes sense to implement it in a separate repo.
from go-openai.