
langchaingo's Introduction

πŸ¦œοΈπŸ”— LangChain Go


⚡ Building applications with LLMs through composability, with Go! ⚡

🤔 What is this?

This is the Go language implementation of LangChain.

📖 Documentation

🎉 Examples

See ./examples for example usage.

package main

import (
  "context"
  "fmt"
  "log"

  "github.com/tmc/langchaingo/llms"
  "github.com/tmc/langchaingo/llms/openai"
)

func main() {
  ctx := context.Background()
  llm, err := openai.New()
  if err != nil {
    log.Fatal(err)
  }
  prompt := "What would be a good company name for a company that makes colorful socks?"
  completion, err := llms.GenerateFromSinglePrompt(ctx, llm, prompt)
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(completion)
}
$ go run .
Socktastic

Resources

Here are some links to blog posts and articles on using Langchain Go:

Contributors

langchaingo's People

Contributors

abirdcfly, abraxas-365, anush008, baoist, byebyebruce, cduggn, chyroc, corani, crazywr, dependabot[bot], devalexandre, devinyf, edocevol, eliben, elnoro, evilfreelancer, felixgunawan, fluffykebab, lowczarc, mdelapenya, mvrilo, nekomeowww, nidzola, obitech, pieterclaerhout, ryomak, struki84, tmc, tobiade, zivkovicn


langchaingo's Issues

Stack overflow error in `RecursiveCharacter`

Code:

package vectorizer

import (
	"context"

	"github.com/tmc/langchaingo/textsplitter"
)

func Splitter(ctx context.Context, content string) ([]string, error) {
	splitter := textsplitter.RecursiveCharacter{Separators: []string{" "}}
	res, err := splitter.SplitText(content)
	if err != nil {
		return nil, err
	}
	
	return res, nil
}

Error:

runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc0204b4368 stack=[0xc0204b4000, 0xc0404b4000]
fatal error: stack overflow

runtime stack:
runtime.throw({0xb5dbcf?, 0x10ec200?})
        /usr/lib/go-1.18/src/runtime/panic.go:992 +0x71
runtime.newstack()
        /usr/lib/go-1.18/src/runtime/stack.go:1101 +0x5cc
runtime.morestack()
        /usr/lib/go-1.18/src/runtime/asm_amd64.s:547 +0x8b
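
A possible workaround, assuming the unbounded recursion is triggered by the zero-value ChunkSize and ChunkOverlap that the bare struct literal leaves behind (this is a guess, not a confirmed diagnosis), is to set those fields explicitly:

package vectorizer

import (
	"context"

	"github.com/tmc/langchaingo/textsplitter"
)

// Splitter splits content with explicit chunk settings. Setting ChunkSize and
// ChunkOverlap avoids the zero values left by a bare struct literal, which
// appear to trigger the stack overflow above. The numbers are illustrative.
func Splitter(ctx context.Context, content string) ([]string, error) {
	splitter := textsplitter.RecursiveCharacter{
		Separators:   []string{"\n\n", "\n", " "},
		ChunkSize:    512,
		ChunkOverlap: 64,
	}
	return splitter.SplitText(content)
}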

Introduce `chatModels`

I already started with openai, but we need to discuss how we want to map langchain onto langchaingo in a few instances. I'll make a PR soon.

[feature request] pass in openai.Option options into embeddings.NewOpenAI() (to be able to use openai.WithToken(OPENAI_API_KEY) without env var)

I was wondering if others would agree that it would make sense to refactor the embeddings NewOpenAI function so that you can pass in the OPENAI_API_KEY as an openai.Option (like in openaillm.go -> NewChat(opts ...Option)) rather than depending on the OPENAI_API_KEY environment variable.

This would make sense so you can actually make use of your users' API keys for embedding documents.

embeddings/openai.go
embeddings/openai_test.go
llms/openai/openaillm.go

feature: add Map-Rerank Combine Documents Chain

Description

This method involves running an initial prompt on each chunk of data, which not only tries to complete the task but also gives a score for how certain it is in its answer. The responses are then ranked according to this score, and the highest-scoring response is returned (a rough sketch follows the acceptance criteria below).

[source]

Acceptance Criteria

  • It must run the ranking prompt over each chunk of data in parallel and return the result with the highest score.
  • It must work similarly to the "Map Reduce Combine Documents Chain" but perform fewer LLM calls.
  • It shouldn't combine results across chunks of a document, but rather return a response for each chunk of data.
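
A minimal, library-agnostic sketch of the map-rerank flow; the scoring function is an assumed stand-in for a real prompt-plus-output-parser call, and only the parallel fan-out and max-score selection are shown:

package chains

import (
	"context"
	"sync"
)

// rankedAnswer pairs a per-chunk response with the model's self-reported score.
type rankedAnswer struct {
	Text  string
	Score float64
}

// scoreChunk stands in for running the ranking prompt on one chunk and parsing
// the answer and its certainty score out of the completion.
type scoreChunk func(ctx context.Context, chunk string) (rankedAnswer, error)

// mapRerank runs scoreChunk over every chunk in parallel and returns the
// highest-scoring answer; results are never combined across chunks.
func mapRerank(ctx context.Context, chunks []string, fn scoreChunk) (rankedAnswer, error) {
	results := make([]rankedAnswer, len(chunks))
	errs := make([]error, len(chunks))
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk string) {
			defer wg.Done()
			results[i], errs[i] = fn(ctx, chunk)
		}(i, chunk)
	}
	wg.Wait()

	best := rankedAnswer{Score: -1}
	for i, res := range results {
		if errs[i] != nil {
			return rankedAnswer{}, errs[i]
		}
		if res.Score > best.Score {
			best = res
		}
	}
	return best, nil
}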

References

openai-chat-example fails to run due to an undefined function

I am new to langchaingo. When I tried to run this openai-chat-example locally, it told me:

❯ go run .
# basic-llm-example
./openai_chat_example.go:22:10: undefined: llms.WithStreamingFunc

I did a quick check on this part (screenshot omitted).

[feature request] Set custom http client for OpenAI

  • I am creating tests that don't make calls to the LLM by mocking the HTTP client.
  • The underlying openai client supports passing in an HTTP client as an option, but the wrapper around it does not (it hard-codes the options passed to openaiclient.New(options.token, options.model, options.baseURL, options.organization)).
  • The Go OpenAI client also supports setting the client using WithHTTPClient, but to use this, the client on the LLM struct would need to be accessible.

Proposed Fix
Make the openai client accessible with a getter GetClient
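
A getter would work; an alternative sketch is a functional option on the wrapper that forwards the injected client to the underlying go-openai client's existing WithHTTPClient support. The names below are illustrative, not the package's current API:

package openai

import "net/http"

// options mirrors the wrapper's constructor settings; httpClient is the field
// this proposal adds so tests can inject a mock instead of calling the API.
type options struct {
	token, model, baseURL, organization string

	httpClient *http.Client
}

// Option configures New.
type Option func(*options)

// WithHTTPClient sets the HTTP client used by the underlying OpenAI client.
func WithHTTPClient(client *http.Client) Option {
	return func(o *options) { o.httpClient = client }
}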

Feature request: Converting llms.CallOptions to local-llm script arguments

Hi!

I have a suggestion regarding the options we currently have.

In localllm.go file, we define the following strings:

const (
	// The name of the environment variable that contains the path to the local LLM binary.
	localLLMBinVarName = "LOCAL_LLM_BIN"
	// The name of the environment variable that contains the CLI arguments to pass to the local LLM binary.
	localLLMArgsVarName = "LOCAL_LLM_ARGS"
)

My idea is to make these environment variables optional. It would be better to have the ability to set them programmatically. For example, if the bin and args values are not set via the New function's options, we can fall back to the values from the environment variables. If neither is set, we can return an error.

As a result, we would have something like this:

func New(opts ...Option) (*LLM, error) {
	options := &options{
		bin: os.Getenv(localLLMBinVarName),
		args: os.Getenv(localLLMArgsVarName),
	}

	for _, opt := range opts {
		opt(options)
	}

	path, err := exec.LookPath(options.bin)
	if err != nil {
		return nil, errors.Join(ErrMissingBin, err)
	}

	c, err := localclient.New(path, options.args)
	return &LLM{
		client: c,
	}, err
}

Another idea is to implement dynamic generation of the arguments array based on the options passed through the llms.CallOptions struct.

Before the execution stage, we can append these options to the args array. This would result in a string like the following:

/path/to/executable/file --top_k="1" --top_p="10" --seed="42" --prompt="prompt"
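
A self-contained sketch of this second idea; the option struct and flag names below are local stand-ins, since llms.CallOptions may not expose all of these fields yet:

package local

import (
	"fmt"
	"strconv"
)

// callOptions mirrors the subset of per-call options this sketch cares about;
// it is a stand-in for llms.CallOptions, not the library type.
type callOptions struct {
	TopK        int
	TopP        float64
	Seed        int
	Temperature float64
}

// buildArgs appends CLI flags derived from the call options to the static
// arguments configured for the local LLM binary, then adds the prompt.
func buildArgs(base []string, o callOptions, prompt string) []string {
	args := append([]string(nil), base...)
	if o.TopK > 0 {
		args = append(args, "--top_k="+strconv.Itoa(o.TopK))
	}
	if o.TopP > 0 {
		args = append(args, fmt.Sprintf("--top_p=%g", o.TopP))
	}
	if o.Seed != 0 {
		args = append(args, "--seed="+strconv.Itoa(o.Seed))
	}
	if o.Temperature > 0 {
		args = append(args, fmt.Sprintf("--temperature=%g", o.Temperature))
	}
	return append(args, "--prompt="+prompt)
}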

What are your thoughts on these ideas?

SplitText not working

Please take a look; I need to integrate this library into my project.
Also, please assign me. I will try to fix it myself if I can.
Are unit tests written for these functions?

Add model parameter to openai.New()

The following New function in llms/openai/openaillm.go gets the model from either the environment or from a default value in llms/openai/internal/openaiclient/openaiclient.go. The default value is behind an internal package. I propose that the openai.New() function take an optional argument (using the functional option pattern) that accepts an option for the model. This would easily and elegantly allow multiple calls to openai.New() to use different models rather than being fixed to a single environment variable (a sketch of the option pattern follows the quoted code).

// New returns a new OpenAI client.
func New() (*LLM, error) {
	// Require the OpenAI API key to be set.
	token := os.Getenv(tokenEnvVarName)
	if token == "" {
		return nil, ErrMissingToken
	}

	// Allow model selection.
	model := os.Getenv(modelEnvVarName)

	// Create the client.
	c, err := openaiclient.New(token, model)
	return &LLM{
		client: c,
	}, err
}
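
A sketch of the proposed change, reusing the identifiers from the quoted function; the option names are illustrative, not the package's current API:

// Option configures the constructor.
type Option func(*options)

type options struct {
	token string
	model string
}

// WithModel overrides the model that would otherwise come from the environment.
func WithModel(model string) Option {
	return func(o *options) { o.model = model }
}

// New returns a new OpenAI client, allowing per-instance model selection.
func New(opts ...Option) (*LLM, error) {
	o := &options{
		token: os.Getenv(tokenEnvVarName),
		model: os.Getenv(modelEnvVarName),
	}
	for _, opt := range opts {
		opt(o)
	}
	if o.token == "" {
		return nil, ErrMissingToken
	}

	c, err := openaiclient.New(o.token, o.model)
	return &LLM{client: c}, err
}

A caller could then write openai.New(openai.WithModel("gpt-4")) for one instance while another keeps the environment default.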

Allow setting a custom HTTP client

Reason/example: in many regions the local network cannot reach the OpenAI endpoint directly, so a proxy must be configured to access it.
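
The custom client itself is plain net/http; what's missing is a way to hand it to the wrapper (see the option sketch in the previous issue). A minimal sketch of building a proxy-aware client, with an illustrative proxy address:

package main

import (
	"log"
	"net/http"
	"net/url"
	"time"
)

// newProxyClient builds an *http.Client that routes requests through the given
// forward proxy, the usual workaround when the OpenAI endpoint is unreachable.
func newProxyClient(proxyAddr string) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   60 * time.Second,
	}, nil
}

func main() {
	client, err := newProxyClient("http://127.0.0.1:7890")
	if err != nil {
		log.Fatal(err)
	}
	_ = client // pass this to the LLM constructor once an option for it exists
}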

[Feature Request] Introducing MessagePromptTemplate

First of all, please correct me if I am mistaken about your roadmap plan.

Your proposal is related to a problem

Currently, I can't find a proper way to produce schema.ChatMessage values with the prompts package. The only related type I found is prompts.StringPromptValue, which formats the message as a schema.HumanChatMessage.

The original Langchain provides different types of MessagePromptTemplate such as AIMessagePromptTemplate, HumanMessagePromptTemplate, and SystemMessagePromptTemplate.

FYI: https://langchain-langchain.vercel.app/docs/modules/model_io/prompts/prompt_templates/msg_prompt_templates

This is more convenient than using schema.ChatMessage directly.

Describe the solution you'd like

So let's introduce the same thing into this lib.

Declare an interface BaseMessagePromptTemplate inside prompts:

type BaseMessagePromptTemplate interface {
	InputVariables() []string
	FormatPrompt(values map[string]interface{}) (schema.PromptValue, error)
	FormatMessages(values map[string]interface{}) ([]schema.ChatMessage, error)
}

Then we implement AIMessagePromptTemplate, HumanMessagePromptTemplate, and SystemMessagePromptTemplate on top of prompts.PromptTemplate, each satisfying the interface. These three prompt templates should produce their respective schema.PromptValue types (AIPromptValue, HumanPromptValue, and SystemPromptValue) so they can easily be used by chat models, as in:

func TestChatWithTemplate(t *testing.T) {
	template := prompts.NewChatPromptTemplate([]prompts.BaseMessagePromptTemplate{
		chat.NewSystemMessagePromptTemplate(
			`You are a translation engine that can only translate text and cannot interpret it.`,
			nil,
		),
		chat.NewHumanMessagePromptTemplate(
			`translate this text from {{.input_lang}} to {{.output_lang}}:\n{{.input}}`,
			[]string{"input_lang", "output_lang", "input"},
		),
	})
	value, err := template.FormatPrompt(map[string]interface{}{
		"input_lang":  "English",
		"output_lang": "Chinese",
		"input":       INPUT,
	})
	if err != nil {
		t.Fatalf("%s", err)
	}
	completion, err := llm.Chat(context.Background(), value.Messages())
	if err != nil {
		t.Fatalf("%s", err)
	}
	fmt.Println(completion.Message.GetText())
}

Additional context

I would like to create a PR and contribute this feature if you give the go-ahead. :)

What is the role of the textKey option when defining a new Pinecone store, and is it possible to avoid it being used as a filter?

What is the proper usage of the Pinecone textKey field? It's described as an option for storing the text of the document a vector represents.

The way it's used when executing a query request, however, is more like a filter: it excludes document matches once the query to Pinecone has completed. In the following snippet, results are excluded if they don't contain a metadata field matching what was provided as the textKey. If no textKey is provided, a default ("text") is used. There doesn't appear to be a way to make a request without filtering the resulting documents.

	docs := make([]schema.Document, 0, len(response.Matches))
	for _, match := range response.Matches {
		pageContent, ok := match.Metadata[s.textKey].(string)
		if !ok {
			return nil, ErrMissingTextKey
		}

Documents that match the textKey filter are placed in a slice of type schema.Document; however, the Metadata field is empty when returned from restQuery. match.Metadata is assigned to the Metadata field in the Document slice, but by that point it has already been deleted and no longer points to anything.

	delete(match.Metadata, s.textKey)

	docs = append(docs, schema.Document{
		PageContent: pageContent,
		Metadata:    match.Metadata,
	})

This might be an issue with my understanding of how textKey should be used, but it is a little unclear. Another field returned from the Pinecone query request is ID, which helps associate each vector with its corresponding document. It would also be useful to include this in the Document struct (although it is very specific to Pinecone).

retrieval_qa discard ChainOption

When I was using the RetrievalQA chain, I noticed that there was no streaming output. Upon inspection, I found that the Call function in the chains/retrieval_qa.go file was discarding the ChainOption. Is there a reason for this? I tried adding the ChainOption and passing it to the called function, and found that the streaming output started working properly.

chains/retrieval_qa.go
func (c RetrievalQA) Call(ctx context.Context, values map[string]any, _ ...ChainCallOption)
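
A sketch of the suggested fix, keeping the options instead of discarding them; the retrieval step is elided and the inner call is illustrative of how the options would be forwarded so streaming callbacks reach the inner chain:

func (c RetrievalQA) Call(ctx context.Context, values map[string]any, options ...ChainCallOption) (map[string]any, error) {
	inputs, err := c.prepareInputs(ctx, values) // placeholder for the existing document-retrieval step
	if err != nil {
		return nil, err
	}
	return Call(ctx, c.CombineDocumentsChain, inputs, options...)
}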

Suggest implementation technology for documentation site

We should have a high quality rendered documentation site modeled after the existing langchain docs sites.

We need to determine what tool(s) to use and set up a CI pipeline to publish somewhere (probably github pages).

Python LangChain uses Sphinx
TypeScript LangChain uses Docusaurus

I'm open to docusaurus but a go-based toolchain would be nice.

Shall we use Hugo?

Flushing ChatHistory based on token size

Hey everyone!

I'm currently working on building a test chatbot using langchain-go and I need to be able to flush messages from the chat history when the token size of the full prompt hits a certain limit.

To tackle this, I've been digging into the codebase and exploring options to contribute to the repo. The Python version has this nifty reference to BaseLanguageModel, which handles the logic for measuring the token size of the stored memory buffer. However, the Go version doesn't yet have anything similar built in, so I ended up using the tiktoken-go module to get the token size of my history.

Since I'm still a bit new to Go, I was wondering whether this difference in design approach is something specific to Go or just a design choice made by the author. As far as I understand, Go has its own version of inheritance through struct embedding, so I created my own memory wrapper by embedding the memory Buffer struct.

package memory

import (
	"github.com/pkoukk/tiktoken-go"
	mem "github.com/tmc/langchaingo/memory"
	"github.com/tmc/langchaingo/schema"
)

type AsaiMemory struct {
	*mem.Buffer
	Encoding      string
	EncodingModel string
	TokenLimit    int
	Messages      []schema.ChatMessage
}

func NewAsaiMemory() *AsaiMemory {
	m := AsaiMemory{
		Buffer:        mem.NewBuffer(),
		Encoding:      "",
		EncodingModel: "gpt-3.5-turbo",
		TokenLimit:    2800,
	}

	return &m
}

func (m *AsaiMemory) LoadMemory() error {
	m.Messages = m.ChatHistory.Messages()
	return nil
}

func (m *AsaiMemory) GetMemoryString() string {
	bufferString, err := schema.GetBufferString(m.Messages, m.HumanPrefix, m.AIPrefix)
	if err != nil {
		return ""
	}
	return bufferString
}

func (m *AsaiMemory) TrimContext() error {
	tkm, err := tiktoken.EncodingForModel(m.EncodingModel)
	if err != nil {
		return err
	}

	bufferString, err := schema.GetBufferString(m.Messages, m.HumanPrefix, m.AIPrefix)
	if err != nil {
		return err
	}

	bufferLength := len(tkm.Encode(bufferString, nil, nil))

	if bufferLength > m.TokenLimit {
		for bufferLength > m.TokenLimit {
			m.Messages = m.Messages[1:]
			bufferString, _ := schema.GetBufferString(m.Messages, m.HumanPrefix, m.AIPrefix)
			bufferLength = len(tkm.Encode(bufferString, nil, nil))
		}
	}

	return nil
}

I might be missing something, but the main pickle for me, and the reason I needed the wrapper in the first place, was that I couldn't access ChatHistory.messages since it's an unexported field, and I couldn't find a way to pop/slice out the messages that were overflowing the token limit, so I needed my own storage that I can manipulate and access.

So I was wondering if it would be a good idea, and the simplest solution for now to just extend the existing type Buffer struct with something like:

func (m *Buffer) TrimContext(limit int, encodingModel string) error {
	tkm, err := tiktoken.EncodingForModel(encodingModel)
	if err != nil {
		return err
	}

	bufferString, err := schema.GetBufferString(m.ChatHistory.messages, m.HumanPrefix, m.AIPrefix)
	if err != nil {
		return err
	}

	bufferLength := len(tkm.Encode(bufferString, nil, nil))

	if bufferLength > limit {
		for bufferLength > limit {
			m.ChatHistory.messages = m.ChatHistory.messages[1:]
			bufferString, _ := schema.GetBufferString(m.ChatHistory.messages, m.HumanPrefix, m.AIPrefix)
			bufferLength = len(tkm.Encode(bufferString, nil, nil))
		}
	}

	return nil
}

I'm wondering if this should be included in some form in every type of memory, since I don't see a case where I wouldn't worry about the token size of the prompt. Maybe a separate package that takes care of token counting?

feat: FewShotPromptTemplate

Having a FewShotPromptTemplate is a clean way to implement the ConstitutionalChain, and it also enables people to build prompts in that style.
source

I'd like to work on it.

Need to fix and improve the implementations and tests under `exp` sub-directory

Summary

It is currently impossible to develop, fix, and improve this code because the #14 changes didn't rename the imported packages in the tests of the exp packages, and the test code and file structure aren't idiomatic Go.

I also found a lot of bugs and critical logical issues in the implementations and tests within the packages under the exp sub-directory; however, I have to go step by step and break things apart in order to speed up review.

TODOs

  • Fix structuring issues (#17)
  • Fix import issues (#17)
  • Fix logical issues in the tests (such as t.FailNow() not being called when an error occurs, which interferes with later steps, or not asserting whether an if-statement branch was entered, etc.) (#28)
  • Fix the lint issues (#27, #28)
  • Fix the implementation issues (#27)

good job

I was about to write a rough version of this myself; glad to see other people already working on it.

feat: SimpleSequentialChain

As shown in #61, we need a simple sequential chain feature, which enables people to interact with the LLM through a chain of prompts, where the output of each one is fed into the next.
source

I'd like to work on it.
I'm creating this issue to avoid wasting contributors' time and to get aligned. A minimal sketch of the idea follows.
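
A minimal, library-agnostic sketch of the behaviour, assuming each link in the chain is just a function from the previous output to the next:

package chains

import "context"

// Step stands in for one link in the chain, e.g. formatting a prompt with the
// previous output and calling an LLM with it.
type Step func(ctx context.Context, input string) (string, error)

// runSimpleSequential feeds the output of each step into the next and returns
// the final result, which is the core behaviour of a simple sequential chain.
func runSimpleSequential(ctx context.Context, input string, steps ...Step) (string, error) {
	out := input
	for _, step := range steps {
		var err error
		out, err = step(ctx, out)
		if err != nil {
			return "", err
		}
	}
	return out, nil
}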

Refactor LLMs to have optional parameters

Quoting @tmc from #43 :

Let's do variadic/functional Options arguments to pass these (and have some reasonable defaults).

like

type Option func(*LLM)

func New(options ...Option) (*LLM, error) {
	llm := &LLM{ /* reasonable defaults */ }
	for _, o := range options {
		o(llm)
	}
	return llm, nil
}

Edit: I also wanted to document which LLMs need to be updated, but as this is changing so quickly, I don't think that's feasible at the moment.

Feature request: Options like WithSeed, WithTopK, and so on

Hello!

I wanted to ask why these were not added earlier. Perhaps there is some reason for this, or maybe they were simply overlooked.

Can I add some of these options in a future PR, for smoother use of the models? What should be taken into account? A rough sketch of what they might look like follows.
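
A rough sketch of such options using the functional-option pattern; the option struct is a local stand-in, since the library's call options may need to grow these fields first:

package llms

// callOptions is a stand-in for the per-call options struct; Seed, TopK, and
// TopP would need to be added to the real one.
type callOptions struct {
	Seed int
	TopK int
	TopP float64
}

// CallOption mutates the options for a single call.
type CallOption func(*callOptions)

// WithSeed fixes the sampling seed for reproducible generations.
func WithSeed(seed int) CallOption {
	return func(o *callOptions) { o.Seed = seed }
}

// WithTopK limits sampling to the k most likely tokens.
func WithTopK(k int) CallOption {
	return func(o *callOptions) { o.TopK = k }
}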

feat: ConstitutionalChain

As shown in #61, we need a constitutional chain feature, which enables people to have some control over the output of their chain.
source

I'd like to work on it.
I'm creating this issue to avoid wasting contributors' time and to get aligned.

`outputparser.NewStructured` is restrictive & not robust enough

Right now, outputparser.NewStructured is based on an internal struct (https://github.com/tmc/langchaingo/blob/main/outputparser/structured.go#L36) that restricts output values to strings only, but I think the schema should be user-defined; a good example is the options pattern in the vector store here (https://github.com/tmc/langchaingo/blob/main/vectorstores/options.go#L30).

I can create a PR if this is the right way to go

Workaround

For now, I defined a custom prompt with the fields I need, paired it with a custom OutputParser, and fed the prompt directly to the chain.

reading API keys from .env file

I saw this function:

func New() (*LLM, error) {
	token := os.Getenv(tokenEnvVarName)
	if token == "" {
		return nil, ErrMissingToken
	}
	c, err := huggingfaceclient.New(token)
	return &LLM{
		client: c,
	}, err
}

And I know I can modify it to read the key from a .env file using godotenv, like:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/joho/godotenv"
)

func main() {
	err := godotenv.Load() // will load the ".env" file
	if err != nil {
		log.Fatalf("Some error occured. Err: %s", err)
	}

	env := os.Getenv("HUGGINGFACEHUB_API_TOKEN")
	if env == "" {
		log.Fatalf("Cannot find the API token")
	}
}

Is there a way to modify the New() function through something like an extension, instead of modifying the package code itself?

I know this could be considered a general Go question rather than something directly related to this package, but I thought I'd ask. Thanks!
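
One way to do this without touching the package is to load the .env file in your own main before calling New(), since New() only reads the environment. A minimal sketch, assuming the huggingface package quoted above:

package main

import (
	"log"

	"github.com/joho/godotenv"
	"github.com/tmc/langchaingo/llms/huggingface"
)

func main() {
	// Load ".env" into the process environment first; New() will then find
	// HUGGINGFACEHUB_API_TOKEN via os.Getenv without any package changes.
	if err := godotenv.Load(); err != nil {
		log.Fatalf("loading .env: %s", err)
	}

	llm, err := huggingface.New()
	if err != nil {
		log.Fatal(err)
	}
	_ = llm
}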

Improve CONTRIBUTING.md

As interest spools up here, it's increasingly important to have clear contribution guidelines.

In scope should include:

  • Pull request sizing and scoping.
  • PR title and description norms.
  • Code quality expectations.
  • Testing expectations.

docs: finish first version of documentation site

The documentation site is currently missing a lot of content and needs work.

To import code from the examples directory in an .mdx file, you must write this at the start of the file:

import CodeBlock from "@theme/CodeBlock";
import ExampleLLM from "@examples/llm-chain-example/llm_chain.go";

Then write this where the code should be:

<CodeBlock language="go">{ExampleLLM}</CodeBlock>

Structured parser doesn't parse array

Hi, I am trying to parse an array of data using the structured parser. The JSON I am trying to parse is:

{
  "ids": [number]
}

I am getting this error: `map[] json: cannot unmarshal array into Go value of type string`
I tried to change the data type of the parser to map[string]any and it worked perfectly.

I was wondering if this case is handled somewhere else? Or can we add it to the existing parser?
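
A minimal standard-library sketch of why the workaround helps: unmarshalling into map[string]any accepts nested arrays, whereas map[string]string rejects them:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	raw := []byte(`{"ids": [1, 2, 3]}`)

	// Targeting map[string]string fails: a JSON array cannot unmarshal into a string.
	var asStrings map[string]string
	fmt.Println(json.Unmarshal(raw, &asStrings)) // json: cannot unmarshal array into Go value of type string

	// Targeting map[string]any works: the array becomes a []interface{}.
	var asAny map[string]any
	if err := json.Unmarshal(raw, &asAny); err != nil {
		log.Fatal(err)
	}
	fmt.Println(asAny["ids"]) // [1 2 3]
}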
