cherche's Issues

active project

Just curious whether this project is still active. It looks great, thanks for working on it!

"IndexError: index out of range in self" while adding documents to a cherche pipeline

I'm using a cherche pipeline built from a TF-IDF retriever and a SentenceTransformer ranker, composed as follows: search = (retriever + ranker)
When I tried to add documents to the pipeline with search.add(documents=documents), I got this error:

    /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
       2181 # remove once script supports set_grad_enabled
       2182 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
    -> 2183 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
       2184
       2185

    IndexError: index out of range in self

k param when creating sbert retriever not taken into account

Create a retriever based on Sentence-BERT and pass a value, e.g. 10, to the k parameter.
The value is not taken into account when the retriever is called: more than k results are returned.

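    from cherche import retrieve
    from sentence_transformers import SentenceTransformer

    # Encoder retriever that is expected to return at most k = 10 documents per query
    retriever = retrieve.Encoder(
        key='id',
        on=['content'],
        encoder=SentenceTransformer('sentence-transformers/all-MiniLM-L12-v2').encode,
        k=10,
    )
    # Index the documents (call kept as written in the report)
    retriever(documents=docs)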

Result: len(retriever(queries)[0]) > 10, i.e. more than k documents are returned for the first query.
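
Until the parameter is honored, the results can be cut down after the call. This is a minimal sketch, assuming (as the len(...) check above suggests) that the retriever returns one list of documents per query, ordered from most to least relevant. truncate_results is a hypothetical helper, not a cherche function, and the sample data is made up for illustration.

```python
from typing import Dict, List


def truncate_results(results: List[List[Dict]], k: int) -> List[List[Dict]]:
    """Hypothetical helper: keep at most k documents for each query."""
    return [documents[:k] for documents in results]


# Made-up stand-in for retriever(queries): 25 hits for the first query, 4 for the second.
fake_results = [
    [{"id": i} for i in range(25)],
    [{"id": i} for i in range(4)],
]

top_10 = truncate_results(fake_results, k=10)
```

Queries that already return fewer than k documents are left unchanged.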

k param in retriever, ranker and pipeline, and documentation

The documentation at https://raphaelsty.github.io/cherche/api/compose/Pipeline/
says the following about the __call__ method:

If the batch_size_ranker, or batch_size_retriever it takes precedence over the batch_size. If the k_ranker, or k_retriever it takes precedence over the k parameter.

This sentence is hard to understand and needs to be clarified; as written, it could be read in a misleading way.

Regarding the k parameter, please note the following. Suppose you define a retriever (say, a TF-IDF one) with k = 20, followed by a ranker with k = 10: you are interested in the top 10 results at the end, but want 20 candidates at the retriever level. A likely mistake is then to call the pipeline with k = 10. In that case, the retriever also appears to use k = 10.
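
One reading of the documentation sentence, consistent with the behavior described above, is that a stage-specific value (k_retriever or k_ranker) overrides the k passed to the pipeline call, which in turn overrides the k set when the stage was created. The sketch below only illustrates that precedence rule. resolve_k and its argument names are hypothetical, not part of cherche, and the library's actual implementation may differ.

```python
from typing import Optional


def resolve_k(
    constructor_k: Optional[int],
    call_k: Optional[int] = None,
    stage_k: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: choose the k a single stage would use.

    Assumed precedence: stage-specific value, then the pipeline-level k,
    then the value given when the stage was constructed.
    """
    if stage_k is not None:
        return stage_k
    if call_k is not None:
        return call_k
    return constructor_k


# Retriever built with k = 20, ranker built with k = 10.
# Calling the pipeline with k = 10 alone shrinks the retriever's candidate pool to 10.
retriever_k = resolve_k(constructor_k=20, call_k=10)
ranker_k = resolve_k(constructor_k=10, call_k=10)

# A stage-specific value keeps 20 candidates for the ranker to reorder.
retriever_k_fixed = resolve_k(constructor_k=20, call_k=10, stage_k=20)
```

Under this reading, either calling the pipeline without k, or passing a retriever-specific value alongside k, would keep the retriever at 20 candidates while the ranker still returns 10.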
