
semantic-search-with-sbert's Introduction

semantic-search-with-sbert's Issues

Missing label in InputExample?

Hi! Thanks for the article; the Colab example is very useful.
I have one question.

In the Colab notebook "semantic-search-with-sbert-faiss (1).ipynb":

with open('../input/user-query-data/generated_queries_all (1).tsv') as fIn:
    for line in fIn:
        try:
            query, paragraph = line.strip().split('\t', maxsplit=1)
            train_examples.append(InputExample(texts=[query, paragraph]))   # <--- missing label
        except:
            pass

It looks like you didn't specify the "label" (i.e., the score) argument.
Since the default label value is 0, the provided query and paragraph pair would be treated as dissimilar, which is not what we want.
(ref. https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/readers/InputExample.py)

Any comments?
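
One possible reading, offered as a sketch rather than the author's answer: whether the label matters depends on the loss function. A loss such as CosineSimilarityLoss reads the label as the target similarity, so a positive pair should carry an explicit label. MultipleNegativesRankingLoss, as far as I know, treats every pair as a positive and does not use the label. The snippet below attaches label=1.0 explicitly and skips malformed lines without a bare except. The helper name load_pairs and the sample lines are hypothetical, and the dataclass is only a stand-in used when sentence-transformers is not installed.

```python
from dataclasses import dataclass, field

try:
    from sentence_transformers import InputExample
except ImportError:
    # Stand-in with the same three fields, used only so this sketch runs
    # without sentence-transformers installed.
    @dataclass
    class InputExample:
        guid: str = ""
        texts: list = field(default_factory=list)
        label: float = 0


def load_pairs(lines, label=1.0):
    """Build one InputExample per 'query<TAB>paragraph' line (hypothetical helper)."""
    examples = []
    for line in lines:
        parts = line.strip().split("\t", maxsplit=1)
        if len(parts) != 2:
            continue  # skip lines without a tab instead of swallowing every error
        query, paragraph = parts
        examples.append(InputExample(texts=[query, paragraph], label=label))
    return examples


# Made-up sample lines; the second one has no tab and is skipped.
sample_lines = [
    "what is faiss\tFaiss is a library for similarity search.\n",
    "a malformed line without a tab\n",
]
train_examples = load_pairs(sample_lines)
print(len(train_examples), train_examples[0].label)
```

With the real file, load_pairs(fIn) could replace the loop in the notebook, since a file object iterates over its lines.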

How to return the match score for the query we're searching?

We're writing a function that returns the top 5 results. Can we also get the match score for each of those results?

def fetch_info(dataframe_idx):
    info = df.iloc[dataframe_idx]
    meta_dict = {}
    meta_dict['Pdf'] = info['Pdf']
    meta_dict['Content'] = info['Content']
    meta_dict['Page no'] = info['Page no']
    return meta_dict
    
def search(query, top_k, index, model):
    t=time.time()
    query_vector = model.encode([query])
    top_k = index.search(query_vector, top_k)
    print('Results in Total Time: {}'.format(time.time()-t))
    top_k_ids = top_k[1].tolist()[0]
    top_k_ids = list(np.unique(top_k_ids))
    results =  [fetch_info(idx) for idx in top_k_ids]
    return results

The code above is the main function we use to get the query results.

query = "Movie"
results = search(query, top_k=5, index=index, model=model)

print("")
for result in results:
    print(result)

The code above returns the top 5 results. Can we also get the match score for each of them?
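
A sketch of one way to get the scores, assuming index.search returns a pair (scores, ids) with one row per query, which is how faiss indexes behave. With IndexFlatIP the score is the inner product, which equals cosine similarity only if the vectors were L2-normalized beforehand. The helper name pair_ids_with_scores and the sample values are hypothetical. It also avoids np.unique, which sorts the ids and so discards the ranking order.

```python
def pair_ids_with_scores(scores, ids):
    """Pair each returned id with its score, keeping the ranking order.

    `scores` and `ids` are the first row of the two arrays returned by
    index.search, converted with .tolist().
    """
    seen = set()
    paired = []
    for idx, score in zip(ids, scores):
        if idx == -1 or idx in seen:  # -1 marks "no result" in faiss output
            continue
        seen.add(idx)
        paired.append((idx, score))
    return paired


# Made-up values shaped like index.search output for one query and top_k=5.
scores_row = [0.91, 0.88, 0.88, 0.42, 0.0]
ids_row = [17, 3, 3, 8, -1]
for idx, score in pair_ids_with_scores(scores_row, ids_row):
    print(idx, score)
```

Inside search, this could be used as scores, ids = index.search(query_vector, top_k) followed by pair_ids_with_scores(scores[0].tolist(), ids[0].tolist()), with the score stored next to the fetch_info(idx) result, for example under a 'Score' key.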

TypeError: in method 'IndexIDMap_add_with_ids', argument 4 of type 'faiss::IndexIDMapTemplate< faiss::Index >::idx_t const *'

When I run the code in a local Jupyter notebook on Windows 10, the following code throws an error:

encoded_data = model.encode(df.Plot.tolist())
encoded_data = np.asarray(encoded_data.astype('float32'))
index = faiss.IndexIDMap(faiss.IndexFlatIP(768))
index.add_with_ids(encoded_data, np.array(range(0, len(df))))
faiss.write_index(index, 'movie_plot.index')

The error is:

TypeError                                 Traceback (most recent call last)
<ipython-input-26-22c477f27f62> in <module>
----> 1 index.add_with_ids(encoded_data, np.array(range(0, len(df))))
      2 faiss.write_index(index, 'movie_plot.index')

~\t5\lib\site-packages\faiss\__init__.py in replacement_add_with_ids(self, x, ids)
    233 
    234         assert ids.shape == (n, ), 'not same nb of vectors as ids'
--> 235         self.add_with_ids_c(n, swig_ptr(x), swig_ptr(ids))
    236 
    237     def replacement_assign(self, x, k, labels=None):

~\t5\lib\site-packages\faiss\swigfaiss.py in add_with_ids(self, n, x, xids)
   4950 
   4951     def add_with_ids(self, n, x, xids):
-> 4952         return _swigfaiss.IndexIDMap_add_with_ids(self, n, x, xids)
   4953 
   4954     def add(self, n, x):

TypeError: in method 'IndexIDMap_add_with_ids', argument 4 of type 'faiss::IndexIDMapTemplate< faiss::Index >::idx_t const *'

I installed all the required libraries; for faiss, I ran pip install faiss-cpu.
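
A likely cause, stated as an assumption: on Windows with NumPy versions before 2.0, np.array(range(...)) produces int32 ids, while add_with_ids expects 64-bit ids (the idx_t in the error message). A minimal sketch that builds the ids as int64 explicitly; the helper name make_faiss_ids is hypothetical.

```python
import numpy as np


def make_faiss_ids(n_rows):
    """Return the ids 0..n_rows-1 as an int64 array (hypothetical helper)."""
    return np.arange(n_rows, dtype=np.int64)


ids = make_faiss_ids(5)
print(ids.dtype, ids.shape)
```

The failing call would then become index.add_with_ids(encoded_data, make_faiss_ids(len(df))).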

ValueError: not enough values to unpack (expected 2, got 1) for index.add_with_ids(encoded_data, ids)

I'm trying to encode the data below with the code that follows.

print(df)

The output is:

Pdf                 Content                                             Page no
July 20, 2016.PDF   RESERVE BANK OF INDIA DEPARTMENT OF CURRENCY M...   3.0
July 20, 2016.PDF   RESERVE BANK OF INDIA DEPARTMENT OF CURRENCY M...   3.0
July 20, 2016.PDF   Para 1 Authority to Impound Counterfeit Notes ...   3.0
July 20, 2016.PDF   (i) All branches of Public Sector Banks.            3.0
July 20, 2016.PDF   (ii) All branches of Private Sector Banks and ...   3.0
...                 ...                                                 ...
April 1, 2021.pdf   4. Motif of Mangalayan depicting the country’s...   21.0
April 1, 2021.pdf   5. Denominational numeral २००० in Devnagari         21.0
April 1, 2021.pdf   side. For visually impaired Intaglio or raised...   21.0
April 1, 2021.pdf   11. Horizontal rectangle with ₹2000 in raised ...   21.0
April 1, 2021.pdf   12. Seven angular bleed lines on left and righ...   21.0

The code is:

encoded_data = model.encode(str(df.Content.tolist()))
encoded_data = np.asarray(encoded_data.astype('float32'))
index = faiss.IndexIDMap(faiss.IndexFlatIP(768))
ids = np.array(range(0, len(df)))
ids = np.asarray(ids.astype('int64'))
index.add_with_ids(encoded_data, ids)

The error comes from the line index.add_with_ids(encoded_data, ids). The full traceback is:

ValueError                                Traceback (most recent call last)
<ipython-input-32-791ec30482ee> in <module>
----> 1 index.add_with_ids(encoded_data, ids)

~\t5\lib\site-packages\faiss\__init__.py in replacement_add_with_ids(self, x, ids)
    229             in result lists to mean "not found" so it's better to not use it as an id.
    230         """
--> 231         n, d = x.shape
    232         assert d == self.d
    233 

ValueError: not enough values to unpack (expected 2, got 1)

In short, calling index.add_with_ids(encoded_data, ids) raises ValueError: not enough values to unpack (expected 2, got 1).
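
A probable cause, offered as a sketch: str(df.Content.tolist()) turns the whole list into one long string, so model.encode receives a single text and returns a single 1-D vector, and x.shape then has only one value to unpack. Passing a list of strings gives a 2-D array with one row per text. The helper name as_text_list and the sample values are hypothetical.

```python
def as_text_list(values):
    """Convert each cell to its own string so the encoder sees one text per row."""
    return [str(value) for value in values]


# Made-up cells: one text and one non-string value, as a column might contain.
contents = ["(i) All branches of Public Sector Banks.", 3.0]

one_big_string = str(contents)  # what the failing code passes to encode
texts = as_text_list(contents)  # one string per row

print(type(one_big_string).__name__)
print(len(texts))
```

With this change, encoded_data = model.encode(as_text_list(df.Content.tolist())) should have one row per DataFrame row, matching the length of ids, assuming the model's embedding size is 768 as the index expects.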
