jtr's Introduction

Hi there 👋 This is Haitao Li.

  • 😀 I’m a second-year master's student in the Tsinghua IR Group, supervised by Prof. Yiqun Liu.
  • 🏆 My research lies in Information Retrieval and Legal Case Retrieval. I currently focus on more reliable and interpretable legal case retrieval techniques with large language models. I am also very curious about dense retrieval. My publications are available on my homepage.
  • 📫 Contact me via [email protected]

jtr's People

Contributors

cshaitao

Forkers

thuir

jtr's Issues

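Hello, how are positive and negative nodes selected after Overlapped Cluster?

In the tree initialized with KMeans, each document is assigned to exactly one leaf node. In that case the positive nodes are the document's leaf node and its ancestors, all other nodes are negative, and each level of the tree has exactly one positive node.
After Overlapped Cluster, however, a document is assigned to multiple leaf nodes, so each level of the tree can have multiple positive nodes. How are positive and negative nodes selected in that case so that they fit the new cluster assignment?

Your source file JTR/reorganize_clusters_tree.py contains the code below. If I understand it correctly, after the clusters are updated, dict_label simply takes the label of the last node that contains each pid. That is, only the last leaf node a document was assigned to is considered, and the leaf nodes it was assigned to earlier are ignored. If so, this does not seem to fit the new cluster assignment. This is only my limited understanding; if I am mistaken, please point it out, and I would be very grateful!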

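    # Map each pid to the label of a leaf node that contains it.
    dict_label = {}
    for leaf in tree.leaf_dict:
        node = tree.leaf_dict[leaf]
        pids = node.pids

        for pid in pids:
            # A pid that appears in several leaves is overwritten each time,
            # so only the last leaf visited is kept.
            dict_label[pid] = str(node.val)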
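
To make the concern concrete, here is a minimal sketch of the behavior described above. `LeafNode` is a hypothetical stand-in that only models the `val` and `pids` attributes seen in the quoted snippet; it is not JTR's actual tree class. `all_leaf_labels` is one possible alternative that keeps every assigned leaf, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """Hypothetical stand-in for a leaf node (only `val` and `pids`)."""
    val: str
    pids: list = field(default_factory=list)


def last_leaf_labels(leaf_dict):
    """Mirror the quoted loop: later leaves overwrite earlier ones."""
    labels = {}
    for node in leaf_dict.values():
        for pid in node.pids:
            labels[pid] = str(node.val)
    return labels


def all_leaf_labels(leaf_dict):
    """Assumed alternative: keep every leaf label a pid was assigned to."""
    labels = defaultdict(list)
    for node in leaf_dict.values():
        for pid in node.pids:
            labels[pid].append(str(node.val))
    return dict(labels)


# pid 2 is assigned to both leaves, as after an overlapped clustering step.
leaf_dict = {
    "a": LeafNode("00", [1, 2]),
    "b": LeafNode("01", [2, 3]),
}
print(last_leaf_labels(leaf_dict))
print(all_leaf_labels(leaf_dict))
```

With the first function, pid 2 ends up with only the label of leaf "b"; with the second, both labels are retained.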

Source code is not available, and a question about HNSW parameter settings

Dear authors,

Thank you for sharing your work with the research community. I enjoyed reading your paper and learning from your insights. I found your work to be both insightful and informative. I am writing to request some additional information that would help me better understand your work.

First, I would appreciate it if you could kindly share the source code and the dataset that you used in your paper. I understand that you processed the dataset with STAR, but I would like to reproduce your results!

Second, I am interested in knowing more about the parameter settings of HNSW that you used in your experiments. I noticed that you set the link number to 8, which I assume is the degree of a node (i.e., the parameter M). However, according to the HNSW official repo, the recommended range for M is 12-48. Have you tried any higher values for M and how did they affect the performance?

I hope you can find some time to reply to my queries. I am looking forward to hearing from you soon!
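
For context on the cost side of the M question above, here is a rough sketch of how link memory grows with M. It assumes the rule of thumb from hnswlib's parameter notes (roughly M * 8-10 bytes of link storage per stored element). The corpus size is a hypothetical placeholder, not the size of the dataset used in the paper, and the sketch says nothing about recall or latency.

```python
def hnsw_link_bytes(num_elements, m, bytes_per_link=(8, 10)):
    """Estimate (low, high) link memory in bytes for an HNSW index.

    Assumes roughly `m * 8` to `m * 10` bytes per stored element,
    which is a rule of thumb, not an exact measurement.
    """
    low, high = bytes_per_link
    return num_elements * m * low, num_elements * m * high


NUM_ELEMENTS = 1_000_000  # hypothetical corpus size, for illustration only

for m in (8, 12, 48):
    low, high = hnsw_link_bytes(NUM_ELEMENTS, m)
    print(f"M={m}: about {low / 1e6:.0f}-{high / 1e6:.0f} MB of links")
```

Under this assumption, link memory scales linearly with M, so moving from 8 to 48 multiplies it by six.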

JTR for ColBERT

Hello,

First of all, I'd like to say that I really like the work you've done.

I saw the potential of using JTR to speed up token-level embedding models such as ColBERT and created neural-tree. I don't yet know how to perform hierarchical clustering with ColBERT, so I did it using TfIdf or SentenceTransformer, then exported the created tree and averaged the ColBERT embeddings into the corresponding nodes. The speed gains are quite impressive for ColBERT, well done to you.

Have a great day,

Raphaël
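
The averaging step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the neural-tree implementation: the tree layout (`leaf_to_docs`, `parent_of`), the function names, and the choice to average per document before averaging per node are all hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def node_embeddings(leaf_to_docs, parent_of, doc_token_embeddings):
    """Average token embeddings per document, then per tree node.

    leaf_to_docs: leaf id -> list of doc ids (assumed export format)
    parent_of: node id -> parent id, with the root mapped to None
    doc_token_embeddings: doc id -> list of token vectors
    """
    # One vector per document: the mean of its token embeddings.
    doc_vecs = {doc: mean_vector(tokens)
                for doc, tokens in doc_token_embeddings.items()}

    # Gather the documents under every node by walking leaf -> root.
    # A document placed in several leaves is counted once per leaf.
    node_docs = {}
    for leaf, docs in leaf_to_docs.items():
        node = leaf
        while node is not None:
            node_docs.setdefault(node, []).extend(docs)
            node = parent_of.get(node)

    return {node: mean_vector([doc_vecs[d] for d in docs])
            for node, docs in node_docs.items()}


# Tiny two-leaf tree with 2-dimensional toy "token embeddings".
parent_of = {"l0": "root", "l1": "root", "root": None}
leaf_to_docs = {"l0": ["d0"], "l1": ["d1"]}
doc_token_embeddings = {
    "d0": [[1.0, 0.0], [3.0, 0.0]],
    "d1": [[0.0, 4.0]],
}
emb = node_embeddings(leaf_to_docs, parent_of, doc_token_embeddings)
print(emb)
```

Each node ends up with a single vector, so a query can be scored against nodes while descending the tree before any full token-level scoring is done on the retrieved leaves.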
