minhash's People

Contributors

deka0106, dependabot[bot], marevol

minhash's Issues

[Question] similarity

The README.md includes this example:

// Compare a different text.
String text2 = "Solr is the popular, blazing fast open source enterprise search platform";
byte[] minhash2 = MinHash.calculate(analyzer, text2);
assertEquals(0.453125f, MinHash.compare(minhash, minhash2));

Please help me understand this algorithm. Why does comparing two very different strings give 0.453125f? Why isn't the value closer to 0?

Thanks
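
One possible reading, not confirmed by the source: 0.453125 is exactly 58/128, which is what a comparison would return if it counted agreeing bits across a 128-bit signature built from 1-bit hashes. Under that assumption, two unrelated texts still agree on about half of their bits by chance, so the baseline for "unrelated" would sit near 0.5 rather than 0. The sketch below only illustrates that effect; the class and method names are hypothetical and this is not the library's code.

```java
import java.util.Random;

// Hypothetical illustration, not the minhash library's implementation.
class BitAgreementDemo {

    // Fraction of bit positions on which two equal-length signatures agree.
    static float compare(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("signatures must have equal length");
        }
        int totalBits = a.length * 8;
        int differingBits = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits because bytes are sign-extended when promoted to int.
            differingBits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return (float) (totalBits - differingBits) / totalBits;
    }

    // Average agreement between pairs of independent random 128-bit signatures.
    static double averageRandomAgreement(int trials, long seed) {
        Random random = new Random(seed);
        byte[] first = new byte[16];
        byte[] second = new byte[16];
        double sum = 0.0;
        for (int t = 0; t < trials; t++) {
            random.nextBytes(first);
            random.nextBytes(second);
            sum += compare(first, second);
        }
        return sum / trials;
    }

    public static void main(String[] args) {
        // Prints a value close to 0.5: unrelated bit strings agree about half the time.
        System.out.println(averageRandomAgreement(10_000, 42L));
    }
}
```

If this reading is right, scores such as 0.45 or 0.55 are within the noise expected for unrelated texts, and only scores well above 0.5 indicate real overlap.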

How does the code work?

Hi, I've been learning to use this code recently. Can I think of it as the following three steps? First, Lucene is used to turn the text into a set of tokens. Then a family of hash functions is applied to obtain the minhash values. Finally, each minhash value is reduced to b bits. That is why the final minhash has num * hashbit bits. Am I right? I would appreciate a reply to this question.
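
The three steps described in this question can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the library's implementation: a whitespace split stands in for the Lucene tokenizer, the `mix` function is a hypothetical seeded integer mixer standing in for whatever hash family the library uses, and b is fixed at 1 bit per hash.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a b-bit MinHash pipeline (b = 1); not the library's code.
class MinHashSketch {

    // Seeded integer mixer; each seed acts as one member of the hash family.
    static int mix(int tokenHash, int seed) {
        int h = tokenHash ^ (seed * 0x9E3779B9);
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    // Builds a signature of num bits, packed into bytes.
    static byte[] signature(String text, int num) {
        // Step 1: turn the text into a set of tokens (whitespace split as a stand-in).
        Set<String> tokens = new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
        byte[] packed = new byte[(num + 7) / 8];
        for (int i = 0; i < num; i++) {
            // Step 2: the minhash for hash function i is the smallest hash over all tokens.
            int min = Integer.MAX_VALUE;
            for (String token : tokens) {
                min = Math.min(min, mix(token.hashCode(), i));
            }
            // Step 3: keep only the lowest bit of the minhash and pack it.
            if ((min & 1) != 0) {
                packed[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        byte[] sig = signature("Solr is the popular, blazing fast open source enterprise search platform", 128);
        System.out.println(sig.length + " bytes"); // 128 hash functions x 1 bit = 16 bytes
    }
}
```

With num hash functions and b bits kept per hash, the signature holds num * b bits, which matches the size described in the question.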

For Chinese text, the similarity is not very accurate

The code:
// "The COVID-19 vaccine works well"
String text = "新冠疫苗效果不错";
byte[] minhash = calculateMinHash(text);
// "Eating every day, hahaha"
String text1 = "每天吃饭呀哈哈哈";
byte[] minhash1 = calculateMinHash(text1);
float score1 = MinHash.compare(minhash, minhash1);
The result is 0.546875.
According to your README, a result below 0.5 means the texts are not similar.
These two texts are not similar, but the result is greater than 0.5.
If I am using the library incorrectly, can you help me? Thanks.
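
As a hedged aside: if the score is the fraction of agreeing bits in a signature of 1-bit hashes (an assumption, though 0.546875 is exactly 70/128), then an agreement rate p relates to the Jaccard similarity J roughly as p = J + (1 - J) / 2, which gives J = 2p - 1. The helper below is hypothetical and not part of the library; it only shows that arithmetic.

```java
// Hypothetical helper, not part of the minhash library.
class OneBitRescale {

    // Converts a 1-bit agreement rate into a rough Jaccard estimate, clamped at 0.
    static float estimateJaccard(float agreement) {
        return Math.max(0f, 2f * agreement - 1f);
    }

    public static void main(String[] args) {
        System.out.println(estimateJaccard(0.546875f)); // 0.09375
        System.out.println(estimateJaccard(0.453125f)); // 0.0
    }
}
```

Under this assumption, a score of 0.546875 corresponds to an estimated similarity of about 0.09, so a raw score slightly above 0.5 would not by itself mean the texts are similar.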

Tokenizer tokenizer = ...;

What goes in "..." for Tokenizer in the example you wrote in README.md?
I would like to replicate the example but I am confused there.
Thanks.
