prashantpandey commented on August 20, 2024

Hi Bob,

You are right, SSL is not needed. It came in because we initially tried several other hash functions, but later we settled on MurmurHash. Although we removed the other hash functions before making the code public, we missed the SSL dependency. I have removed it now.

Also, regarding the segfault during the inner-product query: another issue reported earlier led me to fix a "qf_read" bug, and that fix should also resolve the inner-product segfault. Please try the latest code, and let me know if the issue persists.

Thanks,
Prashant

from squeakr.

rsharris commented on August 20, 2024

Fetched the code today, built, and tested. No segfault.

It's not clear to me whether the answer I'm getting from squeakr-inner-prod is correct. I created an example fastq with ten 177-bp reads of DNA from a PRNG, which should give 1500 kmers (10 × (177 − 28 + 1)) and no duplicates (kmer size 28 is built into the source code, IIRC). Call that "A". Another PRNG fastq of the same size, "B". And a third fastq, "C", which is the first nine reads from "A".

squeakr-inner-prod("A","B") gives zero, which seems correct.

squeakr-inner-prod("A","C") gives zero. I expected the answer to be 1350, since 1350 kmers (9 × 150) should appear in both sets, all with abundance 1. Is my expectation wrong?
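As a rough cross-check of the 1500 and 1350 figures, here is a minimal Python sketch (not squeakr code) that counts exact k-mers in random reads and takes the inner product of two count tables, i.e. the sum over shared k-mers of count_A × count_C. It assumes k = 28 as recalled above and counts forward-strand k-mers only; whether squeakr canonicalizes k-mers is not stated in this thread, and for random reads it should not change the result. The helper names are illustrative.

```python
import random
from collections import Counter

K = 28  # k-mer size recalled above as built into squeakr (assumption)

def random_reads(n: int, length: int, rng: random.Random) -> list[str]:
    """Generate n reads of uniformly random DNA."""
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]

def kmer_counts(reads: list[str], k: int = K) -> Counter:
    """Count every k-mer; a read of length L contributes L - k + 1 of them."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def inner_product(a: Counter, b: Counter) -> int:
    """Sum of count_a(x) * count_b(x) over k-mers x present in both tables."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller table
    return sum(c * b[x] for x, c in a.items() if x in b)

rng = random.Random(2017)
reads_a = random_reads(10, 177, rng)
reads_b = random_reads(10, 177, rng)
# "C" is the first nine reads of "A", like `head -n 36` on a 4-line-per-read fastq.
A, B, C = kmer_counts(reads_a), kmer_counts(reads_b), kmer_counts(reads_a[:9])
print(sum(A.values()), sum(C.values()))
print(inner_product(A, B), inner_product(A, C))
```

With random 177-bp reads a repeated 28-mer is astronomically unlikely, so this should print 1500 1350 followed by 0 1350, matching the expectation that A·B = 0 and A·C = 1350.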

This is what I did:

$ some_random_fastq_command 10x177 > A.fq
$ cat A.fq | head -n 36 > C.fq
$ ../squeakr-nov-2017/squeakr-count 0 20 1 A.fq
Num distinct elem: 1500
Total num elems:   1500
$ ../squeakr-nov-2017/squeakr-count 0 20 1 C.fq
Num distinct elem: 1350
Total num elems:   1350
$ ../squeakr-nov-2017/squeakr-inner-prod A.fq.ser C.fq.ser
Inner product: 0

prashantpandey commented on August 20, 2024

Hi Bob,

I think the reason is that squeakr uses a different seed for the hash function for each fastq file (the seed is generated using time(NULL)).
Therefore, even if a k-mer appears in two fastq files, its hash will differ between the two CQF outputs, and the inner-product program will not report that k-mer.
I can change the code to take the seed value as an optional argument to squeakr. That way, users who want to compare two CQF output files can specify the same seed value for each input fastq file.
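To illustrate why a per-run seed breaks the comparison, the sketch below models a counter that stores only a seeded hash of each k-mer. It is a hypothetical Python illustration, not squeakr's code: keyed BLAKE2b from hashlib stands in for MurmurHash, and `seeded_hash`, `hashed_counts`, and the seed values (chosen to look like time(NULL) results one second apart) are made up for the example.

```python
import hashlib
from collections import Counter

def seeded_hash(kmer: str, seed: int) -> int:
    """64-bit seeded hash; keyed BLAKE2b is a stand-in for squeakr's MurmurHash."""
    key = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")

def hashed_counts(kmers: list[str], seed: int) -> Counter:
    """Model a table keyed by hash value only: hash(kmer) -> count."""
    return Counter(seeded_hash(x, seed) for x in kmers)

def inner_product(a: Counter, b: Counter) -> int:
    """Match entries by hash value, the only key the two tables share."""
    return sum(c * b[h] for h, c in a.items() if h in b)

kmers = ["ACGT" * 7, "TTGCA" * 5 + "ACG"]  # two 28-mers present in both inputs

# Same seed for both runs: identical k-mers map to identical hashes.
same_seed = inner_product(hashed_counts(kmers, 42), hashed_counts(kmers, 42))
# Seeds taken from two different runs: the hashes no longer line up.
diff_seed = inner_product(hashed_counts(kmers, 1510000000),
                          hashed_counts(kmers, 1510000001))
print(same_seed, diff_seed)
```

The same two k-mers give an inner product of 2 when both tables share seed 42, and 0 (barring a 64-bit hash collision) when the seeds differ, which mirrors the zero reported for A and C.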

I will try to do that soon and will let you know after I push the change.

Thanks,
Prashant

rsharris commented on August 20, 2024

No need to hurry on my behalf.

An alternative to allowing the seed to be specified is to make the default seed the same for all runs. Not knowing the program all that well, I don't know whether some use cases might benefit from a randomly chosen seed.

rtjohnso commented on August 20, 2024

prashantpandey commented on August 20, 2024

Hi Bob,

I pushed a patch that uses a fixed seed. Please try it and let us know if you still see the same issue in the inner-product query.

Thanks,
Prashant

rsharris commented on August 20, 2024

Howdy,

Cool, looks like it works.

I tried my test on it (described in the Nov/6 post above), and it gives me the expected answer: inner product = 1350.

I don't know whether you intend this package as a proof of concept or as a full-blown, idiot-proof tool.

If it's the latter, and you expect that the fixed seed might change in the future, or you expect to let users choose seeds, then you probably ought to store the seed in the .ser file (maybe you already do; I haven't looked at the code for that). Then, if the user tries to compute the inner product of CQFs built with different seeds, you could reject it instead of quietly reporting an "incorrect" answer.

I'm not suggesting you SHOULD do that. I'm just raising it as a potential usage issue that could bite people in the future.

Also, realize that I'm not, at present, a user. I had just read the paper, thought "hey, this is a cool object," and decided to try it out.

Thanks,
Bob H

prashantpandey commented on August 20, 2024

Hi Bob,

Great!! And thanks a bunch for reporting these issues.
That's actually a cool suggestion. I do keep the seed in the .ser file, and checking whether the seeds are the same would be trivial.
I will write a patch for that.
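The seed check could look roughly like this. It is a hypothetical Python sketch, not the .ser format or squeakr's API: `CountTable` stands in for a loaded .ser file that carries its seed, and `checked_inner_product` and `SeedMismatchError` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CountTable:
    """Hypothetical in-memory view of a .ser file: the hash seed plus hash -> count."""
    seed: int
    counts: Counter = field(default_factory=Counter)

class SeedMismatchError(ValueError):
    """Raised when two tables were built with different hash seeds."""

def checked_inner_product(a: CountTable, b: CountTable) -> int:
    # Hash values are only comparable when both tables used the same seed,
    # so refuse to answer instead of silently returning a misleading 0.
    if a.seed != b.seed:
        raise SeedMismatchError(
            f"hash seeds differ ({a.seed} vs {b.seed}); rebuild with the same seed")
    return sum(c * b.counts[h] for h, c in a.counts.items() if h in b.counts)

t1 = CountTable(seed=7, counts=Counter({101: 2, 202: 1}))
t2 = CountTable(seed=7, counts=Counter({101: 3, 303: 5}))
print(checked_inner_product(t1, t2))  # 2 * 3 from the shared hash 101 -> 6
```

Raising an error, rather than returning 0, is the behavior Bob suggested: a mismatch then surfaces as a usage error instead of a plausible-looking but wrong answer.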

Thanks,
Prashant
