
Comments (8)

malthesr commented on August 12, 2024

Glad to hear it worked!

I'll look into increasing the dimensionality when I have some time, hopefully before too long. I opened #5 to track this.

from winsfs.

malthesr commented on August 12, 2024

Thanks for reporting.

I'm slightly surprised that you see such a high number of epochs. I'd expect that 3D might run for a bit longer than 2D, but this does sound excessive. Do you have very low depth (say, <=1x), particularly high error rates, and/or large sample sizes? Also, approximately how many sites do you have in your input?

In either case, yes, running with --max-epochs sounds like a reasonable solution. In addition, you may want to run with -vv, which will print the SFS after each epoch. That will allow you to check that the estimate is reasonably stable towards the end.
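
One way to check that the estimate has stabilized is to compare the spectra printed after consecutive epochs. The following is a minimal Python sketch, assuming each epoch's SFS has already been copied out of the -vv log as a flat list of numbers; the function names, the 3-epoch window, and the 1e-4 tolerance are hypothetical choices for illustration, not winsfs behavior or defaults.

```python
def normalize(sfs):
    """Scale an SFS so its entries sum to 1, making spectra comparable."""
    total = sum(sfs)
    return [x / total for x in sfs]


def max_abs_change(prev, curr):
    """Largest absolute difference between two normalized spectra."""
    p, c = normalize(prev), normalize(curr)
    return max(abs(a - b) for a, b in zip(p, c))


def looks_stable(history, last_n=3, tol=1e-4):
    """True if each of the last `last_n` epoch-to-epoch changes is below `tol`.

    `history` is a list of spectra, one per epoch, in epoch order.
    """
    if len(history) < last_n + 1:
        return False  # not enough epochs to judge yet
    recent = history[-(last_n + 1):]
    changes = [max_abs_change(a, b) for a, b in zip(recent, recent[1:])]
    return all(d < tol for d in changes)


# Made-up toy spectra that stop changing after the third epoch.
history = [
    [50.0, 30.0, 20.0],
    [60.0, 25.0, 15.0],
    [62.0, 24.0, 14.0],
    [62.0, 24.0, 14.0],
    [62.0, 24.0, 14.0],
    [62.0, 24.0, 14.0],
]
print(looks_stable(history))  # prints True
```

Comparing normalized spectra means the check does not depend on whether the printed SFS is scaled to the number of sites or to 1.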

jamesfifer commented on August 12, 2024

Thanks for the speedy reply. To answer your questions:
Yes, I have low depth, and the SAFs were created without using angsd's minIndDepth filter. I will report back on what happens with different minimum depths.
Sample size is 177.
I am generating the SFS from 277826 sites.

I just finished re-running with:
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx > popAtlAst_AtlOc_GulfAst.sfs
but it didn't seem to work. Despite adding the --max-epochs parameter, it goes above 600 epochs, and despite adding -vv, the SFS is not printed. Instead, the output looks like this:
INFO [windowem] Finished epoch 616
INFO [windowem] Finished epoch 617
INFO [windowem] Finished epoch 618
INFO [windowem] Finished epoch 619

malthesr commented on August 12, 2024

Thank you, that is helpful.

I think the number of input sites explains the high number of epochs: with fewer input sites, more epochs are required for convergence. I've typically seen convergence in <5-10 epochs with >100M input sites, but on smaller test files I also see more. More generally, I should say that winsfs has mainly been developed and tested on larger inputs.

I'm very surprised that the run doesn't stop at 600 epochs with --max-epochs 600, and that you don't get intermediate spectra with -vv. Could you try running on the test files to check? That is, check that running

wget -q https://github.com/malthesr/winsfs/raw/main/winsfs-cli/tests/data/{A,B,C}.saf.{idx,gz,pos.gz}
winsfs -vv --max-epochs 2 A.saf.idx B.saf.idx C.saf.idx 2>&1 >/dev/null | sed 's/Current SFS:.*/Current SFS: [sfs]/g'

Prints something like:

INFO  [init] Opening input full (v3) SAF files:
	A.saf.idx
	B.saf.idx
	C.saf.idx
DEBUG [init] Using 4 threads for reading
INFO  [init] Reading (intersecting) sites in input SAF files into memory
DEBUG [init] Found 220000 (intersecting) sites in SAF files with shape 11/13/15
DEBUG [init] Shuffling SAF sites
DEBUG [init] Using 500 full blocks of size 440
DEBUG [init] Using window size of 100 blocks per window
DEBUG [init] Creating uniform initial SFS
DEBUG [stop] Stopping rule set to 2 epochs
INFO  [windowem] Finished epoch 1
DEBUG [windowem] Current SFS: [sfs]
DEBUG [stop] Current epoch 1/2
INFO  [windowem] Finished epoch 2
DEBUG [windowem] Current SFS: [sfs]
DEBUG [stop] Current epoch 2/2
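
To confirm mechanically that a run respected --max-epochs, the "[stop] Current epoch k/N" lines in output like the above can be parsed. This Python sketch assumes the log format matches the example exactly (it may differ between winsfs versions), and epochs_run is a hypothetical helper name. It returns None when no such lines are present, as with the INFO-only output reported earlier in this thread.

```python
import re

# Matches stopping-rule lines such as "DEBUG [stop] Current epoch 2/2".
STOP_LINE = re.compile(r"\[stop\] Current epoch (\d+)/(\d+)")


def epochs_run(log_text):
    """Return (last_epoch, max_epochs) from the last stopping-rule line.

    Returns None if the log contains no such lines.
    """
    matches = STOP_LINE.findall(log_text)
    if not matches:
        return None
    last, limit = matches[-1]
    return int(last), int(limit)


# Adapted from the example log above (SFS lines omitted).
example_log = """\
DEBUG [stop] Stopping rule set to 2 epochs
INFO  [windowem] Finished epoch 1
DEBUG [stop] Current epoch 1/2
INFO  [windowem] Finished epoch 2
DEBUG [stop] Current epoch 2/2
"""

result = epochs_run(example_log)
print(result)  # prints (2, 2)
if result is not None:
    last, limit = result
    # Seeing more epochs than the limit would mean --max-epochs was not applied.
    print("within limit" if last <= limit else "exceeded limit")
```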

jamesfifer commented on August 12, 2024

Hello,
It must have been a user error on my end. It works fine with your test files, and now -vv and --max-epochs seem to be working with my data too. Sorry about that.
Unfortunately I still get an undesirable result:
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx gives me:

INFO [init] Reading (intersecting) sites in input SAF files: popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx
DEBUG [init] Found 230476 (intersecting) sites in SAF files with shape 133/65/73
DEBUG [init] Shuffling SAF sites
DEBUG [init] Using 501 full blocks of size 460
DEBUG [init] Last block has size 16
DEBUG [init] Using window size of 100 blocks per window
DEBUG [init] Creating uniform initial SFS
DEBUG [stop] Stopping rule set to 600 epochs
INFO [windowem] Finished epoch 1
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 1/600
INFO [windowem] Finished epoch 2
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 2/600
INFO [windowem] Finished epoch 3
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 3/600
INFO [windowem] Finished epoch 4
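
The block lines in the DEBUG logs in this thread are consistent with a simple rule: a block size of floor(sites / 500), as many full blocks of that size as fit, and a shorter last block holding the remainder. This is only an inference from the two logged examples (220000 sites giving 500 blocks of 440, and 230476 sites giving 501 full blocks of 460 plus a last block of 16), not a description of winsfs internals; the target of 500 blocks and the block_layout name are assumptions.

```python
def block_layout(n_sites, target_blocks=500):
    """Split n_sites into blocks, assuming block size = n_sites // target_blocks.

    Returns (block_size, n_full_blocks, last_block_size); last_block_size
    is 0 when the sites divide evenly into full blocks.
    """
    # Guard against a zero block size for very small inputs.
    block_size = max(1, n_sites // target_blocks)
    n_full = n_sites // block_size
    last = n_sites % block_size
    return block_size, n_full, last


print(block_layout(220000))  # prints (440, 500, 0)
print(block_layout(230476))  # prints (460, 501, 16)
```

Under this reading, the number of blocks stays near 500 whatever the input size, so a smaller input mainly means fewer sites per block.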

I checked whether it was a 3D issue, but I get the same result when estimating a 1D SFS:
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx

Also, FYI: I am able to get 1D/2D SFS files from these .saf.idx files when I use realSFS.

malthesr commented on August 12, 2024

The NaN values certainly look like a bug somewhere. On the bright side, at least they explain the lack of convergence.

The easiest way for me to debug would be to have some input to reproduce the problem. Based on the number of sites and individuals, I'm guessing the smallest of these SAF files is around 10MB and might just fit in an email attachment? If so, would you be willing to email me the files (i.e. .saf.idx, .saf.gz, and .saf.pos.gz for the smallest 1D case that gives NaN) at malthe.rasmussen [at] bio.ku.dk? Of course I won't be sharing them further. If for whatever reason that's not possible, I'll think of some other way to debug.

Thank you for your patience.

malthesr commented on August 12, 2024

This was a numerical issue that could arise in certain corner cases. It should be fixed now. I've added tests, and, to avoid this situation where the NaNs are quietly hiding in the logs, winsfs now checks for NaN in the SFS after each epoch and exits with an error message if any are found.
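
The safeguard described here (check the SFS for NaN after every epoch and stop with an error) is easy to mirror in any EM-style loop. The sketch below is a hypothetical Python illustration of the idea, not winsfs code (winsfs itself is written in Rust); em_step stands in for whatever update produces the next estimate, and both toy updates are made up.

```python
import math


class NaNInSFSError(RuntimeError):
    """Raised when the SFS estimate contains NaN after an epoch."""


def run_epochs(em_step, sfs, max_epochs):
    """Apply em_step up to max_epochs times, failing fast on NaN.

    em_step(sfs) -> new sfs is a hypothetical stand-in for one epoch.
    """
    for epoch in range(1, max_epochs + 1):
        sfs = em_step(sfs)
        # Check after every epoch instead of letting NaN propagate silently.
        if any(math.isnan(x) for x in sfs):
            raise NaNInSFSError(f"found NaN in SFS after epoch {epoch}")
    return sfs


def halve_and_renormalize(sfs):
    """Toy update: halve the last entry, then rescale to sum to 1."""
    updated = sfs[:-1] + [sfs[-1] / 2]
    total = sum(updated)
    return [x / total for x in updated]


def broken_step(sfs):
    """Toy update that yields NaN, as a numerical corner case might."""
    return [float("nan")] * len(sfs)


print(run_epochs(halve_and_renormalize, [0.5, 0.5], max_epochs=2))

try:
    run_epochs(broken_step, [0.5, 0.5], max_epochs=600)
except NaNInSFSError as err:
    print(err)  # prints: found NaN in SFS after epoch 1
```

Failing on the first bad epoch avoids spending hundreds of epochs on an estimate that can never recover, which is what the NaN-filled log above showed.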

You can re-install using e.g. cargo install --force --git https://github.com/malthesr/winsfs. I get convergence on your 1D example in <10 epochs now, though that'll probably be a bit higher in 2-3D. Thanks a lot for helping out with this, and please let me know if any problems persist.

jamesfifer commented on August 12, 2024

Sweet, yep, it works with 3D now (it took 61 epochs), thanks so much! By the way, any plans to increase to 4D capabilities? :)
