Comments (8)
Glad to hear it worked!
I'll look into increasing the dimensionality when I have some time, hopefully before too long. I opened #5 to track this.
from winsfs.
Thanks for reporting.
I'm slightly surprised that you see such a high number of epochs. I'd expect that 3D might run for a bit longer than 2D, but this does sound excessive. Do you have very low depth (say, <=1x), particularly high error rates, and/or large sample sizes? Also, approximately how many sites do you have in your input?
In either case, yes, running with --max-epochs sounds like a reasonable solution. In addition, you may want to run with -vv, which will print the SFS after each epoch. That will allow you to check that the estimate is reasonably stable towards the end.
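One way to check that stability from a saved -vv log is sketched below. It assumes each epoch produces a line containing "Current SFS:" followed by whitespace-separated values; the exact layout may differ between winsfs versions, and the function names (parse_spectra, epoch_changes) are hypothetical helpers, not part of winsfs.

```python
import math
import re

# Assumed log layout: "... Current SFS: v1 v2 v3 ..." (one such line per epoch).
SFS_LINE = re.compile(r"Current SFS:\s*(.*)$")


def parse_spectra(log_text):
    """Return one list of floats per 'Current SFS:' line in the log text."""
    spectra = []
    for line in log_text.splitlines():
        match = SFS_LINE.search(line)
        if match:
            # Drop brackets and commas in case the values are printed as a list.
            cleaned = re.sub(r"[\[\],]", " ", match.group(1))
            spectra.append([float(token) for token in cleaned.split()])
    return spectra


def epoch_changes(spectra):
    """Largest absolute per-entry change between each pair of successive epochs."""
    changes = []
    for previous, current in zip(spectra, spectra[1:]):
        if any(math.isnan(value) for value in previous + current):
            # A NaN anywhere makes the comparison meaningless; flag it as NaN.
            changes.append(math.nan)
        else:
            diffs = (abs(a - b) for a, b in zip(previous, current))
            changes.append(max(diffs, default=0.0))
    return changes


# Made-up log excerpt for illustration only.
example_log = """\
INFO [windowem] Finished epoch 1
DEBUG [windowem] Current SFS: 0.50 0.30 0.20
INFO [windowem] Finished epoch 2
DEBUG [windowem] Current SFS: 0.60 0.25 0.15
INFO [windowem] Finished epoch 3
DEBUG [windowem] Current SFS: 0.61 0.245 0.145
"""

changes = epoch_changes(parse_spectra(example_log))
print(changes)
```

If the changes shrink towards zero over the last epochs, the estimate is probably stable enough to stop; a NaN in the list points to a problem in the run rather than slow convergence.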
Thanks for the speedy reply. To answer your questions:
Yes, I have low depth, and the SAFs were created without using angsd's minIndDepth filter. I will report back on what happens with different minimum depths.
Sample size is 177.
I am generating from 277826 sites.
I just finished re-running with
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx > popAtlAst_AtlOc_GulfAst.sfs
but it didn't seem to work: despite adding the --max-epochs parameter, it goes above 600, and despite adding -vv, the SFS is not printed. Instead the output looks like this:
INFO [windowem] Finished epoch 616
INFO [windowem] Finished epoch 617
INFO [windowem] Finished epoch 618
INFO [windowem] Finished epoch 619
Thank you, that is helpful.
I think the number of input sites explains the high number of epochs: with fewer sites in the input, more epochs will be required for convergence. I've typically seen convergence in <5-10 epochs with >100M sites input, but on smaller test files I also see more. More generally, I should say that winsfs has mainly been developed and tested for larger inputs.
I'm very surprised that the run doesn't stop at 600 epochs when setting --max-epochs 600, and that you don't get intermediate spectra with -vv. Could I get you to try running the test files to check? That is, check that running
wget -q https://github.com/malthesr/winsfs/raw/main/winsfs-cli/tests/data/{A,B,C}.saf.{idx,gz,pos.gz}
winsfs -vv --max-epochs 2 A.saf.idx B.saf.idx C.saf.idx 2>&1 >/dev/null | sed 's/Current SFS:.*/Current SFS: [sfs]/g'
prints something like:
INFO [init] Opening input full (v3) SAF files:
A.saf.idx
B.saf.idx
C.saf.idx
DEBUG [init] Using 4 threads for reading
INFO [init] Reading (intersecting) sites in input SAF files into memory
DEBUG [init] Found 220000 (intersecting) sites in SAF files with shape 11/13/15
DEBUG [init] Shuffling SAF sites
DEBUG [init] Using 500 full blocks of size 440
DEBUG [init] Using window size of 100 blocks per window
DEBUG [init] Creating uniform initial SFS
DEBUG [stop] Stopping rule set to 2 epochs
INFO [windowem] Finished epoch 1
DEBUG [windowem] Current SFS: [sfs]
DEBUG [stop] Current epoch 1/2
INFO [windowem] Finished epoch 2
DEBUG [windowem] Current SFS: [sfs]
DEBUG [stop] Current epoch 2/2
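The block figures in the log above (220000 sites, 500 full blocks of size 440) can be sanity-checked with a little arithmetic. The sketch below assumes the block size is the site count divided by a target of 500 blocks, rounded down, with any leftover sites forming a shorter last block. That rule is inferred from the numbers printed in this thread, not taken from the winsfs source, and block_layout is a hypothetical helper.

```python
def block_layout(sites, target_blocks=500):
    """Split `sites` into blocks under the assumed rule described above.

    Returns (block_size, full_blocks, last_block_size), where
    last_block_size is 0 when the sites divide evenly.
    """
    # Assumption: block size is the floor of sites / target_blocks (at least 1).
    block_size = max(1, sites // target_blocks)
    full_blocks, last_block_size = divmod(sites, block_size)
    return block_size, full_blocks, last_block_size


block_size, full_blocks, last_block_size = block_layout(220000)
print(f"{full_blocks} full blocks of size {block_size}, last block of size {last_block_size}")
```

Under this assumed rule, a site count that is not a multiple of the block size yields more than 500 full blocks plus a short final block.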
Hello,
It must have been a user error on my end. It works fine with your test files and now the -vv and --max-epochs seem to be working with my data, sorry about that.
Unfortunately I still get an undesirable result:
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx popAtlOc.out.saf.idx
gives me
INFO [init] Reading (intersecting) sites in input SAF files: popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx
DEBUG [init] Found 230476 (intersecting) sites in SAF files with shape 133/65/73
DEBUG [init] Shuffling SAF sites
DEBUG [init] Using 501 full blocks of size 460
DEBUG [init] Last block has size 16
DEBUG [init] Using window size of 100 blocks per window
DEBUG [init] Creating uniform initial SFS
DEBUG [stop] Stopping rule set to 600 epochs
INFO [windowem] Finished epoch 1
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 1/600
INFO [windowem] Finished epoch 2
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 2/600
INFO [windowem] Finished epoch 3
DEBUG [windowem] Current SFS: NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN$
DEBUG [stop] Current epoch 3/600
INFO [windowem] Finished epoch 4
I checked to see if it was a 3D issue, but I get the same result when trying a 1D SFS:
winsfs -vv -t 20 --max-epochs 600 popAtlAst.out.saf.idx
Also, FYI, I am able to get 1D/2D SFS files from these .saf.idx files when I use realSFS.
The NaN values certainly look like a bug somewhere. On the bright side, at least they explain the lack of convergence.
The easiest way for me to debug would be to have some input to reproduce the problem. Based on the number of sites and individuals, I'm guessing the smallest of these SAF files is around 10MB and might just fit in an email attachment? If so, would you be willing to email me the files (i.e. .saf.idx, .saf.gz, and .saf.pos.gz for the smallest 1D case that gives NaN) at malthe.rasmussen [at] bio.ku.dk? Of course I won't be sharing them further. If for whatever reason that's not possible, I'll think of some other way to debug.
Thank you for your patience.
This was a numerical issue that could arise in certain corner cases. It should be fixed now. I've added tests, and, to avoid this situation where the NaNs are quietly hiding in the logs, winsfs now checks for NaN in the SFS after each epoch and exits with an error message if any are found.
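The idea of such a per-epoch guard can be sketched as follows. This is only an illustration in Python, not the actual winsfs implementation (which is written in Rust); NanInSfsError, check_sfs, run_epochs, and the step callback are all hypothetical names.

```python
import math


class NanInSfsError(RuntimeError):
    """Raised when the SFS estimate contains NaN after an epoch."""


def check_sfs(sfs, epoch):
    """Fail loudly instead of letting NaN values pass into later epochs."""
    bad = [index for index, value in enumerate(sfs) if math.isnan(value)]
    if bad:
        raise NanInSfsError(
            f"epoch {epoch}: NaN in {len(bad)} of {len(sfs)} SFS entries "
            f"(first at index {bad[0]})"
        )


def run_epochs(step, sfs, max_epochs):
    """Apply `step` (one epoch of updates) up to max_epochs times, checking each result."""
    for epoch in range(1, max_epochs + 1):
        sfs = step(sfs)
        check_sfs(sfs, epoch)
    return sfs


def normalise(sfs):
    """Stand-in for a real update step: rescale the entries to sum to one."""
    total = sum(sfs)
    return [value / total for value in sfs]


print(run_epochs(normalise, [1.0, 3.0], max_epochs=3))
```

Stopping at the first bad epoch means the run fails within one epoch of the problem appearing, rather than silently running to the epoch limit.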
You can re-install using e.g.
cargo install --force --git https://github.com/malthesr/winsfs
I get convergence on your 1D example in <10 epochs now, though that'll probably be a bit higher in 2-3D. Thanks a lot for helping out with this, and please let me know if any problems persist.
Sweeet, yep, it works with 3D now (took 61 epochs), thanks so much! By the way, any plans to increase to 4D capabilities? :)