winsfs's Introduction

winsfs

winsfs is a tool for inference of the site frequency spectrum ("SFS") from low-depth sequencing data. The associated manuscript is published in Genetics, and a pre-print is available on bioRxiv.

In overview, winsfs iteratively estimates the SFS on smaller blocks of data conditional on the current estimate, and then updates the estimate as the average over a window of such block estimates.
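
As a rough illustration of the idea described above, the following toy sketch runs a one-dimensional "window EM" over made-up SAF likelihoods. It is not the winsfs implementation: the function names, the toy data, and the block/window settings are all hypothetical, and details such as shuffling the sites and checking convergence are left out.

```python
from collections import deque


def block_em_step(sfs, block):
    """One EM step on a block: the mean per-site posterior, given `sfs`."""
    k = len(sfs)
    total = [0.0] * k
    for site in block:
        weighted = [s * l for s, l in zip(sfs, site)]
        norm = sum(weighted)
        for j in range(k):
            total[j] += weighted[j] / norm
    return [t / len(block) for t in total]


def window_em(sites, n_blocks, window_size, epochs):
    """Toy 1D window EM over per-site SAF likelihoods (hypothetical helper)."""
    k = len(sites[0])
    sfs = [1.0 / k] * k  # uniform initial estimate
    size = -(-len(sites) // n_blocks)  # ceiling division
    blocks = [sites[i:i + size] for i in range(0, len(sites), size)]
    window = deque(maxlen=window_size)  # keeps only the latest block estimates
    for _ in range(epochs):
        for block in blocks:
            # Block estimate conditional on the current SFS estimate ...
            window.append(block_em_step(sfs, block))
            # ... then the new estimate is the average over the window.
            sfs = [sum(col) / len(window) for col in zip(*window)]
    return sfs


# Made-up likelihoods for 0, 1, or 2 derived alleles at eight sites.
sites = [[0.9, 0.09, 0.01]] * 6 + [[0.1, 0.8, 0.1]] * 2
estimate = window_em(sites, n_blocks=4, window_size=2, epochs=3)
print([round(x, 3) for x in estimate])
```

Each block estimate is a probability vector, so the windowed average also sums to one.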

Contents

  1. Quickstart
    1. Coming from realSFS
  2. Motivation
  3. Usage
    1. Input
    2. Estimation
    3. Output
    4. Streaming
  4. Utilities
    1. View
  5. Installation
    1. Latest release
    2. Current git

Quickstart

Assuming winsfs has been installed and SAF files have already been made, default estimation can be run in one or two dimensions:

winsfs $saf1 > $sfs
winsfs $saf1 $saf2 > $sfs

Here, $saf1/$saf2 is the path to any SAF member file (i.e. some file with extension .saf.idx, .saf.pos.gz, or .saf.gz) and $sfs is where you wish to write the finished SFS estimate.

Note that the estimated SFS is written to stdout, so make sure to redirect as desired.

The Usage section below goes into more detail. See also winsfs -h (short help) or winsfs --help (long help) for an overview of options.

Coming from realSFS

If you're already familiar with the realSFS program from ANGSD, usage of winsfs should be straightforward. The input format is the same, so in

realSFS $saf1 > $sfs
realSFS $saf1 $saf2 > $sfs

simply replace realSFS by winsfs:

winsfs $saf1 > $sfs
winsfs $saf1 $saf2 > $sfs

The command-line options to winsfs differ from those available in realSFS, but should not be necessary in general. See more in the Usage section below. By default, winsfs is quiet, unlike realSFS. You can add a -v or -vv flag to print some information to stderr while running.

Likewise, the output format should be familiar: winsfs outputs two lines, where the second is the same format as the realSFS output; the first is a small header line with some information about the shape of the SFS. See Output for more details.

Motivation

Estimating the SFS from called genotypes is typically fairly straightforward. However, it has been shown that estimating the SFS from genotypes using low-depth sequencing data creates significant bias, which propagates to downstream inference. As a very rough rule of thumb, this is true up until around 10x coverage.

winsfs is a method for addressing this issue by inferring the SFS from genotype likelihoods using a stochastic optimisation algorithm. Some of its benefits are highlighted here; for a full discussion of the method and the various ways it has been evaluated, please see the associated article.

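[Figure: two-dimensional SFS for simulated data, showing the known truth (left), the winsfs estimate (middle), and the realSFS estimate (right)]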

The figure above shows the two-dimensional SFS estimated by winsfs (middle) with default parameters compared to the known truth (left) for simulated 2x data from two samples of 20 individuals. winsfs reports convergence after 8 passes through the data ("epochs") and accurately recovers the true spectrum. In comparison, realSFS (right), which is the most widely used current method, takes 101 epochs before converging and presents a "checkerboard" pattern in the interior of the SFS. By averaging over smaller block estimates of the spectrum, winsfs has an implicit smoothing effect on the spectrum, which tends to improve inference when a large number of parameters must be estimated with little available information.

In general, winsfs requires very few epochs to converge: almost always fewer than 10, and typically only 2-5. In addition, the implementation aims to be efficient. The figure below shows the computational requirements of winsfs (again compared to realSFS) used for estimating the two-dimensional SFS of approximately 0.6B sites of low-quality, real-world data.

[Figure: computational requirements of winsfs (main usage mode and streaming mode) compared to realSFS]

The figure shows winsfs in the main usage mode, but also the so-called "streaming mode". Since only a few passes over the input data are required, it is possible to run winsfs without reading data into RAM. This increases the run-time, but significantly decreases the memory requirements. More details are available in the Streaming section.

Usage

Input

The input for winsfs is so-called "site allele frequency" ("SAF") likelihoods calculated for each input population separately. It is possible to think of SAF likelihoods as the generalisation of genotype likelihoods from one individual to a population. The winsfs article provides some theoretical background, and you can also look at the original article about SAF likelihoods for more information.

SAF likelihoods are stored in SAF files from the ANGSD software suite. SAF files are split across three separate files that end in .saf.idx, .saf.pos.gz, or .saf.gz. In the simplest case, SAF files can be generated from a list of BAM files for a population:

angsd -b $bamlist -anc $ref -out $prefix -dosaf 1 -gl 2

where $bamlist is the list of BAMs, $ref is a FASTA reference, and $prefix determines the output location (i.e. -out abc will create SAF files abc.saf.idx, abc.saf.pos.gz, and abc.saf.gz).

In general, however, it is advisable to filter the input data before SAF creation. Which filters to use will depend on the kind of data you have. For low-depth whole-genome short-read sequencing, this article describes a good filtering workflow (cf. "strictref" filter) and has code available.

For the purposes of trying out winsfs, some test SAF files are available in this repository and can be downloaded by running:

wget -q https://github.com/malthesr/winsfs/raw/main/winsfs-cli/tests/data/{A,B}.saf.{idx,gz,pos.gz}

This will download two SAF files (i.e. six files total) {A,B}.saf.{idx,gz,pos.gz} to the current working directory. We will use these files below for illustration.

If you wish to run joint SFS estimation for multiple populations, note that it is not required that these contain the same sites. winsfs will automatically intersect the input to get only sites present in all populations. The flip-side, of course, is that you should be aware that non-intersecting sites are ignored.
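
To make the intersection behaviour described above concrete, here is a minimal sketch that treats each population as a list of (contig, position) pairs. The helper name and the toy coordinates are hypothetical, and this says nothing about how winsfs performs the intersection internally.

```python
def intersect_sites(*populations):
    """Return the sites present in every population, in the order of the first."""
    shared = set(populations[0]).intersection(*populations[1:])
    return [site for site in populations[0] if site in shared]


# Toy site lists; only two sites occur in both populations.
pop_a = [("chr1", 100), ("chr1", 101), ("chr1", 105), ("chr2", 7)]
pop_b = [("chr1", 101), ("chr1", 105), ("chr2", 8)]

shared = intersect_sites(pop_a, pop_b)
print(shared)  # [('chr1', 101), ('chr1', 105)]
```

In this toy case, two of the four sites in the first population are dropped, so a joint SFS estimated from such input would describe only the two shared sites.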

Estimation

To run winsfs with default parameters for a single population (here the A population from above), simply run:

winsfs A.saf.idx > A.sfs

For two populations, simply add another SAF file member path (here from the B population):

winsfs A.saf.idx B.saf.idx > A-B.sfs

winsfs will run quietly until finished and print the estimated SFS to stdout, which is redirected to A.sfs or A-B.sfs in the above examples. Note that for genome-scale data, this could take a while, especially in multiple dimensions, so you may wish to run in the background or in a detachable terminal multiplexer. Also note that this will read the contents of the SAF file(s) into RAM: depending on the input file size, this may require a significant amount of RAM. For a sense of scale, in the benchmark in the winsfs article, SFS estimation for a single population with 12 individuals and approximately 0.6B sites required 63GB of RAM. If this is not an option for you, see the Streaming section below.

By default, winsfs is quiet and prints nothing to the terminal. You may wish to add -v to get a bit of information about progress, or -vv to see more information, including how the SFS looks after each epoch of optimisation. In addition, winsfs can be made to run faster by increasing the number of threads using -t/--threads if more than the default four are available. Finally, it is possible to set a seed for winsfs using -s/--seed for reproducibility.

It is also possible to tweak the hyperparameters of winsfs (using the -b/--block-size, -B/--blocks, and -w/--window-size flags), but this is not generally recommended. Based on our experiences, the defaults should work well for a wide range of inputs. Likewise, it is possible to change the stopping criteria (-l/--tolerance and/or --max-epochs), but this should likewise not be necessary.

These and more options can also be seen by running winsfs -h (for a short description of each flag) or winsfs --help (for a longer description).

Output

The output format consists of two lines. The first is a header line giving the shape of the SFS. The second line is the SFS itself printed in flat, row-major (also known as C-major) format.

For example, running:

winsfs --seed 1 A.saf.idx B.saf.idx

results in the following output:

#SHAPE=<11/13>
218928.885549 176.078359 121.493226 44.881154 34.052637 24.050088 1.685736 0.558335 3.975430 0.000001 0.000372 9.115605 0.000000 225.017010 0.585755 0.000000 0.000000 1.320827 0.444349 0.000000 0.000000 2.239774 0.077091 0.000000 0.000000 0.000000 91.941164 1.249722 0.000000 0.000000 0.172935 0.000004 0.000000 0.000000 6.521391 2.827981 0.069702 0.000000 0.000000 16.983798 0.001461 15.931486 0.423076 0.101538 0.167897 0.000009 2.704495 0.000154 0.000000 0.000000 0.000000 0.000000 74.574485 0.000000 6.460728 3.276808 0.001221 0.012219 0.005795 19.987612 0.000010 0.000000 0.000000 4.481677 0.000000 19.241780 0.000000 0.000000 0.000038 0.000002 0.003287 0.030075 0.002316 0.000023 0.000003 2.246225 0.000120 3.272240 0.850840 0.000000 0.000001 4.961961 3.597735 0.728045 0.064433 0.000023 0.137432 11.529768 9.486486 0.000000 0.060813 19.449133 0.000000 0.845622 8.274797 0.000009 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 8.551725 0.023502 0.000000 0.774367 0.006260 0.092155 2.469819 0.000243 0.000000 1.889402 0.000000 0.000000 0.000000 6.180176 0.000000 0.000000 0.000000 5.095501 7.922833 14.515402 0.000000 0.000000 0.000000 0.000000 0.000161 0.000000 1.554942 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 14.549382 0.000000 29.232257

The header line tells us that this SFS has shape 11/13, i.e. it can be read as a matrix with 11 rows and 13 columns. Since the format is row-major, the first 13 values in the second line correspond to the first row of this matrix; then the next 13 values correspond to the second row, and so on.
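
For downstream scripting, the two-line format can be read with a few lines of Python. The sketch below is only an illustration based on the format as described here: the function names are hypothetical, and the small 2/3 example spectrum is made up so that the reshaping is easy to follow.

```python
def parse_sfs(text):
    """Parse the two-line plain-text format into (shape, flat values)."""
    header, values_line = text.strip().splitlines()
    if not (header.startswith("#SHAPE=<") and header.endswith(">")):
        raise ValueError("unexpected header: " + header)
    # Dimensions are separated by "/", e.g. "#SHAPE=<11/13>".
    shape = [int(d) for d in header[len("#SHAPE=<"):-1].split("/")]
    values = [float(v) for v in values_line.split()]
    expected = 1
    for d in shape:
        expected *= d
    if len(values) != expected:
        raise ValueError("shape does not match number of values")
    return shape, values


def as_rows(shape, values):
    """Reshape a flat, row-major 2D spectrum into a list of rows."""
    n_rows, n_cols = shape
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Made-up 2x3 spectrum in the same format.
example = "#SHAPE=<2/3>\n10.0 2.0 1.0 3.0 0.5 0.25\n"
shape, values = parse_sfs(example)
rows = as_rows(shape, values)
print(rows)  # [[10.0, 2.0, 1.0], [3.0, 0.5, 0.25]]
```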

Note also that the output SFS is unnormalised: the values in the SFS sum to the total number of (intersecting) input sites. Hence, to get the SFS on probability scale, you can simply divide each value by the sum.

See the View section for normalising or folding the output spectrum, or for conversion to other formats.

Streaming

It is possible to run winsfs in so-called "streaming mode". Unlike the main usage mode described above, streaming mode uses only a trivial amount of RAM (a couple of MB, say), but this comes at the expense of disk space usage and longer run-time. If you have the RAM required to run in the main usage mode, doing so will be more convenient.

To run in streaming mode, an intermediate file must be produced. Briefly, this is required to (jointly) shuffle around the input sites to break linkage disequilibrium patterns. The winsfs shuffle sub-command is used for this preparatory step. With a single population:

winsfs shuffle --output A.saf.shuf A.saf.idx

This will write the intermediate, shuffled file to A.saf.shuf. For technical reasons, this file cannot be written to stdout, so the output file destination must be provided via the -o/--output flag.

Using this file, it is possible to run winsfs as normal:

winsfs A.saf.shuf > A.sfs

This will automatically run in streaming mode based on the input file format. The usual flags and options apply (see above), except that streaming mode can only run on a single thread.

Running streaming mode in two dimensions is similar:

winsfs shuffle --output A-B.saf.shuf A.saf.idx B.saf.idx
winsfs A-B.saf.shuf > A-B.sfs

Note, however, that the input to the winsfs in the second line is only a single file now, since populations A and B have been jointly shuffled into the A-B.saf.shuf file. This is by necessity: it is not possible to run winsfs shuffle for each of the A and B populations separately and then run two-dimensional estimation from the results of the output.
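
The reason for shuffling jointly can be seen in a small sketch: row i must refer to the same site in every population, so all populations have to be permuted with one shared permutation. The helper below is hypothetical and only illustrates that point; it is not how winsfs shuffle is implemented.

```python
import random


def joint_shuffle(pop_a, pop_b, seed):
    """Shuffle two site-aligned lists using a single shared permutation."""
    order = list(range(len(pop_a)))
    random.Random(seed).shuffle(order)
    return [pop_a[i] for i in order], [pop_b[i] for i in order]


# Toy per-site records; the digit identifies the site.
sites_a = ["a0", "a1", "a2", "a3", "a4"]
sites_b = ["b0", "b1", "b2", "b3", "b4"]

shuf_a, shuf_b = joint_shuffle(sites_a, sites_b, seed=1)
# After the joint shuffle, each row still pairs the same site across populations.
aligned = all(a[1:] == b[1:] for a, b in zip(shuf_a, shuf_b))
print(aligned)  # True
```

Shuffling each list with its own independent permutation would, in general, pair up different sites and so break the joint information needed for two-dimensional estimation.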

Utilities

Apart from the main tools to estimate the SFS, winsfs contains some subcommands to work with frequency spectra in general. Note that the following can be used whether or not the spectrum has been created by winsfs.

View

The winsfs view subcommand can fold the SFS, normalise the SFS, and convert it between formats. We can create an SFS for demonstration:

winsfs --seed 1 A.saf.idx > A.sfs
cat A.sfs
#SHAPE=<11>
219338.725607 234.737776 95.505146 32.339889 124.751169 2.732751 71.741684 18.504599 0.004084 37.070165 43.887130

We can normalise the SFS with the -n/--normalise flag:

winsfs view --normalise A.sfs
#SHAPE=<11>
0.996994 0.001067 0.000434 0.000147 0.000567 0.000012 0.000326 0.000084 0.000000 0.000169 0.000199
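
As a cross-check of what normalisation means here, the same numbers can be reproduced by dividing each value of the unnormalised spectrum shown above by the sum. This is only a sketch for illustration; the variable names are arbitrary.

```python
# Values copied from the unnormalised A.sfs shown above.
values = [
    219338.725607, 234.737776, 95.505146, 32.339889, 124.751169, 2.732751,
    71.741684, 18.504599, 0.004084, 37.070165, 43.887130,
]

total = sum(values)  # total number of sites in the spectrum
normalised = [v / total for v in values]
line = " ".join(f"{p:.6f}" for p in normalised)
print(line)
```

The printed line can be compared against the output of winsfs view --normalise above.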

Or fold it using -f/--fold:

winsfs view --fold A.sfs
#SHAPE=<11>
219382.612737 271.807941 95.509230 50.844488 196.492853 2.732751 0.000000 0.000000 0.000000 0.000000 0.000000
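
The folded output above appears consistent with adding each value from the upper half of the spectrum onto its mirror position in the lower half and leaving zeros behind. The sketch below reproduces that for the one-dimensional example; it is an inference from the example output rather than a description of the winsfs view implementation, and it does not cover folding in higher dimensions.

```python
def fold_1d(values):
    """Fold a 1D spectrum by adding entry n-1-i onto entry i (sketch only)."""
    folded = list(values)
    n = len(folded)
    for i in range(n // 2):
        j = n - 1 - i
        folded[i] += folded[j]
        folded[j] = 0.0  # the mirrored entry is zeroed out
    return folded


# Values copied from the unfolded A.sfs shown above.
values = [
    219338.725607, 234.737776, 95.505146, 32.339889, 124.751169, 2.732751,
    71.741684, 18.504599, 0.004084, 37.070165, 43.887130,
]

folded = fold_1d(values)
line = " ".join(f"{v:.6f}" for v in folded)
print(line)
```

With an odd number of entries, as here, the middle entry has no mirror partner and is left unchanged.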

winsfs view also supports conversion between the standard plain text format and the numpy npy binary format using the -o/--output-format flag, which may be helpful for downstream processing of the SFS in python.

Installation

A recent Rust toolchain is required to install winsfs. Currently, the Rust toolchain can be installed by running:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

See instructions for more details.

Latest release

Once the Rust toolchain is installed (see above), the latest winsfs release can be installed using cargo:

cargo install winsfs-cli

This will install the winsfs binary to $HOME/.cargo/bin by default, which should be in the $PATH after installing cargo. Alternatively, to install to $HOME/bin, run:

cargo install winsfs-cli --root $HOME

Current git

The latest git version may include more (potentially experimental) features, and can be installed using:

cargo install --git https://github.com/malthesr/winsfs

winsfs's Issues

No such file or directory (os error 2) error

Hi there,

I recently downloaded winsfs and started to get No such file or directory (os error 2) error. It appears that this is only the case in newer versions as I can run the tool fine on an older download. From my understanding, it would have happened in the past few months (January or February). Really like this tool though!

Additionally, I followed all of the specified download steps both times so I don't think it is an issue of missing an installation step.

Thank you!

--Chris

"Problem with size of dimension" when winsfs SFS is used with ANGSD saf2theta

Hello! I am excited to upgrade to winsfs in my genotype likelihood analyses.

I am trying to use ANGSD to calculate theta statistics and I'm running into an error when I try to do it with winsfs. Apparently winsfs creates an SFS that has 1 fewer dimensions than realSFS does, and this trips up ANGSD. Does anyone know where this discrepancy comes from? Is there a way to edit the winsfs output so that it's in the form ANGSD expects?

For now I will have to stick with realSFS, but please let me know if there is a way to get around this and use winsfs!

Thanks,
-Teresa

Code with winsfs:

$ANGSDIR/angsd -b $BAMS -out $SAF \
-anc flinflon/$REF \
-minMapQ 30 -minQ 30  \
-dosaf 1 -GL 2 -sites $SITES


winsfs $SAF.saf.idx > $SFS

$ANGSDIR/misc/realSFS saf2theta $SAF.saf.idx -outname $THETAOUT -sfs $SFS -fold 1

Error message:

[persaf::persaf_init] Version of flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.pos.gz
	-> args: tole:0.000000 nthreads:4 maxiter:100 nsites(block):0 start:flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.1dsfs chr:(null) start:-1 stop:-1 fstout:flinflon/BWABHVI/ANGSD/Vphil_subsamp2k oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:1 ref:(null) anc:(null)
	-> Will read chunks of size: 4096
	-> Reading: flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.1dsfs assuming counts (will normalize to probs internally)
	-> Pxroblem with size of dimension of prior 31 vs 32

The error remains even if I fold the winsfs SFS first using winsfs view -f.

I do not get this problem (or should I say pxroblem) when I use realSFS to make the SFS like this. With the realSFS SFS, saf2theta successfully creates a thetas file.

$ANGSDIR/misc/realSFS $SAF.saf.idx -fold 1 > $SFS.realsfs

Determining FST based on winSFS output

Hi there,

Thank you for implementing a much faster SFS method! I was just curious if you have suggestions for performing FST using the winSFS output. Will it work with the angsd implementation?

Best,
Chris

Higher dimensions

Currently, only up to three dimensions are supported. More was requested as part of #3, and would probably be a good thing to have.

It shouldn't be particularly difficult to add higher dimensions, but it'll need some testing to make sure it looks reasonable, doesn't regress in the future, etc. I'll look at adding this in the near future.

bootstrapped SFS?

Hello!
Is there a way to bootstrap the SFS (similar to realSFS's -bootstrap parameter)? I'm hoping to use bootstrapped SFS to generate confidence intervals of predicted parameters like Ne and migration from moments.
Thanks!
James

Calculating Fst and plotting the values

I saw that there is now support for Fst calculation with the commit named "Add Fst". But I don't see how to get it from the program.

How to get the Fst values from 2 saf files from 2 populations and plot the results in R?

Also, when plotting the SFS values in R, is it required to remove the first and last values (e.g., barplot(winsfs[-c(1, length(winsfs))]), as this is what I've seen for the realSFS output)? (assume that the winsfs is the output of the program based on a saf file, the values are pasted below)

Folded:

winsfs = c(128459.496517, 37807.43237, 19595.541448, 13794.172787, 11914.055373, 
11591.036125, 10717.842869, 10659.416557, 10619.180001, 9878.407667, 
9693.903963, 9162.882716, 7863.178519, 9416.350486, 8264.15024, 
7746.684849, 7423.467346, 7521.058327, 7142.004797, 7026.281412, 
6516.114025, 6935.251862, 6729.684416, 6037.666072, 6045.952302, 
6472.325573, 6216.954428, 6035.96378, 5911.554285, 6503.866442, 
6635.114531, 10348.583203, 7124.833748, 5177.385221, 4065.965662, 
4468.497129, 3837.587458, 3920.093395, 4342.735254, 3935.650826, 
4049.177171, 3887.896187, 4008.457721, 4412.583478, 4270.295593, 
4998.608504, 3509.831612, 4812.221073, 4710.699949, 4670.200019, 
4608.496482, 4876.892382, 4554.280017, 5063.93489, 5305.382852, 
5178.426539, 6011.698312, 5845.579826, 6383.94782, 7191.321642, 
7156.951056, 8282.703951, 8816.750871, 9978.439809, 11285.247227, 
13067.576935, 24704.074097, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Thanks!

Option to resume aborted runs?

Hi thanks so much for adding 4d capabilities!!!! I have been trying it out on the same dataset that I was using for 3d, it seems to be working fine although taking a very looong time as you warned.
I recently did a 4d run with 16 threads and it took about three weeks (215:22:53:00 CPU) to get to 14 epochs, given it took 61 epochs for the 3d I am guessing I still have quite a bit of time before it will converge. Not sure how hard this would be, but any chance you could add a resume feature? We are limited to about a month of cluster time per job so a resume feature would be really helpful for us! :)

Here was the info on the run in case you are interested:
DEBUG [init] Using 16 threads for reading
INFO [init] Reading (intersecting) sites in input SAF files into memory
DEBUG [init] Found 216836 (intersecting) sites in SAF files with shape 73/65/55/133
DEBUG [init] Shuffling SAF sites
DEBUG [init] Using 500 blocks, the first 336 containing 434 sites and the remaining blocks containing 433 sites
DEBUG [init] Using window size of 100 blocks per window
DEBUG [init] Creating uniform initial SFS
DEBUG [stop] Stopping rule set to log-likelihood tolerance 0.0001 (default)
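
The block sizes reported in the log above are what one gets from splitting the sites as evenly as possible over the blocks, with the first few blocks taking one extra site. A small sketch of that arithmetic follows; the function name is hypothetical and this is only a reading of the log message, not the winsfs source.

```python
def block_sizes(n_sites, n_blocks):
    """Split n_sites over n_blocks as evenly as possible, larger blocks first."""
    base, extra = divmod(n_sites, n_blocks)
    return [base + 1] * extra + [base] * (n_blocks - extra)


# Numbers taken from the log: 216836 sites and 500 blocks.
sizes = block_sizes(216836, 500)
print(sizes.count(434), sizes.count(433))  # 336 164
```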

Banded SAF input

winsfs has been updated to work from the banded SAF representation that angsd now outputs by default, in addition to the old format. The internal representation, however, is not banded. That is, the values that are implicitly zero in the banded SAF files are filled in by explicit zeros when reading. This is correct, but it is inefficient in terms of run-time and memory usage for very large sample sizes.

I'll be adding an internal, banded SAF representation at some point to deal with this, and I'm opening this issue to track this work.

type parameters must be declared prior to const parameters

Hi Guys,

I am trying to install winsfs at our local Redhat cluster and I am experiencing some problems.
Everything goes well until it starts compiling winsfs-core v0.1.0.

I am just posting the error messages below to check whether this is something you have seen before:

cargo install --git https://github.com/malthesr/winsfs --root /projects/mjolnir1/apps/winsfs
Updating git repository https://github.com/malthesr/winsfs
Installing winsfs-cli v0.6.0 (https://github.com/malthesr/winsfs#f1d0f0be)
Updating git repository https://github.com/malthesr/angsd-io.git
Updating crates.io index
Downloaded crossbeam-channel v0.5.6
Downloaded crossbeam-utils v0.8.11
Downloaded crossbeam-deque v0.8.2
Downloaded unicode-ident v1.0.3
Downloaded os_str_bytes v6.2.0
Downloaded hashbrown v0.12.3
Downloaded proc-macro2 v1.0.43
Downloaded clap v3.2.16
Downloaded syn v1.0.99
Downloaded quote v1.0.21
Downloaded libc v0.2.127
Downloaded clap_derive v3.2.15
Downloaded crossbeam-epoch v0.9.10
Downloaded 13 crates (1.5 MB) in 8.00s
Compiling autocfg v1.1.0
Compiling cfg-if v1.0.0
Compiling libc v0.2.127
Compiling version_check v0.9.4
Compiling once_cell v1.13.0
Compiling crossbeam-utils v0.8.11
Compiling proc-macro2 v1.0.43
Compiling cc v1.0.73
Compiling unicode-ident v1.0.3
Compiling quote v1.0.21
Compiling crc32fast v1.3.2
Compiling syn v1.0.99
Compiling rayon-core v1.9.3
Compiling scopeguard v1.1.0
Compiling adler v1.0.2
Compiling hashbrown v0.12.3
Compiling log v0.4.17
Compiling byteorder v1.4.3
Compiling ppv-lite86 v0.2.16
Compiling either v1.7.0
Compiling heck v0.4.0
Compiling os_str_bytes v6.2.0
Compiling bitflags v1.3.2
Compiling strsim v0.10.0
Compiling termcolor v1.1.3
Compiling textwrap v0.15.0
Compiling miniz_oxide v0.5.3
Compiling clap_lex v0.2.4
Compiling memoffset v0.6.5
Compiling crossbeam-epoch v0.9.10
Compiling indexmap v1.9.1
Compiling rayon v1.5.3
Compiling proc-macro-error-attr v1.0.4
Compiling proc-macro-error v1.0.4
Compiling simple_logger v2.2.0
Compiling crossbeam-channel v0.5.6
Compiling flate2 v1.0.24
Compiling libdeflate-sys v0.8.0
Compiling crossbeam-deque v0.8.2
Compiling getrandom v0.2.7
Compiling num_cpus v1.13.1
Compiling atty v0.2.14
Compiling rand_core v0.6.3
Compiling rand_chacha v0.3.1
Compiling rand v0.8.5
Compiling libdeflater v0.8.0
Compiling noodles-bgzf v0.12.0
Compiling clap_derive v3.2.15
Compiling angsd-io v0.1.0 (https://github.com/malthesr/angsd-io.git?rev=c9d36cd#c9d36cd6)
Compiling winsfs-core v0.1.0 (/home/jsd606/.cargo/git/checkouts/winsfs-06f8d3aa94bc2f69/f1d0f0b/winsfs-core)
error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/adaptors.rs:35:22
|
35 | impl<const N: usize, T, F, I> Em<N, I> for Inspect<T, F>
| -----------------^--^--^- help: reorder the parameters: lifetimes, then types, then consts: <T, F, I, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/adaptors.rs:49:22
|
49 | impl<const N: usize, T, F, R> StreamingEm<N, R> for Inspect<T, F>
| -----------------^--^--^- help: reorder the parameters: lifetimes, then types, then consts: <T, F, R, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/standard_em.rs:44:22
|
44 | impl<const N: usize, I> Em<N, I> for StandardEm
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <I, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/standard_em.rs:53:22
|
53 | impl<const N: usize, R> StreamingEm<N, R> for StandardEm
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <R, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/standard_em.rs:66:22
|
66 | impl<const N: usize, I> Em<N, I> for StandardEm
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <I, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:18:37
|
18 | pub struct WindowEm<const N: usize, T> {
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:17:10
|
17 | #[derive(Clone, Debug, PartialEq)]
| ^^^^^ help: reorder the parameters: lifetimes, then types, then consts: <T: ::core::clone::Clone, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:17:17
|
17 | #[derive(Clone, Debug, PartialEq)]
| ^^^^^ help: reorder the parameters: lifetimes, then types, then consts: <T: ::core::fmt::Debug, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:17:24
|
17 | #[derive(Clone, Debug, PartialEq)]
| ^^^^^^^^^ help: reorder the parameters: lifetimes, then types, then consts: <T: ::core::cmp::PartialEq, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:24:22
|
24 | impl<const N: usize, T> WindowEm<N, T> {
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:40:22
|
40 | impl<const N: usize, T> EmStep for WindowEm<N, T>
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:47:22
|
47 | impl<const N: usize, I, T> Em<N, I> for WindowEm<N, T>
| -----------------^--^- help: reorder the parameters: lifetimes, then types, then consts: <I, T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em/window_em.rs:73:22
|
73 | impl<const N: usize, R, T> StreamingEm<N, R> for WindowEm<N, T>
| -----------------^--^- help: reorder the parameters: lifetimes, then types, then consts: <R, T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em.rs:37:32
|
37 | fn inspect<const N: usize, F>(self, f: F) -> Inspect<Self, F>
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <F, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em.rs:46:30
|
46 | pub trait Em<const N: usize, I>: EmStep {
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <I, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/em.rs:95:39
|
95 | pub trait StreamingEm<const N: usize, R>: EmStep
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <R, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/saf/iter/sites.rs:46:22
|
46 | impl<const N: usize, T> IntoSiteIterator for T
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/saf/iter/sites.rs:98:22
|
98 | impl<const N: usize, T> IntoParallelSiteIterator for T
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/lib.rs:49:43
|
49 | pub(crate) trait ArrayExt<const N: usize, T> {
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error: type parameters must be declared prior to const parameters
--> winsfs-core/src/lib.rs:63:22
|
63 | impl<const N: usize, T> ArrayExt<N, T> for [T; N] {
| -----------------^- help: reorder the parameters: lifetimes, then types, then consts: <T, const N: usize>

error[E0658]: default values for const generic parameters are experimental
--> winsfs-core/src/em/standard_em.rs:22:39
|
22 | pub struct StandardEm {
| ^^^^^^^
|
= note: see issue #44580 rust-lang/rust#44580 for more information

error[E0658]: default values for const generic parameters are experimental
--> winsfs-core/src/sfs.rs:95:49
|
95 | pub struct Sfs<const N: usize, const NORM: bool = true> {
| ^^^^^^
|
= note: see issue #44580 rust-lang/rust#44580 for more information

error[E0658]: destructuring assignments are unstable
--> winsfs-core/src/saf.rs:453:38
|
453 | Ordering::Less => (i, j) = (j, i),
| ------ ^
| |
| cannot assign to this expression
|
= note: see issue #71126 rust-lang/rust#71126 for more information

Compiling clap v3.2.16
For more information about this error, try rustc --explain E0658.
error: could not compile winsfs-core due to 23 previous errors
warning: build failed, waiting for other jobs to finish...
error: failed to compile winsfs-cli v0.6.0 (https://github.com/malthesr/winsfs#f1d0f0be), intermediate artifacts can be found at /tmp/cargo-installf3ERZa

Caused by:
build failed

error: invalid or unsupported SAF magic number

Hello,
I just installed winsfs on a cluster using the cargo thing and I ran into this error:
error: invalid or unsupported SAF magic number (found '[73, 61, 66, 76, 34, 00, 00, 00]', expected '[73, 61, 66, 76, 33, 00, 00, 00]')

The example (SAF A and B) ran perfectly.
I tried with different SAF files obtained from ANGSD (angsd version: 0.937-108-gbb2e2d7 (htslib: 1.14-9-ge769401)) in different populations and with different filters in ANGSD and I always got the same error. I googled but...

Note that I am not super confident in my dataset because 1) of some strange pattern in terms of depths (bumpy from 1 to 100 depth, then going to the usual "normal"-like distribution) and 2) of the results of realSFS, which are also bumpy to some extent (e.g. for one pop of 11 individuals: 477252479.919277 1275341.790921 1311721.463687 1150418.711882 1038184.248378 818146.03548 707177.638546 636220.983638 565128.098679 524291.968394 482128.873608 350877.267486 0 0 0 0 0 0 0 0 0 0 0)
So I am increasing the iterations in realSFS (default parameter with 30 it and 500000 sites) but I also wanted to test whether winsfs gives different results... and that's where the error came in the game.

Thank you for any comment you may have.
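
The two byte arrays in the error message above become easier to read when decoded as ASCII. The sketch below assumes the values are printed in hexadecimal, which fits the leading bytes spelling "safv"; reading the fifth byte as a format version is likewise an interpretation, not something stated in the message.

```python
# Byte values copied from the error message, assumed to be hexadecimal.
found = bytes.fromhex("73 61 66 76 34 00 00 00")
expected = bytes.fromhex("73 61 66 76 33 00 00 00")

print(found)     # b'safv4\x00\x00\x00'
print(expected)  # b'safv3\x00\x00\x00'
```

Under that reading, the file starts with "safv4" while this winsfs build expected "safv3", so the files and the installed version do not agree on the SAF format.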

600+ epochs for 3d SFS?

Hello I am trying to use winsfs to create 3d SFS and have so far been unable to get it to converge and output a sfs file.
winsfs -v -t 20 popAtlAst.out.saf.idx popAtlOc.out.saf.idx popGulfAst.out.saf.idx > popAtlAst_AtlOc_GulfAst.sfs
You mention that winsfs usually takes less than 10 epochs to converge (although perhaps this was only for 2d?)
I recently had an aborted run where it had hit
INFO [windowem] Finished epoch 619

Do you recommend adding the --max-epochs parameter for 3d sfs?
