stevemussmann / bayesass3-snps
Modification of BayesAss 3.0.4 to allow handling of large SNP datasets
License: GNU General Public License v3.0
Dear Steve,
I am using BayesAss-SNPs to estimate migration rates among 6 populations of seabirds. Other methods indicate substantial migration among the studied populations. However, the posterior distributions of m[i,i] appear to be stacked at the lower limit of 0.67. I wonder whether it would be possible to relax the prior constraints on m values. Could you help me?
Regards,
Mariana Mazzochi
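For context, the 0.67 floor appears to follow from the prior on migration rates in the model of Wilson & Rannala (2003), which, as we read it, caps the total proportion of immigrants into any population at one third. Writing $m_{ij}$ for the fraction of individuals in population $i$ that are migrants from population $j$ per generation, with $K$ populations:

$$
\sum_{j=1}^{K} m_{ij} = 1, \qquad \sum_{j \neq i} m_{ij} \le \frac{1}{3}
\quad\Longrightarrow\quad
m_{ii} = 1 - \sum_{j \neq i} m_{ij} \ge \frac{2}{3}.
$$

A posterior for $m_{ii}$ that piles up against $2/3$ therefore suggests the data are pushing against this bound, rather than the chain failing to move.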
Hello,
I am working with a RADseq dataset consisting of ~5,200 SNPs, ~750 samples, and 14 populations. I am running BayesAss on my university's HPC cluster, which unfortunately cancels any job that runs longer than 1 week. In that time I can only run about 6 million iterations, but some of my pairwise migration models do not reach convergence within that limit. I have read papers in which the authors ran BayesAss multiple times and then "merged" the runs to increase the total number of iterations.
My question is: can I launch BayesAss so that it starts from the state where a previous run left off? Or do you have any other recommendations for increasing the number of iterations so that my models reach convergence?
Any recommendations are greatly appreciated!
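Before pooling samples from several independent runs (different -s seeds), a common check is whether the runs agree with each other. The sketch below computes the Gelman-Rubin potential scale reduction factor (R-hat) for one parameter. It assumes the post-burn-in values of that parameter (for example the log-likelihood, or a single m[i][j]) have already been extracted from each run's trace output into equal-length lists of floats; the function name is hypothetical and nothing here is part of BA3-SNPs.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Return R-hat for two or more equal-length chains of one parameter.

    Assumption: each chain is a list of post-burn-in values taken from a
    separate, independently seeded run.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length >= 2")

    chain_means = [mean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("chains have zero within-chain variance")
    # B: variance of the chain means, scaled by chain length
    b = n * variance(chain_means)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5
```

Values close to 1 (a common rule of thumb is below about 1.1) are consistent with the runs sampling the same distribution; clearly larger values indicate the runs have not converged to the same place and should not simply be merged.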
Hi there,
I am currently working on microsatellite data consisting of 4 populations and 16 loci. I have previously run BayesAss on the same data with 3 populations, and it worked perfectly. However, I am now unable to run the data with 4 populations because I get the following error message:
gsl: ../gsl/gsl_rng.h:200: ERROR: invalid n, either 0 or exceeds maximum value of generator. Default GSL error handler invoked.
I have tried troubleshooting the program and changed the seed value, but I still get this error. I am aware that this is not SNP data, but I was hoping you might have a solution.
The input file is attached below.
Regional_dataset.txt
Kind regards,
Leah.
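GSL prints this message when a random integer is requested from a range of size n that is 0 (or larger than the generator supports), so something the program counted from the input apparently came out as zero. A quick first check is that the file contains the number of loci passed to -l and that every population label has individuals. A minimal sketch, assuming the whitespace-separated BA3 input layout of one line per individual per locus (individual, population, locus, allele 1, allele 2); the function name summarize_immanc is hypothetical.

```python
from collections import defaultdict


def summarize_immanc(lines):
    """Count loci and individuals per population in BA3-style input lines.

    Assumed layout per line: indiv pop locus allele1 allele2
    (whitespace-separated). Blank lines are skipped.
    """
    loci = set()
    indivs_by_pop = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        indiv, pop, locus = fields[:3]
        loci.add(locus)
        indivs_by_pop[pop].add(indiv)
    return {
        "n_loci": len(loci),  # should equal the value given to -l
        "indivs_per_pop": {p: len(ids) for p, ids in sorted(indivs_by_pop.items())},
    }
```

Passing an open file handle works directly, since iterating over it yields lines, e.g. `summarize_immanc(open("Regional_dataset.txt"))`.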
Hi Steve,
I have encountered the problem shown in the picture below while installing BayesAss3-SNPs. I am using CentOS Linux; the GNU Scientific Library (GSL) was installed from gsl-latest.tar.gz, and gsl_sf_gamma.h exists at gsl/include/gsl_sf_gamma.h.
Do you know what might be causing this?
Thanks
Li-sha
Hi Steve,
Thank you for putting this program out there!
However, we have a problem trying to compile the program on a university computer cluster (Linux). The cluster manager forwarded me the error, he was using gcc 4.9.3 when compiling. I hope you can help.
The error:
g++ -O3 -Wall -c main.cpp
main.cpp: In function ‘int main(int, char**)’:
main.cpp:1153:19: warning: narrowing conversion of ‘l’ from ‘unsigned int’ to ‘char’ inside { } is ill-formed in C++11 [-Wnarrowing]
char targ[]={l}, targ2[]={j};
^
main.cpp:1153:32: warning: narrowing conversion of ‘j’ from ‘unsigned int’ to ‘char’ inside { } is ill-formed in C++11 [-Wnarrowing]
char targ[]={l}, targ2[]={j};
^
main.cpp:1174:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if(iterAllele->second == sampleIndiv[l]->getAllele(j,m))
^
main.cpp: In function ‘void readInputFile(Indiv**, unsigned int&, unsigned int&, unsigned int&, unsigned int*, std::string&, int)’:
main.cpp:1257:7: warning: unused variable ‘lastChar’ [-Wunused-variable]
char lastChar='a', nextChar;
^
main.cpp:1257:21: warning: variable ‘nextChar’ set but not used [-Wunused-but-set-variable]
char lastChar='a', nextChar;
^
main.cpp: In function ‘void getEmpiricalAlleleFreqs(double***, Indiv**, unsigned int*, unsigned int, unsigned int, unsigned int, bool)’:
main.cpp:1505:42: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (sampleIndiv[m]->getAllele(j,0) == k)
^
main.cpp:1508:42: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (sampleIndiv[m]->getAllele(j,1) == k)
^
main.cpp: In function ‘void parseComLine(int, char**, std::string&, int&, unsigned int&, unsigned int&, unsigned int&, std::string&, double&, double&, double&, bool&, bool&, bool&, bool&, bool&, bool&, int&)’:
main.cpp:1735:47: error: ‘class boost::program_options::typed_value<std::basic_string, char>’ has no member named ‘required’
("file,F", opt::value(&infileName)->required(), "Specify input file")
^
main.cpp:1736:41: error: ‘class boost::program_options::typed_value<int, char>’ has no member named ‘required’
("loci,l", opt::value(&MAXLOCI)->required(), "Specify number of loci in input file")
^
main.cpp:1768:13: error: ‘required_option’ in namespace ‘opt’ does not name a type
catch(opt::required_option& e) //catch errors resulting from required options
^
main.cpp:1770:42: error: ‘e’ was not declared in this scope
std::cerr << std::endl << "ERROR: " << e.what() << std::endl << std::endl;
^
make: *** [main.o] Error 1
Many thanks,
Milaja
Hello,
I'm using BayesAss3-SNPs with about 17k loci and 4 populations. I'm having a couple of issues.
BA3-SNPS -F ./BayesAss_Input.immanc -l 17129 -i 50000000 -n 5000 -b 10000000 -o BA3out_Run1.txt -m 0.3 -a 0.9 -f 0.1 -u -g -t -v 2>&1 | tee Run1.log
Thank you.
Hello,
I did a run with 10 million iterations and a burn-in of 1 million, and I'm trying to adjust the mixing parameters to get better acceptance rates. I have the following output:
Line 1:
logP(M): -150.64 logL(G): -3384698.26 logL: -3384848.90 % done: [0.00] % accepted: (0.26, 0.00, 0.23, 0.00, 0.00)
Line 1010:
logP(M): -160.69 logL(G): -3614923.20 logL: -3615083.89 % done: (1.00) % accepted: (0.36, 0.00, 0.33, 0.22, 0.00)
My questions are:
Thank you.
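For tuning, it can help to pull the five acceptance rates out of a progress line and check the tunable ones against the 20-60% window recommended in the manual. A minimal sketch: the order of the rates (migration rates, individual migrant ancestries, allele frequencies, inbreeding coefficients, missing genotypes) and the fact that only the first, third and fourth are controlled by -m, -a and -f are our reading of the BA3 documentation, and the function names are hypothetical.

```python
import re

# Assumed order of the five rates printed after "% accepted:".
RATE_NAMES = (
    "migration rates (-m)",
    "individual migrant ancestries",
    "allele frequencies (-a)",
    "inbreeding coefficients (-f)",
    "missing genotypes",
)

_ACCEPTED = re.compile(r"% accepted:\s*\(([^)]*)\)")


def parse_acceptance(line):
    """Extract the five acceptance rates from one BA3 progress line."""
    match = _ACCEPTED.search(line)
    if match is None:
        raise ValueError("no '% accepted: (...)' field found")
    # Split on commas and/or whitespace so a missing comma is tolerated.
    rates = [float(x) for x in re.split(r"[,\s]+", match.group(1).strip())]
    if len(rates) != len(RATE_NAMES):
        raise ValueError(f"expected {len(RATE_NAMES)} rates, got {len(rates)}")
    return dict(zip(RATE_NAMES, rates))


def out_of_window(rates, low=0.20, high=0.60):
    """Return the tunable parameters whose acceptance rate is outside [low, high]."""
    tunable = (RATE_NAMES[0], RATE_NAMES[2], RATE_NAMES[3])
    return {name: rates[name] for name in tunable if not low <= rates[name] <= high}
```

Applied to the first line above, only the inbreeding-coefficient rate (0.00) falls outside the window; in the second line all three tunable rates are inside it.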
Dear Dr. Mussmann,
I hope you are well. I would like to ask for some help using BayesAss3-SNPs.
I've been running it on Ubuntu on a dataset of four species, with 513, 148, 328 and 248 SNPs respectively. I've been trying to adjust parameters 1, 3 and 4 to get acceptance rates between 20% and 60% by increasing the proposal step lengths, as recommended in the manual. However, for three of the species the acceptance rates remain above 60% even after increasing the step size up to 1.00. We checked MCMC diagnostics using trace plots, and they look fine.
Of those three species, one showed genetic structure while the other two did not. For all three species, migration rates among localities were very low.
Would you say these results can be considered reliable? Is there any other way I could adequately adjust the mixing parameters to get better results?
This was the final command line I used for the analysis:
BA3-SNPS -v -F Pinctada_bayesass.immanc -l 328 -m 1.00 -a 1.00 -f 1.00 -t -s 100 -i10000000 -b1000000
Thank you very much!
Best regards,
Pedro
I am new to BA3, and my understanding is that all input loci for this software should be bi-allelic. Is that correct?
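Whatever the answer for BA3-SNPs, it is easy to check what an input file actually contains. A minimal sketch that lists loci with more than two distinct observed alleles, assuming the five-column BA3 input layout (individual, population, locus, allele 1, allele 2) and that missing alleles are coded as 0; the function name is hypothetical.

```python
from collections import defaultdict

MISSING = "0"  # assumed missing-allele code


def multiallelic_loci(lines, max_alleles=2):
    """Return {locus: sorted alleles} for loci with more than max_alleles alleles."""
    alleles = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        locus, a1, a2 = fields[2], fields[3], fields[4]
        for allele in (a1, a2):
            if allele != MISSING:
                alleles[locus].add(allele)
    return {locus: sorted(obs) for locus, obs in alleles.items() if len(obs) > max_alleles}
```

An empty result means every locus has at most two observed alleles.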
Hi Steve,
I think I have identified a fairly significant parsing error. It appears that when parsing the source population for each sample, the program takes the population from the previous sample instead. For example, if individual_1 was sampled from population 0 and the next individual, individual_2, was sampled from population 1, the program appears to say that individual_2 was actually sampled from population 0, which often leads to this individual being classified as a first-generation migrant from pop 1 to pop 0 (which, of course, isn't the case; it was sampled from pop 1). I have attached some example outputs, including the script I ran, the input file, the output .indiv.txt file, and the identified first-generation migrants (.1migs) with their true source population on each line. Let me know if you need any more information or can think of something that may have gone wrong on my end.
Thanks,
Bryson
example_files.zip
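One way to document a shift like this is to compare each individual's sampled population in the input against the population the program reports for it. A minimal sketch, assuming the five-column input layout; parsing the .indiv.txt output is left out because that file's layout is not shown here, so in this illustration the `reported` mapping is supplied by hand. Function names are hypothetical.

```python
def sampled_populations(lines):
    """Map each individual to the population it is listed under in the input.

    Raises ValueError if one individual appears under two populations.
    Assumed line layout: indiv pop locus allele1 allele2.
    """
    pop_of = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        indiv, pop = fields[0], fields[1]
        previous = pop_of.setdefault(indiv, pop)
        if previous != pop:
            raise ValueError(f"{indiv} listed under both {previous} and {pop}")
    return pop_of


def mismatches(expected, reported):
    """List (indiv, expected_pop, reported_pop) where the two mappings disagree."""
    return [
        (indiv, pop, reported[indiv])
        for indiv, pop in sorted(expected.items())
        if indiv in reported and reported[indiv] != pop
    ]
```

If the bug is an off-by-one in parsing, the mismatches should cluster at the first individual of each new population in file order.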
Hi there, thanks for developing this - it's really helpful.
I've been trying to run this on Ubuntu on a dataset of 1000 loci using the following command:
./BayesAss3-SNPs/BA3-SNPS-Ubuntu64 -t -s 321 -i 16000000 -b 8000000 -l 1000 -o ba3 _seed321_9aug.out -F bayesassout9aug1000loci.inp -v -a0.25 -f0.25 -m0.25
I've then been trying to optimize mixing by modifying the last 3 parameters. However, my results always show that the individual migrant ancestries never increase from 0 (unless I use the -p parameter).
logP(M): -703.83 logL(G): -97057.71 logL: -97761.54 % done: [0.03] % accepted: (0.16, 0.00 0.38, 0.14, 0.62)
I'm not sure how to fix this, I have attached my input file if it helps.
bayesassout9aug1000loci.txt
Any suggestions on how to fix this? I have tried many different variations of the last 3 parameters in my command and nothing seems to change it.
Hi Steve,
I am trying BayesAss-SNPs right now, and it seems to work fine so far (it has just started and is only showing % done).
I am wondering whether BA3_SNPs_autotune, which you mentioned in the MEE paper, is included in BayesAss-SNPs. If not, how can I assess how well the mixing parameters are performing, or should I tune them myself? Thank you very much!
Best regards,
Han Xiao
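If tuning by hand, the usual logic is that larger proposal steps are rejected more often: raise a mixing parameter when its acceptance rate is above the target window and lower it when the rate is below. The sketch below is a generic multiplicative heuristic under that assumption, not the algorithm used by BA3-SNPs-autotune; the function name, the factor of 1.5 and the bounds are all hypothetical choices.

```python
def adjust_delta(delta, acceptance, low=0.20, high=0.60, factor=1.5,
                 min_delta=0.01, max_delta=1.0):
    """Propose a new mixing parameter (-m, -a or -f) from an acceptance rate.

    Generic heuristic: enlarge the step when too many proposals are
    accepted, shrink it when too few are. max_delta=1.0 is an assumed
    cap; raise it if the program accepts larger values.
    """
    if acceptance > high:
        delta *= factor
    elif acceptance < low:
        delta /= factor
    return min(max(delta, min_delta), max_delta)
```

Each adjustment needs a fresh short run, after which the new acceptance rates are read from the progress output and the step repeated until all tunable rates fall inside the window.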
Hello,
I'm running BA3 using parallel, but it is using only 2 threads even when I specify -j 20. Is there a better way to parallelize?
Thank you.
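A job runner such as parallel can only run as many jobs at once as it is given, so -j 20 with two queued jobs still runs two. Independent replicate runs with different seeds are a natural unit of parallel work. A minimal sketch that builds one command per seed (each with its own -o file) and runs them concurrently with a thread pool; the function names and output-file pattern are hypothetical, and if trace output (-t) is written to a fixed file name, each replicate should be run in its own directory so runs do not overwrite each other.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def replicate_commands(base_args, seeds, prefix="BA3out"):
    """Build one BA3-SNPS command per seed, each writing its own output file.

    base_args holds the options shared by every run (-F, -l, -i, -b, ...).
    """
    return [
        ["BA3-SNPS", *base_args, "-s", str(seed), "-o", f"{prefix}_seed{seed}.txt"]
        for seed in seeds
    ]


def run_all(commands, max_workers):
    """Run independent commands concurrently; return their exit codes in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For example, `run_all(replicate_commands(["-F", "input.immanc", "-l", "1000"], range(1, 21)), 20)` would start 20 single-process runs at once.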
Hi, do you have any suggestions (or scripts) on how to remove loci where data is missing for all individuals. I am having an issue running the program with my input file and believe this is the culprit.
```
>BA3-SNPS -v -i100000000 -b1000000 -t -g -u -l 21292 -F wgenome_20_BA3.txt
BA3-SNPS Version 1.1 (BA3-SNPS)
Released: 07/11/2019
Steven Mussmann
Department of Biological Sciences at U. of Arkansas
Modified from BayesAss Version 3.0.4
Bruce Rannala
Department of Evolution and Ecology at UC Davis
Please cite: Wilson & Rannala (2003). Bayesian Inference of recent
migration rates using multilocus genotypes. Genetics 163:1177-1191.
Please also cite: Mussmann, Douglas, Chafin, & Douglas (2019). BA3-
SNPs: Contemporary migration reconfigured in BayesAss for next-
generation sequence data. Methods in Ecology and Evolution.
Made new Indiv object
Going to read input file
Setting alleles
pop_1 0
Read input file
At least one locus may contain no data for all samples in your input file.
gsl: ../gsl/gsl_rng.h:200: ERROR: invalid n, either 0 or exceeds maximum value of generator
Default GSL error handler invoked.
Abort trap: 6
```
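The warning just before the GSL error points to a locus with no data in any sample. A minimal sketch that removes such loci, assuming the five-column BA3 input layout (individual, population, locus, allele 1, allele 2) with missing alleles coded as 0; the function name is hypothetical. Kept lines are re-joined with single spaces.

```python
from collections import defaultdict

MISSING = "0"  # assumed missing-allele code


def drop_empty_loci(lines):
    """Return (kept_lines, dropped_loci) with all-missing loci removed.

    A locus is dropped when every allele at that locus, across all
    individuals, equals MISSING. Assumed line layout:
    indiv pop locus allele1 allele2.
    """
    rows = [f for f in (line.split() for line in lines) if len(f) >= 5]
    has_data = defaultdict(bool)
    for fields in rows:
        locus = fields[2]
        has_data[locus] |= fields[3] != MISSING or fields[4] != MISSING
    dropped = sorted(locus for locus, ok in has_data.items() if not ok)
    kept = [" ".join(f) for f in rows if has_data[f[2]]]
    return kept, dropped
```

After filtering, the value passed to -l must be reduced by the number of dropped loci (here, 21292 minus `len(dropped)`).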
Hello,
I'm looking forward to running this software; however, I'm having difficulty with the installation. I'm installing via Docker to ensure I have all the dependencies, and this is my first time using this program. Everything seems to have gone correctly: I used pull to install, then executed runDocker.sh. This does indeed create a data folder.
However, I'm confused about this line in the README: "In addition to BA3-SNPs, the countLociImmanc.sh script, BA3-SNPS-autotune, and file conversion scripts (stacksStr2Immanc.pl and pyradStr2Immanc.pl) are also installed in your $PATH in this container." I don't see these scripts in the 'data' folder where I executed runDocker.sh, or in the /apps/data folder. If they are installed elsewhere, they don't seem to be accessible on my path. I ran ./countLociImmanc.sh -f autosomal_genstr_filtered_maxMeanDP49_rmRelated.immanc from both data folders mentioned above and got the error:
./countLociImmanc.sh: No such file or directory
Should I be installing these scripts separately? I thought these would be installed using Docker, but I'm new to this method for installing programs. I sincerely appreciate any help you can provide!
Quinn
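Programs installed on $PATH are invoked by name alone (countLociImmanc.sh), whereas ./countLociImmanc.sh looks only in the current directory, which matches the error above. A minimal sketch for checking, from inside the container, where each tool named in the README resolves on $PATH; the tool list is taken from the README quote and the BA3-SNPS command name used elsewhere on this page, and the function name is hypothetical.

```python
import shutil

TOOLS = ("BA3-SNPS", "BA3-SNPS-autotune", "countLociImmanc.sh",
         "stacksStr2Immanc.pl", "pyradStr2Immanc.pl")


def locate(tools=TOOLS):
    """Map each tool name to its full path on $PATH, or None if not found."""
    return {name: shutil.which(name) for name in tools}
```

Any entry that comes back as None is not on the current $PATH; the others can be run by name from any directory.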
Hi Steve,
I am wondering whether it is possible to parallelize BayesAss. I am working on a cluster with several nodes, and it would be great to be able to use the full power of the machine to run this program. A single run usually takes a day, and then I may find that the parameters I used were wrong.
Do you know whether the program can run on several cores at the same time? Or do you have any suggestions for improving run times?
Thanks
Diego
Hi Steve,
Thanks for this contribution.
I tried running BayesAss3-SNPs on the 2-population example file from BA3 on a Linux cluster using one CPU with 5 GB of memory, but I get a segmentation fault:
~/BayesAss3-SNPs-master/BA3-SNPS -F ~/BA3-3.0.4/examples/2pop.txt -l 9
BayesAss Edition 3.0.4 (BA3)
Released: 09/28/2015
Bruce Rannala
Department of Evolution and Ecology at UC Davis
Made new Indiv object
Going to read input file
Setting alleles
Segmentation fault (core dumped)
What am I missing?
Best,
Patricia