Pipeline for consistent, standardized simulation of admixed genotypes. It creates simulated genotypes as well as pruned and thinned genotypes for benchmarking at different SNP counts. It is based on code run by Angela Andaleon, which in turn uses the admixture simulation tool created by the makers of RFMix.
This pipeline exists mainly to keep inputs consistent when benchmarking different admixture analysis software. The adsim pipeline first creates simulated genotypes using the admixture simulation tool. These can then be pruned and/or thinned in a consistent way, as the user desires and as different software demands, which makes it easy to benchmark at different numbers of SNPs. Note that adsim is designed to prune and thin each input population independently, so each output file contains a different set of SNPs. That said, it is fairly trivial to analyze the populations jointly by concatenating the VCF files and pruning/thinning from there.
The pipeline has been tested on publicly available data from the 1000 Genomes Project.
The pipeline runs in two steps:
- Simulate the population for a particular chromosome
- Prune and/or thin the simulated genotypes to a certain number of SNPs
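As a rough illustration of the second step, the sketch below builds plink 1.9 command lines for LD pruning and for thinning to a fixed SNP count. It is not the pipeline's own code: the function names, the output naming scheme, and the pruning parameters (50 variant window, step 5, r^2 threshold 0.2) are assumptions chosen for the example, and the commands may need extra options depending on how sample and variant IDs appear in your VCF.

```python
def prune_commands(vcf_path, out_prefix, window=50, step=5, r2=0.2):
    """Return two plink commands: find an LD-pruned SNP set, then extract it.

    The window/step/r2 defaults are illustrative assumptions, not values
    taken from the pipeline.
    """
    find_set = [
        "plink", "--vcf", vcf_path,
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out_prefix,  # plink writes <out_prefix>.prune.in
    ]
    extract_set = [
        "plink", "--vcf", vcf_path,
        "--extract", out_prefix + ".prune.in",
        "--recode", "vcf",
        "--out", out_prefix + "_pruned",
    ]
    return [find_set, extract_set]


def thin_command(vcf_path, out_prefix, snp_count):
    """Return a plink command that randomly keeps snp_count variants."""
    return [
        "plink", "--vcf", vcf_path,
        "--thin-count", str(snp_count),
        "--recode", "vcf",
        "--out", f"{out_prefix}_thin{snp_count}",
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes it easy to print and inspect them first.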
This pipeline is designed to run on only one chromosome at a time. To process multiple chromosomes, either run it once per chromosome or edit the pipeline accordingly. It is also designed to process each population individually, so different SNPs appear in each of your pruned/thinned files. In theory, however, it should be simple to keep the SNP list consistent across all populations by combining them into one VCF file, pruning/thinning that file, and finally separating it by population.
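One way to keep the SNP list consistent, different from the combine-then-split route described above, is to choose a single thinned list of variant IDs once and apply that same list to every population. The sketch below shows only the list selection. The helper names and the seed parameter are hypothetical, and it assumes every variant has a unique ID.

```python
import random


def thin_variant_ids(variant_ids, target_count, seed=0):
    """Randomly keep target_count IDs, preserving their original order.

    A fixed seed makes the selection reproducible across runs.
    """
    if target_count >= len(variant_ids):
        return list(variant_ids)
    rng = random.Random(seed)
    keep = set(rng.sample(range(len(variant_ids)), target_count))
    return [vid for i, vid in enumerate(variant_ids) if i in keep]


def write_id_list(variant_ids, path):
    """Write one variant ID per line, the format plink's --extract reads."""
    with open(path, "w") as handle:
        for vid in variant_ids:
            handle.write(vid + "\n")
```

The resulting file can then be given to `plink --extract` for each population's VCF, so every output contains the same SNPs.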
- Admixture simulation tool
- plink
- vcftools
- awk
- sed
- tabix
During testing, all of the software listed above was run on a Linux machine running Ubuntu.
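Before running the pipeline, it can help to confirm that the command-line dependencies are on your `PATH`. The following is a minimal sketch. The executable names are assumptions (some Ubuntu packages install plink as `plink1.9`, for example), and the admixture simulation tool is left out because its executable name depends on how you installed it.

```python
import shutil

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS = ["plink", "vcftools", "awk", "sed", "tabix"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```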