# Learning Natural Selection from the Site Frequency Spectrum
This repo includes two main programs:
- SFselect.py -- a standalone program that applies pre-trained SVMs of the site frequency spectrum (SFS) to allele frequency data. For each sliding genomic window, it outputs the probability, under the model, that the window is evolving under a sweep.
For more details on using SFselect.py, see http://bioinf.ucsd.edu/~rronen/sfselect.html
- SFselect_train.py -- a program for training SVMs of the site frequency spectrum (SFS) to distinguish regions evolving neutrally from those evolving under a hard selective sweep. As input, it requires simulated population data, which can be generated by simulators such as ms or msms. See 'params.py' to set the simulation parameters, which are also used to construct the data file names, among other things.
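
Both programs work with the site frequency spectrum of a genomic window. For a sample of n chromosomes, the unfolded SFS counts how many segregating sites carry a derived allele present in exactly i of the n chromosomes, for i = 1..n-1. As a rough illustration only, the sketch below computes a normalized unfolded SFS for each sliding window. The input layout (position-sorted `(position, derived_count)` pairs), the helper names, and the window and step sizes are hypothetical assumptions; the input format and SFS binning that SFselect.py actually uses are described on the project page linked above and are not reproduced here.

```python
from typing import List, Tuple


def unfolded_sfs(derived_counts: List[int], n: int) -> List[float]:
    """Hypothetical helper: normalized unfolded SFS. Entry i-1 is the fraction
    of segregating sites whose derived allele is seen in exactly i of n chromosomes."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:  # skip sites that are monomorphic in the sample
            sfs[c - 1] += 1
    total = sum(sfs)
    return [x / total for x in sfs] if total else [0.0] * (n - 1)


def sliding_window_sfs(
    snps: List[Tuple[int, int]], n: int, window: int, step: int
) -> List[Tuple[int, List[float]]]:
    """Hypothetical helper: (window_start, normalized SFS) for each half-open
    window [start, start + window). Assumes snps is sorted by position."""
    if not snps:
        return []
    results = []
    start, last_pos = snps[0][0], snps[-1][0]
    while start <= last_pos:
        counts = [c for pos, c in snps if start <= pos < start + window]
        results.append((start, unfolded_sfs(counts, n)))
        start += step
    return results


if __name__ == "__main__":
    # Made-up toy data: 4 sampled chromosomes, five SNPs spread over ~2 kb.
    snps = [(100, 1), (450, 1), (900, 3), (1500, 2), (1900, 1)]
    for start, sfs in sliding_window_sfs(snps, n=4, window=1000, step=500):
        print(start, [round(x, 3) for x in sfs])
```

Rescanning every SNP for each window keeps the sketch short but costs O(windows * SNPs); a two-pointer scan over the sorted positions would make it linear.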
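
SFselect_train.py trains SVMs that separate neutral windows from sweep windows. The dependency-free sketch below illustrates only the general idea, using a swapped-in technique: a linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss, rather than the scikit-learn SVMs this repo depends on. The feature vectors are made-up normalized SFS vectors standing in for windows from simulated neutral (label -1) and sweep (label +1) data, loosely mimicking the excess of rare and high-frequency derived variants often associated with sweeps; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    X: List[List[float]], y: List[int], lam: float = 0.01, epochs: int = 500, seed: int = 0
) -> List[float]:
    """Hypothetical helper: Pegasos-style linear SVM minimizing
    lam/2 * ||w||^2 + mean hinge loss. Labels must be -1 (neutral) or +1 (sweep).
    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularized too, a common simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * _dot(w, Xb[i]) < 1  # hinge loss active for this point
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the regularizer
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w: List[float], x: Sequence[float]) -> float:
    """Raw SVM decision value; positive means 'sweep-like' under this toy model."""
    return _dot(w, list(x) + [1.0])


if __name__ == "__main__":
    X = [
        [0.20, 0.60, 0.20], [0.25, 0.55, 0.20], [0.30, 0.50, 0.20],  # neutral-like
        [0.70, 0.00, 0.30], [0.65, 0.05, 0.30], [0.60, 0.10, 0.30],  # sweep-like
    ]
    y = [-1, -1, -1, 1, 1, 1]
    w = train_linear_svm(X, y)
    for x in X:
        print(x, "sweep" if svm_score(w, x) > 0 else "neutral")
```

This returns a raw decision score, not the probability SFselect.py reports. In scikit-learn, `SVC(probability=True)` adds probability estimates via Platt scaling, but whether SFselect relies on that mechanism is not stated here.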
To run the programs, install scikit-learn v0.13 by running:

    sudo pip install -Iv https://pypi.python.org/packages/source/s/scikit-learn/scikit-learn-0.13.tar.gz#md5=8d6029f668a330aded7afe5df18df4dc
### Dependencies

numpy, matplotlib, scikit-learn (tested with v0.13)
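
Because the code was tested with scikit-learn v0.13, it can help to confirm which versions are installed before running either program. The sketch below is a minimal check, assuming each package exposes a `__version__` attribute (numpy, matplotlib, and sklearn do); the helper name `installed_versions` is hypothetical and not part of this repo. Note that the pip package `scikit-learn` is imported as `sklearn`.

```python
import importlib


def installed_versions(modules):
    """Hypothetical helper: map each module name to its __version__ string,
    "unknown" if it imports but has no __version__, or None if the import fails."""
    versions = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(mod, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    found = installed_versions(["numpy", "matplotlib", "sklearn"])
    for name in sorted(found):
        print("%s: %s" % (name, found[name] or "NOT INSTALLED"))
    sk = found.get("sklearn")
    if sk is not None and not sk.startswith("0.13"):
        print("Note: tested with scikit-learn 0.13; found %s" % sk)
```

The string formatting avoids f-strings and type hints, so the same check should run under both Python 2.7 and Python 3.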