dwinter / mmod

Differentiation statistics in R

License: Other

R 100.00%

mmod's Introduction

Modern Measures of Differentiation

mmod is an R package for calculating modern population divergence statistics.

Quickstart

Install

mmod is on CRAN, so you can install the latest stable version using install.packages("mmod"). This GitHub repository may be running ahead of the version on CRAN; if you really want the latest version, you can use devtools to install the code in this repo:

library(devtools)
install_github("dwinter/mmod")

Usage

Once it's up and running, all you need is a genepop (or fstat) file with your data:

    library(mmod)
    my_data <- read.genepop("my_file.gen")
    diff_stats(my_data)

Overview

Population geneticists have traditionally used Nei's Gst (often confusingly called Fst...) to measure divergence between populations. It turns out that Gst doesn't really measure divergence, so [a set of new measures has been developed](http://www.molecularecologist.com/2011/03/should-i-use-fst-gst-or-d-2/).

mmod is a package that brings three of these measures, Hedrick's (2005, 2011) G''st, Jost's (2008) D and Meirmans's (2005) φ'st, to R, along with a function that calculates Nei's Gst using nearly unbiased estimators for Hs and Ht (the two key parameters from which most of these statistics are calculated). All of these functions work on genind objects from the package adegenet, so data can be read in from standard genepop or fstat files. An overview of typical usage is provided in a vignette called "demo", accessible via vignette("demo", package="mmod"). I suggest new users read this before they start.
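To make the role of Hs and Ht concrete, here is a minimal base-R sketch of how Nei's Gst, G''st and Jost's D can be written in terms of those two quantities and the number of populations n. It follows the commonly published forms of these statistics and is not taken from mmod's source; the function names (nei_gst, hedrick_gst2, jost_d) and the input values are hypothetical, and Hs and Ht are taken as given rather than estimated with the nearly unbiased estimators mmod uses.

```r
# Illustrative only: textbook forms of the statistics, written as
# functions of Hs (within-population heterozygosity), Ht (total
# heterozygosity) and n (number of populations).
# These helper names are hypothetical and are not part of mmod.

nei_gst <- function(hs, ht) {
  (ht - hs) / ht
}

hedrick_gst2 <- function(hs, ht, n) {
  # G''st: Gst standardised so that it can reach 1 for any Hs
  n * (ht - hs) / ((n * ht - hs) * (1 - hs))
}

jost_d <- function(hs, ht, n) {
  ((ht - hs) / (1 - hs)) * (n / (n - 1))
}

# Hypothetical values for a single locus in two populations
hs <- 0.6
ht <- 0.7
n  <- 2

stats <- c(
  Gst  = nei_gst(hs, ht),
  Gst2 = hedrick_gst2(hs, ht, n),
  D    = jost_d(hs, ht, n)
)
print(round(stats, 3))
```

With high within-population diversity, Gst stays small even though the other two measures report substantial differentiation, which is the behaviour that motivated the newer statistics.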

Help

All functions are documented, and there is a vignette describing basic usage of the package.


mmod's Issues

Utilize [pop = ] accessor instead of repool for pairwise functions

In all of mmod's pairwise functions, you first split populations and then repool them for each calculation. Using the new [pop = ] accessor would be advantageous as it would prevent unnecessary repooling of the data.

Example:

pair <- function(index.a, index.b){
    a <- pops[[index.a]]
    b <- pops[[index.b]]
    temp <- repool(a,b)
    # do something with temp
}

could be turned into:

pair <- function(two.pops = c(index.a, index.b), pops){
    temp <- pops[pop = two.pops]
    # do something with temp
}

This way, you could pass the table of combinations to apply:

apply(allP, 2, pair, pops)
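The pattern of passing a table of combinations to apply can be tried out in base R without any genetic data. In this sketch a plain data frame stands in for the genind object and ordinary row subsetting stands in for the [pop = ] accessor, so it only illustrates the control flow; the data and the body of pair are hypothetical.

```r
# Stand-in data: a data frame with a population column, used here in
# place of a genind object (assumption for illustration only).
pops <- data.frame(
  pop   = rep(c("A", "B", "C"), each = 2),
  value = 1:6,
  stringsAsFactors = FALSE
)

# Table of all pairwise combinations: one column per pair
allP <- combn(unique(pops$pop), 2)

# Each call receives one column of allP (two population names) and
# subsets the data directly, with no split-then-repool step.
pair <- function(two.pops, pops) {
  temp <- pops[pops$pop %in% two.pops, ]
  nrow(temp)  # placeholder for "do something with temp"
}

res <- apply(allP, 2, pair, pops)
print(res)
```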

summarise_bootstrap(x,Gst_Hedrick) does not support missing values

When
bs <- chao_bootstrap(decoris.final, nreps = 1000)
is run, I am able to use
bs.D <- summarise_bootstrap(bs, D_Jost)
without any errors, but when the method is changed to Gst_Hedrick, the following error is generated:

bs.G <- summarise_bootstrap(bs, Gst_Hedrick)
Error in quantile.default(B, c(0.025, 0.975)) :
missing values and NaN's not allowed if 'na.rm' is FALSE

No object is saved during this process.
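The error message comes from base R's quantile(), which refuses NA or NaN input unless na.rm = TRUE. The sketch below reproduces that behaviour on a hypothetical vector of bootstrap replicates and shows one possible workaround: count the unusable replicates, then compute the percentile interval on the rest. Why Gst_Hedrick produces missing values for this data set is not established here, and the vector boot_vals is made up for illustration.

```r
# Hypothetical bootstrap replicates of a statistic; two are unusable
boot_vals <- c(0.12, 0.15, NaN, 0.10, 0.18, NA, 0.14, 0.16, 0.11, 0.13)

# With the default na.rm = FALSE, quantile() stops with an error
err <- tryCatch(
  quantile(boot_vals, c(0.025, 0.975)),
  error = function(e) conditionMessage(e)
)
print(err)

# is.na() is TRUE for both NA and NaN, so this counts all bad replicates
n_bad <- sum(is.na(boot_vals))

# Percentile interval computed on the usable replicates only
ci <- quantile(boot_vals, c(0.025, 0.975), na.rm = TRUE)
print(n_bad)
print(ci)
```

Reporting n_bad alongside the interval matters: if many replicates are dropped, the interval describes only a subset of the bootstrap distribution.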

adegenet 2.0

Adegenet 2.0 is coming out.... and it breaks everything!

At first glance these changes seem to alter the way Hs and Ht, and therefore everything else, are calculated. Will need to get down to the brass tacks of what's going on.

use nPop instead of length(seppop())

Hi. I noticed that in pairwise_fxn, you separate the populations and take the length of the list to figure out how many you need (these two lines). This is perhaps better done with the nPop() accessor:

n.pops <- nPop(x)

Cheers,
Zhian

Provide more methods/support for confidence intervals

At present we only provide a parametric bootstrap for generating bootstrap samples and a percentile bootstrap to summarize them. This can be a problem, because parametric bootstraps are often biased (as a result of very rare alleles in some population samples and unsampled alleles in the "wider" pops).

We should:

Include a resample-by-individual bootstrap
Include a normal method (i.e. +/- twice the standard error of the bootstrap distribution) for estimating a CI
Document these methods and their issues
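The first two proposals can be sketched in base R as follows. This is an illustration of the general techniques, not a proposed mmod implementation: the data frame, the helper names (resample_individuals, stat_fn) and the toy statistic (a difference in allele frequency between two populations) are all assumptions chosen to keep the example short.

```r
set.seed(42)

# Hypothetical data: one row per individual, haploid calls at one locus
dat <- data.frame(
  pop    = rep(c("A", "B"), each = 15),
  allele = c(rep("a", 10), rep("b", 5), rep("a", 4), rep("b", 11)),
  stringsAsFactors = FALSE
)

# Resample individuals with replacement *within* each population,
# so that every replicate keeps the original sample sizes.
resample_individuals <- function(d) {
  rows_by_pop <- split(seq_len(nrow(d)), d$pop)
  idx <- unlist(lapply(rows_by_pop, function(i) {
    i[sample.int(length(i), replace = TRUE)]
  }))
  d[idx, , drop = FALSE]
}

# Toy statistic: difference in frequency of allele "a" between A and B
stat_fn <- function(d) {
  f <- tapply(d$allele == "a", d$pop, mean)
  f[["A"]] - f[["B"]]
}

est  <- stat_fn(dat)
boot <- replicate(500, stat_fn(resample_individuals(dat)))

# Normal method: estimate +/- twice the bootstrap standard error
normal_ci <- est + c(-2, 2) * sd(boot)

# Percentile method, for comparison
percentile_ci <- quantile(boot, c(0.025, 0.975))

print(est)
print(normal_ci)
print(percentile_ci)
```

The normal interval is centred on the observed estimate, so it is less sensitive to a shifted bootstrap distribution than the percentile interval, at the cost of assuming approximate normality.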

pairwise_Gst_Nei

Hi there,

I was wondering if there was a way to calculate the significance (i.e., p-value) for pairwise_Gst_Nei?

I am confused by the online tutorial where I do understand that:

result <- chao_bootstrap(data, nreps = 10)
result_bootstrap <- summarise_bootstrap(result, Gst_Nei)

And the results are confidence intervals (by locus).

I would like to understand the differences between a set of populations and the significance of this difference.

Thank you for your help!
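One generic way to attach a p-value to a pairwise differentiation statistic is a permutation test: shuffle population labels among individuals, recompute the statistic each time, and ask how often the shuffled value is at least as large as the observed one. The base-R sketch below shows the idea for a single haploid locus and two equally sized populations. It is not an mmod feature; the helper names (exp_het, gst_simple), the data, and the plain uncorrected Gst used as the statistic are all assumptions for illustration.

```r
set.seed(1)

# Expected heterozygosity from a vector of allele calls
exp_het <- function(alleles) {
  p <- table(alleles) / length(alleles)
  1 - sum(p^2)
}

# Plain (uncorrected) Gst for one locus. Ht is taken from the pooled
# sample, which matches the mean-frequency definition only when
# population sample sizes are equal, as they are here.
gst_simple <- function(alleles, pop) {
  hs <- mean(tapply(alleles, pop, exp_het))
  ht <- exp_het(alleles)
  (ht - hs) / ht
}

# Hypothetical data: 20 individuals in each of two populations
pop     <- factor(rep(c("A", "B"), each = 20))
alleles <- c(rep(c("a", "b"), times = c(16, 4)),
             rep(c("a", "b"), times = c(5, 15)))

obs <- gst_simple(alleles, pop)

# Null distribution: the same statistic with population labels shuffled
perm <- replicate(999, gst_simple(alleles, sample(pop)))

# One-sided p-value, counting the observed value as one permutation
p_value <- (sum(perm >= obs) + 1) / (length(perm) + 1)

print(obs)
print(p_value)
```

For a full matrix of pairwise comparisons the same procedure would be repeated for each pair of populations, and the resulting p-values would normally need a multiple-testing correction.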

diff_stats(gi, phi_st = TRUE) error with large data set

Hi there,

I'm just using your diff_stats function to calculate phi_st values. I've tried using a SNP data set of ~18000 loci (a genind object filtered for monomorphs). It keeps producing the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent". If I drop the number of loci by heavily filtering with a MAF of 0.5 (leaves 18 loci) the function works fine. I was just wondering if this problem has been encountered before and if you might know a solution? Or perhaps it's not optimised to deal with large amounts of data?

Cheers

genind error

Hi,

I understand that multidna2genind is not part of mmod, but apex is no longer being maintained...so I thought perhaps you could advise me (as mmod relies on this function).

I am reading in fasta files from different genes from the same msa, but cannot for some reason build a genind object with them (see below). These are nucleotide alignments and their names have no "." in them.

Similarly, if I try to build a genind object from a DNAbin object, I run into trouble (see further below).

Sorry to bother you with this, but any help would be appreciated.

#read alignments

files<-list.files(path='.',pattern='.fasta')
als<-read.multiFASTA(files)

#remove . from names

(setLocusNames(als) <- gsub(".fasta", "", getLocusNames(als)))

#build genind object

als.gid<-multidna2genind(als,mlst=T,gapIsNA=T)
Error in .local(.Object, ...) :
more than one '.' in column names; please name column as [LOCUS].[ALLELE]
In addition: Warning messages:
1: In df2genind(xdfnum, ploidy = 1, ind.names = x@labels) :
character '.' detected in names of loci; replacing with '_'
2: In df2genind(xdfnum, ploidy = 1, ind.names = x@labels) :
entirely non-type individual(s) deleted

#convert DNAbin to genind

al1<-read.FASTA(files[1])
al1

3535 DNA sequences in binary format stored in a list.

All sequences of same length: 225

Labels:
X534418
X534419
X049951
X123292
X123293
X226610
...

Base composition:
a c g t
0.209 0.200 0.187 0.404
(Total: 795.38 kb)

wm<-as.genind.DNAbin(al1,rep(c('A','B','C','D','E'),each=707))
Error in 1:dim(x)[1] : argument of length 0

dist.codom error

I am using the dist.codom function, but I am getting the following error:

Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent

I was able to successfully load my SNP dataset with read.structure() and have been able to successfully run other functions within the package, but not dist.codom. Any ideas as to what is happening?

Calculate bootstrap summary for pairwise differentiation statistics

Hi, I am wondering if it is possible to calculate a bootstrap summary for pairwise differentiation statistics (e.g. Nei's Gst). The purpose is to generate confidence intervals for each value generated for all (pairwise) combinations of populations in a genind object. After running:
boot10 <- chao_bootstrap(genind_one, nreps = 10)

I attempted to run:
boot10_pw <- summarise_bootstrap(boot10, pairwise_Gst_Nei)

However, I received an error:
"Error in stats["per.locus", ] : no 'dimnames' attribute for array"

Is this analysis possible to do in the mmod package, or can bootstrap summaries only be performed per locus and then summarized over all loci, disregarding the population structure?
Thank you!

D_Jost(x, hsht_mean = "arithmetic") gives the wrong arithmetic mean.

D_Jost(x, hsht_mean = "arithmetic") gives a wrong D value. It is the same as the value produced with the harmonic mean, D_Jost(x, hsht_mean = "harmonic").
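A quick way to see that the two options should generally disagree is to compute a global D by hand from per-locus Hs and Ht, once with each kind of mean. The sketch below does this in base R with made-up per-locus values. It assumes the global estimate is obtained by averaging Hs and Ht across loci and then applying Jost's formula, which may not match mmod's internals exactly; harmonic_mean and jost_d are hypothetical helpers.

```r
harmonic_mean <- function(x) 1 / mean(1 / x)

# Jost's D from Hs, Ht and the number of populations n
jost_d <- function(hs, ht, n) {
  ((ht - hs) / (1 - hs)) * (n / (n - 1))
}

# Hypothetical per-locus values for three loci in two populations
hs <- c(0.2, 0.5, 0.8)
ht <- c(0.4, 0.6, 0.9)
n  <- 2

d_arith <- jost_d(mean(hs), mean(ht), n)
d_harm  <- jost_d(harmonic_mean(hs), harmonic_mean(ht), n)

print(c(arithmetic = d_arith, harmonic = d_harm))
```

If both settings of hsht_mean return identical values on data where per-locus Hs and Ht vary like this, that points to the argument not being passed through, rather than to a property of the data.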