mmod's Introduction

Modern Measures of Differentiation

mmod is an R package for calculating modern population divergence statistics.

Quickstart

Install

mmod is on CRAN, so you can install the latest stable version using install.packages("mmod"). This GitHub repository may be running ahead of the version on CRAN. If you really want the latest version, you can use devtools to install the code in this repo:

library(devtools)
install_github("dwinter/mmod")

Usage

Once it's up and running, all you need is a genepop (or fstat) file with your data:

    >library(mmod)
    >my_data <- read.genepop("my_file.gen")
    >diff_stats(my_data)

Overview

Population geneticists have traditionally used Nei's Gst (often confusingly called Fst...) to measure divergence between populations. It turns out Gst doesn't really measure divergence, so [a set of new measures has been developed](http://www.molecularecologist.com/2011/03/should-i-use-fst-gst-or-d-2/).

mmod is a package that brings three of these measures, Hedrick's (2005, 2011) G''st, Jost's (2008) D and Meirmans' (2005) φ'st, to R, along with a function that calculates Nei's Gst using nearly unbiased estimators for Hs and Ht (the two key parameters from which most of these stats are calculated). All these functions work on genind objects from the package adegenet, so data can be read in from standard genepop or fstat files. An overview of typical usage is provided in a vignette called "demo", accessible from vignette("demo", package="mmod"). I suggest new users read this before they start.
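To make the role of Hs and Ht concrete, here is a minimal base R sketch that computes Nei's Gst, Jost's D and G''st for a single locus from a table of allele frequencies. It uses the plain textbook formulas rather than the nearly unbiased estimators mentioned above, and the frequency table is made up for illustration, so it is not a substitute for the package functions.

```r
# Hypothetical allele frequencies at one locus: rows are populations,
# columns are alleles, and each row sums to 1. The two populations
# share no alleles, i.e. they are completely differentiated.
freqs <- rbind(pop1 = c(0.5, 0.5, 0.0, 0.0),
               pop2 = c(0.0, 0.0, 0.5, 0.5))

k  <- nrow(freqs)                      # number of populations
Hs <- mean(1 - rowSums(freqs^2))       # mean within-population heterozygosity
Ht <- 1 - sum(colMeans(freqs)^2)       # heterozygosity of the pooled populations

Gst     <- (Ht - Hs) / Ht                               # Nei's Gst
D       <- ((Ht - Hs) / (1 - Hs)) * (k / (k - 1))       # Jost's D
Gst_dbl <- k * (Ht - Hs) / ((k * Ht - Hs) * (1 - Hs))   # G''st

print(c(Hs = Hs, Ht = Ht, Gst = Gst, D = D, Gst_dbl = Gst_dbl))
```

With these frequencies D and G''st both reach 1 while Gst stays at one third, which illustrates why Gst on its own is a poor measure of divergence when within-population diversity is high.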

Help

All functions are documented and there is a vignette describing basic usage of the package.

mmod's People

Contributors

dwinter, pitakakariki, thierrygosselin, zkamvar

mmod's Issues

Utilize [pop = ] accessor instead of repool for pairwise functions

In all of mmod's pairwise functions, you first split populations and then repool them for each calculation. Using the new [pop = ] accessor would be advantageous as it would prevent unnecessary repooling of the data.

Example:

pair <- function(index.a, index.b){
    a <- pops[[index.a]]
    b <- pops[[index.b]]
    temp <- repool(a,b)
    # do something with temp
}

could be turned into:

pair <- function(two.pops = c(index.a, index.b), pops){
    temp <- pops[pop = two.pops]
    # do something with temp
}

This way, you could pass the table of combinations to apply:

apply(allP, 2, pair, pops)

summarise_bootstrap(x,Gst_Hedrick) does not support missing values

When
bs <- chao_bootstrap(decoris.final, nreps = 1000)
is run, I am able to use
bs.D <- summarise_bootstrap(bs, D_Jost)
without any errors, but when the method is changed to Gst_Hedrick, the following error is generated:

bs.G <- summarise_bootstrap(bs, Gst_Hedrick)
Error in quantile.default(B, c(0.025, 0.975)) :
missing values and NaN's not allowed if 'na.rm' is FALSE

No object is saved during this process.
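The error text comes from quantile() itself, so it can be reproduced in base R without mmod. The sketch below uses a made-up vector B standing in for the bootstrap values of a statistic. How NaN values end up among the Gst_Hedrick replicates is not established in this report; a 0/0 in some replicate is only a guess.

```r
set.seed(1)

# Stand-in for bootstrap replicates of a statistic, with two unusable values.
B <- c(rnorm(998, mean = 0.3, sd = 0.05), NaN, NA)

# With the default na.rm = FALSE, quantile() refuses to run.
res <- tryCatch(quantile(B, c(0.025, 0.975)),
                error = function(e) conditionMessage(e))
print(res)

# Dropping the unusable replicates gives an interval, but it is worth
# reporting how many were dropped, since many NaNs would suggest a problem.
ci        <- quantile(B, c(0.025, 0.975), na.rm = TRUE)
n_dropped <- sum(is.na(B))   # is.na() is TRUE for both NA and NaN
print(ci)
print(n_dropped)
```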

adegenet 2.0

Adegenet 2.0 is coming out.... and it breaks everything!

  • At first glance these changes seem to alter the way Hs and Ht, and therefore everything else, are calculated. Will need to get down to brass tacks on what's going on.

use nPop instead of length(seppop())

Hi. I noticed that in pairwise_fxn, you separate the populations and take the length of the list to figure out how many you need (these two lines). This is perhaps better done with the nPop() accessor:

n.pops <- nPop(x)

Cheers,
Zhian

Provide more methods/support for confidence intervals

At present we only provide a parametric bootstrap for generating bootstrap samples and a percentile bootstrap to summarize them. This can be a problem, because the parametric bootstraps are often biased (as a result of very rare alleles in some population samples and unsampled alleles in the "wider" pops).

We should:

  • Include a resample-by-individual bootstrap
  • Include a normal method (i.e. +/- twice the standard error of the BS distribution) for estimating a CI
  • Document these methods and their issues
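As a rough illustration of the first two items, here is a base R sketch of a resample-by-individual bootstrap with both a normal-method and a percentile interval. Everything in it is a stand-in: the simulated biallelic data, the simple gst() helper and the resample_individuals() helper are hypothetical and are not part of mmod.

```r
set.seed(42)

# Simulated diploid genotypes at one biallelic locus: 'alt' counts the
# copies (0, 1 or 2) of one allele carried by each individual.
n <- 30
geno <- data.frame(
  pop = rep(c("A", "B"), each = n),
  alt = c(rbinom(n, 2, 0.2), rbinom(n, 2, 0.6))
)

# Plain (not bias-corrected) Gst for a biallelic locus.
gst <- function(d) {
  p    <- tapply(d$alt, d$pop, mean) / 2   # allele frequency per population
  Hs   <- mean(2 * p * (1 - p))
  pbar <- mean(p)
  Ht   <- 2 * pbar * (1 - pbar)
  (Ht - Hs) / Ht
}

# Resample individuals with replacement within each population.
resample_individuals <- function(d) {
  rows <- split(seq_len(nrow(d)), d$pop)
  idx  <- unlist(lapply(rows, function(i) i[sample.int(length(i), replace = TRUE)]))
  d[idx, ]
}

est   <- gst(geno)
boots <- replicate(500, gst(resample_individuals(geno)))

# Normal method: estimate +/- twice the standard error of the BS distribution.
normal_ci <- est + c(-2, 2) * sd(boots)

# Percentile method, for comparison.
percentile_ci <- quantile(boots, c(0.025, 0.975), na.rm = TRUE)

print(est)
print(normal_ci)
print(percentile_ci)
```

Note that the normal interval is symmetric around the estimate and can extend below zero for a statistic that is bounded at zero, which is one of the issues the documentation would need to cover.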

pairwise_Gst_Nei

Hi there,

I was wondering if there was a way to calculate the significance (i.e., p-value) for pairwise_Gst_Nei?

I am confused by the online tutorial. I do understand that:

result <- chao_bootstrap(data, nreps = 10)
result_bootstrap <- summarise_bootstrap(result, Gst_Nei)

And the results are confidence intervals (by locus).

I would like to understand the differences between a set of populations and the significance of this difference.

Thank you for your help!

diff_stats(gi, phi_st = TRUE) error with large data set

Hi there,

I'm just using your diff_stats function to calculate phi_st values. I've tried using a SNP data set of ~18000 loci (a genind object filtered for monomorphs). It keeps producing the error "Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent". If I drop the number of loci by heavily filtering with a MAF of 0.5 (leaves 18 loci) the function works fine. I was just wondering if this problem has been encountered before and if you might know a solution? Or perhaps it's not optimised to deal with large amounts of data?

Cheers

genind error

Hi,

I understand that multidna2genind is not part of mmod, but apex is no longer being maintained...so I thought perhaps you could advise me (as mmod relies on this function).

I am reading in fasta files from different genes from the same msa, but cannot for some reason build a genind object with them (see below). These are nucleotide alignments and their names have no "." in them.

Similarly, if I try to build a genind object from a DNAbin object, I run into trouble (see further below).

Sorry to bother you with this, but any help would be appreciated.

#read alignments

files<-list.files(path='.',pattern='.fasta')
als<-read.multiFASTA(files)

#remove . from names

(setLocusNames(als) <- gsub(".fasta", "", getLocusNames(als)))

#build genind object

als.gid<-multidna2genind(als,mlst=T,gapIsNA=T)
Error in .local(.Object, ...) :
more than one '.' in column names; please name column as [LOCUS].[ALLELE]
In addition: Warning messages:
1: In df2genind(xdfnum, ploidy = 1, ind.names = x@labels) :
character '.' detected in names of loci; replacing with '_'
2: In df2genind(xdfnum, ploidy = 1, ind.names = x@labels) :
entirely non-type individual(s) deleted
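One thing worth checking in the snippet above, independent of apex: both list.files(pattern = ) and gsub() treat their pattern as a regular expression, where "." matches any character. The base R sketch below uses made-up file names to show the difference. Whether leftover dots in the locus names are what triggers the error above is an assumption, not something confirmed here.

```r
# Hypothetical file names, chosen to expose the regex behaviour.
files <- c("geneA.fasta", "gene.B.fasta", "xfasta_notes.txt")

# Unescaped "." matches any character, so the .txt file matches too.
loose  <- grepl(".fasta", files)
# Escaping the dot and anchoring to the end matches only real .fasta files.
strict <- grepl("\\.fasta$", files)
print(loose)
print(strict)

# Strip only a literal ".fasta" suffix from the real FASTA files.
loci <- sub("\\.fasta$", "", files[strict])
print(loci)

# Any dots still left in a name can be replaced explicitly;
# fixed = TRUE makes "." a literal character rather than a wildcard.
loci_clean <- gsub(".", "_", loci, fixed = TRUE)
print(loci_clean)
```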

#convert DNAbin to genind

al1<-read.FASTA(files[1])
al1

3535 DNA sequences in binary format stored in a list.

All sequences of same length: 225

Labels:
X534418
X534419
X049951
X123292
X123293
X226610
...

Base composition:
a c g t
0.209 0.200 0.187 0.404
(Total: 795.38 kb)

wm<-as.genind.DNAbin(al1,rep(c('A','B','C','D','E'),each=707))
Error in 1:dim(x)[1] : argument of length 0
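The last error is what base R produces when dim() is called on an object that has no dimensions, such as a list. The printout above says the sequences are "stored in a list", so that may be what is happening here. The sketch below reproduces the message with plain character vectors standing in for the sequences. For a real DNAbin object, ape's as.matrix() method would be the likely conversion for equal-length sequences, but that step is an assumption and is not run here.

```r
# Stand-in for aligned sequences stored as a list (one element per sequence).
seqs_list <- list(X1 = c("a", "c", "g", "t"),
                  X2 = c("a", "c", "g", "a"))

# A list has no dim attribute, so dim() returns NULL ...
print(dim(seqs_list))

# ... and 1:dim(x)[1] fails with the same message as in the report.
res <- tryCatch(1:dim(seqs_list)[1],
                error = function(e) conditionMessage(e))
print(res)

# Stored as a matrix (rows = sequences, columns = sites), dim() is defined.
seqs_mat <- do.call(rbind, seqs_list)
print(dim(seqs_mat))
print(seq_len(dim(seqs_mat)[1]))
```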

dist.codom error

I am using the dist.codom function, but I am getting the following error:

Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent

I was able to successfully load my SNP dataset with read.structure() and have been able to successfully run other functions within the package, but not dist.codom. Any ideas as to what is happening?

Calculate bootstrap summary for pairwise differentiation statistics

Hi, I am wondering if it is possible to calculate a bootstrap summary for pairwise differentiation statistics (e.g. Nei's Gst). The purpose is to generate confidence intervals for each value generated for all (pairwise) combinations of populations in a genind object. After running:
boot10 <- chao_bootstrap(genind_one, nreps = 10)

I attempted to run:
boot10_pw <- summarise_bootstrap(boot10, pairwise_Gst_Nei)

However, I received an error:
"Error in stats["per.locus", ] : no 'dimnames' attribute for array"

Is this analysis possible to do in the mmod package, or can bootstrap summaries only be performed per locus and then summarized over all loci, disregarding the population structure?
Thank you!
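One possible workaround, sketched here in base R with simulated values, is to compute the pairwise statistic on each bootstrap replicate yourself and take percentiles per population pair. The fake_pairwise() helper below is hypothetical and only produces random distance objects. In practice each replicate would come from applying pairwise_Gst_Nei to one element of the chao_bootstrap output, assuming those elements are genind objects and that the pairwise function returns a dist object.

```r
set.seed(7)
pops <- c("A", "B", "C")

# Hypothetical stand-in for a pairwise statistic computed on one replicate:
# returns a 'dist' object with one value per pair of populations.
fake_pairwise <- function() {
  m <- matrix(0, 3, 3, dimnames = list(pops, pops))
  m[lower.tri(m)] <- runif(3, 0.05, 0.15)
  as.dist(m)
}

# One dist object per bootstrap replicate.
reps <- replicate(200, fake_pairwise(), simplify = FALSE)

# Flatten each dist into a vector: rows are pairs, columns are replicates.
vals <- sapply(reps, as.vector)

# Percentile interval for each pair of populations.
ci <- t(apply(vals, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE))

# A dist object stores pairs in the same order that combn() generates them.
rownames(ci) <- combn(pops, 2, paste, collapse = "-")
print(ci)
```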
