otu_phylogeny's Introduction

OTU_Phylogeny

Scripts used for pipeline analyzing effects of species boundaries on community diversity.

Intraspecific_SeqError_Sim.R codes the function used to append branches onto existing phylogenies, using sim.bd.tree (from the R package ape).

MPDvsDelta.R defines the function that transforms phylogenies using Pagel's delta over a specified vector of delta values and (given the corresponding comparative.comm object) calculates the MPD of each transformed output.
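The idea of tracking MPD across delta values can be illustrated with a minimal base-R sketch. It is not the code in MPDvsDelta.R: the toy tree, the community matrix, and every function name below are hypothetical, and it assumes one common convention for the delta transform (raising each node's distance from the root to the power delta, without rescaling the total tree height), which may differ from what phy.d.transform does.

```r
# Toy ultrametric tree ((A:1,B:1):2,C:3); stored as an ape-style edge table.
# Nodes 1-3 are the tips A, B, C; node 4 is the root; node 5 joins A and B.
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
edge.length <- c(2, 1, 1, 3)
tips <- c("A", "B", "C")

# Distance from the root to every node, filled in from the root downwards.
root.distance <- function(edge, edge.length, root) {
  depth <- numeric(max(edge))
  todo <- root
  while (length(todo) > 0) {
    parent <- todo[1]
    todo <- todo[-1]
    for (i in which(edge[, 1] == parent)) {
      depth[edge[i, 2]] <- depth[parent] + edge.length[i]
      todo <- c(todo, edge[i, 2])
    }
  }
  depth
}

# A node followed by all of its ancestors up to the root.
ancestors <- function(edge, node) {
  path <- node
  while (node %in% edge[, 2]) {
    node <- edge[edge[, 2] == node, 1]
    path <- c(path, node)
  }
  path
}

# Tip-to-tip distances: d(i, j) = depth_i + depth_j - 2 * depth of their MRCA.
tip.distances <- function(edge, depth, tips) {
  n <- length(tips)
  dist <- matrix(0, n, n, dimnames = list(tips, tips))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    shared <- intersect(ancestors(edge, i), ancestors(edge, j))
    dist[i, j] <- depth[i] + depth[j] - 2 * max(depth[shared])
  }
  dist
}

# Unweighted MPD of one site; undefined (NA) with fewer than two taxa present.
mpd.site <- function(present, dist) {
  if (sum(present) < 2) return(NA_real_)
  sub <- dist[present, present]
  mean(sub[lower.tri(sub)])
}

# Hypothetical presence/absence community: sites in rows, taxa in columns.
comm <- matrix(c(1, 1, 1,
                 1, 1, 0), nrow = 2, byrow = TRUE,
               dimnames = list(c("Site_1", "Site_2"), tips))
deltas <- c(0.5, 1, 2)
depth <- root.distance(edge, edge.length, root = 4)

# Assumed delta transform: raise every root-to-node distance to the power delta.
mpd.vs.delta <- sapply(deltas, function(delta) {
  dist <- tip.distances(edge, depth^delta, tips)
  apply(comm > 0, 1, mpd.site, dist = dist)
})
colnames(mpd.vs.delta) <- paste0("delta_", deltas)
print(mpd.vs.delta)
```

The result has one row per site and one column per delta value, which matches the shape of the MPD matrices described in this README. The delta_1 column is the untransformed case.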

SimulateCommunity.R is the primary script. It calls the two functions above together with sim.meta.phy.comm (or sim.comm, for non-ultrametric scenarios) to simulate communities/phylogenies, and to transform phylogenies using phy.d.transform. Communities/phylogenies modeling populations (intra) as well as random sequencing error (seq) are generated from the original community/phylogeny, and are also transformed. This script also contains the abundance.mapping function, which transfers community abundances from the original community to the communities with populations and sequencing error.
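As a rough illustration of what an abundance-mapping step could look like, the sketch below splits each original species' abundance at random among the tips derived from it. This is an assumption made for illustration only: the README does not say how abundance.mapping distributes individuals, and the tip names, the lookup vector, and the function name are all hypothetical.

```r
set.seed(1)

# Hypothetical original community: sites in rows, species in columns.
orig.comm <- matrix(c(10, 0,
                       4, 6), nrow = 2, byrow = TRUE,
                    dimnames = list(c("Site_1", "Site_2"), c("sp1", "sp2")))

# Hypothetical lookup: the original species that each new tip descends from.
tip.parent <- c(sp1_a = "sp1", sp1_b = "sp1", sp2_a = "sp2")

map.abundances <- function(orig.comm, tip.parent) {
  new.comm <- matrix(0, nrow = nrow(orig.comm), ncol = length(tip.parent),
                     dimnames = list(rownames(orig.comm), names(tip.parent)))
  for (sp in colnames(orig.comm)) {
    new.tips <- names(tip.parent)[tip.parent == sp]
    for (site in rownames(orig.comm)) {
      # Assumed rule: individuals are split among the new tips with equal odds.
      new.comm[site, new.tips] <- rmultinom(1, orig.comm[site, sp],
                                            rep(1, length(new.tips)))
    }
  }
  new.comm
}

new.comm <- map.abundances(orig.comm, tip.parent)
print(new.comm)
```

Whatever splitting rule is used, a useful sanity check is that the total abundance per site is unchanged, i.e. rowSums(new.comm) equals rowSums(orig.comm).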

The object generated from a single simulation instance contains 3 community matrices, 3 phylogenies, 3 MPD.matrices (i.e. MPD values recorded after delta transforms, for original phylogenies, phylogenies with intraspecific error, and phylogenies with sequencing error), and a vector of original MPD values prior to any branch additions or transformations.

wrapper.R sources the preceding scripts (e.g. source("MPDvsDelta.R")) and then iterates the SimulateCommunity function over the specified simulation parameters (the params variable) using mcMap, which spreads the runs across multiple cores. Currently, the only parameters being varied are the intraspecific and sequencing error birth/death rates, and the initial number of species in the community.
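A minimal sketch of iterating a simulation function over a parameter grid with mcMap is shown below. The stub function, the parameter names, and the parameter values are hypothetical stand-ins, not the contents of wrapper.R or the real signature of SimulateCommunity.

```r
library(parallel)

# Hypothetical stand-in for SimulateCommunity: it only echoes its parameters.
simulate.stub <- function(n.spp, intra.birth, seq.birth) {
  list(n.spp = n.spp, intra.birth = intra.birth, seq.birth = seq.birth)
}

# Every combination of the (assumed) parameter values, one row per run.
params <- expand.grid(n.spp = c(10, 20),
                      intra.birth = c(0.1, 0.5),
                      seq.birth = c(0.1, 0.5))

# mcMap forks worker processes, which is not available on Windows, so fall
# back to a single core there. mcMap reads the core count from this option.
options(mc.cores = if (.Platform$OS.type == "windows") 1L else 2L)

# One call per row of params; the result is a list with one element per run.
results <- mcMap(simulate.stub,
                 params$n.spp, params$intra.birth, params$seq.birth)
length(results)
```

Keeping the parameters in a data frame makes it easy to line each element of results back up with the row of params that produced it.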

MeasureShiftingDiversity.R analyzes the output generated from wrapper.R (which is stored in an .Rdata file). It contains the code for calculating 2 different response metrics from simulation data: correlations and rank shifts. Linear models are used to measure the effects of delta and other simulation parameters on each response metric. This script also contains code for plotting.

otu_phylogeny's People

Contributors

akoontz11

Watchers

Will Pearse

otu_phylogeny's Issues

Example issue

  • You can keep track
    • Of tasks to be done
    • And sub-tasks too
  • In this sort of format
  • Which is great, because you can click 'done' on a sub-task when it's done
  • ...which feels fantastic, I can tell you!

Updating analysis to compare against untransformed phylogenies

I wanted to give an update on what I've been doing to update the code to compare against untransformed phylogenies.

Currently, I'm encountering issues with the worker function I'm sapplying to the simulation data. The original worker function just correlated to the 10th column (i.e. delta = 1.0) of the matrix of MPD values supplied as an argument to that function. That would typically look like

> data[[1]][,10]
   Site_1    Site_2    Site_3    Site_4    Site_5    Site_6    Site_7    Site_8    Site_9   Site_10 
30.750000 26.240000 48.000000 13.058985 38.437500 22.777778 40.163265 43.911243 49.603025 36.207500 
  Site_11   Site_12   Site_13   Site_14   Site_15   Site_16   Site_17   Site_18   Site_19   Site_20 
37.950413 60.173333 41.497041 54.294785 22.777778 39.861111 21.191111 38.437500 53.330000 40.493827 
  Site_21   Site_22   Site_23   Site_24   Site_25   Site_26   Site_27   Site_28   Site_29   Site_30 
41.000000 64.229492 35.483929 36.444444 36.384088 41.000000 38.816568 54.814404 12.187500 52.331633 
  Site_31   Site_32   Site_33   Site_34   Site_35   Site_36   Site_37   Site_38   Site_39   Site_40 
58.360285 28.126200 35.434903 51.504132 51.890625 37.950413 40.163265 33.586777 36.444444 52.714286 
  Site_41   Site_42   Site_43   Site_44   Site_45   Site_46   Site_47   Site_48 
52.382500 54.613281  2.000000 41.000000 36.444444 41.000000  8.816327  8.265306

My new worker function instead takes a vector of MPD values of the original untransformed phylogeny as a 2nd argument to the function, and correlates against that. That vector looks identical to the matrix column shown above, and using the new worker function works when I'm comparing just a single set of values.

But when I try to sapply the new worker function, I get an error reading "incompatible dimensions". I feel like this may be happening because I'm trying to sapply a function that takes multiple arguments, and I'm doing that incorrectly, but I'm not sure.
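One way this error can arise (an untested guess about this case, not a diagnosis of the actual code) is that sapply hands any extra argument unchanged to every call, so cor() ends up comparing a column against a vector of a different length. The sketch below uses made-up data and a hypothetical worker to show the difference between that and pairing each matrix with its own vector through mapply.

```r
set.seed(42)
n.sites <- 6
n.deltas <- 4

# Hypothetical stand-ins for the simulation output: one MPD matrix (sites by
# deltas) and one vector of untransformed MPD values per simulation.
mpd.matrices <- replicate(3, matrix(runif(n.sites * n.deltas), nrow = n.sites),
                          simplify = FALSE)
orig.mpds <- replicate(3, runif(n.sites), simplify = FALSE)

# Hypothetical worker: correlate every delta column with the baseline vector.
worker <- function(mpd.matrix, orig.mpd) {
  apply(mpd.matrix, 2, function(column) cor(column, orig.mpd))
}

# Works: a single matrix with its own baseline vector.
single <- worker(mpd.matrices[[1]], orig.mpds[[1]])

# Fails: the extra argument is passed whole to every call, so each 6-value
# column is correlated against all 18 baseline values at once.
failure <- tryCatch(
  sapply(mpd.matrices, worker, orig.mpd = unlist(orig.mpds)),
  error = function(e) conditionMessage(e)
)
print(failure)

# Works: mapply walks both lists in parallel, one matrix with one vector.
paired <- mapply(worker, mpd.matrices, orig.mpds)
dim(paired)  # deltas in rows, simulations in columns
```

If the same baseline vector really should be reused for every matrix, sapply(mpd.matrices, worker, orig.mpd = one.vector) is fine as long as that vector has one value per site.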

I'm still working on this, and considering if there's a means to do all of this in a more straightforward way. But I wanted to let you know where I was at.

Open Items

  • Community simulation function (takes simulation parameters as input, outputs mpd vs. delta matrices)
    • Simulates community
    • Adds populations
    • Adds sequencing error
    • Distributes original species abundances over matrices for each of those
    • Calculates phylogenetic diversity over transformation values
  • Measurement of PD over different communities simulated
    • Comparison of different communities to their mpd values at a delta = 1
  • Wrap up above two components in a loop

MeasureShiftingDiversity.R

I've uploaded some initial code for measuring how the mpd of sites changes over delta transformations. Right now, I'm plotting (here) the mean change in the ranking of sites at a given transformation value against the difference between that transformation value and the "baseline" value (delta = 1.0). The curve indicates that fractional delta values generate the largest differences in mpd measures, and that the differences saturate once delta passes the baseline value. Something about this approach feels...not correct, but I can't really say what. Feel free to weigh in.
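For reference, a rank-shift measure of this kind can be sketched as follows. The data are random placeholders, the delta values are hypothetical, and taking the mean absolute change in rank is an assumption, since the post does not say exactly how the change in ranking is summarised.

```r
set.seed(7)
deltas <- c(0.1, 0.5, 1, 2, 3)  # hypothetical transformation values
n.sites <- 10

# Placeholder MPD matrix: sites in rows, one column per delta value.
mpd.matrix <- matrix(runif(n.sites * length(deltas), 1, 60), nrow = n.sites,
                     dimnames = list(paste0("Site_", 1:n.sites),
                                     paste0("delta_", deltas)))

# Rank the sites within each delta column.
site.ranks <- apply(mpd.matrix, 2, rank)

# Mean absolute change in rank relative to the baseline column (delta = 1).
baseline <- which(deltas == 1)
mean.rank.shift <- colMeans(abs(site.ranks - site.ranks[, baseline]))

# Distance of each delta value from the baseline, for the x-axis of the plot.
delta.distance <- abs(deltas - 1)
print(data.frame(delta = deltas, delta.distance, mean.rank.shift))
```

By construction the shift is zero at the baseline column, so any curve built this way is anchored at the origin.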

I should note that this isn't comparing the mpd of the original community (prior to delta transform = 1)--that still needs to be incorporated. I'm going to work using the correlation coefficient next.

Issues in community simulation

I wanted to mention two issues I'm still seeing in the SimulateCommunity.R code:

  1. The sites in my "baseline" simulated community show variable behavior when I perform the delta transform on it (here). Basically, sometimes the curves show this asymptotic behavior, and sometimes they are straight lines. I'm still trying to figure out what's causing the difference between the two (and if I need to fix it...it seems like it shouldn't be this way).

  2. When I perform the delta transformation/capture mpd values for the intra.comm (here) and the seq.comm (here), the corresponding mpd matrices that are returned have a lot of NAs in them. I haven't figured out what's causing this, but suspect it's connected to the sites in each of these communities that have a lot of 0s.
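One possible explanation for the NAs in the second issue, offered as a sketch rather than a confirmed cause, is that MPD is undefined for a site containing fewer than two taxa, because there are no pairs to average over. The toy distance matrix, the community, and the naive MPD function below are hypothetical.

```r
# Toy pairwise distance matrix for three taxa.
taxa <- c("A", "B", "C")
dist <- matrix(c(0, 2, 6,
                 2, 0, 6,
                 6, 6, 0), nrow = 3, byrow = TRUE,
               dimnames = list(taxa, taxa))

# Hypothetical community: Site_2 holds a single taxon and Site_3 is empty.
comm <- matrix(c(3, 1, 2,
                 5, 0, 0,
                 0, 0, 0), nrow = 3, byrow = TRUE,
               dimnames = list(paste0("Site_", 1:3), taxa))

# Naive unweighted MPD: the mean over all pairs of taxa present at a site.
naive.mpd <- function(abund, dist) {
  present <- abund > 0
  sub <- dist[present, present, drop = FALSE]
  mean(sub[lower.tri(sub)])  # no pairs means the mean of nothing, i.e. NaN
}

mpd.values <- apply(comm, 1, naive.mpd, dist = dist)
richness <- rowSums(comm > 0)
print(data.frame(richness, mpd.values))
```

Comparing the number of taxa per site with the positions of the missing values is a quick way to check whether sparse sites account for them.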
