codons's Introduction

Downstream analysis code for the “Mammalian codon usage” manuscript

The code in this repository comprises the entirety of the analysis code for the manuscript “Mammalian codon usage”. After cloning the repository and installing the necessary dependencies, the analysis can be run using

make

Be warned that this may take quite a while. The supplementary material (tables and figures) can then be generated using

make supplements

Data

In order to run the code, the project data needs to be downloaded from Figshare and put directly into the folder data under the project root:

make download-data

Dependencies

The code has a number of dependencies. The following packages need to be installed manually from their respective sources (CRAN, Bioconductor or GitHub):

  • Biostrings, 2.36.1
  • DESeq2, 1.8.1
  • brew, 1.0.6
  • dplyr, 0.4.3.9000
  • ggbeeswarm, 0.3.0
  • ggplot2, 1.0.1
  • gplots, 2.17.0
  • gridExtra, 2.0.0
  • knitr, 1.10.5
  • lazyeval, 0.1.10.9000
  • magrittr, 1.5
  • methods, 3.2.1
  • modules, 0.8.2
  • pander, 0.5.2
  • parallel, 3.2.1
  • piano, 1.8.2
  • reshape2, 1.4.1.9000
  • rvest, 0.2.0.9000
  • tidyr, 0.3.1.9000
  • xlsx, 0.5.7
  • xml2, 0.1.9000
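To see how a local installation compares with the versions listed above, something along the following lines may help. This is only a sketch: `check_versions` and `required` are hypothetical names that are not part of this repository, and only a few of the packages are spelled out.

```r
# Hypothetical helper (not part of the repository): report the installed
# version of each required package next to the version listed above.
check_versions = function (required) {
    installed = vapply(names(required), function (pkg) {
        # packageVersion() raises an error for packages that are not
        # installed; report those as NA instead.
        tryCatch(
            as.character(utils::packageVersion(pkg)),
            error = function (e) NA_character_
        )
    }, character(1))

    data.frame(
        package = names(required),
        required = unname(required),
        installed = unname(installed),
        stringsAsFactors = FALSE
    )
}

# A subset of the list above; extend as needed.
required = c(
    Biostrings = '2.36.1',
    DESeq2 = '1.8.1',
    ggplot2 = '1.0.1',
    magrittr = '1.5'
)

print(check_versions(required))
```

Packages that are not installed show up with `NA` in the `installed` column. The sketch does not judge whether a differing version is compatible.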

The code uses (pre-1.0) ‹modules›. The following modules need to be installed from GitHub:

  • ‹ebits›

Note to users: At the time of writing, ‹ebits› has not yet been published. Consequently, the above link unfortunately does not work, and the code in this project cannot be run directly.

However, while crucial to the project, ‹ebits› is merely a collection of general purpose programming tools; it does not contain logic pertaining to this project. Most of its uses are transparent and should not impact the understanding of the code. There are just two exceptions:

  • ‹ebits› introduces a new meaning for the operator ->, which is used liberally in the code. With this meaning, the operator declares an anonymous function. These two are therefore equivalent:

    x -> x * 2
    function (x) x * 2

    And so are these:

    x ~ y -> x + y
    function (x, y) x + y
  • ‹ebits› introduces the operator %.% for function composition. Given two functions f and g, (f %.% g)(x) is equivalent to f(g(x)).
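For readers without access to ‹ebits›, the composition operator can be approximated in base R. The definition below is a minimal stand-in written for illustration, not the ‹ebits› implementation; `twice` and `increment` are made-up example functions. The `->` shorthand is not reproduced here, so the anonymous functions are written out with `function`.

```r
# Stand-in for function composition (an assumption, not the ebits source):
# (f %.% g)(x) calls g first and then passes its result to f.
`%.%` = function (f, g) function (...) f(g(...))

# With ebits these could be written as `x -> x * 2` and `x -> x + 1`.
twice = function (x) x * 2
increment = function (x) x + 1

composed = twice %.% increment

# Both calls evaluate twice(increment(3)).
print(composed(3))
print(twice(increment(3)))
```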

codons's People

Contributors

klmr


codons's Issues

Refactor DE code

Refactor differential expression calculation to unify code

  • Rewrite scripts/write-de-results to scripts/differential-expression, to be the equivalent of scripts/go-enrichment.
  • Rewrite Makefile rules to make te depend on de and generate necessary results for all.
  • Rewrite supplements.make rules to use the new script instead.
  • Rewrite codon-anticodon-correlation.rmd.brew to use calculated DE results instead of recalculating them, equivalent to its use of gsa results.
  • Decide where to calculate and save top 200 upregulated genes.
  • See whether supplements.make can be made redundant.

Fix file paths in `data` or `config`

File paths to data files should be specified via module_file, so that they work even when the scripts are run from a directory other than codons.

Refactor Makefile

Change the organisation such that each script creates a single output that constitutes a target in the Makefile. The flow will then be as follows (partially; other parts are not affected):

[Figure: workflow of new Makefile]

  • Rewrite differential-expression script to create single output
  • Implement overexpressed-genes to take this as input, and create single output
  • Rewrite go-enrichment to create single output (and rename to reflect GSA …)
  • Rewrite codon-anticodon-correlation such that
    • It is a script rather than a notebook
    • It reads the inputs generated above, and creates a single output
    • It can be configured to work with tAI or wobble-correlation instead
  • Optionally create a new notebook taking this as input and generating progressive plots
  • Rewrite Makefile to implement the dependencies listed above

Performance issue in codon–anticodon TE analysis

Calculating the codon–anticodon comparison for different conditions (in particular the “mismatching” set) seems to have exponential memory requirements and runtime, which makes the code hard to run (it fails on a laptop).

This can probably be refactored “easily” by running the calculation for different sets sequentially in a loop rather than having them in the same data.frame simultaneously, and using dplyr grouped operations.
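The sequential approach could look roughly like the sketch below. Everything in it is assumed for illustration: `correlate_set` is a hypothetical placeholder for the real per-set calculation, the column names are invented, and the data are random toy values. It uses base R only to stay self-contained; the dplyr grouped operations mentioned above would be an alternative.

```r
# Hypothetical placeholder for the per-set calculation; the real analysis
# in this project is more involved.
correlate_set = function (set_data) {
    cor(set_data$codon, set_data$anticodon)
}

# Toy data, invented purely for illustration.
set.seed(1)
sets = list(
    matching = data.frame(codon = runif(10), anticodon = runif(10)),
    mismatching = data.frame(codon = runif(10), anticodon = runif(10))
)

# Handle one set at a time instead of combining all sets into one big
# data.frame, so that only the current set's intermediates are held in memory.
results = numeric(0)
for (name in names(sets)) {
    results[name] = correlate_set(sets[[name]])
}

print(results)
```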
