Rice-GWAS

Welcome! This repo contains several genome-wide association studies (GWASs) performed on processed SNP data, following the analysis in my Rice-PCA-SNPs repo. The workflow was adapted from the course BIS 180L Genomics Laboratory at UC Davis, and the SNPs examined were derived from a bulk RNA-seq experiment on different strains of rice.

File Structure


This repo contains three directories: input, output, and scripts. input and output hold the data read and written by the different programs (usually stored locally). scripts contains an .Rmd file and its rendered HTML version, which can be used to view all inputs and outputs of the code in this analysis, as well as an R script containing all of the R code for this analysis. The workflow in this repo was performed entirely in R.

Workflow Overview


As mentioned, this project uses data previously gathered and processed in my Rice-PCA-SNPs repo: the SNPs, principal components (PCs), population assignments (from fastStructure), and associated phenotype data. The PCs, population assignments, and phenotype data were then joined into a single object.
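The actual join was done in R. As a language-neutral illustration only, it can be sketched in Python as an inner join on a shared strain identifier. The strain IDs, column names, and the choice to keep only strains present in every table are assumptions for this sketch, not details taken from the repo.

```python
def inner_join(*tables):
    """Inner-join tables of the form {strain_id: {column: value}} on their shared IDs."""
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)  # keep only strains present in every table
    joined = {}
    for strain_id in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[strain_id])  # later tables win on duplicate column names
        joined[strain_id] = row
    return joined


# Hypothetical toy inputs; strain IDs and column names are made up for illustration.
pcs = {"strain_1": {"PC1": 0.12, "PC2": -0.03}, "strain_2": {"PC1": -0.40, "PC2": 0.21}}
pops = {"strain_1": {"pop": "pop1"}, "strain_2": {"pop": "pop3"}, "strain_9": {"pop": "pop2"}}
pheno = {"strain_1": {"seed_length": 8.1}, "strain_2": {"seed_length": 7.4}}

combined = inner_join(pcs, pops, pheno)
# strain_9 has no PCs or phenotype, so it is dropped from the joined table.
```

An inner join is the conservative choice here: a strain missing any of the three pieces of information cannot be used in every downstream analysis anyway.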

I chose seed length as my trait of interest and then examined its variation across the dataset, starting with visualizations of the seed length data like those shown below:

[Figures: histogram of total seed length; seed length faceted by region]

Next, I calculated the mean seed length and the standard error of the mean for each region, and performed an ANOVA. This test revealed significant differences in mean seed length among regions (p-value 1.81e-12). Unfortunately (though not surprisingly), mean seed length also differed significantly by fastStructure population assignment (ANOVA, p-value 8.8e-15), suggesting that population structure must be taken into account in the GWAS.
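To make the per-region summary and the ANOVA concrete, here is a minimal Python sketch of the group means, the standard error of the mean (SEM = sd / sqrt(n)), and the one-way ANOVA F statistic. The analysis itself was done in R; the p-values quoted above come from comparing F against an F distribution, which Python's standard library does not provide, so this sketch stops at F and its degrees of freedom. The toy values are made up.

```python
import math
from statistics import mean, stdev


def group_summary(groups):
    """Mean and standard error of the mean (SEM = sd / sqrt(n)) for each group."""
    return {name: (mean(vals), stdev(vals) / math.sqrt(len(vals))) for name, vals in groups.items()}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    all_vals = [x for vals in groups.values() for x in vals]
    grand_mean = mean(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - mean(vals)) ** 2 for vals in groups.values() for x in vals)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up toy values standing in for seed lengths grouped by region.
toy = {"region_A": [1.0, 2.0, 3.0], "region_B": [4.0, 5.0, 6.0], "region_C": [7.0, 8.0, 9.0]}
summary = group_summary(toy)
f_stat, df_b, df_w = one_way_anova_f(toy)  # F = 27.0 on (2, 6) degrees of freedom
```

The same computation applies to the fastStructure comparison: only the grouping variable changes from region to population assignment.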

To perform a GWAS with statgenGWAS, four inputs (plus an optional fifth) are required:

  1. genotype data containing the genotype at each SNP for each individual/strain
  2. a genotype map containing the chromosome and position of each SNP (the SNPs must be listed in the same order as in the genotype data)
  3. a data frame with phenotype data
  4. a kinship matrix
  5. (optional) a covariate data frame (e.g. principal components) that can be used to account for population structure
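The kinship matrix (item 4) summarizes how genetically similar each pair of strains is. statgenGWAS provides its own kinship estimators, and this README does not say which one was used; the Python sketch below uses one well-known estimator (VanRaden's, on genotypes coded as 0/1/2 allele counts) purely to illustrate the idea. It assumes no missing values and at least one polymorphic SNP.

```python
def vanraden_kinship(genotypes):
    """genotypes: one row per individual, one 0/1/2 allele count per SNP (no missing values)."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling term; zero only if every SNP is monomorphic, which this sketch does not handle.
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # K = Z Z^T / scale: an n x n symmetric matrix of pairwise relatedness.
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)] for a in range(n)]


# Toy data: three individuals, two SNPs.
K = vanraden_kinship([[0, 2], [2, 0], [1, 1]])
```

In the toy example the first two individuals carry opposite homozygous genotypes at both SNPs, so their kinship entry is negative, while the heterozygous third individual sits exactly at the population mean and gets zeros.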

Once each input is prepared, they are combined into a single gData object, which is then used to run the GWAS. The genotype data must also be recoded, redundant SNPs removed, and missing values imputed; statgenGWAS provides a function for this, codeMarkers().
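The three marker-cleaning steps can be sketched in Python to show what each one does. This is not codeMarkers() itself: the two-letter genotype calls, the alphabetical tie-break for the minor allele, and the use of mean imputation are all assumptions for this sketch (statgenGWAS handles imputation with its own options).

```python
from collections import Counter
from statistics import mean


def recode_snp(calls):
    """Recode two-letter calls (e.g. 'AG') to counts of the minor allele; None stays missing.

    Assumes at least one non-missing call; allele-count ties are broken alphabetically.
    """
    counts = Counter(allele for call in calls if call is not None for allele in call)
    minor = min(counts, key=lambda allele: (counts[allele], allele))
    return [None if call is None else call.count(minor) for call in calls]


def clean_markers(snps):
    """snps: {snp_name: [call per individual]}. Returns recoded, de-duplicated, imputed SNPs."""
    cleaned, seen = {}, set()
    for name, calls in snps.items():
        coded = recode_snp(calls)
        key = tuple(coded)
        if key in seen:  # redundant: identical to a SNP already kept
            continue
        seen.add(key)
        # Mean imputation, a simple stand-in for statgenGWAS's own imputation options.
        fill = mean(x for x in coded if x is not None)
        cleaned[name] = [fill if x is None else x for x in coded]
    return cleaned


toy_snps = {
    "snp_1": ["AA", "AG", "GG", None],
    "snp_2": ["AA", "AG", "GG", None],  # duplicate of snp_1
    "snp_3": ["CC", "CC", "CT", "TT"],
}
markers = clean_markers(toy_snps)  # keeps snp_1 and snp_3
```

Duplicates are removed before imputation here, following the order in the text above; imputing first could make two SNPs look identical only because of filled-in values.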

In this analysis, three GWASs were run:

  1. No pop. structure correction
  2. PCs used for pop. structure correction
  3. Kinship matrix used for pop. structure correction
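The first model (no population structure correction) amounts to testing each SNP on its own. A minimal Python sketch, assuming 0/1/2 dosages and a simple linear regression per SNP, is shown below; the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. Models 2 and 3 add PC covariates or a kinship-based mixed model, which statgenGWAS fits for you and which are beyond a short sketch.

```python
import math
from statistics import NormalDist, mean


def single_snp_scan(genotypes, phenotype):
    """Regress the phenotype on each SNP's 0/1/2 dosage separately (no structure correction).

    genotypes: {snp_name: [dosage per individual]}, same individual order as phenotype.
    Returns {snp_name: (effect, t_statistic, approx_p_value)}.
    """
    n = len(phenotype)
    y_bar = mean(phenotype)
    results = {}
    for snp, x in genotypes.items():
        x_bar = mean(x)
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:  # monomorphic SNP: the effect cannot be estimated
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotype))
        beta = sxy / sxx
        alpha = y_bar - beta * x_bar
        rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, phenotype))
        se = math.sqrt(rss / (n - 2) / sxx)
        t = beta / se if se > 0 else math.copysign(math.inf, beta)
        # Normal approximation to the t distribution; reasonable only for large n.
        p = 2 * NormalDist().cdf(-abs(t))
        results[snp] = (beta, t, p)
    return results


# Made-up toy data: six individuals, one informative SNP and one monomorphic SNP.
toy_geno = {"snp_1": [0, 1, 2, 0, 1, 2], "snp_2": [0, 0, 0, 0, 0, 0]}
toy_pheno = [1.0, 2.1, 2.9, 1.2, 1.8, 3.1]
scan = single_snp_scan(toy_geno, toy_pheno)  # snp_2 is skipped as monomorphic
```

Because nothing in this model accounts for relatedness, any SNP whose allele frequency differs between populations will tend to look associated with a trait that also differs between populations, which is why the uncorrected scan is expected to over-report hits.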

For your viewing pleasure, I've included the QQ and Manhattan plots from each analysis below.

No pop. structure correction

[Figures: QQ plot; Manhattan plot]

PCs as pop. structure correction

[Figures: QQ plot; Manhattan plot]

Kinship matrix as pop. structure correction

[Figures: QQ plot; Manhattan plot]

Comparing the QQ and Manhattan plots across the three GWASs, using a kinship matrix appears to have been the most effective way to control for population structure in this case: it produced the fewest significant SNPs and the QQ plot in which the observed p-values most closely followed the expected distribution.
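Judging QQ plots by eye can be supplemented with a number. A common choice is the genomic inflation factor lambda: the median observed 1-df chi-square statistic divided by its expected median under the null, where values well above 1 suggest uncorrected structure. The Python sketch below computes the QQ plot coordinates and lambda from a list of p-values; it assumes two-sided p-values with 0 < p <= 1 and was not part of the original R analysis.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def qq_points(pvalues):
    """Sorted expected and observed -log10(p) values for a QQ plot under the null."""
    n = len(pvalues)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    observed = sorted(-math.log10(p) for p in pvalues)
    return expected, observed


def genomic_inflation(pvalues):
    """Genomic inflation factor: median observed 1-df chi-square over its null median.

    Each p-value (assumed two-sided, 0 < p <= 1) is converted back to a chi-square statistic.
    """
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Perfectly uniform p-values, as expected when no SNP is associated and structure is controlled.
null_p = [(i - 0.5) / 1000 for i in range(1, 1001)]
expected, observed = qq_points(null_p)  # identical lists: the points lie on the diagonal
lam = genomic_inflation(null_p)  # close to 1
```

Using the median rather than the mean keeps lambda from being driven by the handful of truly associated SNPs in the tail.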

Contributors: aangush
