ptsd_postgwas

Post-GWAS analysis for the PTSD project.

Step 0: Data collection

The two cohorts were obtained from the PGC and FINNGEN databases.

To harmonize the data structure, the effect allele frequency was calculated manually for the PGC data as the sample-size-weighted mean of the case and control frequencies: (Frequency of coded allele in cases * Number of cases + Frequency of coded allele in controls * Number of controls) / (Number of cases + Number of controls). Example:

awk 'BEGIN {FS = OFS = "\t"}
     # Header: keep the original columns and append the new column name
     NR == 1 {print $0, "af_alt"; next}
     {
       # $6 = FRQ_A_4363 (cases), $7 = FRQ_U_10976 (controls), $17 = Nca, $18 = Nco
       n = $17 + $18
       # Sample-size-weighted mean; write 0 when both sample sizes are 0
       af_alt = (n != 0) ? ($17 * $6 + $18 * $7) / n : 0
       print $0, af_alt
     }' pts_lat_freeze2_overall.results > pts_lat_freeze2_overall.results.adj
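The pooled-frequency formula above can also be checked by hand for a single variant. The following is a minimal Python sketch, not part of the repository: the function name is hypothetical, the two frequencies are made up, and the sample sizes are simply borrowed from the column names FRQ_A_4363 and FRQ_U_10976.

```python
def pooled_allele_frequency(frq_cases, n_cases, frq_controls, n_controls):
    """Sample-size-weighted mean of the case and control allele frequencies."""
    total = n_cases + n_controls
    if total == 0:
        # Mirrors the awk script, which writes 0 when both sample sizes are 0.
        return 0.0
    return (frq_cases * n_cases + frq_controls * n_controls) / total


# Hypothetical variant: frequency 0.30 in 4363 cases and 0.25 in 10976 controls.
af_alt = pooled_allele_frequency(0.30, 4363, 0.25, 10976)
print(round(af_alt, 4))  # 0.2642
```

Because controls outnumber cases, the pooled value sits closer to the control frequency than to the case frequency.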

For the FINNGEN data, only the column names needed adjusting.

Step 1: Meta-analysis

Tool: METAL. Reference: METAL: fast and efficient meta-analysis of genomewide association scans.

The METAL software is designed to facilitate meta-analysis of large datasets (such as several whole genome scans) in a convenient, rapid and memory efficient manner.

metal metal.txt

The metal.txt file lists the parameters used. SCHEME STDERR runs an inverse-variance-weighted analysis, weighting each effect size estimate by the inverse of its squared standard error. AVERAGEFREQ ON tracks the effect allele frequency across all files and reports the mean effect allele frequency. TRACKPOSITIONS ON tracks the chromosome and position of each SNP. GENOMICCONTROL ON automatically corrects test statistics to account for small amounts of population stratification or unaccounted-for relatedness.

SCHEME STDERR
AVERAGEFREQ ON
TRACKPOSITIONS ON
GENOMICCONTROL ON
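To make the SCHEME STDERR setting concrete, here is a minimal Python sketch of a fixed-effect inverse-variance-weighted estimate for one variant measured in two cohorts. It illustrates the general technique only and is not METAL's code; the function name and the effect sizes and standard errors are made up, and it assumes both cohorts report the effect for the same allele.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted estimate for one variant."""
    # Each study is weighted by 1 / SE^2, so more precise studies count more.
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-cohort estimates for a single SNP.
beta, se, p = inverse_variance_meta([0.10, 0.20], [0.05, 0.10])
print(round(beta, 4), round(se, 4))  # 0.12 0.0447
```

The first cohort has half the standard error of the second, so it receives four times the weight and the pooled effect lands nearer 0.10 than 0.20.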

The next segment tells METAL how to read the two input datasets, FINNGEN and PGC PTSD. For each dataset it names the columns holding the marker identifier, alleles, effect size, standard error, p-value, allele frequency, chromosome, and genomic position; each keyword is followed by the matching column name in that file. Once both files have been processed with these settings, ANALYZE runs the meta-analysis across them.

# Describe and process the FINNGEN input file
MARKER   rsids
ALLELE   ref alt
EFFECT   beta
STDERR   sebeta
PVAL     pval
FREQLABEL af_alt
CHROMOSOME chrom
POSITION pos

PROCESS FINNGEN

# Describe and process the PGC PTSD input file
MARKER   SNP
ALLELE   A1 A2
EFFECT   log(OR)
STDERR   SE
PVAL     P
FREQLABEL af_alt
CHROMOSOME CHR
POSITION BP

PROCESS PGCPTSD

# Execute meta-analysis
ANALYZE
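In the PGC block, EFFECT log(OR) asks METAL to take the natural logarithm of the OR column, which puts the PGC odds ratios on the same scale as the FINNGEN beta column. A small Python sketch of that transformation follows; the function name and the odds ratios are hypothetical.

```python
import math


def log_odds_ratio(odds_ratio):
    """Convert an odds ratio to the log-odds scale used for beta."""
    return math.log(odds_ratio)


# Hypothetical odds ratios: protective, null, and risk-increasing.
for odds_ratio in (0.8, 1.0, 1.25):
    print(odds_ratio, round(log_odds_ratio(odds_ratio), 4))
```

An odds ratio of 1 maps to an effect of 0, and reciprocal odds ratios such as 0.8 and 1.25 map to effects of equal size and opposite sign.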

Step 2: Post Meta-analysis GWAS

Tool: gwaslab. Reference: GWASLab.

gwaslab is a comprehensive tool for Genome-Wide Association Studies (GWAS): it performs statistical analyses, visualizes genomic data in Manhattan and QQ plots, and helps interpret genetic associations.

import gwaslab as gl

To load the meta-analysis output, map each of its columns to the corresponding gwaslab parameter.

mysumstats = gl.Sumstats("METAANALYSIS1.TBL",
                         rsid="MarkerName",
                         chrom="Chromosome",
                         pos="Position",
                         ea="Allele1",
                         nea="Allele2",
                         beta="Effect",
                         se="StdErr",
                         p="P-value",
                         direction="Direction",
                         build="38")

The complete tutorial for the next steps is available at https://cloufield.github.io/gwaslab/tutorial_3.4/. The p-value threshold used is 1e-6.
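As a rough illustration of what the 1e-6 threshold selects, the sketch below filters a METAL-style table in plain Python. It does not use the gwaslab API, the function name is hypothetical, and the three rows are invented values laid out with the METAL output column names used above.

```python
import csv
import io

# Hypothetical rows in the METAL output layout; all values are made up.
metal_tbl = (
    "MarkerName\tAllele1\tAllele2\tEffect\tStdErr\tP-value\tDirection\n"
    "rs0000001\ta\tg\t0.051\t0.010\t3.4e-07\t++\n"
    "rs0000002\tt\tc\t-0.012\t0.011\t2.8e-01\t-+\n"
    "rs0000003\ta\tc\t0.047\t0.009\t1.8e-07\t++\n"
)


def suggestive_hits(handle, threshold=1e-6):
    """Return rows with P-value below the threshold, smallest p-value first."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [row for row in reader if float(row["P-value"]) < threshold]
    return sorted(hits, key=lambda row: float(row["P-value"]))


hits = suggestive_hits(io.StringIO(metal_tbl))
print([row["MarkerName"] for row in hits])  # ['rs0000003', 'rs0000001']
```

For a real run, an open file handle on METAANALYSIS1.TBL could be passed in place of the in-memory table.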

Step X: TwoSampleMR and colocalisation analysis

MR_and_coloc.R runs two-sample MR as well as colocalization analysis based on SMR output. It additionally requires MAF information for the outcome data, supplied as all_outcome, a data frame with two columns: rsids and af_alt.

all_outcome <- data.table::fread('finngen/finngen_PTSD')
all_outcome <- data.frame(rsids = all_outcome$rsids,
                          af_alt = all_outcome$af_alt)

source('MR_and_coloc.R')

result <- MR_and_coloc('./testExi.ENSG00000182481.10.txt','PTSD','./test',
             2860,400000,37,all_outcome,'./clump/1kg.v3/EUR')

MR_and_coloc takes the following parameters: smrfile: SMR output; trait_name: outcome name, e.g. "PTSD"; plotpath: folder in which to save plots; n_qtl: number of samples in the QTL data; n_gwas: number of samples in the GWAS data; g_version: genome build (37 or 38) of the QTL data. The example call also passes the all_outcome data frame and the path ./clump/1kg.v3/EUR as its last two arguments.
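The af_alt column in all_outcome is an allele frequency, whereas the text above refers to MAF. A minor allele frequency is the smaller of the allele frequency and its complement, as in this Python sketch. Whether MR_and_coloc.R performs this conversion internally is not stated here, and the function name, rsids, and frequencies below are hypothetical.

```python
def minor_allele_frequency(af_alt):
    """Return the frequency of the rarer allele at a biallelic variant."""
    if not 0.0 <= af_alt <= 1.0:
        raise ValueError(f"allele frequency out of range: {af_alt}")
    return min(af_alt, 1.0 - af_alt)


# Hypothetical outcome frequencies keyed by rsid.
outcome_af = {"rs0000001": 0.12, "rs0000002": 0.83}
maf = {rsid: minor_allele_frequency(af) for rsid, af in outcome_af.items()}
print({rsid: round(value, 2) for rsid, value in maf.items()})
```

Rejecting values outside the 0 to 1 range catches columns that were mislabeled or stored as percentages before they reach the analysis.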

Contributors: lbwfff, salemabiel
