ilarsf / gwastools Goto Github PK

View Code? Open in Web Editor NEW

29.0 3.0 7.0 1.21 MB

Basic and fast GWAS functions for QQ and Manhattan plots (incl. gene names)

R 100.00%

gwas rscript qq manhattan-plot genome-analysis snps power plotting optparse

gwastools's People

Contributors

Stargazers

Watchers

Forkers

bnwolford quanrd wangbinhu jzhan211 jtnedoctor amazingshi jiangweixia

gwastools's Issues

ManhattanPlot.R: NCBI annotation not available

Hi,

For a collaboration project, we need to make Manhattan plots from our GWAS data. I have been able to use the Rscript ManhattanPlot.r and that given the following message.

Please run the following code line by line in an interactive R session:
library(Map2NCBI)
library(data.table)
file.genelist37 <- path.expand("~/GeneList_Homo_sapiens_BUILD.37.3.txt")
genelist37 <- GetGeneList_v11("Homo sapiens",build="BUILD.37.3",savefiles=F,destfile=tempfile())
fwrite(genelist37,file.genelist37,sep="\t",quote=T)

Unfortunately, the ftp://ftp.ncbi.nih.gov/genomes/MapView for build37 is not available. Do you have other options for the annotation?

Your help will be much appreciated.

Script fails if no SNPs are significant?

If I run the test data included with SAIGE I get the following results:

CHROM    POS    ID    REF    ALT    AC    MAC    MAF    AF    CHISQ    BETA    SEBETA_GC    PVALUE_GC    LOG10P_GC
1    4    rs4    2    1    20    20    0.01    0.01    0.00900168133609557    -0.0748360120098867    0.788767156474349    0.924    0.0341342498337596
1    9    rs9    2    1    4    4    0.002    0.002    0.778767920480584    -1.36954622722431    1.5519319320999    0.378    0.42306208504815
1    10    rs10    2    1    4    4    0.002    0.002    0.198787281155206    0.655246816602172    1.46963884563484    0.656    0.183293820523815
1    11    rs11    2    1    2    2    0.001    0.001    0.0982969720215684    -1.10691964528169    3.53057974490197    0.754    0.122696058298277
1    12    rs12    2    1    30    30    0.015    0.015    0.0187906336497365    -0.0897892694469568    0.655018740034906    0.891    0.0501376964185931
1    14    rs14    2    1    75    75    0.0375    0.0375    0.0298836109133599    -0.0747761355054132    0.432560123012224    0.863    0.0641127257257747
1    15    rs15    2    1    244    244    0.122    0.122    1.32622747205384    -0.28363607944272    0.24629322403832    0.249    0.602968677777628
1    17    rs17    2    1    12    12    0.006    0.006    0.00243722719101353    -0.0411351808952507    0.833230970885822    0.961    0.0174457511091217
1    20    rs20    2    1    762    762    0.381    0.381    0.000120928933151418    -0.00181338599534426    0.164901705220366    0.991    0.00382730415769189
1    22    rs22    2    1    459    459    0.2295    0.2295    0.559747345486054    0.144922266593042    0.193704226331097    0.454    0.342597941239989

But if I run your script on these data I get the message:

Error in seq.default(2, length(chrLab), by = 2) : wrong sign in 'by' argument Calls: axis -> seq -> seq.default Execution halted

I do not know if fixing this is worth your time, but I thought I should report the error :)

Error in candidateRegions$POS[r] <- rhits$POS[which.max(rhits$LOG10P)] : replacement has length zero

I just downloaded the updated scripts (my old was probably several years old), but I have an issue with the Manhattan plot script.

The error message I get is
Error in candidateRegions$POS[r] <- rhits$POS[which.max(rhits$LOG10P)] :
replacement has length zero
Calls: ManhattanPlot
In addition: Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names, :
Item 3 has 0 rows but longest item has 1; filled with NA

I have tried to run the script on two phenotypes, where it works on one, but not the other. I have gone through the files, but I am not able to find anything wrong; all columns have value within expected ranges, with no NA. They should also be sorted on chr - pos. For the phenotype giving an error, I tried with the old script I had on the cluster, and it works fine.

This is the first ten lines of the input file (which is tab-separated)
CHR POS P MAF
1 85022 0.99079011268437 0.00404269760474563
1 135195 0.52322468045994 0.00930232927203178
1 233476 0.44981101380145 0.00814521033316851
1 233582 0.286454369491075 0.00564469257369637
1 554968 0.294178441484736 0.00653086602687836
1 569994 0.615267901422401 0.00595780368894339
1 574151 0.622304870569572 0.0127126947045326
1 581812 0.302093715615789 0.0190441608428955
1 648843 0.643641380536429 0.00501822307705879

This is the code I am running: Rscript ManhattanPlot.r --input test.txt --prefix test --chr CHR --pos POS --pvalue P

It guess it must be something wrong with the input file, that is tolerated by the old script, but not the new one, but I have not been able to figure out by looking through the code or the inputfile...

Do you have any ideas what your updated script is not tolerating?
Thankful for any help!

Edit: The only difference I could find between the two phenotypes, is that the phenotype that did not work, did not have any significant SNPs. For the file where the script worked, I removed the significant SNP(s). Then running the Manhattan plot script again I get the error. Could it be a bug in the script that requires the resultfile to contain significant SNPs?

Perhaps you should add some example data to make your script easier to understand and use?

If you included the file ExampleGWAS.txt in the repository it would be much easier for others to understand how to use the tool. I cannot see that the format of the input file is described anywhere.

Thanks for making the code available btw, I really appreciate it 👍

Error including candidate regions file in updated Manhattan plot script

I have successfully included a hitregions file in an older version of the Manhattan plot, but in the newest version i get an error message:

Error in merge.data.table(plotdata[, .(CHROM, POS, x, y)], candidateRegions,  : 
 Elements listed in `by` must be valid column names in x and y
Calls: ManhattanPlot -> merge -> merge.data.table

This is the tabseparated candidate region file (I tried to change the last column name from LEGENDTEXT to LABEL, as indicated in the --help function), but nothing changed

CHROM START END COL LABEL
8 65984044 66384044 green Novel_locus
7 20490738 20890738 green Novel_locus
12 22975219 24975219 blue Known_locus
8 129718772 131718772 blue Known_locus
18 49379032 51379032 blue Known_locus

This is the code I am running

Rscript ManhattanPlot.r --input plots/low_back_pain_H23_forplots.txt --prefix test --chr CHR --pos POS --pvalue p.value --hitregion hitregions.txt --maintitle 'test'

I might have missed an update on how to make the hitregion file, but the only difference I could see by comparing the two scripts, is the header of the last column (LEGENDTEXT vs. LABEL). The excact same code and files (except for changing the header of the last column) works when running the older script. It also works fine when dropping the --hitregion option.

Sorry if I have missed something obvious.

Thanks,
Sigrid

ilarsf / gwastools Goto Github PK

gwastools's People

Contributors

Stargazers

Watchers

Forkers

gwastools's Issues

ManhattanPlot.R: NCBI annotation not available

Script fails if no SNPs are significant?

Error in candidateRegions$POS[r] <- rhits$POS[which.max(rhits$LOG10P)] : replacement has length zero

Perhaps you should add some example data to make your script easier to understand and use?

Error including candidate regions file in updated Manhattan plot script

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent