qlu-lab / supergnova

License: MIT License

Python 100.00%

supergnova's Issues

Problem with tutorial

I am trying to run the tutorial, but the final command, where supergnova.py is called, gets stuck at the following message:

"24 CPUs are detected. Using 24 threads in computation ..."

without showing any error after it has been running for more than 24 hours.

I have created a conda environment with the required packages, but I don't know what might be going wrong. I suspect I am not using the right package versions, since this happens with the newest version of pandas. I would be very grateful if you could help me find the cause.

Thanks.

python pandas version issue

pandas 1.4.2: does not work
pandas 1.2.4: works

error installing on Mac

Hi, I have installed Python 3, pandas, and the other dependencies on my MacBook.
However, I get the following error when testing:

  File "/Users/mukutimran/Desktop/SUPERGNOVA/SUPERGNOVA/supergnova.py", line 36, in <module>
    pd.set_option('precision', 4)
  File "/usr/local/lib/python3.9/site-packages/pandas/_config/config.py", line 256, in __call__
    return self.__func__(*args, **kwds)
  File "/usr/local/lib/python3.9/site-packages/pandas/_config/config.py", line 149, in _set_option
    key = _get_single_key(k, silent)
  File "/usr/local/lib/python3.9/site-packages/pandas/_config/config.py", line 116, in _get_single_key
    raise OptionError("Pattern matched multiple keys")
pandas._config.config.OptionError: 'Pattern matched multiple keys'
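
A possible fix, offered as a sketch rather than a confirmed patch: in newer pandas releases more than one option name contains `precision`, so the short pattern `'precision'` no longer resolves to a single key. Spelling out the full key `display.precision` avoids the ambiguity and is also accepted by older pandas. According to the traceback, the call to change is on line 36 of supergnova.py.

```python
import pandas as pd

# 'precision' alone is ambiguous in newer pandas (the pattern matches more
# than one option key), so use the fully qualified option name instead.
pd.set_option('display.precision', 4)

print(pd.get_option('display.precision'))
```

This may also be why the pandas 1.2.4 / 1.4.2 difference reported above shows up, although that link is not confirmed in the thread.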

Problems with error: "pandas.errors.ParserError: Error tokenizing data"

Dear all,

I am an absolute beginner in Python and in running analyses with SUPERGNOVA, and I am having serious trouble resolving the following error.

SUPERGNOVA % python3 supergnova.py ./data/sumstats/Inter.txt ./data/sumstats/AccPA.txt
--N1 102837
--N2 91084
--bfile data/bfiles/eur_chr@_SNPmaf5
--partition data/partition/[email protected]
--out results.txt

Preparing files for analysis...
Traceback (most recent call last):
  File "/Users//SUPERGNOVA/supergnova.py", line 93, in <module>
    pipeline(parser.parse_args())
  File "/Users//SUPERGNOVA/supergnova.py", line 54, in pipeline
    gwas_snps, bed, N1, N2 = prep(args.bfile, args.partition, args.sumstats1, args.sumstats2, args.N1, args.N2)
  File "/Users//SUPERGNOVA/prep.py", line 65, in prep
    dfs = [pd.read_csv(file, delim_whitespace=True)
  File "/Users//SUPERGNOVA/prep.py", line 65, in <listcomp>
    dfs = [pd.read_csv(file, delim_whitespace=True)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    return parser.read(nrows)
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/Users//anaconda3/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 21 fields in line 2771, saw 28

Does anyone have an idea how to solve this error? Thanks in advance for your help.
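
The error message says that one row has more whitespace-separated fields than the header (28 instead of 21), which often points to a value containing spaces. The sketch below is a diagnostic only, not part of SUPERGNOVA: the helper name `find_ragged_lines` and the demo data are made up for illustration. It lists every row whose field count differs from the header's, so the offending lines can be inspected by hand.

```python
import io

def find_ragged_lines(handle):
    """Return (expected, bad), where bad lists (line_number, n_fields) for
    rows whose whitespace-separated field count differs from the header's."""
    header = handle.readline()
    expected = len(header.split())
    bad = []
    # Line numbers are 1-based and count the header as line 1.
    for line_number, line in enumerate(handle, start=2):
        n_fields = len(line.split())
        if n_fields > 0 and n_fields != expected:
            bad.append((line_number, n_fields))
    return expected, bad

# Tiny in-memory stand-in for an input file (made-up values).
demo = io.StringIO(
    "SNP A1 A2 Z\n"
    "rs1 A G 1.2\n"
    "rs2 C T 0.4 extra stray\n"
    "rs3 G A -0.7\n"
)
expected, bad = find_ragged_lines(demo)
print(expected, bad)
```

To use it on a real file, open whichever input pandas was reading when it failed and pass the file handle instead of `demo`.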

Global genetic covariance estimation

Hello,

I have successfully used SUPERGNOVA to calculate local genetic covariance estimates, however now I would like to confirm that those sum up to the total genome-wide genetic covariance. Is the global genetic covariance estimate output by the program, or is there an argument I can use to get this estimate? Sorry if this is not the best place to ask the question since it is not an issue with the program/code, I just wasn't sure of the best way to reach out.

Thank you in advance for your help!

Best,

Sarah

Question about allele frequency when formatting sum stats with munge_sumstats.py

Hello,

One of my GWAS summary statistics files provides a frequency column for the alt allele, and the other provides the allele frequency for allele 1 (which I assume is the reference allele; I'm going to try to confirm that). Can I just provide both of these frequency columns to munge_sumstats.py as I did below? Or do I need to convert the frequency column in one of the files so that it is consistent with the other?

Thank you!

python /mnt/mfs/hgrcgrid/homes/nrr2132/ldsc/munge_sumstats.py
--sumstats /mnt/mfs/hgrcgrid/homes/nrr2132/analysis/AAGWAS/11.18.22/WithIbadan/METAL/Model2/ADGC_AA_GWAS3.AGR.MODEL2.with_ibadan.annotated.METAL.20221116_TabSeparated.txt
--N 9095
--snp MarkerName
--a1 Allele1
--a2 Allele2
--p P-value
--frq Freq1
--signed-sumstats Effect,0
--out ADGC_AFR_WithIbadan_Model2_SumStats_Munge.txt

python /mnt/mfs/hgrcgrid/homes/nrr2132/ldsc/munge_sumstats.py
--sumstats /mnt/vast/hpc/reitz_lab/REITZ_LAB/VascularTrait_SummaryStat/Lipids_GWAS_SummaryStats/Global_Lipids_Genetics_Consortium/Graham_2021/African/nonHDL_INV_AFR_HRC_1KGP3_others_ALL.meta.singlevar.results
--N 99432
--snp rsID
--a1 REF
--a2 ALT
--p pvalue
--frq POOLED_ALT_AF
--signed-sumstats EFFECT_SIZE,0
--out nonHDL_Graham2021_AFR_SumStats_munge.txt
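
For reference, converting a frequency from one allele to the other at a biallelic SNP is just `1 - freq`, and the minor allele frequency is the same whichever allele the column refers to. The sketch below illustrates that arithmetic only. Whether a conversion is actually needed depends on how munge_sumstats.py uses `--frq`; if the column is used only for a minor allele frequency filter (an assumption that should be checked against the munge_sumstats.py source), the choice of allele would not matter. The function names are hypothetical.

```python
def flip_frequency(freq):
    """Frequency of the other allele at a biallelic SNP."""
    return 1.0 - freq

def minor_allele_frequency(freq):
    """MAF is the smaller of the two allele frequencies."""
    return min(freq, 1.0 - freq)

freq_alt = 0.75                      # made-up frequency of the alt allele
freq_ref = flip_frequency(freq_alt)  # frequency of the other allele

# The MAF does not depend on which allele the frequency was reported for.
print(freq_ref, minor_allele_frequency(freq_alt), minor_allele_frequency(freq_ref))
```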

proportion of correlated regions

Hi Everyone,

I was wondering if someone could share some R code that takes the results of supergnova as input and estimates the proportion of correlated regions. Here's what I have so far:

Best to All,
Marc.

R --no-save --quiet
library( "ashr" )

#----- load supergnova results
my <- read.table(
"supanova_insomnia.txt",
header=TRUE, sep=" ", as.is=TRUE,
stringsAsFactors=FALSE, quote="" )

#----- estimate proportion of correlated regions
J <- ash( my[,"rho"], sqrt( my[,"var"] ), mixcompdist="halfnormal" )
sum( 1 - J$result$lfdr ) / nrow( my ) # <----- is this correct?

Skipping LDscore estimation step!

Hi,

It would be great to have your suggestions on a time-efficient way of running SUPERGNOVA, as I am planning to run it on about 200 traits against 1 trait of interest.
Currently, using 40 threads, the job takes about 30 minutes, and I am wondering whether there is a way to do this more efficiently, for instance by using precalculated LD scores (as all analyses are European only).
Any comments or suggestions would be greatly appreciated.

Regards
msarguru
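
The sketch below does not skip the LD score step; it only shows one way to organize many pairwise runs so they can be handed to a job scheduler and run side by side. It builds the command lines without executing them. The file names, sample sizes, and the `PARTITION_FILE` placeholder are hypothetical, and the `--thread` flag is inferred from `args.thread` in the tracebacks posted in other issues, so check `supergnova.py -h` before relying on it.

```python
import os
import shlex

def build_commands(target, target_n, traits, bfile, partition, threads):
    """Build one supergnova.py command per (target, other trait) pair.

    `traits` maps a summary statistics path to its sample size.
    Nothing is executed here; the commands are only assembled.
    """
    commands = []
    for path, n in sorted(traits.items()):
        name = os.path.splitext(os.path.basename(path))[0]
        commands.append([
            "python3", "supergnova.py", target, path,
            "--N1", str(target_n),
            "--N2", str(n),
            "--bfile", bfile,
            "--partition", partition,
            "--thread", str(threads),  # assumed flag name, see note above
            "--out", "results_" + name + ".txt",
        ])
    return commands

# Hypothetical inputs, for illustration only.
traits = {"sumstats/traitA.txt": 50000, "sumstats/traitB.txt": 80000}
commands = build_commands(
    "sumstats/target.txt", 100000, traits,
    bfile="data/bfiles/eur_chr@_SNPmaf5",
    partition="PARTITION_FILE",  # placeholder for the tutorial's partition path
    threads=8,
)
for command in commands:
    print(" ".join(shlex.quote(part) for part in command))
```

Giving each job fewer threads and running several jobs at once may use a cluster more evenly than one 40-thread job at a time, but that depends on the hardware and has not been benchmarked here.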

bfile and partition parameter

Hi,

I am trying to run SUPERGNOVA on my datasets.

According to the help text, the bfile and partition parameters are optional, but when I run the program it reports that they are required. Could you please confirm whether the program can be run without those parameters?

Thanks,
JK
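
One possible explanation, assuming supergnova.py declares these flags with argparse and `required=True` (the thread does not show the parser code): argparse groups every `--flag` argument under a heading named "optional arguments" (or "options" in newer Python versions), even when the flag is required. The minimal sketch below reproduces that behavior with a stand-in parser; `demo.py` and the help strings are made up.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo.py")
parser.add_argument("sumstats1")
parser.add_argument("--bfile", required=True, help="PLINK file prefix")
parser.add_argument("--partition", required=True, help="genome partition file")

# The help text groups --bfile and --partition with the flag-style
# arguments, although both are required.
help_text = parser.format_help()
print(help_text)

# Leaving them out makes argparse print an error and exit with status 2.
try:
    parser.parse_args(["trait1.txt"])
except SystemExit as exc:
    exit_code = exc.code
```

In the usage line, required flags appear without square brackets, which is the more reliable indicator of whether a flag can be omitted.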

Failure at the computing local genetic covariance step

Hello,

Thanks so much for providing this update to GNOVA! I've attempted to run supergnova.py with two input files that gnova.py can handle with no issues.

Here is the command I am running:

python3.6 supergnova.py /home/dubeu/SumStatAnalyses/GNOVA/sumstats1_nomissing /home/dubeu/SumStatAnalyses/GNOVA/sumstats2_nomissing --N1 800000 --N2 40000 --bfile data/bfiles/eur_chr@_SNPmaf5 --partition data/partition/[email protected] --out results.txt

Here is the error I am getting:
...
Computed local genetic covariance for chromosome 11
Computed local genetic covariance for chromosome 12
Traceback (most recent call last):
  File "supergnova.py", line 91, in <module>
    pipeline(parser.parse_args())
  File "supergnova.py", line 63, in pipeline
    out = calculate(args.bfile, bed, args.thread, gwas_snps, N1, N2, h_1, h_2, pheno_corr, pheno_corr_var)
  File "/home/dubeu/SumStatAnalyses/SUPERGNOVA/calculate.py", line 177, in calculate
    all_dfs.append(_supergnova(cur_bfile, partition, thread, gwas_snps, n1, n2, h1, h2, pheno_corr, pheno_corr_var))
  File "/home/dubeu/SumStatAnalyses/SUPERGNOVA/calculate.py", line 155, in _supergnova
    df = pd.concat(results, ignore_index=True)
  File "/home/dubeu/.local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 281, in concat
    sort=sort,
  File "/home/dubeu/.local/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 329, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

Hello,

I am trying to estimate the local genetic covariance of two traits but I get the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')".

The script runs well until "Calculating phenotypic correlation..." where I get an error with
"lm = linear_model.LinearRegression().fit(pd.DataFrame(l), pd.DataFrame(z_xy), sample_weight=w)" in pheno.py. I am using scikit-learn version 0.23.1

My command is
python SUPERGNOVA/supergnova.py MAFUKB.sumstats.gz MAFNoUH.sumstats.gz --N1 364859 --N2 728324 --bfile ./SUPERGNOVA/data/bfiles/eur_chr@_SNPmaf5 --partition ./SUPERGNOVA/data/partition/[email protected] --out partition.txt.

The heritabilities of my 2 traits are low: 0.014 and 0.010. Is this causing the error? If not, what could I do to address the error?

Cheers,
Doug

no missing values but with "ValueError: array must not contain infs or NaNs"

I am trying to run my own data on SUPERGNOVA,
however, I get the following error when testing:

Preparing files for analysis...
Calculating LD scores...
49351 SNPs included in our analysis...
Calculating heritability...
The genome-wide heritability of the first trait is 0.0196689057368004.
The genome-wide heritability of the second trait is -0.011096917527349101.
Calculating phenotypic correlation...
Traceback (most recent call last):
  File "supergnova.py", line 93, in <module>
    pipeline(parser.parse_args())
  File "supergnova.py", line 63, in pipeline
    pheno_corr, pheno_corr_var = pheno(gwas_snps, ld_scores, N1, N2, h_1, h_2)
  File "F:\SUPERGNOVA_Github\SUPERGNOVA\pheno.py", line 33, in pheno
    lm = linear_model.LinearRegression().fit(pd.DataFrame(l), pd.DataFrame(z_xy), sample_weight=w)
  File "F:\python\lib\site-packages\sklearn\linear_model\_base.py", line 569, in fit
    linalg.lstsq(X, y)
  File "F:\python\lib\site-packages\scipy\linalg\basic.py", line 1146, in lstsq
    a1 = _asarray_validated(a, check_finite=check_finite)
  File "F:\python\lib\site-packages\scipy\_lib\_util.py", line 293, in _asarray_validated
    a = toarray(a)
  File "F:\python\lib\site-packages\numpy\lib\function_base.py", line 489, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

I have checked my data, and there are no missing values in my GWAS summary data, but I don't know how to solve this problem. I would be very grateful if you could help me find out what is causing this.

Thanks.
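
A file can have no empty cells and still contain values that are not finite numbers, such as `inf`, `nan`, or text placeholders like `NA`. The sketch below is a diagnostic only, not part of SUPERGNOVA: the helper name `find_non_finite`, the column names `Z` and `N`, and the tab delimiter are assumptions to adapt to the actual file. Separately, the log above reports a negative heritability estimate for the second trait; whether that is what produces the NaN in the regression is not established in this thread.

```python
import csv
import io
import math

def find_non_finite(handle, columns, delimiter="\t"):
    """Return (line_number, column, raw_value) for every value in `columns`
    that is missing, not numeric, or not finite (inf or nan)."""
    problems = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Line numbers are 1-based and count the header as line 1.
    for line_number, row in enumerate(reader, start=2):
        for column in columns:
            raw = row.get(column)
            try:
                value = float(raw)
            except (TypeError, ValueError):
                problems.append((line_number, column, raw))
                continue
            if not math.isfinite(value):
                problems.append((line_number, column, raw))
    return problems

# Tiny in-memory stand-in for a summary statistics file (made-up values).
demo = io.StringIO(
    "SNP\tZ\tN\n"
    "rs1\t1.5\t1000\n"
    "rs2\tinf\t1000\n"
    "rs3\tNA\t1000\n"
    "rs4\t-0.3\t\n"
)
problems = find_non_finite(demo, ["Z", "N"])
print(problems)
```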

h1 and h2 outputs

Just an FYI, the h1 and h2 outputs in the results file appear to be h1² and h2².
