nloyfer / uxm_deconv

License: Other

Python 96.52% R 3.48%

uxm_deconv's Issues

uxm build - marker file

Thanks for releasing this package. uxm build asks for a marker file. How do you create this from your reference set? And is the atlas built from a subset of loci from this marker file?

`uxm plot` is not recognized as a valid command

Anyone else getting this error when using uxm plot?

$ uxm plot
Invalid command: plot
did you mean plot?
UXM deconvolution tool, version 0.1.0

Usage: uxm <command> [<args>]
run uxm <command> -h for more information
Optional commands:

deconv
build
plot
heatmap
test
binary

How to get Megakaryocyte Markers?

When reviewing the paper "A DNA methylation atlas of normal human cell types," it was noticed that the authors referenced an additional 250 Megakaryocyte markers from the study by Moss et al. ("Megakaryocyte and erythroblast DNA in plasma and platelets (2022)"). However, the steps to obtain DNA methylation data for megakaryocytes were not clearly provided.

Clarifying how the DNA methylation data for the Megakaryocyte markers was obtained would improve the reproducibility of the study.

Considering this, I am raising this issue on GitHub for further discussion and clarification.

Reference for both datasets: Nature and Med papers (VEC)

Thank you very much for the great tool!

I recently found this manuscript from your group:
"The DNA methylome of human vascular endothelium and its use in liquid biopsies"

It complements the reference dataset from the Nature paper with 20 more cell types, right?
Are there any plans to make the bigger reference available?
As far as I understand, only the 39-cell-type reference is available here (in supplemental).

Regarding which reference to use, is it preferable to use the bigger marker set (U250)?
Does the 250 mean that the top 250 unmethylated regions within each cell type were used to generate the reference?

Thanks
Francesc

UXM error: Invalid command

Hello,

Recently I downloaded UXM and installed it successfully.
However, when I use UXM, for example, "uxm deconv", it keeps showing "Invalid command: deconv, did you mean deconv? UXM deconvolution tool, version 0.1.0".

Where did the problem occur? How can I fix it?

The same issue also occurs with wgbstools.
Code:
./wgbstools init_genome -h
Output error:
Invalid command: init_genome
did you mean init_genome?

I hope to receive your assistance as soon as possible. Thank you.

Full Atlas

Hello and thank you for this amazing work!

I was wondering whether the full atlas is available.
I could not find it here or in the GEO repository.

Thanks,
Assaf

NCBI reference genomes cannot make/build atlas

Does UXM support NCBI reference genomes? I have some data aligned to an NCBI reference genome, and we are having trouble building an atlas because NCBI contig names lack the chr prefix (e.g. 1:927288-927423).

The following error occurs: Invalid marker file: test.bed. "name" column must start with "chr"

If we manually add the chr, the deconvolution then fails.

Would you have a solution to this problem?
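One possible direction, sketched here under assumptions and not confirmed by the maintainers, is to normalize contig names so that the marker file uses the chr prefix consistently. The helper names below (to_ucsc_name, rename_columns) are hypothetical, the choice of which columns to rewrite depends on the actual marker file layout, and mapping MT to chrM is an assumption about the target naming scheme. Whether this alone resolves the later deconvolution failure is not established in this thread; presumably the pat files and the reference genome would need the same naming.

```python
def to_ucsc_name(name: str) -> str:
    """Add a "chr" prefix to a contig or region name that lacks one.

    Accepts bare contigs ("1") and region strings ("1:927288-927423").
    Mapping "MT" to "chrM" is an assumption about the target naming scheme.
    """
    contig, sep, rest = name.partition(":")
    if not contig.startswith("chr"):
        contig = "chrM" if contig == "MT" else "chr" + contig
    return contig + sep + rest


def rename_columns(line: str, columns=(0,)) -> str:
    """Apply to_ucsc_name to the selected tab-separated columns of one line."""
    fields = line.rstrip("\n").split("\t")
    for i in columns:
        fields[i] = to_ucsc_name(fields[i])
    return "\t".join(fields)


# Hypothetical marker line: contig in column 0, region-style name in column 3.
print(rename_columns("1\t927288\t927423\t1:927288-927423", columns=(0, 3)))
```

Rewriting both the contig column and any region-style name column in one pass keeps the two in agreement, which is the point of the sketch.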

Own marker regions in deconvolution reference atlas

I am trying to replace the neuron markers in the Atlas.U250.l4.hg38.full.tsv with my own markers. I am able to fill all fields with the correct information, except for the startCpG and endCpG columns because I have not been able to find a file that numbers all CpGs in hg38 by order in the genome. If you are able to, would you please provide the file(s) which contain this information that you used when developing the software? Thanks.
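To illustrate what a CpG numbering involves, here is a minimal sketch that numbers the CpG sites of a single toy sequence in order and maps a genomic interval to a CpG-index range. It is not the tool's own implementation. The function names are hypothetical, positions are treated as 1-based, and the half-open [startCpG, endCpG) convention is an assumption. A genome-wide numbering would additionally need a running offset across chromosomes in a fixed order.

```python
from bisect import bisect_left


def cpg_positions(sequence: str) -> list:
    """Return the 1-based position of the C in every CG dinucleotide, in order."""
    seq = sequence.upper()
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def region_to_cpg_range(positions, start, end):
    """Map a genomic interval [start, end) to a half-open CpG-index range.

    CpG indexes are 1-based ranks within `positions` (assumed convention).
    """
    start_cpg = bisect_left(positions, start) + 1
    end_cpg = bisect_left(positions, end) + 1
    return start_cpg, end_cpg


toy = "ACGTTCGACGGA"          # CpGs at positions 2, 6 and 9
sites = cpg_positions(toy)
print(sites)
print(region_to_cpg_range(sites, 5, 10))  # interval covering the 2nd and 3rd CpG
```

Because the CpG positions are sorted, a binary search gives the index range directly, so the same idea scales to a full chromosome.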

top markers

Great tool!
You uploaded two atlases, with the top 25 and the top 250 markers per cell type.
As I understand it, you selected these based on the difference in block average methylation percentiles between the target tissue samples and all other samples. Is it possible to subset the top-250 marker atlas to, e.g., the top 50 using the atlas file alone, without the actual sample files? If so, how?
If not, could you upload the file with the results from find_marker for the 250 atlas (startCpG, endCpG, target, region, lenCpG, bp, tg_mean, bg_mean, delta_means, delta_quants, delta_maxmin)?
Thanks for the help!
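If a per-marker table with the columns listed above were available, the subsetting itself could look like the following sketch. It groups rows by target and keeps the n highest-scoring rows per group. The function name is hypothetical, and ranking by the absolute value of delta_means is an assumption made for illustration; it may not match the criterion the authors used to order markers.

```python
from collections import defaultdict


def top_markers(rows, n, score_key="delta_means"):
    """Keep the n rows with the largest |score| for each target.

    `rows` is a list of dicts with at least "target" and `score_key` keys.
    Ranking by absolute score is an illustrative assumption.
    """
    by_target = defaultdict(list)
    for row in rows:
        by_target[row["target"]].append(row)

    selected = []
    for target in sorted(by_target):
        ranked = sorted(by_target[target],
                        key=lambda r: abs(float(r[score_key])),
                        reverse=True)
        selected.extend(ranked[:n])
    return selected


# Toy rows with made-up scores, for illustration only.
rows = [
    {"target": "Neuron", "region": "r1", "delta_means": "0.91"},
    {"target": "Neuron", "region": "r2", "delta_means": "0.40"},
    {"target": "Neuron", "region": "r3", "delta_means": "0.75"},
    {"target": "Blood-B", "region": "r4", "delta_means": "0.66"},
]
for row in top_markers(rows, n=2):
    print(row["target"], row["region"])
```

Rows read with csv.DictReader from a tab-separated file would have the same shape, so the function can be applied to a real table without changes.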

Tissue specific deconvolution

Thank you for your dedicated work on uxm.
I ran uxm deconv on WGBS data obtained from human PBMCs and mapped to hg38.
I expected higher proportions for blood cell types, but that was not the case.
Is there a way to restrict uxm deconv to a specific tissue, or do you have any other suggestions?
Note that I used the "supplemental/Atlas.U25.l4.hg38.full.tsv" file to match the genome version used for mapping.

Invalid Input Argument error during deconvolution

I am trying to use uxm deconv with my own data after making sure the program works with the tutorial data. I get the following error:

uxm deconv tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz -o uxm_tmc-110_WGBS.STRIPPED2.csv --debug
wgbstools homog -f --rlen 4 -b /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v.bed /analysis/cloud_projects/research/GL_COVID_TMC_and_Peds/uxm_deconvolution/tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz --prefix /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v -v
Warning: skipping an empty sample tmc-110_WGBS.sorted.dedup.STRIPPED
Invalid input argument
Length of values (2) does not match length of index (36)

I looked at #1 to try and troubleshoot on my own but I am not seeing anything that helps me understand this error, or how to get past it. Here are the commands I used:

# set to the custom ref genome
$ wgbstools set_default_ref --name GRCh38 

Existing references:
=====
hg19
hg38
GRCh38 (default)

$ wgbstools bam2pat /analysis/projects/methyl/tmc-110_WGBS/tmc-110_WGBS.sorted.dedup.bam

# get the markers
$ tail -n +2 Atlas.U25.l4.hg38.full.tsv | cut -f1-5 > markers_Atlas.U25.l4.hg38.full.bed

# restrict to regions found in the atlas
$ wgbstools view -L markers_Atlas.U25.l4.hg38.full.bed tmc-110_WGBS.sorted.dedup.pat.gz --min_len 4 --strip --strict > tmc-110_WGBS.sorted.dedup.STRIPPED.pat

# check the content of the output
$ head tmc-110_WGBS.sorted.dedup.STRIPPED.pat
chr1	24569	CCCCCC	1
chr1	24569	CCCCCCCTCC	1
chr1	24569	CCCCCTTCC	1
chr1	24569	CCCT	1
chr1	24569	CCTCCCCCCCCCCC	1
chr1	24569	CCTCCCCTT	1
chr1	24569	TTTTTT	1
chr1	24572	CCCCCCC.C	1
chr1	24578	CCCCCC	1
chr1	63940	CCCCCC	1

# zip the pat file
$ gzip tmc-110_WGBS.sorted.dedup.STRIPPED.pat

# check the output of homog
$ wgbstools homog -b markers_Atlas.U25.l4.hg38.full.bed T-COV-R-110_WGBS.sorted.dedup.STRIPPED.pat.gz 
$ gunzip -c tmc-110_WGBS.sorted.dedup.STRIPPED.uxm.bed.gz | head 
chr1	1262136	1262432	24569	24584	1	1	7
chr1	2384160	2384745	63940	63960	0	0	25
chr1	5950648	5950918	133709	133715	0	0	10
chr1	5959258	5959335	133878	133884	0	0	8
chr1	7991117	7991683	173499	173512	3	2	15
chr1	9554214	9554463	199896	199905	0	0	12
chr1	10947269	10947539	226986	226992	1	0	9
chr1	11846130	11846567	242812	242825	0	0	14
chr1	14954541	14954609	282749	282754	0	1	14
chr1	20916695	20916823	376835	376841	0	0	8

How can I further troubleshoot?
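As one way to sanity-check the homog output shown above, the sketch below totals the last three columns and counts rows where all three are zero. It assumes, without confirmation from the tool's documentation, that columns 6 to 8 are per-marker U, X and M read counts. The function name is hypothetical.

```python
def summarize_uxm(lines):
    """Total the assumed U/X/M count columns and count all-zero rows."""
    totals = {"U": 0, "X": 0, "M": 0}
    n_rows = 0
    n_empty = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        u, x, m = (int(v) for v in fields[5:8])
        totals["U"] += u
        totals["X"] += x
        totals["M"] += m
        n_rows += 1
        if u == x == m == 0:
            n_empty += 1
    return totals, n_rows, n_empty


# First three rows of the output quoted above.
sample = [
    "chr1\t1262136\t1262432\t24569\t24584\t1\t1\t7",
    "chr1\t2384160\t2384745\t63940\t63960\t0\t0\t25",
    "chr1\t5950648\t5950918\t133709\t133715\t0\t0\t10",
]
print(summarize_uxm(sample))
```

A large share of all-zero rows would point to a coverage or coordinate mismatch, while non-zero totals, as in the rows quoted here, suggest the problem lies elsewhere (for example in which file name the deconvolution step actually reads).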

Nanopore Data

Hello everyone!

My name is Azlan and I am currently analyzing our ONT data. I would like to use uxm and wgbs_tools to further investigate the methylation status in different samples, and wanted to ask whether these tools can be used for ONT. I have read a lot about WGBS data and am curious whether anyone has done deconvolution and methylation analysis for Nanopore data with these tools.

Kind regards,
Azlan

Invalid file error

Hi! I was really interested in using uxm for cell type deconvolution, but unfortunately when running uxm deconv I get the following error:

Invalid file /.../UXM_deconv/tmp_dir/l4/CAG.dedup.ptp2zrjr.uxm.bed.gz

The program then hangs and no more output is generated. What could be the source of this error? Is it possible that I first need to filter the input for uxm? (only the regions in the references)

Here's my current workflow, which starts from deduplicated reads in .bam files:

Sort the reads: samtools sort dedup.bam -o dedup.bam
Convert bam to pat: wgbstools bam2pat dedup.bam
Deconvolution: uxm deconv dedup.pat.gz -o deconv.csv

Thanks!

Specify "--thresholds" in uxm build

Edit: my comment was deleted, rewriting.

Hi,

I'm trying to run uxm build with the argument -l 2, and I'm getting the following message:

for rlen==2, --thresholds must be specified'

However, I can't find --thresholds in the command's list of arguments.

@nloyfer can you please help me with this?

Thank you!

homog_mem.py debug flag and max threading uxm build

Hi,
When running uxm build, I noticed that it produces an error unless the following is added to homog_mem.py:

parser.add_argument('--debug', '-d', action='store_true')

Perhaps this was missed in the last commit.

I also noticed that building the atlas fails with a high number of threads. It might be worth adding a warning that it can crash if too many cores are used: -@ 50 worked, while -@ 100 failed every time (at least for me).

Deconvolution issues

While I can get deconvolution to work for the test data you provided, whenever I attempt it with another dataset (like the one provided in https://github.com/nloyfer/wgbs_tools/tree/master/tutorial or my own data) I get the following errors:

infered from atlas name that rlen=4
[ mcD_2021_011 ] no memoization found
[ mcD_2021_012 ] no memoization found
[ wt homog ] [ mcD_2021_011 ] WARNING: all zeros!
[ wt homog ] [ mcD_2021_012 ] WARNING: all zeros!
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_011.pat.gz - all 900 values are zero. memoization is not updated, to be safe
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_012.pat.gz - all 900 values are zero. memoization is not updated, to be safe
[ mcD_2021_014 ] no memoization found
[ mcD_2021_018 ] no memoization found
[ wt homog ] [ mcD_2021_014 ] WARNING: all zeros!
[ wt homog ] [ mcD_2021_018 ] WARNING: all zeros!
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_018.pat.gz - all 900 values are zero. memoization is not updated, to be safe
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_014.pat.gz - all 900 values are zero. memoization is not updated, to be safe
Warning: skipping an empty sample mcD_2021_011
Warning: skipping an empty sample mcD_2021_012
Warning: skipping an empty sample mcD_2021_018
Warning: skipping an empty sample mcD_2021_014
Invalid input argument
Length of values (2) does not match length of index (36)

This is using pat.gz files filtered to contain only the atlas sites, as described in your tutorial, and they do contain a majority of the sites. Any help with figuring out what is going wrong would be greatly appreciated.
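When every value comes back as zero, one thing worth checking independently is whether any reads actually cover enough CpGs inside the marker regions. The sketch below is a diagnostic idea, not the tool's own logic. It counts, per marker, the reads with at least rlen observed CpGs inside the marker's CpG-index range. It assumes half-open [startCpG, endCpG) ranges and pat patterns in which C and T are observed calls and "." is a missing site. The function name is hypothetical.

```python
def count_informative_reads(markers, reads, rlen=4):
    """Count reads with at least `rlen` observed CpGs inside each marker.

    markers: list of (start_cpg, end_cpg), half-open CpG-index ranges (assumed).
    reads:   iterable of (cpg_start, pattern, count) taken from pat records.
    """
    counts = [0] * len(markers)
    for cpg_start, pattern, n in reads:
        read_end = cpg_start + len(pattern)
        for i, (m_start, m_end) in enumerate(markers):
            lo = max(cpg_start, m_start)
            hi = min(read_end, m_end)
            if hi <= lo:
                continue  # read does not overlap this marker
            inside = pattern[lo - cpg_start:hi - cpg_start]
            observed = sum(1 for ch in inside if ch in "CT")
            if observed >= rlen:
                counts[i] += n
    return counts


# CpG ranges and pat records of the kind quoted earlier on this page.
markers = [(24569, 24584), (63940, 63960)]
reads = [
    (24569, "CCCCCC", 1),
    (24569, "CCCT", 1),
    (24572, "CCCCCCC.C", 1),
    (63940, "CCCCCC", 1),
]
print(count_informative_reads(markers, reads))
```

If this count is zero for most markers, the pat file and the atlas probably disagree on CpG indexing (for example a different reference build); if it is clearly non-zero, the all-zeros warning more likely comes from a later step.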