nloyfer / uxm_deconv
License: Other
Thanks for releasing this package. uxm build asks for a marker file. How do you create this from your reference set? And is the atlas built from a subset of loci from this marker file?
Is anyone else getting this error when using uxm plot?
$ uxm plot
Invalid command: plot
did you mean plot?
UXM deconvolution tool, version 0.1.0
Usage: uxm <command> [<args>]
run uxm <command> -h for more information
Optional commands:
deconv
build
plot
heatmap
test
binary
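The odd part of this error is that the suggested command is the one that was typed, which hints that the name is recognized but something fails afterwards. A minimal sketch of how a CLI dispatcher could produce exactly this output, assuming (not confirmed from the uxm source) that each subcommand is imported as a module and that any import failure, such as a missing dependency, is reported as an invalid command with a difflib suggestion. The names dispatch, COMMANDS and broken_importer are hypothetical.

```python
import difflib
import importlib

# Hypothetical list of subcommands, copied from the usage text above.
COMMANDS = ["deconv", "build", "plot", "heatmap", "test", "binary"]


def dispatch(command, importer=importlib.import_module):
    """Return the imported subcommand module, or an error message string.

    Any exception raised while importing is treated as an "invalid command",
    so a valid name whose module fails to import is suggested back to itself.
    """
    try:
        return importer(command)
    except Exception:
        suggestion = difflib.get_close_matches(command, COMMANDS, n=1)
        message = f"Invalid command: {command}"
        if suggestion:
            message += f"\ndid you mean {suggestion[0]}?"
        return message


def broken_importer(name):
    # Simulates a subcommand module that imports a package which is not installed.
    raise ModuleNotFoundError("No module named 'some_missing_dependency'")


print(dispatch("plot", importer=broken_importer))
```

If the real tool behaves like this, printing the caught exception inside the except branch would reveal the underlying error (for example, the name of the missing package) instead of the misleading suggestion.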
When reviewing the paper "A DNA methylation atlas of normal human cell types," it was noticed that the authors referenced an additional 250 Megakaryocyte markers from the study by Moss et al. ("Megakaryocyte and erythroblast DNA in plasma and platelets (2022)"). However, the steps to obtain DNA methylation data for megakaryocytes were not clearly provided.
Clarifying how the DNA methylation data for the Megakaryocyte markers was obtained would improve the reproducibility of the study.
Considering this, I am raising this issue on GitHub for further discussion and clarification.
Thank you very much for the great tool!
I recently found this manuscript from your group:
"The DNA methylome of human vascular endothelium and its use in liquid biopsies"
It complements the reference dataset from the Nature paper with 20 more cell types, right?
Are there any plans to make the larger reference available?
As far as I can tell, the supplemental folder here contains only the 39-cell-type reference.
Regarding which reference to use, is the larger marker set (U250) preferable?
Does "250" mean that the top 250 unmethylated regions for each cell type were used to generate the reference?
Thanks
Francesc
Hello,
Recently I downloaded UXM and installed it successfully.
However, when I use UXM, for example, "uxm deconv", it keeps showing "Invalid command: deconv, did you mean deconv? UXM deconvolution tool, version 0.1.0".
Where did the problem occur? How can I fix it?
The same issue also occurs with wgbstools.
Code:
./wgbstools init_genome -h
Output error:
Invalid command: init_genome
did you mean init_genome?
I hope to receive your assistance as soon as possible. Thank you.
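Because the same symptom appears in both uxm and wgbstools, one thing worth ruling out is a Python dependency that fails to import in the environment the tools run in. A minimal sketch that tries each import and records the error; the package list is an assumption, not taken from either project's requirements, and failing_imports is a hypothetical helper.

```python
import importlib

# Assumed dependency list; adjust it to the project's actual requirements.
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib"]


def failing_imports(names):
    """Map each package that cannot be imported to a short error description."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised while the package initializes
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    problems = failing_imports(REQUIRED)
    for name, error in problems.items():
        print(f"{name}: {error}")
    if not problems:
        print("all listed packages import cleanly")
```

Actually importing (rather than only locating the package) also catches packages that are installed but broken, for example by a binary incompatibility.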
Hello and thank you for this amazing work!
I was wondering whether the full atlas is available.
I could not find it here or in the GEO repository.
Thanks,
Assaf
Does UXM support NCBI reference genomes? I have some data aligned to an NCBI reference genome, and we are having issues building an atlas because NCBI genomes lack the "chr" prefix (e.g. 1:927288-927423).
The following error occurs: Invalid marker file: test.bed. "name" column must start with "chr"
If we add the "chr" prefix manually, the deconvolution then fails.
Would you have a solution to this problem?
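A minimal sketch of adding the "chr" prefix to the chromosome column and to region names of the form 1:927288-927423 in a marker BED file. It assumes UCSC-style names (chr1, chrX, chrM) are what the marker checks expect; the MT to chrM mapping and the helper names are assumptions. Renaming alone may not be enough: the CpG indices in the marker file and in the pat files presumably have to come from the same reference genome, so a rename on one side only could still break deconvolution.

```python
# Assumed mapping for contigs whose UCSC name differs by more than the prefix.
SPECIAL = {"MT": "chrM"}


def ucsc_chrom(name):
    """Convert an NCBI/Ensembl-style chromosome name to UCSC style."""
    if name.startswith("chr"):
        return name
    return SPECIAL.get(name, "chr" + name)


def add_chr_prefix(bed_line):
    """Rename the chrom column and any "chrom:start-end" field in one BED line.

    Header/comment lines (starting with "#") are returned unchanged.
    """
    line = bed_line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    fields[0] = ucsc_chrom(fields[0])
    for i, field in enumerate(fields[1:], start=1):
        chrom, sep, rest = field.partition(":")
        if chrom and sep and "-" in rest:  # looks like a region name
            fields[i] = ucsc_chrom(chrom) + ":" + rest
    return "\t".join(fields)
```

Applied line by line (for example while copying test.bed to a new file), this leaves all coordinates untouched and only rewrites names.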
I am trying to replace the neuron markers in the Atlas.U250.l4.hg38.full.tsv with my own markers. I am able to fill all fields with the correct information, except for the startCpG and endCpG columns because I have not been able to find a file that numbers all CpGs in hg38 by order in the genome. If you are able to, would you please provide the file(s) which contain this information that you used when developing the software? Thanks.
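While waiting for the official file, the numbering idea can be sketched as follows. Assumptions: CpGs are numbered from 1, consecutively across chromosomes in the order they are supplied; each CpG is located by the 1-based position of its C; regions are 0-based half-open BED intervals; and endCpG is exclusive. None of these is confirmed here, and wgbstools may already keep such an index as part of its reference setup, so validate the sketch by recomputing a few existing atlas rows. Chromosome order matters most: a different order shifts every index after the first chromosome. index_cpgs and region_to_cpg_range are hypothetical helpers.

```python
import bisect


def index_cpgs(genome):
    """Number every CpG; genome maps chrom -> sequence, in genome order.

    Returns chrom -> (index of its first CpG, sorted 1-based C positions).
    """
    sites = {}
    next_index = 1
    for chrom, seq in genome.items():  # dicts preserve insertion order
        s = seq.upper()
        positions = [i + 1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
        sites[chrom] = (next_index, positions)
        next_index += len(positions)
    return sites


def region_to_cpg_range(sites, chrom, start, end):
    """Map a 0-based half-open BED region to (startCpG, endCpG), endCpG exclusive."""
    first_index, positions = sites[chrom]
    lo = bisect.bisect_left(positions, start + 1)  # first C at or after start+1
    hi = bisect.bisect_right(positions, end)       # first C after end
    return first_index + lo, first_index + hi
```

For hg38 the sequences would be streamed from the FASTA one chromosome at a time rather than held in a single dict.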
Great tool!
You uploaded two atlases with the top 25 and top 250 markers per cell type.
As I understand it, you selected these based on the difference in block-average methylation percentiles between the target tissue samples and all remaining samples. So I was wondering whether it is possible to subset the top-250 marker atlas to, e.g., the top 50 using the atlas file alone, without the actual sample files, and if so, how?
If not, would it be possible for you to upload the find_marker results for the 250-marker atlas (startCpG, endCpG, target, region, lenCpG, bp, tg_mean, bg_mean, delta_means, delta_quants, delta_maxmin)?
Thanks for the help!
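Until the find_marker output is available, one rough approximation is to rank markers using the atlas values themselves. A minimal sketch, assuming the atlas TSV has a "target" column whose value matches one of the per-cell-type column headers, and that those columns hold numeric values where the target is expected to be highest. The margin used here (target value minus the highest other cell type) is a stand-in, not the percentile-based score used to build the atlas, so the resulting top 50 may differ from the authors' top 50. read_atlas and top_n_markers are hypothetical helpers.

```python
import csv
from collections import defaultdict


def read_atlas(path):
    """Load a tab-separated atlas into a list of dicts plus its header."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        rows = list(reader)
        return rows, reader.fieldnames


def top_n_markers(rows, cell_types, n):
    """Keep the n markers per target with the largest target-vs-background margin.

    Margin = value in the target's own column minus the highest value among the
    other cell types (a proxy only).
    """
    by_target = defaultdict(list)
    for row in rows:
        target = row["target"]
        tg = float(row[target])
        bg = max(float(row[c]) for c in cell_types if c != target)
        by_target[target].append((tg - bg, row))
    selected = []
    for scored in by_target.values():
        scored.sort(key=lambda pair: pair[0], reverse=True)  # compare scores only
        selected.extend(row for _, row in scored[:n])
    return selected
```

Usage would be along the lines of rows, header = read_atlas("Atlas.U250.l4.hg38.full.tsv"), then top_n_markers(rows, cell_types, 50) with cell_types set to the per-cell-type column names. Missing values such as "NA" would need handling before float().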
Thank you for your dedicated work on uxm.
I've run uxm deconv on my WGBS data, obtained from human PBMCs and mapped to hg38.
I expected higher proportions of blood cell types, but that was not the case.
Is there a way to restrict uxm deconv to specific tissues, or do you have any other suggestions?
Note that I used the "supplemental/Atlas.U25.l4.hg38.full.tsv" file to match the genome version my data were mapped to.
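Whether uxm deconv itself can restrict the reference is not stated on this page. As a generic illustration only, restricting a deconvolution to some cell types amounts to fitting against a subset of atlas columns. The sketch assumes a non-negative least squares (NNLS) formulation, solved with simple cyclic coordinate descent; it is not uxm's implementation, and nnls_cd, restrict and to_proportions are hypothetical helpers. Note that dropping columns forces all signal onto the remaining cell types, so it can hide a mismatch between sample and reference rather than fix it.

```python
def nnls_cd(A, b, iters=500):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows (markers x cell types); b holds the sample's marker values.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = list(b)  # b - A x, with x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(iters):
        for j in range(n):
            if col_sq[j] == 0:
                continue
            # Exact minimiser along coordinate j, clipped at zero.
            grad = sum(A[i][j] * residual[i] for i in range(m))
            new = max(0.0, x[j] + grad / col_sq[j])
            delta = new - x[j]
            if delta:
                for i in range(m):
                    residual[i] -= A[i][j] * delta
                x[j] = new
    return x


def restrict(A, columns, keep):
    """Keep only the reference columns named in keep, in that order."""
    idx = [columns.index(c) for c in keep]
    return [[row[j] for j in idx] for row in A]


def to_proportions(x):
    """Rescale non-negative coefficients so they sum to 1."""
    total = sum(x)
    return [v / total for v in x] if total > 0 else x
```

Comparing the residual of a restricted fit with that of the full fit gives a rough sense of how much of the signal the kept cell types can explain.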
I am trying to use uxm deconv with my own data, after making sure the program works with the tutorial data. I get the following error:
uxm deconv tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz -o uxm_tmc-110_WGBS.STRIPPED2.csv --debug
wgbstools homog -f --rlen 4 -b /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v.bed /analysis/cloud_projects/research/GL_COVID_TMC_and_Peds/uxm_deconvolution/tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz --prefix /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v -v
Warning: skipping an empty sample tmc-110_WGBS.sorted.dedup.STRIPPED
Invalid input argument
Length of values (2) does not match length of index (36)
I looked at #1 to try and troubleshoot on my own but I am not seeing anything that helps me understand this error, or how to get past it. Here are the commands I used:
# set to the custom ref genome
$ wgbstools set_default_ref --name GRCh38
Existing references:
=====
hg19
hg38
GRCh38 (default)
$ wgbstools bam2pat /analysis/projects/methyl/tmc-110_WGBS/tmc-110_WGBS.sorted.dedup.bam
# get the markers
$ tail -n +2 Atlas.U25.l4.hg38.full.tsv | cut -f1-5 > markers_Atlas.U25.l4.hg38.full.bed
# restrict to regions found in the atlas
$ wgbstools view -L markers_Atlas.U25.l4.hg38.full.bed tmc-110_WGBS.sorted.dedup.pat.gz --min_len 4 --strip --strict > tmc-110_WGBS.sorted.dedup.STRIPPED.pat
# check the content of the output
$ head tmc-110_WGBS.sorted.dedup.STRIPPED.pat
chr1 24569 CCCCCC 1
chr1 24569 CCCCCCCTCC 1
chr1 24569 CCCCCTTCC 1
chr1 24569 CCCT 1
chr1 24569 CCTCCCCCCCCCCC 1
chr1 24569 CCTCCCCTT 1
chr1 24569 TTTTTT 1
chr1 24572 CCCCCCC.C 1
chr1 24578 CCCCCC 1
chr1 63940 CCCCCC 1
# zip the pat file
$ gzip tmc-110_WGBS.sorted.dedup.STRIPPED.pat
# check the output of homog
$ wgbstools homog -b markers_Atlas.U25.l4.hg38.full.bed T-COV-R-110_WGBS.sorted.dedup.STRIPPED.pat.gz
$ gunzip -c tmc-110_WGBS.sorted.dedup.STRIPPED.uxm.bed.gz | head
chr1 1262136 1262432 24569 24584 1 1 7
chr1 2384160 2384745 63940 63960 0 0 25
chr1 5950648 5950918 133709 133715 0 0 10
chr1 5959258 5959335 133878 133884 0 0 8
chr1 7991117 7991683 173499 173512 3 2 15
chr1 9554214 9554463 199896 199905 0 0 12
chr1 10947269 10947539 226986 226992 1 0 9
chr1 11846130 11846567 242812 242825 0 0 14
chr1 14954541 14954609 282749 282754 0 1 14
chr1 20916695 20916823 376835 376841 0 0 8
How can I further troubleshoot?
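One difference worth noting: the debug line shows uxm deconv calling wgbstools homog with --rlen 4, while the manual homog call above relies on the default read length. Running homog with the same settings and summarising its output would show whether the sample really is empty under those settings. A minimal sketch, assuming the last three columns of each uxm.bed.gz row are per-marker read counts (U, X and M); summarize_uxm and summarize_uxm_file are hypothetical helpers.

```python
import gzip


def summarize_uxm(lines):
    """Summarise homog output rows: chr start end startCpG endCpG U X M."""
    markers = covered = reads = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        counts = [int(v) for v in fields[-3:]]
        markers += 1
        reads += sum(counts)
        if sum(counts) > 0:
            covered += 1
    return {"markers": markers, "covered": covered, "reads": reads}


def summarize_uxm_file(path):
    """Open a gzipped uxm.bed file and summarise it."""
    with gzip.open(path, "rt") as fh:
        return summarize_uxm(fh)
```

A result with reads equal to 0 would be consistent with the "skipping an empty sample" warning; a non-zero result would point elsewhere, for example at the reference genome set as default.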
Hello everyone!
My name is Azlan and I am currently analyzing our ONT data. I would like to use uxm and wgbs_tools to further investigate the methylation status of different samples, and wanted to ask whether these tools can be used with ONT data. I have read a lot about WGBS data and am curious whether anyone has done deconvolution and methylation analysis of Nanopore data with these tools.
Kind regards,
Azlan
Hi! I was really interested in using uxm for cell type deconvolution, but unfortunately when running uxm deconv I get the following error:
Invalid file /.../UXM_deconv/tmp_dir/l4/CAG.dedup.ptp2zrjr.uxm.bed.gz
The program then hangs and no more output is generated. What could be the source of this error? Is it possible that I first need to filter the input for uxm? (only the regions in the references)
Here's my current workflow, which starts from deduplicated reads in .bam files:
samtools sort dedup.bam -o dedup.bam
wgbstools bam2pat dedup.bam
uxm deconv dedup.pat.gz -o deconv.csv
Thanks!
Edit: my comment was deleted, so I'm rewriting it.
Hi,
I'm trying to run uxm build with the argument -l 2, and I'm getting the following message:
for rlen==2, --thresholds must be specified
However, I can't find --thresholds in the command's list of arguments.
@nloyfer can you please help me with this?
Thank you!
Hi,
running uxm build, I noticed that it will produce an error unless the following is added to homog_mem.py:
parser.add_argument('--debug', '-d', action='store_true')
Perhaps this was missed in the last commit.
I also noticed that building the atlas fails with a high number of threads. Maybe add a warning that it might crash if too many cores are used: -@ 50 worked, while -@ 100 failed every time (at least for me).
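The homog_mem.py fix reported above amounts to registering a --debug/-d flag. Assuming the script parses its options with argparse (which the add_argument line suggests) and that uxm build passes --debug to it (not verified here), the sketch shows why the missing flag is fatal: argparse rejects unknown options and exits with status 2. The parser below is illustrative, not the real homog_mem.py.

```python
import argparse


def build_parser(with_debug=True):
    """Illustrative parser; only the --debug/-d line mirrors the reported fix."""
    parser = argparse.ArgumentParser(prog="homog_mem.py")
    if with_debug:
        parser.add_argument('--debug', '-d', action='store_true')
    return parser


args = build_parser().parse_args(['--debug'])
print(args.debug)  # True

try:
    build_parser(with_debug=False).parse_args(['--debug'])
except SystemExit as exc:
    # argparse prints "unrecognized arguments: --debug" to stderr before exiting.
    print("exit status:", exc.code)
```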
While I can get deconvolution to work on the test data you provided, whenever I attempt it with another dataset (such as the one provided in https://github.com/nloyfer/wgbs_tools/tree/master/tutorial, or my own data) I get the following errors:
infered from atlas name that rlen=4
[ mcD_2021_011 ] no memoization found
[ mcD_2021_012 ] no memoization found
[ wt homog ] [ mcD_2021_011 ] WARNING: all zeros!
[ wt homog ] [ mcD_2021_012 ] WARNING: all zeros!
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_011.pat.gz - all 900 values are zero. memoization is not updated, to be safe
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_012.pat.gz - all 900 values are zero. memoization is not updated, to be safe
[ mcD_2021_014 ] no memoization found
[ mcD_2021_018 ] no memoization found
[ wt homog ] [ mcD_2021_014 ] WARNING: all zeros!
[ wt homog ] [ mcD_2021_018 ] WARNING: all zeros!
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_018.pat.gz - all 900 values are zero. memoization is not updated, to be safe
WARNING: possibly failed in /gpfs/gsfs12/users/NHLBI_IDSS/projects/NHLBI-4/patDeconv/mcD_2021_014.pat.gz - all 900 values are zero. memoization is not updated, to be safe
Warning: skipping an empty sample mcD_2021_011
Warning: skipping an empty sample mcD_2021_012
Warning: skipping an empty sample mcD_2021_018
Warning: skipping an empty sample mcD_2021_014
Invalid input argument
Length of values (2) does not match length of index (36)
This is using pat.gz files filtered to contain only the atlas sites, as described in your tutorial, and they do contain a majority of the sites. Any help figuring out what is going wrong would be greatly appreciated.
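Since the log reports rlen=4 inferred from the atlas name and every sample comes back as all zeros, it may help to count how many reads in a filtered pat file could plausibly contribute to a marker. A minimal sketch, assuming: pat rows are chr, startCpG, pattern, count, with C/T as observed CpGs and "." as missing; CpG indices are genome-wide, so the chromosome column can be ignored; markers are non-overlapping half-open [startCpG, endCpG) ranges; and a read only counts if it observes at least rlen CpGs inside one marker. That last rule is our reading of --rlen, not taken from the uxm source. count_informative_reads is a hypothetical helper.

```python
import bisect


def count_informative_reads(pat_lines, markers, rlen=4):
    """Sum read counts that observe >= rlen CpGs inside at least one marker."""
    markers = sorted(markers)
    starts = [s for s, _ in markers]
    total = 0
    for line in pat_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        first, pattern, count = int(fields[1]), fields[2], int(fields[3])
        last = first + len(pattern) - 1
        # Walk back from the last marker starting at or before the read's last CpG.
        k = bisect.bisect_right(starts, last) - 1
        while k >= 0 and markers[k][1] > first:
            m_start, m_end = markers[k]
            lo, hi = max(first, m_start), min(last + 1, m_end)
            observed = sum(1 for c in pattern[lo - first:hi - first] if c in "CT")
            if observed >= rlen:
                total += count
                break
            k -= 1
    return total
```

If this returns 0 for a sample with rlen=4 but a healthy number with a smaller rlen, the filtered reads may simply be too short inside the markers, rather than missing.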