caleblareau / maegatk
Mitochondrial Alteration Enrichment and Genome Analysis Toolkit
License: MIT License
Hello,
Is there a tutorial describing a typical workflow for clonal analysis from 10x data (after mitochondrial read amplification following Miller et al., Nat. Biotech.)?
Thanks, Chris
Hi,
I was wondering about the relationship between maegatk and maegatk-indel: is the output of the former the input of the latter?
In addition, after installing the package with 'pip3 install maegatk', I can run maegatk but not maegatk-indel (bash: maegatk-indel: command not found...). What should I do to make it work?
Thank you very much!
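A minimal diagnostic sketch for the "command not found" question above, assuming the package was installed with pip into the active environment. It lists the command names that the installed distribution registers and checks whether each command is on PATH. The helper name console_scripts is hypothetical; whether maegatk registers a maegatk-indel command at all is exactly what this would reveal, not something the sketch assumes.

```python
import shutil
from importlib import metadata


def console_scripts(dist_name):
    """Return the command names a pip-installed distribution registers."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # Distribution is not installed in this environment.
        return []
    return sorted(ep.name for ep in dist.entry_points if ep.group == "console_scripts")


if __name__ == "__main__":
    print("registered commands:", console_scripts("maegatk"))
    for cmd in ("maegatk", "maegatk-indel"):
        print(cmd, "->", shutil.which(cmd) or "not on PATH")
```

If maegatk-indel does not appear among the registered commands, the installed release probably does not ship it, and installing from the repository source may be needed instead.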
Hi. I installed maegatk using pip in a conda environment. I also cloned the repository to a separate folder.
When I run the test, I encounter the following error.
AttributeError in line 30 of /home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/maegtk/bin/snake/Snakefile.maegtk.Gather:
'InputFiles' object has no attribute 'depths'
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2330, in run_wrapper
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/maegtk/bin/snake/Snakefile.maegtk.Gather", line 30, in __rule_make_depth_table
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 569, in _callback
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/concurrent/futures/thread.py", line 57, in run
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper
I used the following command.
maegtk bcall -i data/test_maester.bam -o test_maester -z
I'll attach the complete log file to this post.
logs.txt
Hello.
I tried to reproduce the MAESTER publication, but I ran into a "too many open files" limit on a scRNA-seq dataset.
My steps are:
1-SRR15598773_chrM_bam.bam
GRCh38_chrM.fa
maegatk bcall -i 1-SRR15598773_chrM_bam.bam -o maegatk_outs1 -n SRR15598773-scrna -g GRCh38_chrM.fa -c 12 -bt CB -qc -ub UB -jm 60000m -mr 3 -z
However, there were some errors when splitting the bam files.
Traceback (most recent call last):
File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 64, in <module>
with multi_file_manager(bambcfiles) as fopen:
File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
return next(self.gen)
File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 56, in multi_file_manager
files = [pysam.AlignmentFile(file, "wb", template = temp) for file in files]
File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 56, in <listcomp>
files = [pysam.AlignmentFile(file, "wb", template = temp) for file in files]
File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
File "pysam/libcalignmentfile.pyx", line 921, in pysam.libcalignmentfile.AlignmentFile._open
OSError: [Errno 24] could not open alignment file `/home/michael/maegatk_outs/temp/barcoded_bams/TTACTGTTCGGAGTGA-1.bam`: Too many open files
[E::hts_open_format] Failed to open file "/home/michael/maegatk_outs/temp/barcoded_bams/GCCAACGAGGTAGACC-1.bam" : Too many open files
I've already tried increasing the system's open file limit to 1048576, but it still didn't work.
Any suggestions on how to filter the cell barcodes or any ways to solve the issues would be appreciated.
Michael
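One thing worth checking for the error above: the traceback shows one output BAM being opened per barcode, and the limit that applies is the per-process soft limit (what ulimit -n reports in the launching shell), which can differ from a system-wide setting. The sketch below reads that limit from inside Python and shows how a barcode list could be split into batches that fit under it. plan_batches and its reserve parameter are hypothetical illustrations and are not part of maegatk.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def plan_batches(barcodes, soft_limit, reserve=64):
    """Split barcodes into batches small enough to keep one file open per barcode.

    `reserve` leaves descriptors free for stdin/stdout, the input BAM, etc.
    """
    size = soft_limit - reserve
    if size < 1:
        raise ValueError("soft limit too low to open any output files")
    barcodes = list(barcodes)
    return [barcodes[i:i + size] for i in range(0, len(barcodes), size)]


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    demo = [f"BC{i}" for i in range(10)]
    print(plan_batches(demo, soft_limit=68))  # batches of 4, 4 and 2 barcodes
```

If the soft limit printed here is much lower than 1048576, the raised limit is not reaching the process that runs maegatk. Restricting the run to a barcode whitelist (the -b option used in other commands on this page) also reduces the number of files opened at once.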
Hi,
I followed the steps to analyze MAESTER data as stated here https://github.com/petervangalen/MAESTER-2021
Once I merged the whole BAM file of the scRNASeq dataset (reads for all chromosomes) and the BAM file of the MAESTER dataset (only reads for chrM) I ran maegatk this way:
maegatk bcall -b HQ_CBs.csv -c 20 -o NUEVO_maegatk -mr 3 -i GSM5534703_K562-BT142.bam -n sGSM5534703_K562-BT142 -z -ub UB -bt CB -so
I saw similar peaks in the supplementary information of the MAESTER paper (Figure 7a https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-022-01210-8/MediaObjects/41587_2022_1210_MOESM1_ESM.pdf) indicating that those peaks belonged to the scRNASeq data so I must be missing something related to the processing of the MAESTER data but so far I have not found the problem.
Here is what my BAM files look like before merging both files to apply maegatk.
scRNASeq BAM (SRA identifier SRR15598773):
SRR15598773.lite.1.127471761 0 chr1 10019 1 91M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCA ??????????????????????????????????????????????????????????????????????????????????????????? NH:i:4 HI:i:1 AS:i:87 nM:i:1 RG:Z:scRNASeq:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:ATCTTCATCCATCAGA CY:Z:???????????????? CB:Z:ATCTTCATCCATCAGA-1 UR:Z:TTTCTCTTAGTG UY:Z:???????????? UB:Z:TTTCTCTTAGTG
MAESTER BAM (SRA identifier SRR15598774):
SRR15598774_6928992 16 chrM 1 255 59S181M * 0 0 CTGACGGGCCATCACGCCCACACCGCCCCCACGTTCCCCTGAAATCAGACCTCCCGAGGGATCACAGGTCTATCACCCTATTAACCCCTCACGGGAGCTCTCCATGCATGTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTA ,,,,,F:FF:,FF,,,F:,F,:,,F:::,F,FFF,,::FF::,,,,,F,F,FF,FFF,,F,FF:F,F,,:,FFF:FFF,,F,,,FF,FF:,FF:F:FF::FFF,,FF:F,FFF,F,FFFFFFFFFFFFFF:FFF:,FFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFF,FF,FFF:FFFFFFFFFFF,F,:,F,FFFF:FFFFFF,FFF:FFF:F NH:i:1 HI:i:1 AS:i:173 nM:i:3 CB:Z:TGGAGGATCTTGTTAC-1 UB:Z:CACTTATTGTTA
I also opened the merged BAM file from scRNASeq and MAESTER to check the coverage along the chrM. chrM is very well covered. Here is the IGV image.
I also include the content of the final directory.
final.zip
I'm not sure whether the problem comes from maegatk or from a script in the MAESTER pipeline, so I also opened an issue in the MAESTER GitHub repository: petervangalen/MAESTER-2021#6
Any help would be much appreciated.
Regards,
Sheila
Hi,
Would it be possible for you to provide a container with a full environment to get the pipeline running, please? A small example dataset would also help a lot.
Moreover, the pipeline parameters, flags, etc. are not documented yet.
Thank you very much!
Hi,
Is there any documentation describing the files generated by maegatk bcall, especially the files under the final directory, and what each column means in each file?
Thanks!
Hi. I ran maegatk for 4 samples. For three I got this error:
/bin/sh: /usr/bin/ls: Argument list too long
ERROR: Could not import any samples from the user specification; check flags, logs and input configuration; QUITTING
The tool split the bam file as it should, but did not create any log files or folders.
I tried maegtk and maegatk. Both failed with the same error. When I tried mgatk, it worked fine.
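"Argument list too long" is the shell reporting that a command line exceeded the kernel's size limit, which typically happens when a glob such as *.bam expands to a very large number of per-barcode files. The sketch below is only an illustration of that mechanism: it prints the limit and enumerates BAM files inside Python, where no command line is built. list_bams is a hypothetical helper, and that maegatk's internal ls call is the failing step is an assumption based on the error text.

```python
import os


def list_bams(folder):
    """List *.bam files in `folder` without passing them through a shell command line."""
    with os.scandir(folder) as entries:
        return sorted(e.path for e in entries if e.is_file() and e.name.endswith(".bam"))


if __name__ == "__main__":
    import tempfile

    # Maximum combined size of arguments and environment for a new process.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))

    with tempfile.TemporaryDirectory() as tmp:
        for name in ("AAAC-1.bam", "TTTG-1.bam", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print([os.path.basename(p) for p in list_bams(tmp)])
```

Counting the files in the split-BAM folder for the failing samples against the one sample that worked would show whether the number of barcodes is the difference.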
Hi
I am trying to run maegatk on my dataset. I have installed all the modules required as stated in the tutorial.
java, bwa, bedtools, freebayes, R (4.1.2, with data.table, Matrix, GenomicRanges, SummarizedExperiment). I am running it on python 3.7
I have tried to run the program on both the test dataset, and my own dataset using the commands below:
maegatk bcall --input $bam -o $resul_out -c $ncores -b $barcodes -mr $minReads -z
I keep getting the same error in both instances:
Mon Mar 14 15:46:24 AEST 2022: maegatk v0.1.1
Mon Mar 14 15:46:24 AEST 2022: Found bam file: Data/test_maester.bam for genotyping.
Mon Mar 14 15:46:24 AEST 2022: Will determine barcodes with at least: 100 mitochondrial reads.
Mon Mar 14 15:46:24 AEST 2022: User specified mitochondrial genome matches .bam file
Mon Mar 14 15:46:30 AEST 2022: Finished determining/splitting barcodes for genotyping.
Mon Mar 14 15:46:31 AEST 2022: Genotyping samples with 24 threads
Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted
I have attached a list of all the files generated (using ls -lRh $result_folder), along with scatter.log and gather.log:
test_result_file_list.txt
maegatk.snakemake_scatter.log.txt
maegatk.snakemake_gather.log.txt
Any help would be greatly appreciated.
Thanks
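A rough way to see what the checkGrep failure above is complaining about, under stated assumptions: the message suggests the R import step expects exactly one file matching each pattern in the folder it reads. Only ".A.txt" appears in the error itself; the other patterns below are guessed from the maegatk.C/G/T/coverage.txt.gz output names quoted elsewhere on this page. Plain substring matching is used here as an approximation of R's regex grep, and both function names are hypothetical.

```python
import os

# ".A.txt" comes from the error message; the remaining patterns are assumptions.
PATTERNS = (".A.txt", ".C.txt", ".G.txt", ".T.txt", ".coverage.txt")


def count_matches(folder, patterns=PATTERNS):
    """Count files in `folder` whose name contains each pattern (plain substring)."""
    names = os.listdir(folder)
    return {p: sum(p in n for n in names) for p in patterns}


def problems(counts):
    """Return the patterns that do not match exactly one file."""
    return {p: c for p, c in counts.items() if c != 1}


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    bad = problems(count_matches(folder))
    print("all patterns match exactly one file" if not bad else f"check these: {bad}")
```

A count of 0 points to an upstream step that never wrote its output; a count above 1 points to leftover files from an earlier run in the same output folder.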
I was trying to run maegatk. It finished splitting the BAM file and started genotyping, but soon crashed with the following traceback.
Tue Mar 19 14:22:24 PDT 2024: Genotyping samples with 10 threads
Traceback (most recent call last):
File "/net/module/sw/maegatk/0.2.0/bin/maegatk", line 10, in <module>
sys.exit(main())
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/maegatk/cli.py", line 300, in main
yaml.dump(dict1, yaml_file, default_flow_style=False, Dumper=yaml.RoundTripDumper)
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/ruamel/yaml/main.py", line 1251, in dump
error_deprecation('dump', 'dump', arg="typ='unsafe', pure=True")
File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/ruamel/yaml/main.py", line 1039, in error_deprecation
raise AttributeError(s, name=None)
AttributeError:
"dump()" has been removed, use
yaml = YAML(typ='unsafe', pure=True)
yaml.dump(...)
instead of file "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/maegatk/cli.py", line 300
yaml.dump(dict1, yaml_file, default_flow_style=False, Dumper=yaml.RoundTripDumper)
Would you have advice?
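The error text above comes from ruamel.yaml itself: the module-level dump() that cli.py calls has been removed in the installed release, and the message recommends creating a YAML instance and calling its dump() method instead. A small sketch for checking which side of that change an environment is on follows. The 0.18 cut-off is an assumption from memory of the ruamel.yaml changelog, and legacy_dump_removed is a hypothetical helper name.

```python
from importlib import metadata


def legacy_dump_removed(version):
    """True if this ruamel.yaml version is 0.18 or newer (assumed removal point)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 18)


if __name__ == "__main__":
    try:
        installed = metadata.version("ruamel.yaml")
    except metadata.PackageNotFoundError:
        print("ruamel.yaml is not installed in this environment")
    else:
        if legacy_dump_removed(installed):
            print(f"ruamel.yaml {installed}: module-level dump() raises; "
                  "use YAML().dump(...) or install an older release")
        else:
            print(f"ruamel.yaml {installed}: module-level dump() still available")
```

Until the call in cli.py is updated, installing an older ruamel.yaml release in the maegatk environment is the least invasive workaround.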
Hi and thank you for that tool.
Having installed it I wonder why you hide the fact that this program also needs picard (which is missing even in your wiki page), bwa, bedtools, freebayes and even R with 4 packages?
This information would definitely also be nice in the README.md.
I'm looking to compare the calls between mgatk and maegatk. I've read the FAQ and know that they should produce very similar results. I'm currently working with scRNA-seq data from 10X Multiome kit, processed with Cellranger-ARC.
How is a comparison made? For mgatk, I can read the results into a Seurat object using ReadMGATK(). However, this function fails on maegatk output because it has more columns. Is there a way to output a Seurat object RDS, like the Signac object RDS that mgatk produces?
Are there any suggestions for comparing scRNA via mgatk against scRNA via maegatk?
Ultimately I will compare scATAC via mgatk against scRNA via maegatk.
Dear community,
Thanks for developing this tool!
I am applying both mgatk and maegatk on my own MAESTER dataset.
It runs fine in mgatk tenx mode, but maegatk has produced no output for more than 20 hours.
My maegatk command is as follows:
maegatk bcall -i ../outs/possorted_genome_bam.bam -g ../reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa -c 8 -ub UB -bt CB -z
The current out directory contains:
(venv3) (mgatk) [yiming@biomed1 maegatk]$ ls -lR
.:
total 4
drwxrwxr-x 4 yiming yiming 43 Oct 11 21:48 maegatk_out
-rw------- 1 yiming yiming 283 Oct 11 21:48 nohup.out
./maegatk_out:
total 0
drwxrwxr-x 2 yiming yiming 10 Oct 11 21:48 final
drwxrwxr-x 3 yiming yiming 35 Oct 11 21:48 temp
./maegatk_out/final:
total 0
./maegatk_out/temp:
total 0
drwxrwxr-x 2 yiming yiming 10 Oct 11 21:48 barcoded_bams
./maegatk_out/temp/barcoded_bams:
total 0
log file:
Tue Oct 11 21:48:02 HKT 2022: maegatk v0.1.1
Tue Oct 11 21:48:02 HKT 2022: Found bam file: /usersdata/yiming/VIO/mito_cellranger/sc-D08/sc-D08/outs/possorted_genome_bam.bam for genotyping.
Tue Oct 11 21:48:02 HKT 2022: Will determine barcodes with at least: 100 mitochondrial reads.
I am not sure what's wrong and hope to get some help from you. Thanks!
Hello,
I'm trying to run maegatk with the test data and provided commands but the execution fails.
$ nohup maegatk bcall -i data/test_maester.bam -o test_maester -z -so >& maegatk_test2.log &
I added the -so option because otherwise the tool throws a different error message.
Here is part of the error message:
rule make_final_sparse_matrices:
output: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.A.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.C.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.G.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.T.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.coverage.txt.gz
jobid: 2
reason: Missing output files: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.coverage.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.A.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.C.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.T.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.G.txt.gz
resources: tmpdir=/tmp
[Fri Jun 30 18:51:58 2023]
rule make_depth_table:
output: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.depthTable.txt
jobid: 1
reason: Missing output files: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.depthTable.txt
These are the output files in the final folder:
chrM_refAllele.txt
passingBarcodes.tsv
barcodeQuants.tsv
maegatk.depthTable.txt
maegatk.T.txt.gz
maegatk.G.txt.gz
maegatk.C.txt.gz
maegatk.coverage.txt.gz
maegatk.A.txt.gz
maegatk.rds
In the mgatk documentation I saw that other files should have been generated, but they are missing from the final output directory: *.signac.rds, *.variant_stats.tsv.gz, *.cell_heteroplasmic_df.tsv.gz, *.vmr_strand_plot.png. I've assumed these files should also be located in the final output directory, but I might be wrong.
Please find attached the complete log file.
Any help would be much appreciated.
Best regards,
Sheila