
maegatk's People

Contributors

caleblareau, noranekonobokkusu, petervangalen, vincent6liu

maegatk's Issues

Tutorial normal workflow for 10x data

Hello,

Is there a tutorial describing a standard workflow for clonal analysis of 10x data (after mitochondrial read amplification as in Miller et al., Nat Biotech)?

Thanks, Chris

maegatk-indel: command not found...

Hi,

I was wondering about the relationship between maegatk and maegatk-indel: is the output of the former the input of the latter?

In addition, after installing this package with 'pip3 install maegatk', I can use maegatk but not maegatk-indel (bash: maegatk-indel: command not found...). What should I do to make it work?

Thank you very much!
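
One way to see which commands an installed package actually exposes is to list the console-script entry points recorded in its metadata. A minimal sketch using the standard library's importlib.metadata, assuming the distribution is installed under the name "maegatk"; if "maegatk-indel" is not in the result, the installed release presumably does not ship that command, and changing PATH would not help.

```python
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(dist_name):
    """Return the sorted console-script names a distribution registers, or None if not installed."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    # Entry points in the "console_scripts" group become shell commands at install time
    return sorted(ep.name for ep in dist.entry_points if ep.group == "console_scripts")
```

For example, console_scripts("maegatk") run in the same environment where pip3 installed the package.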

Test Data: AttributeError

Hi. I installed maegatk using pip in a conda environment. I also cloned the repository to a separate folder.

When I run the test, I encounter the following error.

AttributeError in line 30 of /home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/maegtk/bin/snake/Snakefile.maegtk.Gather:
'InputFiles' object has no attribute 'depths'
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2330, in run_wrapper
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/maegtk/bin/snake/Snakefile.maegtk.Gather", line 30, in __rule_make_depth_table
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 569, in _callback
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
  File "/home/grasshoff/anaconda3/envs/Maegtk_Python38/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper

I used the following command.
maegtk bcall -i data/test_maester.bam -o test_maester -z

I'll attach the complete log file to this post.
logs.txt
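
The "'InputFiles' object has no attribute 'depths'" error is raised inside a Snakemake rule (make_depth_table), so the installed Snakemake version is likely worth reporting alongside the maegatk version. A small sketch that collects versions with importlib.metadata; the package list is illustrative and can be edited (both "maegatk" and "maegtk" are included because both names appear in this thread).

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each package name to its installed version string, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages that appear in the tracebacks in this thread (illustrative list)
for pkg, ver in package_versions(["maegatk", "maegtk", "snakemake", "pysam", "ruamel.yaml"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```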

"Too many open files" error when working with a scRNA-seq dataset

Hello.

I tried to reproduce the MAESTER publication, but I hit the open-file limit ("Too many open files") on a scRNA-seq dataset.

My steps were:

  • download the dataset
  • run it through Cell Ranger
  • filter the BAM file to mitochondrial (chrM) reads only: 1-SRR15598773_chrM_bam.bam
  • filter the reference FASTA file to the mitochondrial genome only: GRCh38_chrM.fa
  • run maegatk with the following parameters:
    • maegatk bcall -i 1-SRR15598773_chrM_bam.bam -o maegatk_outs1 -n SRR15598773-scrna -g GRCh38_chrM.fa -c 12 -bt CB -qc -ub UB -jm 60000m -mr 3 -z

However, there were some errors when splitting the bam files.

Traceback (most recent call last):
  File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 64, in <module>
    with multi_file_manager(bambcfiles) as fopen:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 56, in multi_file_manager
    files = [pysam.AlignmentFile(file, "wb", template = temp) for file in files]
  File "/home/michael/.local/lib/python3.10/site-packages/maegatk/bin/python/split_barcoded_bam.py", line 56, in <listcomp>
    files = [pysam.AlignmentFile(file, "wb", template = temp) for file in files]
  File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 921, in pysam.libcalignmentfile.AlignmentFile._open
OSError: [Errno 24] could not open alignment file `/home/michael/maegatk_outs/temp/barcoded_bams/TTACTGTTCGGAGTGA-1.bam`: Too many open files
[E::hts_open_format] Failed to open file "/home/michael/maegatk_outs/temp/barcoded_bams/GCCAACGAGGTAGACC-1.bam" : Too many open files

I've already tried increasing the system's open file limit to 1048576, but it still didn't work.
Any suggestions on how to filter the cell barcodes, or any other way to solve the issue, would be appreciated.

Michael
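
The traceback shows split_barcoded_bam.py opening one output BAM per barcode at the same time, so the number of simultaneously open files grows with the number of barcodes. A system-wide limit is not the same as the per-process soft limit (what "ulimit -n" reports), and the soft limit is what a process actually hits. The sketch below uses the standard resource module (Unix only) to raise the soft limit as far as the hard limit allows. Because child processes inherit limits, it would have to run in the process that launches maegatk (for example via subprocess). Passing a smaller barcode list (the -b option used in other reports here) presumably also reduces the number of files opened, though that is an assumption about the split step.

```python
import resource


def raise_open_file_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Only affects this
    process and children started after the call.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; never lower it
    # An unprivileged process may raise its soft limit up to, but not past, the hard limit
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The shell equivalent is running "ulimit -n <value>" in the same session before starting maegatk; values above the hard limit require raising the hard limit first.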

No coverage along the mitochondrial chromosome with published MAESTER data

Hi,
I followed the steps to analyze MAESTER data described here: https://github.com/petervangalen/MAESTER-2021
After merging the whole-genome BAM file of the scRNA-seq dataset (reads for all chromosomes) with the BAM file of the MAESTER dataset (chrM reads only), I ran maegatk as follows:
maegatk bcall -b HQ_CBs.csv -c 20 -o NUEVO_maegatk -mr 3 -i GSM5534703_K562-BT142.bam -n sGSM5534703_K562-BT142 -z -ub UB -bt CB -so

Here is the resulting plot:
[Plot: maester]

I saw similar peaks in the supplementary information of the MAESTER paper (Figure 7a, https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-022-01210-8/MediaObjects/41587_2022_1210_MOESM1_ESM.pdf), where they are attributed to the scRNA-seq data. So I must be missing something in the processing of the MAESTER data, but so far I have not found the problem.

Here is what my BAM files looked like before I merged them for maegatk.
scRNASeq BAM (SRA identifier SRR15598773):
SRR15598773.lite.1.127471761 0 chr1 10019 1 91M * 0 0 TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCA ??????????????????????????????????????????????????????????????????????????????????????????? NH:i:4 HI:i:1 AS:i:87 nM:i:1 RG:Z:scRNASeq:0:1:unknow_flowcell:0 RE:A:I xf:i:0 CR:Z:ATCTTCATCCATCAGA CY:Z:???????????????? CB:Z:ATCTTCATCCATCAGA-1 UR:Z:TTTCTCTTAGTG UY:Z:???????????? UB:Z:TTTCTCTTAGTG

MAESTER BAM (SRA identifier SRR15598774):
SRR15598774_6928992 16 chrM 1 255 59S181M * 0 0 CTGACGGGCCATCACGCCCACACCGCCCCCACGTTCCCCTGAAATCAGACCTCCCGAGGGATCACAGGTCTATCACCCTATTAACCCCTCACGGGAGCTCTCCATGCATGTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTACGTTCAATATTA ,,,,,F:FF:,FF,,,F:,F,:,,F:::,F,FFF,,::FF::,,,,,F,F,FF,FFF,,F,FF:F,F,,:,FFF:FFF,,F,,,FF,FF:,FF:F:FF::FFF,,FF:F,FFF,F,FFFFFFFFFFFFFF:FFF:,FFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFF,FF,FFF:FFFFFFFFFFF,F,:,F,FFFF:FFFFFF,FFF:FFF:F NH:i:1 HI:i:1 AS:i:173 nM:i:3 CB:Z:TGGAGGATCTTGTTAC-1 UB:Z:CACTTATTGTTA

I also opened the merged scRNA-seq + MAESTER BAM file in IGV to check coverage along chrM, and chrM is very well covered. Here is the IGV image:
[Image: IGV_MAESTER]

I have also attached the contents of the final directory.
final.zip

I'm not sure whether the problem comes from maegatk or from a script in the MAESTER pipeline, so I also opened an issue in the MAESTER GitHub repository: petervangalen/MAESTER-2021#6

Any help would be much appreciated.

Regards,

Sheila
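
To check whether the chrM coverage seen in IGV actually reaches maegatk's output, one can sum per-cell coverage by position from the *.coverage.txt.gz file in the final/ folder and plot or inspect the totals. A minimal sketch, assuming the file uses the header-less three-column position,barcode,coverage layout of mgatk's coverage file; that layout is an assumption for maegatk and worth confirming by looking at the first few lines of the actual file.

```python
import csv
import gzip
from collections import defaultdict


def coverage_by_position(path):
    """Sum coverage over all cells for each chrM position.

    Assumes header-less rows of the form: position,barcode,coverage
    """
    totals = defaultdict(float)
    with gzip.open(path, "rt", newline="") as fh:
        for pos, _barcode, depth in csv.reader(fh):
            totals[int(pos)] += float(depth)
    return dict(totals)
```

Positions absent from the result have zero coverage across all passing cells, so a dictionary with few keys would confirm that the MAESTER reads are being dropped before or during genotyping.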

Container & documentation

Hi,

Would it be possible for you to provide a container with a complete environment for running the pipeline? A small example dataset would also help a lot.
Moreover, the pipeline parameters and flags are not documented yet.

Thank you very much!

Error: Argument list too long

Hi. I ran maegatk on 4 samples. For three of them I got this error:

/bin/sh: /usr/bin/ls: Argument list too long
ERROR: Could not import any samples from the user specification; check flags, logs and input configuration; QUITTING

The tool split the bam file as it should, but did not create any log files or folders.

I tried maegtk and maegatk. Both failed with the same error. When I tried mgatk, it worked fine.
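
"/usr/bin/ls: Argument list too long" means the operating system refused to start ls because its arguments plus environment exceeded ARG_MAX; with shell commands this typically happens when a glob expands to many thousands of file names. That the glob in question covers the per-barcode BAMs from the split step is an assumption about maegatk's internals, consistent with splitting having finished before the failure. Listing the directory from Python avoids the limit, since nothing is passed on a command line, and shows how many files are involved:

```python
import os


def count_files(directory, suffix=".bam"):
    """Count files ending in `suffix` without expanding a shell glob."""
    with os.scandir(directory) as entries:
        return sum(1 for e in entries if e.is_file() and e.name.endswith(suffix))


def arg_max():
    """Byte limit on exec() arguments plus environment (POSIX systems)."""
    return os.sysconf("SC_ARG_MAX")
```

For example, count_files("maegatk_outs/temp/barcoded_bams"), using the temp/barcoded_bams layout seen in the "Too many open files" report.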

Error in checkGrep(grep(".A.txt", files)) when running maegatk

Hi
I am trying to run maegatk on my dataset. I have installed all the required dependencies listed in the tutorial:
java, bwa, bedtools, freebayes, and R (4.1.2, with data.table, Matrix, GenomicRanges, SummarizedExperiment). I am running it on Python 3.7.

I have tried to run the program on both the test dataset, and my own dataset using the commands below:

maegatk bcall --input $bam -o $resul_out -c $ncores -b $barcodes -mr $minReads -z

I keep getting the same error in both instances:

Mon Mar 14 15:46:24 AEST 2022: maegatk v0.1.1
Mon Mar 14 15:46:24 AEST 2022: Found bam file: Data/test_maester.bam for genotyping.
Mon Mar 14 15:46:24 AEST 2022: Will determine barcodes with at least: 100 mitochondrial reads.
Mon Mar 14 15:46:24 AEST 2022: User specified mitochondrial genome matches .bam file
Mon Mar 14 15:46:30 AEST 2022: Finished determining/splitting barcodes for genotyping.
Mon Mar 14 15:46:31 AEST 2022: Genotyping samples with 24 threads
Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted

I have attached a list of all the generated files (ls -lRh $result_folder), scatter.log, and gather.log.

test_result_file_list.txt
maegatk.snakemake_scatter.log.txt
maegatk.snakemake_gather.log.txt

Any help would be greatly appreciated.

Thanks
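
The R message comes from importMito, which appears to require exactly one file per expected pattern in the output folder ("file missing / extra file present"). The sketch below mimics that kind of check against a directory listing so you can see which pattern has zero or several matches. The pattern list is hypothetical, modeled on the final/ file names listed in the "Missing output files from test data" report in this thread, and, as with R's grep, the "." in each pattern matches any character.

```python
import re

# Hypothetical pattern list, modeled on final/ file names seen in this thread
EXPECTED_PATTERNS = [
    ".A.txt", ".C.txt", ".G.txt", ".T.txt",
    ".coverage.txt", ".depthTable.txt", "refAllele.txt",
]


def check_final_folder(files, patterns=EXPECTED_PATTERNS):
    """Return {pattern: matching files} for each pattern that does not match exactly one file."""
    problems = {}
    for pattern in patterns:
        # re.search mirrors R's grep(): unanchored, and '.' matches any character
        hits = [f for f in files if re.search(pattern, f)]
        if len(hits) != 1:
            problems[pattern] = hits
    return problems
```

For example, check_final_folder(os.listdir("test_maester/final")); an empty dictionary means every pattern matched exactly one file.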

Error while running maegatk

I was trying to run maegatk. It finished splitting the BAM file and started genotyping, but soon crashed with the following traceback.

Tue Mar 19 14:22:24 PDT 2024: Genotyping samples with 10 threads
Traceback (most recent call last):
  File "/net/module/sw/maegatk/0.2.0/bin/maegatk", line 10, in <module>
    sys.exit(main())
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/maegatk/cli.py", line 300, in main
    yaml.dump(dict1, yaml_file, default_flow_style=False, Dumper=yaml.RoundTripDumper)
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/ruamel/yaml/main.py", line 1251, in dump
    error_deprecation('dump', 'dump', arg="typ='unsafe', pure=True")
  File "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/ruamel/yaml/main.py", line 1039, in error_deprecation
    raise AttributeError(s, name=None)
AttributeError:
"dump()" has been removed, use

  yaml = YAML(typ='unsafe', pure=True)
  yaml.dump(...)

instead of file "/net/module/sw/maegatk/0.2.0/lib/python3.10/site-packages/maegatk/cli.py", line 300

			yaml.dump(dict1, yaml_file, default_flow_style=False, Dumper=yaml.RoundTripDumper)

Do you have any advice?

The installation section in your README.md is misleading

Hi, and thank you for this tool.
Having installed it, I wonder why you hide the fact that this program also needs picard (which is missing even from your wiki page), bwa, bedtools, freebayes, and even R with four packages?

This information would definitely also be nice in the README.md.
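
Until the README lists them, a quick way to check the external tools named in this thread (java, bwa, bedtools, freebayes, R, picard) is to look them up on PATH with shutil.which. A sketch; the executable names are assumptions (picard, for instance, is often run as "java -jar picard.jar" rather than as a picard command), and the R packages have to be checked separately, for example with Rscript -e 'library(SummarizedExperiment)'.

```python
import shutil

# Executable names assumed from the dependencies mentioned in this thread
TOOLS = ["java", "bwa", "bedtools", "freebayes", "R", "Rscript", "picard"]


def missing_tools(tools=TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```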

Comparing mgatk and maegatk

I'm looking to compare the calls between mgatk and maegatk. I've read the FAQ and know that they should produce very similar results. I'm currently working with scRNA-seq data from 10X Multiome kit, processed with Cellranger-ARC.

How should the comparison be made? For mgatk, I can read the results into a Seurat object using ReadMGATK(). However, this function fails on maegatk output because it has more columns. Is there a way to output a Seurat object RDS, like the Signac object RDS that mgatk produces?

Are there any suggestions for comparing scRNA-seq data processed with mgatk against the same data processed with maegatk?
Ultimately I will compare scATAC via mgatk against scRNA via maegatk.

No output or intermediate files

Dear community,

Thanks for developing this tool!
I am applying both mgatk and maegatk on my own MAESTER dataset.
It runs fine with mgatk in tenx mode, but maegatk has produced no output for more than 20 hours.
My maegatk command is as follows:

maegatk bcall -i ../outs/possorted_genome_bam.bam -g ../reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa -c 8 -ub UB -bt CB -z

The current out directory contains:

(venv3) (mgatk) [yiming@biomed1 maegatk]$ ls -lR
.:
total 4
drwxrwxr-x 4 yiming yiming  43 Oct 11 21:48 maegatk_out
-rw------- 1 yiming yiming 283 Oct 11 21:48 nohup.out

./maegatk_out:
total 0
drwxrwxr-x 2 yiming yiming 10 Oct 11 21:48 final
drwxrwxr-x 3 yiming yiming 35 Oct 11 21:48 temp

./maegatk_out/final:
total 0

./maegatk_out/temp:
total 0
drwxrwxr-x 2 yiming yiming 10 Oct 11 21:48 barcoded_bams

./maegatk_out/temp/barcoded_bams:
total 0

log file:

Tue Oct 11 21:48:02 HKT 2022: maegatk v0.1.1
Tue Oct 11 21:48:02 HKT 2022: Found bam file: /usersdata/yiming/VIO/mito_cellranger/sc-D08/sc-D08/outs/possorted_genome_bam.bam for genotyping.
Tue Oct 11 21:48:02 HKT 2022: Will determine barcodes with at least: 100 mitochondrial reads.

I am not sure what's wrong and hope to get some help from you. Thanks!

Missing output files from test data

Hello,
I'm trying to run maegatk with the test data and provided commands but the execution fails.

$ nohup maegatk bcall -i data/test_maester.bam -o test_maester -z -so >& maegatk_test2.log &

I added the -so option because otherwise the tool throws a different error message.

Here is part of the error message:
rule make_final_sparse_matrices:
output: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.A.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.C.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.G.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.T.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.coverage.txt.gz
jobid: 2
reason: Missing output files: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.coverage.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.A.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.C.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.T.txt.gz, /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.G.txt.gz
resources: tmpdir=/tmp

[Fri Jun 30 18:51:58 2023]
rule make_depth_table:
output: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.depthTable.txt
jobid: 1
reason: Missing output files: /media/scratch0/20230621_UB153_scRNASeq_cell_lineage/analysis/MAESTER_scRNASeq/test_maester/final/maegatk.depthTable.txt

These are the output files in the final folder:
chrM_refAllele.txt
passingBarcodes.tsv
barcodeQuants.tsv
maegatk.depthTable.txt
maegatk.T.txt.gz
maegatk.G.txt.gz
maegatk.C.txt.gz
maegatk.coverage.txt.gz
maegatk.A.txt.gz
maegatk.rds

The mgatk documentation lists other output files that are missing from my final output directory: *.signac.rds, *.variant_stats.tsv.gz, *.cell_heteroplasmic_df.tsv.gz, and *.vmr_strand_plot.png. I assumed these files would also be in the final output directory, but I might be wrong.

Please find attached the complete log file.

Any help would be much appreciated.

Best regards,

Sheila

maegatk_test2.log
