NanoEM

Code for analyzing nanoEM data.

Reference

https://academic.oup.com/nar/article/49/14/e81/6279847

Requirements

  • Python3
  • pysam
  • minimap2
  • sambamba
  • samtools

Usage

Base conversion of reference genome

Replace "ref" with the path to your reference genome FASTA file.

python src/convert_ref.py ref > output.fa 
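
Because both sets of converted reads are later mapped to the single file output.fa, this step presumably writes C-to-T and G-to-A converted copies of every contig into that file. The sketch below shows the idea in plain Python. It is not the code of src/convert_ref.py, and the "_CT"/"_GA" suffixes it appends to contig names are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, Tuple

# Case-preserving translation tables for the two in-silico conversions.
CT_TABLE = str.maketrans("Cc", "Tt")
GA_TABLE = str.maketrans("Gg", "Aa")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep the first word only
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def convert_reference(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return C-to-T and G-to-A converted copies of every contig.

    The "_CT"/"_GA" name suffixes are hypothetical.
    """
    converted = {}
    for name, seq in records:
        converted[name + "_CT"] = seq.translate(CT_TABLE)
        converted[name + "_GA"] = seq.translate(GA_TABLE)
    return converted


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python convert_ref_sketch.py ref > output.fa
    with open(sys.argv[1]) as handle:
        for name, seq in convert_reference(read_fasta(handle)).items():
            sys.stdout.write(f">{name}\n{seq}\n")
```

This sketch keeps the whole converted genome in memory for simplicity; a streaming version would write each contig as soon as it is read.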

Base conversion of nanoEM data

python src/convert_reads.py fastq

OUTPUT:

  • *_CT.fq.gz: C-to-T converted fastq
  • *_GA.fq.gz: G-to-A converted fastq
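
Converting every remaining C to T (or G to A) removes the difference between methylated and unmethylated positions, so a read can align to the fully converted reference regardless of its methylation state; since the strand of origin is not known beforehand, each read is converted both ways. A minimal sketch of that conversion, assuming 4-line FASTQ records, is shown below. The function names and the "<prefix>_CT.fq.gz"/"<prefix>_GA.fq.gz" naming are illustrative and not taken from src/convert_reads.py.

```python
import gzip
import sys
from typing import Iterator, Tuple

CT_TABLE = str.maketrans("Cc", "Tt")
GA_TABLE = str.maketrans("Gg", "Aa")


def open_text(path: str):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def convert_read(seq: str) -> Tuple[str, str]:
    """Return the (C-to-T, G-to-A) converted versions of one read."""
    return seq.translate(CT_TABLE), seq.translate(GA_TABLE)


def convert_fastq(in_path: str, prefix: str) -> None:
    """Write <prefix>_CT.fq.gz and <prefix>_GA.fq.gz (hypothetical naming)."""
    with open_text(in_path) as src, \
         gzip.open(prefix + "_CT.fq.gz", "wt") as ct_out, \
         gzip.open(prefix + "_GA.fq.gz", "wt") as ga_out:
        for header, seq, qual in read_fastq(src):
            ct_seq, ga_seq = convert_read(seq)
            # Names and qualities are kept unchanged so the two mappings
            # can later be matched read by read.
            ct_out.write(f"{header}\n{ct_seq}\n+\n{qual}\n")
            ga_out.write(f"{header}\n{ga_seq}\n+\n{qual}\n")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python convert_reads_sketch.py reads.fq.gz sample
    convert_fastq(sys.argv[1], sys.argv[2])
```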

Mapping converted data to converted reference genome

minimap2 -t 8 --split-prefix temp_sam1 -ax map-ont output.fa  *_CT.fq.gz --eqx | samtools view -b | samtools sort -@ 8 -o 1.sorted.bam
samtools index 1.sorted.bam

minimap2 -t 8 --split-prefix temp_sam2 -ax map-ont output.fa  *_GA.fq.gz --eqx | samtools view -b | samtools sort -@ 8 -o 2.sorted.bam
samtools index 2.sorted.bam

If you are using minimap2 obtained via 'git clone' after April 7, 2023, you can run this step without the '--split-prefix' option, which is faster.

Choosing best alignments

python src/best_align.py --bam1 1.sorted.bam --bam2 2.sorted.bam  --fastq fastq
samtools view -b output_CT.sam | samtools sort -o output_CT.sorted.bam
samtools view -b output_GA.sam | samtools sort -o output_GA.sorted.bam
samtools index output_CT.sorted.bam
samtools index output_GA.sorted.bam
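
The README does not describe how src/best_align.py decides between the two mappings. One plausible criterion, used here purely as an assumption, is to keep, for each read, the run (C-to-T or G-to-A) whose primary alignment has the higher minimap2 alignment score (the AS tag). The sketch below reads those scores with pysam and leaves ties unassigned; both the criterion and the tie policy are hypothetical.

```python
from typing import Dict, Optional


def primary_scores(bam_path: str) -> Dict[str, int]:
    """Map read name -> AS score of its primary alignment in one BAM file."""
    import pysam  # imported here so choose_best() can be used without pysam

    scores = {}
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            if read.has_tag("AS"):
                scores[read.query_name] = read.get_tag("AS")
    return scores


def choose_best(scores_ct: Dict[str, int],
                scores_ga: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each read to the "CT" or "GA" run with the higher score.

    Reads mapped in only one run go to that run; ties are marked None
    (a hypothetical policy; the real script may behave differently).
    """
    decision = {}
    for name in scores_ct.keys() | scores_ga.keys():
        ct = scores_ct.get(name)
        ga = scores_ga.get(name)
        if ga is None or (ct is not None and ct > ga):
            decision[name] = "CT"
        elif ct is None or ga > ct:
            decision[name] = "GA"
        else:
            decision[name] = None
    return decision
```

For example, choose_best(primary_scores("1.sorted.bam"), primary_scores("2.sorted.bam")) returns one decision per read name.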

Methylation calling

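The commands below take a cpg_sites.bed file, and the README does not say how it is created. A minimal sketch, assuming a plain tab-separated three-column BED (chromosome, 0-based start, end) with no header or track lines and one 2-bp interval per CG dinucleotide of the unconverted reference, could look like this. The interval width, output layout, and script name are assumptions; the chromosome names must match the sequence names in ref.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cpg_intervals(name: str, seq: str) -> Iterator[Tuple[str, int, int]]:
    """Yield 0-based, half-open (chrom, start, end) intervals, one per CG."""
    upper = seq.upper()  # include soft-masked (lower-case) CpGs
    start = upper.find("CG")
    while start != -1:
        yield name, start, start + 2
        start = upper.find("CG", start + 1)


def bed_lines(records: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Format every CpG of every record as a tab-separated BED line."""
    for name, seq in records:
        for chrom, start, end in cpg_intervals(name, seq):
            yield f"{chrom}\t{start}\t{end}\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python make_cpg_bed.py ref > cpg_sites.bed
    sys.stdout.writelines(bed_lines(read_fasta(sys.argv[1])))
```
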
sambamba mpileup output_CT.sorted.bam -L cpg_sites.bed -o pileup_CT.tsv -t 8 --samtools -f ref
sambamba mpileup output_GA.sorted.bam -L cpg_sites.bed -o pileup_GA.tsv -t 8 --samtools -f ref
python src/call_methylation.py pileup_CT.tsv pileup_GA.tsv > frequency_methylation.tsv
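
The README does not describe src/call_methylation.py. As a rough illustration of the idea only, the sketch below parses samtools-style pileup lines (chromosome, position, reference base, depth, read bases, base qualities) and counts unconverted versus converted bases. It assumes, without confirmation from the source, that in pileup_CT.tsv a reference C read as C (a match) is methylated and read as T is unmethylated, and likewise G versus A in pileup_GA.tsv; it ignores strand and base quality.

```python
import re
from typing import Dict, Optional, Tuple

INDEL = re.compile(r"[+-](\d+)")
CONVERTED_TO = {"C": "T", "G": "A"}  # reference base -> base seen after conversion


def count_pileup_bases(bases: str) -> Dict[str, int]:
    """Tally the read-base column of a samtools-style pileup line.

    '.' and ',' are reference matches; letters are mismatches (upper case on
    the forward strand, lower case on the reverse strand). Read-start markers
    (^ plus a mapping-quality character), indel sequences (e.g. +2AG) and
    other symbols such as $, *, #, < and > are skipped.
    """
    counts = {"match": 0, "A": 0, "C": 0, "G": 0, "T": 0, "N": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if ch in "+-":
            m = INDEL.match(bases, i)
            if m is not None:
                i = m.end() + int(m.group(1))  # skip length digits and indel bases
                continue
        if ch in ".,":
            counts["match"] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
        i += 1
    return counts


def methylation_counts(line: str, ref_base: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (chrom, pos, methylated, unmethylated) for one pileup line.

    ref_base is "C" for the CT pileup and "G" for the GA pileup (an assumed
    convention). Lines at other reference bases return None.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, base, read_bases = fields[0], int(fields[1]), fields[2].upper(), fields[4]
    if base != ref_base:
        return None
    counts = count_pileup_bases(read_bases)
    # Matching (unconverted) bases count as methylated, converted ones as unmethylated.
    return chrom, pos, counts["match"], counts[CONVERTED_TO[ref_base]]


def methylation_frequency(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of methylated calls, or None when there is no coverage."""
    total = methylated + unmethylated
    return methylated / total if total else None
```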

Visualization in IGV bisulfite mode

python script/vis_GA_utilities.py -b output_GA.sorted.bam | samtools view -b | samtools sort -@ 4 -o output_GA_vis.sorted.bam
samtools index output_GA_vis.sorted.bam

samtools merge output_merge.bam output_CT.sorted.bam output_GA_vis.sorted.bam
samtools sort -@ 4 -o output_merge.sorted.bam output_merge.bam
samtools index output_merge.sorted.bam

nanoem's People

Contributors

yos-sk

nanoem's Issues

question about trimming adapter

Hi, thanks for sharing this useful tool.

Could you tell me which tool you use to trim adapters from nanoEM data before mapping to the reference genome?

Questions regarding step 4 & 5; sambamba-pileup: Unexpected 'f' when converting from type string to type long

Hi,
thank you for the very useful tool!

I have two questions:

In step 4 it says: python src/best_align.py --bam1 1.sorted.bam --bam2 2.sorted.bam --fastq fastq
By "--fastq fastq", do you mean the original nanopore reads (previously called "1d_pass.fq.gz"), or something else?

In step 5, when running
sambamba mpileup output_CT.sorted.bam -L cpg_sites.bed -o pileup_CT.tsv -t 6 --samtools -f hg19.fa

I always get the error: sambamba-pileup: Unexpected 'f' when converting from type string to type long
Has anybody experienced the same error?
The only workaround I found was to omit "-L cpg_sites.bed", which is not a real solution.
Which cpg_sites.bed file did you use?

I appreciate any suggestions,
kind regards

[main_samview] fail to read the header from "-".

Hi, thank you for the tool. I have been trying to use it but I don't seem to get past mapping.

With minimap2/2.22, mapping a 24Gb fastq runs far more slowly than I would expect, and I haven't managed to get it to finish before the job is killed. With version 2.17-r941, which is the one I normally use, I keep getting the same error:

[M::main::8.852*0.60] loaded/built the index for 34 target sequence(s)
[M::mm_mapopt_update::9.007*0.61] mid_occ = 21211
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 34
[M::mm_idx_stat::9.115*0.62] distinct minimizers: 7329827 (9.49% are singletons); average occurrences: 107.956; average spacing: 5.271
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "-"

I've checked my fastq files and the genome and they look normal, so I'm not sure what is happening. Any idea what might be going wrong?

Thanks,
Cora
