iCLIP_2019

This code is associated with the paper from Lee et al., "Recruitment of mRNAs to P granules by condensation with intrinsically-disordered proteins". eLife, 2020. http://dx.doi.org/10.7554/eLife.52896

ICLIP analysis for MEG-3 and PGL-1 for trim_last2.py and trim_left3.py, there two file were orginaly written by Nejc Haberman from Ule lab. https://github.com/jernejule/non-coinciding_cDNA_starts

by Sean Lee and Tu Lu ####Sometimes we found that there are null sequences information within sequencing reads. If there is a empty (null) sequence that prevent fastx_toolkits to work, Try to remove these sequences by doing the following.

replace null character with "~~": # example sequencing file named HNNM3BCXY_1_0_1.fastq sed -i.bak 's/\x0/~~/g' HNNM3BCXY_1_0_1.fastq
delete lines with "~" and 2 line below:
sed -i.bak '/~~~~~~~~~/,+2d' HNNM3BCXY_1_0_1.fastq
remove empty sequences
bioawk -cfastx 'length($seq) > 1 {print "@"$name"\n"$seq"\n+\n"$qual}' tmp.fastq > new.fastq

Try to remove adapter, demultiplexing and remove duplicated reads from PCR reaction.—iCLIP_to_hisat.sh

must have in the same folder: trim_left3.py, trim_last2.py, and barcod.txt

After unzipped the fastq files:
- fastx_clipper to clipper adapter and discrad the sequences that are shorter then 10 nt + 5 random barcode + 4 nt barcode.
  Two files are geneated: reads contain intact adapters ( e.g shorter cDNA) and reads do not contain adapter (e.g longer reads)
After remove adaptor, convert fastq to fasta to following process.
Using python code "trim_left3.py" to swap random barcodes to headers of fasta file.
demultiplex, reads with unmached barcodes will be discarded.
Using python code " Trim_last2.py" to swap the last 2 nt degenerative seq to the header
align the reads by hisat2.
filter reads with more than 2 mismatches.
remove duplicates

Use samtools to convert sam to bam

To visualize mapped reads in IGV

SAM to BAM
To sort Bam file
To index .sorted. bam files Now can load .sorted.bam onto IGV for visualzation

To count reads using HTseq for the following analysis.

Use Htseq to make genecount file. If I want to make count file using CDS and UTR separately, just need to use different reference file to HTseq-count. The basic command line is as following:

for file in ./*.sam do htseq-count -o "${file/}" "$file" [reference gtf file] > "${file/%.sam/filename.genecount}" done

After making different count table. Merge count table the further analysis.

elifesciences-publications / iclip_2019 Goto Github PK

iclip_2019's Introduction

iCLIP_2019

This code is associated with the paper from Lee et al., "Recruitment of mRNAs to P granules by condensation with intrinsically-disordered proteins". eLife, 2020. http://dx.doi.org/10.7554/eLife.52896

Try to remove adapter, demultiplexing and remove duplicated reads from PCR reaction.—iCLIP_to_hisat.sh

must have in the same folder: trim_left3.py, trim_last2.py, and barcod.txt

Use samtools to convert sam to bam

To count reads using HTseq for the following analysis.

iclip_2019's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent