The phip-seq-analyzer from viswam78

######################### PhIP-seq Analyzer (demo version, date: 20170524) #########################

Request a server node:

#interact -p parallel -n 24 -t 24:0:0

################ Data preparation ################

It is assumed that the home folder is ./ and that the sequencing files generated by the Sequencing Analyzer have been placed in the folder ./raw/:
fastq file: ./raw/PhIPseq_R1.fastq.gz
index file: ./raw/PhIPseq_I1.fastq.gz
barcode file: ./raw/Sample-Barcode.txt
The split fastq files will be stored in the directory ~/phip/PhIP-seq_Analyzer/test_rawdata/.
The running mode is the phipseq analysis mode, so the output folders will be ./test_human and ./test_virus, and the last parameter should be -y ./test.
Enter the folder named PhIP-seq_Analyzer.

######################### 1: Demultiplex FASTQ file #########################

Example 1: The most commonly used function demultiplexes the reads and generates sample_info.csv and variables.txt. Run a command line like the one below:

$ python3 ./bin/bioTreatFASTQ.py -i ./raw/PhIPseq_I1.fastq.gz -f ./raw/PhIPseq_R1.fastq.gz -b ./raw/Sample-Barcode.txt -o ./test_rawdata/ -y ./test

Once the procedure is done, several folders and files are created. There will be many fastq files in ./test_rawdata/, with file names determined by the barcode file. The folders ./test_human and ./test_virus are created, each of which contains sample_info.csv and variables.txt.
Note: the pipeline only accepts the absolute path of a directory or file.
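The demultiplexing idea described above can be sketched in a few lines of Python. This is an illustration only, not the code in bioTreatFASTQ.py: the function names read_barcodes, fastq_records and demultiplex are hypothetical, exact barcode matching is assumed, and the sequence line of each index record is assumed to be the barcode itself.

```python
import gzip
import os


def read_barcodes(path):
    """Read a tab-separated barcode file: barcode sequence, then sample name."""
    barcodes = {}
    with open(path) as handle:
        for line in handle:
            barcode, sample = line.rstrip("\n").split("\t")[:2]
            barcodes[barcode] = sample
    return barcodes


def fastq_records(handle):
    """Yield FASTQ records as tuples of four consecutive lines."""
    return zip(*[iter(handle)] * 4)


def demultiplex(reads_path, index_path, barcodes, out_dir):
    """Write each read to <out_dir>/<sample>.fastq by exact barcode match.

    Returns the number of reads written per sample. Reads whose index
    sequence is not in `barcodes` are skipped (an assumption of this sketch).
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    outputs = {}
    try:
        with gzip.open(reads_path, "rt") as reads, gzip.open(index_path, "rt") as index:
            for read, idx in zip(fastq_records(reads), fastq_records(index)):
                sample = barcodes.get(idx[1].strip())
                if sample is None:
                    continue
                if sample not in outputs:
                    outputs[sample] = open(os.path.join(out_dir, sample + ".fastq"), "w")
                outputs[sample].writelines(read)
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        # Close every per-sample output file, even if an error occurred.
        for out in outputs.values():
            out.close()
    return counts
```

Opening one output file per sample and keeping it open keeps the sketch to a single pass over both input files; the real pipeline may well handle unmatched reads and file naming differently.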

Example 2: Trim nucleotides when demultiplexing. Say the sequencing run has 50 cycles (50nt reads):
trim 10nt from the 3-end, i.e. keep the first 40nt: -r 40
remove 10nt from the 5-end and keep the rest: -t 10
trim 5nt from the 5-end, keep 40nt and discard the rest: -t 5 -r 35
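As a rough illustration of the two single-option cases, the sketch below assumes that -r N keeps the first N nucleotides and that -t N removes N nucleotides from the 5-end, as the descriptions above suggest. The function names keep_first and trim_5prime are hypothetical, and the combined use of both options is not modelled because the order in which the pipeline applies them is not stated here.

```python
def keep_first(seq, qual, n):
    """Keep the first n nucleotides (what `-r n` is described as doing)."""
    return seq[:n], qual[:n]


def trim_5prime(seq, qual, n):
    """Remove n nucleotides from the 5-end (what `-t n` is described as doing)."""
    return seq[n:], qual[n:]


read = "ACGT" * 12 + "AC"  # a 50nt read
qual = "I" * 50            # one quality character per nucleotide

seq40, qual40 = keep_first(read, qual, 40)
seq_t, qual_t = trim_5prime(read, qual, 10)
print(len(seq40), len(seq_t))  # 40 40
```

The quality string is sliced together with the sequence because a FASTQ record needs both lines to stay the same length.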

Example 3: The reads exported to FASTQ by the Sequencing Analyzer should all have the same length. In some cases they do not, because no quality filtering was applied. We can specify -l 100nt; that option discards all reads shorter than 100nt when demultiplexing.
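The length filter can be pictured as follows. This is a minimal sketch, assuming records are already parsed into (header, sequence, plus, quality) tuples; filter_by_length is a hypothetical name, not a function of the pipeline.

```python
def filter_by_length(records, min_length):
    """Yield only the FASTQ records whose sequence has at least min_length nt."""
    for header, seq, plus, qual in records:
        if len(seq) >= min_length:
            yield (header, seq, plus, qual)


records = [
    ("@r1", "A" * 100, "+", "I" * 100),
    ("@r2", "A" * 62, "+", "I" * 62),   # shorter than 100nt, will be dropped
    ("@r3", "A" * 100, "+", "I" * 100),
]
kept = list(filter_by_length(records, 100))
print([header for header, _, _, _ in kept])  # ['@r1', '@r3']
```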

Example 4: The demultiplexing step can be skipped to generate sample_info.csv and variables.txt directly. Here the *.fastq files are already stored in ./test_rawdata/.

$ python3 ./bin/bioTreatFASTQ.py -o ./test_rawdata/ -y ./test

Example 5: Skip the demultiplexing step and only trim the fastq reads. Here all fastq files are in ./test_rawdata/, and the trimmed fastq files (40nt removed from the 3-end of each 100nt read) are saved into ./trim_rawdata/.

$ python3 ./bin/bioTreatFASTQ.py -r 60 -x ./test_rawdata/ -o ./trim_rawdata/ -y ./test

Example 6: The default reference peptide libraries are human and virus. Two additional peptide libraries are available: allergome (allergic peptides) and PE (public epitopes). Select the libraries with -c, for example:
-c human,virus
-c virus,allergome,PE

################### 2: phipseq analysis ###################

Run a command line like the ones below:

#human library
$ python3 ./bin/bioPHIPseq.py ./test_human/variables.txt
#virus library
$ python3 ./bin/bioPHIPseq.py ./test_virus/variables.txt
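When several libraries are analysed, the same command is repeated once per output folder. The sketch below only builds the command lines from the -y prefix (here ./test) and the library names; analysis_commands is a hypothetical helper, and actually launching the commands (for example with subprocess.run) is left out on purpose.

```python
def analysis_commands(prefix, libraries=("human", "virus")):
    """Build one bioPHIPseq.py command per library folder <prefix>_<library>."""
    commands = []
    for library in libraries:
        variables = "{}_{}/variables.txt".format(prefix, library)
        commands.append(["python3", "./bin/bioPHIPseq.py", variables])
    return commands


for command in analysis_commands("./test"):
    print(" ".join(command))
```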

################### Requirements ###################

The barcode file:

The barcode file should be a *.txt file separated by tabs. The first and second columns should be barcode sequences and sample names, respectively.
In sample names, avoid the following characters: slash (/ or \), asterisk (*), at sign (@), any brackets, and white space. The characters dash (-), underscore (_), and dot (.) are acceptable.
No blank line is allowed.
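The barcode file rules above can be checked before a run. The following is a minimal sketch under stated assumptions: check_barcode_lines is a hypothetical helper that is not part of the pipeline, and "any brackets" is interpreted as round, square, curly and angle brackets.

```python
import re

# Characters to avoid in sample names: slashes, asterisk, at sign,
# brackets (interpreted broadly here) and any white space.
FORBIDDEN = re.compile(r"[/\\*@()\[\]{}<>\s]")


def check_barcode_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problem was found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            problems.append((number, "blank line"))
            continue
        fields = text.split("\t")
        if len(fields) < 2:
            problems.append((number, "expected two tab-separated columns"))
            continue
        if FORBIDDEN.search(fields[1]):
            problems.append((number, "forbidden character in sample name"))
    return problems
```

A typical use would be check_barcode_lines(open("./raw/Sample-Barcode.txt")), fixing every reported line before demultiplexing.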

The reads file and index file

Both files must be in FASTQ format.
The two files should match each other and be supplied together.
The compressed *.gz format is supported.
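A quick pre-flight check of the two input files might look like the sketch below. It assumes four lines per FASTQ record and treats "matched" as "same number of records", which is only one possible reading; open_fastq, count_records and files_match are hypothetical names.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text; *.gz files are decompressed on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path):
    """Count FASTQ records, assuming four lines per record."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4


def files_match(reads_path, index_path):
    """Return True if both files hold the same number of records."""
    return count_records(reads_path) == count_records(index_path)
```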

Running environments

Linux
Python 3.4.0 or above

###################### ERROR Handling ######################

ERROR 1: Mac OS X: ValueError: unknown locale: UTF-8 in Python

Resolution: If you encounter this error on Mac OS X, here is the quick fix. Add these lines to your ~/.bash_profile:
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
and then reload bash_profile:
# source ~/.bash_profile

ERROR 2:
Traceback (most recent call last):
  File "./bin/bioTreatFASTQ.py", line 141, in <module>
    myGenome.genome(par['fq_file']).demultiplex_fq(par)
  File "/home/yuan/phip/PhIP-Seq_Analyzer/bin/myGenome.py", line 222, in demultiplex_fq
    for L1,La, L2,Lb, L3,Lc, L4,Ld in itertools.zip_longest(*[F1,F2]*4):
AttributeError: 'module' object has no attribute 'zip_longest'

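Resolution: Run the scripts with python3 instead of python2. itertools.zip_longest exists only in Python 3; in Python 2 the function is named izip_longest, hence the AttributeError.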
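The loop shown in the traceback relies on itertools.zip_longest to walk through two files in step, four lines (one FASTQ record) at a time. The sketch below reproduces that pattern on two small in-memory lists instead of real files; the example reads and the pairs list are made up for illustration.

```python
import itertools

reads = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "TTTT\n", "+\n", "IIII\n"]
index = ["@r1\n", "AAAA\n", "+\n", "IIII\n", "@r2\n", "CCCC\n", "+\n", "IIII\n"]

F1, F2 = iter(reads), iter(index)
pairs = []
# [F1, F2] * 4 repeats the same two iterators, so each loop step consumes
# four lines from F1 (L1..L4) and four lines from F2 (La..Ld).
for L1, La, L2, Lb, L3, Lc, L4, Ld in itertools.zip_longest(*[F1, F2] * 4):
    pairs.append((L2.strip(), Lb.strip()))  # read sequence, index sequence

print(pairs)  # [('ACGT', 'AAAA'), ('TTTT', 'CCCC')]
```

Under Python 2 the first line of the loop fails exactly as in the traceback, because the itertools module there has no attribute named zip_longest.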

#end
