LibraryQC

Description

A script for sgRNA extraction and quality control.

Getting started

Clone this repo:

git clone https://github.com/Jianhua-Wang/LibraryQC

Set up the conda environment and activate it:

cd LibraryQC
conda env create -f libraryqccondaenv.yml
conda activate libraryqc

Usage

python libraryqc.py -h
usage: Librayqc using pattern matching to extract the sgRNA from pair-end sequencing data.

python libraryqc.py -i input/file_path.csv -o output -v input/vector.fa -l
input/library.csv

optional arguments:
  -h, --help       show this help message and exit
  -i , --input     csv file contains the sampl name and fq1, fq2 file path of
                   samples
  -o , --output    directory of output file
  -v , --vector    the fasta file of your vector, containing two sequences, 5'
                   and 3'.
  -l , --library   csv file of sgRNA you designed
  -s , --shot      the number of bases upstream and downstream of sgRNA you
                   exptected to match. More shot might reduce the false
                   positive. default=4
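To illustrate what the `-s` option controls, here is a minimal sketch of flank-anchored pattern matching. It is not the code in `libraryqc.py`: the function names `build_pattern` and `extract_sgrna`, the `guide_len` parameter, and the fixed 20 nt guide length are assumptions made for this example. The idea is that `shot` bases from each side of the insertion site are required around the captured sgRNA.

```python
import re

def build_pattern(five_prime, three_prime, shot=4, guide_len=20):
    """Build a regex that captures guide_len bases between shot flanking bases.

    five_prime / three_prime are the vector sequences on either side of the
    insertion site. guide_len=20 is an assumption for this sketch.
    """
    if shot < 1:
        raise ValueError("shot must be at least 1")
    upstream = five_prime[-shot:].upper()    # last bases before the insertion site
    downstream = three_prime[:shot].upper()  # first bases after the insertion site
    return re.compile(
        f"{re.escape(upstream)}([ACGT]{{{guide_len}}}){re.escape(downstream)}"
    )

def extract_sgrna(read, pattern):
    """Return the captured sgRNA from a read, or None if the flanks are absent."""
    match = pattern.search(read.upper())
    return match.group(1) if match else None
```

With a larger `shot`, a read must agree with more of the vector on both sides, so a read with a mismatch a few bases away from the insertion site is accepted at `shot=4` but rejected at `shot=8`. That is the trade-off the help text describes: fewer false positives at the cost of discarding more reads.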

Input

1. Path of fastq files

The paths of the FASTQ files are specified in a CSV file, for example:

sample_name fq_1 fq_2
day0 ./input/data/day0_1.fq.gz ./input/data/day0_2.fq.gz
day14 ./input/data/day14_1.fq.gz ./input/data/day14_2.fq.gz
vf ./input/data/vf_1.fq.gz ./input/data/vf_2.fq.gz
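Before a long run it can help to check the sample sheet. The sketch below assumes the file is comma-separated with the header `sample_name,fq_1,fq_2`; the helper names `read_sample_sheet` and `missing_files` are hypothetical and not part of LibraryQC.

```python
import csv
from pathlib import Path

def read_sample_sheet(path):
    """Return {sample_name: (fq_1, fq_2)} from a comma-separated sample sheet."""
    samples = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            name = row["sample_name"].strip()
            if name in samples:
                raise ValueError(f"duplicate sample name: {name}")
            samples[name] = (Path(row["fq_1"].strip()), Path(row["fq_2"].strip()))
    return samples

def missing_files(samples):
    """List every FASTQ path in the sheet that does not exist on disk."""
    return [fq for pair in samples.values() for fq in pair if not fq.exists()]
```

For example, `missing_files(read_sample_sheet("input/file_path.csv"))` returns an empty list when every listed file is present.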

2. sgRNA library

The library you designed or downloaded from another source, for example:

id sequence note
1 ATAGGCACACATGAAGCGGA
2 TTTGCTGATAACTAGATCTA
3 TTGCAGGCCGCGATCTGTGC
4 GGTGGAGACTCCGAGTGTAG
5 ACTTCAAGTACGAGAACCAG -
6 AATTGCATTCGGTTTCTATC -
7 GAGCTTTTTGGGGTGTGACC +
8 AGCCGGTGTTAGTAAGAAAT +
... ... ...
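As a rough illustration of how extracted sequences can be matched back to this table, the sketch below loads the library into a lookup keyed by sequence. It assumes a comma-separated file with the header `id,sequence,note`; `load_library` and `count_by_id` are hypothetical helpers, not functions from LibraryQC.

```python
import csv
from collections import Counter

def load_library(path):
    """Map each designed sgRNA sequence (upper-cased) to its id."""
    library = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            seq = row["sequence"].strip().upper()
            if set(seq) - set("ACGT"):
                raise ValueError(f"id {row['id']}: unexpected characters in {seq!r}")
            if seq in library:
                raise ValueError(f"sequence {seq} is listed more than once")
            library[seq] = row["id"]
    return library

def count_by_id(extracted, library):
    """Count extracted guides per library id; unmatched guides are counted under None."""
    return Counter(library.get(seq.upper()) for seq in extracted)
```

The count stored under `None` gives a quick view of how many extracted sequences are absent from the designed library.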

3. Sequence of vector

For a non-directional sequencing library, the script needs the vector sequence without the sgRNA to determine the strand of each extracted sgRNA. Split the vector sequence in two at the sgRNA insertion site, and name the part upstream of the site 5 and the part downstream of it 3.

>5
ccagaGagggcctatttcccatgattccttcatatttgcatatacgatacaaggctgttagagagataattagaattaatttgactgtaaacacaaagatattagtacaaaatacgtgacgtagaaagtaataatttcttgggtagtttgcagttttaaaattatgttttaaaatggactatcatatgcttaccgtaacttgaaagtatttcgatttcttggctttatatatcttgtggaaaggacgaaacaccg
>3
gttttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcg
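The following sketch shows one way the two vector records could be used to decide the strand of a read. It is an illustration under stated assumptions, not the logic in `libraryqc.py`: `read_fasta`, `reverse_complement`, `orient_read`, and the `anchor_len` parameter are all hypothetical. A read is taken as forward if it contains the end of the `5` sequence, and as reverse if its reverse complement does.

```python
def read_fasta(path):
    """Parse a FASTA file into {record_name: upper-cased sequence}."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = ""
            elif name is not None:
                records[name] += line.upper()
    return records

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def orient_read(read, five_prime, anchor_len=12):
    """Return (strand, read in vector orientation), or (None, None) if no anchor is found.

    anchor_len is an assumed parameter: the number of bases taken from the
    end of the 5 sequence to recognise the vector in a read.
    """
    anchor = five_prime[-anchor_len:].upper()
    read = read.upper()
    if anchor in read:
        return "+", read
    flipped = reverse_complement(read)
    if anchor in flipped:
        return "-", flipped
    return None, None
```

Usage would look like `vector = read_fasta("input/vector.fa")` followed by `orient_read(read, vector["5"])`. Once a read is in vector orientation, the sgRNA sits directly after the anchor regardless of which strand was sequenced.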

Output

Extracted sgRNA for each sample and an HTML report for quality control.

The HTML report is generated with MultiQC; see the example in the output directory.

Note

  1. This script only suits paired-end, non-directional sequencing data in which the sgRNA insertion site is near the 3' end.
