withncbi's Introduction

Name

withncbi - egaz and alignDB work with external (NCBI/EBI) data.

Purpose

Fetch sequences, generate reports, and build alignments based on various NCBI databases.

For more details, check the README.md in each subdirectory.

Directory organization

  • db/: turn NCBI genome reports and assembly reports into a query-able MySQL database.

  • ensembl/: Ensembl related scripts.

  • misc/: miscellaneous projects.

  • pop/: build alignments on a whole eukaryotic genus.

  • taxon/: process (small) genomes according to NCBI Taxonomy.

  • util/: miscellaneous utilities.

Conventions

fasta

  • .fa - genomic sequences
  • .fas - blocked fasta files
  • .fasta - normal/miscellaneous fasta files

fastq

Use .fq over .fastq

Concepts

IntSpans

An IntSpan represents sets of integers as a number of inclusive ranges, for example '1-10,19,45-48'.

The following picture is the schema of an IntSpan object. Jump lines are above the baseline; loop lines are below it.

intspans

AlignDB::IntSpan and jintspan are implementations of IntSpan objects in Perl and Java, respectively.
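
To make the runlist notation concrete, here is a minimal Python sketch that parses a string such as '1-10,19,45-48' into inclusive ranges and writes it back. It is independent of AlignDB::IntSpan and jintspan and does not reproduce their APIs; the function names are hypothetical, and the sketch assumes non-negative integers only.

```python
def merge_ranges(ranges):
    """Sort inclusive (start, end) pairs and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def parse_runlist(runlist):
    """Parse a runlist like '1-10,19,45-48' into merged inclusive ranges.

    Assumption: only non-negative integers are handled in this sketch.
    """
    ranges = []
    for part in runlist.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            raise ValueError(f"start is greater than end in {part!r}")
        ranges.append((start, end))
    return merge_ranges(ranges)


def to_runlist(ranges):
    """Format inclusive ranges back into runlist notation."""
    return ",".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)


def cardinality(ranges):
    """Count the integers covered by the inclusive ranges."""
    return sum(e - s + 1 for s, e in ranges)


spans = parse_runlist("1-10,19,45-48")
print(spans)
print(to_runlist(spans), cardinality(spans))
```

Storing only range boundaries keeps the representation compact even when a set covers millions of positions, which is the usual case for genomic coordinates.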

Positions

Examples in S288c.txt

I:1-100
I(+):90-150
S288c.I(-):190-200
II:21294-22075
II:23537-24097

positions

Simple rules:

  • chromosome and start are required
  • species, strand and end are optional
  • . to separate species and chromosome
  • strand is either + or - and is surrounded by round brackets
  • : to separate names and digits
  • - to separate start and end
  • names should be alphanumeric and without spaces
species.chromosome(strand):start-end
--------^^^^^^^^^^--------^^^^^^----
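
The rules above can be sketched as a single regular expression. This is an illustrative Python parser, not the code used by the project's own tools; the name `parse_position` is hypothetical. Two assumptions are made: underscores are accepted in names (because the blocked fasta example uses names such as gi_151941327), and optional parts that are missing are returned as None rather than filled with defaults.

```python
import re

# species.chromosome(strand):start-end
# Assumption: names may contain ASCII letters, digits and underscores.
POSITION_RE = re.compile(
    r"""
    (?:(?P<species>\w+)\.)?      # optional species, followed by '.'
    (?P<chr>\w+)                 # required chromosome name
    (?:\((?P<strand>[+-])\))?    # optional strand in round brackets
    :                            # separates names from digits
    (?P<start>\d+)               # required start
    (?:-(?P<end>\d+))?           # optional end
    """,
    re.VERBOSE | re.ASCII,
)


def parse_position(text):
    """Split a position string into its parts; missing optional parts are None."""
    match = POSITION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    info = match.groupdict()
    info["start"] = int(info["start"])
    if info["end"] is not None:
        info["end"] = int(info["end"])
    return info


for example in ["I:1-100", "I(+):90-150", "S288c.I(-):190-200", "II:21294"]:
    print(example, parse_position(example))
```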

Runlists in YAML

App::RL

jrunlist

Blocked fasta files

Examples in example.fas

>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>YJM789.gi_151941327(-):5668-5688|species=YJM789
TCGTCAGTTGGTTGACCATTA
>RM11.gi_61385832(-):5590-5610|species=RM11
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA

blocked-fasta-files
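
As a rough illustration of how such a file could be read, the Python sketch below groups records into blocks and splits each header into a position string and its attributes. It is not the App::Fasops implementation. It assumes that blocks are separated by blank lines and that multiple attributes after `|` would be separated by `;`; neither detail is stated above, and the function name is hypothetical.

```python
EXAMPLE = """\
>S288c.I(+):13267-13287|species=S288c
TCGTCAGTTGGTTGACCATTA
>YJM789.gi_151941327(-):5668-5688|species=YJM789
TCGTCAGTTGGTTGACCATTA
>RM11.gi_61385832(-):5590-5610|species=RM11
TCGTCAGTTGGTTGACCATTA
>Spar.gi_29362400(+):2477-2497|species=Spar
TCATCAGTTGGCAAACCGTTA
"""


def parse_blocked_fasta(text):
    """Return a list of blocks; each block is a list of record dicts.

    Assumptions: a blank line ends a block, and attributes after '|'
    are 'key=value' pairs separated by ';'.
    """
    blocks = []
    current = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the block that is being collected.
            if current:
                blocks.append(current)
                current = []
            continue
        if line.startswith(">"):
            header, _, attr_text = line[1:].partition("|")
            attrs = dict(
                pair.split("=", 1) for pair in attr_text.split(";") if "=" in pair
            )
            current.append({"header": header, "attrs": attrs, "seq": ""})
        elif current:
            # Sequence lines are appended to the most recent record.
            current[-1]["seq"] += line
    if current:
        blocks.append(current)
    return blocks


blocks = parse_blocked_fasta(EXAMPLE)
for record in blocks[0]:
    print(record["attrs"].get("species"), record["header"], len(record["seq"]))
```

In the example, every sequence is 21 bases long, which matches the inclusive length of each header range (for instance 13287 - 13267 + 1 = 21).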

App::Fasops

Ranges and links of ranges

App::Rangeops

jrange

Author

Qiang Wang <[email protected]>

Copyright and license

This software is copyright (c) 2015 by Qiang Wang.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
