
sharepg - analysing shared regions in pangenomes

Note

Want to contribute? Feel free to open an issue or a PR about a missing, buggy or incomplete feature!

Purpose

A tool to analyse sequences shared between populations in pangenome graphs. This small command-line tool looks for regions of a graph that are traversed by one collection of genomes and not traversed by another collection.

It can extract:

  • Nodes that match a criterion
  • Genomic intervals on each sequence
  • Bubbles with intervals
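Turning selected nodes into genomic intervals mostly comes down to walking each path in order and accumulating node lengths. The sketch below illustrates that idea under stated assumptions: `path_intervals` is a hypothetical helper (not part of sharepg), a path is given as an ordered list of segment IDs, node lengths are known, and coordinates are 0-based and half-open. sharepg's own output format and coordinate convention may differ.

```python
def path_intervals(path, lengths, selected):
    """Hypothetical helper: intervals of one path covered by selected nodes.

    path: ordered list of segment IDs visited by the genome.
    lengths: segment ID -> sequence length.
    selected: set of segment IDs that matched the criterion.
    Returns merged (start, end) pairs, 0-based half-open (an assumption).
    """
    intervals = []
    offset = 0  # position of the current node's first base on this path
    for node in path:
        end = offset + lengths[node]
        if node in selected:
            if intervals and intervals[-1][1] == offset:
                # Directly follows the previous selected node: extend that run.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((offset, end))
        offset = end
    return intervals


if __name__ == "__main__":
    lengths = {"1": 4, "2": 2, "3": 3, "4": 5}
    print(path_intervals(["1", "2", "3", "4"], lengths, {"2", "3"}))  # [(4, 9)]
```

Merging consecutive selected nodes keeps one interval per contiguous run instead of one per node, which is usually what is wanted when reporting regions on a sequence.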

Note

Bubble detection is made possible by BubbleGun: Fawaz Dabbaghie, Jana Ebler, Tobias Marschall, BubbleGun: enumerating bubbles and superbubbles in genome graphs, Bioinformatics, Volume 38, Issue 17, September 2022, Pages 4217–4219, https://doi.org/10.1093/bioinformatics/btac448

Installation

Installation is done with pip, using the provided setup.py file.

git clone https://github.com/Tharos-ux/sharepg.git
cd sharepg
git pull
pip install --upgrade pip
python -m pip install .

Once installed, the sharepg command is available out-of-the-box. You can check that everything went well by printing the help message with sharepg -h.

Available commands

Let's assume you have:

  • a GFA graph, compressed or uncompressed
  • a list A of path/walk names you want to follow (g0, g1, g2)
  • a list B of path/walk names you want to avoid (g3, g4)
  • a reference name (g0)
  • the strict requirement that a region is traversed by all the paths/walks of A and by none of those of B (threshold=1.)

# Extract nodes
sharepg disnodes graph.gfa -a g0 g1 g2 -b g3 g4 -r g0 -t 1 > disnodes.log
# Extract intervals
sharepg disintervals graph.gfa -a g0 g1 g2 -b g3 g4 -r g0 -t 1 > disintervals.log
# Extract bubbles
sharepg disbubbles graph.gfa -a g0 g1 g2 -b g3 g4 -r g0 -t 1 > disbubbles.log
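All three commands need to know which genomes visit which segments. As an illustration only, the sketch below collects that information from P lines and GFA 1.1 W lines, reading either plain or gzip-compressed files (assuming compressed graphs are gzipped). The names `open_gfa`, `parse_paths` and `read_paths` are hypothetical, and keying W lines by their sample name while merging all walks of that sample is an assumption, not necessarily how sharepg names walks.

```python
import gzip
import re
from collections import defaultdict

# Segment IDs inside a W-line walk such as ">1<2>3" (orientation stripped).
WALK_STEP = re.compile(r"[<>]([^<>]+)")


def open_gfa(path):
    """Open a GFA file in text mode, detecting gzip compression from its magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_paths(lines):
    """Map each P-line name (or W-line sample) to the set of segment IDs it visits."""
    visited = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "P" and len(fields) >= 3:
            # P <name> <seg1+,seg2-,...> <overlaps>: drop the trailing +/- orientation.
            visited[fields[1]].update(step[:-1] for step in fields[2].split(","))
        elif fields[0] == "W" and len(fields) >= 7:
            # W <sample> <hap> <seqid> <start> <end> <walk>: all walks of a sample
            # are merged under the sample name (an assumption made for this sketch).
            visited[fields[1]].update(WALK_STEP.findall(fields[6]))
    return dict(visited)


def read_paths(path):
    """Parse a plain or gzip-compressed GFA file with parse_paths."""
    with open_gfa(path) as gfa:
        return parse_paths(gfa)
```

For example, read_paths("graph.gfa") would return a dictionary such as {"g0": {...}, "g1": {...}}, which is enough to test the A/B membership of every segment.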

Note

The threshold parameter is a float between 0. and 1. It sets the minimum proportion of genomes from A that must go through a region, while 1 - threshold sets the maximum proportion of genomes from B that may go through that same region. For instance, a threshold of 1. means that a segment is selected only if all genomes of A and no genome of B go through it. A threshold of .7 means that at least 70% of A must go through the region and at most 30% of B may go through it for the region to be selected.
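The threshold rule can be written as a small predicate. This is a minimal sketch of the rule as stated above, not sharepg's implementation: `is_selected` and `select_nodes` are hypothetical names, and the input is assumed to map each genome name to the set of segment IDs it visits. Converting the threshold with `Fraction(str(threshold))` keeps the comparison exact, so that a threshold of .8 with 10 genomes in B accepts exactly 2 of them instead of rejecting them because 1 - 0.8 rounds to 0.19999999999999996 in floating point.

```python
from collections import defaultdict
from fractions import Fraction


def is_selected(traversing, group_a, group_b, threshold):
    """Apply the threshold rule to the set of genome names traversing one region."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be within [0, 1]")
    if not group_a:
        raise ValueError("group A must contain at least one genome")
    t = Fraction(str(threshold))  # exact decimal value, e.g. 0.8 -> 4/5
    share_a = Fraction(len(traversing & set(group_a)), len(group_a))
    # An empty group B cannot go through the region, so its share is 0.
    share_b = Fraction(len(traversing & set(group_b)), len(group_b)) if group_b else Fraction(0)
    return share_a >= t and share_b <= 1 - t


def select_nodes(node_sets, group_a, group_b, threshold):
    """Return the node IDs that pass the threshold rule.

    node_sets maps each genome name to the set of node IDs it visits.
    """
    traversed_by = defaultdict(set)  # node ID -> genomes going through it
    for genome, nodes in node_sets.items():
        for node in nodes:
            traversed_by[node].add(genome)
    return {
        node
        for node, genomes in traversed_by.items()
        if is_selected(genomes, group_a, group_b, threshold)
    }


if __name__ == "__main__":
    toy = {
        "g0": {"1", "2", "4"},
        "g1": {"1", "2", "4"},
        "g2": {"1", "2", "3"},
        "g3": {"1", "3"},
        "g4": {"1", "3", "4"},
    }
    a, b = ["g0", "g1", "g2"], ["g3", "g4"]
    print(sorted(select_nodes(toy, a, b, 1)))    # ['2']
    print(sorted(select_nodes(toy, a, b, 0.5)))  # ['2', '4']
```

With threshold 1 only node 2 qualifies, since it is the only node visited by all of A and none of B. Lowering the threshold to .5 also admits node 4, visited by 2/3 of A and 1/2 of B.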

Contributors

tharos-ux