Syn-CpG-Spacer

Syn-CpG-Spacer is a Progressive Web App (PWA) for biomedical scientists, written in Python using the Panel, Bokeh and Biopython libraries. It allows for synonymous recoding of genetic sequences to increase the frequency of CpG dinucleotides while setting constraints on their spacing. The primary use case is experiments on the attenuation of viruses.

The software changes codons along a sequence to synonymous alternatives that form CpG dinucleotides according to the user's settings. This can be done at codon positions 1-2, 2-3 and 3-1 (split over two subsequent codons).
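To illustrate the three codon positions mentioned above, here is a minimal sketch that groups the CpG dinucleotides of an in-frame sequence by where they fall within codons. The function name `cpg_positions_by_frame` and the 0-based indexing are assumptions made for this example; they are not taken from the app's code.

```python
def cpg_positions_by_frame(seq: str) -> dict[str, list[int]]:
    """Group CpG start indices (0-based) by their position within codons.

    Assumes the sequence starts in-frame, so index 0 is codon position 1.
    """
    seq = seq.upper()
    labels = ("1-2", "2-3", "3-1")  # "3-1" is split over two subsequent codons
    frames: dict[str, list[int]] = {label: [] for label in labels}
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            frames[labels[i % 3]].append(i)
    return frames


# Codons: CGA ACG GCC GAA
print(cpg_positions_by_frame("CGAACGGCCGAA"))
# → {'1-2': [0], '2-3': [4], '3-1': [8]}
```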

Using Panel's Pyodide integration, the app is hosted on GitHub Pages from this repository and is available at the following address:

https://oleksulkowski.github.io/Syn-CpG-Spacer/app/

Installation

In browsers such as Chrome, Safari or Edge, it is possible to install the app onto your machine for offline use by clicking the browser prompt after opening the link. An installed app will download and apply updates automatically when they become available.

Usage

The software allows the user to load their own FASTA sequence or to use a pre-loaded sample sequence (part of HIV-1 Gag).

Important: The loaded sequence must start in-frame, contain only codons present in the codon table and be of a length divisible by 3. If more than one sequence is present in the loaded FASTA file, they must all be of equal length. Only the sequence at the top of the file will be recoded.
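The input rules above can be sketched as a small validation routine. This is an illustration only: the function `validate_sequences`, the `(id, sequence)` record format and the use of the standard genetic code (including stop codons) are assumptions, and the app's own codon table and error messages may differ.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def validate_sequences(records: list[tuple[str, str]]) -> None:
    """Raise ValueError with a specific message if the input breaks a rule.

    `records` is a hypothetical list of (id, sequence) pairs in file order.
    """
    if not records:
        raise ValueError("The FASTA file contains no sequences.")
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("All sequences in the file must be of equal length.")
    # Only the first sequence is recoded, so only it is checked codon by codon.
    name, seq = records[0]
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError(
            f"{name}: length {len(seq)} is not divisible by 3. "
            "Trim the sequence so that it starts in-frame and ends on a full codon."
        )
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(
                f"{name}: invalid codon {codon!r} at nucleotides {i + 1}-{i + 3}."
            )


validate_sequences([("sample", "ATGCGA"), ("other", "ATGAAA")])  # passes silently
```

Reporting the offending codon and its position, rather than a generic message, makes it easier for the user to see how to fix the file.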

The user can then either set a minimum gap between newly added CpGs or set a desired average gap between CpGs. With the latter option, the software uses a binary search algorithm to find the minimum gap that results in an average gap as close as possible to the user's setting.
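The search for a minimum gap from a target average gap might look roughly like the sketch below. It assumes that the resulting average gap never decreases as the minimum gap grows, which is what makes binary search applicable. The names `find_min_gap` and `average_gap_for`, and the search bounds, are hypothetical; `average_gap_for` stands in for a full recoding run that reports the resulting average CpG gap.

```python
from typing import Callable


def find_min_gap(
    target_avg: float,
    average_gap_for: Callable[[int], float],
    lo: int = 0,
    hi: int = 100,
) -> int:
    """Return the minimum gap in [lo, hi] whose average gap is closest to the target.

    Assumes average_gap_for(gap) is non-decreasing in gap.
    """
    best_gap, best_diff = lo, float("inf")
    while lo <= hi:
        mid = (lo + hi) // 2
        avg = average_gap_for(mid)
        diff = abs(avg - target_avg)
        if diff < best_diff:
            best_gap, best_diff = mid, diff
        if avg < target_avg:
            lo = mid + 1  # need fewer CpGs, so allow a larger minimum gap
        elif avg > target_avg:
            hi = mid - 1  # need more CpGs, so allow a smaller minimum gap
        else:
            break  # exact match
    return best_gap


# Toy stand-in for a recoding run: the average gap grows linearly with the minimum gap.
print(find_min_gap(24, lambda gap: 10 + 1.5 * gap))  # → 9
```

Because the best candidate is tracked at every probe, the function returns the closest value even when no minimum gap reproduces the target exactly.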

The program allows a set number of initial and final nucleotides to be protected from changes, which might be biologically relevant. As increasing the CpG content can decrease the frequency of A in a sequence, the user can also choose to make the remaining sequence synonymously A-rich after CpGs have been added.
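As a rough illustration of terminal protection combined with A-enrichment, the sketch below replaces each unprotected codon with the synonymous alternative that contains the most A's, provided the set of CpG sites stays unchanged. It uses the standard genetic code and hypothetical names (`make_a_rich`, `protect_start`, `protect_end`); the app relies on its own lookup tables, so its codon choices may differ.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_sites(seq: str) -> set[int]:
    """Return the 0-based start index of every CpG in the sequence."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def make_a_rich(seq: str, protect_start: int = 0, protect_end: int = 0) -> str:
    """Synonymously enrich A content without adding or removing any CpG."""
    seq = seq.upper()
    original_sites = cpg_sites(seq)
    for i in range(0, len(seq), 3):
        # Skip codons that overlap a protected terminal region.
        if i < protect_start or i + 3 > len(seq) - protect_end:
            continue
        codon = seq[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        # Try the alternatives with the most A's first.
        for alt in sorted(synonyms, key=lambda c: c.count("A"), reverse=True):
            if alt.count("A") <= codon.count("A"):
                break  # no remaining alternative would add an A
            candidate = seq[:i] + alt + seq[i + 3:]
            if cpg_sites(candidate) == original_sites:
                seq = candidate
                break
    return seq


recoded = make_a_rich("CTGCGTGGG", protect_start=3)
print(recoded[:3])  # → CTG (the protected first codon is untouched)
```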

Every new recoded sequence requires input of a unique ID. The sequences are displayed on an interactive alignment view that highlights CpG dinucleotides. A table shows statistical data. The user can adjust the settings and compare the sequences. When finished, the user can download the outputs as a FASTA file.

Algorithm outline

  1. The user configures the minimum CpG gap and the protected terminal nucleotide length, and chooses whether to make the sequence A-rich after adding CpGs.
    • If the user sets a target average gap, a binary search algorithm will perform the steps below to find the minimum CpG gap that results in the average CpG gap closest to the desired one.
  2. Codon instances are generated for every codon along the sequence. It is checked whether the codon already contains a CpG or forms a split CpG with the next codon.
  3. It is determined which codons can potentially be transformed into CpG-forming alternatives based on their position in the sequence. The criterion is being at least the minimum CpG gap away from existing CpGs.
  4. The initial and final number of nucleotides are protected against changes, if specified by the user.
  5. Codons are mutated to synonymous CpG-forming alternatives along the sequence. The minimum CpG gap between newly added CpGs is enforced.
  6. The recoded sequence is checked for synonymity with the original, along with the preservation of terminal signals and adherence to the minimum gap settings.
  7. If the A-enrichment option is selected, the rest of the sequence is synonymously recoded into more A-rich codons, without affecting CpGs.
  8. The same checks as those described in step 6 are performed.
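A heavily simplified sketch of steps 2 to 6 is shown below. It is not the app's implementation: it uses the standard genetic code, measures the gap as the distance between the C positions of two CpGs (an assumption, since the exact definition is not given here), and accepts the first synonymous alternative that adds a CpG without violating the gap. The names `add_cpgs`, `cpg_sites` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_sites(seq: str) -> list[int]:
    """Return the sorted 0-based start indices of all CpGs, including split ones."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def translate(seq: str) -> str:
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def add_cpgs(seq: str, min_gap: int, protect_start: int = 0, protect_end: int = 0) -> str:
    """Greedily add CpGs by synonymous codon changes, respecting min_gap."""
    seq = seq.upper()
    original = seq
    for i in range(0, len(seq), 3):
        # Step 4: leave codons overlapping the protected termini untouched.
        if i < protect_start or i + 3 > len(seq) - protect_end:
            continue
        before = set(cpg_sites(seq))
        codon = seq[i:i + 3]
        for alt, aa in CODON_TABLE.items():
            if aa != CODON_TABLE[codon] or alt == codon:
                continue  # not a synonymous alternative
            candidate = seq[:i] + alt + seq[i + 3:]
            after = set(cpg_sites(candidate))
            new = after - before
            # Steps 3 and 5: a CpG must be gained, none lost, and every new
            # CpG must be at least min_gap away from all other CpGs.
            far_enough = all(abs(n - s) >= min_gap for n in new for s in after if s != n)
            if new and before <= after and far_enough:
                seq = candidate
                break
    # Step 6: the recoded sequence must encode the same protein.
    assert translate(seq) == translate(original)
    return seq


recoded = add_cpgs("GCAGCAGCAGCA", min_gap=6)
print(recoded, cpg_sites(recoded))  # → GCCGCAGCCGCA [2, 8]
```

Recomputing all CpG sites on each candidate is slow for long sequences, but it keeps the sketch short and automatically accounts for split CpGs formed with neighbouring codons.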

Development

Use the environment.yml file to create an environment with all the dependencies:

conda env create -f environment.yml
conda activate Syn-CpG-Spacer

As per the Panel documentation, develop locally in index.py using

panel serve index.py --autoreload

After making changes, convert index.py to the Pyodide PWA:

panel convert index.py --to pyodide-worker --out docs/app --title Syn-CpG-Spacer --pwa

You can run the Pyodide app locally at http://localhost:8000/docs/app by using

python3 -m http.server

Tests

Syn-CpG-Spacer uses Pytest to check whether code changes have introduced errors into the recoding algorithm, by comparing the new output to a set of validated sequences. This is hooked up to GitHub Actions CI. Run the tests using

pytest

Within the app, each algorithm run is checked to ensure correct application of user-defined variables.

Community contributions

Please use the issues tab for bug reports and feature requests.

Acknowledgements

The Bokeh sequence viewer is based on code by Damien Farrell (@dmnfarrell).

syn-cpg-spacer's Issues

Classes into a file

You may consider putting your main classes (Codon and Gene) into a separate file. This will make your main index.py module more readable and make future code support easier.

Constants into a separate file

You may consider putting your constants (dna_to_pro, DIC, DIC_for_A_rich, DIC_for_split) into a separate file. This will make your main index.py module more readable and make future code support easier.

Explain how to prepare a valid input file

Currently, when the input does not start in-frame, your tool throws an exception: Your file contains invalid codons.

This can confuse an inexperienced user.

I would suggest the following options:
1. Add an explanation of how to prepare the input file to the README.
2. Improve the raised message so that the user understands how to fix the input.

I believe this is currently on line 247: raise Exception("Your file contains invalid codons.")
