Predict progenitor sequence of fungal repeat families by correcting for RIP-like mutations and cytosine deamination events. Mask RIP or deamination events from alignments.
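To make "correcting for RIP-like mutations" concrete, here is a minimal Python sketch. It assumes equal-length, uppercase aligned sequences, and it is not derip2's implementation. The function name `derip_consensus`, the rule that dinucleotide context is read from the immediately adjacent column (gaps are not skipped), and the majority-vote fill for uncorrected columns are all illustrative assumptions. A column is restored to C when some rows retain C while others carry T in a TpA context (the CpA to TpA RIP signature). It is restored to G for the reverse-strand equivalent, which appears on the forward strand as TpG to TpA.

```python
from collections import Counter


def derip_consensus(seqs: list[str]) -> str:
    """Hypothetical sketch: build a deRIP'd consensus from aligned sequences.

    Assumes equal-length, uppercase rows. Context is read from the
    immediately adjacent column in each row (gaps are not skipped).
    """
    length = len(seqs[0])
    out = []
    for j in range(length):
        bases = [s[j] for s in seqs]
        # Forward-strand RIP: some rows keep C while others show T followed by A (CpA -> TpA).
        fwd_rip = "C" in bases and any(
            s[j] == "T" and j + 1 < length and s[j + 1] == "A" for s in seqs
        )
        # Reverse-strand RIP seen on the forward strand: TpG -> TpA, i.e. G -> A after a T.
        rev_rip = "G" in bases and any(
            s[j] == "A" and j > 0 and s[j - 1] == "T" for s in seqs
        )
        if fwd_rip:
            out.append("C")
        elif rev_rip:
            out.append("G")
        else:
            # Simplification: fill uncorrected columns with the most common character.
            out.append(Counter(bases).most_common(1)[0][0])
    return "".join(out)
```

For example, `derip_consensus(["GATCAGGGTA", "AATTAGG-CA", "GATTAGG-TA"])` returns `"GATCAGG-CA"`: columns 4 and 9 are restored to C, and every other column takes its most common character.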
License: MIT License
Python 96.98%
Dockerfile 3.02%
derip2's Issues
Split related functions out into modules
Add an option that lets the user specify which sequence in the input alignment is used to fill uncorrected positions in the output sequence.
The current behaviour is to fill from the most G/C-rich input sequence.
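One way the requested option could work is sketched below, assuming sequences are held as a list of aligned strings. `select_filler`, its `index` parameter, and the choice to ignore gap characters when computing GC content are all hypothetical. With no index supplied, it falls back to the most G/C-rich row, which matches the current default. Ties go to the first row.

```python
from typing import Optional


def gc_content(seq: str) -> float:
    """Fraction of G/C among non-gap characters (0.0 for an all-gap row)."""
    residues = [b for b in seq.upper() if b != "-"]
    if not residues:
        return 0.0
    return sum(b in "GC" for b in residues) / len(residues)


def select_filler(seqs: list[str], index: Optional[int] = None) -> int:
    """Hypothetical helper: return the row index used to fill uncorrected positions."""
    if index is not None:
        # User-specified row: validate it instead of silently falling back.
        if not 0 <= index < len(seqs):
            raise IndexError(f"filler index {index} out of range for {len(seqs)} sequences")
        return index
    # Default: the most G/C-rich row (max keeps the first row on ties).
    return max(range(len(seqs)), key=lambda i: gc_content(seqs[i]))
```

Raising on an out-of-range index, rather than quietly using the GC-rich default, makes a mistyped option visible to the user.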
Add a track to the final alignment to use as a guide for RIP masking, e.g.:
GATCAGGGTA
AATTAGG-CA
GATTAGG-TA
GATCAGG-CA  - deRIP'd
---X----X-  - Corrected positions
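A track like `---X----X-` could be derived directly from the input rows. The sketch below is a hypothetical `rip_track` helper that uses a deliberately simplified context rule. It checks only the immediate neighbour, does not skip gaps, and considers only forward CpA to TpA and reverse-strand TpG to TpA changes. derip2's own criteria may differ.

```python
def rip_track(seqs: list[str]) -> str:
    """Hypothetical sketch: mark alignment columns where a RIP-like correction applies.

    Returns one character per column: 'X' for a corrected column, '-' otherwise.
    """
    length = len(seqs[0])
    track = []
    for j in range(length):
        bases = {s[j] for s in seqs}
        # C retained in some rows, T followed by A in others (CpA -> TpA).
        fwd = "C" in bases and any(
            s[j] == "T" and j + 1 < length and s[j + 1] == "A" for s in seqs
        )
        # G retained in some rows, A preceded by T in others (TpG -> TpA).
        rev = "G" in bases and any(
            s[j] == "A" and j > 0 and s[j - 1] == "T" for s in seqs
        )
        track.append("X" if fwd or rev else "-")
    return "".join(track)
```

Applied to `["GATCAGGGTA", "AATTAGG-CA", "GATTAGG-TA"]`, it returns `---X----X-`. That string can be appended as an extra row beneath the alignment when it is written out.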
Migrate setup.py to pyproject.toml.
Restructure the package directory to the standard Python layout.
Add more detailed stderr reporting:
Report the name and path of any output files being written.
Add timers for long-running tasks
Report the criteria used to select the filler sequence
Report the total count of invariant columns in the alignment
Report the total number of non-fixed positions in the alignment
Report the sequences (name and row index) with the highest and lowest RIP counts and GC content
In addition to RIP and non-RIP CDA counts, report the count of remaining variable positions not accounted for.
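Several of these statistics can be computed in a single pass over the alignment. The following is a hedged sketch, assuming aligned sequences stored as strings with a matching list of names. `summarise_alignment` is hypothetical. Here "invariant" means every row has the same character in a column (gaps included), variable columns stand in for non-fixed positions, and ties in GC content go to the first row. It reports on stderr and times itself with `time.perf_counter`. It does not cover the RIP counts.

```python
import sys
import time


def gc_content(seq: str) -> float:
    """Fraction of G/C among non-gap characters (0.0 for an all-gap row)."""
    residues = [b for b in seq.upper() if b != "-"]
    return sum(b in "GC" for b in residues) / len(residues) if residues else 0.0


def summarise_alignment(names: list[str], seqs: list[str]) -> dict[str, object]:
    """Hypothetical helper: compute simple alignment statistics and report them on stderr."""
    start = time.perf_counter()  # simple timer around a potentially slow step
    length = len(seqs[0])
    # A column is invariant when every row has the same character (gaps included).
    invariant = sum(1 for j in range(length) if len({s[j] for s in seqs}) == 1)
    gc = [gc_content(s) for s in seqs]
    hi = max(range(len(seqs)), key=gc.__getitem__)  # first row with the highest GC
    lo = min(range(len(seqs)), key=gc.__getitem__)  # first row with the lowest GC
    summary: dict[str, object] = {
        "invariant_columns": invariant,
        "variable_columns": length - invariant,
        "max_gc_row": (names[hi], hi),
        "min_gc_row": (names[lo], lo),
    }
    elapsed = time.perf_counter() - start
    for key, value in summary.items():
        print(f"{key}: {value}", file=sys.stderr)
    print(f"alignment summary took {elapsed:.3f} s", file=sys.stderr)
    return summary
```

Returning the dictionary as well as printing it lets other reporting steps reuse the numbers. For example, a count of variable positions not explained by RIP or CDA could be derived from `variable_columns`.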
Default behaviours:
Print the corrected sequence to stdout instead of writing it to a file
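A minimal sketch of this default is shown below, assuming a hypothetical `write_fasta` helper. When no output path is given, the record goes to stdout, or to a supplied stream, which makes testing easy. Otherwise it is written to the file. The 60-character line width is an arbitrary choice.

```python
import sys
from typing import Optional, TextIO


def write_fasta(name: str, seq: str, path: Optional[str] = None,
                stream: Optional[TextIO] = None, width: int = 60) -> None:
    """Hypothetical helper: write one FASTA record to a file, or to stdout by default."""
    # Wrap the sequence into fixed-width lines beneath the header.
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    text = "\n".join(lines) + "\n"
    if path is None:
        # Resolve sys.stdout at call time so redirected or captured stdout is honoured.
        (stream if stream is not None else sys.stdout).write(text)
    else:
        with open(path, "w") as handle:
            handle.write(text)
```

Writing to stdout by default lets the output be piped straight into other tools, for example `... > derip.fa`. Progress messages should then go to stderr so they stay out of the sequence stream.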
Add a GitHub Action to publish the package to PyPI on each new release
Add logging levels and informative progress messages
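Logging levels could be tied to a repeatable verbosity flag. The sketch below assumes a hypothetical `-v/--verbose` option and a hypothetical logger name (`derip2`). It uses only the standard `argparse` and `logging` modules and sends log messages to stderr, so they do not mix with sequence output on stdout.

```python
import argparse
import logging
from typing import Optional

LOG = logging.getLogger("derip2")  # hypothetical logger name


def level_from_verbosity(count: int) -> int:
    """Map repeated -v flags to levels: none=WARNING, -v=INFO, -vv or more=DEBUG."""
    if count <= 0:
        return logging.WARNING
    if count == 1:
        return logging.INFO
    return logging.DEBUG


def setup_logging(argv: Optional[list[str]] = None) -> int:
    """Parse a hypothetical -v/--verbose flag and configure logging; return the level."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-v", "--verbose", action="count", default=0,
        help="Increase verbosity (-v for INFO, -vv for DEBUG).",
    )
    args = parser.parse_args(argv)
    level = level_from_verbosity(args.verbose)
    # basicConfig attaches a stderr handler by default, keeping stdout free for sequences.
    logging.basicConfig(level=level, format="%(levelname)s: %(message)s")
    LOG.setLevel(level)
    LOG.info("Logging configured at %s", logging.getLevelName(level))
    return level
```

Counting `-v` flags keeps the default output quiet (WARNING and above). Users can still opt into progress messages at INFO or detailed tracing at DEBUG.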
Refactor code
Add type hints
Add descriptive comments.
Add test cases for core alignment parsing functions.
Add minimal test datasets.
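A minimal test dataset can live inline in the test module. The sketch below pairs a small stand-in FASTA reader with pytest-style test functions over a three-record alignment. The reader, `read_fasta`, is hypothetical and is not the package's actual parser. The alignment includes a header with a description and a sequence wrapped over two lines.

```python
import io


def read_fasta(handle) -> list[tuple[str, str]]:
    """Hypothetical stand-in parser: return (name, sequence) pairs from a FASTA handle."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            # Keep only the first word of the header as the record name.
            name, chunks = (header.split()[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


MINIMAL_ALIGNMENT = """>seq1
GATCAGGGTA
>seq2 description text
AATTAGG-CA
>seq3
GATTA
GG-TA
"""


def test_reads_all_records():
    records = read_fasta(io.StringIO(MINIMAL_ALIGNMENT))
    assert [name for name, _ in records] == ["seq1", "seq2", "seq3"]


def test_multiline_sequence_is_joined():
    records = dict(read_fasta(io.StringIO(MINIMAL_ALIGNMENT)))
    assert records["seq3"] == "GATTAGG-TA"


def test_rows_have_equal_length():
    records = read_fasta(io.StringIO(MINIMAL_ALIGNMENT))
    assert len({len(seq) for _, seq in records}) == 1
```

Keeping the dataset small and inline makes each test's expected values easy to check by eye. Larger fixture files could be added later for regression tests.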
Add a GitHub Action to run tests before merging PRs
Add config settings and Dockerfiles for Codespaces and Gitpod.