adraismawur / big-scape
This project forked from medema-group/big-scape
Similarity networks of biosynthetic gene clusters
License: GNU Affero General Public License v3.0
When a user runs BiG-SCAPE
Then pyhmmer should be used for finding and aligning protein domains
So that users do not need to install the HMMER command-line tools beforehand
Tasks:
Reported by Jorge
The bug is caused by the refactoring of file paths and run data storage.
Integrate as much of BiG-SLICE as possible into the application. A number of things in BiG-SLICE can improve the application:
From #25 task "Clean up schema"
Clean the sqlite schema, removing any tables or indices that are not used in the actual workflow.
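One way to approach this cleanup is sketched below with Python's standard `sqlite3` module. The helper `drop_unused`, the table names and the index names are all hypothetical and only illustrate the idea; the real set of tables in use would have to come from the actual workflow.

```python
import sqlite3


def drop_unused(conn, used_tables, used_indices):
    """Drop every user-defined table and index that is not in the keep-sets."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'index') AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    dropped = []
    # Sorting puts 'index' rows before 'table' rows, so indices go first.
    for kind, name in sorted(rows):
        keep = used_tables if kind == "table" else used_indices
        if name in keep:
            continue
        conn.execute(f'DROP {kind.upper()} IF EXISTS "{name}"')
        dropped.append((kind, name))
    conn.commit()
    return dropped


# Demo on an in-memory database with made-up schema objects.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bgc (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE legacy_cache (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE INDEX bgc_name ON bgc (name);
    CREATE INDEX legacy_payload ON legacy_cache (payload);
    """
)
dropped = drop_unused(conn, used_tables={"bgc"}, used_indices={"bgc_name"})
remaining = sorted(
    name for (name,) in conn.execute("SELECT name FROM sqlite_master")
)
```

Listing what would be dropped before executing the `DROP` statements is probably a sensible first step on a real database.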
A module used during distance calculation prints a message to the console. The message does not appear in the file logs.
Excerpt from log:
2022-02-03 13:29:46,924 INFO NRPS (616 BGCs)
2022-02-03 13:29:46,925 INFO Writing annotation files
2022-02-03 13:29:46,929 INFO Calculating all pairwise distances
2022-02-03 13:30:42,530 INFO Removing 550 non-relevant MIBiG BGCs
2022-02-03 13:30:42,530 INFO Writing output files
2022-02-03 13:30:42,870 INFO Calling Gene Cluster Families
2022-02-03 13:30:43,421 INFO Cutoff: 0.3
**Ignored unknown character X (seen 1 times)**
2022-02-03 13:30:45,030 INFO Others (671 BGCs)
2022-02-03 13:30:45,031 INFO Writing annotation files
2022-02-03 13:30:45,035 INFO Calculating all pairwise distances
2022-02-03 13:31:03,199 INFO Removing 558 non-relevant MIBiG BGCs
2022-02-03 13:31:03,200 INFO Writing output files
2022-02-03 13:31:03,600 INFO Calling Gene Cluster Families
2022-02-03 13:31:04,457 INFO Cutoff: 0.3
This message should be written to the log at INFO level instead. It also ignores quiet mode.
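A minimal sketch of one possible fix, assuming the message is written through Python's `sys.stdout`: wrap the noisy call in `contextlib.redirect_stdout` with a small file-like adapter that forwards each line to a logger. `LoggerWriter`, `noisy_distance_step` and the logger name are hypothetical. If the message instead comes from an external program or a C extension writing directly to the file descriptor, this would not catch it and the process pipes would need to be captured.

```python
import contextlib
import io
import logging


class LoggerWriter(io.TextIOBase):
    """File-like object that forwards each complete line to a logger."""

    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level
        self._buffer = ""

    def write(self, text):
        self._buffer += text
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            if line.strip():
                self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        if self._buffer.strip():
            self.logger.log(self.level, self._buffer)
        self._buffer = ""


class ListHandler(logging.Handler):
    """Collects records in memory so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


def noisy_distance_step():
    # Hypothetical stand-in for the module that prints during distance calculation.
    print("Ignored unknown character X (seen 1 times)")


logger = logging.getLogger("stdout_capture_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

with contextlib.redirect_stdout(LoggerWriter(logger, logging.INFO)):
    noisy_distance_step()

captured = handler.records
```

Because the message now passes through the logger, a quiet mode that raises the logger level above INFO would suppress it along with the other INFO lines.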
Development task:
Any packages that are in use and can be upgraded should be upgraded. There is also a deprecated function warning that should be resolved.
Tasks:
When the application is running
Then the application should measure the RAM consumed during the process
So that users may see the load on the system, and
So that developers may spot increases, decreases or no change in memory consumption during development
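As a rough illustration, the standard-library `tracemalloc` module can report the peak memory allocated by Python objects during a call. The helper `run_with_memory_report` and the workload `build_matrix` are hypothetical. Note the assumption: `tracemalloc` only sees allocations made through Python's allocator, so it underestimates the true process footprint when native libraries allocate memory, and it adds some runtime overhead.

```python
import tracemalloc


def run_with_memory_report(func, *args, **kwargs):
    """Run func and return (result, peak bytes allocated by Python objects)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raised.
        tracemalloc.stop()
    return result, peak


def build_matrix(n):
    # Hypothetical workload standing in for a distance matrix.
    return [[0.0] * n for _ in range(n)]


matrix, peak_bytes = run_with_memory_report(build_matrix, 200)
report = f"Peak traced memory: {peak_bytes / 1024:.1f} KiB"
```

For whole-process numbers (resident set size), a third-party package such as `psutil` would be an alternative, at the cost of an extra dependency.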
Tasks:
From #25 task "Fix database progress checking (hmmalign)"
Progress checking currently does not read the database properly, and it does not update the database properly either.
Tasks to finish before 30-06-2022
Some single-threaded work is currently done between BGC families during pairwise comparison.
This is probably mostly due to these three components:
Development task
Currently the BGC data is still scattered across several places. Most of it is in BgcInfo due to 0781e72, but some of it remains in bgctools.BgcData.
Either unify this or migrate over to the storage method BiG-SLICE is using.
Tasks:
Implement BiG-SLICE style storage
Refactor BgcData
Deduplicate data
Optimize complexities (lists instead of sets & vice versa)
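To illustrate the last task, here is a small sketch of why the container choice matters for membership tests. The function names and the Pfam accessions are only illustrative and are not taken from the code base.

```python
def shared_domains_list(a, b):
    # Each `d in b` scans the list, so this is O(len(a) * len(b)).
    return [d for d in a if d in b]


def shared_domains_set(a, b):
    # Building the set costs O(len(b)); each lookup is then O(1) on average.
    b_lookup = set(b)
    return [d for d in a if d in b_lookup]


# Illustrative domain lists for two clusters.
cluster_a = ["PF00109", "PF02801", "PF00698", "PF00109"]
cluster_b = ["PF00698", "PF00109", "PF08659"]

via_list = shared_domains_list(cluster_a, cluster_b)
via_set = shared_domains_set(cluster_a, cluster_b)
```

Both versions return the same result and keep the order and duplicates of `a`. A list remains the right choice wherever order or repeated entries carry meaning; a set only pays off for lookups and set algebra.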
Development task:
Commit to using Python 3. This mostly means removing any backwards-compatibility code for Python 2.
Tasks:
Index is out of range on the labelsClans array here:
BiG-SCAPE/src/big_scape/clustering.py
Line 465 in 6ff7b03
When a user indicates that MIBiG files are used, and a MIBiG version is set or implied,
Then a set of GBK files for that MIBiG version should be downloaded and extracted
So that the users are not required to do this manually, and
So that the repository does not need to retain the MIBiG files
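The behaviour described above could look roughly like the sketch below, which uses only the standard library. The function name `fetch_and_extract`, the `.complete` marker file and the archive layout are assumptions; the real MIBiG download URL for a given version is deliberately not shown and would need to be supplied by the caller. The demo uses a local `file://` URL so that it runs without network access.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download a .tar.gz archive and unpack it into dest_dir, once."""
    dest_dir = Path(dest_dir)
    marker = dest_dir / ".complete"
    if marker.exists():
        return dest_dir  # A previous run already finished the download.
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "download.tar.gz"
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer 'data' extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    archive.unlink()
    marker.touch()
    return dest_dir


# Demo: build a tiny archive locally and fetch it through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    gbk = tmp / "BGC0000001.gbk"
    gbk.write_text("LOCUS       BGC0000001\n")  # placeholder content
    demo_archive = tmp / "mibig_demo.tar.gz"
    with tarfile.open(demo_archive, "w:gz") as tar:
        tar.add(gbk, arcname="BGC0000001.gbk")
    out_dir = fetch_and_extract(demo_archive.as_uri(), tmp / "mibig")
    extracted = sorted(p.name for p in out_dir.glob("*.gbk"))
    second_call = fetch_and_extract("unused://never-fetched", tmp / "mibig")
    skipped = second_call == out_dir
```

The marker file is written last, so an interrupted download is retried on the next run instead of being treated as complete.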
Tasks:
When the application is executed, and
When the user has specified a log level or one is implied (default = warning)
Then a proper logger should be used to display logs on the CLI and to write a log to a file
So that users can choose what severity of logs to be informed of
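A minimal sketch of such a setup with the standard `logging` module follows. The function `configure_logging`, the logger name and the demo messages are hypothetical; the format string is chosen to resemble the timestamps in the log excerpts above.

```python
import logging
import os
import tempfile


def configure_logging(level="WARNING", log_file=None):
    """Attach a console handler and, optionally, a file handler to one logger."""
    logger = logging.getLogger("bigscape_sketch")
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()  # Avoid duplicate handlers on repeated calls.
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console = logging.StreamHandler()  # Writes to stderr by default.
    console.setFormatter(formatter)
    logger.addHandler(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


# Demo: at the default WARNING level the INFO message is dropped.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run.log")
    log = configure_logging("warning", log_path)
    log.info("Calculating all pairwise distances")
    log.warning("Example warning message")
    for handler in log.handlers:
        handler.close()
    with open(log_path) as fh:
        lines = fh.read().splitlines()
```

Setting the level on the logger filters both destinations at once. Setting levels on the individual handlers instead would allow, for example, a quiet console with a verbose log file.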
Tasks:
Implement two types of filter:
Tasks:
FastTree generates the following warning:
WARNING! This alignment consists of closely-related and very-long sequences.
WARNING! FastTree (or other standard maximum-likelihood tools)
may not be appropriate for aligments of very closely-related sequences
like this one, as FastTree does not account for recombination or gene conversion
See also https://git.wageningenur.nl/medema-group/BiG-SCAPE/-/issues/4
When a storage location for the Pfam files is specified or implied (default [big_scape_dir]/pfam)
When a version of Pfam is specified or implied (default 31.0)
Then that version of the Pfam .hmm file should be downloaded and extracted
So that the users are not required to do this manually
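The sketch below shows one way this could work, assuming the Pfam profile file is distributed gzip-compressed. The function `fetch_pfam_hmm`, the per-version directory layout and the `.part` suffix are assumptions, and the actual download URL for a given Pfam release must be supplied by the caller. The demo reads a locally created `.gz` file through a `file://` URL, and its content is a placeholder rather than a real profile.

```python
import gzip
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_pfam_hmm(url, pfam_dir, version="31.0"):
    """Download a gzipped .hmm file and store it decompressed per version."""
    target = Path(pfam_dir) / version / "Pfam-A.hmm"
    if target.exists():
        return target  # This version is already present.
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    # Decompress while streaming, so the .gz never has to be stored on disk.
    with urllib.request.urlopen(url) as response, gzip.GzipFile(
        fileobj=response, mode="rb"
    ) as unzipped, open(partial, "wb") as out:
        shutil.copyfileobj(unzipped, out)
    # Rename only after a complete download, so partial files are never used.
    partial.replace(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    gz_path = tmp / "Pfam-A.hmm.gz"
    with gzip.open(gz_path, "wb") as fh:
        fh.write(b"HMMER3/f demo profile\n")  # placeholder, not a real profile
    hmm_path = fetch_pfam_hmm(gz_path.as_uri(), tmp / "pfam")
    first_line = hmm_path.read_text().splitlines()[0]
    relative = hmm_path.relative_to(tmp).as_posix()
```

Keeping one subdirectory per version lets several Pfam releases coexist under the same storage location.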
Tasks: