superzchen / ilearnplus

iLearnPlus is the first machine-learning platform with both graphical and web-based user interfaces that enables the construction of automated machine-learning pipelines for computational analysis and prediction using nucleic acid and protein sequences.

Python 100.00%
automated-modelling bioinformatics-tool biomedical-data-analytics deep-learning feature-selection machine-learning prediction python sequence-analysis

ilearnplus's People

Contributors

fanyangrocks, kinyugo, superzchen

ilearnplus's Issues

A question about the PSTNPss function in util/FileProcessing.py

Hello, I am interested in your project, but I have some questions after reading your source code. I hope you can help me answer them.

My questions are about the PSTNPss function in util/FileProcessing.py.
Question 1: In this function, you subtract one from the total number of samples for the corresponding label and subtract one from the trinucleotide count at the corresponding position. What is the purpose of this adjustment, and what principle is it based on?

p_num, n_num = positive_number, negative_number
po_number = matrix_po[j][order[sequence[j: j + 3]]]
if i[0] in positive_key and po_number > 0:
    po_number -= 1
    p_num -= 1
ne_number = matrix_ne[j][order[sequence[j: j + 3]]]
if i[0] in negative_key and ne_number > 0:
    ne_number -= 1
    n_num -= 1

Question 2: This function processes the training dataset and the testing dataset differently. For the training dataset, you perform the subtraction above, but not for the testing dataset. I don’t understand why there is such a difference. I have attached your code for your convenience. Thank you for your time and help!

    def PSTNPss(self):
        try:
            if not self.is_equal:
                self.error_msg = 'PSTNPss descriptor need fasta sequence with equal length.'
                return False

            fastas = []
            for item in self.fasta_list:
                if item[3] == 'training':
                    fastas.append(item)
                    fastas.append([item[0], item[1], item[2], 'testing'])
                else:
                    fastas.append(item)

            for i in fastas:
                if re.search('[^ACGT-]', i[1]):
                    self.error_msg = 'Illegal character included in the fasta sequences, only the "ACGT[U]" are allowed by this encoding scheme.'
                    return False

            encodings = []
            header = ['SampleName', 'label']
            for pos in range(len(fastas[0][1]) - 2):
                header.append('Pos.%d' % (pos + 1))
            encodings.append(header)

            positive = []
            negative = []
            positive_key = []
            negative_key = []
            for i in fastas:
                if i[3] == 'training':
                    if i[2] == '1':
                        positive.append(i[1])
                        positive_key.append(i[0])
                    else:
                        negative.append(i[1])
                        negative_key.append(i[0])

            nucleotides = ['A', 'C', 'G', 'T']
            trinucleotides = [n1 + n2 + n3 for n1 in nucleotides for n2 in nucleotides for n3 in nucleotides]
            order = {}
            for i in range(len(trinucleotides)):
                order[trinucleotides[i]] = i

            matrix_po = self.CalculateMatrix(positive, order)
            matrix_ne = self.CalculateMatrix(negative, order)

            positive_number = len(positive)
            negative_number = len(negative)

            for i in fastas:
                if i[3] == 'testing':
                    name, sequence, label = i[0], i[1], i[2]
                    code = [name, label]
                    for j in range(len(sequence) - 2):
                        if re.search('-', sequence[j: j + 3]):
                            code.append(0)
                        else:
                            p_num, n_num = positive_number, negative_number
                            po_number = matrix_po[j][order[sequence[j: j + 3]]]
                            if i[0] in positive_key and po_number > 0:
                                po_number -= 1
                                p_num -= 1
                            ne_number = matrix_ne[j][order[sequence[j: j + 3]]]
                            if i[0] in negative_key and ne_number > 0:
                                ne_number -= 1
                                n_num -= 1
                            code.append(po_number / p_num - ne_number / n_num)
                            # print(sequence[j: j+3], order[sequence[j: j+3]], po_number, p_num, ne_number, n_num)
                    encodings.append(code)
            self.encoding_array = np.array([])
            self.encoding_array = np.array(encodings, dtype=str)
            self.column = self.encoding_array.shape[1]
            self.row = self.encoding_array.shape[0] - 1
            del encodings
            if self.encoding_array.shape[0] > 1:
                return True
            else:
                return False
        except Exception as e:
            self.error_msg = str(e)
            return False
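
One plausible reading of the subtraction, not confirmed by the project authors, is a leave-one-out correction: when a training sequence is encoded, its own contribution is removed from the count matrix and from the class size, so its features are not inflated by counts that include itself. A sequence that appears only in the testing set never contributed to the counts, so there would be nothing to remove. The sketch below illustrates that reading on toy data. The names `count_matrix`, `pstnp_features`, `safe_ratio` and the `member_of` parameter are hypothetical, and the zero-denominator guard is an addition that the quoted function does not have.

```python
from itertools import product

# Map each of the 64 trinucleotides to a column index, as in the quoted code.
ORDER = {"".join(t): i for i, t in enumerate(product("ACGT", repeat=3))}


def count_matrix(sequences):
    """Count trinucleotides per position: matrix[position][trinucleotide index]."""
    n_positions = len(sequences[0]) - 2
    matrix = [[0] * len(ORDER) for _ in range(n_positions)]
    for seq in sequences:
        for j in range(n_positions):
            matrix[j][ORDER[seq[j:j + 3]]] += 1
    return matrix


def safe_ratio(count, total):
    """Frequency with a guard for an empty class (assumption, not in the original)."""
    return count / total if total else 0.0


def pstnp_features(seq, matrix_po, matrix_ne, n_pos, n_neg, member_of=None):
    """Encode one sequence; member_of is 'positive', 'negative' or None."""
    features = []
    for j in range(len(seq) - 2):
        idx = ORDER[seq[j:j + 3]]
        po, p_num = matrix_po[j][idx], n_pos
        ne, n_num = matrix_ne[j][idx], n_neg
        # Leave-one-out: remove the sequence's own count from its own class.
        if member_of == "positive" and po > 0:
            po, p_num = po - 1, p_num - 1
        elif member_of == "negative" and ne > 0:
            ne, n_num = ne - 1, n_num - 1
        features.append(safe_ratio(po, p_num) - safe_ratio(ne, n_num))
    return features


if __name__ == "__main__":
    positives = ["ACGT", "ACGA"]
    negatives = ["TTTT", "TTTA"]
    m_po, m_ne = count_matrix(positives), count_matrix(negatives)
    # Encoded as a member of the positive training set (own counts removed).
    print(pstnp_features("ACGT", m_po, m_ne, 2, 2, member_of="positive"))
    # Encoded as an unseen testing sequence (counts used as they are).
    print(pstnp_features("ACGT", m_po, m_ne, 2, 2))
```

In this toy case the second position ("CGT") occurs only in the sequence being encoded, so the feature drops from 0.5 to 0.0 once the sequence's own count is removed.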

I have issues with selecting particular feature descriptors.

First of all, thank you for such a nice feature extraction tool.

In iLearnPlus Basic, I am unable to select particular descriptors. Could you please let me know how to solve this issue?

Also, I attached the image for your reference. Please check it.

I look forward to hearing from you soon.

Thank you.

[Image: iLearnPlus_screenshot]

PSTNPss cannot be used

Hello, I entered the FASTA sequence in the required format. However, PSTNPss cannot be used.

Multi label problems

When I run multi-label problems, all performance evaluation metrics show NA except Accuracy. The ROC and PRC curves are also not generated. Kindly help.

Pop out errors

Sometimes, when processing FASTA sequence data, the program pops up 'RG', sometimes "divided by zero", and sometimes it just pops up an error and stops responding. How can I fix errors like these when I come across them? Thanks in advance.

Show a warning if special fasta headers format is violated

In a large dataset of automatically downloaded sequences, some sequence names contain the "|" symbol.
I also append the class and train/test labels automatically.
So, when I try to analyze such a file, I get uninformative error messages like:

  • ValueError: could not convert string to float: 'P42577.2'
  • ValueError: invalid literal for int() with base 10: '6LPD'

which are caused by incorrect fasta headers:

  • P42577.2_sp|P42577.2|FRIS_LYMST|0|training
  • 6LPD_pdb|6LPD|F|1|training

A simple check when importing the file could show a warning to the user.
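
A minimal sketch of such a check follows. It assumes headers have the form name|label|role, with an integer label and a role of "training" or "testing". That format is inferred from the examples and the code quoted on this page, not taken from the project documentation, and the function name `check_fasta_header` is hypothetical.

```python
def check_fasta_header(header):
    """Return a list of problems found in one FASTA header (empty if none)."""
    problems = []
    fields = header.lstrip(">").strip().split("|")
    if len(fields) != 3:
        problems.append(
            "expected 3 '|'-separated fields (name|label|role), found %d"
            % len(fields)
        )
        return problems
    name, label, role = fields
    if not name:
        problems.append("sample name is empty")
    try:
        int(label)
    except ValueError:
        problems.append("label %r is not an integer" % label)
    if role not in ("training", "testing"):
        problems.append("role %r is not 'training' or 'testing'" % role)
    return problems


if __name__ == "__main__":
    headers = [
        ">seq1|1|training",
        ">P42577.2_sp|P42577.2|FRIS_LYMST|0|training",
        ">6LPD_pdb|6LPD|F|1|training",
    ]
    for header in headers:
        for problem in check_fasta_header(header):
            print("Warning: %s: %s" % (header, problem))
```

Running this kind of check at import time would let the tool report the offending header directly instead of failing later with a ValueError.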

something unrelated but can you please help

Hi,
I have tried every solution I could find on Google to install PyQt5, but the import still fails:

(base) amit@amit-X705UDR:~$ /home/amit/miniconda3/bin/python
Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ilearnplus import runiLearnPlus
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/amit/.local/lib/python3.8/site-packages/ilearnplus/__init__.py", line 5, in <module>
    from .iLearnPlusBasic import *
  File "/home/amit/.local/lib/python3.8/site-packages/ilearnplus/iLearnPlusBasic.py", line 7, in <module>
    from PyQt5.QtWidgets import (QApplication, QWidget, QPushButton, QFileDialog, QLabel, QHBoxLayout, QGroupBox, QTextEdit,
ModuleNotFoundError: No module named 'PyQt5'

Kindly suggest.
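
In the traceback above, the interpreter is /home/amit/miniconda3/bin/python while ilearnplus is loaded from ~/.local/lib/python3.8/site-packages, so one possible cause (an assumption, not a confirmed diagnosis) is that PyQt5 was installed for a different interpreter or environment. The short diagnostic below prints which interpreter is running and whether it can find PyQt5. The function name `missing_modules` is hypothetical.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    missing = missing_modules(["PyQt5"])
    if missing:
        print("Not importable here:", ", ".join(missing))
        # Installing through the same interpreter avoids an environment mismatch.
        print("Try:", sys.executable, "-m pip install", " ".join(missing))
    else:
        print("PyQt5 is importable from this interpreter.")
```

If PyQt5 is reported as missing, installing it with the exact interpreter shown in the output (python -m pip install PyQt5) targets the environment that actually runs ilearnplus.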
