
Comments (9)

moshi4 commented on June 7, 2024

An MSA without gaps cannot really be called an MSA, because it lacks alignment information, right?
As the package name suggests, pyMSAviz is a tool for visualizing MSAs, so I do not plan to implement any function to handle non-MSAs.

Sorry if I have misunderstood the meaning of your proposal.


cx994 commented on June 7, 2024

Sorry, I may not have made it clear~
As shown in the figure below, at some sites all of the amino acid sequences have gaps.
[Screenshot: an MSA in which some sites are gaps in every sequence]
So, is it possible to omit these sites while keeping the position information, to get a more concise MSA visualization?
I think this would preserve the useful information and reduce drawing time.


moshi4 commented on June 7, 2024

Are you saying that if there is a gap-only position in the MSA, you want to treat that position as unnecessary and exclude it from the visualization?

Personally, I don't quite see the benefit of the proposed functionality, since it seems to me that there are very few cases (and there shouldn't be any) in which a gap-only position is included in alignment results.

Could you please tell me the following to help me understand?

  • In what cases does an MSA with gap-only positions, like the one you presented, occur?
  • Is it inconvenient to remove the gap-only positions in the preprocessing stage before visualization?

If I have misunderstood something, I am sorry.


cx994 commented on June 7, 2024
  • Many sparse MSA files contain many gap-only positions, but at the visualization stage we usually want to ignore them.
    So I think it would be useful to give users an option to either remove gap-only positions or keep them.
  • It is convenient to remove the gap-only positions in the preprocessing stage, but I couldn't find an effective way to keep the original position information (the xticklabels in the package).
    This is an idea I had today while working on my data with this package :)


moshi4 commented on June 7, 2024

I have spent some time thinking about how to handle this issue.

It is not realistic to exclude gap-only positions one by one, as that would also shift the xticklabels, and the figure would no longer represent the alignment properly.
Personally, I think it would be reasonable to add an option that automatically excludes regions containing only gaps from the visualization, on an MSA wrap block basis.

Below is a visualization demo using an experimental implementation (a newly added ignore_all_gaps option).

from pymsaviz import MsaViz
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

gap_num = 50
test_msa = MultipleSeqAlignment(
    [
        SeqRecord(Seq("M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--"), id="01"),
        SeqRecord(Seq("M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--"), id="02"),
    ]
)
mv = MsaViz(test_msa, wrap_length=30, show_grid=True)
mv.set_plot_params(ignore_all_gaps=True, ticks_interval=5) # <= Newly added!!
fig = mv.plotfig()

Option: ignore_all_gaps=False => a gap-only MSA wrap block is drawn
[Figure: ignore_gaps_false]

Option: ignore_all_gaps=True => no gap-only MSA wrap block is drawn
[Figure: ignore_gaps_true]
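For reference, the block-level rule can be sketched in plain Python without pyMSAviz. This is a minimal sketch, not pyMSAviz's implementation: the function name `gap_only_blocks`, the plain-string input, and treating only `-` as a gap character are assumptions. With the demo sequences above (80 columns, `wrap_length=30`), only the middle block should be reported as gap-only:

```python
def gap_only_blocks(seqs, wrap_length, gap_chars="-"):
    """Return 0-based, half-open (start, end) ranges of wrap blocks that contain only gaps.

    Hypothetical helper for illustration; not part of pyMSAviz.
    """
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    blocks = []
    # Walk the alignment in wrap_length-sized chunks, like wrapped rows in a figure
    for start in range(0, aln_len, wrap_length):
        end = min(start + wrap_length, aln_len)
        # A block is gap-only if every character of every sequence in it is a gap
        if all(c in gap_chars for s in seqs for c in s[start:end]):
            blocks.append((start, end))
    return blocks


gap_num = 50
seqs = [
    "M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--",
    "M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--",
]
print(gap_only_blocks(seqs, wrap_length=30))  # [(30, 60)]
```

The ranges are half-open, so `(30, 60)` corresponds to alignment positions 31 to 60. With a Biopython `MultipleSeqAlignment`, the input strings can be obtained with `[str(rec.seq) for rec in msa]`.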

I think this is a realistic and easy implementation. What do you think?

Also, out of personal interest: in what situations, or with which tools, are sparse MSAs generated? I don't see them from common multiple alignment tools like muscle or mafft, so could you tell me, for reference?


cx994 commented on June 7, 2024

Oh, great! I think it will solve my problem to some extent. I've tried to exclude gap-only positions one by one, but found it really cumbersome if I want to keep the true xticklabels~
Besides, I don't quite understand why sparse MSA results arise, but in the results downloaded from the database below, most of the MSA files are sparse!
TreeFam database
All in all, thank you for your kind help! I will continue to think about how to solve this problem in my spare time :)


cx994 commented on June 7, 2024

To add, I think there is a convenient approach:

  • Remove the gap-only sites during the preprocessing phase
  • Build a mapping from the remaining columns to their original positions, and use it for the xticklabels
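The two steps above can be sketched in plain Python. This is a minimal sketch, assuming the alignment is given as equal-length strings and that `-` is the only gap character; the function name `remove_gap_only_columns` is hypothetical and not part of pyMSAviz. It returns the trimmed sequences together with the original 1-based position of every kept column, which is the mapping needed for the tick labels:

```python
def remove_gap_only_columns(seqs, gap_chars="-"):
    """Drop columns in which every sequence has a gap.

    Hypothetical helper for illustration. Returns the trimmed sequences and,
    for each kept column, its original 1-based position in the input MSA.
    """
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a non-gap character there
    keep = [i for i in range(aln_len) if any(s[i] not in gap_chars for s in seqs)]
    trimmed = ["".join(s[i] for i in keep) for s in seqs]
    positions = [i + 1 for i in keep]  # original 1-based positions, usable as tick labels
    return trimmed, positions


seqs = ["M--AT-", "M--TI-"]
trimmed, positions = remove_gap_only_columns(seqs)
print(trimmed)    # ['MAT', 'MTI']
print(positions)  # [1, 4, 5]
```

How these positions would be attached to the x axis of a pyMSAviz figure is not covered in this thread; the sketch only produces the mapping.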


moshi4 commented on June 7, 2024

I did some checking on TreeFam.

Your MSA was extracted from the MSA of 400 TRK genes, correct? If so, it is not surprising that it contains gap-only positions.
If you are interested only in the extracted gene sequences, I suggest you remove the gaps from the extracted sequences yourself and realign them with mafft or muscle. You will get more accurate alignment results that way.
Unless you specifically need to rely on the TreeFam alignment results, I think this is how people generally process their data.
Also, if you do that, you will not have the problem you presented here.
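The first step of that workflow, stripping the gaps so the sequences can be realigned, might look like this. A minimal sketch under assumptions: records are given as `(id, aligned_sequence)` pairs, both `-` and `.` are treated as gap characters, and the function name `ungapped_fasta` is hypothetical:

```python
def ungapped_fasta(records, gap_chars="-."):
    """Return FASTA text with every gap character removed from each sequence.

    Hypothetical helper for illustration; assumes '-' and '.' are the gap symbols.
    """
    lines = []
    for seq_id, seq in records:
        raw = "".join(c for c in seq if c not in gap_chars)  # drop gap characters
        lines.append(f">{seq_id}")
        lines.append(raw)
    return "\n".join(lines) + "\n"


records = [("01", "M-AT----ALLCRGRI"), ("02", "M-TI-------TRGVI")]
fasta = ungapped_fasta(records)
print(fasta, end="")
# >01
# MATALLCRGRI
# >02
# MTITRGVI
```

The resulting text can be saved to a file and realigned, for example with `mafft input.fasta > aligned.fasta` (the file names are placeholders).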

These are my personal opinions. It may be superfluous, but I hope it will be helpful.


moshi4 commented on June 7, 2024

Gap-only sites essentially never occur in MSAs produced by normal workflows. Even if a gap-only site exists for some reason, it is not meaningful for data analysis and should be removed in the preprocessing stage before visualization.

Therefore, I am inclined not to implement processing for gap-only sites in pyMSAviz.
