I think it is necessary to provide a function to draw an MSA without gaps.

msa without gaps about pymsaviz HOT 9 CLOSED

cx994 commented on June 7, 2024

msa without gaps

Comments (9)

moshi4 commented on June 7, 2024

MSA without gaps cannot be called MSA because it lacks alignment information, right?
As the package name suggests, pyMSAviz is a tool to visualize MSA, so I do not plan to implement any function to handle non-MSAs.

Sorry if I have misunderstood the meaning of your proposal.

cx994 commented on June 7, 2024

Sorry, I may not have made it clear~
As shown in the figure below, at some sites every amino acid sequence has a gap.

So is it possible to omit these sites but keep the position information, to get a more concise MSA visualization?
I think this would preserve the useful information and reduce drawing time.

moshi4 commented on June 7, 2024

Are you saying that if there is a gap-only position in the MSA, you want to determine that position as unnecessary and exclude it from the visualization?

Personally, I don't quite understand the effectiveness of the proposed functionality, as it seems to me that there are very few cases (or there shouldn't be any) where a gap-only position is included in the alignment results.

Could you please tell me the following to help me understand?

In what cases does an MSA with gap-only positions, like the one you presented, occur?
Is it inconvenient to remove the gap-only positions in the preprocessing stage before visualization?

If I have misunderstood something, I am sorry.

cx994 commented on June 7, 2024

In many sparse MSA files, there are many gap-only positions. But in the visualization stage, we usually want to ignore them.
So..., I think it is necessary to give users an option (remove the gap-only positions or keep them).
It is convenient to remove the gap-only positions in the preprocessing stage, but I didn't find an effective way to keep the position information (the xticklabels in the package).
Here's an idea I had today while working on my data with this package ：）
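Finding the gap-only positions during preprocessing is a single pass over the alignment columns. A minimal sketch, assuming the aligned sequences are plain Python strings of equal length and that `-` and `.` are the gap characters; `gap_only_columns` is a hypothetical helper, not part of pyMSAviz:

```python
# Hypothetical helper (not part of pyMSAviz): report which alignment
# columns contain only gap characters.
def gap_only_columns(seqs, gap_chars="-."):
    """Return 0-based indices of columns where every sequence has a gap."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if all(s[i] in gap_chars for s in seqs)]


if __name__ == "__main__":
    msa = ["M-AT--AL", "M-TI---L"]
    print(gap_only_columns(msa))  # [1, 4, 5]
```

Column 6 is not reported because one sequence still has a residue (`A`) there.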

moshi4 commented on June 7, 2024

I have spent some time thinking about how to handle this issue.

It is not realistic to exclude gap-only positions one by one, as that would also shift the xticklabels and the plot would no longer show the true alignment positions.
Personally, I think it would be reasonable to add an option that automatically excludes regions containing only gaps from the visualization on an MSA wrap-block basis.
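The wrap-block idea can be sketched independently of any plotting code. This is a hypothetical illustration, not pyMSAviz's implementation: `non_gap_blocks` is an invented name, sequences are assumed to be equal-length strings, and `-`/`.` are assumed to be the gap characters. Each block that has at least one residue is kept together with its original 1-based start and end positions:

```python
# Hypothetical sketch, not pyMSAviz's internal code: split an alignment
# into wrap blocks and drop blocks in which every character is a gap.
def non_gap_blocks(seqs, wrap_length, gap_chars="-."):
    """Return (first_pos, last_pos, rows) for each block that has residues.

    Positions are 1-based and refer to the ORIGINAL alignment, so a
    renderer can still label each kept block with its true coordinates.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    blocks = []
    for start in range(0, length, wrap_length):
        end = min(start + wrap_length, length)
        rows = [s[start:end] for s in seqs]
        if all(c in gap_chars for row in rows for c in row):
            continue  # gap-only block: skip it entirely
        blocks.append((start + 1, end, rows))
    return blocks


if __name__ == "__main__":
    # Same toy alignment as the demo in this thread (80 columns).
    seq1 = "M-AT----ALLCRGRI" + "-" * 50 + "AITFR---RGRI--"
    seq2 = "M-TI-------TRGVI" + "-" * 50 + "AITFR---RGRI--"
    for first, last, _ in non_gap_blocks([seq1, seq2], wrap_length=30):
        print(first, last)
    # 1 30
    # 61 80
```

Because columns inside a kept block are never renumbered, its tick labels stay correct. The trade-off is that a gap-only stretch disappears only when it fills an entire block.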

Below is a visualization demo of an experimental implementation (adding an ignore_all_gaps option).

```python
from pymsaviz import MsaViz
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

gap_num = 50
test_msa = MultipleSeqAlignment(
    [
        SeqRecord(Seq("M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--"), id="01"),
        SeqRecord(Seq("M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--"), id="02"),
    ]
)
mv = MsaViz(test_msa, wrap_length=30, show_grid=True)
mv.set_plot_params(ignore_all_gaps=True, ticks_interval=5) # <= Newly added!!
fig = mv.plotfig()
```

Option: ignore_all_gaps=False => the gap-only MSA wrap block is still drawn

Option: ignore_all_gaps=True => the gap-only MSA wrap block is omitted

I think this is a realistic and easy implementation. What do you think?

Also, this is just a question out of personal interest: in what situations, or with which tools, are sparse MSAs generated? I don't see them in the output of common multiple alignment tools like muscle or mafft, so could you tell me, for reference?

cx994 commented on June 7, 2024

Oh, great! I think it will solve my problem to some extent. I've tried to exclude gap-only positions one by one, but found it really cumbersome if I want to keep the true xticklabels~
Besides, I don't quite understand why the MSA results are sparse. But in the results downloaded from the database below, most of the MSA files are sparse!
TreeFam database
All in all, thank you for your kind help! I will continue to think about how to solve this problem in my spare time ：）

cx994 commented on June 7, 2024

To add, I think there is a convenient way:

1. Remove the gap-only sites during the preprocessing phase.
2. Make a position mapping for the xticklabels.
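The two steps above can be sketched in plain Python. This is a hypothetical illustration (`remove_gap_only_columns` and `tick_labels` are not pyMSAviz functions), assuming equal-length string sequences and `-`/`.` as gap characters:

```python
def remove_gap_only_columns(seqs, gap_chars="-."):
    """Drop gap-only columns and return (trimmed_seqs, positions).

    positions[i] is the original 1-based alignment position of column i
    in the trimmed alignment (the position mapping for tick labels).
    """
    if not seqs:
        return [], []
    length = len(seqs[0])
    # Step 1: keep every column where at least one sequence has a residue.
    keep = [i for i in range(length) if any(s[i] not in gap_chars for s in seqs)]
    trimmed = ["".join(s[i] for i in keep) for s in seqs]
    # Step 2: remember where each kept column came from.
    positions = [i + 1 for i in keep]
    return trimmed, positions


def tick_labels(positions, interval=10):
    """Return (trimmed_index, original_position) pairs for x-axis ticks.

    A tick is placed wherever the ORIGINAL position is a multiple of
    `interval`; if that column was removed, it simply gets no tick.
    """
    return [(i, pos) for i, pos in enumerate(positions) if pos % interval == 0]


if __name__ == "__main__":
    trimmed, positions = remove_gap_only_columns(["M-AT--AL", "M-TI---L"])
    print(trimmed)                    # ['MATAL', 'MTI-L']
    print(positions)                  # [1, 3, 4, 7, 8]
    print(tick_labels(positions, 2))  # [(2, 4), (4, 8)]
```

With plain matplotlib, these pairs could be applied through `Axes.set_xticks` and `Axes.set_xticklabels` on a plot of the trimmed alignment. Whether that can be combined with pyMSAviz's own tick drawing is not covered here.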

moshi4 commented on June 7, 2024

I did some checking on TreeFam.

Your MSA is based on extracting some data from the MSA of 400 TRK genes, correct? If so, it is not surprising that the gap-only positions are included.
If you are interested only in the extracted gene sequences, I suggest you remove the gaps from the extracted sequences yourself and realign them with mafft or muscle. You will get more accurate alignment results that way.
If you don't necessarily need to rely on TreeFam alignment results, it seems to me that people generally process their data that way.
Also, if you do that, you will not have the problem you presented here.
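The first step of that workflow, stripping gaps from the extracted sequences before realigning, might look like this. A minimal sketch, assuming the records are `(id, aligned_sequence)` string pairs and that `-` and `.` are the gap characters; `ungap_fasta` is a hypothetical helper. The resulting FASTA text could be saved to a file and realigned, for example with `mafft ungapped.fasta > realigned.fasta`.

```python
def ungap_fasta(records, gap_chars="-."):
    """Return FASTA text with all gap characters removed from each sequence.

    `records` is an iterable of (seq_id, aligned_sequence) pairs.
    """
    drop_gaps = str.maketrans("", "", gap_chars)  # deletes each gap char
    return "".join(f">{seq_id}\n{seq.translate(drop_gaps)}\n" for seq_id, seq in records)


if __name__ == "__main__":
    records = [("01", "M-AT----ALLCRGRI"), ("02", "M-TI-------TRGVI")]
    print(ungap_fasta(records), end="")
    # >01
    # MATALLCRGRI
    # >02
    # MTITRGVI
```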

These are my personal opinions. It may be superfluous, but I hope it will be helpful.

moshi4 commented on June 7, 2024

Gap-only sites in MSAs essentially never occur in normal use. Even if a gap-only site were to exist for some reason, it would not be meaningful for data analysis and should be removed in the preprocessing stage before visualization.

Therefore, I do not plan to implement special handling for gap-only sites in pyMSAviz.
