Comments (9)
MSA without gaps cannot be called MSA because it lacks alignment information, right?
As the package name suggests, pyMSAviz is a tool to visualize MSA, so I do not plan to implement any function to handle non-MSAs.
Sorry if I have misunderstood the meaning of your proposal.
from pymsaviz.
Sorry, I may not have made it clear~
As shown in the figure below, some sites contain only gaps across all amino acid sequences.
So is it possible to omit these sites but keep the position information, to get a more concise MSA visualization?
I think this would preserve the valid information and reduce drawing time.
from pymsaviz.
Are you saying that if there is a gap-only position in the MSA, you want to determine that position as unnecessary and exclude it from the visualization?
Personally, I don't quite understand the effectiveness of the proposed functionality, as it seems to me that there are very few cases (or there shouldn't be any) where a gap-only position is included in the alignment results.
Could you please tell me the following to help me understand?
- In what cases does an MSA with gap-only positions, like the one you presented, occur?
- Is it inconvenient to remove the gap-only positions in the preprocessing stage of the visualization?
If I have misunderstood something, I am sorry.
from pymsaviz.
- In many sparse MSA files, there are many gap-only positions, but in the visualization stage we usually want to ignore them. So I think it is necessary to give users a choice (remove gap-only positions or keep them).
- It is convenient to remove the gap-only positions in the preprocessing stage of the visualization, but I didn't find an effective way to keep the position information (the xticklabels in the package).
Here's an idea I had today while working with my data with this package :)
from pymsaviz.
I have spent some time thinking about how to handle this issue.
It is not realistic to exclude gap-only positions one by one, as it would also shift the xticklabel and would not represent the proper visualization results.
Personally, I think it would be reasonable to add an option to automatically exclude areas containing only gaps from the visualization on an MSA wrap block basis.
Below is an experimental implementation (adding an ignore_all_gaps option) with a visualization demo.
from pymsaviz import MsaViz
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

gap_num = 50
test_msa = MultipleSeqAlignment(
    [
        SeqRecord(Seq("M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--"), id="01"),
        SeqRecord(Seq("M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--"), id="02"),
    ]
)

mv = MsaViz(test_msa, wrap_length=30, show_grid=True)
mv.set_plot_params(ignore_all_gaps=True, ticks_interval=5)  # <= Newly added (experimental option)!!
fig = mv.plotfig()
Option: ignore_all_gaps=False => Gap-only MSA wrap block exists
Option: ignore_all_gaps=True => No gap-only MSA wrap block
I think this is a realistic and easy implementation. What do you think?
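For readers who want to see the wrap block idea in isolation, here is a minimal sketch in plain Python. It is not the pyMSAviz implementation; the helper name `non_gap_wrap_blocks` is hypothetical, and it assumes the alignment is given as equal-length strings with "-" as the gap character.

```python
def non_gap_wrap_blocks(seqs, wrap_length, gap_chars="-"):
    """Return (start, end) column ranges of wrap blocks holding at least one residue.

    Hypothetical helper for illustration only; ranges are 0-based, end-exclusive.
    """
    if not seqs:
        return []
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    kept = []
    for start in range(0, aln_len, wrap_length):
        end = min(start + wrap_length, aln_len)
        # Keep the block unless every character in every row is a gap.
        if any(ch not in gap_chars for s in seqs for ch in s[start:end]):
            kept.append((start, end))
    return kept


gap_num = 50
seqs = [
    "M-AT----ALLCRGRI" + "-" * gap_num + "AITFR---RGRI--",
    "M-TI-------TRGVI" + "-" * gap_num + "AITFR---RGRI--",
]
blocks = non_gap_wrap_blocks(seqs, wrap_length=30)
print(blocks)  # [(0, 30), (60, 80)]
```

With the demo sequences and wrap_length=30, the middle block (columns 30 to 59) is gap-only and is skipped, while the column numbering of the remaining blocks is unchanged.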
Also, this is just a question out of personal interest, but in what situations or tools is a sparse MSA generated? I don't see it in common multiple alignment tools like muscle or mafft, so can you tell me for reference?
from pymsaviz.
Oh, great! I think it will solve my problem to some extent. I've tried to exclude gap-only positions one by one but found it's really cumbersome if I want to keep true xticklabel~
Besides, I don't quite understand why there are sparse MSA results, but in the results downloaded from the database below, most of the MSA files are sparse!
TreeFam database
All in all, thank you for your kind help! I will continue to think about how to solve this problem in my spare time :)
from pymsaviz.
To add, I think there is a convenient way:
- Remove the gap-only site during the preprocessing phase
- Make a location mapping for the xticklabel
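The two steps above could be sketched roughly as follows. This is only an illustration in plain Python: `drop_gap_only_columns` is a hypothetical helper, the alignment is assumed to be a list of equal-length strings with "-" as the gap character, and how the resulting positions are passed to the plot as tick labels is left open.

```python
def drop_gap_only_columns(seqs, gap_chars="-"):
    """Remove columns where every sequence has a gap.

    Returns the trimmed sequences and, for each kept column, its 1-based
    position in the original alignment (usable as a tick-label mapping).
    Hypothetical helper for illustration only.
    """
    if not seqs:
        return [], []
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    # Indices of columns with at least one non-gap character.
    keep = [i for i in range(aln_len) if any(s[i] not in gap_chars for s in seqs)]
    trimmed = ["".join(s[i] for i in keep) for s in seqs]
    original_positions = [i + 1 for i in keep]
    return trimmed, original_positions


seqs = ["MA--TK-L", "M---SK-I"]
trimmed, positions = drop_gap_only_columns(seqs)
print(trimmed)    # ['MATKL', 'M-SKI']
print(positions)  # [1, 2, 5, 6, 8]
```

Here `positions[k]` gives the original alignment position of column k in the trimmed MSA, so labels drawn from it stay true to the source alignment even though the columns are no longer contiguous.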
from pymsaviz.
I did some checking on TreeFam.
Your MSA is based on extracting some data from the MSA of 400 TRK genes, correct? If so, it is not surprising that the gap-only positions are included.
If you are interested only in the extracted gene sequences, I suggest you remove the gaps from the extracted sequences yourself and align them again with mafft or muscle. You will get more accurate alignment results that way.
If you don't necessarily need to rely on TreeFam alignment results, it seems to me that people generally process their data that way.
Also, if you do that, you will not have the problem you presented here.
These are my personal opinions. It may be superfluous, but I hope it will be helpful.
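The degapping step suggested above might look like the following minimal sketch. The helper name `degap_to_fasta` is hypothetical, records are assumed to be (id, aligned sequence) pairs, and both "-" and "." are treated as gap characters.

```python
def degap_to_fasta(records, gap_chars="-."):
    """Strip gap characters from aligned sequences and return FASTA text.

    Hypothetical helper for illustration only.
    """
    lines = []
    for seq_id, aligned in records:
        ungapped = "".join(ch for ch in aligned if ch not in gap_chars)
        lines.append(f">{seq_id}")
        lines.append(ungapped)
    return "\n".join(lines) + "\n"


records = [
    ("01", "M-AT----ALLCRGRI"),
    ("02", "M-TI-------TRGVI"),
]
fasta_text = degap_to_fasta(records)
print(fasta_text, end="")
```

The resulting text can be saved to a file and realigned with an external aligner, for example `mafft input.fasta > realigned.fasta`, assuming MAFFT is installed; the file names are placeholders.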
from pymsaviz.
Gap-only sites essentially never occur in MSAs produced in normal operation. Even if a gap-only site were to exist for some reason, it would not be considered meaningful for data analysis and should be removed in the preprocessing stage before visualization.
Therefore, I do not plan to implement processing for gap-only sites in pyMSAviz.
from pymsaviz.