Softwares
Names | Stars & Forks | Issues | PRs | Downloads |
---|---|---|---|---|
pyCirclize | ||||
pyGenomeViz | ||||
pyMSAviz | ||||
ANIclustermap | ||||
COGclassifier | ||||
phyTreeViz | ||||
pybarrnap |
MSA (Multiple Sequence Alignment) visualization Python package for sequence analysis
Home Page: https://moshi4.github.io/pyMSAviz
License: MIT License
Hi, what if I need to pick out and display only specific positions, say 30, 40, 50 and 60, from an MSA? Is there a direct way to do this rather than writing a new FASTA file?
It would also be great if you could add functionality to load a Newick-format tree and show the phylogeny on the left-hand side of the header.
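Picking out individual alignment columns, as asked above, can be done in memory before any plotting. The following is only a minimal sketch in plain Python: the `select_columns` helper and the toy sequences are hypothetical and not part of pyMSAviz, and positions are assumed to be 1-based alignment columns.

```python
def select_columns(alignment: dict[str, str], positions: list[int]) -> dict[str, str]:
    """Keep only the requested 1-based alignment columns for every sequence."""
    length = len(next(iter(alignment.values())))
    for pos in positions:
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} is outside the alignment (1..{length})")
    return {
        name: "".join(seq[pos - 1] for pos in positions)
        for name, seq in alignment.items()
    }


# Toy alignment (made-up data, for illustration only)
toy = {"seq1": "MKT-AYIAKQ", "seq2": "MKTSAY-AKQ"}
subset = select_columns(toy, [1, 4, 7])
for name, residues in subset.items():
    print(name, residues)
```

The reduced sequences would still have to be handed to the plotting code in whatever form it accepts, so this does not by itself avoid an intermediate file.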
I think it is necessary to provide a function to draw an MSA without gaps.
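One possible reading of this request is to drop gap-containing columns before drawing. The sketch below assumes that interpretation. The `drop_gap_columns` helper, its `max_gap_fraction` parameter and the toy data are hypothetical and are not an existing pyMSAviz feature.

```python
def drop_gap_columns(
    alignment: dict[str, str], gap_char: str = "-", max_gap_fraction: float = 0.0
) -> dict[str, str]:
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    With the default of 0.0, only columns without any gap are kept.
    """
    names = list(alignment)
    # zip(*sequences) iterates over the alignment column by column
    kept = [
        column
        for column in zip(*alignment.values())
        if column.count(gap_char) / len(column) <= max_gap_fraction
    ]
    return {
        name: "".join(column[row] for column in kept)
        for row, name in enumerate(names)
    }


# Toy alignment (made-up data, for illustration only)
toy_msa = {"a": "M-KT-", "b": "M-RTA"}
print(drop_gap_columns(toy_msa))
print(drop_gap_columns(toy_msa, max_gap_fraction=0.5))
```

Note that removing columns changes the position numbering, so any annotation positions would need to be mapped to the trimmed alignment.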
Hello,
First of all I want to say that I really like this python package. I think it is really neat!
I am currently having the problem that the labels for the various sequences are cut off.
According to how I understand the code, if I have a fasta file containing aligned sequence records with the format
>Genus1 species1 strain1 | accession_number1
----M----AD---A
>Genus2 species2 strain2 | accession_number2
----M----SD---A
etc.
Then when I run the package on that particular fasta file, only the Genus is shown on the left of each sequence.
Since I have files with many species of the same genus, I also cannot use the sort=True setting when creating the MsaViz object, as I then get a "Duplicate values" error.
If I switch the format of my alignment files around so that the accession_number comes first, the sort=True setting works as intended, but I am left with a figure showing only the accession_numbers as labels, which makes the results hard to interpret.
Current format:
>accession_number1 | Genus1 species1 strain1
----M----AD---A
>accession_number2 | Genus2 species2 strain2
----M----SD---A
etc.
Am I doing something wrong currently, or is there some setting to ensure that the full title of the sequence record is shown?
Minimal working example assuming switched fasta files:
import os

import numpy as np
import pymsaviz
from Bio import SeqIO


def make_pymsaviz_plot(
    path,
    name,
    outname,
    min_gap_length=5,
    gap_fraction=0.05,
    gap_char="-",
    variable_consensus=0.4,
    variable_char="x",
    show_count=True,
    show_consensus=True,
    color_scheme="Clustal",
    sorted=False,
):
    # create the input and output file names
    infile = os.path.join(path, name)
    outfile = os.path.join(path, outname)
    # parse the input fasta file into a list of sequences
    sequences = []
    for record in SeqIO.parse(infile, "fasta"):
        sequences.append(record.seq)
    # for every position in the alignment, count the number of gaps
    gap_count = np.zeros(len(sequences[0]))
    for sequence in sequences:
        for i, aa in enumerate(sequence):
            if aa == gap_char:
                gap_count[i] += 1
    # get the continuous stretches of gaps as 1-based, inclusive (start, end) positions
    gap_stretches = []
    # start at -1 so that a stretch beginning in the first column is not truncated
    start = -1
    end = -1
    for i, count in enumerate(gap_count):
        if count / len(sequences) > (1 - gap_fraction):
            end = i
        else:
            if end > start:
                gap_stretches.append((start + 2, end + 1))
            start = i
            end = i
    if end > start:
        gap_stretches.append((start + 2, end + 1))
    # remove the stretches of gaps that are too short
    gap_stretches = [stretch for stretch in gap_stretches if stretch[1] - stretch[0] > min_gap_length]
    print(gap_stretches)
    # create a new fasta file with the gap_stretches removed
    protein_accessions = []
    genus_species = []
    with open(outfile + "_gap_trimmed.fasta", "w") as f:
        for record in SeqIO.parse(infile, "fasta"):
            # get the protein accession and genus species
            protein_accessions.append(record.description.split("|")[0].strip())
            genus_species.append(record.description.split("|")[1])
            new_seq = ""
            for i, aa in enumerate(record.seq):
                # keep the residue only if it lies outside every gap stretch
                if not any(stretch[0] - 1 <= i <= stretch[1] - 1 for stretch in gap_stretches):
                    new_seq += aa
            f.write(">" + record.description + "\n")
            f.write(new_seq + "\n")
    # create a pymsaviz object from both the non-trimmed and the trimmed fasta file
    msa = pymsaviz.MsaViz(infile, show_count=show_count, show_consensus=show_consensus, sort=sorted, color_scheme=color_scheme)
    msa_trimmed = pymsaviz.MsaViz(outfile + "_gap_trimmed.fasta", show_count=show_count, show_consensus=show_consensus, sort=sorted, color_scheme=color_scheme)
    # add annotations to the pymsaviz object
    for gap in gap_stretches:
        msa.add_text_annotation(gap, "Gap Region", text_color="black", range_color="red")
    msa.savefig(outfile + "_gap.png")
    # add variable markers to the trimmed pymsaviz object to show highly non-conserved regions
    high_variability = []
    identity_list = msa_trimmed._get_consensus_identity_list()
    for position, identity in enumerate(identity_list, 1):
        if identity < variable_consensus:
            high_variability.append(position)
    msa_trimmed.add_markers(high_variability, marker=variable_char, color="red")
    msa_trimmed.savefig(outfile + "_gap_trimmed.png")
    return msa, msa_trimmed
Thank you in advance for your help.
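Regarding the truncated labels described above: Biopython's FASTA parser uses the header text up to the first whitespace as the record ID, which matches only "Genus1" being shown. Assuming the plot labels come from that ID (an assumption, not confirmed in this thread), one workaround is to rewrite the headers so the whole description becomes a single whitespace-free token. The `full_header_ids` helper below is hypothetical; it also numbers repeated labels so they stay unique.

```python
def full_header_ids(fasta_text: str) -> str:
    """Rewrite FASTA headers so the full description survives as one token."""
    seen: dict[str, int] = {}
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # drop the "|" separator and join all words with underscores
            label = "_".join(line[1:].replace("|", " ").split())
            seen[label] = seen.get(label, 0) + 1
            if seen[label] > 1:
                # append a counter so repeated labels remain distinguishable
                label = f"{label}_{seen[label]}"
            out.append(">" + label)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


example = (
    ">Genus1 species1 strain1 | accession_number1\n"
    "----M----AD---A\n"
    ">Genus2 species2 strain2 | accession_number2\n"
    "----M----SD---A\n"
)
print(full_header_ids(example), end="")
```

The rewritten text can be saved to a new file and plotted in place of the original alignment.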
Hi! Thank you for creating pyMSAviz. I've been actively using it to quickly view some MSAs I've generated. While parsing through each MSA and saving the plots with the .savefig() function, I noticed that the plots are always displayed interactively, and I quickly ran out of memory while doing this automatically for a batch of files. Apologies for any ignorance, but is there a way to save the plot without displaying it interactively? The way I am reading and saving my MSAs is shown below:
mv = MsaViz(rha_file, show_label=False, color_scheme='Clustal', show_consensus=True)
mv.savefig(f'../figures/{output_directory}_MSA_figures/{format_rha_file}/{format_rha_file}_RHA_MSA_figure.png')
Thank you!
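A general Matplotlib pattern for batch jobs like the one described above is to select a non-interactive backend and close each figure once it has been written. The sketch below uses only Matplotlib, with `plt.subplots()` standing in for the figure that pyMSAviz would produce. Whether the figures from `MsaViz` are registered with pyplot in the same way is an assumption, and `save_and_close` is a hypothetical helper.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: nothing is shown on screen

import matplotlib.pyplot as plt


def save_and_close(fig, path):
    """Write the figure to disk, then release the memory held by pyplot."""
    fig.savefig(path)
    plt.close(fig)


out_dir = tempfile.mkdtemp()
for index in range(3):
    # stand-in for a figure returned by a plotting library
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, index])
    save_and_close(fig, os.path.join(out_dir, f"figure_{index}.png"))

# list of figure numbers still tracked by pyplot
print(plt.get_fignums())
```

In a notebook, figures may still be rendered inline by the front end, so closing each figure after saving is the part that matters most for memory.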
With matplotlib version 3.5.3 and pymsaviz version 0.4.0 installed, this example from the docs gave an AttributeError:
from pymsaviz import MsaViz, get_msa_testdata
msa_file = get_msa_testdata("HIGD2A.fa")
mv = MsaViz(msa_file)
fig = mv.plotfig()
Extract from stack trace:
... pymsaviz/msaviz.py) in plotfig(self, dpi) ...
--> 420 fig.set_layout_engine("tight")
421 gs = GridSpec(nrows=len(plot_ax_types), ncols=1, height_ratios=y_size_list)
422 gs.update(left=0, right=1, bottom=0, top=1, hspace=0, wspace=0)
AttributeError: 'Figure' object has no attribute 'set_layout_engine'
Installing matplotlib 3.6.0 fixed the issue so perhaps the minimum version (e.g. matplotlib = ">=3.5.2" ) may need updating?
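`Figure.set_layout_engine` was introduced in Matplotlib 3.6, which is consistent with the error appearing on 3.5.3 and disappearing on 3.6.0. A small version guard can make this kind of mismatch explicit. The sketch below is plain Python; `version_tuple` and `supports_layout_engine` are hypothetical helpers that compare only the leading numeric part of each version component.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a version string such as '3.5.3' into a tuple such as (3, 5, 3)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_layout_engine(mpl_version: str) -> bool:
    """True if this Matplotlib version has Figure.set_layout_engine (3.6+)."""
    return version_tuple(mpl_version) >= (3, 6)


for candidate in ("3.5.3", "3.6.0"):
    print(candidate, supports_layout_engine(candidate))
```

In real code the string would come from `matplotlib.__version__`. A feature check such as `hasattr(fig, "set_layout_engine")` is an alternative that avoids parsing versions at all.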