pymsaviz's People

Contributors

moshi4

pymsaviz's Issues

get specific positions from MSA

Hi, what if I want to extract and display only specific positions from an MSA, say 30, 40, 50 and 60? Is there a direct way to do this rather than writing a new FASTA file?
It would also be great if you could add the ability to load a Newick-format tree and show the phylogeny on the left-hand side of the header.
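To make the request concrete, the column selection itself can be sketched in plain Python without touching the file on disk. This is only an illustration: `parse_fasta` and `select_columns` are hypothetical helper names, not part of pymsaviz, and positions are assumed to be 1-based alignment columns.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> aligned sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def select_columns(records, positions):
    """Keep only the given 1-based alignment columns, in the order listed."""
    return {
        name: "".join(seq[p - 1] for p in positions)
        for name, seq in records.items()
    }


# Hypothetical two-sequence alignment, used only for illustration
fasta_text = """>seq1
MKT-AILVGA
>seq2
MRTQAI-VGA
"""

subset = select_columns(parse_fasta(fasta_text), [1, 4, 7, 10])
```

The resulting dictionary holds only the requested columns. How to hand such an in-memory alignment to the plotting code is a separate question that this sketch does not answer.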

msa without gaps

I think it is necessary to provide a function to draw an MSA without gaps.
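The request does not say whether "without gaps" means dropping columns where every sequence has a gap or columns where any sequence has one. A minimal sketch that covers both readings, using a hypothetical helper `drop_gap_columns` that is not part of pymsaviz, could look like this:

```python
def drop_gap_columns(sequences, gap_char="-", only_all_gap=True):
    """Remove gap columns from a list of equal-length aligned sequences.

    only_all_gap=True  removes columns where every sequence has a gap.
    only_all_gap=False removes columns where at least one sequence has a gap.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same aligned length")

    is_gap_column = all if only_all_gap else any
    keep = [
        i
        for i in range(length)
        if not is_gap_column(seq[i] == gap_char for seq in sequences)
    ]
    return ["".join(seq[i] for i in keep) for seq in sequences]


# Hypothetical alignment, used only for illustration
aligned = ["M--KA-", "M-QK--"]
without_shared_gaps = drop_gap_columns(aligned)
without_any_gaps = drop_gap_columns(aligned, only_all_gap=False)
```

Removing only the all-gap columns keeps the alignment intact, while removing every column that contains a gap can discard a large part of a sparse alignment.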

How do you show a more complete description of the aligned entries?

Hello,

First of all I want to say that I really like this python package. I think it is really neat!

My current problem is that the labels for the sequences are cut off.
As far as I understand the code, this happens when I have a FASTA file containing aligned sequence records in the following format:

>Genus1 species1 strain1 | accession_number1
----M----AD---A
>Genus2 species2 strain2 | accession_number2
----M----SD---A
etc.

When I run the package on that FASTA file, only the genus is shown to the left of each sequence.
Since I have files with many species of the same genus, I also cannot use the sort=True setting when creating the MsaViz object, because I then get a "Duplicate values" error.

If I rearrange my alignment files so that the accession_number comes first, the sort=True setting works as intended, but the figure then shows only the accession_numbers as labels, which makes the results hard to interpret.

Current format:

>accession_number1 | Genus1 species1 strain1
----M----AD---A
>accession_number2 | Genus2 species2 strain2 
----M----SD---A
etc.

Am I doing something wrong currently, or is there some setting to ensure that the full title of the sequence record is shown?

Minimal working example assuming switched fasta files:


import os

import numpy as np
import pymsaviz
from Bio import SeqIO


def make_pymsaviz_plot(
    path,
    name,
    outname,
    min_gap_length=5,
    gap_fraction=0.05,
    gap_char="-",
    variable_consensus=0.4,
    variable_char="x",
    show_count=True,
    show_consensus=True,
    color_scheme="Clustal",
    sorted=False,
):
    # create the input and output file names
    infile = os.path.join(path, name)
    outfile = os.path.join(path, outname)

    # parse the input fasta file into an array of sequences
    sequences = []
    for record in SeqIO.parse(infile, "fasta"):
        sequences.append(record.seq)

    # for every position in the alignment, count the number of gaps and variable characters
    gap_count = np.zeros(len(sequences[0]))

    for sequence in sequences:
        for i, aa in enumerate(sequence):
            if aa == gap_char:
                gap_count[i] += 1


    # get the continuous stretches of gaps
    gap_stretches = []
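    # start holds the index of the last non-gap column seen (-1 = none yet),
    # so a gap stretch that begins at the first column is not missed
    start = -1
    end = -1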
    for i, count in enumerate(gap_count):
        if(count/len(sequences) > (1-gap_fraction)):
            end = i
        else:
            if end > start:
                gap_stretches.append((start+2, end+1))
            start = i
            end = i
    if end > start:
        gap_stretches.append((start+2, end+1))

    # remove the stretches of gaps that are too short
    gap_stretches = [stretch for stretch in gap_stretches if stretch[1] - stretch[0] > min_gap_length]
    print(gap_stretches)

    # create a new fasta file with the gap_stretches removed
    protein_accessions = []
    genus_species = []

    with open(outfile + "_gap_trimmed.fasta", "w") as f:
        for record in SeqIO.parse(infile, "fasta"):

            # get the protein accession and genus species
            protein_accessions.append(record.description.split("|")[0].strip())
            genus_species.append(record.description.split("|")[1])

            new_seq = ""
            for i, aa in enumerate(record.seq):
                if not any([i >= stretch[0]-1 and i <= stretch[1]-1 for stretch in gap_stretches]):
                    new_seq += aa
            f.write(">" + record.description + "\n")
            f.write(new_seq + "\n")

    # create a pymsaviz object from both the non-trimmed and the trimmed fasta file
    msa = pymsaviz.MsaViz(infile, show_count=show_count, show_consensus=show_consensus, sort=sorted, color_scheme=color_scheme)
    msa_trimmed = pymsaviz.MsaViz(outfile + "_gap_trimmed.fasta", show_count=show_count, show_consensus=show_consensus, sort=sorted, color_scheme=color_scheme)

    # add annotations to the pymsaviz object
    for gap in gap_stretches:
        msa.add_text_annotation(gap, "Gap Region", text_color="black", range_color="red")
    msa.savefig(outfile + "_gap.png")

    # add markers to the trimmed pymsaviz object to flag highly non-conserved positions
    high_variability = []
    identity_list = msa_trimmed._get_consensus_identity_list()

    for position, identity in enumerate(identity_list, 1):
        if identity < variable_consensus:
            high_variability.append(position)
    msa_trimmed.add_markers(high_variability, marker=variable_char, color="red")
    msa_trimmed.savefig(outfile + "_gap_trimmed.png")

    return msa, msa_trimmed


Thank you in advance for your help.
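For context, Biopython's FASTA parser uses the text up to the first whitespace as the record ID, which would explain why only the genus appears and why several species of one genus collide. One possible workaround, sketched below under that assumption, is to rewrite each header into a single whitespace-free token before plotting. The helpers `header_to_unique_id` and `rewrite_headers` are hypothetical names, not part of pymsaviz or Biopython.

```python
import re


def header_to_unique_id(header, seen):
    """Collapse a FASTA header into one whitespace-free token, unique within `seen`."""
    token = re.sub(r"\s*\|\s*", "|", header.strip())  # tidy spaces around "|"
    token = re.sub(r"\s+", "_", token)                # remaining spaces -> "_"
    candidate, n = token, 1
    while candidate in seen:                          # disambiguate exact duplicates
        n += 1
        candidate = f"{token}_{n}"
    seen.add(candidate)
    return candidate


def rewrite_headers(fasta_text):
    """Return FASTA text whose headers are single, unique tokens."""
    seen = set()
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(">" + header_to_unique_id(line[1:], seen))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


example = (
    ">Genus1 species1 strain1 | accession_number1\n"
    "----M----AD---A\n"
    ">Genus2 species2 strain2 | accession_number2\n"
    "----M----SD---A\n"
)
rewritten = rewrite_headers(example)
```

With headers in this form, the whole description becomes the first token, so a parser that keeps only that token still sees the genus, species, strain and accession together.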

Question - Save plot without displaying the plot interactively?

Hi! Thank you for creating pyMSAviz. I have been using it to quickly view some MSAs I generated. While looping over the MSAs and saving each one with the .savefig() function, I noticed that the plots are always displayed interactively, and I quickly ran out of memory when doing this automatically for a batch of files. Apologies for any ignorance, but is there a way to save the plot without displaying it interactively? This is how I am reading and saving my MSAs:

mv = MsaViz(rha_file, show_label=False, color_scheme='Clustal', show_consensus=True)
mv.savefig(f'../figures/{output_directory}_MSA_figures/{format_rha_file}/{format_rha_file}_RHA_MSA_figure.png')

Thank you!
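A general matplotlib pattern for batch saving, independent of pymsaviz, is to select a non-interactive backend and close each figure once it has been written. The sketch below uses plain matplotlib figures as stand-ins. It assumes that the object being saved is an ordinary matplotlib Figure, and `save_and_close` is a hypothetical helper name.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_and_close(fig, path, dpi=100):
    """Write the figure to disk, then close it so its memory can be released."""
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)


# Stand-in figures; in the workflow above these would be the MSA figures
out_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    fig = plt.figure()
    fig.add_subplot(111).plot([0, 1], [0, i])
    path = os.path.join(out_dir, f"figure_{i}.png")
    save_and_close(fig, path)
    paths.append(path)
```

Closing each figure matters in a loop because pyplot otherwise keeps a reference to every figure it has created, so memory grows with the number of files processed.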

matplotlib minimum version

With matplotlib version 3.5.3 and pymsaviz version 0.4.0 installed, this example from the docs gave an AttributeError:

from pymsaviz import MsaViz, get_msa_testdata

msa_file = get_msa_testdata("HIGD2A.fa")
mv = MsaViz(msa_file)
fig = mv.plotfig()

Extract from stack trace:

... pymsaviz/msaviz.py) in plotfig(self, dpi) ...
--> 420         fig.set_layout_engine("tight")
    421         gs = GridSpec(nrows=len(plot_ax_types), ncols=1, height_ratios=y_size_list)
    422         gs.update(left=0, right=1, bottom=0, top=1, hspace=0, wspace=0)

AttributeError: 'Figure' object has no attribute 'set_layout_engine'

Installing matplotlib 3.6.0 fixed the issue, so perhaps the minimum version (e.g. matplotlib = ">=3.5.2") needs updating?
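Until the declared minimum is raised, a script could guard against this itself with a small version check. The sketch below is only illustrative: `version_tuple` and `meets_minimum` are hypothetical helpers, and the 3.6.0 threshold is taken from the observation above that upgrading to that release fixed the error.

```python
def version_tuple(version):
    """Turn '3.5.3' into (3, 5, 3); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, comparing numerically."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '3.6' equals '3.6.0'
    b += (0,) * (width - len(b))
    return a >= b


old_ok = meets_minimum("3.5.3", "3.6.0")
new_ok = meets_minimum("3.6.0", "3.6.0")
```

Comparing integer tuples rather than strings avoids ranking "3.10.1" below "3.6.0". An alternative is a feature check such as `hasattr(fig, "set_layout_engine")`, which tests for the missing attribute directly.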
