sourmashconsumr's Introduction

sourmashconsumr

The goal of sourmashconsumr is to parse, analyze, and visualize the outputs of the sourmash Python package. The sourmashconsumr package is still under active development.

Installation

You can install the development version of sourmashconsumr from GitHub with:

# install.packages("remotes")
remotes::install_github("Arcadia-Science/sourmashconsumr")

Eventually, we hope to release sourmashconsumr on CRAN and to provide a conda-forge package. We’ll update these instructions once we’ve done that.

Usage

See the vignette for full instructions on how to run the sourmashconsumr package (coming soon!).

To access the functions in the sourmashconsumr package, you can load it with:

library(sourmashconsumr)

The sourmashconsumr package contains a variety of functions to work with the outputs of the sourmash python package. The table below summarizes which sourmash outputs the sourmashconsumr package operates on and the functions that are available. For a complete list of functions in the sourmashconsumr package, see the documentation.

Developer documentation

The sourmashconsumr package follows package developer conventions laid out in https://r-pkgs.org/, and changes can be contributed to the code base using pull requests. For more information on how to contribute, see the developer documentation.

Citation

If you’d like more information on how sourmash works, please see the following publications:

sourmashconsumr's People

Contributors

ctb, taylorreiter

Forkers

bluegenes ctb

sourmashconsumr's Issues

refactor `tax_glom_taxonomy_annotate()` to only have piped code block occur once by auto-inheriting `glom_var`

In #37, I changed tax_glom_taxonomy_annotate() so that the user can select a glom_var. Right now you can only choose n_unique_kmers or f_unique_to_query. If I continue to expand the possible glom_vars, I'll refactor the code so that it uses the glom_var smartly and only has the piped code block once. Implementing this in #37 seemed like too much of a lift for something that might not even be that useful.

for `plot_taxonomy_annotate_ts_alluvial()`, add a `show_tax` argument that allows the user to control which taxa are given alluvial ribbons

motivated by a suggestion by @elizabethmcd in #37 and inspired by show_tax in ampvis2 https://kasperskytte.github.io/ampvis2/articles/ampvis2.html

Right now, the function uses a fraction_threshold (by default 0.01, or 1%): if a lineage is present in any sample of the time series at 1% or greater, it gets an alluvial ribbon in the plot. The user can set fraction_threshold to any value. Anything that does not get an alluvial ribbon is automatically lumped into "other" by the function.

I like the idea of show_tax. This would allow users to either provide a list of taxa to show_tax or use fraction_threshold.
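
As a rough sketch of how the two options could coexist (not the package's implementation): label_ribbons() below is a hypothetical helper, and the lineage and fraction column names are assumptions made for illustration. If show_tax is supplied it wins; otherwise fraction_threshold decides which lineages keep their own ribbon, and everything else becomes "other".

```r
# Hypothetical helper, for illustration only: decide which lineages get a ribbon.
# `df` is assumed to have a `lineage` column and a `fraction` column.
label_ribbons <- function(df, show_tax = NULL, fraction_threshold = 0.01) {
  if (!is.null(show_tax)) {
    # user-supplied taxa take precedence over the threshold
    keep <- unique(df$lineage[df$lineage %in% show_tax])
  } else {
    # keep a lineage if it reaches the threshold in any sample
    keep <- unique(df$lineage[df$fraction >= fraction_threshold])
  }
  df$ribbon <- ifelse(df$lineage %in% keep, df$lineage, "other")
  df
}

toy <- data.frame(
  query_name = c("s1", "s1", "s2", "s2"),
  lineage    = c("A", "B", "A", "B"),
  fraction   = c(0.5, 0.001, 0.4, 0.02),
  stringsAsFactors = FALSE
)

by_threshold <- label_ribbons(toy)                 # B reaches 1% in s2, so it keeps a ribbon
by_name      <- label_ribbons(toy, show_tax = "A") # only A keeps a ribbon
```

Making show_tax override the threshold keeps the default behavior unchanged for users who never set it.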

Functions for importing output of `sourmash taxonomy annotate` into metacoder object

  • metacoder visualization
    • read_sourmash_taxonomy_annotate(file, intersect_bp_threshold)
    • pivot_sourmash_taxonomy_annotate_wider()
    • sourmash_taxonomy_annotate_to_metacoder(sourmash_taxonomy_annotate_df, database = c("genbank", "gtdb"), summary_level = c(NULL, "genus", ...))
      • database will control class_regex for parse_tax_data()
      • summary_level will control if the sourmash results are agglomerated up the taxonomic lineage during the creation of the metacoder object (e.g. to genus level).
      • sequence of functions:
        • read_sourmash_taxonomy_annotate() to purrr::map_dfr
        • pivot_sourmash_taxonomy_annotate_wider()
        • parse_tax_data()
        • calc_taxon_abund()
        • calc_n_samples()
      • goal is to do everything that a user would need to think of doing to get the data into metacoder land to enable visualization. Check and see if there is something that needs to be done for diff abund viz/matrix viz.

document rules for naming functions

Naming functions

Functions that are exported (i.e. user-facing) are named by the action completed by the function, the sourmash output type they act on, and if relevant, a description of the action taken.

  • Action words:
    • read
    • plot
    • from
  • sourmash output types:
    • signature
    • compare_csv
    • gather
    • taxonomy_annotate
  • example actions:
    • to_metacoder
    • upset
    • heatmap
    • mds

Functions that are not exported do not follow a naming scheme but strive to be fully descriptive of their actions, and when possible use the sourmash output types to make it clear what type of data the internal function operates on.

  • examples of internal functions
    • check_compare_df_sample_col_and_move_to_rowname()
    • check_and_edit_names_in_signatures_df()
    • check_uniform_parameters_in_signatures_df()
    • make_agglom_cols()
    • make_expression()
    • get_scaled_for_max_hash()
    • pivot_wider_taxonomy_annotate()

remove themes from ggplots, or at least make sure the same themes are used throughout

to make sure there is a consistent user experience.

plot_compare_mds I think uses theme_classic, while plot_signatures_rarefaction doesn't have a theme.

I think not having a theme is probably the right way to go? Except that alluvial plots and sankey plots are more rewarding with a blank background, so maybe theme_classic is a good default.

enable taxonomy plotting with LIN taxonomic framework?

In sourmash taxonomy, we're adding utils to use the LIN taxonomic framework, which allows for greater flexibility and specificity compared with standard taxonomic ranks. For example, if only certain strains of a microbe are pathogenic, the LIN framework may be useful for identifying/grouping pathogenic vs non-pathogenic strains.

Is this something you're interested in allowing for viz? Though LINs aren't super widely used yet, I think they have neat potential for sourmash applications.

LIN concept example (ref https://doi.org/10.1093/nar/gkaa190):

image

add color to the compare plots

Right now, the compare plot looks like this:

comp <- read_compare_csv("tests/testthat/comp_k31.csv")
mds <- make_compare_mds(compare_df = comp)
plt <- plot_compare_mds(mds)
plt

image

It might be nice to have this plot accept colors optionally:
image

I went to implement this, but it wasn't clear to me what the best way to do this would be. I decided to leave this as-is for now. As I use the functions, I think it will become clear how I interact with this, and then I'll add it to the function.

Similarly, it would be cool to color the axis labels or something by sample type or group for the heatmap:
image

Again, don't know how to do this in a way that will be intuitive to downstream users yet, so will do later!

code I used to figure out how to make the sankey plot

no promises that it runs, but recording here so it's somewhere

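library(ggalluvial)
library(ggforce)   # gather_set_data() and geom_parallel_sets*() used further down
library(ggplot2)   # unprefixed ggplot(), aes(), theme_classic(), labs(), ...
library(magrittr)
library(sourmashconsumr)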

taxonomy_annotate_df <- read_taxonomy_annotate(Sys.glob("tests/testthat/SRR19*lineage*.csv"), separate_lineage = TRUE) %>%
  dplyr::select(f_unique_to_query, f_unique_weighted, domain, phylum, class, family, order, genus, species) %>%
  dplyr::group_by(domain, phylum, class, family, order, genus, species) %>%
  dplyr::summarize(sum_f_unique_weighted = sum(f_unique_weighted))

ggalluvial::is_alluvia_form(taxonomy_annotate_df)


ggplot2::ggplot(taxonomy_annotate_df,
       ggplot2::aes(y = sum_f_unique_weighted, axis1 = domain, axis2 = phylum, axis3 = class, axis4 = order, axis5 = family)) +
  #ggalluvial::geom_alluvium(aes(fill = order), width = 1/12) +
  ggalluvial::geom_flow() +
  ggalluvial::geom_stratum(width = 1/10, alpha = .5, aes(fill = c(family))) +
  ggplot2::geom_text(stat = "stratum", aes(label = after_stat(stratum)),
                     size = 2, hjust = -0.25) +
  theme_classic() +
  labs(x = "taxonomic rank", y = "abundance-weighted unique fraction\ntotaled across all samples") +
  scale_x_continuous(labels = c("domain", "phylum", "class", "order", "family"),
                     breaks = c(1, 2, 3, 4, 5))


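# template adapted from the vaccinations example further down; the columns it maps
# (survey, response, subject, freq) are not in taxonomy_annotate_df, so this call errors as written
ggplot(taxonomy_annotate_df,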
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")


taxonomy_annotate_df <- read_taxonomy_annotate(Sys.glob("tests/testthat/*lineage*.csv"), separate_lineage = TRUE) %>%
  dplyr::select(query_name, f_unique_to_query, f_unique_weighted, domain, phylum, class, family, order, genus, species) %>%
  dplyr::group_by(query_name, domain, phylum, class, family, order, genus, species) %>%
  dplyr::summarize(sum_f_unique_weighted = sum(f_unique_weighted))
# to_lodes_form() is called below, after taxonomy_annotate_df_long has been created

# create a fill variable --> it will be based on alphabetical order (which is how the alluvial plot is ordered)
# and it will be for each level of taxonomy
# probably needs to switch to long format
taxonomy_annotate_df_long <- taxonomy_annotate_df %>%
  tidyr::pivot_longer(cols = domain:species, names_to = "taxonomic_rank", values_to = "taxonomic_label")

taxonomy_annotate_df_long <- transform(taxonomy_annotate_df_long, taxonomic_label = factor(taxonomic_label))
to_lodes_form(taxonomy_annotate_df_long)
ggplot(taxonomy_annotate_df_long,
       aes(x = taxonomic_rank, stratum = taxonomic_label, alluvium = query_name,
           y = sum_f_unique_weighted,
           fill = taxonomic_label, label = taxonomic_label)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("alluvial plot")

# test data ---------------------------------------------------------------

data(vaccinations)
vaccinations <- transform(vaccinations,
                          response = factor(response, rev(levels(response))))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")


# try parallel sets -------------------------------------------------------

data <- reshape2::melt(Titanic)
data <- gather_set_data(data, 1:4)
data

data <- gather_set_data(taxonomy_annotate_df, 1:7)
palette <- colorRampPalette(RColorBrewer::brewer.pal(8, "Set2"))(length(unique(data$y)))
ggplot(data, aes(x, id = id, split = y, value = sum_f_unique_weighted)) +
  geom_parallel_sets(alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.2, aes(fill = y)) +
  geom_parallel_sets_labels(colour = 'black', angle = 360, size = 2, hjust = -0.25) +
  theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.ticks.x = element_blank(),
        legend.position = "None") +
  labs(x = "taxonomic rank") +
  scale_x_continuous(labels = c("domain", "phylum", "class", "order", "family", "genus", "species", ""),
                     breaks = c(1, 2, 3, 4, 5, 6, 7, 8),
                     limits = c(.75, 8)) +
  scale_fill_manual(values = palette)

Change some variable names in R/metacoder.R

  • summary_level to agglomeration_level or taxglom_level: summary level isn't clear. it should be made more clear this is for agglomeration.
  • switch taxonomy_annotate_tibble to taxonomy_annotate_df:
    naming it a tibble is sort of annoying, plus a tibble is technically still a data frame. Changing this would make it match with how signature data frames are referred to (signatures_df). Plus df is shorter than tibble, which is nice.
  • change taxonomy_annotate_to_metacoder() to from_taxonomy_annotate_to_metacoder()

documenting `plot_taxonomy_annotate_ts_alluvial()` and output

taxonomy_annotate_df <- read_taxonomy_annotate(Sys.glob("~/github/2022-prjna853785-sourmash/outputs/sourmash_taxonomy/SRR*lineages*csv"))

tmp <- readr::read_csv("https://raw.githubusercontent.com/Arcadia-Science/2022-prjna853785-sourmash/main/inputs/metadata.csv") %>%
  select(query_name = run_accession, time = age_months)

plot_taxonomy_annotate_ts_alluvial(taxonomy_annotate_df, time_df = tmp, tax_glom_level = "genus")

image

converting tax annotate files to phyloseq object

I have been using sourmashconsumr to convert taxonomy annotate files to phyloseq objects, but I keep getting the error:

Error in validObject(.Object) :
invalid class “sample_data” object: Sample Data must have non-zero dimensions.

I have ensured there are no zeros in the data frames and that the sample name in the data frame is correct, but I still get the error. Any advice as to what could be causing it?

Code used, for reference:

#read in CSV
taxonomy_annotate_df <- read_csv("sample1.51gtdb.with-lineages.csv")
head(taxonomy_annotate_df)

#metadata- new dataframe from existing data
query_name <- c("sample1.fq")
metadata <- data.frame(query_name = query_name)

#Replace Zero with NA Value in a dataframe
taxonomy_annotate_df [taxonomy_annotate_df == 0] <- NA

#Converting from taxonomy annotate to phyloseq object
sample1_phyloseq <- from_taxonomy_annotate_to_phyloseq(taxonomy_annotate_df = taxonomy_annotate_df,
metadata_df = metadata %>%
tibble::column_to_rownames("query_name"))
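
One possible cause, offered as a guess rather than a confirmed diagnosis: the metadata data frame above has only the query_name column, so moving that column to the row names leaves a data frame with zero columns, and phyloseq's sample_data check rejects input with a zero dimension. The sketch below reproduces the shape problem in base R. move_to_rownames() is a hypothetical stand-in for tibble::column_to_rownames(), and the group column is made up for illustration.

```r
# Hypothetical base R stand-in for tibble::column_to_rownames(col)
move_to_rownames <- function(df, col) {
  rownames(df) <- df[[col]]
  df[, setdiff(names(df), col), drop = FALSE]
}

# metadata as built in the report: a single column
metadata <- data.frame(query_name = "sample1.fq")
only_names <- move_to_rownames(metadata, "query_name")
dim(only_names)   # one row, zero columns

# adding any other column (here a made-up `group`) leaves something behind
metadata2 <- data.frame(query_name = "sample1.fq", group = "A")
with_group <- move_to_rownames(metadata2, "query_name")
dim(with_group)   # one row, one column
```

If this is the cause, giving the metadata at least one column besides query_name should get past the sample_data check.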

Add visualizations for specific use cases like time series or different groups

So far I've been focused on visualizations that will work no matter if samples are highly related, time series, different groups with lots of replicates, large or small sample sizes, etc. I think now that some of these base visualizations are encoded, I can do some more specific things as they come up.

Brainstorming below!

Time series

image from: The temporal dynamics of the tracheal microbiome in tracheostomised patients with and without lower respiratory infections. August 2017, PLoS ONE 12(8):e0182520. DOI: 10.1371/journal.pone.0182520

Differential abundance

image
Show in a vignette how to go from the output of from_sourmash_taxonomy_to_metacoder to the differential heat tree viz (viz from the metacoder vignette).

Visualization when we have a tree

When GTDB is the database, we have a tree we can use to build visualizations (although we would have to have a function to download it, and that might get annoying):
image from: https://www.nature.com/articles/s41579-021-00562-3

switch `tax_glom*`/agglomeration language to aggregate

@elizabethmcd pointed out in #23:

This might just be a personal thing, but the term agglomeration and referring to the function tax_glom_taxonomy_annotate seems a little confusing and maybe doesn't clearly convey what this function is doing. I think in R the similar but more known action is aggregating and people might be more familiar with this? Up to you.

I was copying the syntax/naming of the phyloseq function that does this: https://rdrr.io/bioc/phyloseq/man/tax_glom.html

I wanted to record this feedback because if we keep getting it then I want to change the wording for the function.
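
For readers unfamiliar with either term, the operation in question is a group-and-sum up the lineage. A minimal base R sketch on toy data follows; the column names mirror those used elsewhere in these issues, but the data frame is invented and this is not the package's implementation.

```r
# toy data: two species from the same genus plus one from another genus
toy <- data.frame(
  query_name     = c("s1", "s1", "s1"),
  genus          = c("Escherichia", "Escherichia", "Bacteroides"),
  species        = c("Escherichia coli", "Escherichia fergusonii", "Bacteroides fragilis"),
  n_unique_kmers = c(100, 50, 30)
)

# "agglomerate" (aggregate) to genus level: sum the k-mer counts within each sample and genus
genus_level <- aggregate(n_unique_kmers ~ query_name + genus, data = toy, FUN = sum)
genus_level
```

That base R's own function for this is called aggregate() is part of why the aggregate wording may feel more familiar.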

installation fails on R 4.2.2 on Linux/i386, installed via conda.

with R installed via the following conda environment spec,

name: env
channels:
    - conda-forge
    - bioconda
    - defaults
dependencies:
    - python>=3.8
    - snakemake-minimal>=7.19.1,<8
    - sourmash>=4.6,<5
    - curl
    - r-ggplot2
    - r-pheatmap
    - r-viridis
    - r-ggplotify
    - r-rmarkdown

(which results in the set of installed packages listed in the attached mamba-list.txt), running the remotes::install_github() command in the README results in:

...
* checking for file ‘/tmp/Rtmp32aOFf/remotes716552f5280e/Arcadia-Science-sourmashconsumr-9ceaa18/DESCRIPTION’ ... OK
* preparing ‘sourmashconsumr’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘sourmashconsumr_0.1.0.tar.gz’
ERROR: dependencies ‘httr’, ‘metacoder’, ‘phyloseq’ are not available for package ‘sourmashconsumr’

Not sure there's anything you can do about this, but wanted to document it here ;).

package name: change to something that makes it clear that this package doesn't encode the sourmash functionality in R, but consumes the outputs of sourmash and does stuff to them

Following convention, I'd like to avoid putting punctuation in the name.

Names that don't make it clear that this package does not re-implement the core sourmash functionality

  • rourmash
  • sourmashR
  • souRmash (also this one is bad because it's basically the same as sourmash)

Names that are a catchall

  • sourmashRutils

Maybe better ideas

  • sourmashconsumR

add functions to visualize and interrogate overall taxonomy results

Like the ones used in the notebook here: https://github.com/Arcadia-Science/2022-prjna853785-sourmash/blob/main/notebooks/20220815-visualize-sourmash-taxonomy-results.ipynb

Visualizations that I think are worth including:

  1. fraction of sample matched/unclassified colored by database
    a. maybe add a low-confidence portion: taxonomic matches that had less than 50kb in the entire sample. I could use the paired palette for this (high confidence bacteria, low confidence bacteria, etc.).
  2. upset plot of shared lineages
    a. would be nice to choose which level of taxonomy this plot is made at
  3. ability to dig into intersections from the upset plots

make a vignette per sourmash output type

  • signatures (output by sourmash sketch or sourmash compute):
    • read_signature(), show how to read multiple signatures using purrr,
    • upset plots: from_signatures_to_upset_df(), plot_signatures_upset()
    • rarefaction plots for signatures sketched from reads: from_signatures_to_rarefaction_df(), plot_signatures_rarefaction()
  • sourmash compare csv:
    • read_compare_csv()
    • MDS plot: make_compare_mds(), plot_compare_mds()
    • heatmap: plot_compare_heatmap()
  • sourmash taxonomy annotate csv
    • read_taxonomy_annotate()
    • taxonomy agglomeration: tax_glom_taxonomy_annotate()
    • upset plot: from_taxonomy_annotate_to_upset_inputs(), plot_taxonomy_annotate_upset()
    • sankey plot: plot_taxonomy_annotate_sankey()
    • time series alluvial plot: plot_taxonomy_annotate_ts_alluvial()
    • to metacoder: from_taxonomy_annotate_to_metacoder()
    • to phyloseq: from_taxonomy_annotate_to_phyloseq()
  • sourmash gather csv
    • read_gather()
    • barchart: plot_gather_classified()
    • upset plot: from_gather_to_upset_df(), plot_gather_upset()
  • upset utilities
    • from_list_to_upset_df()
    • from_upset_df_to_intersection_members()
    • from_upset_df_to_intersection_summary()
    • from_upset_df_to_intersections()

`n_unique_kmers` doesn't exist

Had an error shared with me (🎉):

Error in `dplyr::select()`:
! Can't subset columns that don't exist
x Column `n_unique_kmers` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.

I see that the n_unique_kmers column is added during read_taxonomy_annotate, so the error is likely caused by using read_csv rather than read_taxonomy_annotate to read the file.

Would it be worth changing this internal column to n_unique_weighted_found to avoid this error for sourmash v4.5+, since we have the column now? We figured this name more clearly described the column info, but I'm not sure we discussed outside of the sourmash PR that added it.

Or if you want to force folks to use read_taxonomy_annotate (I see you do a couple other things in there) is there a way to catch the error + suggest the solution?
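
A minimal sketch of the second option, assuming a hypothetical internal helper (check_taxonomy_annotate_cols() is not an existing sourmashconsumr function): check for the required column up front and fail with a message that points at the likely fix.

```r
# Hypothetical check, for illustration only: fail early with a hint when a
# required column is missing.
check_taxonomy_annotate_cols <- function(df, required = "n_unique_kmers") {
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop(
      "Missing column(s): ", paste(missing_cols, collapse = ", "),
      ". Was the file read with read_taxonomy_annotate() rather than read_csv()?",
      call. = FALSE
    )
  }
  invisible(df)
}

# a data frame that lacks the column, as it would after a plain CSV read
raw_df <- data.frame(query_name = "s1", f_unique_to_query = 0.2)
msg <- tryCatch(
  {
    check_taxonomy_annotate_cols(raw_df)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Calling a check like this at the top of each function that needs the column would replace the dplyr::select() error with one that names the remedy.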

thanks for the awesome software!

example of making rarefaction curves from signatures representing fastq files

remotes::install_github("Arcadia-Science/sourmashconsumr")
library(sourmashconsumr)
library(dplyr)
library(ggplot2)
library(purrr)

sigs <- Sys.glob("*100k.sig") %>%
  map_dfr(read_signature) %>%
  filter(ksize == 21)

rarefaction_df <- from_signatures_to_rarefaction_df(sigs)
plot_signatures_rarefaction(rarefaction_df) # +
  # theme_minimal() +
  # geom_point(aes(color = name))

image

Uncomment the commented lines to get colored curves and no grey background.

add functions for rarefaction for groups of signatures using vegan

like used here (note all of these links are to the specaccum branch which will be deleted after Arcadia-Science/2022-mtx-not-in-mgx-pairs#9 is merged):

Only makes sense to run signatures with abundances calculated from reads. Also only really makes sense when it's run on many signatures from the same sample.

requires signatures to be read into a data frame (see #4)

change `read_signature()` to read in from one or many files

Both read_gather() and read_taxonomy_annotate() automatically determine whether a user provided one file path or many file paths, and then read all of the files into a single data frame. read_signature() currently doesn't do that; it only works on one file. But it's simple to make it read many using purrr::map_dfr(read_signature), so I should implement that so that the user experience for the functions is consistent.
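
The pattern can be sketched in base R without touching read_signature() itself. read_one_or_many() is a hypothetical wrapper, and read.csv() on two temporary files stands in for the real single-file reader, so this shows only the shape of the change.

```r
# Hypothetical wrapper: apply a single-file reader to one or many paths and
# bind the results into one data frame (base R analogue of purrr::map_dfr()).
read_one_or_many <- function(paths, read_one) {
  dfs <- lapply(paths, read_one)
  do.call(rbind, dfs)
}

# stand-in files and reader; in the package, read_one would be the
# existing single-file read_signature() logic
tmp_files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(name = "a", ksize = 21), tmp_files[1], row.names = FALSE)
write.csv(data.frame(name = "b", ksize = 31), tmp_files[2], row.names = FALSE)

one  <- read_one_or_many(tmp_files[1], read.csv)  # a single path still works
many <- read_one_or_many(tmp_files, read.csv)     # many paths give one data frame
```

Because a length-one character vector goes through the same lapply(), no separate branch for the single-file case is needed.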

add `tidyr::drop_na()` to sankey plot function to avoid errors/warnings and inaccurate plots

I had three data points with NAs in a recent sankey plot and I got the following errors and warnings and weird-looking plots. This could be fixed with a drop_na filtering step. It could be parameterized, or just documented so the user knows this is happening.

Warning messages:
1: Removed 3 rows containing non-finite values (`stat_parallel_sets()`). 
2: Removed 3 rows containing non-finite values (`stat_parallel_sets_axes()`). 
3: Computation failed in `stat_parallel_sets_axes()`
Caused by error in `compute_panel()`:
! Axis aesthetics must be constant in each split 
4: Removed 3 rows containing non-finite values (`stat_parallel_sets_axes()`). 
5: Computation failed in `stat_parallel_sets_axes()`
Caused by error in `compute_panel()`:
! Axis aesthetics must be constant in each split 
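
A sketch of the filtering step on toy data, using base R's complete.cases() as an equivalent of tidyr::drop_na() restricted to the lineage columns. The data frame and the choice of rank columns are invented for illustration; the message is one way to keep the user informed if the step is not parameterized.

```r
# toy lineage table with one incompletely classified row
rank_cols <- c("domain", "phylum", "class")
toy <- data.frame(
  domain = c("Bacteria", "Bacteria", "Archaea"),
  phylum = c("Firmicutes", NA, "Euryarchaeota"),
  class  = c("Bacilli", NA, "Methanobacteria"),
  f_unique_weighted = c(0.3, 0.1, 0.05)
)

# flag rows with an NA in any lineage column, tell the user, then drop them
incomplete <- !complete.cases(toy[, rank_cols])
if (any(incomplete)) {
  message("Dropping ", sum(incomplete), " row(s) with NA in lineage columns")
}
toy_complete <- toy[!incomplete, ]
```

Restricting the check to the lineage columns avoids discarding rows that merely have an NA in an unrelated column.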

Alpha diversity estimation

Hello and thanks for the awesome tool.

I have a question. I see you efficiently introduced a method to plot and represent beta diversity between samples (dissimilarities).

I was wondering: what is the best way to represent alpha diversity? Is it just the number of taxa detected by sourmash taxonomy, the total number of sketches, or the slope like in the tutorial?
What is the most correct way to represent the richness of a community? I think people would still love to see the total number of species detected. But maybe a rarefaction curve with k-mers should be reported too, supporting the result?

Thanks, and sorry if the question is basic; I am still a noob in metagenomics.

Coloring strain plots and why I decided not to implement it for now

Over in #50, I implemented a function that works with the sourmash taxonomy annotate output to detect whether multiple strains of a given species are present in a metagenome sample. I toyed with the idea of trying to count the number of strains likely present, mostly by clustering the abundances of the matched genomes. I would then color each matched genome by the strain that I guessed it belonged to (strain1, strain2, strain3, etc.).

I've decided to punt on this for now because I don't think the gather/taxonomy outputs have enough information to do this well. While different strains may sometimes cluster by abundance, I think it's likely that the first genome match will scoop in k-mers from multiple strains, and because we mostly report average k-mer abundance in the gather output, deconvolving these abundances is basically impossible.

I think the right thing to do here would be to take a genome-grist-esque approach where, for a given set of genome matches within a species, we download all of them and iteratively map k-mers or reads to those genomes. Then we could use an expectation maximization (EM) algorithm to assign k-mers/reads to a genome. Alternatively, we could align to everything at once and then still use an EM algorithm that takes advantage of all the read mapping info to do the assignment. This would be a big lift for relatively little payoff: the perk of this sourmash approach is that it's fast, and the idea is that you could use it to detect strain variation and then use heavier tools to dig in. Doing a big mapping and then EM would be a big separate endeavor.

Dumping some code I ripped out of the function that dealt with abundances and tried to guess how many strains were present

abundances

  # ABUNDANCE -- I don't actually know what to do here, so to start,
  # I'm just coding to flag species where average kmer abundances for genomes deviate by more than 2.
  # average_abund <- taxonomy_annotate_df %>%
  #   dplyr::filter(.data$species %in% more_than_one_genome_observed_for_species$species) %>% # filter to species with more than one genome observed
  #   dplyr::group_by(query_name, species) %>%
  #   dplyr::summarise(min_average_abund = min(average_abund),
  #                    max_average_abund = max(average_abund),
  #                    sd_average_abund = sd(average_abund)) %>%
  #   dplyr::mutate(range_average_abund = max_average_abund - min_average_abund)

  #average_abund_filtered <- average_abund %>%
  #  dplyr::filter(range_average_abund >= 10)

guessing strains present

  # the below code won't work, but I think this logic could be used to draw delineations to count the number of strains.
  # I'm not totally sure yet how to get this logic to work with facet_wrap() to show colors --
  # probably something like calculating it in a different data frame and then joining it to the df that's plotted.
  # I would probably make a column like "strain" where I would label each dot "strain1", "strain2", etc. based on which intervals in the sd that the abundance falls in.
  # seq(average_abund$min_average_abund, average_abund$max_average_abund, by = average_abund$sd_average_abund)

  # I think I could also use logic to label potential prophages -- something like less than 3% of the genome with >100 more abundant than any other match for that species.
  # I need to validate this first though, potentially using SRR492184 Enterococcus faecalis.
  # Genome-grist on this sample would probably be the easiest thing to do.

  # plot_df <- taxonomy_annotate_df %>%
  #   dplyr::mutate(query_name_species = paste0(.data$query_name, "-", .data$species)) %>%
  #   dplyr::filter(.data$query_name_species %in% f_match_filtered$query_name_species)
  #
  # # label with strain count guesses before plotting
  # for(query_name_species in unique(plot_df$query_name_species)){
  #   print(query_name_species)
  # }
