branchwater's People

Contributors

bluegenes, luizirber, suzannefleishman

branchwater's Issues

operation timed out

$ ./mastiff sequence.fasta > matches.csv
[2023-07-15T04:19:00Z INFO mastiff_client] Preparing signature
[2023-07-15T04:19:00Z INFO mastiff_client] Sending request to https://mastiff.sourmash.bio
Error:
0: error sending request for url (https://mastiff.sourmash.bio/search): operation timed out
1: operation timed out

Location:
crates/client/src/main.rs:132

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets

Publish images for branchwater components

After #4 is merged, tag a new release and start publishing images, to make it easier for external projects to bring up or extend branchwater.

Questions:

  • where to host? dockerhub, quay.io, ghcr.io?

match to a query is missing from branchwater Web site

The accession BK010471 is for a crAssphage that is ubiquitous in human gut metagenomes (link), and in particular is found in the 454 data set SRR073439.

When I do a containment search, I see:

% sourmash search --containment BK010471.fa.sig SRR073439.sig -k 31

selecting specified query k=31
loaded query: BK010471.fa... (k=31, DNA)
--
loaded 3 total signatures from 1 locations.
after selecting signatures compatible with search, 1 remain.

1 matches above threshold 0.080:
similarity   match
----------   -----
 59.0%       SRR073439

and the Venn diagram is pleasing:

[Image: venn2]
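
For readers unfamiliar with the containment score reported above, here is a minimal sketch of the idea using exact k-mer sets. This is an illustration only: sourmash works on FracMinHash sketches of hashed k-mers rather than on full k-mer sets, and the function names and toy sequences below are hypothetical.

```python
def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def containment(query, subject):
    """Fraction of the query's elements that also occur in the subject."""
    if not query:
        return 0.0
    return len(query & subject) / len(query)


# Toy sequences, chosen only to show that containment is asymmetric.
query = set(kmers("ACGTACGTAC", 4))
subject = set(kmers("TTACGTACGG", 4))

print(f"query in subject: {containment(query, subject):.1%}")
print(f"subject in query: {containment(subject, query):.1%}")
```

Containment is asymmetric: a small genome can be fully contained in a large metagenome while the reverse containment stays low, which is why the query/subject order matters in a search like the one above.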

However, the FASTA sequence does not have any matches when searched at https://branchwater.jgi.doe.gov/. Any ideas?

thanks!

SRR073439.k31.sig.zip
BK010471.k31.sig.zip
BK010471.fa.zip

bad signature - ERR1248659

@luizirber not sure where to post this, but don't want it to get lost -

/group/ctbrowngrp/irber/data/wort-data/wort-sra/sigs/ERR1248659.sig is a bad sketch file.

What's the right way to handle this? Can you/we trigger rebuilding it?

Set up integration CI

Bring up branchwater with {podman,docker}-compose and run a query end-to-end.

Needs #4 to be merged first

How to cite mastiff?

Dear Luiz,

Thank you for creating this tool! Can you please tell me how to cite it?

visualize usage stats

thanks to @luizirber's nixos setup, we're already logging with caddy

log files on the mastiff server: wc -l /var/log/caddy/access-*

3868 /var/log/caddy/access-branchwater.sourmash.bio.log
23794 /var/log/caddy/access-mastiff.sourmash.bio.log
4511 /var/log/caddy/access-minke.sourmash.bio.log

But we should make this logging visible/accessible/usable.

suggestions from luiz:

we could alternatively set up a self-hosted plausible.io if we have issues using the caddy logs...
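
As a starting point for making the caddy logs usable, a small script could summarize them. The sketch below assumes the access logs are in Caddy's JSON format, one entry per line, with a numeric `ts` field and a `request` object carrying `remote_ip` (or `remote_addr` in older versions); those field names should be checked against the actual log files. The `summarize` function is hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def summarize(lines):
    """Return (requests per UTC day, number of unique client IPs)."""
    per_day = Counter()
    clients = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        # Assumed field: "ts" is a Unix timestamp in seconds.
        ts = entry.get("ts")
        if isinstance(ts, (int, float)):
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
            per_day[day] += 1
        # Assumed fields: client address under "request".
        request = entry.get("request") or {}
        ip = request.get("remote_ip") or request.get("remote_addr")
        if ip:
            clients.add(ip)
    return per_day, len(clients)
```

It could be run over one log file with `with open(path) as fh: per_day, unique = summarize(fh)`, and the per-day counts fed into whatever plotting tool is preferred.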

Build a k=31 SRA metagenomes index

I'll use this issue to document steps to build a k=31,scaled=1000 index for SRA metagenomes. This is the same process used for the current k=21,scaled=1000 index in branchwater.sourmash.bio, but accounting for the changes from #4 and bringing in new SRA datasets added after the cutoff of the current index (2023-08-17).

Move metadata from mongodb into the index manifest?

Over at sourmash-bio/sourmash#3006 (comment) I mentioned adding extra columns to the manifest to hold metadata not available in a signature. I think we can use the same approach to store the SRA metadata in the manifest and remove the mongodb dependency, returning the metadata from the search index together with the containment.

More refs on the sourmash context: sourmash-bio/sourmash#2180

But... is it a good idea?

Over at #4 I'm trying to make it easy to bring up a new branchwater installation, and there is a bit of a dance for building the index, bringing up mongo, loading metadata, and then bringing up the server/frontend. Moving the metadata into the index building step makes things easier, but requires being able to update the manifest in the index in case we want different data (which is not that hard, it's a CSV). It could be more constraining for developing new frontend features, though?

pinging @bluegenes and @SuzanneFleishman for ideas =]
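
To make the "it's a CSV" point concrete, here is a rough sketch of appending metadata columns to a manifest. It assumes the manifest is a CSV with optional leading `#` comment lines and a `name` column that can be matched against run accessions; the `add_metadata` function, the column names, and the sample values are all hypothetical, and this is not how branchwater currently stores metadata.

```python
import csv
import io


def add_metadata(manifest_text, metadata, key="name"):
    """Return manifest CSV text with extra metadata columns appended.

    metadata maps a row's key value to a dict of extra column values.
    """
    lines = manifest_text.splitlines()
    # Keep leading comment lines (such as a version header) untouched.
    comments = []
    while lines and lines[0].startswith("#"):
        comments.append(lines.pop(0))

    reader = csv.DictReader(lines)
    extra = sorted({col for row in metadata.values() for col in row})
    fieldnames = list(reader.fieldnames)
    fieldnames += [col for col in extra if col not in fieldnames]

    out = io.StringIO()
    for comment in comments:
        out.write(comment + "\n")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Rows without metadata get empty values in the new columns.
        row.update(metadata.get(row[key], {}))
        writer.writerow(row)
    return out.getvalue()
```

Updating the metadata later would then mean rerunning this step with a new mapping, at the cost of the frontend depending on whichever columns the manifest happens to carry.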

running branchwater on large assemblies

Hello,
Thanks for developing such a great tool. I've been trying to run branchwater on some whole metagenome assemblies that are quite large (0.3G-1G). When I upload even the smaller ones and submit them I don't get any output. I've tried leaving a couple with the tab open for ~12 hours to no avail.
If I leave them for long enough will they eventually complete?
Thanks!
Jenny

Selecting multiple queries

Dear branchwater team,

I was wondering if branchwater is intended to be able to accept several genomes as queries. The website allows it but I am noticing that if I select two FastA files as inputs, even though they both seem to load and get green checkmarks, the results in the CSV file are only for one of the genomes in my list (and they are identical to the ones I get if I only select that one genome as query). I am wondering if this feature isn't implemented yet or if it's not intended at all (or maybe I am doing something wrong).

Thanks!
Tanya

Add dev containers for easier dev environments

add information about data privacy to branchwater web site

from an e-mail conversation, author luiz:


I'm enjoying the branchwater metagenome query tool. However, are the submitted
queries stored or used within your servers? I want to submit a couple of genomes
that are not published yet, and I want to make sure they are only available
after our manuscript is published.

They are stored only in memory while the search is happening, they are never
stored on disk [0].

I do check the server access logs from time to time to have an idea of how many
unique visitors we have, but this is only data that the HTTP server logs
regularly (time, IP), and doesn't include the HTTP request content (where the
query data actually lives).
