
Comments (16)

tsibley commented on September 26, 2024

I agree dataset isn't quite on the nose, especially since it'd (intentionally) work for narratives too. I'm hesitant to lift up its subcommands to the top-level though.

I do prefer upload/download over deploy/retrieve. Deploy has more connotations than I think we want (I regret this choice) and isn't a common term for non-developers, whereas upload/download are widely understood by even non-developers.

I definitely want to make sources first-class things understood by the CLI, but I think that's a different scope of work for the future. The current design of the deploy command is set up to support different kinds of destinations (although it only supports S3 right now), and I'll maintain this in whatever additions I make for download/delete.

from cli.

tsibley commented on September 26, 2024

Ok, it sounds like we should re-envision the nextstrain deploy command.

What about something like the following, where deploy becomes dataset upload:

nextstrain dataset upload s3://nextstrain-data auspice/*.json
nextstrain dataset download s3://nextstrain-data/zika_tree.json some/dir/
nextstrain dataset delete s3://nextstrain-data/zika*

# Deploy will be an alias for `dataset upload`, so this still works, but asks
# you to start using the new command.
nextstrain deploy s3://nextstrain-data *.json
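The "deploy is an alias for `dataset upload`, but asks you to switch" behaviour could be wired up roughly as below. This is a minimal sketch using the standard library's argparse, not the actual nextstrain CLI code: the handler names (`upload`, `deploy`, `add_upload_arguments`) and the wording of the notice are hypothetical, and the upload handler only prints instead of talking to S3.

```python
import argparse
import sys
from typing import List


def upload(args: argparse.Namespace) -> int:
    """Hypothetical stand-in for the shared `dataset upload` implementation."""
    for path in args.files:
        print(f"would upload {path} to {args.destination}")
    return 0


def deploy(args: argparse.Namespace) -> int:
    """Deprecated alias: nudge users toward `dataset upload`, then delegate to it."""
    print("Note: `nextstrain deploy` is now `nextstrain dataset upload`; "
          "please start using the new command.", file=sys.stderr)
    return upload(args)


def add_upload_arguments(parser: argparse.ArgumentParser) -> None:
    # Both spellings accept the same arguments, so define them once.
    parser.add_argument("destination", help="e.g. s3://nextstrain-data")
    parser.add_argument("files", nargs="+", help="local files to upload")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nextstrain")
    commands = parser.add_subparsers(dest="command", required=True)

    dataset = commands.add_parser("dataset")
    dataset_actions = dataset.add_subparsers(dest="action", required=True)
    dataset_upload = dataset_actions.add_parser("upload")
    add_upload_arguments(dataset_upload)
    dataset_upload.set_defaults(func=upload)

    deploy_parser = commands.add_parser("deploy", help="deprecated alias for `dataset upload`")
    add_upload_arguments(deploy_parser)
    deploy_parser.set_defaults(func=deploy)
    return parser


def main(argv: List[str]) -> int:
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Keeping the alias as a thin wrapper means there is only one upload code path to maintain, and the old command can be dropped later without touching it.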

The dataset delete command will also take care of invalidating the CloudFront cache if necessary, the same way deploy does now.
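One way `dataset delete s3://nextstrain-data/zika*` might match keys and prepare the CloudFront invalidation is sketched below. It assumes the remote keys have already been listed, uses shell-style matching from the standard `fnmatch` module, and builds the `InvalidationBatch` dictionary in the shape boto3's CloudFront `create_invalidation()` accepts. The helper names and the `CallerReference` naming scheme are hypothetical.

```python
import fnmatch
import time
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split e.g. s3://nextstrain-data/zika* into ("nextstrain-data", "zika*")."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def select_keys(keys: List[str], pattern: str) -> List[str]:
    """Keys matching a shell-style wildcard. Note: fnmatch's * also matches '/'."""
    return [key for key in keys if fnmatch.fnmatchcase(key, pattern)]


def invalidation_batch(deleted_keys: List[str]) -> Optional[Dict]:
    """Build the InvalidationBatch argument for boto3's CloudFront
    create_invalidation(), or None when nothing was deleted."""
    if not deleted_keys:
        return None
    paths = ["/" + key for key in deleted_keys]
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # Must be unique per request; this naming scheme is only an example.
        "CallerReference": f"nextstrain-delete-{time.time_ns()}",
    }
```

A caller would delete the matched objects and then, only if the bucket sits behind a CloudFront distribution, pass the batch along as `cloudfront.create_invalidation(DistributionId=..., InvalidationBatch=batch)`.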

tsibley commented on September 26, 2024

I spent some time yesterday and today implementing these commands under working names, since the real names are still TBD. We should decide on the final names/structure so I can finish this up sooner rather than later. The basics are mostly there, but there is still some refining of behaviour and documentation to round it out.

tsibley commented on September 26, 2024

Hmm. This seems like scope creep to me. The nextstrain command isn't meant to be a general-purpose S3 manipulation tool: aws s3 exists for that (and supports wildcards).

Note that S3 is just one possible destination for nextstrain deploy (albeit the only destination possible right now); future supported destinations might use SFTP, SCP, git, WebDAV, or other upload mechanisms. The intent is for nextstrain deploy to do what needs to be done to put locally built files at a location reachable by a deployed Nextstrain instance.
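The "S3 is just one possible destination" design can be pictured as a small registry keyed on the destination URL's scheme. This is a hedged sketch, not the structure of `nextstrain/cli/deploy/*`: the `UPLOADERS` registry, the `Uploader` signature, and the placeholder S3 backend (which computes destination URLs instead of uploading) are all assumptions for illustration.

```python
from pathlib import Path
from typing import Callable, Dict, List
from urllib.parse import urlparse

# An uploader takes (host or bucket, path prefix, local files) and returns remote URLs.
Uploader = Callable[[str, str, List[Path]], List[str]]


def upload_s3(bucket: str, prefix: str, files: List[Path]) -> List[str]:
    """Placeholder S3 backend: computes destination URLs instead of uploading."""
    prefix = prefix.strip("/")
    keys = [f"{prefix}/{f.name}" if prefix else f.name for f in files]
    return [f"s3://{bucket}/{key}" for key in keys]


# Only S3 is registered, matching the current state; other schemes
# (sftp, webdav, ...) would be added here as new backends.
UPLOADERS: Dict[str, Uploader] = {"s3": upload_s3}


def upload(destination: str, files: List[Path]) -> List[str]:
    url = urlparse(destination)
    uploader = UPLOADERS.get(url.scheme)
    if uploader is None:
        raise ValueError(f"unsupported destination: {destination!r}")
    return uploader(url.netloc, url.path, files)
```

Because the command only dispatches on the scheme, adding SFTP or WebDAV later would mean registering another backend rather than changing the command's interface.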

jameshadfield commented on September 26, 2024

With the implementation of private S3 buckets, we will need the ability to download and remove files (right now we can only do this via the AWS console).

trvrb commented on September 26, 2024

This seems very reasonable to me. I would find nextstrain dataset download useful in day-to-day work with flu.

That said, dataset feels slightly off semantically, but I don't have a better suggestion.

I'm 50/50 on whether I prefer:

nextstrain dataset upload s3://nextstrain-data auspice/*.json
nextstrain dataset download s3://nextstrain-data/zika_tree.json some/dir/
nextstrain dataset delete s3://nextstrain-data/zika*

or

nextstrain deploy s3://nextstrain-data auspice/*.json
nextstrain retrieve s3://nextstrain-data/zika_tree.json some/dir/
nextstrain remove s3://nextstrain-data/zika*

And one additional thought: if we're planning for source to be a first-class citizen, maybe we should elevate it in the CLI. That would give us:

nextstrain dataset upload wa-doh auspice/*.json
nextstrain dataset download wa-doh zika_tree.json some/dir/
nextstrain dataset delete wa-doh zika*
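Treating a source like `wa-doh` as a first-class name would mean resolving it to a concrete location before any upload/download/delete happens. A minimal sketch, assuming a simple name-to-URL registry: the `SOURCES` table and its bucket URL are hypothetical placeholders, and explicit URLs such as `s3://nextstrain-data` pass through unchanged.

```python
from urllib.parse import urlparse

# Hypothetical source registry: the bucket URL is a placeholder, not a real bucket.
SOURCES = {
    "wa-doh": "s3://example-wa-doh-bucket",
}


def resolve(source: str, path: str = "") -> str:
    """Turn a source name (or an explicit URL) plus optional path into a full URL."""
    if urlparse(source).scheme:
        base = source  # already a URL such as s3://nextstrain-data
    elif source in SOURCES:
        base = SOURCES[source]
    else:
        raise ValueError(f"unknown source: {source!r}")
    return f"{base.rstrip('/')}/{path}" if path else base
```

Accepting both forms would let the URL-based commands keep working while named sources are introduced.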

tsibley commented on September 26, 2024

> I'm hesitant to lift up its subcommands to the top-level though.

That said, it may still be the right choice!

trvrb commented on September 26, 2024

I see the logic of the subcommand and think it's generally cleaner, but since dataset doesn't feel right, I think it's semantically better to just do nextstrain upload, nextstrain download, nextstrain delete. Other options I considered but didn't really like: nextstrain io upload, nextstrain file upload, nextstrain remote upload.

tsibley commented on September 26, 2024

Sounds good to me.

jameshadfield commented on September 26, 2024

I have no comment on the naming but the functionality sounds perfect.

tsibley commented on September 26, 2024

I guess we'll also need a nextstrain list command?

(I'd plan to alias delete to rm and list to ls.)
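Short aliases like `rm` and `ls` map directly onto argparse's `aliases=` option for subparsers. A minimal sketch of the top-level layout under discussion at this point, assuming these argument names (which are illustrative, not the CLI's actual ones). One wrinkle worth knowing: the subparsers `dest` records whatever the user typed, so a canonical name is stored separately for dispatch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nextstrain")
    commands = parser.add_subparsers(dest="command", required=True)

    upload = commands.add_parser("upload")
    upload.add_argument("destination")
    upload.add_argument("files", nargs="+")

    download = commands.add_parser("download")
    download.add_argument("source")
    download.add_argument("local_dir", nargs="?", default=".")

    # `aliases` lets `ls` and `rm` reach the same subparsers.
    listing = commands.add_parser("list", aliases=["ls"])
    listing.add_argument("remote")

    delete = commands.add_parser("delete", aliases=["rm"])
    delete.add_argument("remote")

    # args.command holds whatever the user typed (e.g. "rm"), so record
    # the canonical name separately for dispatch.
    for name, sub in [("upload", upload), ("download", download),
                      ("list", listing), ("delete", delete)]:
        sub.set_defaults(canonical=name)
    return parser
```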

jameshadfield commented on September 26, 2024

I said I wouldn't comment on naming, but having taught nextstrain deploy over the past 2 weeks here is a comment:

"list", "delete", "upload", "download" (and who knows, there may be more in the future, e.g. "move") are all part of the same concept: they interact with a "source" (or "group", not sure about terminology), not the local computer / AWS build / Docker build. This should be clear from the command.

tsibley commented on September 26, 2024

Yeah, that commonality is a big reason why I'd prefer a subcommand to collect and identify them all, and why I'm hesitant to make them all top-level commands.

I dismissed dataset earlier, but coming back to it, maybe we do:

nextstrain dataset {upload,download,delete,list} …
nextstrain narrative {upload,download,delete,list} …

nextstrain deploy would become an alias for nextstrain dataset upload.

The backing code (currently what's in nextstrain/cli/deploy/* as used by nextstrain/cli/command/deploy.py) would be largely shared between both nextstrain dataset and nextstrain narrative, but the listing can be specific to one type of file and we can be dataset- or narrative-specific in help text.
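To make the "largely shared backing code, type-specific help and listing" idea concrete, here is a hedged sketch in which one registration function builds both `nextstrain dataset …` and `nextstrain narrative …` and only per-type defaults differ. The `FILE_TYPES` table, the suffixes (datasets as `.json`, narratives as `.md`), and the single `remote` argument per action are simplifying assumptions, not the CLI's real interface.

```python
import argparse
from typing import Dict, List

# Per-type settings. The suffixes are assumptions for illustration
# (Auspice datasets as JSON, narratives as Markdown).
FILE_TYPES: Dict[str, Dict[str, str]] = {
    "dataset": {"suffix": ".json", "noun": "Auspice dataset"},
    "narrative": {"suffix": ".md", "noun": "narrative"},
}


def register(commands, kind: str) -> None:
    """Add `nextstrain <kind> {upload,download,delete,list}` backed by shared code."""
    info = FILE_TYPES[kind]
    parser = commands.add_parser(kind, help=f"manage remote {info['noun']} files")
    actions = parser.add_subparsers(dest="action", required=True)
    for action in ("upload", "download", "delete", "list"):
        sub = actions.add_parser(action, help=f"{action} {info['noun']} files")
        sub.add_argument("remote")
        # The handler is shared; only these defaults differ between types.
        sub.set_defaults(kind=kind, suffix=info["suffix"])


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nextstrain")
    commands = parser.add_subparsers(dest="command", required=True)
    for kind in FILE_TYPES:
        register(commands, kind)
    return parser


def list_of_type(keys: List[str], suffix: str) -> List[str]:
    """Type-specific listing: keep only remote keys with the expected suffix."""
    return [key for key in keys if key.endswith(suffix)]
```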

trvrb commented on September 26, 2024

Nice! I was just doing some work / thinking about the proposal I had earlier in the summer to have the entry point for builds start by downloading a flat file from S3 rather than calling out to fauna. My thought was to mirror data.nextstrain.org/zika.json with something like data.nextstrain.org/zika_sequences.fasta and data.nextstrain.org/zika_metadata.tsv. We don't have to move in this direction, but I can see more file types than dataset and narrative creeping in. That doesn't mean that dataset and narrative aren't good things to have, though.

That said, I am maybe coming around to nextstrain remote {upload,download,delete,list} ….

tsibley commented on September 26, 2024

Nod. During my implementation so far, I was realizing that separate dataset and narrative commands would add a lot of UI duplication to the CLI. So I was thinking better to have a single command like remote which, if desired, then could have separate filter options for file "types" (dataset, narrative, etc.).

I think nextstrain remote makes most sense, unless we want to have upload/download/etc at top-level.
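A single `nextstrain remote` command with optional file-type filters might look roughly like this. It is a sketch under stated assumptions: the `--type` option name, the `types` destination, and the rule that classifies files by suffix (`.md` as narrative, `.json` as dataset, anything else such as a `.fasta` left unclassified) are hypothetical, not decided behaviour.

```python
import argparse
from typing import Iterable, List, Optional, Sequence

TYPES = ("dataset", "narrative")


def file_type(key: str) -> Optional[str]:
    """Guess a remote file's type from its name (assumed naming convention)."""
    if key.endswith(".md"):
        return "narrative"
    if key.endswith(".json"):
        return "dataset"
    return None  # e.g. a future zika_sequences.fasta


def filter_by_type(keys: Iterable[str], wanted: Optional[Sequence[str]]) -> List[str]:
    """No --type given means no filtering; otherwise keep only the requested types."""
    if not wanted:
        return list(keys)
    return [key for key in keys if file_type(key) in wanted]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nextstrain remote")
    actions = parser.add_subparsers(dest="action", required=True)
    for action in ("upload", "download", "delete", "list"):
        sub = actions.add_parser(action)
        sub.add_argument("remote")
        # Repeatable: --type dataset --type narrative
        sub.add_argument("--type", dest="types", action="append", choices=TYPES)
    return parser
```

Compared with separate `dataset` and `narrative` commands, this keeps one set of subcommands and pushes the per-type distinction into a filter, which also leaves room for new file types later.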

trvrb commented on September 26, 2024

Thanks Tom. After thinking more, my preference is nextstrain remote upload/download/etc. If you prefer something else, I'm happy to go with that instead.
