scilifelabdatacentre / dds_cli
The command line interface for access to the Data Delivery System primarily developed by the SciLifeLab Data Centre.
License: MIT License
If we're going to have a config file, i.e. the one that the username etc. are currently saved to (`.dds-cli.json`), there was a suggestion to make it possible to enter the flags (e.g. destination etc.) and `--source` in it as well. This reduces long lines in the terminal and, in some cases, makes it easier for the user to check and change the options.
Testing the `dds ls` command line tool I managed to get this:
$ dds ls fac002 */_old
︵
︵ ( ) ︵
( ) ) ( ( ) SciLifeLab Data Delivery System
︶ ( ) ) ( https://www.scilifelab.se/data
︶ ( ) Version 0.2.0
︶
INFO Listing files for project 'fac002' data_lister.py:124
INFO Showing files in folder '*/_old' data_lister.py:126
ERROR Failed to get list of files: (pymysql.err.OperationalError) (1139, "Regex error 'quantifier does not follow a repeatable item at offset 1'") __main__.py:265
[SQL: SELECT DISTINCT files.subpath AS files_subpath
FROM files
WHERE files.project_id = binary(%(binary_1)s) AND (files.subpath regexp %(subpath_1)s)]
+)+$'}]
(Background on this error at: http://sqlalche.me/e/14/e3q8)
I haven't dug into the code to figure out what this really means yet, but getting a SQL error is slightly scary (I'm assuming that little Bobby Tables will be safe here..? 👀 ).
Anyway, would be nice to enable file path globbing when listing files if possible, or catching this type of error if not (and just returning a files-not-found response).
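One way to support globbing without sending the user's pattern to the database as a regular expression is to match the returned paths on the client side. The sketch below assumes the file paths are already available as plain strings; `filter_paths` is a hypothetical helper, not part of dds_cli.

```python
from fnmatch import fnmatchcase


def filter_paths(paths, pattern):
    """Return the paths matching a shell-style glob pattern.

    Hypothetical helper: the pattern is never interpreted as a regex,
    so an input such as '*/_old' cannot trigger a regex syntax error.
    """
    return [path for path in paths if fnmatchcase(path, pattern)]


# Made-up example paths, for illustration only.
paths = ["run1/_old", "run2/_old", "run1/new", "readme.txt"]
matches = filter_paths(paths, "*/_old")
```

An empty result can then be reported as "no files found" instead of surfacing a database error.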
Add option to be able to sort projects according to either of the fields?
At the moment the CLI `dds ls [projectID]` command lists only the root-level folders and files. Add the possibility of listing the entire directory structure, using the pagination functionality.
User story: As a user, I want to upload my protected data, without first having to use space on my local computer.
Connected to #121 as well.
Currently the delivery report is saved to a json file, with important information if something should fail. Change format from json perhaps? Can be discussed at some point.
`dds ls` gives "default backend - 404" if `DDS_CLI_ENV` is not set.
It's a good idea to always log the version of the tool that's running, and also to have a `--version` flag that does only this.
As an added bonus, you can also attempt to fetch the git hash if the script is in a repository and print that (I do that for MultiQC - it's pretty useful when multiple people are working with a dev version).
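A minimal sketch of both ideas, using only the standard library. The names `get_git_hash`, `version_string` and `build_parser` are hypothetical, and the hard-coded version simply mirrors the 0.2.0 shown in the banner; a real package would read it from its metadata.

```python
import argparse
import subprocess

__version__ = "0.2.0"  # assumption: normally read from the package metadata


def get_git_hash(repo_dir="."):
    """Return the short git commit hash for repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository
        return None
    return result.stdout.strip() or None


def version_string():
    """Version number, with the git hash appended when one can be found."""
    git_hash = get_git_hash()
    return f"{__version__} ({git_hash})" if git_hash else __version__


def build_parser():
    """Argument parser with a --version flag that prints the version and exits."""
    parser = argparse.ArgumentParser(prog="dds")
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {version_string()}"
    )
    return parser
```

The same `version_string()` value can be written to the log at start-up so that every run records which version produced it.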
When using `--break-on-fail` and an error occurs, the cursor disappears in the terminal. Need to use `top` to get the cursor back, or restart the terminal. Why?
Change the workflow so that a message is displayed straight away if a user tries to upload or a facility tries to download.
The `-spf` option currently does not produce an error/warning if the file contains non-existent paths. It should warn about this.
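A possible shape for that check, assuming the file given to `-spf` contains one path per line. The function name and logger name are hypothetical.

```python
import logging
from pathlib import Path

log = logging.getLogger("dds_cli_example.paths")  # hypothetical logger name


def read_source_path_file(list_file):
    """Split the paths listed in list_file into existing and missing ones.

    Assumes one path per line; blank lines are ignored. A warning is
    logged for every listed path that does not exist on disk.
    """
    existing, missing = [], []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue
        if Path(entry).exists():
            existing.append(entry)
        else:
            missing.append(entry)
    for entry in missing:
        log.warning("Path listed in %s does not exist: %s", list_file, entry)
    return existing, missing
```

Returning the missing paths as well lets the caller decide whether to continue with the existing ones or abort.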
Return the size of the project and add a column to the project table
Add an `info` option to the CLI --> `dds info`
This will get and display all info related to a specific project. Not the contents, just the info.
User story: As a user, I want to be able to continue a failed upload if anything goes wrong.
At the moment the CLI waits until the end of the upload/download to add error information to the error log file. The errors and important information should be saved to the file straight away, when the error occurs.
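One way to do this is to append a record to the log file at the moment a failure happens. The sketch below writes one JSON object per line; that layout and the name `record_failure` are assumptions for illustration, not the current report format.

```python
import json
from datetime import datetime, timezone


def record_failure(log_path, file_name, message):
    """Append one failure record to log_path as soon as the error occurs.

    The file is opened in append mode and closed again for every record,
    so earlier entries are already on disk if the process dies later.
    """
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "file": file_name,
        "error": message,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
```

Because each record is a complete line, a log that ends abruptly can still be read back entry by entry.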
Since I messed around with the logging code, the progress bar used for uploads and downloads now jumps whenever a log message is printed, instead of elegantly dropping down below each new message.
This is because the log handler and the progress bar are using different rich `Console` objects. It should be possible to fix by sharing the same `Console` between both. See Textualize/rich#1317 (comment) for an example.
I think that to do this, it maybe makes sense to create a `Console` object that can be returned from a `utils` module or something, for easy reuse between disparate parts of the codebase. But whatever makes sense really.
A tester commented that the full folder was not deleted remotely if it contains subfolders. Strange behaviour which should be looked into.
Some users may want to upload/download an entire folder but exclude individual files within it.
Two or multi factor authentication helps strengthen the security of the system by means of additional factors to identify users. They also help with the problem of strong passwords and maintaining/remembering them. We would like to improve both the security and the user experience in the system.
Possibility to delete using `dds rm fileprefix*`, for example.
Apparently the get command creates files with different permissions randomly. Why?
The CLI only displays small question mark symbols instead of up/down/lock/warning symbols etc.
Doing a hard system exit is an absolute last resort. Normally you want to be raising an exception instead, then probably having a single location right at the top of the code to catch this and do the system exit with non-zero code.
This is especially important if other code packages are importing and using the functions. System exits cannot be caught, so for example if used in the `dds_web` code then these will crash the server. Exceptions can be caught and handled differently depending on where the function is being used.
In most cases here I think it will make sense to create your own exception types. You can then pass the error message in the `raise` statement and log it before exiting downstream (eg. as done here).
I use this pattern a lot in nf-core and typically do the exit call in the command-line handling code. eg. here. So then if you have 5 subcommands you'll probably have a maximum of 5 exit calls. You can be pretty sure that no-one else will be importing and reusing your cli handling code.
In some cases you have an exit code `0` because it's something like just not having any files to show etc - in other words, normal behaviour. Here you should probably just use `return` to drop out of the function execution early without an exception.
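The pattern described above can be sketched as follows. All class and function names are hypothetical and only illustrate the structure: library functions raise or return, and the system exit happens once, in the command-line handling code.

```python
import logging
import sys

log = logging.getLogger("dds_cli_example")  # hypothetical logger name


class DDSCLIException(Exception):
    """Hypothetical base class for errors raised by the library code."""


class NoDataError(DDSCLIException):
    """Hypothetical error for a failed file listing."""


def list_files(files):
    """Library function: raises on failure and never exits the process."""
    if files is None:
        raise NoDataError("Failed to get list of files")
    if not files:
        # Normal behaviour, not an error: drop out early with a plain return.
        log.info("No files to show")
        return []
    return sorted(files)


def main(files):
    """Command handler: the only place where errors become an exit code."""
    try:
        for name in list_files(files):
            print(name)
    except DDSCLIException as err:
        log.error(err)
        return 1
    return 0


def cli_entry_point():
    """Called only from the command line, so this is the single exit call."""
    sys.exit(main(None))
```

Code that imports `list_files` elsewhere, for example in a web server, can catch `DDSCLIException` and respond in its own way instead of being terminated.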
You almost never want to print to the console using a rich `Console`.
One reason is that this prints to standard out - but most of the current usage is log / status messages. Normally, these should be going to standard error and only the real "results" (eg. the list of projects etc) should go to standard out. This means that command line users can split the output types for downstream use.
Much like the system exits / exceptions, the console calls are not useful for other tools importing the functions. It's better to use the `logging` library instead - then the log messages can be assigned to a namespace and their output customised by any tool using the function (eg. only showing errors or being channeled to a web server log).
Rich has a logging handler so you can keep the command line outputs looking identical. Here's how I use it in nf-core. My implementation is slightly more complex, as I give the option of also logging to a file, without `rich`. I also enable highlighting / `rich` syntax and have a function that basically makes the colours show up in GitHub Actions CI tests.
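A minimal standard-library sketch of the split between log messages (standard error) and results (standard out). The logger name `dds_cli_example` and both function names are hypothetical; the plain `StreamHandler` used here is a stand-in that could be swapped for rich's `RichHandler` to keep the styled output.

```python
import logging
import sys


def setup_logging(verbose=False):
    """Attach a stderr handler to a namespaced logger (done once by the CLI)."""
    logger = logging.getLogger("dds_cli_example")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)-8s %(message)s"))
        logger.addHandler(handler)
    return logger


def list_projects(projects):
    """Status messages go to the log; only the real results go to stdout."""
    log = logging.getLogger("dds_cli_example.lister")
    log.info("Listing %d projects", len(projects))
    for project in projects:
        print(project)
```

With this split, a shell user can redirect standard out to a file and receive only the project list, while another tool importing `list_projects` can attach its own handlers to the same logger namespace.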
Some analytics may be wanted from the log files. For this feature we first need to decide on what info should be saved to those files.
How to resolve and not resolve file links within and outside of specified folders. Includes saving link information to database and creating links on download.
Currently need to specify a new directory name when downloading. Ability to append to an already existing directory should be added. Make sure only the recently downloaded files can be deleted though.
Add the `--destination` option to `dds put` so that the end user can specify which remote existing or new folder the items should be placed in during upload.
Add option to change the chunk size in which the files are read and encrypted/decrypted.
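As an illustration of what a configurable chunk size could look like on the reading side, here is a small generator. The function name and the 64 KiB default are assumptions, and the encryption step is left out.

```python
def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield the contents of a file as successive blocks of chunk_size bytes.

    The last block may be shorter. chunk_size would come from the
    proposed command-line option; the default here is arbitrary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk
```

Each yielded block could then be passed to the encryption or decryption routine, so memory use stays bounded by the chosen chunk size.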
Tried to remove a file which did not exist and got:
data_lister.DataLister.warn_if_many(count=len(not_exists) + len(delete_failed))
AttributeError: type object 'DataLister' has no attribute 'warn_if_many'
The `rm` command previously listed the files which were not successfully removed within the system, but warned if there were too many. The pagination should be added here in the same way as in `data_lister.py`.
@inaod568 commented on Mon Jun 14 2021
At first - project size updated after each uploaded file. This produced deadlock issues when uploading a lot of small files since it tried to update the same project table field at the same time in multiple requests.
Now - Updates the project size at the end of the upload. This means that if an error occurs during the upload, the project size is not updated.
Fix: Either add a queue (for example) to the API and update the db after each file, or add the project size update to the cleanup after failed upload.
A single, all-containing executable file is easier to handle for ordinary users. No need to have a fully-functional Python setup. This alternative should be looked into at some point.
The package PyInstaller https://www.pyinstaller.org/ can create executables for Linux, MacOS and Windows.
As a unit admin and personnel, I want to create a project via the CLI.
This involves the CLI and the endpoints.
Message currently only displayed if checksum verification fails. Suggestion to add success message if the flag is used and the verification succeeded.
When specifying the wrong username and no password, the CLI still asks for the password. Suggestion to check if the username is correct before prompting.