scilifelabdatacentre / dds_cli



The command line interface for access to the Data Delivery System primarily developed by the SciLifeLab Data Centre.

License: MIT License

Python 100.00%

click command-line python

dds_cli's People

Contributors

Stargazers

Watchers

Forkers

ewels monikabrandt alneberg aanil matthiaszepper i-oden fossabot valyo wylhtydtm nylander

dds_cli's Issues

Add flags and extra options to config file

If we're going to have a "config" file, i.e. the one that the username etc. are currently saved to (.dds-cli.json), there was a suggestion to allow flags (e.g. the destination) and --source to be set in it as well. This shortens long command lines in the terminal and, in some cases, makes it easier for the user to check and change the options.
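One way this could work is sketched below: options found in the config file act as defaults, and anything given explicitly on the command line overrides them. The function name `load_options` and the keys used in the usage note are hypothetical, not taken from the existing code.

```python
import json
from pathlib import Path


def load_options(config_path, cli_options):
    """Merge options from a JSON config file with command-line options.

    Hypothetical helper: values from the file are defaults, and any
    command-line option that was actually given (not None) overrides them.
    """
    path = Path(config_path)
    # A missing config file simply means "no defaults".
    file_options = json.loads(path.read_text()) if path.exists() else {}
    explicit = {key: value for key, value in cli_options.items() if value is not None}
    return {**file_options, **explicit}
```

With a config file containing `{"destination": "results"}`, calling `load_options(path, {"destination": None, "source": "data"})` would keep the destination from the file and add the source from the command line.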

Listing folder globs gives SQL error

Testing the dds ls command line tool I managed to get this:

$ dds ls fac002 */_old
     ︵
 ︵ (  )   ︵
(  ) ) (  (  )   SciLifeLab Data Delivery System
 ︶  (  ) ) (    https://www.scilifelab.se/data
      ︶ (  )    Version 0.2.0
          ︶

INFO     Listing files for project 'fac002'                                                                                                                  data_lister.py:124
INFO     Showing files in folder '*/_old'                                                                                                                    data_lister.py:126
ERROR    Failed to get list of files: (pymysql.err.OperationalError) (1139, "Regex error 'quantifier does not follow a repeatable item at offset 1'")           __main__.py:265
         [SQL: SELECT DISTINCT files.subpath AS files_subpath
         FROM files
         WHERE files.project_id = binary(%(binary_1)s) AND (files.subpath regexp %(subpath_1)s)]
         +)+$'}]
         (Background on this error at: http://sqlalche.me/e/14/e3q8)

I haven't dug into the code to figure out what this really means yet, but getting a SQL error is slightly scary (I'm assuming that little Bobby Tables will be safe here..? 👀 ).

Anyway, would be nice to enable file path globbing when listing files if possible, or catching this type of error if not (and just returning a files-not-found response).
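A rough sketch of the idea, under the assumption that the folder argument is currently passed to the database as a regular expression: a raw glob such as `*/_old` is not a valid regex, which would explain the "quantifier does not follow a repeatable item" message. The helper names are hypothetical, and the filtering is done in Python here because the regex produced by `fnmatch.translate` follows Python's dialect and may not be accepted by MySQL.

```python
import fnmatch
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a Python regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def filter_by_glob(subpaths, pattern):
    """Return the subpaths matching a shell-style glob.

    An empty list is the "files not found" case, not an error.
    """
    return [path for path in subpaths if fnmatch.fnmatchcase(path, pattern)]
```

Here `is_valid_regex("*/_old")` is False, while `is_valid_regex(fnmatch.translate("*/_old"))` is True, so either translating the glob or validating the input before building the query would avoid surfacing a raw SQL error.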

`ls` sort option

Add an option to sort projects by any of the fields?

Add --no-compression option

Create add_user command incl endpoint

List whole directory structure

At the moment, the dds ls [projectID] command lists only the root-level folders and files. Add the possibility of listing the entire directory structure, using the pagination functionality.

Create tests for DataLister class

Stream processing and upload

User story: As a user, I want to upload my protected data, without first having to use space on my local computer.

Connected to #121 as well.

Change delivery report format

Currently the delivery report is saved to a json file, with important information if something should fail. Change format from json perhaps? Can be discussed at some point.

Create tests for CLI User class

Add clearer error message for `dds ls`

dds ls gives "default backend - 404" if DDS_CLI_ENV is not set

Log version

It's a good idea to always log the version of the tool that's running. Also to have a --version flag that does only this.

As an added bonus, you can also attempt to fetch the git hash if the script is in a repository and print that (I do that for MultiQC - it's pretty useful when multiple people are working with a dev version).
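A minimal sketch of the git hash part, using only the standard library. The function names are hypothetical; the only external assumption is that a `git` executable may or may not be on the PATH, and both cases are handled.

```python
import subprocess


def get_git_hash(repo_dir):
    """Return the short git commit hash for repo_dir, or None if unavailable."""
    try:
        output = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return None
    return output.decode().strip() or None


def format_version(version, git_hash=None):
    """Build the version string to log, appending the git hash when known."""
    return f"{version} (git {git_hash})" if git_hash else version
```

The resulting string could be logged at start-up and also printed by a --version flag.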

Weird `--break-on-fail` bug

When using --break-on-fail and an error occurs, the cursor disappears in the terminal. Need to use top to get the cursor back, or restart the terminal. Why?

Errors when no permission before more checks

Change the workflow so that a message is displayed straight away if a user tries to upload or a facility tries to download.

Fix `spf` non existent file handling

The -spf option currently does not produce an error or warning if the file contains nonexistent paths. It should warn about this.
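A possible shape for this check, assuming the file passed to -spf contains one path per line. The function name, the logger namespace and the message wording are hypothetical.

```python
import logging
from pathlib import Path

LOG = logging.getLogger("dds_cli.example")  # hypothetical namespace


def read_source_path_file(list_file):
    """Split the paths listed in list_file into existing and missing ones."""
    existing, missing = [], []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        (existing if Path(entry).exists() else missing).append(entry)
    # Warn once per missing path instead of silently skipping it.
    for entry in missing:
        LOG.warning("Path listed in %s does not exist: %s", list_file, entry)
    return existing, missing
```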

Return size when listing projects

Return the size of the project and add a column to the project table

Add `info` option to display project info

Add info option to CLI --> dds info
This will get and display all info related to a specific project. Not the contents, just the info.

Continue failed upload

User story: As a user, I want to be able to continue a failed upload if anything goes wrong.

Possibility to create project through CLI?

Log errors to file straight away

At the moment the CLI waits until the end of the upload/download to add error information to the error log file. The errors and important information should be saved to the file straight away, when the error occurs.
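One option is a dedicated file handler from the standard logging library, since `logging.FileHandler` writes and flushes each record as it is emitted. This is only a sketch; the function name, logger namespace and format string are hypothetical.

```python
import logging


def make_error_logger(log_path):
    """Return a logger that appends each error to log_path as soon as it is logged."""
    logger = logging.getLogger("dds_cli.example.errors")  # hypothetical namespace
    logger.setLevel(logging.ERROR)
    # FileHandler flushes after every record, so nothing waits for the end of the run.
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `logger.error("upload of %s failed", name)` at the point of failure would then leave a record in the file even if the process is killed later.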

Fix glitchy progress bars

Since I messed around with the logging code, the progress bar used for uploads and downloads now jumps whenever a log message is printed, instead of elegantly dropping down below each new message.

This is because the log handler and the progress bar are using different rich Console objects. It should be possible to fix by sharing the same Console between both. See Textualize/rich#1317 (comment) for an example.

I think that to do this, it maybe makes sense to create a Console object that can be returned from a utils module or something, for easy reuse across disparate parts of the codebase. But whatever makes sense really.

Add action logging to REST API calls

Deleting folder - not deleted if contains subfolders?

A tester commented that the full folder was not deleted remotely if it contains subfolders. Strange behaviour which should be looked into.

Add possibility of excluding items from directory

Some users may want to upload/download an entire folder but exclude individual files within it.
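As an illustration only, exclusion could be expressed with shell-style patterns applied to the collected paths before the transfer starts. The function name and the idea of passing a list of patterns are assumptions; no such option exists yet.

```python
import fnmatch


def apply_excludes(paths, exclude_patterns):
    """Return the paths that match none of the shell-style exclude patterns."""
    return [
        path
        for path in paths
        if not any(fnmatch.fnmatchcase(path, pattern) for pattern in exclude_patterns)
    ]
```

Note that in `fnmatch` the `*` wildcard also matches path separators, so a pattern like `*.log` excludes log files at any depth.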

Make a decision on 2FA or MFA to offer to users

Two or multi factor authentication helps strengthen the security of the system by means of additional factors to identify users. They also help with the problem of strong passwords and maintaining/remembering them. We would like to improve both the security and the user experience in the system.

Add deletion using wild card?

Possibility to delete using dds rm fileprefix* for example.

Configure email for invite confirmation

DDS_METHODS not used?

dds_cli/dds_cli/__init__.py

Line 20 in 7b09e75

DDS_METHODS = ["ls"]

Just doing a bit of code review to get into the code. I saw this was added recently in 1d99e6f, but it doesn't seem to be used anywhere? I'm not sure it's needed either, I guess click should already keep track of which commands are allowed? But maybe I'm wrong?

Random folder permissions when downloading

Apparently the get command creates files with different permissions randomly. Why?

Fix Windows icon compatibility

The CLI only displays small question mark symbols instead of up/down/lock/warning symbols etc.

Python best practices - system exits & logging

System exits

Doing a hard system exit is an absolute last resort. Normally you want to be raising an exception instead, then probably having a single location right at the top of the code to catch this and do the system exit with non-zero code.

This is especially important if other code packages are importing and using the functions. System exits cannot be caught, so for example if used in the dds_web code then these will crash the server. Exceptions can be caught and handled differently depending on where the function is being used.

In most cases here I think it will make sense to create your own exception types. You can then pass the error message in the raise statement and log it before exiting downstream (eg. as done here).

I use this pattern a lot in nf-core and typically do the exit call in the command-line handling code. eg. here. So then if you have 5 subcommands you'll probably have a maximum of 5 exit calls. You can be pretty sure that no-one else will be importing and reusing your cli handling code.

In some cases you have an exit code 0 because it's something like just not having any files to show etc - in other words, normal behaviour. Here you should probably just use return to drop out of the function execution early without an exception.
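The pattern described above might look roughly like this. All names here (`DDSCLIException`, `NoPermissionError`, `upload`, `main`) and the role check are hypothetical and only illustrate the structure: library functions raise, and a single top-level function turns exceptions into an exit code.

```python
import logging

LOG = logging.getLogger("dds_cli.example")  # hypothetical namespace


class DDSCLIException(Exception):
    """Base class for errors raised by the (hypothetical) library functions."""


class NoPermissionError(DDSCLIException):
    """Raised when the current role may not perform the requested action."""


def upload(role, files):
    """Library function: raises on errors and never exits the interpreter."""
    if role != "facility":
        raise NoPermissionError(f"Role '{role}' is not allowed to upload.")
    if not files:
        return 0  # nothing to do is normal behaviour: return early, no exception
    return len(files)


def main(role, files):
    """Command-line layer: the only place where an exit code is decided."""
    try:
        upload(role, files)
    except DDSCLIException as err:
        LOG.error(err)
        return 1
    return 0
```

A console entry point would then call `sys.exit(main(...))`, while other packages importing `upload` can catch `DDSCLIException` and handle it in their own way.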

Logging

You almost never want to print to the console using a rich Console.

One reason is that this prints to standard out - but most of the current usage is log / status messages. Normally, these should be going to standard error and only the real "results" (eg. the list of projects etc) should go to standard out. This means that command line users can split the output types for downstream use.

Much like the system exits / exceptions, the console calls are not useful for other tools importing the functions. It's better to use the logging library instead - then the log messages can be assigned to a namespace and their output customised by any tool using the function (eg. only showing errors or being channeled to a web server log).

Rich has a logging handler so you can keep the command line outputs looking identical. Here's how I use it in nf-core. My implementation is slightly more complex, as I give the option of also logging to a file, without rich. I also enable highlighting / rich syntax and have a function that basically makes the colours show up in GitHub Actions CI tests.
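For illustration, here is a small sketch of the split between log messages and results using only the standard logging library (the Rich handler could be swapped in for the plain `StreamHandler`). The logger namespace and function names are hypothetical.

```python
import logging
import sys

LOG = logging.getLogger("dds_cli.example.listing")  # hypothetical namespace


def configure_logging(stream=None):
    """Attach a handler for log messages; stream=None means standard error."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    LOG.addHandler(handler)
    LOG.setLevel(logging.INFO)


def list_projects(projects, out=None):
    """Write status messages to the log and the actual results to standard out."""
    out = sys.stdout if out is None else out
    LOG.info("Listing %d project(s)", len(projects))
    for project in projects:
        print(project, file=out)
```

With this split, a caller can redirect standard out to a file and still see the status messages in the terminal, and any tool importing `list_projects` can attach its own handlers to the logger namespace instead.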

Create parsable log

Some analytics may be wanted from the log files. For this feature we first need to decide on what info should be saved to those files.

Create pytest for endpoint DDSEndpoint.AUTH_PROJ

File link handling

How to resolve and not resolve file links within and outside of specified folders. Includes saving link information to database and creating links on download.

Add possibility of creating projects via the CLI?

Create pytest for endpoint DDSEndpoint.PROJ_PUBLIC

Download files to already existing directory

Currently, a new directory name must be specified when downloading. The ability to download into an already existing directory should be added. Make sure only the newly downloaded files can be deleted, though.

Add upload --destination option

Add the --destination option to dds put so that the end user can specify which remote existing or new folder the items should be placed in during upload.

Add --chunk-size option

Add option to change the chunk size in which the files are read and encrypted/decrypted.
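A sketch of what a configurable chunk size could feed into, leaving out the encryption step. The generator name and the default of 64 KiB are assumptions, not the values used by the CLI.

```python
def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield the file contents in pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file
            yield chunk
```

The value passed to --chunk-size would be forwarded as `chunk_size`, and each yielded chunk would be encrypted or decrypted in turn.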

`rm` throws error about non-existing `warn_if_many` if file does not exist

Tried to remove a file which did not exist and got:

data_lister.DataLister.warn_if_many(count=len(not_exists) + len(delete_failed))
AttributeError: type object 'DataLister' has no attribute 'warn_if_many'

The rm command previously listed the files which were not successfully removed within the system, but warned if there were too many. Pagination should be added here in the same way as in data_lister.py.

Change project size update

@inaod568 commented on Mon Jun 14 2021

At first: the project size was updated after each uploaded file. This produced deadlock issues when uploading a lot of small files, since multiple requests tried to update the same project table field at the same time.

Now: the project size is updated at the end of the upload. This means that if an error occurs during the upload, the project size is not updated.

Fix: Either add a queue (for example) to the API and update the db after each file, or add the project size update to the cleanup after failed upload.
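The second option could be sketched on the client side as follows: the uploaded bytes are accumulated locally and reported once, in a `finally` block, so the update also happens when the upload fails part-way. The callables `upload_one` and `update_project_size` are hypothetical stand-ins for the real upload and API calls.

```python
def upload_all(files, upload_one, update_project_size):
    """Upload (name, size) pairs and report the total uploaded size exactly once."""
    uploaded_bytes = 0
    try:
        for name, size in files:
            upload_one(name)
            uploaded_bytes += size  # only counted after a successful upload
    finally:
        # A single update per run avoids concurrent writes to the same field,
        # and running it in `finally` covers the failed-upload case as well.
        if uploaded_bytes:
            update_project_size(uploaded_bytes)
    return uploaded_bytes
```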