invenio-files-rest's Introduction

Invenio-Files-REST

Invenio-Files-REST is a file storage module. It allows you to store and retrieve files in a way similar to the Amazon S3 APIs.

Features:

  • Files storage with configurable storage backends
  • Secure REST APIs
  • Support for large file uploads and multipart upload
  • Customizable access control
  • File integrity monitoring

Further documentation is available on https://invenio-files-rest.readthedocs.io/.

invenio-files-rest's People

Contributors

alizeepace, chiarabi, chriz-uniba, drjova, egabancho, emanueldima, fenekku, github-actions[bot], glignos, inveniobot, jacquerie, jbenito3, jirikuncar, jmartinm, jrcastro2, kpsherva, lnielsen, max-moser, ntarocco, rekt-hard, samihiltunen, slint, sloria, spirosdelviniotis, switowski, tiborsimko, topless, utnapischtim, vlad-bm, zzacharo

invenio-files-rest's Issues

tests: simplify doctest execution

The following cookiecutter change:

inveniosoftware/cookiecutter-invenio-module#98

should be propagated to this Invenio module.

Namely, in run-tests.sh, sphinx-build for doctests is invoked after the pytest run:

$ tail -3 ./\{\{\ cookiecutter.project_shortname\ \}\}/run-tests.sh
sphinx-build -qnNW docs docs/_build/html && python setup.py test && sphinx-build -qnNW -b doctest docs docs/_build/doctest

This sometimes led to problems on Travis CI with the second sphinx-build run due
to "disappearing" dependencies after the example application was tested.

A solution that worked for invenio-marc21 (see
inveniosoftware/invenio-marc21#49 (comment))
and that was integrated in cookiecutter (see
inveniosoftware/cookiecutter-invenio-module#98) was to
run doctest execution in pytest, removing the second sphinx-build invocation.

This both solved Travis CI build failures and simplified test suite execution.

Note that this change may require amending the code, tests, etc. so that they
are executed within the Flask application context (see
inveniosoftware/invenio-marc21@09e98fc).

api: stabilise and document

  • check existing API functionality
  • add missing important API functionality
  • check API function signatures and parameters
  • enhance API docstrings (param, returns, raises, versionadded)
  • plug API functions to existing docs
  • add required database versions

cds/zenodo: define how files are listed in record

Records UI template has a previewer and should list files (#8). This task is about how files are listed inside the record metadata.

  • Define how files are listed in record metadata.
  • What metadata should go into the record metadata and what should stay in Files.

Depends on data migration scripts.

access: implement access control

  • Permission factory - Use Records-UI/REST as example
  • Find CDS permission example and test how to implement it.
  • Script to flush permission from Record to Files (example)

api: file upload

  • Chunking (merging chunk)
  • Access control
  • Integrity tests (transaction like).
  • URL upload (via Celery task), possibly with progress reporting via Celery status and the REST API

Babel is missing from requirements

After installing the application, I tried to run it with:

python example/app.py

But it crashed with the following output:

Traceback (most recent call last):
  File "examples/app.py", line 41, in <module>
    from flask_babelex import Babel
ImportError: No module named flask_babelex

background: fixity checker

  • API: Utility module to verify fixity (perhaps a virus scan with ClamAV is easy as well).
  • Nice to have: Celery tasks to verify all FileInstance's (without overloading the system for a very long period).

Depends on data model and storage interface.
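
A minimal sketch of what such a fixity utility could look like, assuming checksums are stored in the "md5:<hexdigest>" form shown in the API responses further down. The function names compute_checksum and verify_fixity are hypothetical and not part of the module's API.

```python
import hashlib
import io


def compute_checksum(stream, algo="md5", chunk_size=1024 * 1024):
    """Return '<algo>:<hexdigest>' for a binary stream, read in chunks."""
    digest = hashlib.new(algo)
    # Read in fixed-size chunks so large files never sit fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "{0}:{1}".format(algo, digest.hexdigest())


def verify_fixity(stream, expected_checksum):
    """Recompute the checksum and compare it with the stored value."""
    # The algorithm name is taken from the stored checksum prefix.
    algo = expected_checksum.split(":", 1)[0]
    return compute_checksum(stream, algo=algo) == expected_checksum


# In-memory streams stand in for files opened through a storage backend.
stored = compute_checksum(io.BytesIO(b"hello world"))
intact = verify_fixity(io.BytesIO(b"hello world"), stored)
corrupted = verify_fixity(io.BytesIO(b"hello w0rld"), stored)
```

A Celery task could call verify_fixity on a small batch of files per run to avoid loading the system for long periods.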

integration: records ui page template + previewer

  • Merge and fix previewer.
  • Integrate previewer on record template page.
  • List files in a record
  • Example app (either CDS/Zenodo or module example app)

Depends on files integrated into record metadata.

api: inconsistent error codes in REST API

Problem:
When calling ObjectResource.put in the REST API, the status code is always 404 if the user does not have access, whereas it should be 401 or 403 when the user has read access on the object (GET returns 200).

I didn't check every REST API method but I suspect it might not be the only one with this issue.

I mark this as hard as it requires some refactoring.

This impacts #89
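
The expected behaviour could be sketched as a small decision function. This is only an illustration of the rule described above, not the module's code; write_status and its three boolean parameters are hypothetical.

```python
def write_status(is_authenticated, can_read, can_write):
    """Pick the HTTP status for a write request such as PUT."""
    if can_write:
        return 200
    if not can_read:
        # Hide the object's existence from users who cannot even read it.
        return 404
    # The user can already see the object, so a permission error leaks nothing:
    # 401 asks an anonymous user to log in, 403 refuses a logged-in user.
    return 403 if is_authenticated else 401
```

With this rule, a user for whom GET returns 200 never receives 404 on PUT.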

models/storage: handle file creation failure

Problem:
The way files are currently created would leave files on the filesystem in case of rollback/crash without any reference in the database.

Planned solution:

  • Add a delete method to storage.
  • Modify file creation in buckets so that the worst-case scenario is a dangling file row in the database referencing no file on the filesystem.
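
One possible ordering that satisfies the planned solution is sketched below, under the assumption that the database row is recorded before any bytes are written. InMemoryStorage, create_file, the rows dict standing in for a table, and the "readable" flag are all hypothetical names used for illustration.

```python
class InMemoryStorage:
    """Hypothetical storage backend that offers the proposed delete method."""

    def __init__(self, fail_on_save=False):
        self.files = {}
        self.fail_on_save = fail_on_save

    def save(self, uri, data):
        # Simulate a partial write that may be interrupted by a failure.
        self.files[uri] = data[: len(data) // 2]
        if self.fail_on_save:
            raise IOError("simulated write failure")
        self.files[uri] = data

    def delete(self, uri):
        # Deleting a missing file is not an error.
        self.files.pop(uri, None)


def create_file(rows, storage, uri, data):
    """Record the file in the database first, then write the bytes."""
    # Step 1: the row exists before the file does.
    rows[uri] = {"uri": uri, "readable": False}
    try:
        # Step 2: write the bytes.
        storage.save(uri, data)
    except Exception:
        # Step 3: clean up partial data; only a dangling row remains.
        storage.delete(uri)
        raise
    rows[uri]["readable"] = True


rows = {}
ok_storage = InMemoryStorage()
create_file(rows, ok_storage, "bucket/a.txt", b"content")

bad_storage = InMemoryStorage(fail_on_save=True)
try:
    create_file(rows, bad_storage, "bucket/b.txt", b"content")
except IOError:
    pass
```

After the failed call, the row for bucket/b.txt is still present but marked as not readable, and the storage holds no bytes for it.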

RFC files CLI

In Zenodo we currently have a custom file curation CLI implemented, specific to our use cases. It might be worthwhile to have something generic. This would be a good starting point, which at the moment would also satisfy our use cases:

invenio files list [OPTIONS]
invenio files add <file> [OPTIONS]
invenio files remove <key> [OPTIONS]
invenio files rename <old_key> <new_key> [OPTIONS]

In the end, the commands need to determine the bucket on which the operations are to be made. One obvious choice for [OPTIONS] would be:

--bucket/-b <bucket UUID>

However, one would also want to have a direct replacement on a record or deposit without dealing with UUIDs on the command line, e.g.:

--recid/-r <recid>

or, maybe better, a pair --type/-t <pid_type> --value/-v <pid_value>, so that the following is also possible:

invenio files add ./file.txt --recid 12345

or, maybe, a version that works for any PID type:

invenio files add ./file.txt -t recid -v 12345
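
The option handling proposed above could be prototyped as follows. This sketch uses the standard library's argparse only to stay self-contained; a real Invenio command would more likely be written with the framework's own CLI tooling. build_parser and resolve_target are hypothetical names, and resolving a PID to a bucket is left out.

```python
import argparse


def add_target_options(parser):
    """Attach the options that identify the bucket to operate on."""
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--bucket", "-b", metavar="UUID")
    group.add_argument("--recid", "-r", metavar="RECID")
    parser.add_argument("--type", "-t", dest="pid_type")
    parser.add_argument("--value", "-v", dest="pid_value")


def build_parser():
    parser = argparse.ArgumentParser(prog="invenio files")
    sub = parser.add_subparsers(dest="command", required=True)
    commands = [
        ("list", []),
        ("add", ["file"]),
        ("remove", ["key"]),
        ("rename", ["old_key", "new_key"]),
    ]
    for name, positionals in commands:
        cmd = sub.add_parser(name)
        for positional in positionals:
            cmd.add_argument(positional)
        add_target_options(cmd)
    return parser


def resolve_target(ns):
    """Return ('bucket', uuid) or ('pid', pid_type, pid_value)."""
    if ns.bucket:
        return ("bucket", ns.bucket)
    if ns.recid:
        # --recid is shorthand for --type recid --value <recid>.
        return ("pid", "recid", ns.recid)
    if ns.pid_type and ns.pid_value:
        return ("pid", ns.pid_type, ns.pid_value)
    raise SystemExit("specify --bucket, --recid, or --type with --value")


short = build_parser().parse_args(["add", "./file.txt", "--recid", "12345"])
generic = build_parser().parse_args(["add", "./file.txt", "-t", "recid", "-v", "12345"])
```

Both invocations resolve to the same target, which is the equivalence the proposal asks for.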

models: object version is_head not updated after head version is deleted

Problem:
When the head Object version is deleted, the previous object version is not marked as head.

Scenario:
Using the example's app.py:

$ curl -i -X PUT --data-binary @../INSTALL.rst http://localhost:5000/files/$B/INSTALL.rst
$ curl -XDELETE http://localhost:5000/files/$B/INSTALL.rst
$ curl http://localhost:5000/files/$B?versions
{
  "created": "2017-03-30T14:37:12.975024+00:00",
  "locked": false,
  "size": 26,
  "links": {
    "self": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361",
    "uploads": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361?uploads",
    "versions": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361?versions"
  },
  "id": "1f5a9e78-08ba-4637-a9f8-8d433efef361",
  "updated": "2017-03-30T14:40:03.968480+00:00",
  "contents": [
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T14:40:03.970038+00:00",
      "links": {
        "self": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=f6e6d767-0896-4e0e-a811-8d658e13770d",
        "version": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=f6e6d767-0896-4e0e-a811-8d658e13770d"
      },
      "updated": "2017-03-30T14:40:03.970044+00:00",
      "delete_marker": true,
      "version_id": "f6e6d767-0896-4e0e-a811-8d658e13770d",
      "key": "INSTALL.rst",
      "is_head": true
    },
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T14:38:00.201799+00:00",
      "size": 26,
      "links": {
        "self": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=76c9d785-bb40-4483-8e8a-03a26b7c4eb4",
        "version": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=76c9d785-bb40-4483-8e8a-03a26b7c4eb4"
      },
      "updated": "2017-03-30T14:40:03.969321+00:00",
      "delete_marker": false,
      "version_id": "76c9d785-bb40-4483-8e8a-03a26b7c4eb4",
      "checksum": "md5:7dd1e6c1407175102e662c69aa4140e0",
      "key": "INSTALL.rst",
      "is_head": false
    }
  ],
  "max_file_size": null,
  "quota_size": null
}
$ curl -XDELETE http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=f6e6d767-0896-4e0e-a811-8d658e13770d
$ curl http://localhost:5000/files/$B?versions
{
  "created": "2017-03-30T14:37:12.975024+00:00",
  "locked": false,
  "size": 26,
  "links": {
    "self": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361",
    "uploads": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361?uploads",
    "versions": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361?versions"
  },
  "id": "1f5a9e78-08ba-4637-a9f8-8d433efef361",
  "updated": "2017-03-30T14:40:03.968480+00:00",
  "contents": [
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T14:38:00.201799+00:00",
      "size": 26,
      "links": {
        "self": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=76c9d785-bb40-4483-8e8a-03a26b7c4eb4",
        "version": "http://localhost:5000/files/1f5a9e78-08ba-4637-a9f8-8d433efef361/INSTALL.rst?versionId=76c9d785-bb40-4483-8e8a-03a26b7c4eb4"
      },
      "updated": "2017-03-30T14:40:03.969321+00:00",
      "delete_marker": false,
      "version_id": "76c9d785-bb40-4483-8e8a-03a26b7c4eb4",
      "checksum": "md5:7dd1e6c1407175102e662c69aa4140e0",
      "key": "INSTALL.rst",
      "is_head": false
    }
  ],
  "max_file_size": null,
  "quota_size": null
}
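
The expected behaviour, sketched on plain dictionaries shaped like the entries in "contents" above: when the removed version was the head, the newest remaining version of the same key becomes head. remove_version is a hypothetical helper, not the model's actual method, and it assumes "created" values share one ISO 8601 format so that string comparison matches chronological order.

```python
def remove_version(versions, version_id):
    """Remove one version and promote the newest remaining one if needed."""
    removed = next(v for v in versions if v["version_id"] == version_id)
    versions.remove(removed)
    if removed["is_head"]:
        siblings = [v for v in versions if v["key"] == removed["key"]]
        if siblings:
            # The most recently created sibling becomes the new head.
            max(siblings, key=lambda v: v["created"])["is_head"] = True
    return versions


# Trimmed copies of the two versions from the scenario above.
contents = [
    {
        "key": "INSTALL.rst",
        "version_id": "f6e6d767-0896-4e0e-a811-8d658e13770d",
        "created": "2017-03-30T14:40:03.970038+00:00",
        "delete_marker": True,
        "is_head": True,
    },
    {
        "key": "INSTALL.rst",
        "version_id": "76c9d785-bb40-4483-8e8a-03a26b7c4eb4",
        "created": "2017-03-30T14:38:00.201799+00:00",
        "delete_marker": False,
        "is_head": False,
    },
]
remaining = remove_version(contents, "f6e6d767-0896-4e0e-a811-8d658e13770d")
```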

models: ObjectVersion timestamps

Problem:
The only way to order ObjectVersion instances is by their created date. We currently set this date with datetime.utcnow. The problem is that nothing checks that the date is newer than the previous one. With any clock drift we might create versions in the past and mess up the ordering.

Solution:
We just need to check if the created date is after the last ObjectVersion's one. No need to check the clock drift each time as it would not be efficient. Only the version ordering matters.

Side effect of this solution:
If the clock drifts into the future, no ObjectVersion can be created once the clock is synchronized again. We also need a big warning for that so that the time can be fixed. As this should happen rarely, we don't need to add a mechanism to fix this case right now. It will be the sysadmin's job.
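
The proposed check could look roughly like this. It is a sketch under assumptions: next_created and ClockDriftError are hypothetical names, and timezone-aware datetimes are used here instead of datetime.utcnow to keep comparisons unambiguous.

```python
from datetime import datetime, timedelta, timezone


class ClockDriftError(Exception):
    """Raised when a new version would not sort after the latest one."""


def next_created(last_created, now=None):
    """Return the creation date for a new version, or raise on clock drift.

    last_created: creation date of the latest existing version, or None.
    now: injectable current time (hypothetical parameter, eases testing).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Only the ordering matters: the new date must be strictly later.
    if last_created is not None and now <= last_created:
        raise ClockDriftError(
            "clock reads {0}, but the latest version was created at {1}".format(
                now.isoformat(), last_created.isoformat()
            )
        )
    return now


latest = datetime(2017, 3, 30, 14, 40, 3, tzinfo=timezone.utc)
accepted = next_created(latest, now=latest + timedelta(seconds=1))
```

The exception message is the place for the "big warning" mentioned above, since it tells the sysadmin which two dates disagree.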

api: customizable response header

Use case: Storing metadata for each file. One way of doing this is to create a record whenever a file is uploaded, and somehow give the link to the record in the upload response.

Who needs this: B2Share

Problem: invenio-files-rest tries to be compatible with the Amazon S3 API. Thus we cannot modify the response content as it might conflict with an existing definition provided by Amazon.

Suggested solution: Enable the customization of response headers.

How it would be used: RFC 5988 allows links to be returned in the HTTP response header. The link pointing to a file's metadata could be typed as describedby.
There is little chance that this will conflict with the Amazon S3 API, and if it does, it would still be possible to define a proprietary link type. It is up to the overlay to make sure that there is no conflict.
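
As an illustration of the header format only, the following sketch builds a Link header value with a describedby relation. The link_header helper and the record URL are hypothetical; how the overlay would plug this into the response is not specified here.

```python
def link_header(links):
    """Format a {rel: url} mapping as an RFC 5988 Link header value."""
    return ", ".join(
        '<{0}>; rel="{1}"'.format(url, rel) for rel, url in links.items()
    )


# Hypothetical URL of the record describing the uploaded file.
headers = {
    "Link": link_header({"describedby": "http://localhost:5000/api/records/1"})
}
```

The response body stays untouched, so compatibility with the Amazon S3 response format is not affected.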

uploader: javascript module for uploading files

Take Zenodo file upload as example.

  • List files.
  • Start/stop upload.
  • Progress bar.
  • Support for Dropbox/OwnCloud upload
  • Chunk files on upload (MB to GBs of upload)
  • Nice to haves:
      • Show upload speed
      • Progress reporting for Dropbox/OwnCloud.

Decide mid-sprint if we go with angular, or the existing FlightJS uploader.

Depends on File upload REST API (chunking support)

models: multiple head object versions with same key after delete

Problem:
Deleting an object version and creating another one with the same key will leave two heads for the key.

Scenario:
Using the example's app.py:

$ curl -i -X PUT --data-binary @../INSTALL.rst http://localhost:5000/files/$B/INSTALL.rst
$ curl -XDELETE http://localhost:5000/files/$B/INSTALL.rst
$ curl -i -X PUT --data-binary @../INSTALL.rst http://localhost:5000/files/$B/INSTALL.rst
$ curl http://localhost:5000/files/$B?versions
{
  "created": "2017-03-30T15:13:11.595743+00:00",
  "locked": false,
  "size": 52,
  "links": {
    "self": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e",
    "uploads": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e?uploads",
    "versions": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e?versions"
  },
  "id": "d9be5021-2252-415d-8819-6ad14a162f8e",
  "updated": "2017-03-30T15:15:14.526757+00:00",
  "contents": [
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T15:15:14.510472+00:00",
      "size": 26,
      "links": {
        "self": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst",
        "version": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?versionId=39ed53e5-0978-4d7f-a26f-f560c80366e0",
        "uploads": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?uploads"
      },
      "updated": "2017-03-30T15:15:14.520953+00:00",
      "delete_marker": false,
      "version_id": "39ed53e5-0978-4d7f-a26f-f560c80366e0",
      "checksum": "md5:7dd1e6c1407175102e662c69aa4140e0",
      "key": "INSTALL.rst",
      "is_head": true
    },
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T15:14:48.651752+00:00",
      "links": {
        "self": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?versionId=e5984287-32a7-43f5-95a8-8876c7a9872f",
        "version": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?versionId=e5984287-32a7-43f5-95a8-8876c7a9872f"
      },
      "updated": "2017-03-30T15:14:48.651759+00:00",
      "delete_marker": true,
      "version_id": "e5984287-32a7-43f5-95a8-8876c7a9872f",
      "key": "INSTALL.rst",
      "is_head": true
    },
    {
      "mimetype": "application/octet-stream",
      "created": "2017-03-30T15:14:41.706431+00:00",
      "size": 26,
      "links": {
        "self": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?versionId=59cdf0da-6a97-4eb5-8fe6-7a612c519ab6",
        "version": "http://localhost:5000/files/d9be5021-2252-415d-8819-6ad14a162f8e/INSTALL.rst?versionId=59cdf0da-6a97-4eb5-8fe6-7a612c519ab6"
      },
      "updated": "2017-03-30T15:14:48.651368+00:00",
      "delete_marker": false,
      "version_id": "59cdf0da-6a97-4eb5-8fe6-7a612c519ab6",
      "checksum": "md5:7dd1e6c1407175102e662c69aa4140e0",
      "key": "INSTALL.rst",
      "is_head": false
    }
  ],
  "max_file_size": null,
  "quota_size": null
}
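
The invariant being violated is "at most one head per key". The sketch below shows one way to keep it when a version is added, plus a check that reports offending keys in a listing like the one above. Both functions are hypothetical and work on plain dictionaries, not on the actual models.

```python
from collections import Counter


def add_version(versions, new_version):
    """Append a version and make it the only head for its key."""
    for version in versions:
        # Every earlier version of the key loses head status,
        # including delete markers.
        if version["key"] == new_version["key"]:
            version["is_head"] = False
    new_version["is_head"] = True
    versions.append(new_version)
    return versions


def keys_with_multiple_heads(versions):
    """Return the sorted keys that have more than one head version."""
    counts = Counter(v["key"] for v in versions if v["is_head"])
    return sorted(key for key, count in counts.items() if count > 1)


# Trimmed copies of the three versions from the listing above.
broken = [
    {"key": "INSTALL.rst", "version_id": "39ed53e5", "delete_marker": False, "is_head": True},
    {"key": "INSTALL.rst", "version_id": "e5984287", "delete_marker": True, "is_head": True},
    {"key": "INSTALL.rst", "version_id": "59cdf0da", "delete_marker": False, "is_head": False},
]
offending = keys_with_multiple_heads(broken)

# Rebuilding the same history with add_version keeps a single head.
fixed = []
for version_id, marker in [("59cdf0da", False), ("e5984287", True), ("39ed53e5", False)]:
    add_version(fixed, {"key": "INSTALL.rst", "version_id": version_id, "delete_marker": marker})
```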
