storage-inventory's Introduction

OpenCADC Storage Inventory System

The Storage Inventory system is designed to manage millions to billions of files for a science data archive.

What is it? Concept Documentation

software components

versions

For libraries (cadc-{name}) the version is in the build.gradle file. Libraries are published to maven central under the org.opencadc groupId, for example org.opencadc:cadc-inventory.

For services and agents, the version is in the VERSION file. Docker images are published to the images.opencadc.org repository (currently a Harbor service).

storage-inventory-dm

This is the storage inventory data model and architecture documentation. TODO: Add an FAQ.

baldur

This is an implementation of the permissions service API using configurable rules to grant access based on resource identifiers (Artifact.uri values in the inventory data model).

Official docker image: images.opencadc.org/storage-inventory/baldur:$VER

critwall

This is an implementation of the file-sync process that runs at a storage site and downloads files.

Official docker image: images.opencadc.org/storage-inventory/critwall:$VER

fenwick

This is an implementation of the metadata-sync process that runs at both global inventory and at storage sites.

Official docker image: images.opencadc.org/storage-inventory/fenwick:$VER

luskan

This is an implementation of the metadata service that enables querying the storage inventory at both global inventory and storage sites. It is an IVOA TAP service that supports ad-hoc querying of the inventory data model.

Official docker image: images.opencadc.org/storage-inventory/luskan:$VER

minoc

This is an implementation of the file service that supports HEAD, GET, PUT, POST, DELETE operations and IVOA SODA operations.

Official docker image: images.opencadc.org/storage-inventory/minoc:$VER

raven

This is an implementation of the global locator service that supports transfer negotiation and direct file GET requests.

Official docker image: images.opencadc.org/storage-inventory/raven:$VER

ratik

This is an implementation of the metadata-validate process that runs at both global inventory and at storage sites.

Official docker image: images.opencadc.org/storage-inventory/ratik:$VER

ringhold

This is an implementation of a simplified part of the metadata-validate process that can be used to remove the local copy of artifacts from a site (file cleanup is done by tantar).

Official docker image: images.opencadc.org/storage-inventory/ringhold:$VER

tantar

This is an implementation of the file-validate process that compares the inventory database against the back end storage at a storage site.

Official docker image: images.opencadc.org/storage-inventory/tantar:$VER

vault

This is an implementation of an IVOA VOSpace service that uses storage-inventory as the back end storage mechanism.

Official docker image: images.opencadc.org/storage-inventory/vault:$VER

cadc-*

These are libraries used in multiple services and applications.

  • cadc-inventory: core data model implementation
  • cadc-inventory-db: database library
  • cadc-inventory-util: re-usable code
  • cadc-inventory-server: re-usable service code
  • cadc-storage-adapter: defines the interface between inventory and back end storage
  • cadc-storage-adapter-fs: storage adapter implementation for a POSIX filesystem back end
  • cadc-storage-adapter-ad: storage adapter for the legacy CADC Archive Directory storage system (temporary)
  • cadc-storage-adapter-swift: storage adapter implementation for the Swift Object Store API (e.g. CEPH Object Store)
  • cadc-storage-adapter-test: re-usable test suite for storage adapter implementations

storage-inventory's Issues

minoc: support http range requests for resumable downloads

complete RFC7233 support: https://tools.ietf.org/html/rfc7233

limitation: minoc should support single byte range only so it can output binary with content-type: application/octet-stream (or maybe text/plain for part of a text/* document)
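
The single-range limitation above could be enforced while parsing the Range header. The following is a minimal sketch, not minoc code: the class name SingleByteRange and its methods are hypothetical, and it assumes the content length of the stored file is already known.

```java
/** Hypothetical helper: resolves one RFC 7233 byte range against a known content length. */
final class SingleByteRange {
    final long first; // first byte position, inclusive
    final long last;  // last byte position, inclusive

    private SingleByteRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    long length() {
        return last - first + 1;
    }

    /** Value for the Content-Range header of a 206 response. */
    String contentRange(long contentLength) {
        return "bytes " + first + "-" + last + "/" + contentLength;
    }

    /** Accepts "bytes=a-b", "bytes=a-" and "bytes=-n"; rejects anything else. */
    static SingleByteRange parse(String header, long contentLength) {
        if (header == null || !header.startsWith("bytes=")) {
            throw new IllegalArgumentException("unsupported range unit: " + header);
        }
        String spec = header.substring("bytes=".length()).trim();
        if (spec.contains(",")) {
            throw new IllegalArgumentException("multiple ranges not supported: " + header);
        }
        int dash = spec.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("invalid range: " + header);
        }
        String a = spec.substring(0, dash).trim();
        String b = spec.substring(dash + 1).trim();
        long first;
        long last;
        try {
            if (a.isEmpty()) {
                // suffix form: the last n bytes of the file
                long n = Long.parseLong(b);
                if (n <= 0) {
                    throw new IllegalArgumentException("unsatisfiable range: " + header);
                }
                first = Math.max(0, contentLength - n);
                last = contentLength - 1;
            } else {
                first = Long.parseLong(a);
                // an open or over-long end is clamped to the final byte
                last = b.isEmpty() ? contentLength - 1 : Math.min(Long.parseLong(b), contentLength - 1);
            }
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("invalid range: " + header, ex);
        }
        if (first < 0 || first > last) {
            throw new IllegalArgumentException("unsatisfiable range: " + header);
        }
        return new SingleByteRange(first, last);
    }
}
```

A caller would map the IllegalArgumentException to an error response and otherwise stream length() bytes starting at first, with the original content type.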

minoc and raven: implement permission checking

minoc:

  • config is documented and probably implemented
  • enable remote call (executed in parallel if multiple) to permissions service

raven:

  • document and implement config
  • enable remote call (executed in parallel if multiple) to permissions service

Common code can go into cadc-permissions-client as raven and minoc use that lib directly:

  • extract settings from MultiValuedProperties object
  • use credentials to call services and respond with first positive result
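
The "call in parallel and respond with first positive result" behaviour could look roughly like the sketch below. It is an illustration only: FirstPositiveResult is a hypothetical name, and each Callable stands in for one remote call to a permissions service rather than any real cadc-permissions-client API.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FirstPositiveResult {
    /**
     * Runs all checks in parallel and returns true as soon as one of them grants access.
     * A check that fails with an exception is treated as "no grant".
     */
    static boolean anyGrants(List<Callable<Boolean>> checks) throws InterruptedException {
        if (checks.isEmpty()) {
            return false;
        }
        ExecutorService pool = Executors.newFixedThreadPool(checks.size());
        try {
            ExecutorCompletionService<Boolean> completed = new ExecutorCompletionService<>(pool);
            for (Callable<Boolean> check : checks) {
                completed.submit(check);
            }
            // results arrive in completion order, not submission order
            for (int i = 0; i < checks.size(); i++) {
                try {
                    if (Boolean.TRUE.equals(completed.take().get())) {
                        return true;
                    }
                } catch (ExecutionException ex) {
                    // this service failed; keep waiting for the others
                }
            }
            return false;
        } finally {
            pool.shutdownNow(); // interrupt calls that are still in flight
        }
    }
}
```

Whether a failed call should be logged, retried, or reported to the caller is a policy decision that this sketch leaves open.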

cadc-storage-adapter-fs: cannot compute MD5 on the fly

Computing the MD5 checksum on the fly while iterating cannot perform adequately.

Instead:

  • store the checksum using a file system attribute at the end of a write (see cavern)
  • return the xattr value in the iterator (consistency with inventory)
  • compute the checksum on the stream during a read and reset the checksum xattr; log this at WARN level at least (detect corruption?)
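
A minimal sketch of the first two points, using only standard Java APIs. The class name, the attribute name contentChecksum, and the "md5:" value prefix are assumptions made for illustration; extended attributes also need support from the underlying file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumOnWrite {
    static final String ATTR = "contentChecksum"; // hypothetical attribute name

    /** Copies src to dest, computing the MD5 of the bytes as they are written. */
    static String writeWithMd5(InputStream src, Path dest) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 not available", ex);
        }
        try (OutputStream out = new DigestOutputStream(Files.newOutputStream(dest), md5)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Stores the checksum as a user-defined attribute at the end of a write. */
    static void storeChecksum(Path file, String md5Hex) throws IOException {
        view(file).write(ATTR, ByteBuffer.wrap(("md5:" + md5Hex).getBytes(StandardCharsets.UTF_8)));
    }

    /** Reads the stored value back, e.g. from the iterator, without touching file content. */
    static String readChecksum(Path file) throws IOException {
        UserDefinedFileAttributeView v = view(file);
        ByteBuffer buf = ByteBuffer.allocate(v.size(ATTR));
        v.read(ATTR, buf);
        buf.flip();
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    private static UserDefinedFileAttributeView view(Path file) {
        UserDefinedFileAttributeView v = Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (v == null) {
            throw new UnsupportedOperationException("no extended attribute support: " + file);
        }
        return v;
    }
}
```

With this split the iterator only performs an attribute read per file, which is the cost the issue is trying to get down to.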

Stretch feature:

  • implement an async process that scans the file system and verifies checksum attributes pre-emptively; once the main feature above is implemented this should be split into its own RFE.

luskan: make the database schema customisable

The tap_schema content in luskan currently hardcodes the schema name to "inventory".

Introduce a luskan.properties file with
org.opencadc.inventory.db.schema={schema}

Then add some code to massage the schema in the tap_schema content appropriately. This should be easy enough since the base InitDatabase code allows for setting the schema and replacing it in SQL files dynamically... it just hasn't been set up to replace content, so that mechanism is only in use for setting the schema of the tap_schema tables themselves... solvable.
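
As a rough illustration of the idea, the sketch below reads the proposed property and substitutes the schema name into SQL text. It does not use the real InitDatabase code: the class SchemaConfig and the placeholder token <schema> are hypothetical, chosen only to show the substitution step.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class SchemaConfig {
    static final String KEY = "org.opencadc.inventory.db.schema"; // key proposed in this issue
    static final String TOKEN = "<schema>"; // hypothetical placeholder in the tap_schema content

    /** Reads the schema name from properties text and checks it is a plain SQL identifier. */
    static String readSchema(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        String schema = props.getProperty(KEY);
        if (schema == null || !schema.trim().matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("missing or invalid " + KEY + ": " + schema);
        }
        return schema.trim();
    }

    /** Replaces every placeholder in the SQL text with the configured schema name. */
    static String applySchema(String sql, String schema) {
        return sql.replace(TOKEN, schema);
    }
}
```

Validating the value as an identifier matters here because it is spliced into SQL text rather than bound as a parameter.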

cadc-inventory-db: artifact iterator auto commit hard to grok

The call that sets autocommit to false is very far removed from the call that sets autocommit to true, making it hard to analyse and prove the behaviour is correct. It violates the normal best practice of having the start transaction and either commit or rollback close together in the code.

Probably: refactor to merge the ArtifactIteratorQuery and the ArtifactResultSetExtractor into a single class that does both the query and the iteration. There is no good way to determine that the caller has abandoned an iterator - only code review can help - but that's the same as the best practice txn handling mentioned above.

If done right, this will provide a good example/template for other streaming query result implementations.

cadc-inventory-db: API to manage siteLocations and storageLocation should not expose implementation/optimisations

The ArtifactDAO.put(Artifact) method has an alternate put(Artifact, boolean) to force the update so that transient state (not part of the entity and therefore not part of the metaChecksum) will still be written to the database. This exposes the current implementation details and will make changes and optimisations hard in future.

Methods to manage (add/set and remove) siteLocation and storageLocation values for an artifact should be provided instead.

tantar: can't distinguish between 'not authorized' and 'no results found'

When tantar queries a storage site running a TAP service, it cannot tell if the caller is authorized to do the query but no query results were found, or if the caller is not authorized to do the query. Both result in zero rows returned. An incorrect certificate, or an out-of-date certificate, can result in all data for the archive(s) queried being deleted from the inventory database.

minoc and StorageAdapter API: support very large files

S3 API limitation: maximum 5 GiB upload in a single stream, then must use multi-part upload

multi-part upload: minimum 5 MiB part size (except last part), content-length & content-md5 per part

no content-md5 checksum after re-assembly
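
The size limits above translate into simple arithmetic when planning an upload. The sketch below uses only the two numbers quoted in this issue; the class and method names are hypothetical and nothing here calls an S3 client.

```java
final class MultipartPlan {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;
    static final long MAX_SINGLE_STREAM = 5L * GIB; // limit quoted in this issue
    static final long MIN_PART_SIZE = 5L * MIB;     // applies to every part except the last

    /** True when the content is too large for a single-stream upload. */
    static boolean needsMultipart(long contentLength) {
        return contentLength > MAX_SINGLE_STREAM;
    }

    /** Number of parts needed for the given part size; the last part may be smaller. */
    static long partCount(long contentLength, long partSize) {
        if (contentLength <= 0) {
            throw new IllegalArgumentException("contentLength must be positive: " + contentLength);
        }
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("part size below minimum: " + partSize);
        }
        return (contentLength + partSize - 1) / partSize; // ceiling division
    }

    /** Size of the final part, which is the only one allowed below the minimum. */
    static long lastPartSize(long contentLength, long partSize) {
        long remainder = contentLength % partSize;
        return remainder == 0 ? partSize : remainder;
    }
}
```

For example, a 12 GiB file with 1 GiB parts needs 12 parts, each with its own content-length and content-md5; the issue's open question is that no single content-md5 covers the re-assembled object.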

cadc-storage-adapter and tantar: allow StorageMetadata with invalid flag to support cleanup

if invalid stored objects end up in storage, the StorageAdapter.iterator() has to be able to return a StorageMetadata object so that tantar can perform cleanup. Otherwise, failure modes that leave any garbage behind will block cleanup by tantar

changes:

  • StorageMetadata allows contentLength == 0
  • add StorageMetadata.isValid()
  • tantar can then cleanup (delete stored objects that are invalid) subject to reportOnly mode and policy

Artifact Iteration crashes with very large dataset

Asking the ArtifactDAO to iterate over millions of rows results in an OutOfMemoryError while the query results are being gathered.

We may need a paginated solution. Here is the output from the Tantar project using the SQLGenerator class:

1456 [main] DEBUG SQLGenerator  - ArtifactGet: SELECT uri,uriBucket,contentChecksum,contentLastModified,contentLength,contentType,contentEncoding,siteLocations,storageLocation_storageID,storageLocation_storageBucket,lastModified,metaChecksum,id FROM inventory.Artifact WHERE storageLocation_storageBucket LIKE ? AND storageLocation_storageID IS NOT NULL ORDER BY storageLocation_storageBucket, storageLocation_storageID
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Heap dump file created [1885647612 bytes in 10.179 secs]
445103 [main] DEBUG ArtifactDAO  - iterator: HST 443647ms
END
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.toCharArray(String.java:2899)
	at java.util.zip.ZipCoder.getBytes(ZipCoder.java:78)
	at java.util.zip.ZipFile.getEntry(ZipFile.java:316)
	at java.util.jar.JarFile.getEntry(JarFile.java:240)
	at java.util.jar.JarFile.getJarEntry(JarFile.java:223)
	at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1054)
	at sun.misc.URLClassPath.getResource(URLClassPath.java:249)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2212)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:311)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:109)
	at org.opencadc.inventory.db.SQLGenerator$ArtifactIterator.query(SQLGenerator.java:488)
	at org.opencadc.inventory.db.ArtifactDAO.iterator(ArtifactDAO.java:142)
	at org.opencadc.tantar.BucketValidator.iterateInventory(BucketValidator.java:494)
	at org.opencadc.tantar.BucketValidator.validate(BucketValidator.java:187)
	at org.opencadc.tantar.Main.main(Main.java:108)
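
One possible shape for a paginated solution is an iterator that fetches the ordered result in batches and remembers the last key it returned, so only one batch is held in memory at a time. This is a sketch under assumptions: PagedIterator is a hypothetical class, and the fetchAfter function stands in for a query of the form "rows with key greater than the last key, in key order, limited to the batch size".

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Iterates over an ordered result in batches instead of loading every row up front. */
final class PagedIterator<T, K> implements Iterator<T> {
    private final BiFunction<K, Integer, List<T>> fetchAfter; // (last key or null, limit) -> next batch
    private final Function<T, K> keyOf;                       // extracts the ordering key from a row
    private final int batchSize;

    private Iterator<T> batch = Collections.emptyIterator();
    private K lastKey = null;
    private boolean exhausted = false;

    PagedIterator(BiFunction<K, Integer, List<T>> fetchAfter, Function<T, K> keyOf, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.fetchAfter = fetchAfter;
        this.keyOf = keyOf;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (!batch.hasNext() && !exhausted) {
            List<T> rows = fetchAfter.apply(lastKey, batchSize);
            // a short (or empty) batch means there is nothing left to fetch
            exhausted = rows.size() < batchSize;
            batch = rows.iterator();
        }
        return batch.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T row = batch.next();
        lastKey = keyOf.apply(row);
        return row;
    }
}
```

This approach requires a unique ordering key, which the (storageBucket, storageID) ordering in the query above may already provide. Configuring a fetch size on the JDBC statement might be a less invasive alternative, depending on driver behaviour.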

Zero length files are not properly handled

Issuing a PUT for a zero length file causes the request to hang.

$ rm -f /tmp/zerosize.txt
$ touch /tmp/zerosize.txt
$ curl <auth> -T /tmp/zerosize.txt https://myserver.com/minoc/files/zerosize.txt

README consistency

module README.md should be organised like this:

  • configuration
  • building it
  • checking it
  • running it

Config templates vs examples should be clearly identified: placeholders in templates should be {surrounded} by curly braces.

To be re-organised: baldur, raven
Config templates/examples needing fixing: baldur, luskan, minoc, raven

rename: make Artifact.uri modifiable rather than immutable

How:

  • add Artifact.setURI(URI) method to Artifact && have it recompute uriBucket
  • add StorageAdapter.rename(Artifact orig, Artifact a)

Can the latter be mandatory? It puts another constraint on back end implementation when combined with wanting/encouraging the back end to be able to reconstruct artifact URI in StorageAdapter.iterator() methods (so that file-validate can recover/populate inventory from storage).

implement automated database init in relevant components

The only components that run at all locations (storage site and global inventory) are fenwick (metadata-sync) and luskan (TAP service).

  • minoc is the only component that is usable standalone; it writes to the db; it currently implements automated database init (for storage sites)

  • not obvious which component should init the global database

implement correct row locking in tools that use cadc-inventory-db

minoc already implements locking and transactions, but other tools do not:

tantar
fenwick
critwall

requires: all basic operations working so the set of db operations is known; then consistent sequence to prevent deadlocks can be determined and locking added.

tantar: support range of buckets in configuration

re-use the BucketSelector in critwall to support both single and range of buckets

  • this is useful with the opaque FS adapter and will be useful with any ceph (S3 or swift) adapters as well
  • proposal: create a new library cadc-inventory-util for common code, extract BucketSelector from critwall to lib
  • in the lib, BucketSelector will also be potentially useful to do metadata-validate (buckets in parallel)
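
To make "single and range of buckets" concrete, here is a small sketch that expands a configuration value into the bucket prefixes it covers. It is not the critwall BucketSelector: the class name HexBucketRange and the "min-max" syntax with hexadecimal prefixes are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class HexBucketRange {
    /** Expands "3" to [3] and "a-d" to [a, b, c, d]; both ends must have the same length. */
    static List<String> expand(String spec) {
        // limit -1 keeps trailing empty strings so "a-" is rejected instead of read as "a"
        String[] parts = spec.trim().toLowerCase(Locale.ROOT).split("-", -1);
        if (parts.length > 2) {
            throw new IllegalArgumentException("invalid bucket range: " + spec);
        }
        String min = parts[0].trim();
        String max = parts.length == 2 ? parts[1].trim() : min;
        // up to 7 hex digits always fits in an int
        if (!min.matches("[0-9a-f]{1,7}") || !max.matches("[0-9a-f]{1,7}") || min.length() != max.length()) {
            throw new IllegalArgumentException("invalid bucket range: " + spec);
        }
        int lo = Integer.parseInt(min, 16);
        int hi = Integer.parseInt(max, 16);
        if (lo > hi) {
            throw new IllegalArgumentException("range is reversed: " + spec);
        }
        // zero-pad so every prefix has the same number of hex digits
        String format = "%0" + min.length() + "x";
        List<String> prefixes = new ArrayList<>();
        for (int i = lo; i <= hi; i++) {
            prefixes.add(String.format(format, i));
        }
        return prefixes;
    }
}
```

Each resulting prefix could then drive one iterator query, which is also what would allow buckets to be validated in parallel.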

tantar: Parallelize iterator queries

There are two potentially long running queries to provide the iterators needed for comparison. They could be done in two threads in parallel while tantar waits, rather than in sequence.
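
A minimal sketch of running the two queries concurrently, assuming each one can be wrapped in a Supplier. ParallelQueries and Pair are hypothetical names, not tantar classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class ParallelQueries {
    /** Simple holder for the two results. */
    static final class Pair<A, B> {
        final A inventory;
        final B storage;

        Pair(A inventory, B storage) {
            this.inventory = inventory;
            this.storage = storage;
        }
    }

    /** Starts both queries on their own threads and waits for both to finish. */
    static <A, B> Pair<A, B> runBoth(Supplier<A> inventoryQuery, Supplier<B> storageQuery) {
        // a dedicated pool keeps blocking queries off the common fork-join pool
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<A> inventory = CompletableFuture.supplyAsync(inventoryQuery, pool);
            CompletableFuture<B> storage = CompletableFuture.supplyAsync(storageQuery, pool);
            // join() rethrows a query failure wrapped in CompletionException
            return new Pair<>(inventory.join(), storage.join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The total wait then approaches the duration of the slower query rather than the sum of both.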

luskan and minoc consistency

At a single site, an artifact should only appear in the luskan query response if minoc is able to deliver the file.

Currently:

  • minoc does behave this way for GET and HEAD, but the body of the error message is different and leaks info (not available vs not found)
  • luskan returns all rows that satisfy a query independent of the storageLocation; when running as part of a storage site it should only return artifacts with non-null storageLocation (query injection); when running as part of a global inventory it should return all artifacts (there are no storageLocation values in global)

cadc-storage-adapter-fs modes

The URIBUCKET mode is more or less useless as implemented. It helps a little with debugging but doesn't have the right properties to really be usable as a filesystem or robust as an opaque storage backend.

In URI mode, the filesystem could be mounted (read-only) and users could (in principle) open and read files by knowing the Artifact URI (complication: would open by name be costly in a directory with millions of files e.g. in a flat URI structure?)

A new OPAQUE mode could use UUID for file names and place them in a hierarchy of buckets - basically URIBUCKET mode but with a non-reusable filename. The artifact URI could be stored on the file using xattr; xattr support is already needed to store the contentChecksum (see issue #33).

Minoc relies on GMSClient.getGroups() which is not implemented

Issuing a HEAD request to a Minoc resource causes a call to getUsersGroups() in checkReadPermission(), which relies on the GMSClient.getGroups() function. This method is not implemented.

Consequently, this also prevents JDBC connections from being returned to the pool. The default pool has two connections, so two HEAD requests will cause Minoc to hang.

fenwick: multiple included condition files was a bad idea

Each include file defines a separate stream of new|modified Artifact events; that is hard to manage and won't play well with also processing DeletedArtifactEvent(s) in a timely fashion.

Should just be one optional /config/artifact-filter.sql (ish) file

Fenwick: Support multiple instances on site

Due to the nature of the HarvestState, only one Fenwick can be run on a single site. Currently the HarvestState is preserved per Item (i.e. Artifact.class, DeletedEvent.class, etc), but perhaps should be harvested per process instead to allow multiple instances to be run.

There are no workarounds for this currently.

cadc-storage-adapter-fs: must hide temp files until complete

The filesystem adapter writes to a temp file and performs an atomic move to final destination.

However, the temp file is visible within the normal tree during the write so will be picked up by the iterator method (e.g. file-validate process).

Recommend: divide the configured root directory into two separate sub dirs, e.g. "complete" and "transactions", create temp files in "transactions", and move them to "complete".
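
The recommended layout could be sketched as follows, using the two directory names from the recommendation. TwoDirWriter is a hypothetical class, not the adapter's code, and the sketch assumes both sub-directories sit on the same file system so that the move can be atomic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

final class TwoDirWriter {
    private final Path transactions; // in-progress writes, never iterated
    private final Path complete;     // finished files, the only tree the iterator scans

    TwoDirWriter(Path root) throws IOException {
        this.transactions = Files.createDirectories(root.resolve("transactions"));
        this.complete = Files.createDirectories(root.resolve("complete"));
    }

    Path completeDir() {
        return complete;
    }

    /** Writes content to a temp file, then moves it into place in a single step. */
    Path put(String name, InputStream content) throws IOException {
        Path tmp = transactions.resolve(UUID.randomUUID().toString());
        try {
            Files.copy(content, tmp);
            return Files.move(tmp, complete.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // no-op after a successful move; removes the partial file after a failure
            Files.deleteIfExists(tmp);
        }
    }
}
```

An iterator rooted at completeDir() never sees partial files, and anything left behind in "transactions" after a crash can be deleted without consulting the inventory.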

baldur: support group access to permissions

The baldur.properties file only supports specifying users to give access to permissions. Add a groups property to baldur.properties and give group members access to permissions.

baldur: refactor authorization code

The code to check that a caller is authorized via config is very similar to LogControlServlet and AvailabilityServlet (and generally: admin interfaces have this kind of A&A)... It is time to refactor into a common utility that can be re-used.

minoc and StorageAdapter API: cutouts

StorageAdapter API needs to support optional data operations:

  • content-range requests
  • FITS metadata extraction
  • FITS pixel cutouts
  • HDF5 metadata extraction?
  • HDF5 pixel cutouts?
  • sky coordinate cutouts?

raven: implement negotiation of write to storage

Some minoc instances will accept writes and they should advertise this via their StorageSite record. The raven service at a global site can determine which sites are writable from the inventory database content.
