
hoard's People

Contributors

benjaminbollen, dennismckinnon, dependabot[bot], gregdhill, puneetv05, silasdavis, zramsay


hoard's Issues

rare error when running get (I think)

I rarely, and non-deterministically, get this error from hoard: `rpc error: code = Unknown desc = SymmetricReference failed to decrypt: cipher: message authentication failed`

It's intermittent, rare, and non-deterministic. I will try to get some logs the next time I see it, although I did check the logs once and didn't see anything.

I don't remember this occurring with v3, but it's hard to say whether it was there and just didn't manifest.
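For triage, `cipher: message authentication failed` is the error that Go's `crypto/cipher` returns when an AES-GCM `Open` call rejects its input. Suppose `SymmetricReference` is backed by AES-GCM; the error text suggests this, but it is not confirmed here. In that case the error means the key, nonce or ciphertext did not match what was sealed. Examples would be a wrong secret, or a blob that was truncated or corrupted in storage or in transit. The `seal`/`open` helpers below are hypothetical and exist only to reproduce the error:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal encrypts plaintext with AES-GCM under key, prepending the random nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal. A mismatch in the key, nonce or any ciphertext byte
// makes gcm.Open fail with the same authentication error.
func open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	rand.Read(key)
	sealed, _ := seal(key, []byte("hoard"))

	wrongKey := make([]byte, 32) // all zeros, almost certainly not `key`
	_, err := open(wrongKey, sealed)
	fmt.Println("wrong key:", err)

	corrupt := append([]byte(nil), sealed...)
	corrupt[len(corrupt)-1] ^= 0x01 // flip one bit of the authentication tag
	_, err = open(key, corrupt)
	fmt.Println("corrupted byte:", err)
}
```

GCM reports every mismatch with this single error. Logging the stored blob's length and the reference address next to it might help narrow down which case applies.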

Provide systemd socket-activated service

To run stateless Burrow locally, without having to call the binary or maintain a (gRPC) daemon, we can install Hoard as a local on-demand microservice. As soon as a request to the API arrives on a given socket, systemd starts the application, which shuts down again after a timeout. Example here:

https://gist.github.com/drmalex07/28de61c95b8ba7e5017c

Docs: https://www.freedesktop.org/software/systemd/man/systemd.socket.html

The advantages of this are:

  • No need to call the binary
  • No need to run a remote service (and expose the attack surface of sending plaintexts over the internet)
  • Persistent settings can be kept in a config file that you don't have to worry about providing
  • Hoard can be packaged with its own systemd unit and canonical configuration locations
  • If Hoard does end up with more stateful features (like watching a sharing contract on a chain), the migration path is simple
  • We can define a gRPC interface so that other languages can use Hoard like a library
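Below is a minimal sketch of the Hoard side of socket activation. It uses only the standard library and relies on the `LISTEN_PID`/`LISTEN_FDS` protocol documented in sd_listen_fds(3), under which systemd passes the listening sockets as file descriptors starting at 3. The function names are hypothetical. The idle-timeout shutdown from the linked gist is not shown:

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// sdListenFDsStart is the first fd systemd passes (SD_LISTEN_FDS_START).
const sdListenFDsStart = 3

// activationFDs interprets LISTEN_PID / LISTEN_FDS and returns the inherited
// fd numbers, or nil if the variables are absent or meant for another process.
func activationFDs(listenPID, listenFDs string, pid int) ([]int, error) {
	if listenPID == "" || listenFDs == "" {
		return nil, nil
	}
	p, err := strconv.Atoi(listenPID)
	if err != nil {
		return nil, fmt.Errorf("bad LISTEN_PID: %w", err)
	}
	if p != pid {
		return nil, nil
	}
	n, err := strconv.Atoi(listenFDs)
	if err != nil || n < 0 {
		return nil, fmt.Errorf("bad LISTEN_FDS %q", listenFDs)
	}
	fds := make([]int, n)
	for i := range fds {
		fds[i] = sdListenFDsStart + i
	}
	return fds, nil
}

// activationListeners wraps the inherited fds as net.Listeners.
func activationListeners() ([]net.Listener, error) {
	fds, err := activationFDs(os.Getenv("LISTEN_PID"), os.Getenv("LISTEN_FDS"), os.Getpid())
	if err != nil {
		return nil, err
	}
	var ls []net.Listener
	for _, fd := range fds {
		f := os.NewFile(uintptr(fd), "systemd-socket-"+strconv.Itoa(fd))
		l, err := net.FileListener(f)
		f.Close() // FileListener duplicates the fd, so the original can be closed
		if err != nil {
			return nil, err
		}
		ls = append(ls, l)
	}
	return ls, nil
}

func main() {
	ls, err := activationListeners()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if len(ls) == 0 {
		fmt.Println("not socket-activated; would fall back to net.Listen")
		return
	}
	// A gRPC server could now Serve(ls[0]) and exit after an idle timeout.
	fmt.Println("inherited listener on", ls[0].Addr())
}
```

For production use, the `activation` package in github.com/coreos/go-systemd implements the same protocol.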

Previously AN-237

add ability to have streaming inputs

Unary gRPC inputs are capped by the default 4 MB maximum receive message size (grpc-go's default; it is configurable, but raising it only moves the limit). If Hoard is to handle bigger files, which it needs to be able to do, then we need to migrate to (and/or implement in parallel) a streaming input service and message.

Shared secret based grants

Primary

Currently, Hoard stores a reference blob but does not yet facilitate sharing data through grants (an encrypted reference). The secret key (a file's original SHA256 digest) should be communicable securely in some manner, perhaps through a contract that queries the Hoard daemon, which should mirror the preexisting gRPC interface.
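The following is a minimal sketch of a symmetric, shared-secret grant. The reference consists of an address plus the secret key, which here is the plaintext's SHA-256 digest as the issue describes. It is serialized and then sealed with AES-256-GCM under a key derived from the shared secret. `Reference`, `SealGrant` and `OpenGrant` are hypothetical names. Hashing the secret with plain SHA-256 stands in for a proper key-derivation function:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
)

// Reference is a hypothetical stand-in for a Hoard reference.
type Reference struct {
	Address   []byte `json:"address"`
	SecretKey []byte `json:"secretKey"`
}

// grantAEAD derives AES-256-GCM from the shared secret. A password-like
// secret would want a slow KDF (e.g. scrypt) instead of a bare SHA-256.
func grantAEAD(secret []byte) (cipher.AEAD, error) {
	key := sha256.Sum256(secret)
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// SealGrant encrypts ref under the shared secret, prefixing a random nonce.
func SealGrant(secret []byte, ref Reference) ([]byte, error) {
	aead, err := grantAEAD(secret)
	if err != nil {
		return nil, err
	}
	plain, err := json.Marshal(ref)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plain, nil), nil
}

// OpenGrant recovers the reference; it fails if the secret is wrong.
func OpenGrant(secret, grant []byte) (Reference, error) {
	var ref Reference
	aead, err := grantAEAD(secret)
	if err != nil {
		return ref, err
	}
	if len(grant) < aead.NonceSize() {
		return ref, errors.New("grant too short")
	}
	plain, err := aead.Open(nil, grant[:aead.NonceSize()], grant[aead.NonceSize():], nil)
	if err != nil {
		return ref, err
	}
	err = json.Unmarshal(plain, &ref)
	return ref, err
}

func main() {
	data := []byte("some document")
	digest := sha256.Sum256(data) // the file's SHA-256 digest as secret key
	ref := Reference{Address: []byte("hypothetical-address"), SecretKey: digest[:]}

	grant, _ := SealGrant([]byte("shared secret"), ref)
	got, err := OpenGrant([]byte("shared secret"), grant)
	fmt.Println(err == nil && string(got.Address) == "hypothetical-address")
}
```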

Secondary

It may prove beneficial to use the same elliptic curves as currently employed by Burrow for grant sharing.

[Grant] Vault

Currently, access grants primarily support two modes: symmetric and asymmetric. It would be nice to use secrets from HashiCorp Vault in a similar manner.

[Core] Indexing...?

Currently we have no native ability to search documents stored with Hoard. We need to decide whether hoard should take on indexing responsibilities itself.

The opportunity exists for a Hoard instance to maintain its own index. Some thoughts:

  • Whether to index should be the caller's choice
  • We should encrypt the index itself
  • We could use something like: https://github.com/blevesearch/bleve
  • We could pipe to an external service like Solr/Elasticsearch but this is a rather heavy dependency and makes it harder to manage deletion from backing store and index (when we implement deletion)

At some point index-as-a-file will become unwieldy, but if callers are able to specify an index, and therefore shard appropriately, it might go a long way. We could consider replicating indices over Tendermint.

Bleve is a spiritual relative of Lucene. I would expect its indices to be fairly compact.

Storing the index itself in Hoard seems appealing. This would probably need to be snapshots of the index, though that is not entirely a given: on IPFS, provided an index is DAG-friendly, it might be okay (which it won't be when encrypted, of course...).

If we do not commit the index for each document, we run the risk of irretrievably losing that index information, since we cannot trawl the data to re-index gaps. Though I suppose if we hung on to references to the data we stored, we could. Since we hold the secrets that encrypt grants for data stored with a particular Hoard instance, we could maintain a write-ahead log of references that have been indexed in memory but whose index has not been persisted, and recover that log after a crash.

IPFS

The next step after Google Cloud integration is to support an IPFS back-end.

Avoid massive dependencies just to use the Hoard client

Currently the root package depends on storage, which pulls in a vast dependency tree. We may want to consider using build tags to limit that tree (e.g. for specific providers), but without support in the Google Cloud package that won't help much anyway.

What we can do is:

  • Rename hoard.proto to services.proto
  • Change go_package so we have services/services.pb.go as a replacement for hoard.pb.go
  • Bump semver to v4
  • Fix up dependencies

Protobuf generation is also currently broken, since it attempts to place generated files in ./v3. We also have the awkward requirement of working from a checkout in the correct GOPATH location for our generated protobuf files to end up in the same place.

Having looked at the tooling, it seems there is no real support for Go modules and checkouts outside GOPATH, so I think we will have to work around it. Moving the proto files to the same directory as the generated files would work for us, but would make us harder to consume from JavaScript (and possibly other languages), so I suggest we stick with our current structure, in which case we need to do something like:

  • Update make target to:
## compile hoard.proto interface definition
%.pb.go: %.proto
	protoc -I protobuf $< --gogo_out=plugins=grpc:.gopath
  • Have a script that does the equivalent of rsync .gopath/github.com/monax/hoard/v3/ ./ (the trailing slash is required, IIRC). I think rsync is probably fine here
  • Delete .gopath

hoard.delete()

We need a way to keep the backend tidy when we want to make files go away. This is needed for, among other reasons, GDPR compliance for any companies using Hoard.

Can't `go get` with go version `1.12`

Attempting to go get github.com/monax/hoard/cmd/hoard results in multiple import errors:

go/src/github.com/monax/hoard/server/server.go:9:2: code in directory /<redacted>/go/src/github.com/monax/hoard expects import "github.com/monax/hoard/v3"
go/src/github.com/monax/hoard/config/storage/filesystem.go:8:2: code in directory /<redacted>/go/src/github.com/monax/hoard/storage expects import "github.com/monax/hoard/v3/storage"

This is a brand-new Go install with $GOPATH set to $HOME/go, which does exist.

Verbose Log: https://pastebin.com/23dCHDf9

[Store] Azure Integration

Currently, only S3 and GCS have integration tests run by the CI. More testing is needed to ensure compatibility with the Azure storage backend.

Google Cloud

Hoard currently supports Amazon's S3 buckets, but it would be beneficial to include support for Google's cloud storage as well.

Make it possible to get Grants from the Hoard API

We need to be able to get Grants from the Hoard API. We should expose some functions that act on references and provide grants, but we might also want to provide single calls to push data and get a grant, or to unpack and repack grants from one type of grant to another, or from one grant recipient to another...

Get and Put act on a reference and on data, respectively:

rpc Get (GetRequest) returns (GetResponse);
rpc Put (PutRequest) returns (PutResponse);

It should probably be possible to Put some data and get a Grant of the desired type back in a single call. Similarly, we may want to Get a Grant rather than the data itself. These functions probably belong behind different API calls, but we could add some options to the requests. Another possibility is to provide some sort of API-level pipelining, where you can call something like Put | OpenPGPGrant and get a grant type back if the argument types match (or simply on supported pairs of API calls).
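The `Put | OpenPGPGrant` pipelining idea can be approximated at the type level with Go generics (Go 1.18+). `Pipe` only composes two calls when the output type of the first is the input type of the second, so an unsupported pair fails to compile. `put`, `symmetricGrant`, `Reference` and `SymmetricGrant` are toy stand-ins, not Hoard's generated types:

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Hypothetical types standing in for Hoard's request/response messages.
type Reference struct{ Address [32]byte }
type SymmetricGrant struct{ Sealed []byte }

// Pipe composes f then g; it only type-checks when f's output is g's input.
func Pipe[A, B, C any](f func(A) (B, error), g func(B) (C, error)) func(A) (C, error) {
	return func(a A) (C, error) {
		b, err := f(a)
		if err != nil {
			var zero C
			return zero, err
		}
		return g(b)
	}
}

// put is a toy stand-in for the Put RPC.
func put(data []byte) (Reference, error) {
	if len(data) == 0 {
		return Reference{}, errors.New("empty data")
	}
	return Reference{Address: sha256.Sum256(data)}, nil
}

// symmetricGrant is a placeholder; a real grant would encrypt the reference.
func symmetricGrant(ref Reference) (SymmetricGrant, error) {
	return SymmetricGrant{Sealed: ref.Address[:]}, nil
}

func main() {
	putAndGrant := Pipe(put, symmetricGrant)
	g, err := putAndGrant([]byte("marmottes"))
	fmt.Println(len(g.Sealed), err) // 32 <nil>
}
```

A server-side variant would register only the supported pairs, which also covers the "simply on supported pairs of API calls" option.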

[Core] Liveness

We need a health endpoint on the hoard server to check backend connectivity. This is most appropriate for Kubernetes where we typically host this system. Currently we use hoarctl to run a test: $(echo "marmottes" | hoarctl put | hoarctl get) = "marmottes".

[Design] Pre-process Input Stream

Assuming the architecture as implemented in #90, we will stream chunks of raw data after a header which contains metadata concerning the file as uploaded. Currently, this contains some generic fields such as Type to indicate a file's MIME type. If we were to store this header last, after processing the entire input stream, we could add some interesting information such as the total data written or a rolling hash. In order to still serve a partial 'read' (where we return the header without the body) we would expect the client to re-order the references.

Delete delete

We naively implemented deletion functionality to comply with various data regulations (e.g. GDPR), but it is feasible to address this concern in a different way.

[Core] Multi-Backend

To ensure data continuity and integrity it would be better to support multiple back-end storage buckets.
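Below is a sketch of one multi-backend policy, built on a minimal `Backend` interface that is hypothetical and not Hoard's storage API. Writes go to every backend and fail unless all of them succeed. Reads fall back through the backends in order. The sketch uses `errors.Join`, so it needs Go 1.20+:

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical minimal storage interface.
type Backend interface {
	Put(addr string, data []byte) error
	Get(addr string) ([]byte, error)
}

// Multi writes to every backend and reads from the first that has the data.
type Multi []Backend

// Put fails unless every backend accepted the write; a quorum policy
// (e.g. 2 of 3) would be an alternative design choice.
func (m Multi) Put(addr string, data []byte) error {
	var errs []error
	for _, b := range m {
		if err := b.Put(addr, data); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when every write succeeded
}

// Get falls back through the backends in order.
func (m Multi) Get(addr string) ([]byte, error) {
	var errs []error
	for _, b := range m {
		data, err := b.Get(addr)
		if err == nil {
			return data, nil
		}
		errs = append(errs, err)
	}
	return nil, errors.Join(append(errs, errors.New("not found in any backend"))...)
}

// memBackend is an in-memory Backend for illustration.
type memBackend map[string][]byte

func (m memBackend) Put(addr string, d []byte) error { m[addr] = d; return nil }

func (m memBackend) Get(addr string) ([]byte, error) {
	d, ok := m[addr]
	if !ok {
		return nil, fmt.Errorf("%s: not found", addr)
	}
	return d, nil
}

func main() {
	primary, replica := memBackend{}, memBackend{}
	store := Multi{primary, replica}
	_ = store.Put("a", []byte("data"))
	delete(primary, "a") // simulate losing the object in one bucket
	d, err := store.Get("a")
	fmt.Println(string(d), err) // data <nil>
}
```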

Grant Service Tests

  • GRPC/service test
  • End-to-end test of symmetric grant (not tested from CLI even manually yet)
  • Add examples/test of Grant service usage

Burrow integration/sharing contract

This may need to be split into multiple tickets. There are a few threads:

  • A file sharing contract for posting grants to and providing an ACL-like functionality
  • The possibility of a Hoard callback from an SNative to get Hoard to provide grants
  • How Hoard should connect to Burrow; preference is to experiment with joining Tendermint network and broadcasting Txs using Burrow as lib

NPM Updates

  • Make example.js a proper integration test run by CI
  • Reconsider the breaking change (for JS) of upper-casing the protobuf

Go-Cloud

The functionality for using a GCP storage back-end (#18) is now implemented with go-cloud. It would make sense to also use go-cloud for AWS support, as it minimizes Hoard's dependencies and enables future back-end scaling.
