

Storj Public Roadmap

❇️ View the official Storj Public Product Roadmap

In keeping with one of our core company values of transparency, we’re excited to share our public product roadmap to give you visibility into our key product focus and feature priorities. Whether you’re a developer, CTO, partner, Web3 enthusiast, community member, and/or user, you can see planned features and timelines designed to improve and enhance object storage. This repository is designed to work in conjunction with our open development process, for which we’re always looking for comments, feedback, and, of course, code contributions. We’re also looking forward to feedback on how to improve the way we communicate, present, and collaborate on the roadmap over time.

In each issue, you can read a description of the feature and the problem it’s designed to solve. New discoveries and revised priorities are anticipated—and sometimes bodies of work take longer than expected or get reprioritized in favor of more impactful features. The product team will be updating this roadmap and responding to comments on planned issues.

If you have any questions or feedback about a specific issue on this roadmap, please contribute by commenting on the issue. You can also share general feedback on our forum.

Guide to the roadmap

How to read the Roadmap:

  • The column an issue is in indicates when the functionality is expected to be finished.
  • When a team starts work on an issue, it will be updated with links to the corresponding product requirement documents, blueprints, and GitHub milestones.
  • As work on an issue starts, the expected completion quarter may change. Please see our disclaimer below.
  • Storj Labs has a number of teams; these teams are assigned to issues on the roadmap. You can see what each team is currently working on by visiting their GitHub projects.
  • A label is assigned to each issue depending on what part of the network or code will be affected.

How to contribute

Disclaimer:

This document and the roadmap contain forward-looking statements about our product direction. Any statements in this document, Storj repositories, or the roadmap that do not describe past or otherwise historical facts are considered forward-looking statements. The planned development, release, and timing of any features or functionality described in this roadmap may change at any time. Any forward-looking statements are made based upon the information available at the time the statement was made, and Storj makes no commitment to update or maintain any forward-looking statements. The information herein is not a commitment to deliver any material, code, or functionality by any particular time or date, and should not be relied upon in making purchase decisions.


Roadmap Issues

Satellite Interface Enforced Immutability

Summary

  • As an application owner, I want to add an additional layer of security to prevent credential-based attacks. I would like to be able to enforce restrictions on the creation of API keys to ensure the immutability of data stored on the network. This includes accounting for projects with multiple team members who can create Access Grants, and for subsequent Access Grants created on a project with existing data.

Pain Point

  • While Access Grants can be constructed to achieve immutable data storage, users on Projects can subsequently create unrestricted credentials that can interact with any data stored on the platform.

Intended Outcome

  • A Project Owner can restrict Access Grant creation by Project users to enforce immutability.

Ethereum Wallet Authentication

Summary:

To better support Web3, we would like to add the ability to create an account using an Ethereum wallet.

Intended Outcome:

  • Users are able to create an account on V3 with an Ethereum wallet instead of a username/password.

Auth Database Refactor Code-complete

Summary:

We want to replace CockroachDB with something else. The new solution will be based on a key-value store like BadgerDB and use replication with relaxed consistency constraints. The auth service does not need strong consistency, so an eventually consistent database will suit it.

Pain Point:

CockroachDB is too complicated to run and maintain for a simple service like this one, especially because this service does not need strong consistency.

Intended Outcome:

The auth service database will be replaced with something other than CockroachDB. This should allow us to reduce the number of requests not handled due to cross-regional latency, lower maintenance costs, make some queries less complicated due to the nature of our data model, and make it easier for other people (e.g., the community) to run the auth service in a cluster.

Links:

Blueprint: https://review.dev.storj.io/c/storj/gateway-mt/+/6030/33/docs/blueprints/new-auth-database.md#12
Milestone: https://github.com/storj/gateway-mt/milestone/1

Encryption Key UX | project-level passphrase

When a customer creates a project, I want them to be able to create a passphrase that is associated with all buckets and files within that project so that they only have to enter their passphrase once per session instead of every time they open a bucket.

Acceptance Criteria:

  • user only has to enter a passphrase once per session to access all contents within a project
  • user can switch to a different passphrase if desired within a project
  • user can clear the passphrase for that session

Security Improvements

Summary:

V3 is focused on security, with end-to-end encryption and macaroon-based access control, but there are still security improvements we could make for users on the Satellite GUI.

Pain Points:

Currently, we do not have any of the following items implemented:

  • Alert users if they use a username/password combo that is in a breached database.
  • Email users whenever they have logged in from a previously unseen IP address.
  • Email users when new access grants are created on projects they are the admin of.
  • Account recovery keys for users.

Intended Outcome:

Implement the items listed above to improve user account security.
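
For the breached-credentials item above, one common approach is the Have I Been Pwned range API, which supports k-anonymity lookups: only the first five hex characters of the password's SHA-1 hash are sent, so the password itself never leaves the satellite. The Go sketch below shows the general shape of the technique; it is an illustration, not the satellite's actual implementation.

```go
package main

import (
	"bufio"
	"crypto/sha1"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// pwnedCount reports how many times a password appears in the Have I Been
// Pwned corpus. Only a 5-character hash prefix is sent (k-anonymity).
func pwnedCount(password string) (int, error) {
	sum := fmt.Sprintf("%X", sha1.Sum([]byte(password)))
	prefix, suffix := sum[:5], sum[5:]

	resp, err := http.Get("https://api.pwnedpasswords.com/range/" + prefix)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text() // each line is "HASH-SUFFIX:COUNT"
		if strings.HasPrefix(line, suffix) {
			var count int
			fmt.Sscanf(line[len(suffix)+1:], "%d", &count)
			return count, nil
		}
	}
	return 0, scanner.Err()
}

func main() {
	n, err := pwnedCount("password123")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("seen in %d breaches\n", n)
}
```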

Links:

Audit Scaling

Summary:

When a node joins the network, it goes through a vetting process. One aspect of vetting is auditing the node for pieces it should be storing; a node must successfully complete a certain number of audits to pass. As more and more nodes join the network, vetting each individual node takes longer because the satellite is limited in how many total audits it can perform.

This work will also help ensure that we are not adding low-quality storage nodes to the network; adding such nodes would affect object durability in the long term.

Pain Point:

Because we are not able to scale our audit workers horizontally, the satellite is limited in how many audits it can perform. This means that as we onboard new nodes and store more data on the network, the node vetting process takes longer and longer.

Intended Outcome:

We are able to scale node auditing on the fly depending on how many new nodes have joined the network recently or how much more data is being stored on the network.

Milestones

Manual Freeze admin API / Auto Unfreeze User Accounts

MVP: freeze user accounts via the admin API; automatically unfreeze when a new payment method is added.
https://github.com/storj/storj/milestone/23

Summary:

Satellites don't automatically freeze user accounts. If a user does not pay their invoice or does something against the terms of use, we must manually freeze their account and contact them.

Job Stories:

  • When a user fails to pay their invoice, I want to freeze their account so that they are not able to accumulate usage until their bill is paid.
  • When a frozen user account's bill is paid, I want their account to be unfrozen instantly so that they can continue to use the service.

Pain Point:

  • Freezing user accounts is currently a manual, time-consuming process.

Intended Outcome:

  • User accounts are frozen during certain circumstances.
  • User accounts are automatically unfrozen when no longer under those circumstances.
  • Users are notified of a potential account freeze before it happens, and again when it happens.
  • Users are notified when their account is unfrozen.
  • Notifications must include the cause so that users understand why it was frozen and how to avoid the situation in the future.

Storj pack management client library

Overall objective: provide useful functionality to developers who have lots of small files and want to use Storj

Context: Storj is not ideal (yet) for small objects. The best-case scenario would be to fix the Satellite core protocols to be more robust, cheaper, and more efficient in the face of lots of small files, but for now we charge users a segment fee.

Until we are able to fix this more generally, it would be nice for us to provide useful tools to customers of Storj to make managing packing small files into larger objects easier.

We have two tickets to add useful functionality of this sort:

There may be other things we can do beyond the above, but both of those tickets will require a shared library.

While we don't want to assume we will always use ZIP files for packs, they do have some unique advantages, so we are currently using them. We have an undocumented library that both of the above tickets can use: https://github.com/storj/zipper.

This ticket is around providing a general packing library (could be zipper after it is cleaned up) that we maintain and advertise in our documentation. We want to provide clear developer tools and libraries to assist developers in packing lots of small objects together.

Step 1 is likely to be cleaning up github.com/storj/zipper and documenting it.
Step 2 is likely to be adding custom zip central directory metadata for file data offsets (currently missing, which incurs a performance penalty), which will make Storj-created zip files faster to read from Storj.
Step 3 is likely to be considering support for other pack formats, such as Chromium's PAK files or https://github.com/vasi/pixz, or maybe something else.

These later steps should provide direct benefits to the above two issues, in addition to benefiting developers who use this library directly.
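
As a sketch of what the packing pattern looks like today, the standard library's archive/zip can stream many small files into one object through libuplink's upload writer (storj.io/zipper could eventually replace archive/zip here once cleaned up; the bucket and key names are up to the caller):

```go
package pack

import (
	"archive/zip"
	"context"
	"io"
	"os"

	"storj.io/uplink"
)

// PackFiles streams local files into a single zip object on Storj, so many
// small files cost one object's worth of metadata instead of many.
func PackFiles(ctx context.Context, access *uplink.Access, bucket, key string, paths []string) error {
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		return err
	}
	defer project.Close()

	upload, err := project.UploadObject(ctx, bucket, key, nil)
	if err != nil {
		return err
	}

	zw := zip.NewWriter(upload) // *uplink.Upload implements io.Writer
	for _, path := range paths {
		if err := addFile(zw, path); err != nil {
			_ = upload.Abort()
			return err
		}
	}
	if err := zw.Close(); err != nil {
		_ = upload.Abort()
		return err
	}
	return upload.Commit()
}

func addFile(zw *zip.Writer, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w, err := zw.Create(path) // entry name mirrors the local path
	if err != nil {
		return err
	}
	_, err = io.Copy(w, f)
	return err
}
```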

S3 Compatibility Improvements

Summary:

One of the main ways we can drive adoption for V3 is to have great S3 compatibility. The community and our users have pointed out areas where we could improve our S3 compatibility, which would make it easier to transition from S3 to V3.

Pain Point:

  • There are still some S3 features we don't fully support. For example, when doing a list operation, the objects are not sorted in lexicographical order, which is how S3 sorts objects.

Intended Outcome:

  • S3 compatibility shortcomings are fixed so that we have better S3 support for users.

Links

Gateway-MT Milestone: https://github.com/storj/gateway-mt/milestone/3
Gateway-ST Milestone: https://github.com/storj/gateway-st/milestone/1

Login with Storj (OAUTH)

Summary:

As we continue to grow, we need to enable engineers to build distributed web applications on top of the Storj platform.

Pain Point:

  • Third-party application developers do not want to deal with user management. They want to rely on third-party services such as Google, GitHub, etc.
  • Applications need the ability to access information from the authenticating service on the user's behalf.

Intended Outcome:

App developers will be able to use Storj for user management and easier access to data stored on the network.

How will it work?

https://review.dev.storj.io/c/storj/storj/+/6212

Community Satellites

Summary:

We need to enable 3rd parties to more easily participate in all parts of the network in order to further our decentralization goals.

Pain Point:

  • It is currently neither easy nor simple for 3rd parties to operate satellites.

Native STORJ Token Deposits (Customer Payments)

Summary:

The satellites currently use a third-party service to accept STORJ Token payments. The use of this service creates a set of problems for users and satellite operators. Removing the use of this third-party service is going to alleviate these pain points.

Pain Point:

It's not always advantageous for users to pay in STORJ Token because of the transaction fees charged by the third-party service we currently use. Relying on a third-party service for token payments also makes it more difficult to stand up community satellites. The third-party service we use generates a new wallet address for each payment made by a user; this is a problem because users do not have a consistent deposit address. We want to give our users a single stable deposit address they can reliably send tokens to that will be associated with their account. With the satellite controlling the deposit addresses, reporting on token payments would also be easier for satellite operators.

Intended Outcome:

The satellite will generate a deposit address for a user who wants to deposit Tokens into their account. The satellite will keep track of these wallets as well as the transactions (deposits) made to them and which users they belong to.

How will it work?

The satellite will have a service that generates wallets and monitors the Ethereum blockchain for transactions to the wallets it generated.

Links:

Blueprint: https://storjlabs.atlassian.net/wiki/spaces/TD/pages/2310733825/Native+Token+Payments+Design
Test Plan: storj/storj#4342
Storj Milestone: https://github.com/storj/storj/milestone/2
Storjscan Milestone: https://github.com/storj/storjscan/milestone/1

Server Side Copy

Summary:

Users are unable to copy data within a project or bucket without downloading it and uploading it to the new location.

Pain Point:

Copying data without server-side copy is costly because the user must pay for the download bandwidth.

Intended Outcome

Users will have the ability to copy data from one location to another within a project without having to download and re-upload it. This will enable users to move data within their projects and buckets without paying for egress bandwidth. This functionality will be added to the libraries and CLI tool first, and incorporated into the satellite GUI later.

How will it work?

The technical details of how this functionality will work can be found on the blueprint.
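
For illustration, here is roughly what server-side copy looks like from libuplink once the CopyObject method is available (the bucket and key names below are made up):

```go
package main

import (
	"context"
	"log"

	"storj.io/uplink"
)

func main() {
	ctx := context.Background()

	access, err := uplink.ParseAccess("...") // access grant elided
	if err != nil {
		log.Fatal(err)
	}
	project, err := uplink.OpenProject(ctx, access)
	if err != nil {
		log.Fatal(err)
	}
	defer project.Close()

	// Only metadata moves on the satellite; no piece data is downloaded
	// or re-uploaded by the client, so no egress bandwidth is billed.
	obj, err := project.CopyObject(ctx,
		"src-bucket", "reports/2021.zip",
		"dst-bucket", "archive/2021.zip", nil)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("copied: %s (%d bytes)", obj.Key, obj.System.ContentLength)
}
```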

Links:

Storj Milestone: https://github.com/storj/storj/milestone/5
Uplink Milestone: https://github.com/storj/uplink/milestone/1
Blueprint: https://review.dev.storj.io/c/storj/storj/+/5930
Test Plan: storj/storj#4317

Performance Tuning & Optimizations (PTO) - 25% improvement on 4 MB from unconstrained Gateway

Uplink CLI pack Support

Overall objective: provide useful functionality to Uplink CLI users for packing a large number of objects together.

Proposal (started in https://review.dev.storj.io/c/storj/storj/+/6107):

  • uplink pack create sj://bucket/prefix/pack.zip path1 path2 path3 - this makes a zip file out of the files and folders provided on the command line. It should expect a zip extension at the destination location so that we can support other packing formats with other extensions in the future. Please see #30 for further details.
  • uplink pack ls sj://bucket/prefix/pack.zip - this operation lists all of the objects in the pack.
  • uplink pack cp sj://bucket/prefix/pack.zip object/in/pack destination - this operation copies an object out of the pack.

We contemplated adding pack support to existing commands with a special sjp:// path prefix or something, but there were a lot of weird edge cases. To get something out the door, this subcommand seems like the shortest route.

This subcommand can use storj.io/zipper like the linksharing zip support.

Consistency Improvements

Summary:

We want to improve our consistency model when downloading an object in the presence of concurrent uploads.
Currently, replacing an object deletes the old object and then uploads the new one. This means another observer will see the old object, then nothing, then the new object.

Pain Point:

Object replace is not atomic.

Intended Outcome:

  • All object uploads are atomic, specifically during an object replace action.
  • An observer should see the old object, then the new object; nothing more, nothing less.

Macaroon-based Enforced Immutability

Summary:

Delegated authorization models can be used in conjunction with server-side or end-to-end encryption configurations. This approach provides significant protection against credential-based attacks because no subsequent change to account or bucket configurations can alter the authorization or access of a credential once created. Moreover, credentials are cryptographically verifiable, making it impossible to tamper with or alter the authorization restrictions encoded in the credentials.

Pain Points:

  • We don't have good documentation on how to achieve macaroon-based enforced immutability.

Intended Outcome:

  • Technical documentation is created that explains how users can leverage this functionality.
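
Until that documentation lands, here is a minimal sketch of the pattern using libuplink's Share API: restrict the root access grant so the derived credential can write and read but never delete. The bucket name is illustrative, and whether overwriting an existing key also requires delete permission should be confirmed against current satellite behavior.

```go
package main

import (
	"log"

	"storj.io/uplink"
)

func main() {
	access, err := uplink.ParseAccess("...") // root access grant elided
	if err != nil {
		log.Fatal(err)
	}

	// Derive a credential whose macaroon caveats forbid deletion. The
	// restriction is encoded in the credential itself, so no later change
	// to the project can widen it.
	restricted, err := access.Share(uplink.Permission{
		AllowUpload:   true,  // may add new objects
		AllowDownload: true,  // may read existing objects
		AllowList:     true,  // may enumerate objects
		AllowDelete:   false, // immutability: deletion is never authorized
	}, uplink.SharePrefix{Bucket: "backups"}) // illustrative bucket name
	if err != nil {
		log.Fatal(err)
	}

	serialized, err := restricted.Serialize()
	if err != nil {
		log.Fatal(err)
	}
	log.Println("hand this to the writer:", serialized)
}
```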

Object Map Usability

Summary:

When we launched on 4/20/2021, we wanted to highlight the object map as an "aha" moment for first-time users. We believed that seeing the global distribution of an object they had just uploaded would give users a better understanding of how differentiated we are from centralized object storage solutions.

Pain Point:

  • The object map is not very useful. Users often don't even notice it because it's so small and tucked in the bottom corner of the screen.
  • Even users who do notice it can't really interact with it.

Intended Outcome:

  • The object map is more useful: users can see their objects' distribution as well as interact with it.

S3 Compatibility - UploadPartCopy

Background

What is the problem/pain point?

Many S3 libraries, such as boto3 for Python, set an object size threshold above which uploads and copies default to multipart operations. Given that we do not currently support multipart copy (link to that endpoint), requests above this default threshold will return an error.

What is the impact?

Customers expect their integrations to “just work” when Storj advertises S3 compatibility. This feature gap frustrates customers as they’re onboarding, and if not resolved, it will cost us business.

Why now?

Customers are asking for this and it helps round out our compatibility with the “core” features of the S3 API.

Requirements

Assumptions

  • Objects can only be copied between buckets within the same Storj DCS account

Out of Scope

  • Will not support copying objects from one Storj account to another
    • Open question: What would be the additional lift to support this? In Amazon S3, this is supported using access points, and only when the source and destination buckets are in the same AWS Region.

User Story

As a Storj DCS User I want to be able to use UploadPartCopy to copy large objects already within a Storj DCS bucket so that I can switch from my existing S3 compatible storage to Storj without having to change my multipart copy configuration.

S3 Method: UploadPartCopy - uploads a part by copying data from an existing object as the data source.
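
For reference, this is how a client typically invokes UploadPartCopy with the AWS SDK for Go pointed at the Storj S3 gateway; the bucket, key, and upload ID values are placeholders:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("https://gateway.storjshare.io"), // Storj S3 gateway
		Region:           aws.String("us-east-1"),                     // placeholder region value
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// Copy the first 64 MiB of an existing object in as part 1 of a
	// multipart upload previously started with CreateMultipartUpload.
	out, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
		Bucket:          aws.String("dest-bucket"),
		Key:             aws.String("dest/key"),
		UploadId:        aws.String("example-upload-id"),
		PartNumber:      aws.Int64(1),
		CopySource:      aws.String("/source-bucket/source/key"),
		CopySourceRange: aws.String("bytes=0-67108863"),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("copied part ETag: %s", aws.StringValue(out.CopyPartResult.ETag))
}
```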

Acceptance Criteria

Request Params

| Param | Description | Required? | Support Needed? |
|---|---|---|---|
| Bucket | The bucket name | Yes | Yes |
| Key | Object key for which the multipart upload was initiated. | Yes | Yes |
| partNumber | Part number of part being copied. This is a positive integer between 1 and 10,000. | Yes | Yes |
| uploadId | Upload ID identifying the multipart upload whose part is being copied. | Yes | Yes |
| x-amz-copy-source | Specifies the source object for the copy operation. | Yes | Yes |
| x-amz-copy-source-if-match | Copies the object if its entity tag (ETag) matches the specified tag. | No | No |
| x-amz-copy-source-if-modified-since | Copies the object if it has been modified since the specified time. | No | No |
| x-amz-copy-source-if-none-match | Copies the object if its entity tag (ETag) is different than the specified ETag. | No | No |
| x-amz-copy-source-if-unmodified-since | Copies the object if it hasn't been modified since the specified time. | No | No |
| x-amz-copy-source-range | The range of bytes to copy from the source object. The range value must use the form bytes=first-last, where first and last are the zero-based byte offsets to copy. For example, bytes=0-9 indicates that you want to copy the first 10 bytes of the source. You can copy a range only if the source object is greater than 5 MB. | No | Yes |
| x-amz-copy-source-server-side-encryption-customer-algorithm | Specifies the algorithm to use when decrypting the source object (for example, AES256). | No | No |
| x-amz-copy-source-server-side-encryption-customer-key | Specifies the customer-provided encryption key for Amazon S3 to use to decrypt the source object. | No | No |
| x-amz-copy-source-server-side-encryption-customer-key-MD5 | Specifies the 128-bit MD5 digest of the encryption key according to RFC 1321. | No | No |
| x-amz-expected-bucket-owner | The account ID of the expected destination bucket owner. | No | No |
| x-amz-request-payer | Confirms that the requester knows that they will be charged for the request. | No | No |
| x-amz-server-side-encryption-customer-algorithm | Specifies the algorithm to use when encrypting the object (for example, AES256). | No | No |
| x-amz-server-side-encryption-customer-key | Specifies the customer-provided encryption key for Amazon S3 to use in encrypting data. | No | No |
| x-amz-server-side-encryption-customer-key-MD5 | Specifies the 128-bit MD5 digest of the encryption key according to RFC 1321. | No | No |
| x-amz-source-expected-bucket-owner | The account ID of the expected source bucket owner. | No | No |

Response Elements


| Element | Description | Required? | Support Needed? | Notes |
|---|---|---|---|---|
| CopyPartResult | Root level tag for the CopyPartResult parameters. | | Yes | |
| ETag | Entity tag of the object. | Maybe | Maybe | |
| LastModified | Date and time at which the object was uploaded. | Maybe | Maybe | |
| x-amz-copy-source-version-id | The version of the source object that was copied, if you have enabled versioning on the source bucket. | | No | Do not yet support object versioning |
| x-amz-request-charged | If present, indicates that the requester was successfully charged for the request. | | No | |
| x-amz-server-side-encryption | The server-side encryption algorithm used when storing this object in Amazon S3 (for example, AES256, aws:kms). | | No | |
| x-amz-server-side-encryption-aws-kms-key-id | If present, specifies the ID of the AWS Key Management Service (AWS KMS) symmetric encryption customer managed key that was used for the object. | | No | |
| x-amz-server-side-encryption-bucket-key-enabled | Indicates whether the multipart upload uses an S3 Bucket Key for server-side encryption with AWS KMS (SSE-KMS). | | No | |
| x-amz-server-side-encryption-customer-algorithm | If server-side encryption with a customer-provided encryption key was requested, the response will include this header confirming the encryption algorithm used. | | No | |
| x-amz-server-side-encryption-customer-key-MD5 | If server-side encryption with a customer-provided encryption key was requested, the response will include this header to provide round-trip message integrity verification of the customer-provided encryption key. | | No | |
| ChecksumCRC32 | The base64-encoded, 32-bit CRC32 checksum of the object. This will only be present if it was uploaded with the object. With multipart uploads, this may not be a checksum value of the object. | | No | |
| ChecksumCRC32C | The base64-encoded, 32-bit CRC32C checksum of the object. This will only be present if it was uploaded with the object. With multipart uploads, this may not be a checksum value of the object. | | No | |
| ChecksumSHA1 | The base64-encoded, 160-bit SHA-1 digest of the object. This will only be present if it was uploaded with the object. With multipart uploads, this may not be a checksum value of the object. | | No | |
| ChecksumSHA256 | The base64-encoded, 256-bit SHA-256 digest of the object. This will only be present if it was uploaded with the object. With multipart uploads, this may not be a checksum value of the object. | | No | |

Measures of Success

  • TBD

Useful Links

Upload Codepath Refactor

Summary:

We want to rewrite or refactor the code that performs uploads and downloads inside libuplink, to enable a collection of new features and improve upon lessons we've learned from the existing code.

Pain Point:

Our current upload code is rigid and brittle. It has encryption and erasure coding tied too tightly together, and it has hardcoded assumptions about long-tail management and segment pipelining.

Milestone:

https://github.com/storj/uplink/milestone/2

Intended Outcome:

  • Separate encryption from Reed-Solomon encoding so we can build a libuplink lite
  • Add support for dynamic long-tail cancellation
  • Add better support for parallel upload
  • Add better support for congestion control and connection establishment
  • Eliminate stalls between segments

Account Management API

Summary:

We want to give users the ability to manage their accounts programmatically: for example, create projects, add users, get billing data, and get data about their projects' usage. The Satellite uses the Console API, which is authenticated via a cookie that is set when a customer logs in. Although this approach is sufficient for the Satellite GUI, it is suboptimal for a developer who wishes to programmatically perform operations on resources relating to their account. Therefore, both cookie-based authentication and API-key authentication must be supported in the new API.

Pain Point:

Users must log into the satellite GUI in order to do things such as create projects, update billing information, and retrieve data. These operations are not available programmatically via API endpoints.

Intended Outcome:

Users will have the ability to do everything they can do in the satellite GUI programmatically via API endpoints.

How will it work?

Please see blueprint: https://review.dev.storj.io/c/storj/storj/+/6341
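
As a purely hypothetical illustration of the API-key flavor (the endpoint path and header scheme below are assumptions for this sketch; the blueprint defines the real interface):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical endpoint and auth header, shown only to contrast
	// API-key auth with the cookie-based Console API.
	req, err := http.NewRequest("GET", "https://satellite.example/api/v0/projects", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+"my-account-api-key") // hypothetical scheme

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```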

Links:

Blueprint: https://review.dev.storj.io/c/storj/storj/+/6341
Milestone: https://github.com/storj/storj/milestone/4

Segment Limits per Project

Summary:

Satellites have the ability to limit users on storage and bandwidth; we want to add the ability to limit segments as well.

Pain Point:

Users are uploading millions of very small objects (less than 5 MB), creating millions of segments that the satellites must store metadata for. The metadata cost for these objects is higher than the cost to store the data itself, which makes it difficult to operate a satellite without taking a loss.

Intended Outcome:

Satellites will have the ability to set segment limits on a per-project basis. There will be two limits: one for free accounts and one for paid accounts. This will give satellite operators the flexibility to limit the number of segments a user can upload, helping them control their metadata costs.

How will it work?

Satellite Operators will be able to set a per-project segment limit in the satellite config.

Links:

Milestone: https://github.com/storj/storj/milestone/3

Better Performance for "Hot" Files

Summary

We know there are high-usage files that may benefit from something like object caching in Gateway-MT and linksharing. The scenario is one where a set of (usually small) files is requested in a short amount of time. Examples include a linksharing file shared via a popular forum, or video files accessed via Gateway-MT after the release of a popular sporting event.

Originally this roadmap item was titled: "Gateway MT/Linksharing Object Caching." While we still think that caching may be all or part of the solution, we want to focus the roadmap item on the customer pain point, rather than the solution.

Pain Point:

The question we are trying to answer is how to scale "hot file" downloads specifically for the Edge services. While a typical "hot file" in the Storj network could be scaled by altering the number of erasure-coded pieces (whitepaper section 6.2), the Edge services are unfortunately a centralized point of failure.

Currently, the Storj DCS infrastructure is not highly responsive to dynamic load changes in the Edge services. The Edge Services are also the most likely place that hot file load will be seen, due to their public nature. Thus the Edge service may need centralized scaling, likely in the form of a local persistence mechanism (such as files on disk), AKA caching.

Intended Outcome:

Any outcome which enables the Edge service to gracefully deliver "hot files" beyond their intended network capacity should be considered.

How will it work?

Many things are not yet determined about how this will work. How will we capture billing information? How will we know if/when to invalidate a cache and/or detect changes on the satellite? Will all files be eligible, or will this be a setting? Do we need to actively detect "hot files", or can we cache everything? Do we have appropriate hardware for these workflows? Will SNOs be compensated in any way? How will range queries be billed (per byte? per segment?)? Can we leverage an off-the-shelf caching layer, such as Squid? Will our current infrastructure support an adequate cache size?

Links:

storj/gateway-mt#75
https://github.com/minio/minio/blob/master/docs/disk-caching/DESIGN.md
Whitepaper: https://www.storj.io/storjv3.pdf
Gateway-MT Milestone: https://github.com/storj/gateway-mt/milestone/2

Public API: rework endpoints, restructure backend code

Description

The way our code is currently structured, adding a very basic piece of functionality requires writing or modifying code across three completely separate packages and understanding the (sometimes very confusing) ways they interact with each other. I've created some proofs of concept to demonstrate that we can contain all related code within the same package, making it a lot easier to work on a feature without context switching. It also reduces the amount of boilerplate, since most of the repetitive work is handled by code generation.

Files necessary to modify a user endpoint with the current code:

  • satellite/console/consoleweb/server.go
  • satellite/console/consoleweb/consoleapi/users.go
  • satellite/console/service.go

Files necessary to modify a user endpoint with the structure I would like to migrate to:

  • satellite/console/users/api.go
  • satellite/console/users/service.go

Reed Solomon Simulator

Create a Python simulator that receives Reed-Solomon variables as hyperparameters and calculates durability and repair impact.
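
Although the ticket calls for a Python simulator, the core durability calculation is small enough to sketch here in Go: with n total pieces, any k of which reconstruct a segment, and an independent per-piece survival probability p over some period, segment durability is the binomial tail probability that at least k pieces survive. The parameter values below are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// survival returns the probability that at least k of n pieces remain,
// given independent per-piece survival probability p — the core durability
// estimate a Reed-Solomon simulator would compute.
func survival(n, k int, p float64) float64 {
	total := 0.0
	for i := k; i <= n; i++ {
		total += binomial(n, i) * math.Pow(p, float64(i)) * math.Pow(1-p, float64(n-i))
	}
	return total
}

// binomial computes n choose k via log-gamma for numerical stability.
func binomial(n, k int) float64 {
	ln, _ := math.Lgamma(float64(n) + 1)
	lk, _ := math.Lgamma(float64(k) + 1)
	lnk, _ := math.Lgamma(float64(n-k) + 1)
	return math.Exp(ln - lk - lnk)
}

func main() {
	// Example: 80 pieces, any 29 reconstruct the segment,
	// 99.9% per-piece survival over the period of interest.
	fmt.Printf("durability ≈ %.9f\n", survival(80, 29, 0.999))
}
```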

Small File Packing (Satellite)

Summary:

Currently, storing small files (5 MB or less) is not healthy for the satellites. For each file stored on the network, the satellite must store some metadata. The amount of metadata does not shrink with object size, so for small objects the ratio of data stored to metadata stored is unhealthy.

Pain Point:

  • For each segment stored on the network, the satellite must store some metadata. The cost to store that metadata is the same for a single small file as for a large (single-segment) file, so small objects take as much metadata as large objects; we want to optimize this.
  • Metadata in CockroachDB is very expensive.

  • We cannot scale the satellite DB without a satellite-side solution for packing files, so that the cost of metadata relative to objects stays low.

  • Small files end up becoming small pieces that are stored on the storage nodes. Small pieces are bad for the nodes on the network because they are not optimally stored on the hard drive: data on hard drives is stored in blocks, and a small piece occupies an entire block even if it is smaller than the block. Nodes are only paid for the size of the piece they store, not for the entire size of the block.

Intended Outcome:

  • Backwards compatible for users
  • The satellite will have a process to pack small files uploaded by customers.

How will it work?

Blueprint: https://review.dev.storj.io/c/storj/storj/+/6543

CI/CD of Satellite UI: Console team staging environments

Goal

We want to be able to deploy frontend changes without being tied to the backend satellite deployment process. This will allow us to move faster and more easily test new features. Today we can't get feedback until a change is merged; after this milestone is complete, the feedback cycle will be much quicker and we can get more iterations in with feedback from non-devs.

In scope

  • Separate the frontend from the backend (so that it can be deployed separately)
  • Staging environments setup for Console team to use as needed

Out of scope

  • CI/CD for backend satellite deployments

Links

zkSync Era Support for Storage Node Operator Payments

Summary:

The purpose of this item is to add support for zkSync Era to our open-source payment tools.

Storj Labs Open source payments tool:
https://github.com/storj/crypto-batch-payment

Intended Outcome:

  • We have added support for zkSync Era into https://github.com/storj/crypto-batch-payment/
  • We have figured out how to integrate with a paymaster in order to support zkSync Era
  • We have tested the functionality with the Matter Labs development team.

CLI UX Improvements

Description:

What is the problem/pain point?

What is the impact?

Why now?

Links:

Acceptance Criteria:

Ranged Loop

Summary:

Right now, many background tasks such as accounting, repair, auditing, and garbage collection process all objects sequentially, one by one. As a result, the number of objects directly influences how long one of these jobs takes to run.

Pain Point:

As the number of objects grows, accounting granularity will get worse and worse, and eventually we won't be able to do daily accounting rollups unless we intervene. This also impacts how quickly we can react in repair scenarios, and how frequently objects may get audited. This is a pressure cooker with no release valve; we need to build the release valve.

Intended Outcome:

We can get through each task that observes the metainfo loop (accounting, repair, auditing, GC, etc.) without requiring a single sequential sweep of all objects. We would like to be able to horizontally scale the metainfo loop so that we can have more subsections running in parallel.

This will probably require individual solutions for each of the existing metainfo observers - garbage collection will likely be handled differently than accounting or repair checking, for instance.

How will it work?

The broad approach is to have multiple cores running concurrently, processing portions of the metainfo table at the same time.

For metainfo observers where we could eliminate the need for the metainfo loop altogether (for example, with efficient reverse indexes), we should spend some timeboxed research time evaluating that as well.
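
A minimal sketch of the ranged-loop idea: split the keyspace into fixed boundaries and let each observer sweep its range concurrently. The boundaries and the observe callback below are illustrative placeholders, not the satellite's actual API.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// processRanges runs one observer sweep per keyspace range, concurrently,
// instead of a single sequential sweep over all objects.
func processRanges(ctx context.Context, boundaries []string, observe func(ctx context.Context, lo, hi string) error) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(boundaries)-1)
	for i := 0; i+1 < len(boundaries); i++ {
		lo, hi := boundaries[i], boundaries[i+1]
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := observe(ctx, lo, hi); err != nil {
				errs <- err
			}
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		return err // report the first failure
	}
	return nil
}

func main() {
	boundaries := []string{"", "4", "8", "c", "\xff"} // 4 ranges over a hex keyspace
	_ = processRanges(context.Background(), boundaries, func(ctx context.Context, lo, hi string) error {
		fmt.Printf("observer sweeping [%q, %q)\n", lo, hi)
		return nil
	})
}
```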

Milestone: https://github.com/storj/storj/milestone/25

Forgotten Deletes

In the event of a DB restoration from backup, it may be possible for deleted segments to be restored. This would result in the satellite auditing those pieces, and the nodes failing the audits.

Nodes need some way to prove that pieces were legitimately deleted.

Mentioned on:
https://storjlabs.atlassian.net/wiki/spaces/NEWS/pages/925434065/Durability+2020-10-12
https://storjlabs.atlassian.net/wiki/spaces/NEWS/pages/890273817/Durability+2020-09-28
https://storjlabs.atlassian.net/wiki/spaces/ENG/pages/1716781220

Formerly DUR-58

Secure Custom Domain Support (TLS) - Linksharing

Background

What is the problem/pain point?

The Storj Linksharing Service does not support HTTPS for custom domains. Current documentation for using a custom domain with Linkshare: https://docs.storj.io/dcs/how-tos/host-a-static-website/host-a-static-website-with-the-cli-and-linksharing-service/

What is the Impact?

Customers who want to provide access to objects in Storj buckets to their own end users from a web frontend use Linksharing. When they want (or need) a custom domain secured with TLS for the content served, the URL must be proxied through a CDN such as Cloudflare or BunnyCDN. A couple of problems unfold as a result:

  1. Proxying certain types of content, such as streaming video, is against the terms of service of providers such as Cloudflare, so these customers are stuck.
  2. CDN providers (without a TOS restriction) cache content, reducing the amount of egress from Storj and adding an unnecessary third party.

Storj DCS customers often have strict requirements from their own customers, including strict security reviews and policies. Some customers have to whitelabel every domain that can be accessed, and their security teams won’t approve a generic hosting domain such as linksharing’s, because, understandably, unblocking it opens up more than they want.

Why now?

This particular pain point is trending among existing and new customers.

Requirements

User Story

As a Storj DCS user, I want to host content stored in DCS on my own domain via https, so my content is delivered securely and with a link my customers will be comfortable with.

Acceptance Criteria

  • User can define a custom domain for linksharing
    • Storj generates certificate on behalf of users
    • User has a way to prove ownership of domain to generate the certificate
  • Linkshare URLs generated for this particular customer should have an alternative URL that contains their custom domain

Out of Scope

  • "Bring your own certificate"
    • Customers might want to bring their own certificate rather than rely on the autogenerated Let's Encrypt cert
    • In this case, users would upload their own certificate (such as a DigiCert certificate)
    • This will be saved for future consideration on the roadmap

Measures of Success

Egress

  • Increased Egress via Linksharing

New Customers

  • Increased # customers using Linksharing with Custom Domains

Project Milestones

Project Initiation: https://github.com/storj/gateway-mt/milestone/8
Research: https://github.com/storj/gateway-mt/milestone/11
Limits: https://github.com/storj/gateway-mt/milestone/12
TLS Termination: https://github.com/storj/gateway-mt/milestone/13
Backend Work: https://github.com/storj/gateway-mt/milestone/14
Observability: https://github.com/storj/gateway-mt/milestone/15
Paid Tier: https://github.com/storj/storj/milestone/31

Project Blueprint

Blueprint: Linksharing Secure Custom Domain Support (TLS)

Link Sharing Pack Support

Summary:

Storing small files individually is not ideal; compressing them into a zip is the best way to store them. However, we currently do not have the ability to download or view parts of a zip file without first downloading the entire thing.

Pain Point:

Users are unable to view the contents of a zip file that is uploaded to v3 without first downloading the entire zip.

Intended Outcome:

Users have the ability to view the contents of a zip file without having to download it. They would also have the ability to download a specific file within the zip so that they do not have to download the entire thing.

Links:

Milestone: https://github.com/storj/gateway-mt/milestone/2
https://review.dev.storj.io/c/storj/gateway-mt/+/6494

Related tickets

#29
#30

Usecase Onboarding Flows

Summary:

When a new user creates an account on a satellite, we want to understand what they may be coming to us for, so that we can provide onboarding flows catered to the user's use case and serve helpful content accordingly.

Pain Point:

The V3 network serves a handful of use cases. Users need documentation tailored to their use case in order to get set up properly and use the product smoothly.

Intended Outcome:

When a user onboards to a satellite they are provided the directions they need to get set up for their desired use case along with helpful content they can take advantage of.

How will it work?

Upon creating an account, a user will answer a couple of questions that will trigger a dedicated onboarding guide for their use case based on the information they provided.

Send Node Alerts from Satellite Service for timely email arrivals

When a node has a change in status, we want to alert them via the satellite service instead of using a dataflow and sending via customer.io, so that node operators receive their status alerts in a timely manner.

  • Emails are sent from the satellite upon important node status changes (e.g., suspended, disqualified, offline for a certain amount of time). Look at the emails being sent from Customer.io and update this list accordingly. PRD: https://storjlabs.atlassian.net/wiki/spaces/PM/pages/2196045827/Send+Node+Status+Emails+via+the+Satellite+or+update+logic+in+c.io
  • The implementation is not as important as the end result. One potential implementation is an "email queue" similar to the repair queue: when a node's status changes, the audit service can add an email to this queue, to be sent by a separate satellite service that processes the queue.
  • Email sending should not reduce audit performance.
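
To make the queue idea concrete, here is a minimal sketch, with placeholder types and a simulated sender, of how enqueueing can be decoupled from the audit hot path; it illustrates the approach rather than the satellite's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

type statusEmail struct {
	NodeID string
	Status string // e.g. "suspended", "disqualified", "offline"
}

func main() {
	queue := make(chan statusEmail, 1024)

	// Audit side: enqueue without ever blocking the audit hot path.
	enqueue := func(e statusEmail) {
		select {
		case queue <- e:
		default: // queue full; drop or persist for retry in a real system
		}
	}

	// Worker side: drain the queue independently of auditing.
	go func() {
		for e := range queue {
			fmt.Printf("sending %q alert to operator of node %s\n", e.Status, e.NodeID)
			time.Sleep(10 * time.Millisecond) // simulate SMTP latency
		}
	}()

	enqueue(statusEmail{NodeID: "node-123", Status: "suspended"})
	time.Sleep(100 * time.Millisecond) // let the worker finish in this demo
}
```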

Geolocation API

Background

What is the problem/pain point?

Customers want to be able to identify the (approximate) geolocations of all segments/pieces on the network for a given "dataset."

Who is impacted?

All customers/users

What is the impact?

It is not possible to easily see the geolocation of object data (with the exception of files that are shared via Linksharing). With this feature, customers will be able to easily see and showcase where their data resides. If customers are using geofencing, they can confirm that the data resides at locations within their selected region.

Requirements

User Story

As a Storj user, I want to be able to retrieve the geolocation data for all the pieces of a specified dataset stored on Storj DCS so I can use them in a web application or data visualization tool.

Acceptance Criteria [WIP]

  • Geolocation data can be retrieved for a given dataset
    • A dataset can be one or many objects within a whole bucket, prefix, or single object
  • Geolocation data can be retrieved by a frontend application (such as a data visualization tool or web app) without the use of a server
    • If we go the route of dumping the JSON into a DCS bucket, linksharing can be used to enable this
  • Geolocation data should be refreshed at least once per week (may not be relevant if generated in real time)
  • User should be able to have Geolocation data publicly accessible OR private (limited) access
    • Enabling public view of Geolocation data should not expose the underlying objects

Success Metrics

  • Geolocation data can be easily retrieved for a given dataset
  • Performance isn't degraded

Key Considerations

  • The dataset may not be at the "root" of the bucket, so the feature should be able to handle a whole bucket, a prefix, or a single object
  • The dataset can be composed of many objects

Open Discussion/Questions

  • How is "dataset" defined? Will it be a single file? Multiple files?
    • The answer to this will impact the design (bucket level, object level, object tag level, etc)
    • What is the frequency in which the data needs to be refreshed? Does it need to be refreshed?
  • Does this functionality need to live in uplink? S3 gateway? both? Doesn't matter?
  • Are we able to share the coordinates generated (any licensing restrictions?)
  • Our use of a geolocation database requires attribution unless we purchase a commercial license ($456/year)

Possible Design

  • Add new endpoint/method to Linksharing that returns GeoJSON (similar to the map endpoint: https://link.storjshare.io/s/accesshere/bucketname%2Ffilename?map=1)
  • Dump JSON into Buckets:
    • Use object tags (PutObjectTagging) or other metadata when uploading a dataset to DCS
      • If we need to perform this query on a whole bucket, the use of object tags might not be the right solution (note that we do not currently support PutBucketTagging)
    • Periodically "dump" a GeoJSON file into a specified bucket for a specified object tag - say each week
      • or, even simpler, generate per request, but don't create a new GeoJSON if the existing one is younger than, say, a week
      • or, simplest of all, a one-time operation that generates the GeoJSON once and dumps it into a bucket

Milestone(s)

Requestor Pays

Background

Requestor pays is a pricing model where the user who initiates data transfer or accesses the data is responsible for paying the associated egress fees. The person storing the data is responsible for paying the storage fees.

What is the problem/pain point?

Users cannot share datasets with other users without paying for the egress fees when these data sets are downloaded.

Why now?

We have had customers request this functionality. A great example is the blockchain snapshot use case; in this situation, users would like to share snapshots without being liable to pay any egress fees when those blockchains are downloaded.

Requirements

  • users can download datasets from other users
  • the user downloading the dataset is charged for the egress
  • the user storing the data is NOT charged for the egress other users incur

User Story

As a user who needs to download a data set from another user
I want the ability to pay for the egress I use when downloading that data
So that the other user will make it available to me without having to pay for my egress.

Success Metrics

  • how many users utilize this functionality?
  • how much egress is charged to users who download data that other users make available?
  • how many buckets or objects have used this functionality?

All Projects Dashboard - improved UX for viewing and managing projects

What is the problem/pain point?

There are a few pain points that we hope to address:

  • Users don't have a way to see all their projects in a single view.
  • Users are always defaulted to the first project in their list, and it's not always clear how to get to the project they want to access when they log in.
  • People must select a project when logging in so they can enter the correct project-level passphrase. If they don't want to view a project or their data, but are only there for billing or account info, the new projects dashboard lets them do that.

What is the impact?

Confusion around the project-level passphrase, and the inability to get to billing without being prompted for a passphrase.

Why now?

We switched the passphrase flow so it asks for a passphrase when opening a project.

Links:

Milestone: https://github.com/storj/storj/milestone/1
Figma: https://www.figma.com/file/HlmasFJNHxs2lzGerq3WYH/Satellite-GUI?node-id=14389%3A131874

Object Versioning

What is the problem/pain point?

Object Versioning is a key feature of S3-compatible object storage that Storj does not currently support. In order to attract and retain enterprise customers, object versioning needs to be supported.

What is Object Versioning?

Object versioning maintains multiple versions of every unique object in a bucket: each time the same object is uploaded to the bucket, a new version is created. In addition, when an object is “deleted”, a Delete Marker is added to the object rather than executing a hard delete. This allows for recovery of deleted objects in addition to recovery of previous versions of the object.

What is the impact?

Currently, Storj users have no way of keeping track of changes to objects in Storj DCS, and therefore customers have no ability to make use of the added benefits of versioning. Versioning would give users a level of control they are already accustomed to having with other cloud storage providers, and specifically enable use cases that require recovery from accidental deletion or overwriting of files.

Why now?

All major clouds (AWS, Google, Azure) support object versioning, so it is something all customers expect out of the box. As we begin to onboard enterprise customers with sophisticated use cases, closing this gap in S3 compatibility will bring us closer to our goal of being enterprise grade.

Links:

Milestones:

PRD:

Partner Value Attribution

Summary:

As a part of our partner program, we want to incentivize the use of our network for data storage needs. For example, an open-source project can build an integration with Storj, and the incentive for that project to do so is that we will share revenue with them.

Pain Point:

Currently, our value attribution system requires us to specify partners in a list on the satellite. We want to change how this works so that we can keep track of user agents from projects that are not on that list.

Intended Outcome:

Partners can pass a user agent through their application to Storj and start to receive attribution for all of the data stored on Storj for them.

Links:

Milestone: https://github.com/storj/storj/milestone/7
