
outercore-eng-kb's Introduction

Engineering

Engineering, a team at Protocol Labs.

UI Checklist

Logo: ꧁𓀨꧂
Website: https://arg.protocol.ai
Projects: https://github.com/application-research

Goals

  • Obtain and nurture developer adoption of tools that benefit our ecosystem.
  • Create prototypes that people would actually use.
  • High-frequency shipping of upgrades, new templates, prototypes, and full applications.
  • Do qualitative and quantitative research with users when an idea catches adoption.
  • Share all source code, process, and ideas with the greater ecosystem.

Live assets

Repositories

Technology Knowledge Base

Reference Architectures and Designs

FVM Website Hero Animation

CryptoComputeLab

Estuary Storage Provider Feedback

Estuary Client Considerations

Storage Product Intelligence

Estuary Infrastructure

Estuary Stability

Metrics Tracking and Metrics API

Estuary - Auto Retrieve

Proposal: Collections API V2

Proposal: Directory API

Proposal: API Versioning for Estuary

Proposal: Proxy-Forwarder

Proposal: API Gateway

Proposal: CM and AR Component Separation

Proposal: EstuaryFS

Proposal: EstuaryV2 / WhyPFS

Proposal: Estuary CLI

Proposal: Estuary Desktop App

Proposal: Estuary Mobile App

Proposal: Estuary Browser Extension

Proposal: Estuary sidecar

outercore-eng-kb's People

Contributors

10d9e, alvin-reyes

outercore-eng-kb's Issues

Estuary Messaging Queue for Events

Proposal: Estuary Messaging Queue for Events

Contributors @en0ma, @alvin-reyes
Status Draft
Revision

Proposal/Overview

Replace the current inter-process communication system (WebSocket) with a queue (Kafka, ActiveMQ, or another queuing system).
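
A minimal sketch in Go of the queue-based replacement, using NSQ via the github.com/nsqio/go-nsq client (NSQ is also what EstuaryV2 plans to use); the topic name and event shape are made up for the example.

    package main

    import (
        "encoding/json"
        "log"

        nsq "github.com/nsqio/go-nsq"
    )

    // Event is a hypothetical payload for shuttle-to-API-node notifications.
    type Event struct {
        Kind string `json:"kind"` // e.g. "pin-complete"
        CID  string `json:"cid"`
    }

    func main() {
        cfg := nsq.NewConfig()

        // Producer side: a shuttle publishes events instead of pushing them over a WebSocket.
        producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
        if err != nil {
            log.Fatal(err)
        }
        body, _ := json.Marshal(Event{Kind: "pin-complete", CID: "bafy..."})
        if err := producer.Publish("estuary-events", body); err != nil {
            log.Fatal(err)
        }

        // Consumer side: the API node subscribes to the same topic.
        consumer, err := nsq.NewConsumer("estuary-events", "api-node", cfg)
        if err != nil {
            log.Fatal(err)
        }
        consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
            var ev Event
            if err := json.Unmarshal(m.Body, &ev); err != nil {
                return err // returning an error requeues the message
            }
            log.Printf("event %s for %s", ev.Kind, ev.CID)
            return nil
        }))
        if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
            log.Fatal(err)
        }
        select {} // block so the consumer keeps running
    }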

EstuaryV2

Proposal: EstuaryV2 / WhyPFS

Contributors @alvin-reyes
Status Draft
Revision

Proposal/Overview

A single-node model of Estuary built from different importable microservices.

image

Approach

  • We're going to use Go Micro, a microservice framework for Go applications.
  • gRPC as the communication protocol between services
  • whypfs-core for the p2p node
  • Postgres for the database
  • NSQ for queueing

Breakdown of the services

Core services

1 - database
2 - blockstore node
3 - gateway
4 - storage-deal-making
5 - retrieval-deal-making

Outer services

1 - authentication
2 - logging
3 - pinning
4 - collections
5 - staging-buckets
6 - queue

Definition of done:

1 - dockerfile
2 - docker-compose.yml


WhyPFS Node

  • WhyPFS-Core
    • We should let this IPFS node take care of get, put, and managing the data store
    • This will also take care of the peering and libp2p host configuration
  • Storage Provider Functionality
    • Miner and Miner Selection
    • Filecoin Deal (filclient)
  • Users Functionality
    • Users, Admin
    • Upload, Download / Retrieval
    • Collections
    • File/Directory upload
    • All existing Estuary V1 endpoints will be available
  • Content Manager Queue
    • Redesign the bucket functionality that stages the CIDs using boltDB
    • The Content Manager queue checks each bucket of CIDs
    • Buckets are created and a garbage collector removes empty or processed buckets
    • Commitment Piece computation
    • Miner selection and deal maker
  • Auto Retrieve Queue
  • Gateway
  • Tooling
    • S3 driver
    • Barge?
  • WebUI - file manager / metrics

The new Estuary node will be a single-node model where an actor can be an API node, a Core node, or both (API and Core node).

  • API nodes will have a gateway and a web UI.
  • Core nodes can process contents/CIDs and push them to Filecoin. They will also come with a web UI to manage the shuttle.

Technical Design

WhyPFS Core

WhyPFS core is the core node that will be used to peer with other WhyPFS nodes. It is an importable module with the IPLD DAG service built in. It’s also the main component that manages the blockstore and datastore.

WhyPFS Node

  • Holds all the functionality from Estuary
    • Content and Directories
    • Collections
    • Stats and Metrics
    • WebUI / Explorer
    • Storage Providers
    • Content Manager using WhyPFS Queue
      • Deal Making using filclient

WhyPFS Queue

This is a batch job framework built on the WhyPFS node. It will run several pre-configured or custom-configured jobs within the lifespan of the running node, including the Content Management Queue, pinning requests / auto-pinning, and garbage collection.
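
A minimal sketch in Go of what such a job runner could look like; the Job interface and names below are hypothetical, not the actual WhyPFS API.

    package queue

    import (
        "context"
        "log"
        "time"
    )

    // Job is one pre-configured or custom-configured task that runs for the
    // lifespan of the node (e.g. content-manager queue, auto-pinning, GC).
    type Job interface {
        Name() string
        Interval() time.Duration
        Run(ctx context.Context) error
    }

    // RunJobs starts every job on its own ticker and stops them all when ctx is done.
    func RunJobs(ctx context.Context, jobs ...Job) {
        for _, j := range jobs {
            go func(j Job) {
                t := time.NewTicker(j.Interval())
                defer t.Stop()
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case <-t.C:
                        if err := j.Run(ctx); err != nil {
                            log.Printf("job %s: %v", j.Name(), err)
                        }
                    }
                }
            }(j)
        }
    }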

Custom Queue Job: Bucket / Queuing Jobs for Processing Content and Deals.

image

There will be two types of buckets:

  • Dedicated Bucket
    • Users can create their own buckets and set the number of files each can hold. When the user chooses a bucket, the CID is associated with that bucket. The bucket is then processed by the global timer (blocktime).
  • Global Bucket
    • The global bucket is for CIDs that don’t have a bucket defined at upload time.

Bucket Components

  • Buckets Creator
    • creates bucket objects
      • Note: this is a bucket table with specs
        • Threshold size
        • Schedule (run)
    • checks for unassigned CIDs
    • If there are unassigned CIDs, creates a new bucket
    • Assigns CIDs to bucket objects
    • Notes:
      • We will need a table to store the bucket information
      • Content will have a new column “bucket-uuid” to indicate which bucket the content is assigned to.
  • Bucket Processor (see the sketch after this list)
    • checks buckets in init status
    • Checks whether a bucket’s CIDs are over the size threshold or the bucket is more than 2 days old.
    • If either condition is met, submits the bucket for deal creation
    • Creates deals for each content in the bucket. Looks up the CID.
    • Sets the status of each content (6 deals)
    • Sets the status of the bucket to complete
  • Bucket Checker
    • checks existing completed buckets to ensure that all CIDs have deals (6 deals each)
    • Creates deals if any of the CIDs don’t have deals yet.
  • Bucket GC
    • Cleans up buckets that are more than 3 months old
    • Cleans up CIDs that are more than 3 months old and still failing
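
A minimal sketch of the Bucket Processor check in Go; the Bucket fields and the 4GiB threshold are assumptions taken from this proposal, not real Estuary code.

    package queue

    import "time"

    type Bucket struct {
        Status    string // "init", "processing", "complete"
        TotalSize int64  // sum of the sizes of the CIDs assigned to the bucket
        CreatedAt time.Time
    }

    // sizeThreshold is hypothetical; the proposal leaves the real value to the bucket spec.
    const sizeThreshold = int64(4 << 30) // 4 GiB

    // ReadyForDeal returns true when a bucket in init status has either crossed
    // the size threshold or is more than 2 days old, per the Bucket Processor rules.
    func ReadyForDeal(b Bucket) bool {
        if b.Status != "init" {
            return false
        }
        return b.TotalSize >= sizeThreshold || time.Since(b.CreatedAt) > 48*time.Hour
    }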

WhyPFS WebUI

This is the WebUI for WhyPFS. It will be a dashboard of stats for each node. The stats include:

  • CIDs
  • Uptime
  • Performance
  • Peers
  • File Manager

WhyPFS - Service

Proxifier - load balancer layer

Authorization - decoupled authorization that nodes can opt in to.

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

[Idea/Proposal]Commp Standalone

Problem

EdgeURID currently does aggregation and commp, and these two functional aspects of the system are resource-heavy processes.

To explain why these are resource-heavy:

  • Aggregation is the functional aspect of EdgeURID that groups small files into large collections. It uses an abstraction called buckets to collect the files, aggregate them, and generate a CAR for all of them.
  • Commp is the process of generating piece information, the main unit of negotiation for data that users store on the Filecoin network. Generating a commp requires generating a proof, which can consume significant RAM relative to the size of the CAR file.

When too much aggregation and commp is done in parallel, EdgeURID demands more resources and, if it doesn't get them, terminates on its own (OOM).

Stats/ Metrics

  • A 64GB-RAM Linux box seems to accommodate only about 5 parallel commp runs (at 4GB to 6GB CAR size)
  • *TBA

Solution

My proposal is to separate the commp from the aggregator. We can create a commp node which can PULL CAR files from a given edge node, run the piece-commitment logic, and return the piece information.

image
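
A minimal sketch in Go of such a standalone commp worker, using the go-fil-commp-hashhash and go-fil-commcid libraries; the edge-node URL is a placeholder, and pulling the CAR over plain HTTP is just this proposal's assumption.

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"

        commcid "github.com/filecoin-project/go-fil-commcid"
        commp "github.com/filecoin-project/go-fil-commp-hashhash"
    )

    func main() {
        // PULL the CAR file from a given edge node (placeholder URL).
        resp, err := http.Get("http://edge-node:1313/gw/<car-cid>")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Stream the CAR bytes through the commp hasher instead of buffering
        // the whole file, keeping memory use on the worker modest.
        calc := new(commp.Calc)
        if _, err := io.Copy(calc, resp.Body); err != nil {
            log.Fatal(err)
        }
        rawCommP, paddedSize, err := calc.Digest()
        if err != nil {
            log.Fatal(err)
        }

        // Return the piece information (here just printed).
        pieceCid, err := commcid.DataCommitmentV1ToCID(rawCommP)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("piece CID:", pieceCid, "padded piece size:", paddedSize)
    }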

Estuary Barge V2

Proposal: Estuary Barge V2

Author Alvin Reyes
Status Draft
Revision

This is a WIP

Proposal/Overview

With the birth of whypfs-core, we should revisit building a CLI tool for uploading large sets of data, either by creating a new tool or by building a new version of Estuary Barge.

Solution

Barge has a simple requirement: allow any CLI user to pipe or stream-upload data (files, CARs, and directories) from a dedicated node to Estuary. With whypfs-core, we can use the core node to rebuild the servicing functions (upload and download) while retaining the same libp2p identity, blockstore, and data store, something that Barge V1 fails to retain.

Technical Design

  • re-introduce the following:
  • node creation using whypfs-core - strip the node initialisation and use whypfs-core instead
  • Plumb - upload files, CARs, and directories - store the CID on the local blockstore and call Estuary /pinning/pins, passing the CIDs. Use the current local node as “origins” so Estuary can pull the data from the local node (see the sketch after this list).
  • Instead of a terminating CLI, barge should run as a daemon in the background that the user can call from another terminal session to run commands.
  • Using whypfs-core will allow the user to reuse the same libp2p key, blockstore, and data store if the daemon gets terminated.
  • Introduce several methods of uploading and downloading data from Estuary or any peered IPFS node.
  • Support retrieval from the AR server. This will be inherited from whypfs-core as part of the bootstrap peering.
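
A minimal sketch of the handoff mentioned above: after adding a file to the local whypfs blockstore, call Estuary's /pinning/pins (the IPFS Pinning Service API shape) and advertise the local node as an origin so Estuary can pull the blocks. The token, CID, and multiaddr values are placeholders.

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "net/http"
    )

    type pinRequest struct {
        CID     string   `json:"cid"`
        Name    string   `json:"name"`
        Origins []string `json:"origins"`
    }

    func main() {
        body, _ := json.Marshal(pinRequest{
            CID:  "bafy...",
            Name: "my-upload",
            Origins: []string{
                // multiaddr of this barge daemon, so Estuary pulls from us
                "/ip4/203.0.113.7/tcp/6745/p2p/12D3Koo...",
            },
        })
        req, _ := http.NewRequest("POST", "https://api.estuary.tech/pinning/pins", bytes.NewReader(body))
        req.Header.Set("Authorization", "Bearer <EST-API-KEY>")
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Println("pin request status:", resp.Status)
    }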

Breakdown of task / features

Barge

  • Cli daemon
    • Whypfs-core
      • In memory data store
      • Persistent libp2p key
      • Persistent blockstore
    • Whypfs-core peered to estuary api and shuttles
    • Whypfs-core dag service to add files and dirs
  • cli
    • API calls to estuary (pinning pins)

    • Features

      • Add file (stream)
        • File
        • Car
      • Add directory
      • Every time we add a file or dir, we need to call pinning/pins and pass the CID to it. This way, Estuary can create deals for those CIDs. The catch here is that barge should live long enough for Estuary to pull the blocks from barge. This is why we are building a CLI daemon for it.
      • Monitor progress
        • Listen to queues or REST API pins (http)
        • listen to topics for new messages OR just check DB via rest
    • Advance features

      • chunk add (upload)
      • Stream upload
      • Stream download
  • Gateway / webui

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Documentation - Data Prep / Deal Making

In order for our users to know how to use our tools, we need to define some tutorials for them. This GitHub issue outlines what I think we should have so we can approach all types of target users and give them the guidance they need to start using FDT.

There are two functional aspects that our tools solve: data prep and deal making.

Data Preparation

Data prep guidelines for Users

  • How to prepare data for delta
  • How to prepare data with ptolemy
  • How to prepare data using Edgeur
  • How to prepare data using Edgeurid
  • How to use Car gen tools (chunker, go-car)

Data prep guideline for SP

  • How to prepare data with ptolemy

Deal Making

Deal making for users

  • How to upload data to edgeurid
  • How to upload data to delta

Deal making for SPs

  • How to make deals with delta-dm
  • How to make deals with delta
  • How to make deals with edgeurid

Hybrid

  • How to prepare data with ptolemy and use delta to make deals
  • How to prepare data with ptolemy and use delta-dm to make deals
  • How to prepare data with edgeurid and use delta-dm to make deals
  • How to prepare data with edgeurid and use delta to make deals

Directory API

Proposal: Directory API

Author
Status Draft
Revision

This is a WIP

Proposal/Overview

Estuary currently doesn’t support uploading directories. In order to broaden the scope of our target users, we need to support both files and directories.

Solution

A directory in the IPFS sense is a collection rooted at a specific type of node (DirNode) that can be associated with different links (ChildNodes). The DirNode creates the initial underlying structure of the merkle dag, which the developer can then programmatically add links to in order to form a “directory” structure in merkle-dag format. This merkle dag is then stored in the Estuary node’s blockstore via the go-blockstore library.

Assumptions

Directories are similar to ipfs add -r: take a directory, recursively walk through it, and add each node as either a parent (dir) or a child (dir or file).
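
A minimal sketch in Go of building that DirNode/ChildNode structure with the go-unixfs and go-merkledag libraries; the in-memory DAG service and file contents are illustrative only.

    package main

    import (
        "context"
        "fmt"

        merkledag "github.com/ipfs/go-merkledag"
        dstest "github.com/ipfs/go-merkledag/test"
        unixfs "github.com/ipfs/go-unixfs"
    )

    func main() {
        ctx := context.Background()
        dserv := dstest.Mock() // stand-in for the estuary node's blockstore-backed DAG service

        // ChildNode: a single file's block.
        file := merkledag.NewRawNode([]byte("hello estuary"))
        if err := dserv.Add(ctx, file); err != nil {
            panic(err)
        }

        // DirNode: the root of the "directory" merkle dag.
        dir := unixfs.EmptyDirNode()
        if err := dir.AddNodeLink("hello.txt", file); err != nil { // attach the child link
            panic(err)
        }
        if err := dserv.Add(ctx, dir); err != nil {
            panic(err)
        }
        fmt.Println("directory CID:", dir.Cid())
    }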

Technical Design

Impacted components - estuary-www and estuary node

Upload directory using js-ipfs / estuary-www - we will need to include js-ipfs in estuary-www to allow the user to upload a directory via the browser with built-in IPFS. We then use js-ipfs to create the Dir/Child structure for Estuary.

Upload directory using an endpoint - introduce a new uploadDir endpoint that accepts a raw JSON file with the CIDs or metadata generated from the frontend

  • We can either create the CID for each file and directory
  • We can also base64-encode the files and pass them as-is to the uploadDir endpoint (limitation: size)

Endpoints

  • /directory
  • /directory/file
  • /directory/upload - pass a JSON object with the name and base64-encoded string of each file. Note that base64 encoding has a limit of 192MB, which is more than enough for most websites.

Testing

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Idea/Proposal: EDGE-URID

Edge-urid

image

Functional / features

  • Content creator
  • File aggregator
    • Aggregates small files (under BUCKET_AGG_SIZE) into a bucket
    • The last file to be added can exceed BUCKET_AGG_SIZE
    • Creates a CAR file for the bucket
  • File splitter (see the sketch after this list)
    • Splits a large file into chunks of SPLIT_SIZE (configurable)
    • Applies to files that are larger than BUCKET_AGG_SIZE.
    • The large file is associated with a new bucket, with each split added as a content
    • Creates a CAR file for the bucket
  • Bucket-creator
    • If the content has a miner, a bucket will be created for that miner and content. This bucket will need to be filled based on BUCKET_AGG_SIZE.
  • Deal-checker
    • If a bucket has already been replicated more than MAX_REP times, it can be deleted by the ADMIN
  • Gateway for serving content
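
A minimal sketch of the aggregate-or-split decision referenced above; the constant values and function name are assumptions, not EdgeURID code.

    package edgeurid

    const (
        bucketAggSize = int64(1 << 30)   // BUCKET_AGG_SIZE, e.g. 1GiB
        splitSize     = int64(512 << 20) // SPLIT_SIZE, e.g. 512MiB
    )

    // PlanIntake returns the sizes of the contents that will be created for a file:
    // small files go into an open bucket as-is; large files are split into chunks
    // that are added to a new bucket of their own.
    func PlanIntake(fileSize int64) []int64 {
        if fileSize <= bucketAggSize {
            return []int64{fileSize} // aggregated into a bucket
        }
        var chunks []int64
        for remaining := fileSize; remaining > 0; remaining -= splitSize {
            if remaining < splitSize {
                chunks = append(chunks, remaining)
                break
            }
            chunks = append(chunks, splitSize)
        }
        return chunks
    }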

System objects

  • Content
  • Content_Deals
  • Buckets

Endpoints

To serve file

  • gw/

To add a file

  • add
    • File, bucket_uuid (optional), miner (optional)
  • gw
    • File, bucket_uuid (optional), miner (optional)
  • fetch-from-url
    • File, bucket_uuid (optional), miner (optional)

To get buckets (pull model from SP)

  • get-available-open-buckets
    • Return list of CAR files and COMMP and deal request metadata
  • get-open-bucket
    • Return the CAR file and COMMP and deal request metadata
  • get-private-bucket (private)
    • Return list of CAR files and COMMP of each for a specific bucket

To manage buckets

  • create-bucket (admin or anyone)
    • create a bucket with specific meta/keys
    • bucket_uuid
    • miner
  • delete-buckets

To get status

  • status/content
  • status/bucket
  • status/cid

To get stats

  • stats
    • Number of content
    • Number of buckets
    • Number of deals attempted
    • Number of deals made

FEVM/FVM Development Docs and Tools Platform

Proposal: FEVM/FVM Development Docs and Tools Platform

Contributors @alvin-reyes
Status Draft
Revision  

The idea is to create a full e2e platform for educating developers on using FEVM with a pre-defined framework of tools.

  • Build different contract-creation tools and SDKs that wrap the creation and management of smart contracts. This includes incorporating Estuary and Filecoin into each of the SDKs.
  • Build examples using the SDKs built in bullet 1.
  • Create tutorials, how-to guides, and quick scaffolding examples for creating contracts.
  • Provide examples to get users and potential contributors to engage with us to build more content based on the tools we built.
  • Create an academically driven bootcamp using the tools we built.

Details of the design to follow.

Estuary on FEVM (estuary.sol)

Idea/Proposal: Estuary on FEVM

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal/Overview

Create an estuary.sol that wraps some functions of Estuary. This might not be the best approach, but it's still worth trying.

The design is to allow Solidity developers to pin CIDs and record their requests on chain via a standard Estuary abstract contract and an estuaryfevm-specific JS library that launches an external compute service to run the pinning / Estuary process.

Components:

  • estuary.sol
  • estuaryfevm.js
  • wasm whypfs-code (node on browser)

The clear assumption here is that we cannot pass a file to Solidity, but we can pass a CID that we pinned from a WHYPFS node.

HL steps of implementation:

  1. We create a compute unit (a docker image or a process) that can accept args and has a whypfs-node on it, to process requests.
  2. estuary.sol will have functions like pinToEstuary, makeDealsForCid, etc., but it'll only use the IPFS hash of the compute docker image to pass the CID.
  3. When a client calls a contract function (pinToEstuary(cid)), it calls the IPFS hash of the compute docker image, passes the CID to the compute docker image, and performs the "centralized service" to pin the CID. The IPFS instance can be a WASM-compiled component that runs in a browser (on the DAPP).
    // estuaryFevmRequest is our own interface/modifier for tagging these functions
    function pinToEstuary(string memory cid) public view estuaryFevmRequest returns (string memory) {
        // baseURI of the compute docker image
        return string.concat(baseURI, "?cidToPin=", cid);
    }

We will have to force the user to use estuaryfevm.js; the JS file checks the ABI to find the functions tagged estuaryFevmRequest.

  4. When the user calls the contract using estuaryfevm.js from a web app, it needs a whypfs instance in the browser. The compute docker image does the pinning and communication with Estuary, but the transaction itself is persisted on FEVM.

Collections V2

Proposal: Collections API V2

Author Outercore Engineering
Status In-Progress
Revision

This is WIP

Proposal/Overview

The collections API is seemingly used as a directory-upload mechanism rather than a grouping mechanism, and I think we need to create a distinction between the two use cases. A directory in the IPFS sense is a collection rooted at a specific type of node (DirNode) that can be associated with different links (ChildNodes). The DirNode creates the initial underlying structure of the merkle dag, which the developer can then programmatically add links to in order to form a “directory” structure in merkle-dag format. This merkle dag is then stored in the Estuary node’s blockstore via the go-blockstore library.

To this day, we’ve observed that most users of the collections API treat collections as directories and use them as such. That is clearly not what they are, since a collection won’t perform the same as a directory, primarily because the current version lacks the endpoints to manage a collection “as a” directory.

Solution

We need to have a distinction between the collections API and directories, and my proposal is to revamp the collections API as a tagging mechanism as opposed to a directory system. We should create a new API for directories, which will be similar to the ipfs add -r command and use DirNode for directories.

For collections, we should just group them at the database level (Postgres) and make sure they are retrievable via an endpoint. Creating a new merkledag for this approach is completely optional, since the source of “grouping” will be the database.

Assumptions

  • The Collections API will only live in the context of Estuary. The source of truth for the grouping is the Postgres database.
  • The Collections API will not be propagated to other storage provider services similar to Estuary, since it will be a feature unique to Estuary.

Technical Design

Descriptive

  • create a tag
    • insert a database entry into the collections table with type = tag and the tag column = tag name
  • add a content to a tag (see the sketch after this list)
    • get the tag name
    • query the database with the tag name
      • validate that it exists; if it doesn’t, create a new tag with that name
    • insert a database entry into the collections table with type = child and tag column = tag name
  • add a list to a tag (CIDs or content IDs)
    • get the tag name
    • query the database with the tag name
      • validate that it exists; if it doesn’t, create a new tag with that name
    • loop through the list
      • For each child
        • insert a database entry into the collections table with type = child and tag column = tag name
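
A minimal sketch of the add-content-to-a-tag flow using Gorm (already in the Estuary stack); the Collection model and column names below are assumptions from this proposal, not the real schema.

    package collections

    import "gorm.io/gorm"

    type Collection struct {
        ID        uint   `gorm:"primaryKey"`
        Type      string // "tag" or "child"
        Tag       string // tag name
        ContentID uint   // set when Type == "child"
    }

    // AddContentToTag validates that the tag exists (creating it if it doesn't),
    // then inserts a child row pointing at the content.
    func AddContentToTag(db *gorm.DB, tagName string, contentID uint) error {
        var tag Collection
        if err := db.Where(Collection{Type: "tag", Tag: tagName}).FirstOrCreate(&tag).Error; err != nil {
            return err
        }
        return db.Create(&Collection{Type: "child", Tag: tagName, ContentID: contentID}).Error
    }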

Endpoints

  • /collections/tag
  • /collections/tag/:tagname/:content - content
  • /collections/tag/:tagname/contents - list of content
  • /collections/tag/:tagname/cid - list of content
  • /collections/untag/:tagname/:content - content
  • /collections/untag/:tagname/contents - list of content
  • /collections/untag/:tagname/cid - list of content
  • /collections/commit/:tag - tag
  • /collections/download/:tag - tag
  • /collections/search/tag/:tag - search key words
  • /collections/search/ - search keywords

Testing

  • User scenarios
    • As a user I want to manage a tag
      • Create
      • Delete
      • Update
      • Rename
    • As a user, I want to add content
      • add a single content using estuary content id
      • add a list of content using estuary content id
      • add a single content using CID
      • add a list of content using collection of CID
    • As a user, I want to unpin content
      • unpin already added content
    • As a user, I want to commit a tag
    • As a user I want to search
      • Search by tag name
      • Search by content id
      • Search by CID
    • As a user I want to download
      • Download by tag name
      • Download by content id
      • Download by CID

Open Items

  • All transactions will include a database interaction and blockstore
  • We can introduce an uncommit with a given CID.

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Collections API Call (mike, gabe, lawrence, alvin)

  • we need directories first
  • add file manager (create folder structure on backend)
    • user can CRUD folders
    • maybe symlinks (good to have)
  • database mocks what user sees as a directory
  • background worker to create ipfs
  • tagging: use ipfs metadata tag

Estuary Stability

Estuary Stability

Author Alvin Reyes
Status In-progress
Revision

Overview

These are the things we need to accomplish to get Estuary to the alpha and post-alpha stage. I’d like to look at each as a pillar; each should be built and perfectly leveled to stabilize the Estuary platform.

This is all Tech. No productization / product lifecycle steps here.

Github Project: https://github.com/orgs/application-research/projects/7/views/5

Priority | Improvement | Description
1 | System Errors | All system errors that Estuary encounters
2 | Infrastructure | All the action items we need to stabilize the infrastructure, along with the code changes
3 | Data Clean-up | Any stale data we need to remove or clean up
4 | Debugging | All the action items we need to debug, or that give us more information on how to debug
5 | Functional | All functional / design / code that needs to be optimized and improved
6 | Support | All the action items I think we need to ensure we have proper customer support

System Errors (Panics)

Each on its own page. We need to handle all the panics.

Log file: log_file_from_shuttle6

  • msg":"couldnt decode pid
  • pinning queue error: context canceled\nfailed to walk DAG\nmain.
  • failed to handle rpc command: Unable to send restart request: exhausted 5 attempts but failed to open stream to
  • pinning queue error: context deadline exceeded\nfallback provide failed\nmain
  • tried to add pin for content we failed to pin previously
  • failed to handle rpc command: failed to compute commP
  • failed to handle rpc command

Infrastructure

  • Grafana agent on Ansible so we can offload log storage to Grafana. We save some space if we do so. We do have this on shuttles, but it's not enabled properly. https://filecoinproject.slack.com/archives/C016APFREQK/p1665703251824289
  • Install / enable agents on all shuttles - enabling these agents will ensure that logs are stored on Grafana only.
  • Document the release and deployment (https://www.notion.so/Estuary-Infrastructure-40ddc4cd518d478a81b76f5c0df1a276)
  • Troubleshooting guide for the infrastructure - I’ll be adding more information on this.
  • Back up and restore (enable data and blockstore backups) - I’d like to work with the infrastructure point of contact on this.
  • Infra improvement: Dockerize all components
  • Infra improvement: Create a simple kube cluster for POC
  • E2E Test Env: estuary + lotus + boost

Data Clean up

https://filecoinproject.slack.com/archives/C016APFREQK/p1660258369066179

  • Write an SQL script to remove the majority of the non-active pins (14M+ records) on shuttle-4, i.e., delete all non-active pins (see the sketch after this list).
  • The negative impact of removing these records is that if some of the pins are on the blockstore, then anyone who uses the /gw will fail to look up the CID, since the gateway relies on the database record.
  • There might be some failed pins that are yet to be processed by the SP, so there's lost opportunity there.
  • Another solution is a clean-up script on shuttle-4 that traverses the blockstore using the CIDs from the pins table, identifies those the shuttle can't "walk" - meaning they're in the database but not in the blockstore (using merkledag.Walk) - and deletes them from the database. It would be an "estuary shuttle reconciler" tool that matches the blockstore CIDs with the pins table.
  • Write scripts that can perform backups on specific filters.
  • Write an SQL script to delete the CIDs that don't exist in the blockstore of the local node.
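
A minimal sketch of the batched delete with Gorm; the table and column names are guesses based on this issue and must be checked against the real schema before running anything.

    package cleanup

    import "gorm.io/gorm"

    // DeleteNonActivePins removes pins that are neither active nor still pinning,
    // in batches so the delete doesn't hold locks across all 14M+ rows at once.
    func DeleteNonActivePins(db *gorm.DB, batchSize int) error {
        for {
            res := db.Exec(
                `DELETE FROM pins WHERE id IN (
                   SELECT id FROM pins
                   WHERE NOT active AND NOT pinning
                   LIMIT ?)`, batchSize)
            if res.Error != nil {
                return res.Error
            }
            if res.RowsAffected == 0 {
                return nil // nothing left to clean up
            }
        }
    }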

Debugging

  • Ensure developers have the proper debugging tools (GoLand).
  • Set up dedicated shuttles for each developer (for dev testing)
  • Enable pprof on all shuttles and api node
  • Enable grafana agents

Functional

  • Revisit the pinning mechanism
  • Revisit the queueing mechanism
    • I’d like to explore the possibility of separating the queuing from the main api node. We had discussions on this before and I would like to revisit.
  • Revisit all the infinite for-loops and check whether we need to introduce intervals or otherwise optimize them.


Functional Improvements

Proposal: Collections API V2

Proposal: Directory API

Proposal: API Versioning for Estuary

Proposal: Proxy-Forwarder

Proposal: API Gateway

Support

  • Customer Support Ticket System
  • IsEstuaryDown.com public monitoring tool

Refactor / Rearchitecture

  • Refactor code to its appropriate packages
  • Redesign

Estuary performance testing

Proposal: Estuary performance testing

Author Anjor
Status Draft
Revision

This is a WIP

Proposal/Overview

We should have metrics on Estuary's data-onboarding performance. We should be able to answer questions such as:

  • What is the data throughput? How does it scale with increasing data size? Is there a sweet spot?
  • What is the maximum size estuary can handle?

The current plan is to set up datasets in increasing sizes ranging from 1GB up to 1TB and measure data onboarding performance.

Technical Design

The performance testing will be carried out on an Equinix box. We will download public datasets ranging in size from 1GB up to 1TB and try uploading them to Estuary.
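
A minimal sketch in Go of one measurement: time a multipart upload against Estuary's documented /content/add endpoint and report throughput. The file path and token are placeholders.

    package main

    import (
        "fmt"
        "io"
        "log"
        "mime/multipart"
        "net/http"
        "os"
        "time"
    )

    func main() {
        f, err := os.Open("/data/dataset-1gb.bin") // placeholder test file
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        info, _ := f.Stat()

        // Stream the file as the "data" multipart field without buffering it in RAM.
        pr, pw := io.Pipe()
        mw := multipart.NewWriter(pw)
        go func() {
            part, _ := mw.CreateFormFile("data", info.Name())
            _, err := io.Copy(part, f)
            mw.Close()
            pw.CloseWithError(err)
        }()

        req, _ := http.NewRequest("POST", "https://upload.estuary.tech/content/add", pr)
        req.Header.Set("Authorization", "Bearer <EST-API-KEY>")
        req.Header.Set("Content-Type", mw.FormDataContentType())

        start := time.Now()
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        elapsed := time.Since(start)

        mib := float64(info.Size()) / (1 << 20)
        fmt.Printf("status=%s size=%.1fMiB time=%s throughput=%.1fMiB/s\n",
            resp.Status, mib, elapsed, mib/elapsed.Seconds())
    }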

Known problems

Files larger than 32GB might have issues. If the endpoint is unable to handle an upload, we will attempt different preparation tools such as barge and singularity.

Idea/Proposal: DataDAO

Proposal: DataDAO

Author Gabriel Cruz
Status Draft
Revision 0.0.1

Proposal/Overview

DataDAO is an organization that curates data stored in Filecoin, allowing its members to vote on the CIDs that should be part of the collection of curated data.

Note: heavily inspired by the idea and code snippets from https://aayushguptaji.hashnode.dev/how-to-build-your-first-datadao-factory-on-fvm

Why this is important

There has been increased demand for useful data on the Filecoin network. Allowing peers to vote on the CIDs that contain this type of data incentivizes higher-quality data on Filecoin.

Non-goals

We are not trying to ensure that the accepted CIDs actually have "useful" data (whatever the definition of "useful" is). We only allow for voting. It is up to the members of the organization to revise the contents of the proposals and vote wisely.

Design Overview

  1. Storage Provider (SP) creates a proposal to add a CID to DataDAO.
  2. DataDAO members vote on the proposal until it expires.
  3. Once expired, if upvotes > downvotes, the CID is added to the DataDAO.

Detailed Design

Another level of detail beyond the design overview, if needed.

Creating Proposal

SP will create a proposal using the following function

    function createCIDProposal(bytes calldata cidraw, uint size) public {
        proposalCount++;
        Proposal memory proposal = Proposal(proposalCount, msg.sender, cidraw, size, 0, 0, block.timestamp, block.timestamp + 1 hours);
        proposals[proposalCount] = proposal;
        cidSet[cidraw] = true;
        cidSizes[cidraw] = size;
    }

Voting on a Proposal

DataDAO members upvote or downvote the Proposal

    function voteCIDProposal(uint256 proposalID, bool upvote) public {
        require(proposals[proposalID].storageProvider != msg.sender, "Storage Provider cannot vote his own proposal");
        require(!hasVotedForProposal[msg.sender][proposalID], "Already Voted");
        require(!votingIsExpired(proposalID), "Voting Period Finished");

        if (upvote == true) {
            proposals[proposalID].upVoteCount = proposals[proposalID].upVoteCount + 1;
        } else {
            proposals[proposalID].downVoteCount = proposals[proposalID].downVoteCount + 1;
        }

        hasVotedForProposal[msg.sender][proposalID] = true;
    }

Add voted CID to DataDAO

TODO

Miscellaneous

Data and auxiliary variables

    contract DataDAO {
        uint64 constant public AUTHORIZE_MESSAGE_METHOD_NUM = 2643134072;
        // number of proposals currently in DAO
        uint256 public proposalCount;
        // mapping to check whether the cid is set for voting
        mapping(bytes => bool) public cidSet;
        // storing the size of the cid
        mapping(bytes => uint) public cidSizes;

        mapping(bytes => mapping(bytes => bool)) public cidProviders;

        // address of the owner of DataDAO
        address public immutable owner;

        struct Proposal {
            uint256 proposalID;
            address storageProvider;
            bytes cidraw;
            uint size;
            uint256 upVoteCount;
            uint256 downVoteCount;
            uint256 proposedAt;
            uint256 proposalExpireAt;
        }

        // mapping to keep track of proposals
        mapping(uint256 => Proposal) public proposals;

        // mapping to track whether the user has voted for the proposal
        mapping(address => mapping(uint256 => bool)) public hasVotedForProposal;

        /**
         * @dev constructor: to set the owner address
         */
        constructor(address _owner) {
            require(_owner != address(0), "invalid owner!");
            owner = _owner;
        }

        // the createCIDProposal, voteCIDProposal, and votingIsExpired functions
        // shown earlier in this proposal live inside this contract
    }

Check if voting time has expired

    function votingIsExpired(uint256 proposalID) view public returns(bool) {
       return proposals[proposalID].proposalExpireAt <= block.timestamp;
    }

application-research/estuary#880

Idea/Proposal: Tekton Data Pipeline Framework to Onboard Data

Overview

Once we have K8s installed on our EHI, we need to start looking into Data Onboarding Tools.

I propose the use of Tekton Data Pipeline (https://github.com/tektoncd/pipeline). This is essentially a task framework that leverages k8s service infrastructure to create ephemeral task runners in the form of pods/containers.

How it'll work.

image

*Queue is optional.

  • We'll set up a docker-compose / dockerfile for ptolemy, delta, and the downloader script, i.e., containerize them
  • Set up a Tekton task for each.
  • The downloader then assigns a batch to process to each ptolemy and delta instance.

Estuary Reputation System

Proposal: Estuary Reputation System

Author
Status Draft
Revision

This is a WIP

Problem Statement

Estuary currently selects SPs at random when making deals. We should build a reputation system that ranks/directs deals towards SPs that perform in a way that is advantageous for our network. We will use this issue to discuss the inputs/calculations for such a reputation system.

Currently, the most important metric we should be concerned with is retrieval performance.

Estuary currently does not provide any incentives for Storage Providers to serve up CIDs that we deal to them. This is problematic, as autoretrieve relies on SPs serving up content to work properly. Without retrievals working, it is risky to offload content from our shuttles, as it may result in unretrievable files.

Proposed Solution

  • Autoretrieve knows the count of successful/failed retrievals per SP, and we can track this data
  • Using these stats, we can come up with a retrieval-based reputation score and use it to influence how we make deals (@gmelodie has kicked us off below)
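
A minimal sketch of such a score in Go; the add-one smoothing is our own assumption so SPs with no history start near neutral rather than at 0 or 1.

    package reputation

    // RetrievalScore maps autoretrieve's per-SP success/failure counts to a
    // score in (0, 1); higher means more reliable retrievals.
    func RetrievalScore(successes, failures int64) float64 {
        // Add-one smoothing: an SP with no history scores 0.5.
        return float64(successes+1) / float64(successes+failures+2)
    }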

Data Storage Markets

Data Storage Markets API

Repo Link

Project Description

Storage Markets is an application that tracks storage providers in the Filecoin network and provides an interface to query their statistics. These statistics are useful when making deals with storage providers.

In addition to storage statistics, the Storage Markets system will be extended to track retrieval success/failure metrics via autoretrieve, so that a Storage Provider's retrieval performance can be assessed.

Reputation

Storage Markets will provide a lightweight reputation system where SPs are assigned a score based on their storage and retrieval performance.

Estuary FEVM Oracle Library and Oracle Execution Provider

Proposal: Estuary FEVM Oracle Library and Execution Provider

Contributors @alvin-reyes, @kelindi  
Status Draft
Revision  

I had a discussion with @kelindi on potentially creating an Oracle Service that will use Estuary as the execution provider.

Proposal

The idea is to create an Oracle library in Solidity that will run URL or service requests against Estuary.

Components:

  • Queues for individual jobs using nsq
  • Job node component for execution providers
  • Oracle.sol - generic importable contract for smart contracts to access different execution providers
  • EstuaryProvider.sol - Estuary-specific provider contract
  • Oracle samples

Details or the design to follow

Idea/Proposal: Permissions for Api Keys and Pre signed upload urls.

Idea/Proposal: Build permissions for Api Keys and Pre signed upload urls.

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Only give API keys the necessary permissions.

  • Read
  • Write
  • Read/Write
  • User-defined limits for certain actions (ex: "an API key limited to uploading only one file")
    • This could be used to implement a temporary pre-signed URL/API key for serverless uploads
    • The user requests to upload a file from the frontend -> the frontend receives a temporary API key from Estuary -> the frontend uses the temporary API key to upload the file directly to Estuary
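
A minimal sketch of scoped API keys as Go HTTP middleware; the scopes, key model, and lookup function are assumptions from this proposal, not Estuary's current auth model.

    package auth

    import "net/http"

    type Scope int

    const (
        Read Scope = 1 << iota
        Write
    )

    type APIKey struct {
        Token       string
        Scopes      Scope
        UploadsLeft int // e.g. 1 for a single-use pre-signed upload key, -1 for unlimited
    }

    // Require wraps a handler and rejects keys missing the needed scope or
    // whose user-defined upload limit is exhausted.
    func Require(need Scope, lookup func(token string) (*APIKey, bool), next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            key, ok := lookup(r.Header.Get("Authorization"))
            if !ok || key.Scopes&need == 0 {
                http.Error(w, "forbidden", http.StatusForbidden)
                return
            }
            if need&Write != 0 && key.UploadsLeft == 0 {
                http.Error(w, "upload limit reached", http.StatusForbidden)
                return
            }
            next.ServeHTTP(w, r)
        })
    }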

FVM HeatCheck

Scratching down some notes; will close later...

General Issues

@snissn - filecoin-project/lotus#9839
@snissn @alvin-reyes - filecoin-project/lotus#8865 (6 months stale), collaboration with @geoff-vball (thanks)

Add the multisig approve and propose commands, plus a feature to automatically encode the JSON string passed as parameters.

lingering questions

what are the implications of not having/storing this index by default? Does this mean that solidity/EVM developers would need to either run, or find, a node that supports proper ethereum transaction hashes? If a developer deploys a contract to a node that doesn't have this index, do they still get a proper eth transaction back? If so, can they not poll against the receipt on the same node? What about block explorers?

resolution PR: filecoin-project/lotus#9965

Specs

@jlogelin A trustless notary - application-research/estuary#877
@jlogelin Hot Storage Protocol - application-research/estuary#878
@jlogelin Automated Market Maker - application-research/estuary#879
@jlogelin DataDAO - application-research/estuary#880
@jlogelin @jcace @elijaharita - Data Persistence application-research/estuary#881

Solutioning

@alvin-reyes - ERC 721 Contracts - https://github.com/application-research/fevm-nft-estuary

Edge nodes for Upload and Retrieval

Proposal: Edge Upload and Retrieve

Contributors @alvin-reyes
Status Draft
Revision  

The idea is to create a node instance that does the following:

  • node to accept uploads and queue them to estuary for pinning
  • node to retrieve CIDs and serve them as a gateway

This will allow us to redirect uploads to different servers instead of putting them directly on the shuttles.

Development HL guide:

With whypfs-core, we can build several microservices that we see fit to scale estuary.

  • We can use whypfs-core to create the node, introduce upload endpoints, and pass the CIDs to the Estuary API node. Estuary will take care of pulling the CIDs from the upload nodes.
  • We can use whypfs-core and create a gateway on top of it to serve the CIDs from Estuary or any peers. This is similar to whypfs-gateway.
  • These nodes need to be peered for discoverability.
  • This is a single node with a "mode" parameter: either upload-and-retrieve, upload only, or retrieve only (see the sketch below).

image

  • will add more details later.
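
A minimal sketch of the mode-switched single binary; the handlers are stubs and the whypfs-core wiring is omitted, so the names here are assumptions of this proposal rather than real edge code.

    package main

    import (
        "flag"
        "log"
        "net/http"
    )

    func main() {
        mode := flag.String("mode", "upload-retrieve", "upload-retrieve | upload | retrieve")
        flag.Parse()

        mux := http.NewServeMux()
        if *mode == "upload" || *mode == "upload-retrieve" {
            // Accept uploads, then queue the CIDs to the Estuary API node for pinning.
            mux.HandleFunc("/add", func(w http.ResponseWriter, r *http.Request) {
                // store blocks locally via whypfs-core, then notify Estuary (omitted)
                w.WriteHeader(http.StatusAccepted)
            })
        }
        if *mode == "retrieve" || *mode == "upload-retrieve" {
            // Serve CIDs gateway-style from the local blockstore or peers.
            mux.HandleFunc("/gw/", func(w http.ResponseWriter, r *http.Request) {
                // resolve the CID through whypfs-core and stream it back (omitted)
                w.WriteHeader(http.StatusOK)
            })
        }
        log.Fatal(http.ListenAndServe(":1313", mux))
    }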

WHYPFS gateway cluster + Distributed Filesystem

Idea/Proposal: Estuary Gateway cluster based on WHYPFS + Distributed filesystem

Contributors @snissn  
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

NOTE: This draft is based on the proposal by @alvin-reyes here and is built off its formatting!

WHYPFS with FUSE + SeaweedFS to expose a distributed filesystem will allow for a very safe and scalable architecture for file storage, IPFS pinning, and file serving via HTTP.

  • We want to have users data be highly available and resilient against hardware failures or even geographic or data center failures.

  • Moreover, it is a disadvantage and confusing UX for a user to have to declare and dedicate themselves to a particular gateway.

  • whypfs + Distributed File System will allow us to create a distributed pinning system with resilience from data loss due to individual drive failures, individual server failures and also entire datacenter outages.

  • example: We have two data centers + 5 nodes per data center. Files uploaded to seaweedfs can be replicated 2 times per data center. So we can have 10 servers across 2 data centers with 4 copies of each piece of data to guard against hardware failure. We will be able to add nodes to scale and have each node serve as a pinning gateway.

  • SeaweedFS can be set up as a mount point on each node. This node would have whypfs-gateway installed on it and use flatfs with the distributed filesystem mount point as its flat filesystem data store.

  • the put / upload and delete APIs would be protected via a secure password that only the api node would have. but get and gw api endpoints would be fully public.

  • the api node can be moderately changed to rely on a highly available and very fast whypfs+seaweedfs cluster without the end user knowing anything about provisioning a gateway!

  • over time data can be deleted from the whypfs cluster with filecoin as a long term storage backup!

Detailed plan:

SeaWhypfs

Step 0. Edit this master plan document.

Step 1. Investigate how much disk is used by the IPFS pin cluster in production.

Step 2. Spec out how many servers we would want in production given the amount of disk we need to store, the room to grow before needing to add more nodes, and the redundancy we want in our data set. Identify what we will need, i.e., we will need 2x data centers and 5x servers for 10 servers total, with each server having xyz terabytes of disk with raidx redundancy.

Step 3. Code deploy scripts. Code up Ansible etc., the software required for deploying and managing a SeaweedFS + whypfs cluster.

Step 4. Set up a test cluster. Using Ansible, make a three-node cluster with 2x disk replication in SeaweedFS, put whypfs-gateway on each of the SeaweedFS nodes, load the nodes with at least 1TB using the whypfs API endpoints, and verify whypfs put and get work.

Step 5. Set up a full-scale cluster. Deploy the large-scale cluster for production.

Step 6. Back-fill. Clone Estuary’s pin data into SeaWhypfs.

Step 7. Change the Estuary code base to take advantage of the data lake. Change add-pin in Estuary to push to the data lake and use the cluster’s gw DNS for reads. Make sure the deal-making API also uses the new URL if it needs it.

EstuaryPinningContract on FEVM

Idea/Proposal: EstuaryPinningContract on FEVM

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal/Overview

Create an EstuaryPinningContract to allow users to request CIDs to be processed by Estuary.

image

Projects/Profiles for each user

Idea/Proposal: Projects/Profiles for each user

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Create a projects/profiles paradigm for the Estuary dashboard, letting users create projects and have separate dashboards and API keys for each project.

Proposed Implementation

Have a Primary user and a secondary user created for each project.

Feedback: Outercore's request for design feedback on: stFIL, SFT, Filet, and Glif

Who is reviewing...

Hi 👋 I go by Cake in the ecosystem-dev channel in the Filecoin Slack.

I am an engineer, designer, and I help manage a fund. Before joining Protocol Labs (where I have been for ~3.5 years), I was either a very early technical contributor or founding team member for 7 startups.

A few years back I designed and launched https://slate.host for consumers on Filecoin.

I obviously care a lot about Filecoin and financial services! Some of my close design peers have designed at Paradigm, Square Cash, Coinbase, DyDx, and Fey and I've had many great design conversations with them. So I'm happy to lend a critical eye on the user flows for the following websites:

In addition, I will try not to focus on the visual design quality of your sites unless it feels buggy. Instead, I have listed some high level guidance:

  • It should be obvious in 2023 to be using a grid system
  • It should be obvious to keep all measurements in a pt system
  • You should measure first-paint speed
  • Get a Lighthouse score over 95
  • Solve FOUT issues
  • All transitions should be at 60FPS; learn how to kick the compositor early in the browser's page load, before the user event occurs.
  • Font hierarchies are good, see this link: https://www.editorx.com/shaping-design/article/font-size
  • Check for typos
  • Don't leak user information in the client, such as user information that comes from a database
  • Optimize your SEO
  • Make sure that if the user has an ad blocker, your scripts are non-blocking, meaning the exception that's thrown doesn't cause the rest of the client-side JavaScript to crash.
  • Everything should be mobile responsive; design your components to be fluid from the start so you don't have to make a mobile version of everything.
  • Make sure things are vertically aligned.
  • Make sure you're using assets that are 2x-4x their normal size for retina screens. No one likes logos with bad antialiasing.

So let's move on to the feedback. Here are my notes from spending 10 minutes on each staking site.

stFIL

  • There is no way to give you my FIL easily from the marketing page.
    • Other sites have a big deposit CTA, why doesn't stFIL have one? Did I miss the button to connect my wallet through WalletConnect/RainbowKit/MetaMask/etc?
  • The loading interstitial at the beginning is unnecessary.
  • The scroll lock performs poorly, a couple of scrolls up and down the page are frustrating already.
  • The fancy background is cutting off in a strange place near some text. It makes the page feel glitchy, here is a screenshot:

Screenshot 2023-04-23 at 10 51 43 PM

  • Saying you are "Community First" without showing a single member of the community or anything they have said makes me feel like you have no community.
    • Maybe show some testimonials?
  • The security section should just list the audits and who they are by on the marketing page without having to click.
  • Whenever I click into a section, If I hit back, I have to see the loading interstitial again, I really do not like this loading interstitial.
  • Transition effect over the cards adds nothing to the importance of clicking the CTA.
  • Every page load I have to see the interstitial, I recommend you get rid of it.
  • Clicking on the Chinese translation doesn't do anything. Am I missing something?
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).
  • Clicking the protocol link is just a way to get the loading interstitial to appear again. That isn't fun.
  • The documentation site (https://docs.stfil.io/) has far more functional purpose than the marketing site (stfil.io), I feel like if you replaced your marketing site with the documentation site and added the join mailing list to your documentation site, you wouldn't need your marketing site until you had more to offer people visiting your marketing site.

Overall: Seems like a site for people who already know what they're supposed to do here; otherwise I have no idea what to actually do. I'll do something called "drop off", where I don't really see the point of being on this site. The "Subscribe to our mailing list" is the best part of your site; it's the only portion that converts the user in a meaningful way (it collects an e-mail).

Would stFIL convert me? No.

Filet

  • IMPORTANT: DAPP should say "Stake Filecoin" instead, no one knows what a button that says "DAPP" is supposed to do.
    • Improving the CTA will improve the conversion on this site.
  • Ledger and Metamask should be options for staking. A lot of western users probably do not have the wallets mentioned. In addition, everyone in the west is used to seeing WalletConnect or RainbowKit https://www.rainbowkit.com/. Both of those wallet integrations have good experiences.

Screenshot 2023-04-23 at 11 10 57 PM

  • "A trustworthy platform providing staking service stably" should say "We are providing a trustworthy Filecoin staking service"
    • Bonus points: Just show who trusts you (which companies, notable users, etc), it would be a better signal than the three boxes you have. People like to see people they trust using your service.
  • The StakeFIL to earn FIL section could easily go up higher on the page, this is what people actually care about.
  • The Media Partners section carousel is broken. Before you fix the carousel, consider not having one. Instead you could just render all of the media partners out into a grid. It will look better and the user doesn't have to click or tap to see more of the media partners.
    • Side note: The content inside of these posts is good! You should surface more of this content on the marketing page, it gives Filet more credibility.
  • The FAQ section "How does Filet Work?" has a lot of great content. Some of this content would do better as hooks on the marketing page so you can convert the user sooner.
  • Tell people what TVL means.
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).

Overall: The page loads fast and helps add to the feeling of professionalism. The website could use some copy improvements and more meaningful copy, I think if Ledger and Metamask are supported wallets, using Filet could become a popular option. Definitely need to fix the CTA so people understand what to click to stake.

Would Filet convert me? No.

Glif

Screenshot 2023-04-23 at 11 20 18 PM

Screenshot 2023-04-23 at 11 20 30 PM

  • Connecting a wallet looks easy.
  • I wouldn't open with "Genesis Pool" is now "Infinity Pool", I would open with marketing explaining what this all is sooner, and then explain the name change later.
  • "Trusted by the heart and soul of the Filecoin community" can be better served with a real example of how the community trusts GLIF.
    • Example: "Join the ten thousand users using GLIF tools today" something like this, with examples.
  • Page loads quick, the experience is straight to the point.
  • Would benefit a lot with some audit information
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).
  • More numbers explaining the potential returns (like Filet) could help.
  • Love the big deposit CTA.

Overall: Fast user conversion experience sets Glif's website apart from the rest of the staking sites I have seen so far. Seems like the goal is to get your FIL and they focused the website experience on getting your FIL. True to intent.

Would GLIF convert me? Yes.

SFT Protocol

  • The hero copy is worded awkwardly.

Liquid staking derivatives for Filecoin while ecological infrastructure provider

  • It should say something like

Ecologically friendly staking derivatives for Filecoin

  • I like the design, clear CTA (call to action), if I fail to convert, I'm taken (with proper visual hierarchy) to the next section where I can learn more.
  • "Latest News" SEO images look terrible and take away from the clean design of the site, I would just use custom ones.

Screenshot 2023-04-23 at 11 29 54 PM

  • You might want to bump the font weight on the primary CTA text, might make it a little more enticing to click.
  • On the Mint section, make sure you remind the user to connect their wallet, it doesn't hurt to show the connect button in the Mint section in case they don't see it in the top navigation.

Overall: Another website with a faster user-conversion experience (like Glif). The site could use less vertical space/height to make sure it doesn't lose users who miss content below the fold. The website interface also needs more testing in different viewports; some components that should obviously resize when the screen resizes are static.

Would SFT Protocol convert me? Yes.

Idea/Proposal: EV1 to use EDGE-URIDs

Proposal

I wanted to propose that we completely revamp EstuaryV1 frontend to use EdgeURID and the upcoming deal status oracle service.

Problem

EV1's frontend is the best frontend in the #ecosystem for uploading files to the Filecoin network. It has all the features a user needs to upload files to the Filecoin network seamlessly. Unfortunately, it's tightly coupled with the Estuary node, which can't handle the usage/demand. This is why we opted to decouple Estuary into microservices, which are now EV2, Delta, and Edge.

The lack of scalability options for the Estuary node made it difficult to make deals. To this day, a chunk of the upload requests from the Estuary frontend are not on the Filecoin network, only on the Estuary node, which serves the content as hot storage. This impacts the reliability of the service and lowers the number of deals made through the app.

Solution

I propose we decouple the Estuary frontend from its backend and use EdgeURID instead.

This means:

  • every content upload goes to one of the available EdgeURIDs.
  • zones/staging will be represented by an EdgeURID bucket.
  • deal status will be available via deal status oracle service.

We will need to do this in phases, prioritizing onboarding data to the Filecoin network.

Phase 1:

  • change the upload to use edgeurid.
  • display the bucket information where the content is located.
  • hide the deals page for now until we have the deal status oracle available.

End result of Phase 1:

  • content should be shown on the page and included in an aggregation bucket on a specific EdgeURID
  • we will no longer use shuttles once we've redirected all upload / gw traffic to EdgeURID
  • metrics should show the totals based on EdgeURID uploads.

Phase 2:
TBD

Idea/Proposal: Perpetual Storage Contracts

Proposal: Perpetual Storage Contracts

Author @gmelodie @elijaharita @jcace
Status Draft
Revision 0.0.1

Proposal/Overview

This document outlines a potential scheme for perpetual Filecoin storage contracts on the Filecoin Virtual Machine (FVM).

Background

Currently, Filecoin deals are limited in length to 540 days. While there is discussion about increasing this to up to 5 years, there still remains the situation where a Storage Client would like to store data for a much longer term, potentially several times the length of a single storage deal.

Benefits

A reference implementation / example for perpetual, auto-renewing storage deals would be a useful building block for others building on Filecoin and FVM

Goals

  • Outline, at a high level, how a perpetual storage contract on FVM could work
  • Call out certain areas of complexity / considerations that must be addressed for it to function
  • Link back to relevant code snippets that would be used for the contract

Design Overview

Use Lotus web3 client contract to make deals with storage provider

Construct DealProposal

Client Inputs

  • CID*
  • number of replicas
  • Initial balance
  • End epoch
  • Max. price
  • Fil+/Datacap

*Every parameter is configurable once the contract is deployed, except for the CID.

Detailed Design

Smart Contract Functions

Client-Side
  • Change Bounty
  • Change # of replicas
  • Suspend/cancel (stop renewals, refund balance)
SP-Side
  • Claim deal
  • Publish deal
  • Terminate deal

Functionality

Initial Replica

  1. Client deploys the contract using the initial params
  2. Client must "seed" the file, keeping it available and downloadable until first replicas have been sealed
  3. SPs call claimDeal() function, indicating they want to seal and store it, and receive the download information
  4. SP downloads the CID from the "seed" location
  5. SP seals the CID into a sector
  6. SP calls the publishDeal() function, indicating they have successfully on-boarded the data
  7. Contract tracks the expiry epoch of the deal, opens up another replica slot before it expires.

Once all replicas have been claimed, the claimDeal() function simply returns the next expected epoch when the soonest one will expire.

Subsequent Replica
After a deal expires, a slot is opened up. claimDeal() returns a list of all other SPs that the CID has been replicated to, for retrieval and the next deal.
SPs can call claimDeal() and the flow is the same as detailed in Initial Replica

Dependencies

  • Deal needs to have another SP specified as the source location for file transfer

Performance Implications

  • Gas costs for various transactions

Questions

  • How do providers find the address / methods of the deployed smart contracts with deals available?
  • What happens if the storage provider fails to seal after claiming the deal?
  • What happens if a deal gets slashed/lost? How do we ensure the contract is synchronized with the actual state of deals on chain?

Assumptions / Considerations

  • Once a perpetual deal is kicked off, the file has to be retrievable; the SP has to serve retrievals in the future.
  • The smart contract "owns" the storage deals and pays for them using its internal wallet and/or datacap
  • There needs to be enough of a window between one deal ending and the next one starting to allow a new SP to claim it. As it gets closer to the deadline, the smart contract could increase the bounty to incentivize providers to pick it up.

Mutable Naming for CIDS

Idea/Proposal: Mutable Naming for CIDS

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Have mutable references to CIDs. This could be through IPNS, a UUID stored in a DB, etc.
This feature would allow us to build out the following quality-of-life features.

  • A stable reference to a mutable CID, allowing users to host a static site and easily make changes without having to update DNS
  • Shorter gateway URLs for sharing content (ex: estuary.tech/X67HG) (this assumes we create a random UUID for the CID)

Proposed Implementation

I have tinkered with IPNS this quarter, and here are the issues I came across.

  • Each record needs to have a public and private key. If we are managing this on behalf of the user, it's simpler to store a UUID associated with the CID
  • If users are to manage the public and private keys themselves, they need to create a new account on their wallet every time they want a mutable CID

This is why my personal preference is to create a new table in the database that stores a UUID with a reference to a CID that can be changed by the user; a sketch of that table follows. I'm interested in hearing other suggestions on how we can implement this feature, whether through IPNS or my proposed implementation.
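
A minimal sketch of that table using Gorm, which is already part of the Estuary stack; the model and column names are assumptions for illustration:

```go
package mutablerefs

import "gorm.io/gorm"

// MutableReference is a hypothetical model for the proposed table: a short,
// stable UUID that points at a CID the owner can repoint at any time.
type MutableReference struct {
	gorm.Model
	UUID   string `gorm:"uniqueIndex"` // id used in gateway URLs, e.g. estuary.tech/X67HG
	CID    string // current target, updated in place by the owner
	UserID uint   // owning Estuary user
}

// UpdateTarget repoints an existing reference at a new CID.
func UpdateTarget(db *gorm.DB, uuid, newCID string, userID uint) error {
	return db.Model(&MutableReference{}).
		Where("uuid = ? AND user_id = ?", uuid, userID).
		Update("cid", newCID).Error
}
```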

Estuary Metrics

Metrics Tracking and Metrics API

Author Alvin Reyes
Status Completed
Revision
Github Repo https://github.com/application-research/estuary-metrics
Grafana https://protocollabs.grafana.net/d/0-0ztE97z/estuary-team-metrics-dashboard?orgId=1&from=now%2Ffy&to=now%2Ffy
Github Issue: application-research/estuary#283

Overview

The purpose of this document is to create a specification of the Estuary Metrics API.

Purpose

For any consumer to monitor Estuary, be it their own node or the Outercore-hosted Estuary, there needs to be a way to monitor and consume the different functional metrics that Estuary provides.

As of today, there are only two ways to monitor metrics for Estuary:

1 - through Grafana

2 - through the public/stats endpoint

Solution: Grafana

Started working on this: https://protocollabs.grafana.net/d/0-0ztE97z/estuary-team-metrics-dashboard?orgId=1&from=now%2Ffy&to=now%2Ffy

Solution: Estuary Metrics API


Tech Components

  • Go
  • Grafana
  • Gorm
  • Cacher
  • Mux
  • PQ
  • IPFS

Use cases

For system metrics, in addition to aggregates, we also want a breakdown by shuttle / primary node.

System

  • Total objects pinned (Query) select count(*) from contents where pinning
  • Total TiBs uploaded (Query) select sum(size) from objects
  • Total TiBs sealed data on Filecoin (Query) select sum(size) from contents where pinning and active
  • Available free space (custom Grafana plugin)
  • Total space capacity (custom Grafana plugin)
  • Downtime (this is usually notoriously difficult to define) (custom Grafana plugin)
  • Performance (this needs to be fleshed out)

Users

  • Total number of Storage Providers (Query) select count(*) from storage_miners
  • #12
  • Ongoing user activity (DAUs, WAUs, MAUs, etc.): are users coming back? (custom Grafana plugin). We would need to build a tracking system for this, i.e. a persistent layer for tracking

For Storage/Retrieval deal metrics, in addition to aggregates, we also want the following breakdowns (see the query sketch after this list):

  • per day breakdown (Query)
  • per week breakdown (Query)
  • per provider breakdown (Query)
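
As a sketch of what one such breakdown could look like, here is a hedged per-day query using Gorm; the content_deals table and its columns are assumptions based on Estuary's schema, not a confirmed API:

```go
package metrics

import (
	"time"

	"gorm.io/gorm"
)

// DealsPerDay is one row of the assumed per-day breakdown.
type DealsPerDay struct {
	Day   time.Time
	Deals int64
}

// dealsPerDay groups deal rows by calendar day in Postgres.
func dealsPerDay(db *gorm.DB) ([]DealsPerDay, error) {
	var rows []DealsPerDay
	err := db.Raw(`
		SELECT date_trunc('day', created_at) AS day, count(*) AS deals
		FROM content_deals
		GROUP BY 1
		ORDER BY 1`).Scan(&rows).Error
	return rows, err
}
```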

Storage

  • Storage Deal Success Rate (Success % / All Deals)
  • Storage Deal Acceptance Rate (Success % / Accepted Deals)
    • Total number of storage deals proposed (Total Deals / Proposed)
    • Total number of storage deal proposals accepted (Total Deals / Accepted Deals)
    • Total number of storage deal proposals rejected (Total Deals / Rejected Deals)
  • Total number of storage deals attempted
    • Total number of successful deals
    • Total number of failed deals
  • Distribution of data size uploaded per user
  • Performance metrics
    • Time to a successful deal
      • how does that scale with data size?

Retrieval

  • Retrieval Deal Success Rate
  • Retrieval Deal Acceptance Rate
    • Total number of retrieval deals proposed
    • Total number of retrieval deal proposals accepted
    • Total number of retrieval deal proposals rejected
  • Total number of retrieval deals attempted (per day and per week breakdown)
    • total number of successful retrievals
    • total number of failed retrievals
  • Deals Failed Because Of Undialable Miners
  • Time To First Byte (retrieval deals)

Implementation

https://github.com/application-research/estuary-metrics

WHYPFS dedicated gateway provisioning and subscription

Idea/Proposal: Dedicated Estuary Gateway provisioning and subscription

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we have all agreed on the approach.

We need to allow users to provision their own dedicated gateway so they can interact with it directly for their content.


1 - The user needs to subscribe to a gateway. We will need to ask for the following parameters:

  • Name (for domain name)
  • Storage Size
  • Payment based on storage size + service

2 - We need to develop a wizard-like page to collect gateway information.

  • The user clicks on "Request dedicated gateway". This launches a step-by-step wizard to collect information from the user.
  • Information: gateway name, storage size, payment method
  • Payment method: if the user wants to pay in FIL, we use the FEVM deposit contract; otherwise, Stripe or PayPal

3 - We need to develop a page for each user to navigate and manage their gateway(s).

  • List view of all created gateways
  • Dedicated page for a specific gateway

4 - middleware code

  • The user needs an API key to access each gateway; a sketch of the auth check follows this list.
  • The user uploads their files to their dedicated gateway, and we need authentication so that each gateway only serves content the user uploaded to that specific gateway
  • A license file is generated for the user with all the META of the subscription.
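
A minimal sketch of that per-gateway check as net/http middleware; the header convention and how the key is stored are assumptions:

```go
package gateway

import "net/http"

// apiKeyAuth only serves requests carrying the key issued for this gateway.
func apiKeyAuth(gatewayKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+gatewayKey {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```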

5 - backend

  • dockerized components
  • Ansible scripts to provision the gateway with the resources defined in the docker-compose YAML file. The YAML includes the subscription META, resources, storage, domain name, certificate generation, the server, and the WHYPFS gateway; a minimal template sketch follows.
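
A minimal Go sketch of rendering that per-subscriber compose file from the subscription META using text/template; the field names and the gateway image name are assumptions, not the actual provisioning code:

```go
package main

import (
	"os"
	"text/template"
)

// GatewaySubscription carries the META baked into each provisioned compose file.
type GatewaySubscription struct {
	Name       string // used for the domain name, e.g. alice.estuary.tech
	StorageGiB int
}

var composeTmpl = template.Must(template.New("compose").Parse(`services:
  gateway:
    image: whypfs-gateway:latest
    environment:
      - GATEWAY_NAME={{.Name}}
      - STORAGE_LIMIT_GIB={{.StorageGiB}}
`))

func main() {
	// Render the per-subscriber docker-compose that Ansible would deploy.
	sub := GatewaySubscription{Name: "alice", StorageGiB: 100}
	composeTmpl.Execute(os.Stdout, sub)
}
```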
