
outercore-eng-kb's Introduction

Engineering

Engineering, a team at Protocol Labs.

UI Checklist

Logo: ꧁𓀨꧂
Website: https://arg.protocol.ai
Projects: https://github.com/application-research

Goals

  • Obtain and nurture developer adoption of tools that benefit our ecosystem.
  • Create prototypes that people would actually use.
  • High-frequency shipping of upgrades, new templates, prototypes, and full applications.
  • Do qualitative and quantitative research with users when an idea catches adoption.
  • Share all source code, process, and ideas with the greater ecosystem.

Live assets

Repositories

Technology Knowledge Base

Reference Architectures and Designs

FVM Website Hero Animation

CryptoComputeLab

Estuary Storage Provider Feedback

Estuary Client Considerations

Storage Product Intelligence

Estuary Infrastructure

Estuary Stability

Metrics Tracking and Metrics API

Estuary - Auto Retrieve

Proposal: Collections API V2

Proposal: Directory API

Proposal: API Versioning for Estuary

Proposal: Proxy-Forwarder

Proposal: API Gateway

Proposal: CM and AR Component Separation

Proposal: EstuaryFS

Proposal: EstuaryV2 / WhyPFS

Proposal: Estuary CLI

Proposal: Estuary Desktop App

Proposal: Estuary Mobile App

Proposal: Estuary Browser Extension

Proposal: Estuary sidecar

outercore-eng-kb's People

Contributors

10d9e, alvin-reyes

outercore-eng-kb's Issues

Estuary Messaging Queue for Events

Proposal: Estuary Messaging Queue for Events

Contributors @en0ma, @alvin-reyes
Status Draft
Revision

Proposal/Overview

Replace the current inter-process communication system (WebSocket) with a queue (Kafka, ActiveMQ, or another queuing system).
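
A minimal sketch in Go of the queue-based replacement, using NSQ via the github.com/nsqio/go-nsq client (NSQ is also what EstuaryV2 plans to use); the topic name and event shape are made up for the example.

    package main

    import (
        "encoding/json"
        "log"

        nsq "github.com/nsqio/go-nsq"
    )

    // Event is a hypothetical payload for shuttle-to-API-node notifications.
    type Event struct {
        Kind string `json:"kind"` // e.g. "pin-complete"
        CID  string `json:"cid"`
    }

    func main() {
        cfg := nsq.NewConfig()

        // Producer side: a shuttle publishes events instead of pushing them over a WebSocket.
        producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
        if err != nil {
            log.Fatal(err)
        }
        body, _ := json.Marshal(Event{Kind: "pin-complete", CID: "bafy..."})
        if err := producer.Publish("estuary-events", body); err != nil {
            log.Fatal(err)
        }

        // Consumer side: the API node subscribes to the same topic.
        consumer, err := nsq.NewConsumer("estuary-events", "api-node", cfg)
        if err != nil {
            log.Fatal(err)
        }
        consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
            var ev Event
            if err := json.Unmarshal(m.Body, &ev); err != nil {
                return err // returning an error requeues the message
            }
            log.Printf("event %s for %s", ev.Kind, ev.CID)
            return nil
        }))
        if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
            log.Fatal(err)
        }
        select {} // block so the consumer keeps running
    }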

EstuaryV2

Proposal: EstuaryV2 / WhyPFS

Contributors @alvin-reyes
Status Draft
Revision

Proposal/Overview

A single-node model of Estuary built from different importable microservices.

image

Approach

  • We're going to use Go Micro, a microservice framework for Go applications.
  • gRPC as the communication protocol between services
  • whypfs-core for the p2p node
  • Postgres for the database
  • NSQ for queueing

Breakdown of the services

Core services

1 - database
2 - blockstore node
3 - gateway
4 - storage-deal-making
5 - retrieval-deal-making

Outer services

1 - authentication
2 - logging
3 - pinning
4 - collections
5 - staging-buckets
6 - queue

Definition of done:

1 - dockerfile
2 - docker-compose.yml


WhyPFS Node

  • WhyPFS-Core
    • We should let this IPFS node take care of get, put, and managing the data store
    • This will also take care of the peering and libp2p host configuration
  • Storage Provider Functionality
    • Miner and Miner Selection
    • Filecoin Deal (filclient)
  • Users Functionality
    • Users, Admin
    • Upload, Download / Retrieval
    • Collections
    • File/Directory upload
    • All existing Estuary V1 endpoints will be available
  • Content Manager Queue
    • Redesign the bucket functionality that stages the CIDs using boltDB
    • The Content Manager queue checks each bucket of CIDs
    • Buckets are created and a garbage collector removes empty or processed buckets
    • Commitment Piece computation
    • Miner selection and deal maker
  • Auto Retrieve Queue
  • Gateway
  • Tooling
    • S3 driver
    • Barge?
  • WebUI - file manager / metrics

The new Estuary node will be a single-node model where an actor can be an API node, a Core node, or both (API and Core node).

  • API nodes will have a gateway and a web UI.
  • Core nodes can process contents/CIDs and push them to Filecoin. They will also come with a web UI to manage the shuttle.

Technical Design

WhyPFS Core

WhyPFS core is the core node that will be used to peer with other WhyPFS nodes. It is an importable module with the IPLD DAG service built in. It’s also the main component that manages the blockstore and datastore.

WhyPFS Node

  • Holds all the functionality from Estuary
    • Content and Directories
    • Collections
    • Stats and Metrics
    • WebUI / Explorer
    • Storage Providers
    • Content Manager using WhyPFS Queue
      • Deal Making using filclient

WhyPFS Queue

This is a batch job framework built on the WhyPFS node. It will run several pre-configured or custom-configured jobs within the lifespan of the running node, including the Content Management Queue, pinning requests / auto-pinning, and garbage collection.
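
A minimal sketch in Go of what such a job runner could look like; the Job interface and names below are hypothetical, not the actual WhyPFS API.

    package queue

    import (
        "context"
        "log"
        "time"
    )

    // Job is one pre-configured or custom-configured task that runs for the
    // lifespan of the node (e.g. content-manager queue, auto-pinning, GC).
    type Job interface {
        Name() string
        Interval() time.Duration
        Run(ctx context.Context) error
    }

    // RunJobs starts every job on its own ticker and stops them all when ctx is done.
    func RunJobs(ctx context.Context, jobs ...Job) {
        for _, j := range jobs {
            go func(j Job) {
                t := time.NewTicker(j.Interval())
                defer t.Stop()
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case <-t.C:
                        if err := j.Run(ctx); err != nil {
                            log.Printf("job %s: %v", j.Name(), err)
                        }
                    }
                }
            }(j)
        }
    }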

Custom Queue Job: Bucket / Queuing Jobs for Processing Content and Deals.

image

There will be two types of buckets:

  • Dedicated Bucket
    • Users can create their own buckets and set the number of files each can hold. When the user chooses a bucket, the CID is associated with that bucket. The bucket is then processed by the global timer (blocktime).
  • Global Bucket
    • The global bucket is for CIDs that don’t have a bucket defined at upload time.

Bucket Components

  • Buckets Creator
    • creates bucket objects
      • Note: this is a bucket table with specs
        • Threshold size
        • Schedule (run)
    • checks for unassigned CIDs
    • If there are unassigned CIDs, creates a new bucket
    • Assigns CIDs to bucket objects
    • Notes:
      • We will need a table to store the bucket information
      • Content will have a new column “bucket-uuid” to indicate which bucket the content is assigned to.
  • Bucket Processor (see the sketch after this list)
    • checks buckets in init status
    • Checks whether a bucket’s CIDs are over the size threshold or the bucket is more than 2 days old.
    • If either condition is met, submits the bucket for deal creation
    • Creates deals for each content in the bucket. Looks up the CID.
    • Sets the status of each content (6 deals)
    • Sets the status of the bucket to complete
  • Bucket Checker
    • checks existing completed buckets to ensure that all CIDs have deals (6 deals each)
    • Creates deals if any of the CIDs don’t have deals yet.
  • Bucket GC
    • Cleans up buckets that are more than 3 months old
    • Cleans up CIDs that are more than 3 months old and still failing
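
A minimal sketch of the Bucket Processor check in Go; the Bucket fields and the 4GiB threshold are assumptions taken from this proposal, not real Estuary code.

    package queue

    import "time"

    type Bucket struct {
        Status    string // "init", "processing", "complete"
        TotalSize int64  // sum of the sizes of the CIDs assigned to the bucket
        CreatedAt time.Time
    }

    // sizeThreshold is hypothetical; the proposal leaves the real value to the bucket spec.
    const sizeThreshold = int64(4 << 30) // 4 GiB

    // ReadyForDeal returns true when a bucket in init status has either crossed
    // the size threshold or is more than 2 days old, per the Bucket Processor rules.
    func ReadyForDeal(b Bucket) bool {
        if b.Status != "init" {
            return false
        }
        return b.TotalSize >= sizeThreshold || time.Since(b.CreatedAt) > 48*time.Hour
    }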

WhyPFS WebUI

This is the WebUI for WhyPFS. It will be a dashboard of stats for each node. The stats include:

  • CIDs
  • Uptime
  • Performance
  • Peers
  • File Manager

WhyPFS - Service

Proxifier - load balancer layer

Authorization - decoupled authorization that nodes can opt in to.

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

[Idea/Proposal]Commp Standalone

Problem

EdgeURID currently does aggregation and commp, and these two functional aspects of the system are resource-heavy processes.

To explain why these are resource-heavy:

  • Aggregation is the functional aspect of EdgeURID that groups small files into large collections. It uses an abstraction called buckets to collect the files, aggregate them, and generate a CAR for all of them.
  • Commp is the process of generating piece information, the main unit of negotiation for data that users store on the Filecoin network. Generating a commp requires generating a proof, which can consume significant RAM relative to the size of the CAR file.

When too much aggregation and commp is done in parallel, EdgeURID demands more resources and, if it doesn't get them, terminates on its own (OOM).

Stats/ Metrics

  • A 64GB-RAM Linux box seems to accommodate only about 5 parallel commp runs (at 4GB to 6GB CAR size)
  • *TBA

Solution

My proposal is to separate the commp from the aggregator. We can create a commp node which can PULL CAR files from a given edge node, run the piece-commitment logic, and return the piece information.

image
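
A minimal sketch in Go of such a standalone commp worker, using the go-fil-commp-hashhash and go-fil-commcid libraries; the edge-node URL is a placeholder, and pulling the CAR over plain HTTP is just this proposal's assumption.

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"

        commcid "github.com/filecoin-project/go-fil-commcid"
        commp "github.com/filecoin-project/go-fil-commp-hashhash"
    )

    func main() {
        // PULL the CAR file from a given edge node (placeholder URL).
        resp, err := http.Get("http://edge-node:1313/gw/<car-cid>")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Stream the CAR bytes through the commp hasher instead of buffering
        // the whole file, keeping memory use on the worker modest.
        calc := new(commp.Calc)
        if _, err := io.Copy(calc, resp.Body); err != nil {
            log.Fatal(err)
        }
        rawCommP, paddedSize, err := calc.Digest()
        if err != nil {
            log.Fatal(err)
        }

        // Return the piece information (here just printed).
        pieceCid, err := commcid.DataCommitmentV1ToCID(rawCommP)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("piece CID:", pieceCid, "padded piece size:", paddedSize)
    }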

Estuary Barge V2

Proposal: Estuary Barge V2

Author Alvin Reyes
Status Draft
Revision

This is a WIP

Proposal/Overview

With the birth of whypfs-core, we should revisit building a CLI tool for uploading large sets of data, either by creating a new tool or by building a new version of Estuary Barge.

Solution

Barge has a simple requirement: allow any CLI user to pipe or stream-upload data (files, CARs, and directories) from a dedicated node to Estuary. With whypfs-core, we can use the core node to rebuild the servicing functions (upload and download) while retaining the same libp2p identity, blockstore, and data store, something that Barge V1 fails to retain.

Technical Design

  • re-introduce the following:
  • node creation using whypfs-core - strip the node initialisation and use whypfs-core instead
  • Plumb - upload files, CARs, and directories - store the CID on the local blockstore and call Estuary /pinning/pins, passing the CIDs. Use the current local node as “origins” so Estuary can pull the data from the local node (see the sketch after this list).
  • Instead of a terminating CLI, barge should run as a daemon in the background that the user can call from another terminal session to run commands.
  • Using whypfs-core will allow the user to reuse the same libp2p key, blockstore, and data store if the daemon gets terminated.
  • Introduce several methods of uploading and downloading data from Estuary or any peered IPFS node.
  • Support retrieval from the AR server. This will be inherited from whypfs-core as part of the bootstrap peering.
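
A minimal sketch of the handoff mentioned above: after adding a file to the local whypfs blockstore, call Estuary's /pinning/pins (the IPFS Pinning Service API shape) and advertise the local node as an origin so Estuary can pull the blocks. The token, CID, and multiaddr values are placeholders.

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "net/http"
    )

    type pinRequest struct {
        CID     string   `json:"cid"`
        Name    string   `json:"name"`
        Origins []string `json:"origins"`
    }

    func main() {
        body, _ := json.Marshal(pinRequest{
            CID:  "bafy...",
            Name: "my-upload",
            Origins: []string{
                // multiaddr of this barge daemon, so Estuary pulls from us
                "/ip4/203.0.113.7/tcp/6745/p2p/12D3Koo...",
            },
        })
        req, _ := http.NewRequest("POST", "https://api.estuary.tech/pinning/pins", bytes.NewReader(body))
        req.Header.Set("Authorization", "Bearer <EST-API-KEY>")
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Println("pin request status:", resp.Status)
    }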

Breakdown of task / features

Barge

  • Cli daemon
    • Whypfs-core
      • In memory data store
      • Persistent libp2p key
      • Persistent blockstore
    • Whypfs-core peered to estuary api and shuttles
    • Whypfs-core dag service to add files and dirs
  • cli
    • API calls to estuary (pinning pins)

    • Features

      • Add file (stream)
        • File
        • Car
      • Add directory
      • Every time we add a file or dir, we need to call pinning/pins and pass the CID to it. This way, Estuary can create deals for those CIDs. The catch here is that barge should live long enough for Estuary to pull the blocks from barge. This is why we are building a CLI daemon for it.
      • Monitor progress
        • Listen to queues or REST API pins (http)
        • listen to topics for new messages OR just check DB via rest
    • Advance features

      • chunk add (upload)
      • Stream upload
      • Stream download
  • Gateway / webui

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Documentation - Data Prep / Deal Making

In order for our users to know how to use our tools, we need to define some tutorials for them. This GitHub issue outlines what I think we should have so we can approach all types of target users and give them the guidance they need to start using FDT.

There are two functional aspects that our tools solve: data prep and deal making.

Data Preparation

Data prep guidelines for Users

  • How to prepare data for delta
  • How to prepare data with ptolemy
  • How to prepare data using Edgeur
  • How to prepare data using Edgeurid
  • How to use Car gen tools (chunker, go-car)

Data prep guideline for SP

  • How to prepare data with ptolemy

Deal Making

Deal making for users

  • How to upload data to edgeurid
  • How to upload data to delta

Deal making for SPs

  • How to make deals with delta-dm
  • How to make deals with delta
  • How to make deals with edgeurid

Hybrid

  • How to prepare data with ptolemy and use delta to make deals
  • How to prepare data with ptolemy and use delta-dm to make deals
  • How to prepare data with edgeurid and use delta-dm to make deals
  • How to prepare data with edgeurid and use delta to make deals

Directory API

Proposal: Directory API

Author
Status Draft
Revision

This is a WIP

Proposal/Overview

Estuary currently doesn’t support uploading directories. In order to broaden the scope of our target users, we need to support both files and directories.

Solution

A directory in the IPFS sense is a collection rooted at a specific type of node (DirNode) that can be associated with different links (ChildNodes). The DirNode creates the initial underlying structure of the merkle dag, which the developer can then programmatically add links to in order to form a “directory” structure in merkle-dag format. This merkle dag is then stored in the Estuary node’s blockstore via the go-blockstore library.

Assumptions

Directories are similar to ipfs add -r: take a directory, recursively walk through it, and add each node as either a parent (dir) or a child (dir or file).
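
A minimal sketch in Go of building that DirNode/ChildNode structure with the go-unixfs and go-merkledag libraries; the in-memory DAG service and file contents are illustrative only.

    package main

    import (
        "context"
        "fmt"

        merkledag "github.com/ipfs/go-merkledag"
        dstest "github.com/ipfs/go-merkledag/test"
        unixfs "github.com/ipfs/go-unixfs"
    )

    func main() {
        ctx := context.Background()
        dserv := dstest.Mock() // stand-in for the estuary node's blockstore-backed DAG service

        // ChildNode: a single file's block.
        file := merkledag.NewRawNode([]byte("hello estuary"))
        if err := dserv.Add(ctx, file); err != nil {
            panic(err)
        }

        // DirNode: the root of the "directory" merkle dag.
        dir := unixfs.EmptyDirNode()
        if err := dir.AddNodeLink("hello.txt", file); err != nil { // attach the child link
            panic(err)
        }
        if err := dserv.Add(ctx, dir); err != nil {
            panic(err)
        }
        fmt.Println("directory CID:", dir.Cid())
    }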

Technical Design

Impacted components - estuary-www and estuary node

Upload directory using js-ipfs / estuary-www - we will need to include js-ipfs in estuary-www to allow the user to upload a directory via the browser with built-in IPFS. We then use js-ipfs to create the Dir/Child structure for Estuary.

Upload directory using an endpoint - introduce a new uploadDir endpoint that accepts a raw JSON file with the CIDs or metadata generated from the frontend

  • We can either create the CID for each file and directory
  • We can also base64-encode the files and pass them as-is to the uploadDir endpoint (limitation: size)

Endpoints

  • /directory
  • /directory/file
  • /directory/upload - pass a JSON object with the name and base64-encoded string of each file. Note that base64 encoding has a limit of 192MB, which is more than enough for most websites.

Testing

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Idea/Proposal: EDGE-URID

Edge-urid

image

Functional / features

  • Content creator
  • File aggregator
    • Aggregates small files (under BUCKET_AGG_SIZE) into a bucket
    • The last file to be added can exceed BUCKET_AGG_SIZE
    • Creates a CAR file for the bucket
  • File splitter (see the sketch after this list)
    • Splits a large file into chunks of SPLIT_SIZE (configurable)
    • Applies to files that are larger than BUCKET_AGG_SIZE.
    • The large file is associated with a new bucket, with each split added as a content
    • Creates a CAR file for the bucket
  • Bucket-creator
    • If the content has a miner, a bucket will be created for that miner and content. This bucket will need to be filled based on BUCKET_AGG_SIZE.
  • Deal-checker
    • If a bucket has already been replicated more than MAX_REP times, it can be deleted by the ADMIN
  • Gateway for serving content
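
A minimal sketch of the aggregate-or-split decision referenced above; the constant values and function name are assumptions, not EdgeURID code.

    package edgeurid

    const (
        bucketAggSize = int64(1 << 30)   // BUCKET_AGG_SIZE, e.g. 1GiB
        splitSize     = int64(512 << 20) // SPLIT_SIZE, e.g. 512MiB
    )

    // PlanIntake returns the sizes of the contents that will be created for a file:
    // small files go into an open bucket as-is; large files are split into chunks
    // that are added to a new bucket of their own.
    func PlanIntake(fileSize int64) []int64 {
        if fileSize <= bucketAggSize {
            return []int64{fileSize} // aggregated into a bucket
        }
        var chunks []int64
        for remaining := fileSize; remaining > 0; remaining -= splitSize {
            if remaining < splitSize {
                chunks = append(chunks, remaining)
                break
            }
            chunks = append(chunks, splitSize)
        }
        return chunks
    }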

System objects

  • Content
  • Content_Deals
  • Buckets

Endpoints

To serve file

  • gw/

To add a file

  • add
    • File, bucket_uuid (optional), miner (optional)
  • gw
    • File, bucket_uuid (optional), miner (optional)
  • fetch-from-url
    • File, bucket_uuid (optional), miner (optional)

To get buckets (pull model from SP)

  • get-available-open-buckets
    • Return list of CAR files and COMMP and deal request metadata
  • get-open-bucket
    • Return the CAR file and COMMP and deal request metadata
  • get-private-bucket (private)
    • Return list of CAR files and COMMP of each for a specific bucket

To manage buckets

  • create-bucket (admin or anyone)
    • create a bucket with specific meta/keys
    • bucket_uuid
    • miner
  • delete-buckets

To get status

  • status/content
  • status/bucket
  • status/cid

To get stats

  • stats
    • Number of content
    • Number of buckets
    • Number of deals attempted
    • Number of deals made

FEVM/FVM Development Docs and Tools Platform

Proposal: FEVM/FVM Development Docs and Tools Platform

Contributors @alvin-reyes
Status Draft
Revision  

The idea is to create a full e2e platform for educating developers on using FEVM with a pre-defined framework of tools.

  • Build different contract-creation tools and SDKs that wrap the creation and management of smart contracts. This includes incorporating Estuary and Filecoin into each of the SDKs.
  • Build examples using the SDKs built in bullet 1.
  • Create tutorials, how-to guides, and quick scaffolding examples for creating contracts.
  • Provide examples to get users and potential contributors to engage with us to build more content based on the tools we built.
  • Create an academically driven bootcamp using the tools we built.

Details of the design to follow.

Estuary on FEVM (estuary.sol)

Idea/Proposal: Estuary on FEVM

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal/Overview

Create an estuary.sol that wraps some functions of Estuary. This might not be the best approach, but it's still worth trying.

The design is to allow Solidity developers to pin CIDs and record their requests on chain via a standard Estuary abstract contract and an estuaryfevm-specific JS library that launches an external compute service to run the pinning / Estuary process.

Components:

  • estuary.sol
  • estuaryfevm.js
  • wasm whypfs-code (node on browser)

The clear assumption here is that we cannot pass a file to Solidity, but we can pass a CID that we pinned from a WHYPFS node.

HL steps of implementation:

  1. We create a compute unit (a docker image or a process) that can accept args and has a whypfs-node on it, to process requests.
  2. estuary.sol will have functions like pinToEstuary, makeDealsForCid, etc., but it'll only use the IPFS hash of the compute docker image to pass the CID.
  3. When a client calls a contract function (pinToEstuary(cid)), it calls the IPFS hash of the compute docker image, passes the CID to the compute docker image, and performs the "centralized service" to pin the CID. The IPFS instance can be a WASM-compiled component that runs in a browser (on the DAPP).
    // estuaryFevmRequest is our own interface/modifier for tagging these functions
    function pinToEstuary(string memory cid) public view estuaryFevmRequest returns (string memory) {
        // baseURI of the compute docker image
        return string.concat(baseURI, "?cidToPin=", cid);
    }

We will have to force the user to use estuaryfevm.js; the JS file checks the ABI to find the functions tagged estuaryFevmRequest.

  4. When the user calls the contract using estuaryfevm.js from a web app, it needs a whypfs instance in the browser. The compute docker image does the pinning and communication with Estuary, but the transaction itself is persisted on FEVM.

Collections V2

Proposal: Collections API V2

Author Outercore Engineering
Status In-Progress
Revision

This is WIP

Proposal/Overview

The collections API is seemingly used as a directory-upload mechanism rather than a grouping mechanism, and I think we need to create a distinction between the two use cases. A directory in the IPFS sense is a collection rooted at a specific type of node (DirNode) that can be associated with different links (ChildNodes). The DirNode creates the initial underlying structure of the merkle dag, which the developer can then programmatically add links to in order to form a “directory” structure in merkle-dag format. This merkle dag is then stored in the Estuary node’s blockstore via the go-blockstore library.

To this day, we’ve observed that most users of the collections API treat collections as directories and use them as such. That is clearly not what they are, since a collection won’t perform the same as a directory, primarily because the current version lacks the endpoints to manage a collection “as a” directory.

Solution

We need to have a distinction between the collections API and directories, and my proposal is to revamp the collections API as a tagging mechanism as opposed to a directory system. We should create a new API for directories, which will be similar to the ipfs add -r command and use DirNode for directories.

For collections, we should just group them at the database level (Postgres) and make sure they are retrievable via an endpoint. Creating a new merkledag for this approach is completely optional, since the source of “grouping” will be the database.

Assumptions

  • The Collections API will only live in the context of Estuary. The source of truth for the grouping is the Postgres database.
  • The Collections API will not be propagated to other storage provider services similar to Estuary, since it will be a feature unique to Estuary.

Technical Design

Descriptive

  • create a tag
    • insert a database entry into the collections table with type = tag and the tag column = tag name
  • add a content to a tag (see the sketch after this list)
    • get the tag name
    • query the database with the tag name
      • validate that it exists; if it doesn’t, create a new tag with that name
    • insert a database entry into the collections table with type = child and tag column = tag name
  • add a list to a tag (CIDs or content IDs)
    • get the tag name
    • query the database with the tag name
      • validate that it exists; if it doesn’t, create a new tag with that name
    • loop through the list
      • For each child
        • insert a database entry into the collections table with type = child and tag column = tag name
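
A minimal sketch of the add-content-to-a-tag flow using Gorm (already in the Estuary stack); the Collection model and column names below are assumptions from this proposal, not the real schema.

    package collections

    import "gorm.io/gorm"

    type Collection struct {
        ID        uint   `gorm:"primaryKey"`
        Type      string // "tag" or "child"
        Tag       string // tag name
        ContentID uint   // set when Type == "child"
    }

    // AddContentToTag validates that the tag exists (creating it if it doesn't),
    // then inserts a child row pointing at the content.
    func AddContentToTag(db *gorm.DB, tagName string, contentID uint) error {
        var tag Collection
        if err := db.Where(Collection{Type: "tag", Tag: tagName}).FirstOrCreate(&tag).Error; err != nil {
            return err
        }
        return db.Create(&Collection{Type: "child", Tag: tagName, ContentID: contentID}).Error
    }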

Endpoints

  • /collections/tag
  • /collections/tag/:tagname/:content - content
  • /collections/tag/:tagname/contents - list of content
  • /collections/tag/:tagname/cid - list of content
  • /collections/untag/:tagname/:content - content
  • /collections/untag/:tagname/contents - list of content
  • /collections/untag/:tagname/cid - list of content
  • /collections/commit/:tag - tag
  • /collections/download/:tag - tag
  • /collections/search/tag/:tag - search key words
  • /collections/search/ - search keywords

Testing

  • User scenarios
    • As a user I want to manage a tag
      • Create
      • Delete
      • Update
      • Rename
    • As a user, I want to add content
      • add a single content using estuary content id
      • add a list of content using estuary content id
      • add a single content using CID
      • add a list of content using collection of CID
    • As a user, I want to unpin content
      • unpin already added content
    • As a user, I want to commit a tag
    • As a user I want to search
      • Search by tag name
      • Search by content id
      • Search by CID
    • As a user I want to download
      • Download by tag name
      • Download by content id
      • Download by CID

Open Items

  • All transactions will include a database interaction and blockstore
  • We can introduce an uncommit with a given CID.

Deliverables / Definition of Done

  • Code changes (Code/UT)
  • SQL File to add the new table
  • Swagger documentation changes
  • Documentation changes

Collections API Call (mike, gabe, lawrence, alvin)

  • we need directories first
  • add file manager (create folder structure on backend)
    • user can CRUD folders
    • maybe symlinks (good to have)
  • database mocks what user sees as a directory
  • background worker to create ipfs
  • tagging: use ipfs metadata tag

Estuary Stability

Estuary Stability

Author Alvin Reyes
Status In-progress
Revision

Overview

These are the things we need to accomplish to get Estuary to the alpha and post-alpha stage. I’d like to look at each as a pillar; each should be built and perfectly leveled to stabilize the Estuary platform.

This is all Tech. No productization / product lifecycle steps here.

Github Project: https://github.com/orgs/application-research/projects/7/views/5

Priority | Improvement | Description
1 | System Errors | All system errors that Estuary encounters
2 | Infrastructure | All the action items we need to stabilize the infrastructure, along with the code changes
3 | Data Clean-up | Any stale data we need to remove or clean up
4 | Debugging | All the action items we need to debug, or that give us more information on how to debug
5 | Functional | All functional / design / code that needs to be optimized and improved
6 | Support | All the action items I think we need to ensure we have proper customer support

System Errors (Panics)

Each on its own page. We need to handle all the panics.

Log file: log_file_from_shuttle6

  • msg":"couldnt decode pid
  • pinning queue error: context canceled\nfailed to walk DAG\nmain.
  • failed to handle rpc command: Unable to send restart request: exhausted 5 attempts but failed to open stream to
  • pinning queue error: context deadline exceeded\nfallback provide failed\nmain
  • tried to add pin for content we failed to pin previously
  • failed to handle rpc command: failed to compute commP
  • failed to handle rpc command

Infrastructure

  • Grafana agent on Ansible so we can offload log storage to Grafana. We save some space if we do so. We do have this on shuttles, but it's not enabled properly. https://filecoinproject.slack.com/archives/C016APFREQK/p1665703251824289
  • Install / enable agents on all shuttles - enabling these agents will ensure that logs are stored on Grafana only.
  • Document the release and deployment (https://www.notion.so/Estuary-Infrastructure-40ddc4cd518d478a81b76f5c0df1a276)
  • Troubleshooting guide for the infrastructure - I’ll be adding more information on this.
  • Back up and restore (enable data and blockstore backups) - I’d like to work with the infrastructure point of contact on this.
  • Infra improvement: Dockerize all components
  • Infra improvement: Create a simple kube cluster for POC
  • E2E Test Env: estuary + lotus + boost

Data Clean up

https://filecoinproject.slack.com/archives/C016APFREQK/p1660258369066179

  • Write an SQL script to remove the majority of the non-active pins (14M+ records) on shuttle-4, i.e., delete all non-active pins (see the sketch after this list).
  • The negative impact of removing these records is that if some of the pins are on the blockstore, then anyone who uses the /gw will fail to look up the CID, since the gateway relies on the database record.
  • There might be some failed pins that are yet to be processed by the SP, so there's lost opportunity there.
  • Another solution is a clean-up script on shuttle-4 that traverses the blockstore using the CIDs from the pins table, identifies those the shuttle can't "walk" - meaning they're in the database but not in the blockstore (using merkledag.Walk) - and deletes them from the database. It would be an "estuary shuttle reconciler" tool that matches the blockstore CIDs with the pins table.
  • Write scripts that can perform backups on specific filters.
  • Write an SQL script to delete the CIDs that don't exist in the blockstore of the local node.
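
A minimal sketch of the batched delete with Gorm; the table and column names are guesses based on this issue and must be checked against the real schema before running anything.

    package cleanup

    import "gorm.io/gorm"

    // DeleteNonActivePins removes pins that are neither active nor still pinning,
    // in batches so the delete doesn't hold locks across all 14M+ rows at once.
    func DeleteNonActivePins(db *gorm.DB, batchSize int) error {
        for {
            res := db.Exec(
                `DELETE FROM pins WHERE id IN (
                   SELECT id FROM pins
                   WHERE NOT active AND NOT pinning
                   LIMIT ?)`, batchSize)
            if res.Error != nil {
                return res.Error
            }
            if res.RowsAffected == 0 {
                return nil // nothing left to clean up
            }
        }
    }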

Debugging

  • Ensure developers have the proper debugging tools (GoLand).
  • Set up dedicated shuttles for each developer (for dev testing)
  • Enable pprof on all shuttles and api node
  • Enable grafana agents

Functional

  • Revisit the pinning mechanism
  • Revisit the queueing mechanism
    • I’d like to explore the possibility of separating the queuing from the main api node. We had discussions on this before and I would like to revisit.
  • Revisit all the infinite for-loops and check whether we need to introduce intervals or otherwise optimize them.


Functional Improvements

Proposal: Collections API V2

Proposal: Directory API

Proposal: API Versioning for Estuary

Proposal: Proxy-Forwarder

Proposal: API Gateway

Support

  • Customer Support Ticket System
  • IsEstuaryDown.com public monitoring tool

Refactor / Rearchitecture

  • Refactor code to its appropriate packages
  • Redesign

Estuary performance testing

Proposal: Estuary performance testing

Author Anjor
Status Draft
Revision

This is a WIP

Proposal/Overview

We should have metrics on Estuary's data-onboarding performance. We should be able to answer questions such as:

  • What is the data throughput? How does it scale with increasing data size? Is there a sweet spot?
  • What is the maximum size estuary can handle?

The current plan is to set up datasets in increasing sizes ranging from 1GB up to 1TB and measure data onboarding performance.

Technical Design

The performance testing will be carried out on an Equinix box. We will download public datasets ranging in size from 1GB up to 1TB and try uploading them to Estuary.
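
A minimal sketch in Go of one measurement: time a multipart upload against Estuary's documented /content/add endpoint and report throughput. The file path and token are placeholders.

    package main

    import (
        "fmt"
        "io"
        "log"
        "mime/multipart"
        "net/http"
        "os"
        "time"
    )

    func main() {
        f, err := os.Open("/data/dataset-1gb.bin") // placeholder test file
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        info, _ := f.Stat()

        // Stream the file as the "data" multipart field without buffering it in RAM.
        pr, pw := io.Pipe()
        mw := multipart.NewWriter(pw)
        go func() {
            part, _ := mw.CreateFormFile("data", info.Name())
            _, err := io.Copy(part, f)
            mw.Close()
            pw.CloseWithError(err)
        }()

        req, _ := http.NewRequest("POST", "https://upload.estuary.tech/content/add", pr)
        req.Header.Set("Authorization", "Bearer <EST-API-KEY>")
        req.Header.Set("Content-Type", mw.FormDataContentType())

        start := time.Now()
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        elapsed := time.Since(start)

        mib := float64(info.Size()) / (1 << 20)
        fmt.Printf("status=%s size=%.1fMiB time=%s throughput=%.1fMiB/s\n",
            resp.Status, mib, elapsed, mib/elapsed.Seconds())
    }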

Known problems

Files larger than 32GB might have issues. If the endpoint is unable to handle an upload, we will attempt different preparation tools such as barge and singularity.

Idea/Proposal: DataDAO

Proposal: DataDAO

Author Gabriel Cruz
Status Draft
Revision 0.0.1

Proposal/Overview

DataDAO is an organization that curates data stored in Filecoin, allowing its members to vote on the CIDs that should be part of the collection of curated data.

Note: heavily inspired by the idea and code snippets from https://aayushguptaji.hashnode.dev/how-to-build-your-first-datadao-factory-on-fvm

Why this is important

There has been increased demand for useful data on the Filecoin network. Allowing peers to vote on the CIDs that contain this type of data incentivizes higher-quality data on Filecoin.

Non-goals

We are not trying to ensure that the accepted CIDs actually have "useful" data (whatever the definition of "useful" is). We only allow for voting. It is up to the members of the organization to revise the contents of the proposals and vote wisely.

Design Overview

  1. Storage Provider (SP) creates a proposal to add a CID to DataDAO.
  2. DataDAO members vote on the proposal until it expires.
  3. Once expired, if upvotes > downvotes, the CID is added to the DataDAO.

Detailed Design

Another level of detail beyond the design overview, if needed.

Creating Proposal

SP will create a proposal using the following function

    function createCIDProposal(bytes calldata cidraw, uint size) public {
        proposalCount++;
        Proposal memory proposal = Proposal(proposalCount, msg.sender, cidraw, size, 0, 0, block.timestamp, block.timestamp + 1 hours);
        proposals[proposalCount] = proposal;
        cidSet[cidraw] = true;
        cidSizes[cidraw] = size;
    }

Voting on a Proposal

DataDAO members upvote or downvote the Proposal

    function voteCIDProposal(uint256 proposalID, bool upvote) public {
        require(proposals[proposalID].storageProvider != msg.sender, "Storage Provider cannot vote his own proposal");
        require(!hasVotedForProposal[msg.sender][proposalID], "Already Voted");
        require(!votingIsExpired(proposalID), "Voting Period Finished");

        if (upvote == true) {
            proposals[proposalID].upVoteCount = proposals[proposalID].upVoteCount + 1;
        } else {
            proposals[proposalID].downVoteCount = proposals[proposalID].downVoteCount + 1;
        }

        hasVotedForProposal[msg.sender][proposalID] = true;
    }

Add voted CID to DataDAO

TODO

Miscellaneous

Data and auxiliary variables

    contract DataDAO {
        uint64 constant public AUTHORIZE_MESSAGE_METHOD_NUM = 2643134072;
        // number of proposals currently in DAO
        uint256 public proposalCount;
        // mapping to check whether the cid is set for voting
        mapping(bytes => bool) public cidSet;
        // storing the size of the cid
        mapping(bytes => uint) public cidSizes;

        mapping(bytes => mapping(bytes => bool)) public cidProviders;

        // address of the owner of DataDAO
        address public immutable owner;

        struct Proposal {
            uint256 proposalID;
            address storageProvider;
            bytes cidraw;
            uint size;
            uint256 upVoteCount;
            uint256 downVoteCount;
            uint256 proposedAt;
            uint256 proposalExpireAt;
        }

        // mapping to keep track of proposals
        mapping(uint256 => Proposal) public proposals;

        // mapping to track whether the user has voted for the proposal
        mapping(address => mapping(uint256 => bool)) public hasVotedForProposal;

        /**
         * @dev constructor: to set the owner address
         */
        constructor(address _owner) {
            require(_owner != address(0), "invalid owner!");
            owner = _owner;
        }

        // the createCIDProposal, voteCIDProposal, and votingIsExpired functions
        // shown earlier in this proposal live inside this contract
    }

Check if voting time has expired

    function votingIsExpired(uint256 proposalID) view public returns(bool) {
       return proposals[proposalID].proposalExpireAt <= block.timestamp;
    }

application-research/estuary#880

Idea/Proposal: Tekton Data Pipeline Framework to Onboard Data

Overview

Once we have K8s installed on our EHI, we need to start looking into Data Onboarding Tools.

I propose the use of Tekton Data Pipeline (https://github.com/tektoncd/pipeline). This is essentially a task framework that leverages k8s service infrastructure to create ephemeral task runners in the form of pods/containers.

How it'll work.

image

*Queue is optional.

  • We'll set up a docker-compose / dockerfile for ptolemy, delta, and the downloader script, i.e., containerize them
  • Set up a Tekton task for each.
  • The downloader then assigns a batch to process to each ptolemy and delta instance.

Estuary Reputation System

Proposal: Estuary Reputation System

Author
Status Draft
Revision

This is a WIP

Problem Statement

Estuary currently selects SPs at random when making deals. We should build a reputation system that ranks/directs deals towards SPs that perform in a way that is advantageous for our network. We will use this issue to discuss the inputs/calculations for such a reputation system.

Currently, the most important metric we should be concerned with is retrieval performance.

Estuary currently does not provide any incentives for Storage Providers to serve up CIDs that we deal to them. This is problematic, as autoretrieve relies on SPs serving up content to work properly. Without retrievals working, it is risky to offload content from our shuttles, as it may result in unretrievable files.

Proposed Solution

  • Autoretrieve knows the count of successful/failed retrievals per SP, and we can track this data
  • Using these stats, we can come up with a retrieval-based reputation score and use it to influence how we make deals (@gmelodie has kicked us off below)
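
A minimal sketch of such a score in Go; the add-one smoothing is our own assumption so SPs with no history start near neutral rather than at 0 or 1.

    package reputation

    // RetrievalScore maps autoretrieve's per-SP success/failure counts to a
    // score in (0, 1); higher means more reliable retrievals.
    func RetrievalScore(successes, failures int64) float64 {
        // Add-one smoothing: an SP with no history scores 0.5.
        return float64(successes+1) / float64(successes+failures+2)
    }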

Data Storage Markets

Data Storage Markets API

Repo Link

Project Description

Storage Markets is an application that tracks storage providers in the Filecoin network and provides an interface to query their statistics. These statistics are useful when making deals with storage providers.

In addition to storage statistics, the Storage Markets system will be extended to track retrieval success/failure metrics via autoretrieve, so that a Storage Provider's retrieval performance can be assessed.

Reputation

Storage Markets will provide a lightweight reputation system where SPs are assigned a score based on their storage and retrieval performance.

Estuary FEVM Oracle Library and Oracle Execution Provider

Proposal: Estuary FEVM Oracle Library and Execution Provider

Contributors @alvin-reyes, @kelindi  
Status Draft
Revision  

I had a discussion with @kelindi on potentially creating an Oracle Service that will use Estuary as the execution provider.

Proposal

The idea is to create an Oracle library in Solidity that will run URL or service requests against Estuary.

Components:

  • Queues for individual jobs using nsq
  • Job node component for execution providers
  • Oracle.sol - generic importable contract for smart contracts to access different execution providers
  • EstuaryProvider.sol - Estuary-specific provider contract
  • Oracle samples

Details or the design to follow

Idea/Proposal: Permissions for Api Keys and Pre signed upload urls.

Idea/Proposal: Build permissions for Api Keys and Pre signed upload urls.

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Only give API keys the necessary permissions.

  • Read
  • Write
  • Read/Write
  • User-defined limits for certain actions (ex: "an API key limited to uploading only one file")
    • This could be used to implement a temporary pre-signed URL/API key for serverless uploads
    • The user requests to upload a file from the frontend -> the frontend receives a temporary API key from Estuary -> the frontend uses the temporary API key to upload the file directly to Estuary
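
A minimal sketch of scoped API keys as Go HTTP middleware; the scopes, key model, and lookup function are assumptions from this proposal, not Estuary's current auth model.

    package auth

    import "net/http"

    type Scope int

    const (
        Read Scope = 1 << iota
        Write
    )

    type APIKey struct {
        Token       string
        Scopes      Scope
        UploadsLeft int // e.g. 1 for a single-use pre-signed upload key, -1 for unlimited
    }

    // Require wraps a handler and rejects keys missing the needed scope or
    // whose user-defined upload limit is exhausted.
    func Require(need Scope, lookup func(token string) (*APIKey, bool), next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            key, ok := lookup(r.Header.Get("Authorization"))
            if !ok || key.Scopes&need == 0 {
                http.Error(w, "forbidden", http.StatusForbidden)
                return
            }
            if need&Write != 0 && key.UploadsLeft == 0 {
                http.Error(w, "upload limit reached", http.StatusForbidden)
                return
            }
            next.ServeHTTP(w, r)
        })
    }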

FVM HeatCheck

Scratching down some notes; will close later...

General Issues

@snissn - filecoin-project/lotus#9839
@snissn @alvin-reyes - filecoin-project/lotus#8865 (6 months stale), collaboration with @geoff-vball (thanks)

Add the multisig approve and propose commands, plus a feature to automatically encode the JSON string passed as parameters.

lingering questions

what are the implications of not having/storing this index by default? Does this mean that solidity/EVM developers would need to either run, or find, a node that supports proper ethereum transaction hashes? If a developer deploys a contract to a node that doesn't have this index, do they still get a proper eth transaction back? If so, can they not poll against the receipt on the same node? What about block explorers?

resolution PR: filecoin-project/lotus#9965

Specs

@jlogelin A trustless notary - application-research/estuary#877
@jlogelin Hot Storage Protocol - application-research/estuary#878
@jlogelin Automated Market Maker - application-research/estuary#879
@jlogelin DataDAO - application-research/estuary#880
@jlogelin @jcace @elijaharita - Data Persistence application-research/estuary#881

Solutioning

@alvin-reyes - ERC 721 Contracts - https://github.com/application-research/fevm-nft-estuary

Edge nodes for Upload and Retrieval

Proposal: Edge Upload and Retrieve

Contributors @alvin-reyes
Status Draft
Revision  

The idea is to create a node instance that does the following:

  • node to accept uploads and queue them to estuary for pinning
  • node to retrieve CIDs and serve them as a gateway

This will allow us to redirect uploads to different servers instead of putting them directly on the shuttles.

Development HL guide:

With whypfs-core, we can build several microservices that we see fit to scale estuary.

  • We can use whypfs-core to create the node, introduce upload endpoints, and pass the CIDs to the Estuary API node. Estuary will take care of pulling the CIDs from the upload nodes.
  • We can use whypfs-core and create a gateway on top of it to serve the CIDs from Estuary or any peers. This is similar to whypfs-gateway.
  • These nodes need to be peered for discoverability.
  • This is a single node with a "mode" parameter: either upload-and-retrieve, upload only, or retrieve only (see the sketch below).

image

  • will add more details later.
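
A minimal sketch of the mode-switched single binary; the handlers are stubs and the whypfs-core wiring is omitted, so the names here are assumptions of this proposal rather than real edge code.

    package main

    import (
        "flag"
        "log"
        "net/http"
    )

    func main() {
        mode := flag.String("mode", "upload-retrieve", "upload-retrieve | upload | retrieve")
        flag.Parse()

        mux := http.NewServeMux()
        if *mode == "upload" || *mode == "upload-retrieve" {
            // Accept uploads, then queue the CIDs to the Estuary API node for pinning.
            mux.HandleFunc("/add", func(w http.ResponseWriter, r *http.Request) {
                // store blocks locally via whypfs-core, then notify Estuary (omitted)
                w.WriteHeader(http.StatusAccepted)
            })
        }
        if *mode == "retrieve" || *mode == "upload-retrieve" {
            // Serve CIDs gateway-style from the local blockstore or peers.
            mux.HandleFunc("/gw/", func(w http.ResponseWriter, r *http.Request) {
                // resolve the CID through whypfs-core and stream it back (omitted)
                w.WriteHeader(http.StatusOK)
            })
        }
        log.Fatal(http.ListenAndServe(":1313", mux))
    }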

WHYPFS gateway cluster + Distributed Filesystem

Idea/Proposal: Estuary Gateway cluster based on WHYPFS + Distributed filesystem

Contributors @snissn  
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

NOTE: This draft is based on the proposal by @alvin-reyes here and is built off its formatting!

WHYPFS with FUSE + SeaweedFS to expose a distributed filesystem will allow for a very safe and scalable architecture for file storage, IPFS pinning, and file serving via HTTP.

  • We want to have users data be highly available and resilient against hardware failures or even geographic or data center failures.

  • Moreover, it is a disadvantage and confusing UX for a user to have to declare and dedicate themselves to a particular gateway.

  • whypfs + Distributed File System will allow us to create a distributed pinning system with resilience from data loss due to individual drive failures, individual server failures and also entire datacenter outages.

  • example: We have two data centers + 5 nodes per data center. Files uploaded to seaweedfs can be replicated 2 times per data center. So we can have 10 servers across 2 data centers with 4 copies of each piece of data to guard against hardware failure. We will be able to add nodes to scale and have each node serve as a pinning gateway.

  • SeaweedFS can be set up as a mount point on each node. This node would have whypfs-gateway installed on it and use flatfs with the distributed filesystem mount point as its flat filesystem data store.

  • the put / upload and delete APIs would be protected via a secure password that only the api node would have. but get and gw api endpoints would be fully public.

  • the api node can be moderately changed to rely on a highly available and very fast whypfs+seaweedfs cluster without the end user knowing anything about provisioning a gateway!

  • over time data can be deleted from the whypfs cluster with filecoin as a long term storage backup!

Detailed plan:

SeaWhypfs

Step 0. Edit this master plan document.

Step 1. Investigate how much disk is used by the IPFS pin cluster in production.

Step 2. Spec out how many servers we would want in production given the amount of disk we need to store, the room to grow before needing to add more nodes, and the redundancy we want in our data set. Identify what we will need, i.e., we will need 2x data centers and 5x servers for 10 servers total, with each server having xyz terabytes of disk with raidx redundancy.

Step 3. Code deploy scripts. Code up Ansible etc., the software required for deploying and managing a SeaweedFS + whypfs cluster.

Step 4. Set up a test cluster. Using Ansible, make a three-node cluster with 2x disk replication in SeaweedFS, put whypfs-gateway on each of the SeaweedFS nodes, load the nodes with at least 1TB using the whypfs API endpoints, and verify whypfs put and get work.

Step 5. Set up a full-scale cluster. Deploy the large-scale cluster for production.

Step 6. Back-fill. Clone Estuary’s pin data into SeaWhypfs.

Step 7. Change the Estuary code base to take advantage of the data lake. Change add-pin in Estuary to push to the data lake and use the cluster’s gw DNS for reads. Make sure the deal-making API also uses the new URL if it needs it.

EstuaryPinningContract on FEVM

Idea/Proposal: EstuaryPinningContract on FEVM

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal/Overview

Create an EstuaryPinningContract to allow users to request CIDs to be processed by Estuary.

image

Projects/Profiles for each user

Idea/Proposal: Projects/Profiles for each user

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Create a projects/profiles paradigm for the Estuary dashboard, letting users create projects and have separate dashboards and API keys for each project.

Proposed Implementation

Have a Primary user and a secondary user created for each project.

Feedback: Outercore's request for design feedback on: stFIL, SFT, Filet, and Glif

Who is reviewing...

Hi 👋 I go by Cake in the ecosystem-dev channel in the Filecoin Slack.

I am an engineer, designer, and I help manage a fund. Before joining Protocol Labs (where I have been for ~3.5 years), I was either a very early technical contributor or founding team member for 7 startups.

A few years back I designed and launched https://slate.host for consumers on Filecoin.

I obviously care a lot about Filecoin and financial services! Some of my close design peers have designed at Paradigm, Square Cash, Coinbase, DyDx, and Fey and I've had many great design conversations with them. So I'm happy to lend a critical eye on the user flows for the following websites:

In addition, I will try not to focus on the visual design quality of your sites unless it feels buggy. Instead, I have listed some high level guidance:

  • It should be obvious in 2023 to be using a grid system
  • It should be obvious to keep all measurements in a pt system
  • You should measure first-paint speed
  • Get a Lighthouse score over 95
  • Solve FOUT issues
  • All transitions should be at 60FPS; learn how to kick the compositor early in the browser's page load, before the user event occurs.
  • Font hierarchies are good, see this link: https://www.editorx.com/shaping-design/article/font-size
  • Check for typos
  • Don't leak user information in the client, such as user information that comes from a database
  • Optimize your SEO
  • Make sure that if the user has an ad blocker, your scripts are non-blocking, meaning the exception that's thrown doesn't cause the rest of the client-side JavaScript to crash.
  • Everything should be mobile responsive; design your components to be fluid from the start so you don't have to make a mobile version of everything.
  • Make sure things are vertically aligned.
  • Make sure you're using assets that are 2x-4x their normal size for retina screens. No one likes logos with bad antialiasing.

So let's move on to the feedback. Here are my notes from spending 10 minutes on each staking site.

stFIL

  • There is no way to give you my FIL easily from the marketing page.
    • Other sites have a big deposit CTA, why doesn't stFIL have one? Did I miss the button to connect my wallet through WalletConnect/RainbowKit/MetaMask/etc?
  • The loading interstitial at the beginning is unnecessary.
  • The scroll lock performs poorly, a couple of scrolls up and down the page are frustrating already.
  • The fancy background is cutting off in a strange place near some text. It makes the page feel glitchy, here is a screenshot:

Screenshot 2023-04-23 at 10 51 43 PM

  • Saying you are "Community First" without showing a single member of the community or anything they have said makes me feel like you have no community.
    • Maybe show some testimonials?
  • The security section should just list the audits and who they are by on the marketing page without having to click.
  • Whenever I click into a section, If I hit back, I have to see the loading interstitial again, I really do not like this loading interstitial.
  • Transition effect over the cards adds nothing to the importance of clicking the CTA.
  • Every page load I have to see the interstitial, I recommend you get rid of it.
  • Clicking on the Chinese translation doesn't do anything. Am I missing something?
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).
  • Clicking the protocol link is just a way to get the loading interstitial to appear again. That isn't fun.
  • The documentation site (https://docs.stfil.io/) has far more functional purpose than the marketing site (stfil.io), I feel like if you replaced your marketing site with the documentation site and added the join mailing list to your documentation site, you wouldn't need your marketing site until you had more to offer people visiting your marketing site.

Overall: Seems like a site for people who already know what they're supposed to do here; otherwise I have no idea what to actually do. I'll do something called "drop off", where I don't really see the point of being on this site. The "Subscribe to our mailing list" is the best part of your site; it's the only portion that converts the user in a meaningful way (it collects an e-mail).

Would stFIL convert me? No.

Filet

  • IMPORTANT: DAPP should say "Stake Filecoin" instead, no one knows what a button that says "DAPP" is supposed to do.
    • Improving the CTA will improve the conversion on this site.
  • Ledger and Metamask should be options for staking. A lot of western users probably do not have the wallets mentioned. In addition, everyone in the west is used to seeing WalletConnect or RainbowKit https://www.rainbowkit.com/. Both of those wallet integrations have good experiences.

Screenshot 2023-04-23 at 11 10 57 PM

  • "A trustworthy platform providing staking service stably" should say "We are providing a trustworthy Filecoin staking service"
    • Bonus points: Just show who trusts you (which companies, notable users, etc), it would be a better signal than the three boxes you have. People like to see people they trust using your service.
  • The StakeFIL to earn FIL section could easily go up higher on the page, this is what people actually care about.
  • The Media Partners section carousel is broken. Before you fix the carousel, consider not having one. Instead you could just render all of the media partners out into a grid. It will look better and the user doesn't have to click or tap to see more of the media partners.
    • Side note: The content inside of these posts is good! You should surface more of this content on the marketing page, it gives Filet more credibility.
  • The FAQ section "How does Filet Work?" has a lot of great content. Some of this content would do better as hooks on the marketing page so you can convert the user sooner.
  • Tell people what TVL means.
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).

Overall: The page loads fast and helps add to the feeling of professionalism. The website could use some copy improvements and more meaningful copy, I think if Ledger and Metamask are supported wallets, using Filet could become a popular option. Definitely need to fix the CTA so people understand what to click to stake.

Would Filet convert me? No.

Glif

Screenshot 2023-04-23 at 11 20 18 PM

Screenshot 2023-04-23 at 11 20 30 PM

  • Connecting a wallet looks easy.
  • I wouldn't open with "Genesis Pool" is now "Infinity Pool", I would open with marketing explaining what this all is sooner, and then explain the name change later.
  • "Trusted by the heart and soul of the Filecoin community" can be better served with a real example of how the community trusts GLIF.
    • Example: "Join the ten thousand users using GLIF tools today" something like this, with examples.
  • Page loads quick, the experience is straight to the point.
  • Would benefit a lot with some audit information
  • Would benefit a lot to have some regulatory guidance
  • Would benefit a lot to have some tax tips for users in the US (or anywhere with more strict tax laws).
  • More numbers explaining the potential returns (like Filet) could help.
  • Love the big deposit CTA.

Overall: Fast user conversion experience sets Glif's website apart from the rest of the staking sites I have seen so far. Seems like the goal is to get your FIL and they focused the website experience on getting your FIL. True to intent.

Would GLIF convert me? Yes.

SFT Protocol

  • The hero copy is worded awkwardly.

Liquid staking derivatives for Filecoin while ecological infrastructure provider

  • It should say something like

Ecologically friendly staking derivatives for Filecoin

  • I like the design, clear CTA (call to action), if I fail to convert, I'm taken (with proper visual hierarchy) to the next section where I can learn more.
  • "Latest News" SEO images look terrible and take away from the clean design of the site, I would just use custom ones.

Screenshot 2023-04-23 at 11 29 54 PM

  • You might want to bump the font weight on the primary CTA text, might make it a little more enticing to click.
  • On the Mint section, make sure you remind the user to connect their wallet, it doesn't hurt to show the connect button in the Mint section in case they don't see it in the top navigation.

Overall: Another website with a faster user-conversion experience (like Glif). The site could use less vertical space/height to make sure it doesn't lose users who miss content below the fold. The website interface also needs more testing in different viewports; some components that should obviously resize when the screen resizes are static.

Would SFT Protocol convert me? Yes.

Idea/Proposal: EV1 to use EDGE-URIDs

Proposal

I wanted to propose that we completely revamp EstuaryV1 frontend to use EdgeURID and the upcoming deal status oracle service.

Problem

EV1's frontend is the best frontend in the #ecosystem for uploading files to the Filecoin network. It has all the features a user needs to upload files to the Filecoin network seamlessly. Unfortunately, it's tightly coupled with the Estuary node, which can't handle the usage/demand. This is why we opted to decouple Estuary into microservices, which are now EV2, Delta, and Edge.

The lack of scalability options for the Estuary node made it difficult to make deals. To this day, a chunk of the upload requests from the Estuary frontend are not on the Filecoin network, only on the Estuary node, which serves the content as hot storage. This impacts the reliability of the service and lowers the number of deals made through the app.

Solution

I propose we decouple the Estuary frontend from its backend and use EdgeURID instead.

This means:

  • every content upload goes to one of the available EdgeURIDs.
  • zones/staging will be represented by an EdgeURID bucket.
  • deal status will be available via deal status oracle service.

We will need to do this in phases, prioritizing onboarding data to the Filecoin network.

Phase 1:

  • change the upload to use edgeurid.
  • display the bucket information where the content is located.
  • hide the deals page for now until we have the deal status oracle available.

End result of Phase 1:

  • content should be shown on the page and included in an aggregation bucket on a specific EdgeURID
  • we will no longer use shuttles once we've redirected all upload / gw traffic to EdgeURID
  • metrics should show the totals based on EdgeURID uploads.

Phase 2:
TBD

Idea/Proposal: Perpetual Storage Contracts

Proposal: Perpetual Storage Contracts

Author @gmelodie @elijaharita @jcace
Status Draft
Revision 0.0.1

Proposal/Overview

This document outlines a potential scheme for perpetual Filecoin storage contracts on the Filecoin Virtual Machine (FVM).

Background

Currently, Filecoin deals are limited in length to 540 days. While there is discussion about increasing this to up to 5 years, there still remains the situation where a Storage Client would like to store data for a much longer term, potentially several times the length of a single storage deal.

Benefits

A reference implementation / example for perpetual, auto-renewing storage deals would be a useful building block for others building on Filecoin and FVM

Goals

  • Outline, at a high level, how a perpetual storage contract on FVM could work
  • Call out certain areas of complexity / considerations that must be addressed for it to function
  • Link back to relevant code snippets that would be used for the contract

Design Overview

Use Lotus web3 client contract to make deals with storage provider

Construct DealProposal

Client Inputs

  • CID*
  • number of replicas
  • Initial balance
  • End epoch
  • Max. price
  • Fil+/Datacap

*Every parameter is configurable once the contract is deployed, except for the CID.

Detailed Design

Smart Contract Functions

Client-Side
  • Change Bounty
  • Change # of replicas
  • Suspend/cancel (stop renewals, refund balance)
SP-Side
  • Claim deal
  • Publish deal
  • Terminate deal

Functionality

Initial Replica

  1. Client deploys the contract using the initial params
  2. Client must "seed" the file, keeping it available and downloadable until first replicas have been sealed
  3. SPs call claimDeal() function, indicating they want to seal and store it, and receive the download information
  4. SP downloads the CID from the "seed" location
  5. SP seals the CID into a sector
  6. SP calls the publishDeal() function, indicating they have successfully on-boarded the data
  7. Contract tracks the expiry epoch of the deal, opens up another replica slot before it expires.

Once all replicas have been claimed, the claimDeal() function simply returns the next expected epoch when the soonest one will expire.

Subsequent Replica
After a deal expires, a slot is opened up. claimDeal() returns a list of all other SPs that the CID has been replicated to, for retrieval and the next deal.
SPs can call claimDeal() and the flow is the same as detailed in Initial Replica

Dependencies

  • Deal needs to have another SP specified as the source location for file transfer

Performance Implications

  • Gas costs for various transactions

Questions

  • How do providers find the address / methods of the deployed smart contracts with deals available?
  • What happens if the storage provider fails to seal after claiming the deal?
  • What happens if a deal gets slashed/lost? How do we ensure the contract is synchronized with the actual state of deals on chain?

Assumptions / Considerations

  • Once a perpetual deal is kicked off, the file has to be retrievable; the SP has to serve retrievals in the future.
  • The smart contract "owns" the storage deals and pays for them using its internal wallet and/or datacap
  • There needs to be enough of a window between one deal ending and the next one starting to allow a new SP to claim it. As it gets closer to the deadline, the smart contract could increase the bounty to incentivize providers to pick it up.

Mutable Naming for CIDS

Idea/Proposal: Mutable Naming for CIDS

Contributors @kelindi 
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we're all agreed on the approach.

Have mutable references to CIDs. This could be through IPNS, a UUID stored in a DB, etc.
This feature would allow us to build out the following quality-of-life features.

  • A stable reference to a mutable CID, allowing users to host a static site and easily make changes without having to update DNS
  • Shorter gateway URLs for sharing content (ex: estuary.tech/X67HG) (this assumes we create a random UUID for the CID)

Proposed Implementation

I have tinkered with IPNS this quarter, and here are the issues I came across.

  • Each record needs to have a public and private key. If we are managing this on behalf of the user, it's simpler to store a UUID associated with the CID
  • If users are to manage the public and private keys themselves, they need to create a new account on their wallet every time they want a mutable CID

This is why my personal preference is to create a new table in the database that stores a UUID with a reference to a CID that can be changed by the user; a sketch of that table follows. I'm interested in hearing other suggestions on how we can implement this feature, whether through IPNS or my proposed implementation.
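
A minimal sketch of that table using Gorm, which is already part of the Estuary stack; the model and column names are assumptions for illustration:

```go
package mutablerefs

import "gorm.io/gorm"

// MutableReference is a hypothetical model for the proposed table: a short,
// stable UUID that points at a CID the owner can repoint at any time.
type MutableReference struct {
	gorm.Model
	UUID   string `gorm:"uniqueIndex"` // id used in gateway URLs, e.g. estuary.tech/X67HG
	CID    string // current target, updated in place by the owner
	UserID uint   // owning Estuary user
}

// UpdateTarget repoints an existing reference at a new CID.
func UpdateTarget(db *gorm.DB, uuid, newCID string, userID uint) error {
	return db.Model(&MutableReference{}).
		Where("uuid = ? AND user_id = ?", uuid, userID).
		Update("cid", newCID).Error
}
```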

Estuary Metrics

Metrics Tracking and Metrics API

Author Alvin Reyes
Status Completed
Revision
Github Repo https://github.com/application-research/estuary-metrics
Grafana https://protocollabs.grafana.net/d/0-0ztE97z/estuary-team-metrics-dashboard?orgId=1&from=now%2Ffy&to=now%2Ffy
Github Issue: application-research/estuary#283

Overview

The purpose of this document is to create a specification of the Estuary Metrics API.

Purpose

For any consumer to monitor Estuary, be it their own node or the Outercore-hosted Estuary, there needs to be a way to monitor and consume the different functional metrics that Estuary provides.

As of today, there are only two ways to monitor metrics for Estuary:

1 - through Grafana

2 - through the public/stats endpoint

Solution: Grafana

Started working on this: https://protocollabs.grafana.net/d/0-0ztE97z/estuary-team-metrics-dashboard?orgId=1&from=now%2Ffy&to=now%2Ffy

Solution: Estuary Metrics API


Tech Components

  • Go
  • Grafana
  • Gorm
  • Cacher
  • Mux
  • PQ
  • IPFS

Use cases

For system metrics, in addition to aggregates, we also want a breakdown by shuttle / primary node.

System

  • Total objects pinned (Query) select count(*) from contents where pinning
  • Total TiBs uploaded (Query) select sum(size) from objects
  • Total TiBs sealed data on Filecoin (Query) select sum(size) from contents where pinning and active
  • Available free space (custom Grafana plugin)
  • Total space capacity (custom Grafana plugin)
  • Downtime (this is usually notoriously difficult to define) (custom Grafana plugin)
  • Performance (this needs to be fleshed out)

Users

  • Total number of Storage Providers (Query) select count(*) from storage_miners
  • #12
  • Ongoing user activity (DAUs, WAUs, MAUs, etc.): are users coming back? (custom Grafana plugin). We would need to build a tracking system for this, i.e. a persistent layer for tracking

For Storage/Retrieval deal metrics, in addition to aggregates, we also want the following breakdowns (see the query sketch after this list):

  • per day breakdown (Query)
  • per week breakdown (Query)
  • per provider breakdown (Query)
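
As a sketch of what one such breakdown could look like, here is a hedged per-day query using Gorm; the content_deals table and its columns are assumptions based on Estuary's schema, not a confirmed API:

```go
package metrics

import (
	"time"

	"gorm.io/gorm"
)

// DealsPerDay is one row of the assumed per-day breakdown.
type DealsPerDay struct {
	Day   time.Time
	Deals int64
}

// dealsPerDay groups deal rows by calendar day in Postgres.
func dealsPerDay(db *gorm.DB) ([]DealsPerDay, error) {
	var rows []DealsPerDay
	err := db.Raw(`
		SELECT date_trunc('day', created_at) AS day, count(*) AS deals
		FROM content_deals
		GROUP BY 1
		ORDER BY 1`).Scan(&rows).Error
	return rows, err
}
```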

Storage

  • Storage Deal Success Rate (Success % / All Deals)
  • Storage Deal Acceptance Rate (Success % / Accepted Deals)
    • Total number of storage deals proposed (Total Deals / Proposed)
    • Total number of storage deal proposals accepted (Total Deals / Accepted Deals)
    • Total number of storage deal proposals rejected (Total Deals / Rejected Deals)
  • Total number of storage deals attempted
    • Total number of successful deals
    • Total number of failed deals
  • Distribution of data size uploaded per user
  • Performance metrics
    • Time to a successful deal
      • how does that scale with data size?

Retrieval

  • Retrieval Deal Success Rate
  • Retrieval Deal Acceptance Rate
    • Total number of retrieval deals proposed
    • Total number of retrieval deal proposals accepted
    • Total number of retrieval deal proposals rejected
  • Total number of retrieval deals attempted (per day and per week breakdown)
    • total number of successful retrievals
    • total number of failed retrievals
  • Deals Failed Because Of Undialable Miners
  • Time To First Byte (retrieval deals)

Implementation

https://github.com/application-research/estuary-metrics

WHYPFS dedicated gateway provisioning and subscription

Idea/Proposal: Dedicated Estuary Gateway provisioning and subscription

Contributors @alvin-reyes  
Status Draft
Revision  

Proposal

NOTE: This is a draft and is not finalized yet. We'll have to polish it until we have all agreed on the approach.

We need to allow users to provision their own dedicated gateway so they can interact with it directly for their content.


1 - The user needs to subscribe to a gateway. We will need to ask for the following parameters:

  • Name (for domain name)
  • Storage Size
  • Payment based on storage size + service

2 - We need to develop a wizard-like page to collect gateway information.

  • The user clicks on "Request dedicated gateway". This launches a step-by-step wizard to collect information from the user.
  • Information: gateway name, storage size, payment method
  • Payment method: if the user wants to pay in FIL, we use the FEVM deposit contract; otherwise, Stripe or PayPal

3 - We need to develop a page for each user to navigate and manage their gateway(s).

  • List view of all created gateways
  • Dedicated page for a specific gateway

4 - middleware code

  • The user needs an API key to access each gateway; a sketch of the auth check follows this list.
  • The user uploads their files to their dedicated gateway, and we need authentication so that each gateway only serves content the user uploaded to that specific gateway
  • A license file is generated for the user with all the META of the subscription.
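
A minimal sketch of that per-gateway check as net/http middleware; the header convention and how the key is stored are assumptions:

```go
package gateway

import "net/http"

// apiKeyAuth only serves requests carrying the key issued for this gateway.
func apiKeyAuth(gatewayKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+gatewayKey {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```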

5 - backend

  • dockerized components
  • Ansible scripts to provision the gateway with the resources defined in the docker-compose YAML file. The YAML includes the subscription META, resources, storage, domain name, certificate generation, the server, and the WHYPFS gateway; a minimal template sketch follows.
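
A minimal Go sketch of rendering that per-subscriber compose file from the subscription META using text/template; the field names and the gateway image name are assumptions, not the actual provisioning code:

```go
package main

import (
	"os"
	"text/template"
)

// GatewaySubscription carries the META baked into each provisioned compose file.
type GatewaySubscription struct {
	Name       string // used for the domain name, e.g. alice.estuary.tech
	StorageGiB int
}

var composeTmpl = template.Must(template.New("compose").Parse(`services:
  gateway:
    image: whypfs-gateway:latest
    environment:
      - GATEWAY_NAME={{.Name}}
      - STORAGE_LIMIT_GIB={{.StorageGiB}}
`))

func main() {
	// Render the per-subscriber docker-compose that Ansible would deploy.
	sub := GatewaySubscription{Name: "alice", StorageGiB: 100}
	composeTmpl.Execute(os.Stdout, sub)
}
```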
