
Filecoin Specification


This is the Filecoin Specification, a repository that contains documents, code, models, and diagrams that constitute the specification of the Filecoin Protocol. This repository is the singular source of truth for the Filecoin Protocol. All implementations of the Filecoin Protocol should match and comply with the descriptions, interfaces, code, and models defined in this specification.

https://spec.filecoin.io is the user-friendly website rendering, which we recommend for reading this repository. The website is updated automatically with every merge to master.


Install

To build the spec website you need Node.js (which includes npm).

On macOS you can install Node.js with Homebrew:

brew install node

Clone the repo and use npm install to fetch the dependencies:

git clone https://github.com/filecoin-project/specs.git
npm install

To run the development server with live-reload locally, run:

npm start

Then open http://localhost:1313 in the browser

Writing the spec

The spec is written in markdown. Each section is a markdown document in the content directory. The first level of the directory structure denotes the top-level sections of the spec (Introduction, Systems, etc.). The _index.md file in each folder is the starting point for its section. For example, the Introduction starts in content/intro/_index.md.

Sections can be split out into multiple markdown documents. The build process combines them into a single html page. The sections are ordered by the weight front-matter property. The introduction appears at the start of the html page because content/intro/_index.md has weight: 1, while content/systems/_index.md has weight: 2, so it appears as the second section.

You can split out sub-sections by adding additional pages to a section directory. For example, content/intro/concepts.md defines the Key Concepts sub-section of the Introduction. The order of sub-sections within a section is again controlled by the weight property. This pattern repeats for sub-sub-folders, which represent sub-sub-sections.

The markdown documents should all be well formed, with a single h1, and headings should increment by a single level.

Note: Regular markdown files like content/intro/concepts.md can't reference resources such as images or other files; only _index.md files can. Since each folder already has an _index.md, the workaround for referencing resources from any other file is: create a new sub-folder in the same folder as the original .md file (e.g., content/intro/concepts/), move the content of concepts.md into a new _index.md inside it (content/intro/concepts/_index.md), add the resource files (for example, images) to the new folder, and reference them from that _index.md. The referencing syntax and everything else works the same way.

Check your markdown

Use npm test to run a markdown linter and Prettier to check for common errors. It runs in CI, and you can run it locally:

npm test
content/algorithms/crypto/randomness.md
  15:39-15:46  warning  Found reference to undefined definition  no-undefined-references  remark-lint
  54:24-54:31  warning  Found reference to undefined definition  no-undefined-references  remark-lint

⚠ 2 warnings

Format errors can be fixed by running npm run format.

Checking formatting...
[warn] content/systems/filecoin_token/block_reward_minting.md
[warn] Code style issues found in the above file(s). Forgot to run Prettier?

Page Template

A spec document should start with a YAML front-matter section and contain at least a single h1, as below.

---
title: Important thing
weight: 1
dashboardState: wip
dashboardAudit: missing
---

# Important thing

Code

Wrap code blocks in code fences. Code fences should always have a lang, which is used for syntax highlighting. Use text as the language flag for pseudocode or when no highlighting is wanted.

```text
Your algorithm here
```

You can embed source code from local files or other external repos using the embed shortcode.

{{<embed src="/path/to/local/file/types.go"  lang="go" symbol="Channel">}}

{{<embed src="https://github.com/filecoin-project/lotus/blob/master/build/bootstrap.go" lang="go">}}

Images

Use normal markdown syntax to include images.

For dot and mermaid diagrams, link to the source file and the build pipeline will convert it to svg.

# relative to the markdown file

![Alt text](picture.jpg)

# relative to the content folder

![Alt text](/content/intro/diagram1.mmd)

![Alt text](graph.dot 'Graph title')

If a title is not provided, the alt text is used as the title.

Links

Use markdown syntax [text](markdown-document-name).

These links are "portable links", just like Hugo's relref: give the name of the target markdown file and the correct relative path and title are resolved automatically. You can override the title by passing a second string in the link definition.

Note: When using anchors the title can't be fetched automatically.

[](storage_power_consensus)

# Renders to

<a href="/systems/filecoin_blockchain/storage_power_consensus" title="Storage Power Consensus">Storage Power Consensus</a>

[Storage Power](storage_power_consensus 'Title to override the page original title')

# Renders to

<a href="/systems/filecoin_blockchain/storage_power_consensus" title="Title to override the page original title">Storage Power</a>

[Tickets](storage_power_consensus#the-ticket-chain-and-drawing-randomness 'The Ticket chain and drawing randomness')

# Renders to

<a href="/systems/filecoin_blockchain/storage_power_consensus#the-ticket-chain-and-drawing-randomness" title="The Ticket chain and drawing randomness">Tickets</a>

Shortcodes

Hugo shortcodes you can use in your markdown.

embed

# src relative to the page

{{<embed src="piece_store.go" lang="go">}}

# src relative to content folder

{{<embed src="/systems/piece_store.go" lang="go">}}

# can just embed a markdown file

{{<embed src="section.md" markdown="true">}}

# can embed symbols from Go files

# extracts comments and symbol body

{{<embed src="types.go"  lang="go" symbol="Channel">}}

# can embed from external sources like github

{{<embed src="https://github.com/filecoin-project/lotus/blob/master/build/bootstrap.go" lang="go">}}

This shortcode also supports the property title to add a permalink below the embed.

listing

The listing shortcode creates tables from external sources; it supports Go structs.

# src relative to the page

{{<listing src="piece_store.go" symbol="Channel">}}

# src relative to content folder

{{<listing src="/systems/piece_store.go" symbol="Channel">}}

# src can also be from the externals repos

{{<listing src="/externals/go-data-transfer/types.go"  symbol="Channel">}}

mermaid

Renders inline mermaid syntax.

{{< mermaid >}}
graph TD
  A[Christmas] -->|Get money| B(Go shopping)
  B --> C{Let me think}
  C -->|One| D[Laptop]
  C -->|Two| E[iPhone]
  C -->|Three| F[fa:fa-car Car]

{{</ mermaid >}}

hint

<!-- info|warning|danger -->

{{< hint info >}}
**Markdown content**  
Lorem markdownum insigne. Olympo signis Delphis! Retexi Nereius nova develat
stringit, frustra Saturnius uteroque inter! Oculis non ritibus Telethusa
{{< /hint >}}

katex

We should only use inline mode for now! Display mode has a bug and is not responsive: formulas don't wrap on small screens. Track: KaTeX/KaTeX#2271

<!-- Use $ math $ for inline mode-->

{{<katex>}}
$SectorInitialConsensusPledge = \\[0.2cm] 30\% \times FILCirculatingSupply \times \frac{SectorQAP}{max(NetworkBaseline, NetworkQAP)}$
{{</katex >}}

<!-- Use $$ math $$ for display mode-->

{{<katex>}}
$$SectorInitialConsensusPledge = \\[0.2cm] 30\% \times FILCirculatingSupply \times \frac{SectorQAP}{max(NetworkBaseline, NetworkQAP)}$$
{{</katex >}}

Math mode

For short snippets of math text (e.g., inline reference to parameters, or single formulas) it is easier to use the {{<katex>}}/{{/katex}} shortcode (as described just above). Check how KaTeX parses math typesetting here.

For extensive blocks of math content it is more convenient to use math-mode to avoid having to repeat the katex shortcode for every math formula.

Check this example.

Some syntax like \_ can't go through Hugo's markdown parser, so we need to wrap math text with code blocks, code fences, or the {{<plain>}} shortcode. See the examples below.

Add the math-mode property to the front-matter:

---
title: Math Mode
math-mode: true
---

Wrap def, gdef, etc.

Math text needs to be wrapped to escape Hugo's markdown parser. When wrapping defs or any math block that doesn't need to be rendered, the recommended option is the {{<plain hidden>}} shortcode with the hidden argument.

{{<plain hidden>}}

$$
\gdef\createporepbatch{\textsf{create_porep_batch}}
\gdef\GrothProof{\textsf{Groth16Proof}}
\gdef\Groth{\textsf{Groth16}}
\gdef\GrothEvaluationKey{\textsf{Groth16EvaluationKey}}
\gdef\GrothVerificationKey{\textsf{Groth16VerificationKey}}
$$
{{</plain>}}

Wrap inline math text with code blocks

The index of a node in a `$\BinTree$` layer `$l$`. The leftmost node in a tree has `$\index_l = 0$`.

Wrap math blocks with code fences

```text
$\overline{\underline{\Function \BinTree\dot\createproof(c: \NodeIndex) \rightarrow \BinTreeProof_c}}$
$\line{1}{\bi}{\leaf: \Safe = \BinTree\dot\leaves[c]}$
$\line{2}{\bi}{\root: \Safe = \BinTree\dot\root}$

$\line{3}{\bi}{\path: \BinPathElement^{[\BinTreeDepth]}= [\ ]}$
$\line{4}{\bi}{\for l \in [\BinTreeDepth]:}$
$\line{5}{\bi}{\quad \index_l: [\len(\BinTree\dot\layer_l)] = c \gg l}$
$\line{6}{\bi}{\quad \missing: \Bit = \index_l \AND 1}$
$\line{7}{\bi}{\quad \sibling: \Safe = \if \missing = 0:}$
$\quad\quad\quad \BinTree\dot\layer_l[\index_l + 1]$
$\quad\quad\thin \else:$
$\quad\quad\quad \BinTree\dot\layer_l[\index_l - 1]$
$\line{8}{\bi}{\quad \path\dot\push(\BinPathElement \thin \{\ \sibling, \thin \missing\ \} \thin )}$

$\line{9}{\bi}{\return \BinTreeProof_c \thin \{\ \leaf, \thin \root, \thin \path\ \}}$
```

Front-matter

Description of all the available front-matter properties:

# Page Title to be used in the navigation
title: Libraries
# Small description for html metadata, if not present the first couple of paragraphs will be used instead
description: Libraries used from Filecoin
# This will be used to order the ToC, navigation and any other listings of pages
weight: 3
# This will make a page section collapse in the navigation
bookCollapseSection: true
# This will hide the page from the navigation
bookhidden: true
# This is used in the dashboard to describe the importance of the page content
dashboardWeight: 2
# This is used in the dashboard to describe the state of the page content options are "missing", "incorrect", "wip", "reliable", "stable" or "n/a"
dashboardState: stable
# This is used in the dashboard to describe if the theory of the page has been audited, options are "missing", "wip", "done" or "n/a"
dashboardAudit: wip
# When dashboardAudit is done we should have a report URL
dashboardAuditURL: https://url.to.the.report
# The date that the report at dashboardAuditURL was completed
dashboardAuditDate: '2020-08-01'
# This is used in the dashboard to describe if the page content has compliance tests, options are 0 or numbers of tests
dashboardTests: 0


Contributors

acruikshank, anorth, arajasek, daviddias, dignifiedquire, dkkapur, frrist, gnunicorn, hannahhoward, har00ga, hugomrdias, jbenet, jimmylee, jzimmerman, laser, mishmosh, mslipper, nicola, nikkolasg, olizilla, pooja, porcuquine, schomatis, stebalien, sternhenri, teamdandelion, whyrusleeping, yiannisbot, zenground0, zixuanzh


Issues

Price Discovery

Miners advertise a price per byte that they are willing to accept (either via an ask, or some future mechanism). But that's not quite enough. Miners will either not accept pieces below a certain size (as each thing they store has some baseline overhead), or they will want to charge more per byte for smaller pieces. In addition, clients will want their data stored with differing amounts of collateral, and miners will likely want a higher price per byte for storage that is more collateralized. Another factor is that miners may want to put a limit on how long they will store a file, and maybe give discounts for longer deals (which mean less work over time).

Also, since this is an automated market, 'bartering' is difficult and provides a poor UX. If the client finds a miner and wants to store a file of a given size with a given amount of collateral, they need to pick a price. Without a function for determining the correct price given all these factors, the client is effectively guessing. They can send a proposal to the miner, and the miner could say 'no', or 'price too low', or even 'my price for those parameters is X'. In the final case, the miner has some function they use locally to price their own storage, and the 'proposal with too low a price' from the client is really a 'price query' that can be used to price storage at any point.

Given all this, I see (at least) three possible ways to do price discovery:

Super Specific Asks

Each miner will have a set of asks that defines a fixed price per byte given certain parameters. These asks could be something like: "0.0001 FIL/GB-block for pieces between 200MB and 1GB, with 0.01 FIL/GB of collateral, for at least six months duration"

Pros:

  • Possibly easier to reason about pricing brackets for the client
  • No communication required for price discovery

Cons:

  • Imprecise control over pricing for the miner
  • Lots of data on chain
  • Changing prices can be rather expensive

Pricing Function stored on-chain

Each miner could specify the function that defines their storage price on-chain. Clients can easily run this function with their parameters to get a price for their storage that they can be confident the miner will accept.

Pros:

  • Exact pricing for all parties
  • No communication required for price discovery
  • Probably cheaper than many asks in terms of on-chain data required

Cons:

  • Still requires on-chain interaction for pricing updates
  • Could get very complicated
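Whatever form it takes, a miner's pricing function is just a deterministic map from deal parameters to a price. A minimal sketch of such a function (all type names, units, and constants here are hypothetical illustrations, not part of any spec):

```go
package main

import "fmt"

// DealParams are the hypothetical inputs a client would supply.
type DealParams struct {
	SizeBytes  uint64 // piece size in bytes
	Duration   uint64 // deal length in blocks
	Collateral uint64 // client collateral (assumed attoFIL)
}

// price returns a total deal price. The shape of the function (base rate,
// small-piece surcharge, collateral discount) is an illustrative assumption.
func price(p DealParams) uint64 {
	var rate uint64 = 100     // base attoFIL per byte per block (hypothetical)
	if p.SizeBytes < 200<<20 { // pieces under ~200 MB cost more per byte
		rate *= 2
	}
	if p.Collateral > 0 { // collateralized deals get a 10% discount
		rate = rate * 9 / 10
	}
	return rate * p.SizeBytes * p.Duration
}

func main() {
	fmt.Println(price(DealParams{SizeBytes: 1 << 30, Duration: 1000, Collateral: 1})) // 96636764160000
}
```

In options 1 and 2 this function (or a bracketed approximation of it) lives on chain; in option 3 the miner simply evaluates it behind a query endpoint.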

Price discovery entirely via query

Instead of having on-chain asks, each miner can simply provide an endpoint for clients to query for storage price given certain parameters. Clients would then ask miners for prices directly instead of using the chain.

Pros:

  • Exact pricing for all parties
  • No on-chain overhead for pricing or price updates
  • Miners can change prices freely
  • Miners can choose any function for pricing

Cons:

  • Price discovery requires selecting random miners and asking them for prices
  • A passive observer has no idea what price storage is being bought or sold at

Given these options (and acknowledging that there may be options not discussed here), I think the third option provides the best combination of user experience and scalability. The obvious downside then becomes: as a client, how do I pick which miners to even ask for a price?

I think this is a solvable problem, and solutions to it would need to be implemented anyway to help clients distinguish among large batches of miners who all give the same price in the current model. Possible solutions include gossiping pricing functions around, or using a basic 'past performance' based reputation system.

Take care to separate abstract concepts and their concrete definitions

In general, we should take care not to conflate abstract concepts (PoRep, EC, State Machine) with their concrete implementations. For example, Expected Consensus doesn't necessarily care about storage power or proofs of spacetime. That is (for lack of a better name) 'Filecoin Consensus'.

Another more actionable example is that we should find a name for our specific implementation of each proof. It is not exactly correct to say that "Proof of Replication ... produces a SNARK proof". Proof of replication simply proves in some way that a unique replica of some input data has been created. The description at the top of the proofs.md spec file does this well, it's just later in the doc where we give a concrete definition that we should phrase it differently. One simple proposal would be to just call this SNARK construction "PoRep-1" and say that "PoRep-1 ... produces a SNARK proof" (and so on). Obviously someone with a knack for naming could pick something better than a numeric naming system.

cc @porcuquine @nicola

Filecoin network storage limit due to deal expiry

In the recent storage throughput calculator I put together, a previously un-worried-about factor popped up: deal expiry. Essentially, if we assume that there is some average length of time that deals are made for, then even under a distribution with high variance that average sets a 'maximum' throughput in the system.

Potential avenues of mitigation:

  • Increase the block size
    • Only gives scaling linear in the block-size increase; not a workable solution
  • Reduce the size of bids/deals
    • A much better lever to pull than increasing block size, though getting smaller bids and deals is complicated. See this doc on bid/deal aggregation.
  • Increase the average deal duration
    • Easier said than done; this means changing how people will want to use Filecoin.

🚀 Specs v1.0 Plan

This is the master plan for getting Specs into version 1

Using this issue to track work that needs to be done in order for the 'spec' to be complete.

Refine & Review Specs

  • Approve/reject all PRs generated before Nov. 2018
  • Seek developer approval for beta specs
  • 🚀 Ensure Retrieval & Storage Market specs are accurate
  • 🚀 retrieval market completion - filecoin-project/venus#1283
  • 🚀 storage market completion - #130
  • 🚀 payment channels - usage
  • 🚀 process on how to sync the blockchain
  • 🚀 Add searchable wiki generated from specs (ex. GitBook)

New Specs

Documentation


This plan uses the following issue as a template: https://github.com/filecoin-project/research/issues/26

Legend

  • Required to ship: 🚀
  • Effort (E):
    • 1: the answer is known and simply requires documenting it in detail to count as 'solved'.
    • 2: a solid day of work to get the solution fully thought out and written down.
    • 3: anywhere from 2 to 4 days of dedicated research time to arrive at a solution to the problem
    • 4: a solid week (possibly more) is needed to arrive at a solution. In person sync time with other researchers may be required for E4.
    • 5: at least a week of research time is needed, very likely more, and very little progress can be made without face-to-face discussion with others.

(@whyrusleeping)

Sector re-sealing and removal

After talking with @nicola, we came to the conclusion that we won't support re-sealing sectors for Filecoin v1. With much of the information necessary for making this work reasonably well now off-chain, it's not easy to do. Plus, most sectors should be filled by (on average) one or two pieces, so the overhead in terms of wasted space for expired deals shouldn't be too high in any case (you know the durations of all the deals you're making ahead of time, so the cost is known ahead of time).

A possible way to solve the problem would be for miners who want to re-seal a sector to contact the client of each extant deal in the sector and ask them to agree to the new sector (note: this requires miners to actually do the re-seal before knowing whether it will work out). Once all clients agree, the new sector may be added and the old one removed. Any client claiming that their data is no longer being stored (after agreeing to the changeover) could then be refuted (at the client's cost) by the miner with the new agreement.


Previous: https://github.com/filecoin-project/aq/issues/116

Effort: Outline the Repair protocol v0

The repair protocol is written abstractly in the paper.

What I want to get out of this:

  • A simple Repair protocol without erasure coding
  • This finds missing proofs and triggers penalties
  • This finds missed assignments (too many proofs missing) and reintroduces the order
  • Describe how the order can be picked (since there is no client delivering the file); hence this is a non-standard process for making deals

De-blurring the definition of file

Here's how it's currently defined in https://github.com/filecoin-project/specs/blob/master/definitions.md:

Files are what clients bring to the filecoin system to store. A file is split up into pieces, which are what is actually stored by the network.

  • If a client needs to update a document, and it is only 1% different, does the whole "file" get brought into the filecoin system with each update? Or will something like IPFS identify differing leaves, and only bring the new content into the system?
  • Is an object store still considered a "file"? Or is a single object or field update considered the "file"?
  • If your token-incentivized storage system stores objects or blobs, can it even be called filecoin? :groan:

Not 100% sure, but we may want a term that can encompass both file storage and object storage. The storage world has "block storage", but that term is already taken in our world.

RFC: Light-client Friendly Orderbook

Background

A little background on Asks, Bids, and the Orderbook:

  1. When Ask and Bid orders are added to the chain, they are added to the orderbook
  2. The orderbook is a list of open bid, ask and deal orders
  3. The orderbook's hash is calculated at every new block and added to the new block
  4. In order to run MatchOrders, clients will have to query the orderbook for open orders matching their query (mostly in terms of price)

Standard approach

A standard approach would be to:

  • make the OrderBook an append-only list of orders
  • orders are appended to their corresponding list as they arrive

Key question: how do nodes find matching orders?
Here I evaluate how full nodes and light nodes can each find matching orders:

  • Full nodes: full nodes receive all the new blocks, validate them and keep a state tree

  • Light nodes: light nodes only know the hash of the latest block and can talk to a third party and query the state tree

  • Full nodes options:

    • Option 1: linear (could be expensive)
      • Run a linear search through the entire list of orders
      • Filter the ones they are interested in while scanning
    • Option 2: local indexing (requires extra storage and computation for indexing)
      • While receiving new blocks, they extract the orders and keep a local sorted index of orders sorted by price
      • Run a sublinear search to find matching orders
  • Light nodes options:

    • Option 1: trust a third party (orders can be hidden from light nodes)
      • Ask a full node to provide matching orders
      • The full node returns matching orders from the list and a proof of membership for each order
      • However, the proofs show that the returned orders are in the state tree, not that no other matching orders exist (to give proofs of non-membership, the third party would have to send the entire orderbook to the light client)
    • Option 2: get the entire orderbook (too big)
      • Ask a full node for the full orderbook
      • Check that the received orderbook is the right one
      • Run the query

Optimized orderbook

If we structure the orderbook in a special way, such that the orders are sorted by price, then we can produce proofs of membership and proofs of non-membership using just the Merkle tree, without sending the entire tree.

Example of proof of non-membership + membership for light nodes:

  • Orders are ordered by price in the Orderbook (hence in the state tree)
  • When asked for a match, full nodes:
    • return all the matching orders, as well as the order just before the match (lower price), the order just after it (higher price), and a proof of membership for each of those.
    • this way, if the orderbook has been correctly ordered by the miners (and it has, since otherwise the block would not be valid!), the light node has a guarantee that it received all the orders in its range
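The full node's response described above can be sketched as follows: return the matches plus the two boundary orders, against which the light client checks membership proofs (proof generation is omitted; all names are hypothetical):

```go
package main

import "fmt"

// Order is a hypothetical ask/bid entry; the orderbook is kept sorted by Price.
type Order struct {
	ID    int
	Price uint64
}

// match returns all orders with lo <= Price <= hi, plus the nearest order
// below lo and above hi. The two boundary orders let a light client verify
// completeness: if both boundaries carry membership proofs and the book is
// sorted, the range between them must be the full set of matches.
func match(book []Order, lo, hi uint64) (below, above *Order, matches []Order) {
	for i := range book {
		o := book[i]
		switch {
		case o.Price < lo:
			below = &book[i] // keeps updating; ends at the last order below lo
		case o.Price > hi:
			if above == nil {
				above = &book[i] // first order above hi
			}
		default:
			matches = append(matches, o)
		}
	}
	return
}

func main() {
	book := []Order{{1, 5}, {2, 8}, {3, 10}, {4, 15}, {5, 20}}
	below, above, m := match(book, 8, 15)
	fmt.Println(below.ID, above.ID, len(m)) // 1 5 3
}
```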

Pending questions:

  • Is there a cheap way for miners to run re-ordering when creating new blocks?
  • Can someone attack miners by forcing a crazy re-ordering?
  • Is pricing the right ordering?
  • Can we implement this just as an extra indexing that is stored on-chain beyond the standard append-only? (if so, then, this is low priority and may be introduced with hard fork)

Number formats

I wanted to fill in the details for number formats and would like some input.

Current thinking is:

  • use varints for IDs
  • use varints to represent anything larger than u64
    • alternative to that we can use the bignum representation from cbor for all larger values
  • use varints like https://github.com/multiformats/unsigned-varint, but with a higher limit
    • what should that limit be?
    • do we need to consider signed varints?
  • should we store numbers that fit into (u)int{8|16|32|64} into varints as well?
    • if not how do we enforce number storage in cbor representation?

RFC: Long Deals

Main Question:

  • Can a Storage Miner sign a deal where the Bid.Time is bigger than their available storage Pledge.Time?

Possible Options:

  • Yes:
    • They can always plan on renewing their pledge
    • They can always re-sell the storage - or they will lose their collateral otherwise
  • No:
    • In this way if they want to accept deals, they must commit more storage and put more collateral (which is great!!)
    • They won't be able to plan on re-selling storage by default
    • Great that they cannot overcommit (accept deals that they cannot fulfill completely), hence there is less risk of "re-selling storage", which also means less risk for clients to lose their files.

Questions:

  • If a Storage Miner does not re-sell in time a deal that they could not store (since their Pledge.Time is smaller than the Bid.Time), how do we penalize them?

My take: NO (looks like a safe option), YES (if we figure out how to penalize the miners)

Challenges and encoding checks for ZigZag spec

This is a proposal to improve the ZigZag spec, originally proposed by Ben:

  • Have different number of challenged nodes (encoding+inclusion) at each layer
  • Have different number of encoding checks at each layer

Addressing data

When thinking through how to actually build Filecoin, one of the first big (and unanswered) questions I come across is: how do we reference data? When I say filecoin storefile file.zip, what does it do with the file? Does it run it through an IPFS importer?

RFC: Open Orders

I decided not to include open orders in the current spec

Open Orders are orders that only specify funds and optionally specify time. The idea is to be able to set aside some funds and have the price reset every X blocks to the current average price. This way, someone who wants to store a file for one year doesn't have to re-send orders every month just because they can't predict the future price.

I leave this request for comment open until we figure out all the details about this.

If this doesn't make it into the main spec, it could be an RFC/update to the spec.


Current idea

  if b.Price == 0 {
    if b.Time > 0 {
      // Spec: if Time is specified calculate Price from it
      b.Price = b.Funds / (b.Time * b.Size) // TODO check for remainder
    } else {
      // Spec: Pick the average Price of the past 100 deals
      // RFC: How risky is this?
      b.Price = 0 // TODO fix to 10 for now
    }
  }
  // Spec: If the Bid time is not specified, it can be calculated from b.Price
  if b.Time == 0 && b.Price > 0 {
    b.Time = b.Funds / (b.Price * b.Size) // TODO check for remainder
  }
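A self-contained version of the snippet above, with the "average price of the past 100 deals" branch left out and the integer-division remainder question left open as in the original (the Bid type and units are hypothetical):

```go
package main

import "fmt"

// Bid follows the fields used in the sketch above; units are hypothetical
// (Funds in FIL, Size in GB, Time in blocks).
type Bid struct {
	Funds, Price, Time, Size uint64
}

// normalize fills in whichever of Price/Time is missing from Funds, as in
// the sketch above. The remainder is simply truncated here; the original
// leaves that choice open ("TODO check for remainder").
func normalize(b *Bid) {
	if b.Price == 0 && b.Time > 0 {
		b.Price = b.Funds / (b.Time * b.Size)
	}
	if b.Time == 0 && b.Price > 0 {
		b.Time = b.Funds / (b.Price * b.Size)
	}
}

func main() {
	b := Bid{Funds: 1200, Time: 100, Size: 4}
	normalize(&b)
	fmt.Println(b.Price) // 1200 / (100 * 4) = 3
}
```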

Signature Spec

There is no spec governing how cryptographic signatures should be generated, saved, and serialized.

Questions to be answered by the spec:

  • How are signatures used in Filecoin
  • What algorithms are supported
  • How should signatures be serialized
  • What are the dependencies/reliances of signatures in Filecoin
  • What interfaces should be used in Filecoin signatures

(@frrist)

repair protocol notes

(NOTE: this is not actionable. I will be condensing this into both a spec document and a series of actionable work units separately. This issue is just so I don't lose the notes)

Repair Protocol

Setup for repair

  • during storage protocol, when client prepares data for storing

  • Data or Files are chunked according to a dispersal scheme into a set of pieces. The dispersal scheme includes choices like the erasure coding scheme, chunking algorithm, dag layout, etc.

    • NB: All the algorithms are the choice of the Client.
    • NB: All algorithms must be known to the participants of the network (code is available somehow, likely part of the reference implementation)
    • NB: Some dispersal schemes will combine erasure coded pieces with storing 1 or more complete copies (without erasure coding) to allow for expedient retrieval.
  • TODO:

    • specify where to store the metadata of the encoding
    • specify how we map the hash of the original data to the hash of the erasure coded pieces stored by different miners

Proving the pieces are stored

  • During the mining protocol, when miners prove they are storing data, Proofs-of-Replication are produced, which are then used to determine when particular pieces were last verified to exist. This section describes how that works.
  • The mining protocol includes the creation of Proofs-of-Replication (PoRep) over the sectors that a miner is storing. An individual PoRep $\pi_s$ proves that a corresponding sector $s$ has been stored.
    • NB: A sector contains an ordered set of pieces $p = {p_0, ..., p_n} \in s$, so proving the storage of a sector (probabilistically) proves the storage of the underlying pieces.
    • NB: Because Proofs-of-Storage (PoRep included) are probabilistic, we need many proofs over time to ensure (with overwhelming probability) that a particular piece $p_i \in s$ has been proved to be stored.
    • NB: Some protocol designs might wish to use the explicit underlying pieces tested by each PoRep over a sector $s$. But for simplicity, we treat a single PoRep over sector $s$ as proving all pieces $p_i \in s$.
  • A PoRep $\pi_s$ over sector $s$ should be clearly understood to correspond to $s$, meaning that every participant of the network should be able to clearly relate $\pi_s$ to $s$. This means either using a deterministic and clear algorithm, or some information added to the chain to indicate the correspondence. So, given the chain, which includes a sequence of PoReps, we can find the last PoRep $\pi_s$ produced for a particular sector s (and its pieces ${p_0, ..., p_n} \in s$) and know when the sector (and its pieces) "were last proved to be stored (or verified to exist)".
    • NB: we will use these proofs later to determine "how many blocks ago" was a piece proved.
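As a sketch of how this lookup could work, here is a minimal Go example that scans a hypothetical chain trace of PoReps and computes, for every piece, the height at which its containing sector was last proved. All type and field names here are illustrative assumptions, not part of the spec:

```go
package main

import "fmt"

// Hypothetical types for illustration; the real chain structures differ.
type SectorID uint64
type PieceID string

// PoRepRecord is one proof observed on chain: Sector was proved at Height.
type PoRepRecord struct {
	Sector SectorID
	Height uint64
}

// SectorInfo records which pieces a sector contains.
type SectorInfo struct {
	Pieces []PieceID
}

// LastProved scans the chain's PoRep trace and returns, for each piece,
// the height at which its containing sector was last proved. A single
// PoRep over a sector is treated as proving all of its pieces.
func LastProved(trace []PoRepRecord, sectors map[SectorID]SectorInfo) map[PieceID]uint64 {
	last := make(map[PieceID]uint64)
	for _, rec := range trace {
		info, ok := sectors[rec.Sector]
		if !ok {
			continue
		}
		for _, p := range info.Pieces {
			if rec.Height > last[p] {
				last[p] = rec.Height
			}
		}
	}
	return last
}

func main() {
	sectors := map[SectorID]SectorInfo{
		1: {Pieces: []PieceID{"a", "b"}},
		2: {Pieces: []PieceID{"c"}},
	}
	trace := []PoRepRecord{{1, 100}, {2, 150}, {1, 180}}
	fmt.Println(LastProved(trace, sectors))
}
```

This is exactly the "how many blocks ago was a piece proved" query used later by failure detection.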

Storage Failure Detection

  • A Storage Failure Detection Algorithm (SFDA) determines whether a particular piece $p$ is currently being stored correctly or not.
  • Given how the mining protocol works and the trace of PoReps it produces, SFDA can merely audit the PoReps produced through mining to check when all pieces have been proved to be stored.
  • NB: SFDA is probabilistic, and relies on frequent PoReps to ensure freshness and accuracy.
  • type SFDA interface {
      // FailedPieces audits chain c and returns the set of pieces
      // determined to have failed. For example, all pieces that have
      // not been proved since some threshold time. 
      FailedPieces(c Chain) []PieceID
    }
    
  • Given the magnitude of data any SFDA would have to deal with, practical implementations will offer only partial results, and may use caches. They may also be designed to minimize operations needing to be done per block, or to minimize auxiliary cache storage.
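As a minimal sketch of the interface above, here is a naive full-scan implementation. The Chain model, its LastProved index, and the Threshold parameter are assumptions for illustration; a practical SFDA would maintain a cache per block rather than rescanning:

```go
package main

import "fmt"

// Hypothetical minimal chain model for illustration only.
type PieceID string

type Chain struct {
	Height     uint64             // current chain height
	LastProved map[PieceID]uint64 // height at which each piece was last proved
}

// NaiveSFDA audits by a full scan: any piece not proved within Threshold
// blocks of the head is deemed failed.
type NaiveSFDA struct {
	Threshold uint64
}

func (s NaiveSFDA) FailedPieces(c Chain) []PieceID {
	var failed []PieceID
	for p, h := range c.LastProved {
		if c.Height-h > s.Threshold {
			failed = append(failed, p)
		}
	}
	return failed
}

func main() {
	c := Chain{Height: 200, LastProved: map[PieceID]uint64{"a": 195, "b": 80}}
	fmt.Println(NaiveSFDA{Threshold: 50}.FailedPieces(c)) // piece "b" has gone stale
}
```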

Conditions of a failed piece

  • A miner will fail to prove a piece if they (a) do not have the sector that contains the piece, or if they (b) go offline during the proving period and cannot produce the proofs. Since going offline is considered a kind of failure (and failure to honor the commitment to the client), we treat these the same.
  • Sector Failure. Miners produce traces of proofs over time, one randomly chosen sector at a time. At some point, the sector containing a particular piece will be challenged (and either proved or not). If the miner fails to produce the proof for a particular sector, then we can deem that sector (and all its pieces) failed.
  • Miner Failure. If miners fail to produce any proofs at all for the last MinerFailureHeight blocks, we can assume they have suffered extreme failure (or are just gone) and we can deem all their sectors (and all their pieces) failed. We can detect this even if miners fail to win any blocks, as all storage miners are asked to re-introduce their proof chains into the blockchain over time. The lack of these proof chains signals miner failure. (MinerFailureHeight should be on the order of a week).
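The Miner Failure condition above reduces to a simple height comparison. A hedged sketch, where the MinerFailureHeight value is illustrative only:

```go
package main

import "fmt"

// MinerFailed reports whether a miner has produced no proofs at all for the
// last minerFailureHeight blocks, in which case all of its sectors (and
// their pieces) are deemed failed.
func MinerFailed(lastProofHeight, currentHeight, minerFailureHeight uint64) bool {
	return currentHeight-lastProofHeight > minerFailureHeight
}

func main() {
	// Illustrative value only; the text says MinerFailureHeight should be
	// on the order of a week's worth of blocks.
	const minerFailureHeight = 20000
	fmt.Println(MinerFailed(1000, 25000, minerFailureHeight))  // silent for 24,000 blocks: failed
	fmt.Println(MinerFailed(24000, 25000, minerFailureHeight)) // silent for 1,000 blocks: not failed
}
```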

Candidate practical SFDA: LRP Piece Cache

  • Consider an SFDA that uses a "Least-Recently-Proved (LRP) piece cache", where each piece is ordered according to the height at which it was last proved. We can update this cache per-block, and then quickly query it to list out the pieces whose proofs have gone stale.

  • This is one candidate SFDA that is cheap per each new block.

  • The goal is to make sure that the storage of each piece has been verified within a certain time frame. Verified here means that a Proof-of-Replication has been produced and externalized to the network in some way (more on this later).

  • For any file that is missing proofs, allow the repair actor to submit a bid for that file to the storage network

    • This bid is a special bid: any miner may create a deal for it without a 'client signature'
    • This bid has an amount of money for the remaining time it will be stored, at the original spacetime price.
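A minimal sketch of the LRP-style cache described above, assuming a per-block feed of which pieces were proved; names and structure are illustrative, not a spec'd data structure:

```go
package main

import "fmt"

type PieceID string

// LRPCache keeps, for every piece, the height at which it was last proved,
// so stale pieces can be listed without rescanning the chain.
type LRPCache struct {
	lastProved map[PieceID]uint64
}

func NewLRPCache() *LRPCache {
	return &LRPCache{lastProved: make(map[PieceID]uint64)}
}

// ObserveBlock is called once per new block with the pieces whose sectors
// were proved in that block.
func (c *LRPCache) ObserveBlock(height uint64, provedPieces []PieceID) {
	for _, p := range provedPieces {
		c.lastProved[p] = height
	}
}

// Stale returns the pieces not proved within `threshold` blocks of `head`;
// these are the candidates for repair bids.
func (c *LRPCache) Stale(head, threshold uint64) []PieceID {
	var out []PieceID
	for p, h := range c.lastProved {
		if head-h > threshold {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cache := NewLRPCache()
	cache.ObserveBlock(10, []PieceID{"a", "b"})
	cache.ObserveBlock(90, []PieceID{"a"})
	fmt.Println(cache.Stale(100, 50)) // only "b" is stale
}
```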

Questions

  • When is a sector considered lost? (when a sector is lost, the orders are re-introduced by the network)
  • Can we tolerate missing of some proofs?
  • If proofs aren't submitted every block (which they aren't), how often should we require miners to submit proofs so that the repair protocol can detect faults?
  • Given the current implementation/description, an individual piece will never be lost on its own. It will always be an entire sector at a time. Is this something we want to use as an optimization? We could potentially go through and mark each of the relevant bids as unused again, and change their amounts.

Notes

It may be useful to have a way for miners to declare that they will have downtime for a given period, sacrificing some of their collateral but avoiding losing all their sectors.

`Bid.Time` number of blocks vs block number

When a client submits an order, they specify the Bid.Time which could be:

  • The number of blocks they are interested in storing the file (I call it Duration)
    • Great since users just need to deposit the amount of money Price*Duration
    • The max block number can in any case be easily calculated as Expiry+Duration
    • Min Block number would be CurrentBlock + Duration
  • The block number (meaning the time!) until their data should be stored for (I call it BlockHeight):
    • Great since clients can directly specify the maximum amount of time, and not pay for extra time (in the previous case they might pay for extra time during which they don't want their file stored)
    • In order to submit the Bid order to the chain, they would need to lock money for BlockHeight-CurrentBlock. If their Bid order waits in the orderbook for, say, Expiry-1 time, then they would need to get money back (Price*(Expiry-1)), since their file was not being stored during that time.

Solutions:

  • Have Time == Duration since it's easy to handle and locking funds is always exact, at the cost of clients storing for some time (Expiry) that they would not be interested in storing
  • Have Time == BlockHeight since it is the exact time for which clients are willing to pay, at the cost of the network handling all the refund calculations and the client locking slightly more money than they otherwise would (in the worst case, Price*(Expiry-1))
  • Have both Duration and BlockHeight, and let users choose the kind of order they are interested in
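The fund-locking arithmetic behind the two options can be sketched as follows; Price is taken as FIL per block purely for illustration:

```go
package main

import "fmt"

// Duration scheme: lock exactly Price*Duration up front; no refunds needed.
func LockForDuration(price, duration uint64) uint64 {
	return price * duration
}

// BlockHeight scheme: lock Price*(BlockHeight-CurrentBlock) up front.
func LockForBlockHeight(price, blockHeight, currentBlock uint64) uint64 {
	return price * (blockHeight - currentBlock)
}

// If the order sits unmatched for `waited` blocks, that much storage time
// is never used and Price*waited must be refunded.
func RefundAfterWait(price, waited uint64) uint64 {
	return price * waited
}

func main() {
	fmt.Println(LockForDuration(2, 100))         // 200
	fmt.Println(LockForBlockHeight(2, 500, 400)) // 200
	fmt.Println(RefundAfterWait(2, 30))          // 60
}
```

The asymmetry is visible directly: the Duration scheme's lock is always exact, while the BlockHeight scheme forces the network to track and process refunds.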

The purpose of 'Asks', and 'maybe we should just have miner prices'

Moving the storage market off-chain means that there are no on-chain bids or deals. But, asks are still (currently) on-chain. They are what we are using to advertise price, and a miner may have multiple asks. Previously, we also specified how much space each ask was for, separately from the amount of space that miner had pledged. As each deal was made, this amount was decremented.

With an off-chain storage market, asks are pretty pointless. We could instead just have each miner set a price for all their storage (as there isn't really any reason for an individual miner to have different prices, since there's no way to provably distinguish between different types of storage the miner has locally).

Generally, price discovery needs more thought. I will follow up in another issue.

clarify nonce naming

I've previously argued that nonce should have a more clear name: in my opinion it's not clear what nonce means or how it is to be used if it is named "nonce". Last week I suggested "message counter" (MsgCount) or "message sequence id" (MsgSeqId), both of which were shot down as "nonce" is a term of art. Obviously it is, but that doesn't make it clear how to use a "nonce" -- nonces are often random for example and "nonce" doesn't tell you if it's the last one seen or the next one expected. In filecoin-project/venus#262 I renamed Nonce to NextNonce to make it more clear that the field holds the value of the next message nonce we expect to see, for the reasons stated above. @dignifiedquire points to #61 which has yet a different proposal. This issue is to clarify how the nonce should work. It does not block filecoin-project/venus#262 which will proceed as renaming is a small refactor and there is no clarity about what to do and yet I have to implement something.
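For illustration, here is a minimal sketch of the "next expected nonce" semantics argued for above; the ActorState shape and method name are hypothetical, not the actual implementation:

```go
package main

import "fmt"

// ActorState stores the nonce the next message must carry (the NextNonce
// reading of the field), advancing on each accepted message.
type ActorState struct {
	NextNonce uint64
}

// ApplyMessage accepts a message only if its nonce equals NextNonce, then
// increments NextNonce. This makes replays and reordering detectable.
func (a *ActorState) ApplyMessage(msgNonce uint64) error {
	if msgNonce != a.NextNonce {
		return fmt.Errorf("bad nonce: got %d, expected %d", msgNonce, a.NextNonce)
	}
	a.NextNonce++
	return nil
}

func main() {
	a := &ActorState{}
	fmt.Println(a.ApplyMessage(0)) // accepted
	fmt.Println(a.ApplyMessage(0)) // replay rejected
	fmt.Println(a.ApplyMessage(1)) // accepted
}
```

Note how, under this reading, the nonce is a sequence counter rather than a random value, which is exactly the ambiguity the rename is meant to resolve.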

Paying the storage miner under the offchain repair model

Since we no longer have funds locked up in on-chain deals for the storage miner, we have to find some way to pay them that provides assurance both to them and to clients. (Note: 'client' refers to anyone interacting with the storage miner trying to store files. This could be a storage broker, or a non-miner client.)

We can start by saying the client must have funds locked up in an open payment channel to the miner up front. A set of pregenerated and presigned updates should be sent to the miner, each of which is time-locked (redeemable only after a specific block height) and contains funds for that time period (if the total agreement is 100 FIL for storing the file for 100 blocks, then you could send ten updates, the first for 10 FIL after 10 blocks, etc). But they also need to not be valid if the miner isn't storing the data.
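The 100 FIL / 100 blocks / ten updates example can be sketched as follows; the ChannelUpdate shape is hypothetical (real vouchers would carry signatures and channel identifiers), and this shows only the schedule arithmetic:

```go
package main

import "fmt"

// ChannelUpdate is a hypothetical presigned, time-locked channel update.
type ChannelUpdate struct {
	RedeemableAt uint64 // block height after which it can be cashed
	Amount       uint64 // cumulative FIL released by this update
}

// PregenerateUpdates splits a total payment for durationBlocks of storage
// into count evenly spaced, time-locked updates.
func PregenerateUpdates(start, durationBlocks, total, count uint64) []ChannelUpdate {
	updates := make([]ChannelUpdate, 0, count)
	for i := uint64(1); i <= count; i++ {
		updates = append(updates, ChannelUpdate{
			RedeemableAt: start + i*durationBlocks/count,
			Amount:       i * total / count,
		})
	}
	return updates
}

func main() {
	for _, u := range PregenerateUpdates(0, 100, 100, 10) {
		fmt.Printf("after block %d: %d FIL\n", u.RedeemableAt, u.Amount)
	}
}
```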

Let's look at the different points in the protocol we could send these, and what it would take to ensure the constraints are met.

Note: More complicated payment channel updates can be 'reconciled' by asking the signer for a succinct update. For example, if the miner has finished the agreement, they could go back to the client and ask for a simple payment channel update that invalidates the previous, and is good for the same amount. Allowing the miner to submit a much smaller transaction. This will be effective in the Broker<->Miner scenario, less so in any Client<->Miner one.

Up Front

If the client sends a set of payment channel updates to the miner up-front, along with the initial proposal, they would need to be valid contingent on the miner having a sector committed to disk that provably contains the file the client wants to be stored.

To do this, the redemption of the payment channel update would have to be accompanied by a proof. This proof could be a snark, with a sector ID and the file hash as public inputs. The proof could actually be a snarked version of the proof the miner has to send the client anyway. This amounts to somewhere north of 300 bytes, which is pretty rough. But if we can assume payment channel reconciliation, that might work. It's also pretty annoying to have to write an extra circuit for this, so let's see if we can do better.

After Data Transfer

After the data is transferred, the miner could finish filling up the sector (but not sealing it) and send the client an inclusion proof of their data within CommD (the merkletree of the un-encrypted sector). If the CommD is part of the public inputs to the seal proof, then the payment channel updates could be contingent on the existence of a valid sector on-chain whose CommD is the agreed upon value.

This is a much cleaner requirement, but:

  • I'm not sure if CommD actually goes to the chain (need to ask @nicola or @clkunzang )
  • It means the miner has to go around collecting payments before it starts sealing, in case some client doesn't pay up. This is a DoS vector, which is annoying to deal with.

After Sealing

If the miner waits until after sealing to collect their payment assurance, then they are near-trivially DoSable. It definitely needs to happen before then.

Up Front Take 2

Upon receiving the deal, the miner tells the client a sector ID that their file will be placed at. The client then sends the channel updates along with the data. These proofs can be cashed out at any time after the time period, given that the sector that was agreed upon is still valid on-chain.
If the miner does not send the client a valid proof of inclusion after sealing and committing the sector, the client may post the agreement from the miner on-chain as a challenge. This gives the miner a fixed period of time to respond to the challenge with the appropriate proof, or be slashed.

Problem 1: What if the client never sends the file after receiving the promise from the miner, and then posts that promise to the chain as a challenge?

Problem 2: What if the miner makes that promise to the client, and the client is slow? The miner would have to wait for the client to finish their sector and seal it up. Malicious clients might be able to use this to DoS miners.

Spec process proposal

Spec Update Process

This is a proposal for creating a standardized process around documenting and
updating the Filecoin spec.

Intent

We need to produce a technical specification of the Filecoin network and
protocol. Without a spec, it will be difficult to accurately communicate (or
come to consensus on) how Filecoin works, and it will be nearly impossible to
write additional valid Filecoin clients.

Producing a spec is challenging. Like all documentation, writing it takes
considerable effort, and it must be continually maintained to stay current.
This proposal aims to describe a v0 process for writing and maintaining the
spec. This is explicitly not the final process; see EIP 1 for an example
of what a fully developed spec process would look like. This spec proposal is
much lighter weight.

Requirements

  • It must be possible to write a Filecoin client using just the spec
  • The spec needs to communicate design intent
  • The spec needs to have a clear "lifecycle" that makes changes to the spec
    visible before merging, so that interested parties (e.g. developers and
    users) can provide feedback and plan accordingly
  • We need to synchronize the spec with the behavior of go-filecoin
  • We need to accommodate that the current behavior of go-filecoin is mostly
    unspecified, so the spec needs to catch up

Proposed Process

Updates to the spec will happen through a 3 part process:

  • Proposal
  • Draft
  • Spec

Proposal

A proposal is a GitHub issue posted in this repository
filecoin-project/specs. The purpose of a proposal is to communicate
high-level intent to change the protocol. Proposals do not need to be at a very
high level of detail, and should be written early in the process towards
changing the spec. The GitHub issue number assigned to the proposal will be
used to identify and reference it.

Draft

A draft is a markdown document located in the master branch of this
repository, within the /drafts folder. The title of the markdown document
should start with the number of the proposal that spawned it.

A draft is a concrete suggestion for a change to the spec, and should be
written at the same level of detail prevailing in the spec. Drafts should merge
easily (i.e. consensus from the team is NOT required to merge the PR).

Each draft should be identified by the GitHub issue number of the proposal that
spawned it.

Spec

Once a draft has been officially accepted into the spec, the specs owners
(@whyrusleeping and @bvohaska) will merge the draft into the spec code. The
draft can be left in the repository for historical tracking.

Catching Up

We need to synchronize the spec with go-filecoin, and write it at a high level
of detail. As of Oct 2018, we have a lot of catching up to do. For now, there
is a lot of Filecoin behavior that is unspecified.

Until we have a v1 spec, the requirements for the spec are relaxed. We will not
require that every draft be implemented at a high level of detail: it must only
improve on the current document. Also, we are not yet blocking feature
development in go-filecoin on their details being fully specified, although we
should encourage anyone involved in developing unspecified features to go
through spec review first.

Once we have a v1 spec, we'll tighten up the requirements for merging a draft,
and will start blocking any unspecified changes to go-filecoin.


Note this is just a proposal. Comments are welcome, once we have some consensus we can make it into a draft and then get it merged as an early piece of the spec.

Note on Bid Orders Expiration

Since putting a bid order on the chain deposits the funds in the bid order, if the order is never taken the funds will be locked forever. We therefore need to support an expiry block, after which the order becomes invalid.

This is also useful in case I set a price and, given the volatility, I don't want that order to sit at that particular price forever.

  • Need to update spec draft
  • Need to update code

offchain repair feedback

@whyrusleeping I know I promised this a while ago, sorry for the tardy feedback. Better late than never? cc @dignifiedquire as someone working in this area.

This feedback here is specific to https://github.com/filecoin-project/specs/blob/master/drafts/offchain-repair.md (at d198292), knowing that #103 represents further changes to it. I'll review 103 after I'm done with the original offchain repair. Some of these comments may be obviated/outdated because of 103; I'm not going to spend the time reconciling that because it'll be obvious to you what is or is not still relevant.

No need to respond directly to any of these, take em or leave em as you see fit.

My high level bit of feedback is consistent with the first impression I think I gave you during review: this makes sense, but introduces a lot of complexity. I think we could aggressively strip out the optimizations and have them come as follow-ons. This would make the whole thing easier to understand, easier to know what to shoot for, and easier / less scary to give good feedback on, because there's so much there now.

Other stuff:

  • there are still some instances of "broker" in the doc
  • storage linear in the number of files+aggregation: seems like linear in number of files with an optional optimization that makes the constant smaller is definitely better than scaling with size of the data but still seems like it's going to break down. Might be helpful to the reader to say one sentence about how long we expect this strategy to work for us and why we think that (rough back of envelope, eg).
  • Bid.Price -- what are the units? Is it FIL/MB per #87? Seems like we should encode that in a price type, so Price might be Price TokenAmountPerUnit or something?
  • do bids ever expire?
  • ask naming: StorageAsk, RepairedStorageAsk? Bid should probably be named StorageBid for symmetry.
  • "If they lose some data, then they may either report a fault, or simply fail to provide proofs (TODO: incentivize miners to report failures up front, makes things easier)." -- or not, one less thingy would make things simpler. we have sooooo much complexity in the system already and have an existing mechanism for detecting failures (we detect that they stop proving), seems like we should just rely on it until need proves otherwise.
  • "TODO: should the storage broker have to be responsible for slashing the miner? Or should they react to someone else slashing their miner when they fail to post proofs? (ref: Honest vs Malicious failures)" -- knowing very little about this seems like they should react to someone slashing their miner. That someone could be themselves, a separately running thing that checks, but could be others. Seems like maybe good protocol design to keep these things separate.
  • "Note: It may be valuable to distinguish between degraded storage and unrecoverable data" Sounds like a good optimization we can do later.
  • "Compensation" -- wait, we have a mechanism for repair miners to be compensated, and that's markup. Can't we just let them do the markup thing? Seems like a lot more complexity to add power or to conflate repair mining with storage mining.
  • " repair miners may gain a noticeable amount of efficiency by having the ability to check if a miner is actually storing any piece of data at will" - Sounds like a great optimization, can we do it later?
  • "It is no longer easy to look up which file is being stored by which storage miner." -- Is this important? If I'm using S3 to store my data I don't care which thingy under the hood is storing my data, I care about being able to Put and Get it. Seems like we could just not support this operation (unless I'm missing a reason we have to have this, seems like the retrieval market handles what I as a client want to do, which is get data).
  • "Other changes needed" -- this is a set of optimizations that i think we can move out to their own proposals. We have a lot to do before we get to them. In particular would like to discuss "Multi-invocation messages: " in a lot more detail as a separate issue.

Actor Storage API Proposal

This is a candidate way forward for the filecoin storage actor interface

The low level interface for filecoin actors to interact with persistent storage looks something like this:

type BlockStorage interface {
    Put(blocks.Block) (*cid.Cid, error)
    Get(*cid.Cid) (blocks.Block, error)
    
    GetHead() (*cid.Cid, error)
    SetHead(*cid.Cid) error
}

This API is really flexible and allows for a lot of powerful things to be built on top of it. Apps can really choose to store their data however they like, specifically tailored to their needs. However, on its own, it can be a little difficult to use productively.

Moving forward, we should have something with a simple interface that wraps this to provide a nice interface. Note that all of the filecoin datastructures we already have can work cleanly with this API, as that is essentially the backing storage API we already use (The blockstore). For example, one datastructure we are already using in the codebase is the HAMT (The storage interface the HAMT uses matches our expectations: https://github.com/ipfs/go-hamt-ipld/blob/master/ipld.go#L44).

In addition, the 'cbor store' interface we are using throughout the codebase is a fairly simple wrapper on top of an api that matches with the above storage API. Reusing and exposing this code for actors to use saves us from having to maintain multiple separate libraries for dealing with content addressed DAGs of data. (Note: we need to move the cborstore code into a better place, probably the go-ipld-cbor repo)

Actors with simple storage requirements can just define a struct and Put it to the cbor store. Data requiring a map can use the HAMT, and data needing arrays can also use the HAMT, with indexes as the key (in the same way that ethereum does for their 'arrays'). Structs can just be structs, and get marshalled to cbor.

Taking this approach has several really nice advantages:

  • Improving the API and code benefits ourselves as developers of filecoin, anyone writing an actor, and more broadly, anyone working with ipld.
  • Other implementors of filecoin will have less work to do than if we implemented something completely different for the actor storage.
  • Minimal scaffolding needed.

When we move towards introducing third party actors written in WASM, we can compile this code to web assembly and anyone writing a filecoin actor can choose to use the filecoin way of data storage, or they can implement their own libraries.

Ergonomics

This is just a sketch of how using this could work.

type MyActorStorage struct {
    // Normal fields can just be fields
    Name string
    AccountNumber uint64
    
    // anything that wants to be a 'map' should be switched to a hamt
    // We can set things up to dereference cids and unmarshal them into struct fields
    // to make things easier to use. Alternatively, we can have a wrapper type that
    // lazy loads things on-demand to save computation.
    Friends hamt.HAMT
    
    // Any data that you don't change frequently should probably be stored
    // in a separate linked structure; that way it doesn't need to be rewritten
    // when making changes. We could add a struct tag to inform the marshaler that
    // the referenced object should be added separately and linked to.
    // (Though, maybe we want things to be loaded on demand to avoid unnecessary
    // computation)
    Extra *Data `ipld:"link"`
}

type Data struct {
    // ... other stuff that doesnt change much ...
}

Then, WithStorage can unmarshal the node referenced by storage.Head() into the given struct (for example, the MyActorStorage above), and then on exiting the closure, marshal it back up, storage.Put it, and storage.SetHead the result.
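A minimal sketch of that WithStorage pattern, using an in-memory store and JSON in place of the blockstore and CBOR purely so the example is self-contained; none of these names are final:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memStore is a toy stand-in for the BlockStorage interface above; a string
// key replaces a CID purely for runnability.
type memStore struct {
	blocks map[string][]byte
	head   string
}

func (m *memStore) put(data []byte) string {
	key := fmt.Sprintf("blk-%d", len(m.blocks)) // stand-in for hashing to a CID
	m.blocks[key] = data
	return key
}

// WithStorage loads the head object into state, runs the closure, then
// re-serializes the state, stores it, and updates the head.
func WithStorage(m *memStore, state interface{}, fn func() error) error {
	if m.head != "" {
		if err := json.Unmarshal(m.blocks[m.head], state); err != nil {
			return err
		}
	}
	if err := fn(); err != nil {
		return err
	}
	data, err := json.Marshal(state)
	if err != nil {
		return err
	}
	m.head = m.put(data)
	return nil
}

type MyActorStorage struct {
	Name          string
	AccountNumber uint64
}

func main() {
	m := &memStore{blocks: make(map[string][]byte)}
	var st MyActorStorage
	WithStorage(m, &st, func() error { st.AccountNumber = 42; return nil })

	var st2 MyActorStorage
	WithStorage(m, &st2, func() error { return nil })
	fmt.Println(st2.AccountNumber) // state round-trips through the store
}
```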

Filecoin Spec Work

Using this issue to track work that needs to be done in order for the 'spec' to be complete.

DesignDocs - Sprint 1

We are starting Sprint 1 of the DesignDocs effort.

  • Start: 2018-09-06 (Thu)
  • End: 2018-09-18 (Tue)

Focused on important requirements for alpha:

Allocations are written in the doc, for now. Will move to the queue.

(still playing around with best formats, forgive duplication)

[WIP] Studying the specification from the perspective of a developer new to Filecoin

As suggested by why I'll be doing a general review of the specification from the perspective of a developer who wants to implement Filecoin from scratch.

The summary for now is that I really need a clear high-level picture of the system to understand how the components interact (and why we have each of those components in the first place). This is especially true since the specification still has considerable amounts of blanks throughout its documents. That may not be a blocker for someone with a clear mental model writing it (as it's expected that some sections need to be prioritized over others), but it poses some problems to a new reader, or particularly me :)

Let's not answer any question here but rather create separate issues/PRs to address worthwhile questions (if any).

There are many TODOs already in the spec that I won't mention here to avoid repeating ourselves (but that doesn't mean I think those parts weren't important).

Although I'm familiar with the distinction between a specification and "some document to learn about what Filecoin is and its motivation", I have done a poor job of distinguishing them here, and I am reading the spec from both of those perspectives. That is unfair, because the spec shouldn't perform both functions, and many comments here conflate them. But since I haven't seen a more complete document than this one explaining Filecoin (besides the whitepaper and some online talks), I couldn't help reading it both ways.

I should note my particular background, which will definitely define the subjectivity of this document: I'm someone very familiar with IPFS, and somewhat familiar with Filecoin through its Go implementation (which I'll try not to think about while writing this, since the spec should be designed independently of any particular implementation). I am very familiar with the Bitcoin cryptocurrency but not with Ethereum (which seems to be more relevant in the Filecoin context).

I've done most of the reading a couple of weeks ago so some things might have changed during that time (since this document is evolving rapidly) but my reference is roughly version b4bd3c6.

Intro

There's no high level architecture of Filecoin so my best bet is to go through the items in the order specified in the summary, e.g., I'm starting with the node operation, is that the first component I should be looking at? Is this order reflecting some architecture of Filecoin?

Some references to previous works, influences or recommended reading material could be very useful here, e.g., is the Ethereum spec useful to understand the motivations/design behind this one?

Filecoin Node Operation

Feels like "node", or "full node", are terms that should be ticked (same for "messages" and some other terms, since this is the first section of the spec the user will be reading without any previous knowledge of the system). pubsub channel: maybe we could have a link to the libp2p spec explaining how the lower layers communicate.

Coming from IPFS there are some terms that could be confusing, like node or block, but this may just be a condition of my particular background.

Mining

Since it's below the node operation I'm assuming that a miner is also a node, does that make sense? CommitSector is ticked but is not in the definitions, is commitment (a.k.a. Filecoin Proofs) what it should be pointing at?

Why is the storage miner in charge of the blockchain? Are there different types of miners, or is "storage miner" just a synonym for a generic "miner"? (I guess the answer is no, since there is also a "retrieval miner".) What's a leader? There's no definition of it; where could we point to give more information about it?

Is the proof added to the blockchain? Yes. (This may be too naive a question, but what we are storing in the blockchain besides the transactions is not yet very clear to me; or is the proof considered to be part of the transaction?)

Chain Validation

Similar to the Filecoin Node Operation seems to make sense but there are many details that I don't know about to actually check the reasoning behind each step.

The Filecoin Storage Market

I was at first unclear whether the storage market is an actor/entity itself, or the system/group of clients and miners; later I read that

The storage market actor is a built-in network actor.

but the first time I read this, depending on which part of the doc I was reading, I was thinking of one or the other.

I really think the motivation is important here (but maybe not from the spec perspective), as a new user to Filecoin myself I feel that the idea of a market resonates strongly (in my head) against the idea that we are decentralizing the network, and this entity gives me the impression like it's pushing back on that.

What's a client? Is that also a node? What's the common denominator for the entities at play here? I feel like that's an actor (but only from what I've read in the go-filecoin implementation but not from the spec).

Are there more than one storage market? If so how do they interact?

It is assumed that the client, or some delegate of the client, remains online to monitor the storage miner and slash it in the case that it fails to submit proofs for the data.

Is the market the "delegate" of the client? Is each client responsible to check its own stored data? Where does the responsibility of the market end and the client begin? How much can the client trust a market? (these questions mainly follow the assumption that the market is actually an entity/actor.)

Deals among clients, and storage miners are done through a payment channel but settled on the Filecoin blockchain

"Done" vs "Settled"?

At this point I'm reading for the first time references to the actor term but the definition of actor seems very confusing to me since it circularly refers to addresses:

Actor: An address is an identifier that refers to an actor in some Filecoin state. All actors [...] have an address.

CreateStorageMiner: what does it mean to create a miner? (From go-filecoin it seemed that the miner should be registered in the chain but I'm not getting the same sense here). Later I found more information in Becoming a miner: Registration which clarifies some more on this, maybe we could link to that (or maybe not if it's too much information at this point).

SlashConsensusFault: Is there a single market whose authority controls the rest of the actors? What's the consequence of slashing someone? Who (if any) has the power to discipline/punish someone else? (the slash seems to be in the blockchain itself, how is it validated?)

Power? Is it storage capacity?

The Filecoin storage market is the underlying system used to discover, negotiate and form storage contracts between clients and storage providers called storage miners.

Vs

Clients negotiate directly with the storage miner that owns that ask, off-chain.

Is the market actually the intermediary here? In which capacity?

The storage market contains the following data:

Where? Is the blockchain the single source of truth or the client has to check with other actors like the market for certain information? (related to off-chain vs on-chain, which might need more clarification, what information should be found there?)

Is there any criteria for selecting a storage miner besides its asking price? (maybe power?) Is there some kind of reputation system? (I would expect not, but maybe this should be explicitly stated somewhere.)

Note: In order to provide the piece confirmation, the miner needs to fill the sector. This may take some time. So there is a wait between the time the data is transferred to the miner, and when the piece confirmation becomes available.

So how much (approximately) does the client need to be online for? Can it go offline and return later? Is there some waiting period on the side of the miner for this scenario?

Storage Deal Abort:

If a client attempts to abort a deal that they have actually made with a miner, the miner can submit a payment channel update to force the channel to stay open for the length of the agreement.

How is this case resolved exactly? (I may need to reread the payment section to better understand that).

The power table is exported by the storage market for use by consensus.

Exported? Is this related to the blockchain? Where is power used exactly?

//                            It creates 'count' vouchers, each of which is
// redeemable only after a certain block height, evenly spaced out between
// start and end.

Is that how we measure time? (update: an epoch is a block?)
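To make the quoted comment concrete to myself, here is a minimal Go sketch of what "evenly spaced between start and end" could mean in terms of block heights. The function name and the assumption that heights divide evenly are mine, not the spec's.

```go
package main

import "fmt"

// voucherHeights sketches the spacing described in the quoted comment:
// 'count' vouchers, each redeemable only after a certain block height,
// evenly spaced between start and end. Assumes (end-start) divides evenly.
func voucherHeights(start, end, count uint64) []uint64 {
	heights := make([]uint64, 0, count)
	step := (end - start) / count
	for i := uint64(1); i <= count; i++ {
		heights = append(heights, start+i*step)
	}
	return heights
}

func main() {
	// Three vouchers between heights 100 and 400:
	// redeemable after heights 200, 300, and 400.
	fmt.Println(voucherHeights(100, 400, 3)) // prints [200 300 400]
}
```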

Payments

This doc didn't clear up many of my doubts; I should reread it more carefully. I still have no idea how a payment channel works, though I get the motivation.

The Voucher definition didn't help much:

Held by an actor as part of a payment channel to complete settlement when the counterparty defaults.

The Filecoin Mining Process and Expected Consensus

The explanations in these documents are so tightly coupled that I had to read them side by side, jumping from one to the other, to build some mental model of the chain (e.g., while reading the mining process I needed to look in the expected consensus document to find a clear definition of ticket, which seems to be a fundamental concept for understanding the first document).

I still have many doubts so what follows is just a rough depiction of what was my (disorganized) mental process to understand those two sections:

  1. The first important clue is the definition of leader:

who gets to create the next block

So not just anyone can submit a valid block to the network, unlike in Bitcoin for example (which seemed like an important distinction to me).

  1. Ticket determines the leader.

In English, that is the signature (using the miner's keypair) over the hash of the smallest ticket in the parent set.

Smallest? It seemed like the ticket had a magnitude, but I later realized that

  1. Tickets are just "random" data.

But then it gets compared with miner power, so I'm not sure whether we elect leaders randomly, or based on how much power (storage?) they have, or (most probably) a mix of both.
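One way to reconcile "random" tickets with the power comparison is power-weighted election: treat the ticket as a uniformly random fraction in [0, 1), and the miner wins if that fraction falls below its share of total network power. A minimal Go sketch of that reading; this is my interpretation to check my own understanding, not the normative check from the spec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isWinningTicket sketches power-weighted leader election: the ticket is
// read as a uniformly random fraction in [0, 1), and the miner wins if
// that fraction is below minerPower/totalPower. So the draw is random,
// but the odds scale with power -- a mix of both, as guessed above.
func isWinningTicket(ticket []byte, minerPower, totalPower uint64) bool {
	// Use the first 8 bytes of the ticket as a random 64-bit value.
	r := binary.BigEndian.Uint64(ticket[:8])
	frac := float64(r) / float64(^uint64(0))
	return frac < float64(minerPower)/float64(totalPower)
}

func main() {
	ticket := make([]byte, 32) // all zeros => fraction 0, always wins
	fmt.Println(isWinningTicket(ticket, 1, 100)) // prints true
}
```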

(going back to "Ticket Generation"): smallestTicket:

  1. So "small" is just an arbitrary measure to select one ticket

("small" may be a confusing term here.)

proof := post.Prove(storage, challenge, postCount) and ticket := minerPrivKey.Sign(sha256.Sum(proof.Bytes())), I'm still not getting the relationship between the ticket and the storage proof (maybe I need to read the proof document to understand this part).

Can a block have multiple parents? How?

Note(why): This slashing is insufficient to protect against most attacks. Miners can cheaply break up their power into multiple un-linkable miner actors that will be able to mine on multiple chains without being caught mining at the same height at the same time.

Agreed.

Filecoin Proofs

As mentioned in the document itself, this is still in transition, so it's not an exhaustive description of how proofs work; but it is a nice high-level introduction that helped when reading the Rust proofs (especially since it defines a good number of the terms used in the code).

Review existing specs

Before we start to write the spec we want to have a better understanding of what format and style we are aiming for. To do that, let's collect various specs in this issue and gather thoughts on:

  1. What do we like about the spec?
  2. What do we not like about the spec?
  3. Other thoughts

Relationship between actors and addresses is unclear | definitions

The definition here is still a bit fuzzy to me:

Address
An address refers to an actor in some Filecoin state.

How can an address own another address? For example, when I create a miner I specify a from address. Once the miner is created, it too has an address. Given a miner address, I can find the address used to create it by running the command go-filecoin miner owner <miner_address>.

Storage rent thoughts

Storage Expiry

Everything on chain takes up space that must be stored by all full nodes indefinitely unless explicitly cleared out. As the chain ages and grows over time, this becomes fairly significant. Dead accounts and actors may at some point end up taking up most of the storage space of the chain. The notes here start from a conversation between @Stebalien and me while walking to a coffee shop in the summer of 2017.

There are two main approaches here for incentivizing chain cleanup. The first is the simpler to describe.

Actor Deposits

The idea here is simply to charge a larger fee for creating a new actor, a fee that gets returned when the actor is destroyed. This provides a simple incentive for users to clear out their own unused actors. To further improve the incentives, we could charge no transaction fees for these 'destroy' messages, and give the miner a percentage of the deposit for including the message. The number of these messages in each block would be limited to prevent DoS attacks, but simply unlinking a contract should be a reliably cheap enough operation that the limit can be high.
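The deposit-and-refund flow above can be sketched in a few lines of Go. All numbers and names here are illustrative placeholders, not proposed protocol values:

```go
package main

import "fmt"

// A minimal sketch of the deposit idea: creating an actor locks a deposit,
// and destroying it returns the deposit, with a cut paid to the miner that
// includes the (fee-free) destroy message. Values are hypothetical.
const (
	actorDeposit = 1000 // deposit charged at creation, hypothetical units
	minerCutBips = 1000 // miner's share of the deposit: 10%, in basis points
)

type state struct {
	deposits map[string]uint64 // actor address -> locked deposit
}

func (s *state) createActor(addr string) {
	s.deposits[addr] = actorDeposit
}

// destroyActor unlocks the deposit: the miner keeps a percentage for
// including the destroy message, and the owner gets the rest back.
func (s *state) destroyActor(addr string) (ownerRefund, minerReward uint64) {
	d := s.deposits[addr]
	delete(s.deposits, addr)
	minerReward = d * minerCutBips / 10000
	return d - minerReward, minerReward
}

func main() {
	s := &state{deposits: map[string]uint64{}}
	s.createActor("t0123")
	refund, reward := s.destroyActor("t0123")
	fmt.Println(refund, reward) // prints 900 100
}
```

The miner reward is what makes destroy messages worth including even without a transaction fee.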

Rent

The second, more complicated (but complementary to the first) idea is to charge actors rent. Actors have a balance that is deducted from (periodically...? could be expensive; can be done lazily). When that balance hits zero, the actor gets frozen. The actor is marked as frozen, all of its associated storage is deleted from the chain, but a reference to its merkle root is kept. Anyone may unfreeze this actor by including all the state referenced by the frozen storage hash in a message and submitting enough funds to get it going again (possible minimum duration?).
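The "lazily" option above might look like this: record the epoch of the last settlement and charge the accumulated rent the next time the actor is touched, freezing it when the balance runs out. A sketch, assuming a flat per-epoch rent; the constant and field names are mine:

```go
package main

import "fmt"

// A sketch of lazy rent collection: instead of charging every actor each
// epoch, record when the actor was last charged and settle the difference
// the next time it is touched. When the balance runs out, the actor is
// frozen: its storage would be dropped, keeping only the state root.
const rentPerEpoch = 5 // hypothetical flat rent per epoch

type actor struct {
	balance     uint64
	lastCharged uint64 // epoch of last rent settlement
	frozen      bool
	stateRoot   string // merkle root retained after freezing (placeholder)
}

func (a *actor) settleRent(nowEpoch uint64) {
	if a.frozen {
		return
	}
	owed := (nowEpoch - a.lastCharged) * rentPerEpoch
	a.lastCharged = nowEpoch
	if owed >= a.balance {
		a.balance = 0
		a.frozen = true // storage deleted, stateRoot kept for unfreezing
		return
	}
	a.balance -= owed
}

func main() {
	a := &actor{balance: 100, stateRoot: "examplecid"}
	a.settleRent(10) // 10 epochs * 5 = 50 owed
	fmt.Println(a.balance, a.frozen) // prints 50 false
	a.settleRent(30) // 20 more epochs * 5 = 100 owed > 50 remaining
	fmt.Println(a.balance, a.frozen) // prints 0 true
}
```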

This is pretty nice, except that it expects anyone who might want to unfreeze a contract to always be watching the chain. It is likely that archive nodes somewhere will keep this data, so any node that was offline during the freeze will probably be able to find it, but I think that's not a great assumption to make. Also note that, given current blockchain constructions, the state may be regenerated by replaying historical transactions. That assumption is a bad one too, as we will want to do checkpointing and heavy pruning of historical blockchain data, and those transactions will disappear. One approach could be to dogfood Filecoin here: either rely on retrieval miners deciding that expired actor data has high enough potential profit, or use the remaining rent funds, when they run low, to pay for the actor's state to be stored in the storage market itself for some length of time.

This approach still has the issue that the storage required for tracking dead actor storage hashes will take up a growing amount of space on-chain. This could be remedied by freezing actors before their funds run out entirely, using the remaining funds to pay for just the actor and its storage root hash for a long period of time, until the actor finally dies entirely.

Miner Profiles

Big Filecoin miners will be fairly serious business ventures. As such, they will likely want a way to advertise information about themselves, such as a website link, gpg key, name, country, or other information about their services.

I think that it makes a lot of sense for this information to be stored directly in the miner actor (or, a cid of the information could be stored in the miner actor). This would allow different filecoin UIs to display some more interesting information about each miner.

We should figure out a base format for this information and suggest miners fill out pieces they care about. Miners can probably add whatever fields they want to, but setting the conventions early will make things easier.

Spec for Filecoin<-->IPFS Interop

A product ask that came up during the hackathon yesterday was how to enable IPFS-based dapps to seamlessly use Filecoin to incentivize distributed hosting of their existing IPFS projects. Many of our current community members are builders of dapps (e.g. OpenBazaar), but they currently host their own centralized pinning nodes to make their content accessible. When Filecoin launches, many of these partners will likely try to use Filecoin as a replacement for, or addition to, their existing hosting for their IPFS dapp. This process should be seamless and should not require reconfiguring/rebuilding their application, downloading all their content locally, or changing any end-user-visible routing/links. We should figure out what the MVP for initial launch should be and make sure iterations are on our roadmap as the network becomes more performant for dapp hosting needs.

Conflicting and/or duplicated information in specs: mining vs. expected-consensus

Description

There are some duplicated and seemingly conflicting passages between mining and expected-consensus documentation.

In particular there isn't full agreement between the two on the structure of types.Block and also with the source types/block.go. E.g. there's no NullBlocks count in the source.

Acceptance criteria

Documentation should be internally consistent and also reflect reality

Comment from @whyrusleeping copied to here since this is the appropriate repo (sorry for the n00b mistake):

As it says in the expected consensus document, what is described there is how to generally implement expected consensus. Expected consensus is an algorithm, much like 'merge sort'. Merge sort doesn't specify exactly what sort of things you will be sorting, or even how to compare elements in the list.

The mining document describes the filecoin implementation of expected consensus. This is what we should be aiming to implement.

I would love to make this more clear to the reader, could you suggest some changes in wording to help get that point across better?

Suggestion: make it much easier to get fast feedback

I have a suggestion that will make us get a ton more feedback, faster (from people like me).

Set up a program to build the gitbook into a PDF and send it to a set of subscribers at a periodic cadence (once a week, twice a week, nightly). This could be as easy as mailing it to a mailing list people can subscribe to ([email protected]), posting it to an issue on GitHub, or even posting it to Slack. Subscribers are asked to read through the spec and give feedback in whatever form is easy for them and works for you (annotations on a PDF, a list of bullet points, etc.). (The diff between revisions should be attached to the PDF too, so people know what changed.)

If this sounds like a good idea, give it a thumbs up and I can take care of it.

Effort: Figure out smallest unit of storage to account for

Several function calls take Size as an input, where Size is the storage size:

  • What is the minimum Size we want to account for?
  • Should I be able to store 1 GB and 1 byte, or would that round up to 1 GB and 1 KB, or to 1 GB and 1 MB?
  • Would this choice affect the size of the numbers we end up writing on chain (if we choose bytes, do we end up with incredibly huge numbers)?

I don't think I can find an answer myself; I need more tech/optimization people like @whyrusleeping or @Stebalien to weigh in on this!
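To make the rounding question concrete: whatever unit we pick, every requested size presumably rounds up to a multiple of it. A small Go sketch showing how 1 GB + 1 byte comes out under byte, KB, and MB granularity; the function is illustrative, not a proposed API:

```go
package main

import "fmt"

// roundUp rounds size up to the nearest multiple of unit. This is the
// accounting question above: choosing a larger unit wastes some of the
// client's request, while unit = 1 byte means the largest on-chain numbers.
func roundUp(size, unit uint64) uint64 {
	return (size + unit - 1) / unit * unit
}

func main() {
	const gb = uint64(1 << 30)
	size := gb + 1 // "1GB and 1byte"
	for _, unit := range []uint64{1, 1 << 10, 1 << 20} {
		// Prints the chosen unit and what the request rounds up to.
		fmt.Println(unit, roundUp(size, unit))
	}
}
```

Note that with byte granularity the value is exact but the integers are large; with MB granularity almost a full megabyte of the request is padding.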
