DwebTransports

A general transport library for the decentralized web that handles multiple underlying transports.

Background

This library is part of a general project at the Internet Archive (archive.org) to support the decentralized web.

Goals

  • to provide a single API that can be used for most basic interactions with decentralized transports.
  • to support multiple URLs on different transports, so that the (current) underlying unreliability is hidden.
  • to allow Internet Archive content to be made available in a decentralized way.
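The multi-URL fallback goal can be sketched as follows. This is a minimal illustration with hypothetical mock transport objects, not the library's actual classes or API:

```javascript
// Minimal sketch of multi-transport fallback: try each transport that
// claims to support a URL until one succeeds. The transport objects are
// hypothetical stand-ins, not the real dweb-transports classes.
async function fetchWithFallback(transports, urls) {
  const errors = [];
  for (const url of urls) {
    for (const t of transports) {
      if (!t.supports(url)) continue;
      try {
        return await t.fetch(url); // first success wins
      } catch (err) {
        errors.push(t.name + ": " + err.message); // remember and keep trying
      }
    }
  }
  throw new Error("All transports failed: " + errors.join("; "));
}

// Two fake transports: one that always fails, one that succeeds.
const failingIpfs = {
  name: "IPFS",
  supports: url => url.startsWith("ipfs:"),
  fetch: async () => { throw new Error("timeout"); }
};
const workingHttp = {
  name: "HTTP",
  supports: url => url.startsWith("http"),
  fetch: async url => "data from " + url
};
```

The point of the design is that a caller hands over all known URLs for an object and never needs to care which transport ultimately delivered it.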

Installation for developers in node / yarn

In your app's package.json file, add

"@internetarchive/dweb-transports": "latest",

then run yarn install

Installation for developers on browsers.

  • Install node and npm or yarn
  • Clone this repo and cd to it.
  • webpack --mode production will create dist/dweb_transports_bundle.js
  • Add <SCRIPT type="text/javascript" src="dweb_transports_bundle.js"></SCRIPT> to your <HEAD>

Then code like this should work.

var searchparams = new URL(window.location.href).searchParams;

async function main(url) {
    try {
        await DwebTransports.p_connect({
            defaulttransports: ["HTTP","IPFS"],          // Default transports if none specified in the URL
            transports: searchparams.getAll("transport") // Allow URL parameters to override the default
        });
        // Any code you want to run after connecting to the transports goes here.
    } catch(err) {
        console.log("App Error:", err);
        alert(err.message);
    }
}

To develop on dweb-transports

Clone this repo from GitHub; yarn install should pick up the dependencies.

Notes on implementation

Implementation on HTTP (TransportHTTP.js)

The HTTP interface is pretty simple.

fetch and streams are straightforward HTTP GETs to a URL.

Lists and Tables are implemented via append-only files on the HTTP server using URLs that contain hashes. “Add” appends to this file, “list” retrieves the file.
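The append-only pattern can be sketched like this. The Map below is an in-memory stand-in for files on the HTTP server, and the function names are hypothetical, not the actual server code:

```javascript
// Sketch of the append-only list pattern: "add" appends a JSON line to a
// file addressed by a hash-containing URL; "list" retrieves and parses all
// lines. The Map stands in for files on the HTTP server; names are hypothetical.
const store = new Map();

function add(hashUrl, item) {
  const existing = store.get(hashUrl) || "";
  store.set(hashUrl, existing + JSON.stringify(item) + "\n"); // append only, never rewrite
}

function list(hashUrl) {
  const file = store.get(hashUrl) || "";
  return file.split("\n").filter(Boolean).map(JSON.parse);
}
```

Because the file is only ever appended to, concurrent writers cannot corrupt earlier entries, which is what makes this workable on a dumb HTTP server.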

Listmonitor isn’t supported, and can’t really be, as there is no open channel to the client.

Implementation on IPFS (TransportIPFS.js)

This section will only make sense if you understand something about IPFS.

See TransportIPFS.js in the repository for the code and details.

IPFS has two Javascript versions, both of which currently implement only a subset of IPFS (JS-IPFS is missing IPNS, and JS-IPFS-API is missing pubsub). We are mostly using JS-IPFS, because JS-IPFS-API creates a centralised point of failure at a known HTTP host, and because JS-IPFS-API has trouble connecting to a local IPFS peer due to some odd security choices by IPFS.

IPFS is initialized by creating an IPFS object with a configuration. We use the WebSockets connector, but it has a single point of failure. WebRTC is an alternative, but it is seriously broken (it crashes both Chrome and Firefox).

Blocks are stored and retrieved via ipfs.files.get and ipfs.files.add.

For lists and tables - see YJS which uses IPFS.

Issues with IPFS

Error feedback is a little fuzzy: generally you'll get a success code and then no data, so the fallback to HTTP fails.

There are issues with IPFS swarms that we haven’t been able to figure out: how to ensure that “put”ting to IPFS creates an object that can be read at all other browsers, and persists. See DT issue#2

Naming hasn’t been implemented in IPFS yet, partly because IPNS is not available in JS-IPFS, and partly because IPNS has serious problems (a requirement to rebroadcast every 24 hours, so it is not persistent; a Merkle tree, so a change at a leaf changes the top level). We implemented naming outside of IPFS (in dweb-archivecontroller's Routing.js) to get it to work.

To install IPFS for Node (and this needs testing)

yarn add ipfs ipfs-http-client

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use IPFS, pass "IPFS" during the "connect" step.

Implementation on WebTorrent

WebTorrent implements the BitTorrent protocol in the browser. It will work for retrieval of objects and currently has the fastest/most-reliable stream interface.

We also have a modified seeder/tracker, which is currently (Sept 2018) in testing on our gateway.

To install WebTorrent for Node (and this needs testing)

yarn add webtorrent

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use WebTorrent, pass "WEBTORRENT" during the "connect" step.

Implementation on YJS (TransportYJS.js)

YJS implements a decentralized database over a number of transports, including IPFS. It supports several modes, of which we use only “Array” to implement append-only logs and "Map" to implement key-value tables.

There is no authentication built into YJS, but if using it via the higher-level CommonList (CL) object, authentication isn't required, since the CL will validate anything sent.

To install YJS for Node (and this needs testing)

yarn add yjs

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use YJS, pass "YJS" during the "connect" step.

Implementation on GUN

GUN implements a decentralized database; we have mostly migrated to it (from YJS) because it has some support and an active team.

Our tables and Lists are mapped as JSON objects inside GUN nodes due to some limitations in GUN's architecture for multi-level objects.
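The workaround can be sketched as follows: nested values are serialized to JSON strings before being stored in a flat node, and parsed on retrieval. The plain object and function names below are hypothetical stand-ins, not GUN's actual API:

```javascript
// Sketch of flattening multi-level objects into a flat GUN-style node:
// nested structures are stored as JSON strings and parsed on the way out.
// The plain `node` object stands in for a GUN node; names are hypothetical.
const node = {};

function putTable(key, table) {
  node[key] = JSON.stringify(table); // the node itself only holds flat scalar values
}

function getTable(key) {
  return JSON.parse(node[key]);
}
```

The cost of this approach is that GUN sees each table as one opaque string, so its per-field merge semantics don't apply inside the JSON blob.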

Still (as of Sept 2018) working on authentication, and on some reliability/bug issues.

To install GUN for Node (and this needs testing)

yarn add gun

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use GUN, pass "GUN" during the "connect" step.

Implementation on WOLK

WOLK has implemented and maintains its own shim, which is part of dweb-transports.

To install WOLK for Node (and this needs testing)

yarn add "git://github.com/wolkdb/wolkjs.git#master"

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use WOLK, pass "WOLK" during the "connect" step.

Implementation on FLUENCE

FLUENCE has implemented and maintains its own shim, which is part of dweb-transports.

To install FLUENCE for Node (and this needs testing)

yarn add fluence

This will get overridden by an update of dweb-mirror, so you will probably want to make this a dependency of whatever is using dweb-transports instead.

To use FLUENCE, pass "FLUENCE" during the "connect" step.

Implementation of ContentHash

We have a simple ContentHash store/fetch that supports lists and key-value databases, and knows how to retrieve content by sha1 hash from the Archive.

No installation is required - it builds on the HTTP transport.

To use it, pass "HASH" during the "connect" step.

See also

See example_block.html for an example of connecting, storing and retrieving.

See API.md for the detailed API.

See the Dweb document index for a list of the repos that make up the Internet Archive's Dweb project, and an index of other documents.

dweb-transports's Issues

Split: Naming - move into transports then refactor

GUN issue with node.once

The problem of trying to use the same code on Node as in the browser means we can't use GUN for metadata in the Node-based dweb-mirror.

Persistence of storage - IPFS

Moved from: internetarchive/dweb-transport#2
There are issues with persistence of the IPFS content stored. This is inherent to IPFS, since it makes no guarantee of persistence: things are only stored by peers who publish them, pin them, or view them (for a period).

Since the publisher is a browser, and is probably offline at this point, and no one may have looked at the content, we need a way to store it. It's unclear whether this should be via pinning, or whether we have to go outside of IPFS to do so.

For now - given the challenge of pinning in a browser - this is solved with https://github.com/internetarchive/dweb-transport/issues/13 which stores both on our HTTP servers and in IPFS.
Note, I'm leaving this open in the hope that an IPFS specific solution can be found.
2018-01-23: Confirmed this is not possible directly in IPFS currently. The solution would be building a pinning service, e.g. one hit by HTTP from the client, which then pins the content. That would introduce another single point of failure (client access to HTTP), so it would really need to use something like an IPFS pubsub channel that picks the request up and passes it back for pinning, which needs GoLang skills or maybe a separate node.js client at IA. For now we will stick to HTTP for persistent storing from browsers.

Naming: cors and 403 errors on /services/img

Also … when I fetch images via services I’m seeing data-dependent results which just look wrong …
https://archive.org/services/img/software etc. all work fine:

HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sat, 23 Nov 2019 06:14:06 GMT
Content-Type: image/jpeg; charset=UTF-8
Content-Length: 3286
Connection: keep-alive
Cache-Control: max-age=21600
Expires: Sat, 23 Nov 2019 07:14:06 GMT
Last-Modified: Thu, 05 Jul 2018 02:34:06 GMT
ETag: "5b3d839e-cd6"
Expires: Sat, 23 Nov 2019 12:14:06 GMT
Access-Control-Allow-Origin: *
Accept-Ranges: bytes
Strict-Transport-Security: max-age=15724800
Accept-Ranges: bytes

But … some images are missing the CORS headers, e.g.

curl -o/dev/null -Lv "https://archive.org/services/img/DonkeyKong64_101p"
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Sat, 23 Nov 2019 06:13:38 GMT
Content-Type: image/jpeg; charset=UTF-8
Content-Length: 7181
Connection: keep-alive
Cache-Control: max-age=3600
Expires: Sat, 23 Nov 2019 07:13:18 GMT
Last-Modified: Sat, 23 Nov 2019 06:13:18 GMT
Strict-Transport-Security: max-age=3600
X-Fastcgi-Cache: HIT
Accept-Ranges: bytes

https://archive.org/services/img/opensource_movies fails with a 403 (Forbidden). If I access it directly in the browser it’s fine. I can’t think what could be different about it.

Webtorrent: Fork or monkeypatch to support http urls

Should fork or monkeypatch the WebTorrent library so that, if it sees an HTTP (or WS) URL for download or for the tracker while running under HTTPS, and has no other usable URL, it will try the HTTPS or WSS URL.
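The proposed fallback could look like this sketch. This is a hypothetical helper illustrating the rewrite rule, not code from the WebTorrent library:

```javascript
// Sketch of the proposed monkeypatch: when the page runs under HTTPS and a
// tracker/download URL is plain HTTP or WS, try the secure equivalent
// (browsers block the insecure one as mixed content). Hypothetical helper,
// not actual WebTorrent code.
function upgradeUrl(url, pageIsHttps) {
  if (!pageIsHttps) return url; // mixed-content blocking only applies to HTTPS pages
  if (url.startsWith("http://")) return "https://" + url.slice("http://".length);
  if (url.startsWith("ws://")) return "wss://" + url.slice("ws://".length);
  return url; // already secure, or some other scheme
}
```

A real patch would apply this wherever WebTorrent dials trackers or web seeds, and only as a last resort when no secure URL was supplied.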

IPFS - Websockets - disconnected subnets & single point of failure

IPFS currently uses WebSocketStar (WSS), (since WebRTC crashes browsers on pretty much any decentralized platform, not just IPFS- see internetarchive/dweb-transport#1 )

There are several issues with WSS:
Most critical is that clients connecting via WSS can only retrieve IPFS CIDs that are known by the node they are connected to. This essentially means CIDs aren't universal, just known to the subset of connected peers. The "websocket-relay" project at Protocol Labs is supposed to fix this.

Most urgent: this could be made better (for the Archive), especially in the short term, by connecting directly to the IPFS instance at the Archive, since that node also knows all the IA files we've added, but so far none of the Protocol Labs people have been able to do this.

Most important long term: WSS's star topology gives a single point of failure, which means that IPFS using WSS is inappropriate for any anti-censorship applications. I think the WSS relay could be used along with a changing list of places to connect to. Ideally that would be built into IPFS, but in its absence someone is going to have to build a wrapper that, for example, saves potential places to connect to between sessions and feeds them to the config during p_connect.

Feel free to pull these into separate issues if working on them ....

Most urgent is

Split transports up, make the bundle smaller include transports separately

As more transports get integrated into dweb-transports, and as the IA's UI team starts looking at using dweb-archivecontroller (which depends on dweb-transports), it's become necessary to split up dweb-transports and make it lighter.

Solution will need ...

  • Work in nodejs
  • Work in browser via webpack
  • At some point work in browser via ES6 modules

Experimentation is in the 'split' branch, which may or may not always be working!

Steps might be ... (this section will be edited)

  • Pull out each transport to its own script (most are NOT ES6 Module compatible)

    • Ideally these will be the standard, maintained scripts that each transport supports
    • have TransportXxx.load() find the result of the <script> and wrap it in its own API
    • figure out how to do this in nodejs,
      • maybe at the application level (where it chooses to pass e.g. "IPFS" to Transports.connect())
  • Split each transport, test in browser and node

    • IPFS - see below
    • gun - see below
    • http (splits out httptools or maybe include them always)
    • webtorrent - see below
    • wolk - see below
    • yjs
  • Move code from DA/archive.html and DM/internetarchive.js into DTS/Transports &/or shims

  • Cleanup - when this is done

  • figure out distribution mechanisms for

    • dweb-mirror shouldn't include IPFS/GUN etc. by default, but needs a way to add them
    • dweb-archive bundled with DM, should not include node_modules or bundle with transports
    • dweb-archive on its own, needs to have all transports
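The "find the result of the script and wrap it" step above can be sketched like this. The registry, function name, and global-lookup convention are all hypothetical illustrations of the plan, not existing code:

```javascript
// Sketch of a per-transport loader for the split: each TransportXxx.load()
// looks for the global that the transport's <script> tag (or a require()
// at the application level in node) left behind, and wraps it.
// `registry` and `loadTransport` are hypothetical names.
const registry = {};

function loadTransport(name, globalScope) {
  const impl = globalScope[name]; // e.g. a global left by <script src="...bundle.js">
  if (!impl) throw new Error("Transport library not loaded: " + name);
  registry[name] = { name, impl }; // wrap in our own API object
  return registry[name];
}
```

This keeps the heavy transport bundles out of dweb-transports itself: the application decides which scripts to include, and the loader merely discovers and wraps whatever is present.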

split out createReadStream

createReadStream is a useful function in itself, for cases where the request isn't coming from an AV element.

  • Transports should have createReadStream looping through transports like other functions - see below
  • IPFS should implement createReadStream - see below
  • HTTP should implement createReadStream- see below
  • HASH gets it for free when HTTP implements it - see below
  • WEBTORRENT should implement createReadStream - see below
  • FLUENCE, GUN, WOLK, YJS don't implement p_f_createReadStream
  • Testing - webtorrent esp from client against archive
  • WRITINGSHIMS.md should reflect this
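The first item above - a Transports-level createReadStream that loops through transports like the other functions - can be sketched as follows, with hypothetical mock transport objects:

```javascript
// Sketch of a Transports-level createReadStream: loop through the
// transports, skip those that don't implement it, and return the first
// stream a transport can supply. Transport objects here are hypothetical.
function createReadStream(transports, url, opts) {
  for (const t of transports) {
    if (typeof t.createReadStream !== "function") continue; // e.g. GUN, WOLK, YJS
    try {
      return t.createReadStream(url, opts);
    } catch (err) {
      // this transport failed synchronously; try the next one
    }
  }
  throw new Error("No transport could stream " + url);
}
```

A real version would also need to handle asynchronous failures (a stream that errors after it has been returned), which is harder than the loop above suggests.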

Add DAT

We should support the DAT protocol in dweb-transports; this should be relatively straightforward.

Note, this won't (currently) work in the browser due to WebRTC issues, but should work in Node (e.g. in dweb-mirror).

See DAT meta: mitra42/dweb-universal#1

List deletion

Moved from: internetarchive/dweb-transport#7
Lists should support deletion. Note that a deletion is just a flag of some sort (I think YJS supports it), so any retrieval should also have the option of eliminating deletions or retaining them.

Note there is already code that filters out duplicates; it probably belongs as an argument to that code to decide whether to eliminate deletions (first, so deduplication gets the not-deleted one).

Note - part of this is having some way to delete a list all the way back to empty.
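The suggested behaviour can be sketched like this: deletions are tombstone flags, and the de-duplication step takes a parameter deciding whether tombstoned entries are filtered out. The entry shape, parameter, and function name are all hypothetical:

```javascript
// Sketch of tombstone-style deletion combined with de-duplication.
// Each entry is { id, deleted }; dedup runs first and prefers a
// non-deleted copy, then tombstones are optionally filtered out.
// listEntries and eliminateDeletions are hypothetical names.
function listEntries(entries, { eliminateDeletions = true } = {}) {
  const byId = new Map();
  for (const e of entries) {
    const prev = byId.get(e.id);
    if (!prev || (prev.deleted && !e.deleted)) byId.set(e.id, e); // dedup gets the not-deleted one
  }
  const result = [...byId.values()];
  return eliminateDeletions ? result.filter(e => !e.deleted) : result;
}
```

Deleting a list all the way back to empty then just means every id in it carries a tombstone, so the default retrieval returns nothing while the history remains intact.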

Naming: localhost

Need a shim that will work with naming and intercept archive.org for localhost;
then we can remove the ternaries in DwebMirror.
Part of #22 and #20

Adding "seed"

I'm working on adding "seed" as another supported function - the design thinking is in dweb-mirror#117, which is the first use case.

@rodneywitcher - particularly interested in how we might want this to work with Wolk as well. I think it would involve adding keys during config, but I'm not sure what info needs passing during the request to seed a file or directory.
