
Noms

Warning - This project is not active

Noms is not being maintained. You shouldn't use it, except maybe for fun or research.

If you are interested in something like Noms, you probably want Dolt (https://github.com/dolthub/dolt), a fork of this project that is actively maintained.

Send me (aaron at aaronboodman.com) a message if you have questions.


Use Cases  |  Setup  |  Status  |  Documentation  |  Contact


Welcome

Noms is a decentralized database philosophically descended from the Git version control system.

Like Git, Noms is:

  • Versioned: By default, all previous versions of the database are retained. You can trivially track how the database evolved to its current state, easily and efficiently compare any two versions, or even rewind and branch from any previous version.
  • Synchronizable: Instances of a single Noms database can be disconnected from each other for any amount of time, then later reconcile their changes efficiently and correctly.

Unlike Git, Noms is a database, so it also:

  • Primarily stores structured data, not files and directories (see: the Noms type system)
  • Scales well to large amounts of data and concurrent clients
  • Supports atomic transactions (a single instance of Noms is CP, but Noms is typically run in production backed by S3, in which case it is "effectively CA")
  • Supports efficient indexes (see: Noms prolly-trees)
  • Features a flexible query model (see: GraphQL)

A Noms database can reside within a file system or in the cloud:

  • The (built-in) NBS ChunkStore implementation provides two back-ends that persist Noms databases: one stores data in a file system, the other in an S3 bucket.

Finally, because Noms is content-addressed, it yields a very pleasant programming model.

Working with Noms is declarative. You don't INSERT new data, UPDATE existing data, or DELETE old data. You simply declare what the data ought to be right now. If you commit the same data twice, it will be deduplicated because of content-addressing. If you commit almost the same data, only the part that is different will be written.
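
As a toy illustration of why that works (illustrative Go only; the store, put, and the choice of SHA-1 here are not the actual Noms implementation): a value's address is the hash of its content, so writing the same bytes twice is a no-op.

package main

import (
  "crypto/sha1"
  "fmt"
)

// store is a hypothetical chunk store keyed by content hash.
var store = map[[sha1.Size]byte][]byte{}

// put writes data only if its hash is not already present.
func put(data []byte) [sha1.Size]byte {
  h := sha1.Sum(data)
  if _, ok := store[h]; !ok {
    store[h] = data
  }
  return h
}

func main() {
  a := put([]byte(`{"name": "noms"}`))
  b := put([]byte(`{"name": "noms"}`)) // deduplicated: same address
  fmt.Println(a == b, len(store))      // prints: true 1
}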


Use Cases

Because Noms is very good at sync, it makes a decent basis for rich, collaborative, fully-decentralized applications.

Mobile Offline-First Database

Embed Noms into mobile applications to make it easier to build offline-first, fully synchronizing mobile apps.


Install

  1. Download the latest release:
  2. Unzip it somewhere and add it to your $PATH.
  3. Verify Noms is installed correctly:
$ noms version
format version: 7.18
built from <developer build>

Run

Import some data:

go install github.com/attic-labs/noms/samples/go/csv/csv-import
curl 'https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD' > /tmp/data.csv
csv-import /tmp/data.csv /tmp/noms::nycdemo

Explore:

noms show /tmp/noms::nycdemo

Should show:

struct Commit {
  meta: struct Meta {
    date: "2017-09-19T19:33:01Z",
    inputFile: "/tmp/data.csv",
  },
  parents: set {},
  value: [  // 236 items
    struct Row {
      countAmericanIndian: "0",
      countAsianNonHispanic: "3",
      countBlackNonHispanic: "21",
      countCitizenStatusTotal: "44",
      countCitizenStatusUnknown: "0",
      countEthnicityTotal: "44",
...

Status

Nobody is working on this right now. You shouldn't rely on it unless you're willing to take over development yourself.

Major Open Issues

These are the major things you'd probably want to fix before relying on this for most systems.


Learn More About Noms

For the decentralized web: The Decentralized Database

Learn the basics: Technical Overview

Tour the CLI: Command-Line Interface Tour

Tour the Go API: Go SDK Tour


Contact Us

Interested in using Noms? Awesome! We would be happy to work with you to help understand whether Noms is a fit for your problem. Reach out at:

Licensing

Noms is open source software, licensed by Attic Labs, Inc. under the Apache License, Version 2.0.


Issues

Make assert.Equals() call Value::Equals()

It's easy to accidentally use assert.Equals() on two values. This will sometimes work, but it depends on whether the internal caches inside the two values are populated or not. Instead, we should be calling Value::Equals().

It should be possible to use some Go interface trickery to make assert.Equals() automatically delegate to Value::Equals() but it is complicated by the fact that the dbg package would then depend on types, which already depends on dbg.
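
A sketch of what that delegation could look like, assuming the assert helper were allowed to import types (which is exactly the dependency cycle noted above; the signature of Value.Equals is taken from the issue, everything else is hypothetical):

package assert

import (
  "reflect"

  "github.com/attic-labs/noms/types"
)

// Equals delegates to Value.Equals when both sides are Noms values,
// and falls back to reflect.DeepEqual otherwise. Sketch only: this
// compiles only if this package may import types, which is the
// circular-dependency problem described above.
func Equals(expected, actual interface{}) bool {
  if e, ok := expected.(types.Value); ok {
    if a, ok := actual.(types.Value); ok {
      return e.Equals(a)
    }
  }
  return reflect.DeepEqual(expected, actual)
}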

Using types.* is very verbose and frustrating

It should feel very declarative and direct to write to noms, but instead it feels super frustrating.

We need to write a codegen system that generates immutable Go structs and collections from definitions.

A simpler short-term solution might be to write reflection code that can decode a Go struct from a types.Value. But this has the disadvantage that we won't have strongly-typed collections or lazy collections.
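
A minimal sketch of that reflection approach, using a plain map[string]interface{} as a stand-in for types.Value (a real decoder would walk Noms values instead):

package decode

import (
  "fmt"
  "reflect"
)

// Decode copies entries from m into the struct pointed to by out,
// matching map keys to exported field names.
func Decode(m map[string]interface{}, out interface{}) error {
  v := reflect.ValueOf(out)
  if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
    return fmt.Errorf("out must be a pointer to a struct")
  }
  s := v.Elem()
  for key, val := range m {
    f := s.FieldByName(key)
    if !f.IsValid() || !f.CanSet() {
      continue // unknown or unexported field; skip
    }
    rv := reflect.ValueOf(val)
    if !rv.Type().AssignableTo(f.Type()) {
      return fmt.Errorf("field %s: cannot assign %T", key, val)
    }
    f.Set(rv)
  }
  return nil
}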

Setup benchmarking system

Once we have the various aspects of the system tuned the way we like (see #52), we should add some kind of continuous benchmarking so that we know when we regress. Go's testing package has built-in benchmark support, but at a glance it's not clear how you're supposed to integrate it into your workflow.
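
The mechanism itself is simple to invoke; a sketch with an arbitrary stand-in workload (hashing a 1 MiB chunk), run with go test -bench=. -benchmem:

package noms_test

import (
  "crypto/sha1"
  "testing"
)

// BenchmarkChunkHash times hashing a 1 MiB chunk. Continuous
// benchmarking would record numbers like this per commit and flag
// regressions; the open question is the surrounding workflow.
func BenchmarkChunkHash(b *testing.B) {
  data := make([]byte, 1<<20)
  b.SetBytes(int64(len(data)))
  b.ResetTimer()
  for i := 0; i < b.N; i++ {
    sha1.Sum(data)
  }
}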

Remove xml_importer

We don't really need it, right? xml2noms can do the same thing when combined with wget, and xml2noms does more!

MLB demo

Right now we are testing on just 1 game worth of data. I think we should be able to comfortably suck in all the data from all of history on one MacBook and display it in the UI. Let's step up through increasingly large chunks (day, month, year, forever) and use that to drive improvements throughout the stack.

This bug will be complete when:

  • xml2noms on the entire directory of mlb data takes "some reasonable amount of time". I don't know how long this should be, but my spidey sense says "< ~2x the time it takes to cp -R the entire directory?". @cmasone-attic, @arv, @rafael-atticlabs - please override me here if you have a better idea how long is "reasonable" for this to take.
  • running heatmap should take < 25% the time it takes to cp -R the entire xml2noms dir (because we're only reading a tiny bit of it?)
  • loading the initial ui and switching pitchers should be "fast" for normal definitions of ui speediness.

Re-introduce the notion of per-application roots

Current thinking is that, for the MLB pitcher heatmap demo, we want to use a generic json importer to suck in the data and then have a "distiller" app that post-processes the data into a more structured, application-specific form. It seems like the most logical way to build something like this is to enable each app to have its own view of the data, and the simplest way to do THAT is to allow each app to have its own root set.

datasets should be a map

Currently, datasets are stored as a Set, and we iterate over the set to find the dataset with the correct ID. We should just use a Map with the ID as the key, as sketched below.

@cmasone-attic
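
A sketch of the before and after (Dataset and its fields are hypothetical stand-ins for the real type):

package datas

// Dataset is a hypothetical stand-in for the real dataset type.
type Dataset struct {
  ID   string
  Head string
}

// Before: scan a set-like slice for the dataset with the right ID.
func findDataset(datasets []Dataset, id string) (Dataset, bool) {
  for _, d := range datasets {
    if d.ID == id {
      return d, true
    }
  }
  return Dataset{}, false
}

// After: a Map keyed by ID makes the lookup direct.
func lookupDataset(datasets map[string]Dataset, id string) (Dataset, bool) {
  d, ok := datasets[id]
  return d, ok
}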

Generic xml import

Lots of data out there is XML. You should be able to dump it into noms and work with it.

Seems like we'd have to come up with some kind of type family that is essentially XML DOM. See also issue #19 and issue #20.

Marking this milestone:later since it's not clear what the requirements are exactly.

ChunkStore::Writer::Close() should also commit the write

Currently, only ::Ref() commits the write. I believe this is different from other io.Writer impls in the Go stdlib. For example, File will flush the write on Close (I think).

If this is indeed a guarantee in other io.Writer impls, then it should be for us too. And then I guess we'd want an Abandon() method to actually abandon halfway through a write.
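
A sketch of the proposed contract as a hypothetical interface (the real writer's API may differ):

package chunks

import "io"

// ChunkWriter sketches the proposed behavior: Close commits the
// pending chunk, matching io.Writer impls that flush on Close, and
// Abandon discards a half-finished write without committing.
type ChunkWriter interface {
  io.Writer
  Close() error
  Abandon()
}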

Need design sketch: GC

We should have some. I wonder if the fact that we are actually a DAG with a single root helps us at all.

We probably want something simple for v0 that runs periodically on commit, but it's a lowish priority.

Generic xml importer

See also clients/json_importer and #29.

XML is interesting because it is so rich compared to json and csv (attributes, namespaces, entities, etc). We probably want to capture all of that into some noms equivalent of the xml dom.

This seems like a lower priority than json and csv to me, but we can increase the priority if we run into some important app that needs it.

Reduce boilerplate in uses of nomgen.go

Each user of nomgen (e.g., dataset/gen/gen.go) is specifying an -o flag that is getting passed in from the rungen.go file. I think it would be simpler to just have dataset/gen/gen.go hardcode that path instead of having a flag. We don't need two layers of indirection.

xml2noms usage unclear

$ ./xml2noms 
Usage of ./xml2noms:
  -aws-store-auth-from-env=false: creates the aws-based chunkstore from authorization found in the environment. This is typically used in production to get keys from IAM profile. If not specified, then -aws-store-key and aws-store-secret must be specified instead
  -aws-store-bucket="": s3 bucket to create an aws-based chunkstore in
  -aws-store-dynamo-table="noms-root": dynamodb table to store the root of the aws-based chunkstore in
  -aws-store-key="": aws key to use to create the aws-based chunkstore
  -aws-store-region="us-west-2": aws region to put the aws-based chunkstore in
  -aws-store-secret="": aws secret to use to create the aws-based chunkstore
  -dataset-id="": dataset id to store data for
  -file-store="": directory to use for a file-based chunkstore
  -file-store-root="root": filename which holds the root ref in the filestore
  -httptest.serve="": if non-empty, httptest.NewServer serves on this address and blocks
  -memory-store=false: use a memory-based (ephemeral, and private to this application) chunkstore

It should tell you that it crawls a dir and how to specify that dir.

Add a read through/caching store

The idea is to add a new chunk store that wraps two chunk stores: a "caching" store, which is consulted first on reads, and a "backing" store, which all puts go to.
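
A minimal sketch of that behavior, against a hypothetical stripped-down ChunkStore interface:

package chunks

// ChunkStore is a hypothetical stripped-down interface for this sketch.
type ChunkStore interface {
  Get(ref string) (chunk []byte, ok bool)
  Put(ref string, chunk []byte)
}

// readThroughStore consults cache first on reads, filling it from
// backing on a miss; writes always go to backing.
type readThroughStore struct {
  cache   ChunkStore
  backing ChunkStore
}

func (s readThroughStore) Get(ref string) ([]byte, bool) {
  if c, ok := s.cache.Get(ref); ok {
    return c, true
  }
  c, ok := s.backing.Get(ref)
  if ok {
    s.cache.Put(ref, c) // populate the cache on a miss
  }
  return c, ok
}

func (s readThroughStore) Put(ref string, chunk []byte) {
  s.backing.Put(ref, chunk)
}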

Get rid of the String/flatString, List/flatList, etc... abstraction

So like:

  • Remove string.go
  • Rename flat_string.go to string.go
  • Repeat for the others

I was originally thinking that the interfaces isolated the guts of the impl, and that there might be different implementations of each interface (e.g., a chunked implementation).

But there's no need to maintain multiple implementations at the same time, and isolating the guts can be done just with normal private fields.

This will simplify the types package somewhat, which is getting a bit hairy.

Implement chunking for blobs, at least

We want to split large pieces of data up into chunks. Initially, we'll do this for blobs, but probably want to expand beyond that.

We'll use a rolling hash function to figure out where to chunk things, and will need to deal with chunks that are still too big.

A blob value will really just contain some list of Refs to the chunks that make it up, and the Ref of a blob will be some kind of hash-of-hashes thing. This does introduce some complexity to the Reader() of a blob, since it now needs to find and decode all the chunks that make up the blob. Aaron proposes a "Resolver" interface that all composite types will have a pointer to, which looks like this:

type Resolver interface {
  Resolve(r ref.Ref) Value
}
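
And for the boundary-finding step mentioned above, a minimal sketch of content-defined chunking; the windowed-sum "rolling hash", the mask, and the size cap are all placeholders for whatever we actually choose:

package chunker

const (
  window  = 64        // bytes contributing to the rolling value
  mask    = 1<<12 - 1 // ~4 KiB average chunk size
  maxSize = 1 << 16   // split anyway if a chunk grows too big
)

// Chunk splits data where the rolling value matches the mask, so
// boundaries depend on content rather than on byte offsets.
func Chunk(data []byte) [][]byte {
  var chunks [][]byte
  var sum uint32
  start := 0
  for i, b := range data {
    sum += uint32(b)
    if i >= window {
      sum -= uint32(data[i-window]) // slide the window forward
    }
    if sum&mask == mask || i-start+1 >= maxSize {
      chunks = append(chunks, data[start:i+1])
      start = i + 1
    }
  }
  if start < len(data) {
    chunks = append(chunks, data[start:]) // trailing partial chunk
  }
  return chunks
}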

Lazy loading/decoding of data

We currently eagerly load all values recursively.

While this is sometimes what you want, it frequently isn't. For example, right now, when someone asks DataStore for the current roots, we will read the entire database into memory.

It requires some thought as to where to put the gradual expansion.

Encoding needs to be aware of lazy-loading

Even after #11 is implemented, we still have to do more work in the encoding path, otherwise we end up re-expanding everything when trying to save. For example:

v, _ := ReadValue(<bigval>, cs)
v = v.Append(Int32(42))
// At this point, we've only read one ref

// This line ends up reading all the refs because it walks the tree in order to serialize :-/.
r, _ := WriteValue(v, cs)

G+ stream importer

Google has started reducing the visibility of this feature. Capturing it into noms could be a great thing for some people who are worried about losing that data.

Implement generic csv import

A lot of data in the world is csv or tsv. It would be good to have generic support for this.

An interesting question is how to represent csv in noms types. Assuming the column names are known, the first thing that comes to mind is a list of maps. But that would have a problem: it wouldn't enforce that all the maps are the same shape, so it's hard to do things like graphs.

What we really want is List<T> where T is a struct with fields corresponding to the columns of the csv.

We can easily create a StructDef dynamically using the column names as the fields. But it is not easy with the current generated Go code to dynamically create instances of such a struct. Perhaps there should be a way.
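
A stdlib-only sketch of the shape in question, materializing each row as a record keyed by column name; this is deliberately not the Noms type system, just the dynamic analogue of List<T>:

package main

import (
  "encoding/csv"
  "fmt"
  "io"
  "strings"
)

func main() {
  r := csv.NewReader(strings.NewReader("name,count\nalice,3\nbob,7\n"))

  header, _ := r.Read() // the first row supplies the field names of T
  var rows []map[string]string
  for {
    rec, err := r.Read()
    if err == io.EOF {
      break
    } else if err != nil {
      break // a real importer would report the error
    }
    row := map[string]string{}
    for i, col := range header {
      row[col] = rec[i]
    }
    rows = append(rows, row)
  }
  fmt.Println(rows) // [map[count:3 name:alice] map[count:7 name:bob]]
}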

Generic csv importer

See clients/json_importer as an example.

Since csv is a table, we should import it into List<T>. Not sure what the best way to determine T is ... maybe look at some existing csv data, but ideas include:

  • If it's common for the first row to be the headers, have a -first-row-is-header flag on the importer that we use to get the field names of T.
  • Take a flag that either references or defines a type to use as T.

Encode structs

Our "structs" are currently just helpers around loosely-typed maps. We need to actually encode structs in order to enable various other features -- such as commit-time validation of structs, search by struct type, etc.

I'm not sure exactly what form this should take. The first thing that springs to mind in the current JSON encoding is something like:

{"sha1-xyz": <current map encoding>}
{"List<sha1-xyz>": <current list encoding>}
{"Set<List<sha1-syz>>" ...

... but there is likely something better.

Implement generic json import

Lots of data out there is JSON. You should be able to dump it into noms and work with it.

Seems like we'd have to import it as just loosely typed lists and maps. Leaving this as milestone:later since it's not very clear what form this should take right now.

See also #19 and #21.

noms pull

We want two nodes to be able to work with the same datastore offline easily. Like git, the way that they catch each other up will be through push and pull operations.

noms pull http://noms.io/cmasone/mlb/xml would tell the node at noms.io that I'm currently at hash XYZ of the "mlb/xml" dataset in the "cmasone" datastore and I wish to update to head. noms.io would figure out what refs I'm missing and send them all down to me, including the new root.
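
A sketch of that negotiation over hypothetical types (the real sync protocol differs): walk the remote graph down from its head, skip anything the local store already has, and copy the rest.

package pull

// Ref and Store are hypothetical stand-ins for this sketch.
type Ref string

type Store interface {
  Has(r Ref) bool
  Get(r Ref) (chunk []byte, children []Ref)
  Put(r Ref, chunk []byte)
}

// Pull copies every chunk reachable from remoteHead that local is
// missing, then returns the new head for the dataset. If local
// already has a ref, everything beneath it is assumed present too,
// which is what makes the catch-up efficient.
func Pull(local, remote Store, remoteHead Ref) Ref {
  queue := []Ref{remoteHead}
  for len(queue) > 0 {
    r := queue[0]
    queue = queue[1:]
    if local.Has(r) {
      continue
    }
    chunk, children := remote.Get(r)
    local.Put(r, chunk)
    queue = append(queue, children...)
  }
  return remoteHead
}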
