
C4 ID - Universally Unique and Consistent Identification

import "github.com/Avalanche-io/c4"

This is a Go package that implements the C4 ID system SMPTE standard ST 2114:2017. C4 IDs are universally unique and consistent identifiers that standardize the derivation and formatting of data identification so that all users independently agree on the identification of any block or set of blocks of data.

C4 IDs are 90-character strings suitable for use in filenames, URLs, database fields, or anywhere else a string identifier might normally be used. In memory, C4 IDs are represented in a 64-byte "digest" format.

Features

  • A single C4 ID can represent multiple files.
  • C4 IDs are unique, random, and unforgeable.
  • C4 IDs are identical for the same file in different locations or at different points in time.
  • A network connection is not required to generate C4 IDs.
  • A C4 ID can be used in filenames, URLs, JSON, and XML.
  • C4 IDs can be selected with a double-click, which many unique identifiers do not allow.
  • C4 IDs are easy to discover in arbitrary text with a simple regex: c4[1-9A-HJ-NP-Za-km-z]{88}
  • Naming files by their C4 ID automatically deduplicates them.

Comparison of Encodings

C4 is the shortest self-identifying SHA-512 encoding and the only standardized one. To illustrate, here is the SHA-512 of "foo" in hex, base64, and C4 encodings:

# encoding     length   id
  hex          135:     sha512-f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7
  base64        95:     sha512-9/u6bgY2+JDlb7vzKD5STG+jIErimDgtYkdB0NxmODJuKCxBvl5CVNiCB3LFUYosWowMf37aGVlKfrU5RT4e1w==
  c4            90:     c43inc2qGhSWQUMRvDMW6GAjJnRFY5sxq399wcUcWLTuPai84A2QWTfYu1gAW8f5FmZFGeYpLsSPyrSUh9Ao3J68Cc
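The Go standard library can reproduce the hex and base64 lengths in the table. The 64 digest bytes become 128 hex characters or 88 base64 characters (including "==" padding). The 7-character sha512- prefix then brings the totals to 135 and 95. This sketch covers only those two rows. The prefix is copied from the table, and the c4 row needs the C4 package itself.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// prefixedEncodings returns the "sha512-"-prefixed hex and base64 forms of
// data's SHA-512 digest, as shown in the comparison table.
func prefixedEncodings(data []byte) (hexID, b64ID string) {
	digest := sha512.Sum512(data)
	hexID = "sha512-" + hex.EncodeToString(digest[:])                // 7 + 128 chars
	b64ID = "sha512-" + base64.StdEncoding.EncodeToString(digest[:]) // 7 + 88 chars, "==" included
	return hexID, b64ID
}

func main() {
	hexID, b64ID := prefixedEncodings([]byte("foo"))
	fmt.Println(len(hexID), hexID)
	fmt.Println(len(b64ID), b64ID)
}
```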

Example Usage

package main

import (
  "fmt"
  "strings"

  "github.com/Avalanche-io/c4"
)

func main() {

  // Generate a C4 ID for any contiguous block of data...
  id := c4.Identify(strings.NewReader("alfa"))
  fmt.Println(id)
  // output: c43zYcLni5LF9rR4Lg4B8h3Jp8SBwjcnyyeh4bc6gTPHndKuKdjUWx1kJPYhZxYt3zV6tQXpDs2shPsPYjgG81wZM1

  // Generate a C4 ID for any number of non-contiguous blocks...
  var ids c4.IDs
  var inputs = []string{"alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel", "india"}
  for _, input := range inputs {
    ids = append(ids, c4.Identify(strings.NewReader(input)))
  }
  fmt.Println(ids.ID())
  // output: c435RzTWWsjWD1Fi7dxS3idJ7vFgPVR96oE95RfDDT5ue7hRSPENePDjPDJdnV46g7emDzWK8LzJUjGESMG5qzuXqq
}

Releases

Current release: v0.8.0

Links

Videos:

C4 ID Whitepaper

Contributing

Contributions are welcome. The following are some general guidelines for project organization. If you have questions, please open an issue.

The master branch holds the current release, and older releases can be found by their version number. The dev branch is the development branch from which bug and feature branches should be created. Accepted pull requests are merged into the dev branch and then pushed to versioned releases as appropriate.

Feature and bug branches should follow the GitHub-integrated naming convention: feature branches take the new prefix and bug branches the bug prefix, followed by the issue number and a short description. Here is an example of checking out a feature branch:

> git checkout dev
Switched to branch 'dev'
Your branch is up-to-date with 'origin/dev'.
> git checkout -b new/#99_some_github_issue
...

If a branch for an issue is already listed in this repository, then check it out and work from it.

License

This software is released under the MIT license. See LICENSE for more information.

Contributors

alrs, buckleyc, matryer, mrjoshuak, waffle-iron


c4's Issues

c4 id

Add command group to the c4 cli for identification related commands.

Duplication report

When a new flag is given, provide a (de)duplication report that totals the amount of duplicated data.
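Identical content always gets the same ID, so a duplication report reduces to grouping data by digest. The sketch below is hypothetical. The flag, the report fields, and the helper name duplicationReport are all illustrative. It also hashes in-memory blobs with SHA-512, the hash C4 IDs are built on, rather than walking a real filesystem.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// dupReport is a hypothetical summary of duplicated data.
type dupReport struct {
	TotalBytes     int64 // bytes across all inputs
	UniqueBytes    int64 // bytes if only one copy of each distinct content is kept
	DuplicateBytes int64 // TotalBytes - UniqueBytes
}

// duplicationReport groups inputs by SHA-512 digest and totals redundant data.
func duplicationReport(files map[string][]byte) dupReport {
	seen := make(map[[sha512.Size]byte]bool)
	var r dupReport
	for _, data := range files {
		size := int64(len(data))
		r.TotalBytes += size
		if d := sha512.Sum512(data); !seen[d] {
			seen[d] = true
			r.UniqueBytes += size
		}
	}
	r.DuplicateBytes = r.TotalBytes - r.UniqueBytes
	return r
}

func main() {
	files := map[string][]byte{
		"a.txt":      []byte("alfa"),
		"copy/a.txt": []byte("alfa"),
		"b.txt":      []byte("bravo"),
	}
	fmt.Printf("%+v\n", duplicationReport(files)) // {TotalBytes:13 UniqueBytes:9 DuplicateBytes:4}
}
```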

c4 command db integration

c4 filesystem scans are recorded directly to the local K/V database if c4d is not configured or not reachable.

Cli JSON output

The C4 spec requires JSON for services, so JSON should be the default output, except where output must match an existing tool (e.g. for cp compatibility).

Test helpers

Support functions to simplify testing.

Initially configuration mocking, but later we will likely need mocks for remote connections and the like.

Dockerfile

Set up a Docker container for the base c4 environment.

Local configuration

In preparation for upcoming persistent-data needs (environment, credentials, database storage, etc.), we need a way to set configuration.

New Character Set [BREAKING CHANGE!!]

Soon we will be changing the base58 character set used to represent C4 IDs.

Old character set:

"123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

New character set:

"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

This must be done to ensure that sorting C4 IDs produces the same order as sorting the raw bytes.

Note that this is the same as the Bitcoin base58 character set.
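The reasoning behind the change goes like this. C4 IDs are fixed-width (padded to 88 digits), so comparing two IDs as strings compares digits from most significant to least. That matches numeric (raw-byte) order only if the alphabet itself is in ascending byte order. The new set is ascending, since '1'-'9' < 'A'-'Z' < 'a'-'z' in ASCII. The old set is not, because its lowercase letters come before its uppercase ones. The sketch checks this exhaustively for two-digit strings. The helper names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldCharset = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
	newCharset = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
)

// encodeFixed writes n (n >= 0) in base58 with the given alphabet,
// left-padded with the zero digit to exactly width characters.
func encodeFixed(n int, alphabet string, width int) string {
	out := []byte(strings.Repeat(alphabet[:1], width))
	for i := width - 1; i >= 0 && n > 0; i-- {
		out[i] = alphabet[n%58]
		n /= 58
	}
	return string(out)
}

// preservesOrder reports whether string order matches numeric order for
// every two-digit value in the given alphabet.
func preservesOrder(alphabet string) bool {
	prev := encodeFixed(0, alphabet, 2)
	for n := 1; n < 58*58; n++ {
		cur := encodeFixed(n, alphabet, 2)
		if cur <= prev {
			return false
		}
		prev = cur
	}
	return true
}

func main() {
	fmt.Println("old:", preservesOrder(oldCharset)) // old: false
	fmt.Println("new:", preservesOrder(newCharset)) // new: true
}
```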

Watcher

Watch local filesystem path for updates.

K/V database

A simple database that maps keys to IDs, and IDs to IDs.
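Here is a minimal in-memory sketch of the two mappings described. It assumes keys are arbitrary strings, such as file paths. It also assumes an ID-to-ID entry relates one ID to others, for example a set's ID to its members. The kvDB type, its methods, and the string ID stand-in are hypothetical, and a persistent store would replace the maps.

```go
package main

import "fmt"

// ID stands in for a C4 ID; a real store would likely use the c4 package's type.
type ID string

// kvDB is a hypothetical in-memory version of the proposed database.
type kvDB struct {
	keys  map[string]ID // key (e.g. a file path) -> ID
	links map[ID][]ID   // ID -> related IDs (e.g. set ID -> member IDs)
}

func newKVDB() *kvDB {
	return &kvDB{keys: make(map[string]ID), links: make(map[ID][]ID)}
}

// SetKey maps key to id, replacing any earlier mapping.
func (db *kvDB) SetKey(key string, id ID) { db.keys[key] = id }

// GetKey looks up the ID stored for key.
func (db *kvDB) GetKey(key string) (ID, bool) {
	id, ok := db.keys[key]
	return id, ok
}

// Link records that from relates to each of the given IDs.
func (db *kvDB) Link(from ID, to ...ID) { db.links[from] = append(db.links[from], to...) }

// Links returns the IDs related to from (nil if none).
func (db *kvDB) Links(from ID) []ID { return db.links[from] }

func main() {
	db := newKVDB()
	db.SetKey("/shots/a.exr", "id-a") // placeholder IDs, not real C4 IDs
	db.Link("id-set", "id-a", "id-b")
	id, _ := db.GetKey("/shots/a.exr")
	fmt.Println(id, db.Links("id-set")) // id-a [id-a id-b]
}
```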

ID while copying `cp` replacement

First a cp clone that IDs the copied file on the fly, then add multiple destinations.

Ideally this will be just as fast as regular cp in the single-target case, and faster than cp in the multi-target case if the targets are on different I/O buses or devices.

c4 cp should act as a drop-in replacement for standard cp, so it must match it feature for feature, except for ID reporting and a new flag to support multiple targets.

It would be nice to have comparative performance tests as part of this as well.

JSON Filesystem description

It takes two parts to package or represent a file system tree:

  1. A Merkle tree that describes all files.
  2. A description file of permissions and attributes.

c4 directory diff

A mechanism to find and represent the differences between two directory trees in a way that can be followed as instructions to conform one into the other.

I got into the weeds on advanced algorithms, so I'm walking that back a bit to simply get the basic features out (fun stuff, though). Then we'll come back and improve performance with better computer science.
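Here is a deliberately simple baseline in that spirit. It assumes each tree has already been flattened to a map from relative path to C4 ID, then compares the maps and emits add, update, and remove instructions sorted by path. The op type, diffTrees, and the placeholder IDs are hypothetical. Unchanged content keeps the same ID, so equal IDs mean no data needs to move. Detecting renames (same ID, new path) would be a natural next step.

```go
package main

import (
	"fmt"
	"sort"
)

// op is one instruction for conforming one tree into another.
type op struct {
	Action string // "add", "update", or "remove"
	Path   string
	ID     string // ID the path should end up with; empty for "remove"
}

// diffTrees compares two path -> ID listings and returns the instructions
// that turn from into to, sorted by path.
func diffTrees(from, to map[string]string) []op {
	var ops []op
	for path, id := range to {
		old, ok := from[path]
		switch {
		case !ok:
			ops = append(ops, op{Action: "add", Path: path, ID: id})
		case old != id:
			ops = append(ops, op{Action: "update", Path: path, ID: id})
		}
	}
	for path := range from {
		if _, ok := to[path]; !ok {
			ops = append(ops, op{Action: "remove", Path: path})
		}
	}
	// Each path appears at most once, so sorting by path is deterministic.
	sort.Slice(ops, func(i, j int) bool { return ops[i].Path < ops[j].Path })
	return ops
}

func main() {
	from := map[string]string{"a.txt": "id1", "b.txt": "id2", "c.txt": "id3"}
	to := map[string]string{"a.txt": "id1", "b.txt": "id9", "d.txt": "id4"}
	for _, o := range diffTrees(from, to) {
		fmt.Println(o.Action, o.Path, o.ID)
	}
}
```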

Reduce external dependencies.

There are package dependencies such as "github.com/cheekybits/is" which, while very handy, add unnecessary complexity, so we should avoid them when possible. This is especially true when the external package is an Avalanche-io package, which makes the separation a little silly. I'm still learning the best ways to work with Go projects.

It does make sense to keep the C4 interface clean and minimal, so if we integrate packages into C4 directly, we should be careful to avoid exporting their APIs.

utilities

A category of functionality that is handy but inappropriate for the standard API.

For the c4 command-line tool, this means adding a new util mode.

Utilities should be things like:

  • Alternate representations of c4 ids.
  • Convert from old character set ids.

List other utilities here as they come up.
