ContentCouch

ContentCouch is a Git-like system designed to store snapshots of files and directories for backup/synchronization/sharing purposes.

The model is based on RDF, and is thereby somewhat agnostic to specific data formats and URI schemes. This implementation uses RDF+XML for serializing objects and bitprint URIs for referencing blobs.

The URI schemes and serialization formats are common to several projects. See https://github.com/TOGoS/ContentCouchRepositoryFormat for a more formal specification of the on-disk repository format.

Data model

Like Git, the data model consists of a graph of blobs, trees, and commits, and objects are referenced by hash.

Unlike Git:

  • Blobs are stored internally as themselves, optionally by hardlinking
    • Good for storing large media files, since data can be deduplicated at the filesystem level
    • Hash of the file = the object ID
    • Contrast with Git, which prepends a small object type + length header before hashing, so a Git object ID is not the hash of the file itself
  • Data structures are serialized as RDF+XML
    • Easy to inspect or author with a text editor
    • Can be extended with arbitrary metadata as desired
  • Objects are referenced by URI
    • So different URI schemes may be used
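
The difference between the two object-ID schemes can be illustrated with a short Python sketch. It is not taken from the ContentCouch source, and the function names are made up for illustration. It hashes the same bytes two ways: content only, as a store that keeps blobs "as themselves" would, and with Git's "blob <length> NUL" header prepended. It also prints the Base32 form of the SHA-1, which is the first half of a bitprint. The Tiger tree half is left out because Python's standard library has no Tiger implementation.

```python
import base64
import hashlib


def raw_sha1_hex(data: bytes) -> str:
    """SHA-1 of the file content exactly as stored (no header)."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1_hex(data: bytes) -> str:
    """SHA-1 as Git computes it for a blob: 'blob <length> NUL' + content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def sha1_base32(data: bytes) -> str:
    """Base32-encoded SHA-1, the form used in the first half of a bitprint."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


if __name__ == "__main__":
    content = b"hello\n"
    print("raw sha1:", raw_sha1_hex(content))
    print("git blob:", git_blob_sha1_hex(content))
    print("base32  :", sha1_base32(content))
```

Because the raw hash depends only on the file's bytes, it matches what standard tools such as sha1sum report for the same file, which is not true of the Git blob ID.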

To differentiate between a blob and the serialization of a non-blob data structure (e.g. the serialization of a Directory object), URIs of abstract concepts are made by prepending “x-rdf-subject:” to the URI of the object’s serialized form.

e.g. the URI “x-rdf-subject:urn:bitprint:7ZI4YNKSIYB3OZGVLAUYBBQ4YQZDS5BK.L4LVKJV4XBF7MSXVP7FXLCFN6PXGYKMYERATYQA” refers to a directory containing two subdirectories, whose RDF+XML encoding is the blob “urn:bitprint:7ZI4YNKSIYB3OZGVLAUYBBQ4YQZDS5BK.L4LVKJV4XBF7MSXVP7FXLCFN6PXGYKMYERATYQA”.
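
As a rough illustration of the naming rule above, the following Python sketch separates the “x-rdf-subject:” prefix from the blob URI and splits a bitprint into its two Base32 parts. The names Ref, parse_ref and split_bitprint are hypothetical and do not come from the ContentCouch code. The length check assumes the usual bitprint layout of a 32-character SHA-1 and a 39-character Tiger tree hash joined by a dot.

```python
from typing import NamedTuple, Tuple

SUBJECT_PREFIX = "x-rdf-subject:"
BITPRINT_PREFIX = "urn:bitprint:"


class Ref(NamedTuple):
    blob_uri: str         # URI of the bytes to fetch
    is_rdf_subject: bool  # True if the URI names the object those bytes describe


def parse_ref(uri: str) -> Ref:
    """Split an optional 'x-rdf-subject:' prefix off a reference."""
    if uri.startswith(SUBJECT_PREFIX):
        return Ref(uri[len(SUBJECT_PREFIX):], True)
    return Ref(uri, False)


def split_bitprint(blob_uri: str) -> Tuple[str, str]:
    """Return the (SHA-1, Tiger tree) Base32 parts of a urn:bitprint: URI."""
    if not blob_uri.startswith(BITPRINT_PREFIX):
        raise ValueError("not a bitprint URI: " + blob_uri)
    sha1_part, sep, tiger_part = blob_uri[len(BITPRINT_PREFIX):].partition(".")
    if not sep or len(sha1_part) != 32 or len(tiger_part) != 39:
        raise ValueError("malformed bitprint: " + blob_uri)
    return sha1_part, tiger_part


if __name__ == "__main__":
    ref = parse_ref(
        "x-rdf-subject:urn:bitprint:7ZI4YNKSIYB3OZGVLAUYBBQ4YQZDS5BK"
        ".L4LVKJV4XBF7MSXVP7FXLCFN6PXGYKMYERATYQA"
    )
    print(ref.is_rdf_subject, split_bitprint(ref.blob_uri))
```

A client following such a reference would fetch the blob named by blob_uri in either case; the flag only says whether to treat the bytes as the thing itself or as an RDF+XML description to be parsed.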

RDF Schema Documentation

  • http://ns.nuke24.net/ContentCouch/ - the primary namespace
  • http://ns.nuke24.net/ContentCouch/Blob - a finite byte sequence, i.e. the contents of a file
    • http://bitzi.com/xmlns/2002/01/bz-core#fileLength - the length of the blob, in bytes
  • http://ns.nuke24.net/ContentCouch/Commit - a commit object; can point to a target and any number of parent commits
    • http://ns.nuke24.net/ContentCouch/description - commit message
    • http://ns.nuke24.net/ContentCouch/parent - a parent commit, corresponding to a state of the thing being versioned from which this state was derived. More than one parent attribute may be present (unlike directory entries, the set of parents is not modeled as a collection; you just have multiple attribute values).
    • http://ns.nuke24.net/ContentCouch/target - the target of the commit, usually but not necessarily a directory; a snapshot of the ‘thing being versioned’. Usually provided as a reference.
    • http://ns.nuke24.net/ContentCouch/targetType - (deprecated) indicator of what is to be found at the target, “Blob” or “Directory”
    • http://purl.org/dc/terms/created - creation date of the commit (e.g. “2009-02-25 14:40:39”)
    • http://purl.org/dc/terms/creator - name of the author (e.g. “Fred Q”)
  • http://ns.nuke24.net/ContentCouch/Directory - a directory
    • http://ns.nuke24.net/ContentCouch/entries - the list of DirectoryEntry objects
  • http://ns.nuke24.net/ContentCouch/DirectoryEntry - a directory entry
    • http://ns.nuke24.net/ContentCouch/name - directory entry name, i.e. filename of contained file or subdirectory
    • http://ns.nuke24.net/ContentCouch/target - content of the directory entry; probably a Directory or Blob
    • http://ns.nuke24.net/ContentCouch/targetType - (deprecated) indicator of what is to be found at the target, “Blob” or “Directory”
    • http://purl.org/dc/terms/modified - modification time of the file; may be omitted

Examples

urn:bitprint:7ZI4YNKSIYB3OZGVLAUYBBQ4YQZDS5BK.L4LVKJV4XBF7MSXVP7FXLCFN6PXGYKMYERATYQA describes a directory containing two subdirectories:

<Directory xmlns="http://ns.nuke24.net/ContentCouch/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<entries rdf:parseType="Collection">
		<DirectoryEntry>
			<name>2009</name>
			<target>
				<Directory rdf:about="x-rdf-subject:urn:bitprint:UNYPOT6UW4GQCZH3ZNLWXEDK7ULMEF6Y.YKMAGQZNPTIHD2WYGRTZOO7KAFPOOPAY6R3XICA"/>
			</target>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>2010</name>
			<target>
				<Directory rdf:about="x-rdf-subject:urn:bitprint:7YXK4RQGBCGT5NH6SU7GEWVV2TUMACFN.GA2FOEGTT7YLO6YTV46AHXRMEIVT4B5JVMHIPHQ"/>
			</target>
		</DirectoryEntry>
	</entries>
</Directory>

A subdirectory of the above, ‘2009’, containing two JPEG files:

<Directory xmlns="http://ns.nuke24.net/ContentCouch/" xmlns:bz="http://bitzi.com/xmlns/2002/01/bz-core#" xmlns:dc="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<entries rdf:parseType="Collection">
		<DirectoryEntry>
			<name>0817-IMG_0233.JPG</name>
			<target>
				<Blob rdf:about="urn:bitprint:RO5JL2GVCACEG7MGIK4JATPB6RTWCLFC.INYA5BK64TKIB2B7FEXWBVD56H55FL3ZLF3TGGY">
					<bz:fileLength>797878</bz:fileLength>
				</Blob>
			</target>
			<dc:modified>2009-08-17 14:45:01 GMT</dc:modified>
		</DirectoryEntry>
		<DirectoryEntry>
			<name>0817-IMG_0236.JPG</name>
			<target>
				<Blob rdf:about="urn:bitprint:EQLJSS4FNM4NATV4CZDOISF7YMAVPPWL.GFKPQM5ZQ5FLZRX77PMTBI4RWG4H547MU4JW4OA">
					<bz:fileLength>651881</bz:fileLength>
				</Blob>
			</target>
			<dc:modified>2009-08-17 14:45:52 GMT</dc:modified>
		</DirectoryEntry>
	</entries>
</Directory>

Note: references to blobs can be made simply with a “urn:bitprint:” URI. References to objects described by blobs require the “x-rdf-subject:” prefix (in other situations I have simply appended “#” to mean the same thing; see my notes on the subject in ‘Existing standards for hash-based URN schemes’).

Note: DirectoryEntry target attributes, when represented in RDF+XML, may link directly to the target (<target rdf:resource="urn:WHATEVER"/>) or provide some metadata (<target><Blob rdf:about="urn:WHATEVER"><bz:fileLength>1234</bz:fileLength></Blob></target>). The latter is redundant, but provides potentially useful information about the target so that it can be considered without having to actually download the linked object, which may be large or unavailable. For example, to show a directory listing, one generally would like to see subdirectories listed differently than files, and for files to at least have their size shown. The deprecated targetType attribute served this same role for type information in a hackier way.
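
To make the two target forms concrete, here is a minimal Python sketch of a directory lister that accepts either one. It is an illustration only, not the project's parser: list_directory and SAMPLE are hypothetical names, and the sample document is a trimmed copy of the ‘2009’ example with the second entry rewritten to use the bare rdf:resource form.

```python
import xml.etree.ElementTree as ET

CC = "{http://ns.nuke24.net/ContentCouch/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
BZ = "{http://bitzi.com/xmlns/2002/01/bz-core#}"

SAMPLE = """\
<Directory xmlns="http://ns.nuke24.net/ContentCouch/"
           xmlns:bz="http://bitzi.com/xmlns/2002/01/bz-core#"
           xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <entries rdf:parseType="Collection">
    <DirectoryEntry>
      <name>0817-IMG_0233.JPG</name>
      <target>
        <Blob rdf:about="urn:bitprint:RO5JL2GVCACEG7MGIK4JATPB6RTWCLFC.INYA5BK64TKIB2B7FEXWBVD56H55FL3ZLF3TGGY">
          <bz:fileLength>797878</bz:fileLength>
        </Blob>
      </target>
    </DirectoryEntry>
    <DirectoryEntry>
      <name>0817-IMG_0236.JPG</name>
      <target rdf:resource="urn:bitprint:EQLJSS4FNM4NATV4CZDOISF7YMAVPPWL.GFKPQM5ZQ5FLZRX77PMTBI4RWG4H547MU4JW4OA"/>
    </DirectoryEntry>
  </entries>
</Directory>
"""


def list_directory(rdf_xml):
    """Yield (name, target_uri, type_name, size) for each DirectoryEntry.

    type_name and size are None when the entry only links to its target.
    """
    root = ET.fromstring(rdf_xml)
    for entry in root.findall(CC + "entries/" + CC + "DirectoryEntry"):
        name = entry.findtext(CC + "name")
        target = entry.find(CC + "target")
        if target is None:
            continue
        uri = target.get(RDF + "resource")  # direct-link form
        type_name = size = None
        if uri is None and len(target) > 0:
            obj = target[0]  # inline form, e.g. <Blob rdf:about="...">
            uri = obj.get(RDF + "about")
            type_name = obj.tag[len(CC):] if obj.tag.startswith(CC) else obj.tag
            length = obj.findtext(BZ + "fileLength")
            size = int(length) if length is not None else None
        yield name, uri, type_name, size


if __name__ == "__main__":
    for row in list_directory(SAMPLE):
        print(row)
```

With the inline form the lister can show the type and size immediately; with the bare link it has only the URI and would need to fetch the target to learn more.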

TODO: Example of a commit object
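
Until a real example is added, the following Python sketch shows what a serialized commit might look like, assembled purely from the property list in the schema documentation above. The element layout (for instance, writing target and parent as rdf:resource links) is an assumption rather than output from the actual implementation. commit_rdf_xml is a hypothetical helper, and the parent URI in the usage example is a placeholder, not a real object.

```python
from xml.sax.saxutils import escape, quoteattr


def commit_rdf_xml(target_uri, description, creator, created, parent_uris=()):
    """Build a hypothetical RDF+XML serialization of a Commit object."""
    lines = [
        '<Commit xmlns="http://ns.nuke24.net/ContentCouch/"'
        ' xmlns:dc="http://purl.org/dc/terms/"'
        ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
        "\t<target rdf:resource=%s/>" % quoteattr(target_uri),
    ]
    # Parents are repeated attribute values, not a collection.
    for parent in parent_uris:
        lines.append("\t<parent rdf:resource=%s/>" % quoteattr(parent))
    lines.append("\t<description>%s</description>" % escape(description))
    lines.append("\t<dc:creator>%s</dc:creator>" % escape(creator))
    lines.append("\t<dc:created>%s</dc:created>" % escape(created))
    lines.append("</Commit>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(commit_rdf_xml(
        target_uri="x-rdf-subject:urn:bitprint:7ZI4YNKSIYB3OZGVLAUYBBQ4YQZDS5BK"
                   ".L4LVKJV4XBF7MSXVP7FXLCFN6PXGYKMYERATYQA",
        description="Added 2010 photos",
        creator="Fred Q",
        created="2009-02-25 14:40:39",
        parent_uris=["x-rdf-subject:urn:WHATEVER"],  # placeholder parent
    ))
```

The commit itself would then be referenced as “x-rdf-subject:” followed by the URI of the blob holding this serialization, in the same way as a Directory.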

Implementation(s)

This repository contains the original Java implementation of the system, which is still in use but not actively maintained. It was created in 2007, targeting JDK 1.4. In 2008 I rearchitected the project using a ‘resource-oriented’ approach: components communicate by making lightweight REST calls to each other (no network sockets are involved). This provided some flexibility and uniformity between internal and external APIs, but it left the logic for anything other than very basic GET/PUT blob calls in kind of weird places, with important information crammed into Request and Response metadata. Lesson learned: it’s probably best to keep your functions simple and the call tree shallow.

See also:

Old documentation

./doc/old-README.txt

