edn's Introduction

edn

extensible data notation [eed-n]

Rationale

edn is an extensible data notation. A superset of edn is used by Clojure to represent programs, and it is used by Datomic and other applications as a data transfer format. This spec describes edn in isolation from those and other specific use cases, to help facilitate implementation of readers and writers in other languages, and for other uses.

edn supports a rich set of built-in elements, and the definition of extension elements in terms of the others. Users of data formats without such facilities must rely on either convention or context to convey elements not included in the base set. This greatly complicates application logic, betraying the apparent simplicity of the format. edn is simple, yet powerful enough to meet the demands of applications without convention or complex context-sensitive logic.

edn is a system for the conveyance of values. It is not a type system, and has no schemas. Nor is it a system for representing objects - there are no reference types, nor should a consumer have an expectation that two equivalent elements in some body of edn will yield distinct object identities when read, unless a reader implementation goes out of its way to make such a promise. Thus the resulting values should be considered immutable, and a reader implementation should yield values that ensure this, to the extent possible.

edn is a set of definitions for acceptable elements. A use of edn might be a stream or file containing elements, but it could be as small as the conveyance of a single element in e.g. an HTTP query param.

There is no enclosing element at the top level. Thus edn is suitable for streaming and interactive applications.

The base set of elements in edn is meant to cover the basic set of data structures common to most programming languages. While edn specifies how those elements are formatted in text, it does not dictate the representation that results on the consumer side. A well behaved reader library should endeavor to map the elements to programming language types with similar semantics.

Spec

Currently this specification is casual, as we gather feedback from implementors. A more rigorous definition (e.g. a BNF grammar) will follow.

General considerations

edn elements, streams and files should be encoded using UTF-8.

Elements are generally separated by whitespace. Whitespace, other than within strings, is not otherwise significant, nor need redundant whitespace be preserved during transmissions. Commas , are also considered whitespace, other than within strings.

The delimiters { } ( ) [ ] need not be separated from adjacent elements by whitespace.

# dispatch character

Tokens beginning with # are reserved. The character following # determines the behavior. The dispatches #{ (sets), #_ (discard), #alphabetic-char (tag) are defined below. # is not a delimiter.

Built-in elements

nil

nil represents nil, null or nothing. It should be read as an object with similar meaning on the target platform.

booleans

true and false should be mapped to booleans.

If a platform has canonic values for true and false, it is a further semantic of booleans that all instances of true yield that (identical) value, and similarly for false.

strings

Strings are enclosed in "double quotes" and may span multiple lines. Standard C/Java escape characters \t, \r, \n, \\ and \" are supported.
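
As an illustration only, here is a minimal Python sketch of how a reader might decode the text between the double quotes. The name decode_edn_string is hypothetical, and rejecting any escape other than the five listed above is an assumption, since the spec does not say how other escapes should be treated.

```python
# Hypothetical helper: decode the text found between the double quotes
# of an edn string literal. Only the five escapes named in the spec are
# accepted; rejecting anything else is an assumption of this sketch.
ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\", '"': '"'}


def decode_edn_string(body: str) -> str:
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash at end of string")
            nxt = body[i + 1]
            if nxt not in ESCAPES:
                raise ValueError("unsupported escape: \\" + nxt)
            out.append(ESCAPES[nxt])
            i += 2
        else:
            # Literal characters, including raw newlines, pass through,
            # which is what lets a string span multiple lines.
            out.append(ch)
            i += 1
    return "".join(out)
```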

characters

Characters are preceded by a backslash: \c, \newline, \return, \space and \tab yield the corresponding characters. Unicode characters are represented with \uNNNN as in Java. Backslash cannot be followed by whitespace.
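
The character rules can be sketched in Python as follows. This assumes the reader has already isolated one token that starts with a backslash; read_char_token is a hypothetical name, and treating anything outside the listed forms as an error is an assumption.

```python
import string

# The four named characters listed in the spec.
NAMED_CHARS = {"newline": "\n", "return": "\r", "space": " ", "tab": "\t"}


def read_char_token(token: str) -> str:
    # token is the full literal, including the leading backslash.
    if len(token) < 2 or token[0] != "\\":
        raise ValueError("not a character literal")
    name = token[1:]
    if name[0].isspace():
        raise ValueError("backslash cannot be followed by whitespace")
    if name in NAMED_CHARS:
        return NAMED_CHARS[name]
    # A 'u' followed by exactly four hex digits is a Unicode escape.
    if len(name) == 5 and name[0] == "u" and all(
        c in string.hexdigits for c in name[1:]
    ):
        return chr(int(name[1:], 16))
    if len(name) == 1:
        return name
    raise ValueError("unknown character literal: " + token)
```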

symbols

Symbols are used to represent identifiers, and should map to something other than strings, if possible.

Symbols begin with a non-numeric character and can contain alphanumeric characters and . * + ! - _ ? $ % & = < >. If -, + or . are the first character, the second character (if any) must be non-numeric. Additionally, : # are allowed as constituent characters in symbols other than as the first character.

/ has special meaning in symbols. It can be used once only in the middle of a symbol to separate the prefix (often a namespace) from the name, e.g. my-namespace/foo. / by itself is a legal symbol, but otherwise neither the prefix nor the name part can be empty when the symbol contains /.

If a symbol has a prefix and /, the following name component should follow the first-character restrictions for symbols as a whole. This is to avoid ambiguity in reading contexts where prefixes might be presumed as implicitly included namespaces and elided thereafter.
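
One way to check these symbol rules is a regular expression, sketched below in Python. Two assumptions are made that the spec does not settle: "alphanumeric" is read as ASCII letters and digits only, and the reserved words nil, true and false are not filtered out. The names SYMBOL_RE and is_symbol are hypothetical.

```python
import re

# Characters that may start a symbol part with no further condition.
_FIRST = r"[A-Za-z*!_?$%&=<>]"
# Characters that may start a part only if the next one is non-numeric.
_SIGN = r"[.+-]"
# Any constituent character (':' and '#' are allowed after the first).
_REST = r"[A-Za-z0-9.*+!\-_?$%&=<>:#]"
# Constituent characters excluding digits.
_NONNUM = r"[A-Za-z.*+!\-_?$%&=<>:#]"

# One part: either the prefix or the name.
_PART = rf"(?:{_FIRST}{_REST}*|{_SIGN}(?:{_NONNUM}{_REST}*)?)"

# A lone '/', or a part optionally followed by '/' and a second part.
# The name after '/' obeys the same first-character restrictions.
SYMBOL_RE = re.compile(rf"(?:/|{_PART}(?:/{_PART})?)")


def is_symbol(text: str) -> bool:
    return SYMBOL_RE.fullmatch(text) is not None
```

Under this reading, my-namespace/foo and / are accepted, while foo/-4bar is rejected because the name part -4bar breaks the first-character restriction.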

keywords

Keywords are identifiers that typically designate themselves. They are semantically akin to enumeration values. Keywords follow the rules of symbols, except they can (and must) begin with :, e.g. :fred or :my/fred. If the target platform does not have a keyword type distinct from a symbol type, the same type can be used without conflict, since the mandatory leading : of keywords is disallowed for symbols. Per the symbol rules above, :/ and :/anything are not legal keywords. A keyword cannot begin with ::.

If the target platform supports some notion of interning, it is a further semantic of keywords that all instances of the same keyword yield the identical object.

integers

Integers consist of the digits 0 - 9, optionally prefixed by - to indicate a negative number, or (redundantly) by +. No integer other than 0 may begin with 0. 64-bit (signed integer) precision is expected. An integer can have the suffix N to indicate that arbitrary precision is desired. -0 is a valid integer not distinct from 0.

integer
  int
  int N
digit
  0-9
int
  digit
  1-9 digits
  + digit
  + 1-9 digits
  - digit
  - 1-9 digits
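
The integer grammar above can be sketched as a Python regular expression. read_integer is a hypothetical name, and raising an error for an unsuffixed value outside the signed 64-bit range is an assumption; the spec only says that 64-bit precision is expected.

```python
import re

# Optional sign, then either a lone 0 or a digit run with no leading 0,
# then an optional N suffix requesting arbitrary precision.
INT_RE = re.compile(r"([+-]?(?:0|[1-9][0-9]*))(N?)")


def read_integer(token: str) -> int:
    m = INT_RE.fullmatch(token)
    if m is None:
        raise ValueError("not an edn integer: " + token)
    value = int(m.group(1))
    arbitrary = m.group(2) == "N"
    # Assumption: without N, values must fit in a signed 64-bit integer.
    if not arbitrary and not (-2**63 <= value < 2**63):
        raise ValueError("out of 64-bit range; use the N suffix")
    return value
```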

floating point numbers

64-bit (double) precision is expected.

floating-point-number
  int M
  int frac
  int exp
  int frac exp
digit
  0-9
int
  digit
  1-9 digits
  + digit
  + 1-9 digits
  - digit
  - 1-9 digits
frac
  . digits
exp
  ex digits
digits
  digit
  digit digits
ex
  e
  e+
  e-
  E
  E+
  E-

In addition, a floating-point number may have the suffix M to indicate that exact precision is desired.
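
A Python sketch of the floating point grammar follows. It assumes that the M suffix maps to decimal.Decimal and everything else to a Python float (a 64-bit double); read_float is a hypothetical name. A bare int with no frac, exp or M is rejected here because it is an integer, not a floating point number.

```python
import re
from decimal import Decimal

# int, optional frac, optional exp, optional M suffix.
FLOAT_RE = re.compile(
    r"([+-]?(?:0|[1-9][0-9]*))"   # int
    r"(\.[0-9]+)?"                # frac: '.' followed by one or more digits
    r"([eE][+-]?[0-9]+)?"         # exp
    r"(M?)"                       # exact-precision suffix
)


def read_float(token: str):
    m = FLOAT_RE.fullmatch(token)
    if m is None:
        raise ValueError("not an edn floating point number: " + token)
    int_part, frac, exp, exact = m.groups()
    if not (frac or exp or exact):
        raise ValueError("token is an integer, not a float: " + token)
    digits = int_part + (frac or "") + (exp or "")
    # Assumption: M maps to Decimal, everything else to a double.
    return Decimal(digits) if exact else float(digits)
```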

lists

A list is a sequence of values. Lists are represented by zero or more elements enclosed in parentheses (). Note that lists can be heterogeneous.

(a b 42)

vectors

A vector is a sequence of values that supports random access. Vectors are represented by zero or more elements enclosed in square brackets []. Note that vectors can be heterogeneous.

[a b 42]

maps

A map is a collection of associations between keys and values. Maps are represented by zero or more key and value pairs enclosed in curly braces {}. Each key should appear at most once. No semantics should be associated with the order in which the pairs appear.

{:a 1, "foo" :bar, [1 2 3] four}

Note that keys and values can be elements of any type. The use of commas above is optional, as they are parsed as whitespace.

sets

A set is a collection of unique values. Sets are represented by zero or more elements enclosed in curly braces preceded by #, i.e. #{}. No semantics should be associated with the order in which the elements appear. Note that sets can be heterogeneous.

#{a b [1 2 3]}

tagged elements

edn supports extensibility through a simple mechanism. # followed immediately by a symbol starting with an alphabetic character indicates that that symbol is a tag. A tag indicates the semantic interpretation of the following element. It is envisioned that a reader implementation will allow clients to register handlers for specific tags. Upon encountering a tag, the reader will first read the next element (which may itself be or comprise other tagged elements), then pass the result to the corresponding handler for further interpretation, and the result of the handler will be the data value yielded by the tag + tagged element, i.e. reading a tag and tagged element yields one value. This value is the value to be returned to the program and is not further interpreted as edn data by the reader.

This process will bottom out on elements either understood or built-in.

Thus you can build new distinct readable elements out of (and only out of) other readable elements, keeping extenders and extension consumers out of the text business.

The semantics of a tag, and the type and interpretation of the tagged element are defined by the steward of the tag.

#myapp/Person {:first "Fred" :last "Mertz"}

If a reader encounters a tag for which no handler is registered, the implementation can either report an error, call a designated 'unknown element' handler, or create a well-known generic representation that contains both the tag and the tagged element, as it sees fit. Note that the non-error strategies allow for readers which are capable of reading any and all edn, in spite of being unaware of the details of any extensions present.
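
The handler mechanism described above might look like the following Python sketch. TagRegistry, Tagged, register and apply are all hypothetical names, not part of any edn library. The sketch assumes the reader has already read the tagged element into a value, and it picks the "generic representation" strategy for unknown tags.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tagged:
    """Generic representation of a tag with no registered handler."""
    tag: str
    value: Any


class TagRegistry:
    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Any], Any]] = {}

    def register(self, tag: str, handler: Callable[[Any], Any]) -> None:
        self._handlers[tag] = handler

    def apply(self, tag: str, value: Any) -> Any:
        # value is the already-read element that followed the tag.
        handler = self._handlers.get(tag)
        if handler is None:
            # Non-error strategy: keep both the tag and the element.
            return Tagged(tag, value)
        return handler(value)


# Usage: keywords are shown as plain strings for brevity (an assumption).
registry = TagRegistry()
registry.register("myapp/Person", lambda m: (m["first"], m["last"]))
person = registry.apply("myapp/Person", {"first": "Fred", "last": "Mertz"})
```

Because the reader reads the next element first, and that element may itself be tagged, a nested tag would reach this sketch as apply(outer, apply(inner, value)).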

rules for tags

Tag symbols without a prefix are reserved by edn for built-ins defined using the tag system.

User tags must contain a prefix component, which must be owned by the user (e.g. trademark or domain) or known unique in the communication context.

A tag may specify more than one format for the tagged element, e.g. both a string and a vector representation.

Tags themselves are not elements. It is an error to have a tag without a corresponding tagged element.

built-in tagged elements

#inst "rfc-3339-format"

An instant in time. The tagged element is a string in RFC-3339 format.

#inst "1985-04-12T23:20:50.52Z"

#uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"

A UUID. The tagged element is a canonical UUID string representation.

comments

If a ; character is encountered outside of a string, that character and all subsequent characters to the next newline should be ignored.
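
A rough Python sketch of comment removal is shown below. strip_comments is a hypothetical name. It assumes that "newline" means the "\n" character and that a backslash outside a string starts a character literal, so that \; and \" are not mistaken for a comment or a string delimiter.

```python
def strip_comments(text: str) -> str:
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < len(text):
            # Character literal: keep the backslash and the next character.
            out.append(ch)
            out.append(text[i + 1])
            i += 1
        elif ch == ";":
            # Skip to the next newline; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```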

discard

# followed immediately by _ is the discard sequence, indicating that the next element (whether separated from #_ by whitespace or not) should be read and discarded. Note that the next element must still be a readable element. A reader should not call user-supplied tag handlers during the processing of the element to be discarded.

[a b #_foo 42] => [a b 42]

The discard sequence is not an element. It is an error to have a discard sequence without a following element.
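
To make the discard rule concrete, here is a deliberately tiny Python reader sketch. It handles only lists, vectors and bare atoms (kept as strings); strings, maps, sets, tags and comments are out of scope, and read_all is a hypothetical name. Lists become tuples and vectors become Python lists. Nested discards are resolved by plain recursion, which is one possible interpretation rather than something the spec states.

```python
import re

# Tokens: the discard sequence, a delimiter, or a run of other characters.
TOKEN_RE = re.compile(r"#_|[()\[\]]|[^\s,()\[\]]+")
CLOSERS = {"(": ")", "[": "]"}


def read_all(text):
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def skip_discards():
        nonlocal pos
        while pos < len(tokens) and tokens[pos] == "#_":
            pos += 1
            read_element()  # must be a readable element; result is dropped

    def read_element():
        nonlocal pos
        skip_discards()
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok in CLOSERS:
            close = CLOSERS[tok]
            items = []
            while True:
                skip_discards()
                if pos >= len(tokens):
                    raise ValueError("unclosed " + tok)
                if tokens[pos] == close:
                    pos += 1
                    break
                items.append(read_element())
            return tuple(items) if tok == "(" else items
        if tok in (")", "]"):
            raise ValueError("unexpected " + tok)
        return tok  # atoms are left as raw token strings

    result = []
    while True:
        skip_discards()
        if pos >= len(tokens):
            return result
        result.append(read_element())
```

A discard sequence with nothing readable after it, as in [a #_], raises an error in this sketch, in line with the rule above.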

equality

Sets and maps have requirements that their elements and keys respectively be unique, which requires a mechanism for determining when 2 values are not unique (i.e. are equal).

nil, booleans, strings, characters, and symbols are equal to values of the same type with the same edn representation.

integers and floating point numbers should be considered equal to values only of the same magnitude, type, and precision. Commingling numeric types and precision in map/set key/elements, or constituents therein, is not advised.

sequences (lists and vectors) are equal to other sequences whose count of elements is the same, and for which each corresponding pair of elements (by ordinal) is equal.

sets are equal if they have the same count of elements and, for every element in one set, an equal element is in the other.

maps are equal if they have the same number of entries, and for every key/value entry in one map an equal key is present and mapped to an equal value in the other.

tagged elements must define their own equality semantics. #uuid elements are equal if their canonic representations are equal. #inst elements are equal if their representation strings designate the same timestamp per RFC-3339.
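
These equality rules can be sketched in Python as below. The mapping is an assumption of the sketch: edn lists are tuples, vectors are Python lists, sets are frozensets, and maps are dicts. edn_equal is a hypothetical name. Python's own == treats 1, 1.0 and True as equal, so the sketch compares types explicitly; tagged elements are not covered.

```python
def edn_equal(a, b) -> bool:
    seq = (list, tuple)
    # Lists and vectors are both sequences: compare them element by element.
    if isinstance(a, seq) and isinstance(b, seq):
        return len(a) == len(b) and all(
            edn_equal(x, y) for x, y in zip(a, b)
        )
    # Everything else must have the same type, so 1 is not equal to 1.0
    # and True is not equal to 1.
    if type(a) is not type(b):
        return False
    if isinstance(a, frozenset):
        return len(a) == len(b) and all(
            any(edn_equal(x, y) for y in b) for x in a
        )
    if isinstance(a, dict):
        return len(a) == len(b) and all(
            any(edn_equal(k, k2) and edn_equal(v, v2) for k2, v2 in b.items())
            for k, v in a.items()
        )
    return a == b
```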

edn's People

Contributors

davidrupp, ersiner, honkfestival, richhickey

edn's Issues

Ordered Map?

What about an ordered map syntax? Many XML applications use the implicit definition order, maybe using ordered maps is better than using lists for some tasks.

Regex tag?

Is there a high probability of a built-in tag for regex?

#regex "something"

isn't that much shorter than

#tld.company/regex "something"

I'm just curious.

is 1M a valid floating point number?

The proposed grammar for floating point numbers does not allow: "1M" since at least a frac or exp part is required following the int part. Clojure does allow this. Is this deliberate?

Discard reader macro has possible collision with tags that start with an underscore

According to the edn spec, symbols can begin with an underscore. ("Symbols begin with a non-numeric character and can contain alphanumeric characters and . * + ! - _ ?. If - or . are the first character, the second character must be non-numeric. Additionally, : # are allowed as constituent characters in symbols but not as the first character.")

Tags are symbols prefixed with a pound sign. Therefore, you could have a tag like this: #_cnd/my-snazzy-tag.

#_ is a reader macro used to discard the next value, however. In this case, instead of my tag, I discard the symbol cnd/my-snazzy-tag, which is not what I want.

Should there be a special rule eliminating underscores as the first character of a tag? Or should this be handled a different way?

Escape characters in strings

Am I correct to assume that only the 3 escape sequences \t \r \n are supported in EDN strings? That seems incomplete, since escaped double quotes are so common.

Spec versions for implementations to state compliance

It might be helpful for implementations to have the ability to state exactly what version of the spec they conform to.

Though this might only be useful while the spec solidifies, and likely won't be useful after the final BNF is published, assuming the definition of edn is stable after that.

Make the tag output capable of being consumed by another tag or specify it can't

Suppose you have defined the #x/table tag, which takes a list with a header and rows, to improve readability a little and reduce the data length:

 #x/table ((:name   :surname  :age)
           ("bob"   "smith"   33)
           ("miles" "davis"   83)
          )

This tag will yield the list ({:name "bob" :surname "smith" :age 33}, {:name "miles" :surname "davis" :age 83}). My question is: can you give this output value as an input to another tag handler? Like by doing:

#x/othertag #x/table ((:name   :surname  :age)
                      ("bob"   "smith"   33)
                      ("miles" "davis"   83)
                     )

Reading the EDN spec, I think it is not clear whether you can compose tags or not; maybe the phrase "This value is the value to be returned to the program and is not further interpreted as edn data by the reader." prohibits it.

So, will the reader pass the value returned by the #x/table handler as the input of the #x/othertag handler? That would be nice; in any case, the spec needs to be clear about whether it is permitted.

What is a valid 'middle' of a Symbol

The / character is valid in the middle of a Symbol.

Is ./. a valid Symbol? Other questionable Symbols: -/.

Is the rule that the / character may not appear at the end or beginning of a Symbol with more than one character? Is the middle anywhere but the end or beginning of a Symbol with more than one character (which would make sense)? Does this imply the restriction that prefix and name are not allowed to be empty (which would make sense)?

suggestion: char literals \ should not quote white space

I assume that character literals are intended to be like those in Clojure, however these are legal in Clojure:

\<an actual space>
\<an actual newline>
\<an actual return>
\<an actual tab>

I don't think it would be a good idea to support these in edn as they are hard for a human to read unambiguously. I'd suggest requiring:

\space
\newline
\return
\tab
...

and disallowing whitespace immediately following the \ introducing a character literal.

Is :/x a valid Keyword?

Does the : fulfill the beginning of a Symbol where the / must be in the middle of a Symbol?

Escaping of quotes and backslashes in strings

Escaping of double quotes and backslashes in strings is not specified at all.

I would assume that:

  • "foo\\bar" => foo\bar
  • "foo\"bar" => foo"bar

And the clojure implementation confirms this. It should be specified in edn though.

support for metadata?

I noticed that the relevance/edn-ruby lib is supporting metadata e.g.
^{:doc "This is my vector" :rel :temps} [98.6 99.7]

Is this going to be part of the official edn spec? At the very least I can imagine suggesting that languages which do not have support for metadata can treat it as a comment/discard?

parsing characters should have a prefix to '\'

Most languages will interpret "\r" as a literal escape code in the input stream.

Common Lisp solves this problem by representing characters as #\r

(char "r" 0)
> #\r

Which would make parsing input easier.

is "foo/-0bar" an acceptable symbol?

"If - or . are the first character, the second character must be non-numeric. Additionally, : # are allowed as constituent characters in symbols but not as the first character."

So, is "foo/-4bar" allowed or not? It would seem allowed, but this has the unfortunate property that we're left with an unreadable symbol if we strip the prefix: "-4bar".

specify expected integer size

The floating point numbers section states:
64-bit (double) precision is expected.

The integers definition should have an analogous statement,
e.g. something like
64-bit (signed integer) precision is expected.

Would #<...> be valid edn?

Greater than and less than aren't in the symbols section and I would like some clarification. Can you have greater than and less than in tags in edn?

Extra symbols in Symbol

Will other symbols be allowed in Symbol?

I'm thinking of: ~ ` @ $ % ^ & + = | \ < >

I'd love in particular: | and %

iOS framework?

Hi Rich,

Have you heard of any iOS frameworks for parsing/generating EDN?
If not, I might write one for myself (and open source it afterwards).

Best,
Henrik

EDN Schemas?

Considering EDN is a simpler XML, can we also get a simpler XSD?

David Nolen and I were discussing the potential for formalizing the ClojureScript AST. My first thought was that I wish I had an AST validator, like the ASTValidator.java in Google Closure. My next thought was "I'm sure core.logic could make this very easy". I typed not more than two sentences of notes on this topic before I thought: This smells an awful lot like XML Schemas...

Clarifying the definition of 'constituent characters' in Symbols

I found the clisp page on constituent, and that seems to mean that the related character is valid for that element in edn. Is that correct?

I'm confused about where : and # are valid in Symbols. Clearly they can be 'in' a Symbol, and not the first character. The Clojure Reader page says that Symbols ending with : are reserved by Clojure - is that also true of edn? That same Clojure Reader page says that : must be non-repeating. Is that true in edn of : and #?

Comments and newlines

Is the newline that terminates a ; comment platform dependent or is it strictly \newline?

Unicode and Symbols

In the explanation for Symbols it says, "non-numeric character" and "alphanumeric characters." Are those restricted to ASCII or can relevant Unicode fit?

The Wikipedia page for 'alphanumeric' states that it is commonly restricted to Latin letters and Arabic digits, but then makes an exception for 'other locales.'

Is `:/` a valid keyword?

The spec says about symbols that "/ by itself is a legal symbol" and continues that "[k]eywords follow the rules of symbols, except they can (and must) begin with a colon, e.g. :fred or :my/fred". So :/ seems to be a valid keyword. Is this intended?

Is "#_ #_ 1 2 3" equivalent to "#_ 1 #_ 2 3"?

It's not clear to me if "#_ #_ 1 2 3" is equivalent to "#_ 1 #_ 2 3". This is the case in Clojure's LispReader, but I thought I'd ask to be certain.

(One could argue for "#_ #_ 1 2 3" being equivalent to "1 2 3" if the first #_ were to eat the second.)

Can literal tags contain periods?

The Clojure reader currently does not allow periods in tagged literals (see http://dev.clojure.org/jira/browse/CLJ-1100). The EDN spec is unclear on this issue:

edn supports extensibility through a simple mechanism. # followed immediately by a symbol starting with an alphabetic character indicates that that symbol is a tag.

Are periods allowed in tags?

Special floating point values have no encoding

The "floating point numbers" section refers to 64-bit double precision -- presumably IEEE-754 -- and specifies how to encode typical numbers, but it does not specify encoding of special values such as NaN or infinities. This is undesirable, as now any object graph that includes any IEEE-754 floating point field could conceivably become unserializable, depending on the value that field contains.

JSON makes no allowances for NaN or +/-Inf, forcing tools to use tricks like serializing these values as strings, and coercing certain strings ("NaN") back into their floating-point counterparts on deserialization. Not only does this add semantics on an ad-hoc basis outside the JSON specification, it is simply too much magic for my taste. On the other hand, YAML provides .inf, -.Inf, and .NAN as language-independent tokens representing these particular floating point values. I find this much preferable, as it causes no ambiguity.

I'd like the encoding of these values to be specified, since edn seems to be aiming at general-purpose data interchange, and the set of floating point numbers includes these special values.
