Data Desk v1.0.0

Description

Data Desk is a project utility that parses a simple C-like data description format. Input files in this data description format are parsed to create corresponding abstract syntax trees which represent the information extracted from the files. These abstract syntax trees are then sent to project-specific custom code that is written by the user. This custom code is simply a dynamic library with a few exported functions that are used as callbacks for the parser. Below is a list of the callbacks.

DataDeskCustomInitCallback(void) is called when the parser starts.
DataDeskCustomParseCallback(DataDeskNode *root, char *filename) is called for every top-level construct that is parsed.
DataDeskCustomCleanUpCallback(void) is called before the parser shuts down.

The abstract syntax tree is built entirely from DataDeskNode structures. This structure is defined in the data_desk.h file.

Data Desk also offers a number of utility functions for introspecting on abstract syntax trees it passes to your custom code. A list of these is in the data_desk.h file, which can be included into your custom layer.

Usage

To use Data Desk, you'll need to do a few things:

Get Data Desk
Make or get some Data Desk format files (.ds)
Make a project-specific custom layer
Run Data Desk

Step 1: Get Data Desk

Run the following commands:

git clone https://github.com/ryanfleury/data_desk
cd data_desk
build (on Windows) or ./build.sh (on Mac/Linux)

NOTE: The build.bat script on Windows expects to find cl (the MSVC compiler) in your environment. The easiest way to arrange this is to use one of the Visual Studio command prompts (titled x64 Native Tools Command Prompt for VS<version>, or x86 Native Tools Command Prompt for VS<version>). Otherwise, you can call vcvarsall.bat, which is packaged with Visual Studio, in your terminal environment.

Step 2: Make or get Data Desk format files (.ds)

Grab an example or make your own.

Step 3: Make a project-specific custom layer

An easy way to write the code for this is to check out the custom layer template, located here. Fill out the functions in your custom layer code however you want to. The helper functions in data_desk.h might be useful here; that header can be dropped into your code and used directly.
To build a custom layer, you just need to build a DLL (or .so file) with the function callbacks you've written as the appropriate exported symbols. data_desk.h outlines what symbols are used for each callback.
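As a rough illustration of the three callbacks described in this README, here is a minimal custom layer sketch. Several things in it are assumptions rather than Data Desk's documented API: the void return types, the CUSTOM_LAYER_EXPORT macro (a hypothetical name), and the opaque DataDeskNode stand-in, which exists only so the sketch compiles without data_desk.h. A real layer would include data_desk.h and follow whatever export conventions it describes.

```c
#include <stdio.h>

/* Stand-in so this sketch compiles on its own. A real custom layer would
   instead #include "data_desk.h", which defines DataDeskNode. */
typedef struct DataDeskNode DataDeskNode;

/* Hypothetical export decoration (an assumption, not taken from data_desk.h). */
#if defined(_WIN32)
#define CUSTOM_LAYER_EXPORT __declspec(dllexport)
#else
#define CUSTOM_LAYER_EXPORT
#endif

/* Number of top-level constructs seen since the parser started. */
static int parsed_construct_count = 0;

/* Called when the parser starts. */
CUSTOM_LAYER_EXPORT void DataDeskCustomInitCallback(void)
{
    parsed_construct_count = 0;
}

/* Called for every top-level construct that is parsed. */
CUSTOM_LAYER_EXPORT void DataDeskCustomParseCallback(DataDeskNode *root, char *filename)
{
    (void)root; /* A real layer would inspect the tree here. */
    ++parsed_construct_count;
    printf("parsed a top-level construct from %s\n", filename);
}

/* Called before the parser shuts down. */
CUSTOM_LAYER_EXPORT void DataDeskCustomCleanUpCallback(void)
{
    printf("done: %d top-level constructs\n", parsed_construct_count);
}
```

This layer only counts constructs and prints file names; code generation or any other project-specific work would go in the parse callback.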

Step 4: Run Data Desk

To run Data Desk with your custom layer, you can use the following command template:

data_desk --custom /path/to/custom/layer /file/to/parse/1 /file/to/parse/2 ...

Data Desk (.ds) File Documentation

A valid Data Desk file is defined as a set of zero or more Declarations, Structs, Unions, Enums, Flags, Consts, Procedure Headers, or Comments. Each of the following sections defines these (and what they are composed of).

Identifiers
Keywords
Numeric Constants
String Constants
Character Constants
Unary Operators
Binary Operators
Expressions
Types
Declarations
Structs
Unions
Enums
Flags
Constant Expressions
Procedure Headers
Comments
Tags

Identifiers

An identifier is a non-keyword sequence of characters that begins with either an underscore or an alphabetic character, and otherwise contains only numeric characters, alphabetic characters, or underscores (similar to C).

Keywords

There are a number of reserved keywords in Data Desk files that cannot be used as identifiers. Here is a list of them:

struct: Used for defining structs.
union: Used for defining unions.
enum: Used for defining enums.
flags: Used for defining flags.
proc: Used for defining procedure headers.
_: Reserved for blank/unused identifiers.
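
The identifier and keyword rules above can be sketched as a small C check. This is an illustration written from the prose in this document, not Data Desk's actual tokenizer; the function names is_reserved_keyword and is_identifier are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if `text` is one of the reserved keywords listed above. */
static int is_reserved_keyword(const char *text)
{
    static const char *keywords[] = { "struct", "union", "enum", "flags", "proc", "_" };
    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); ++i) {
        if (strcmp(text, keywords[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if `text` starts with '_' or a letter, continues with only
   letters, digits, or '_', and is not a reserved keyword. */
static int is_identifier(const char *text)
{
    if (text[0] != '_' && !isalpha((unsigned char)text[0])) {
        return 0;
    }
    for (const char *c = text + 1; *c != '\0'; ++c) {
        if (*c != '_' && !isalnum((unsigned char)*c)) {
            return 0;
        }
    }
    return !is_reserved_keyword(text);
}
```

Under these rules, names such as foo_1 and _x are accepted, while 1abc, struct, and a lone _ are rejected.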

Numeric Constants

A numeric constant (Number) is a sequence of characters that begins with a numeric character, and contains only numeric characters, periods, or alphabetic characters.

NOTE: Data Desk does not guarantee that your numeric constants are valid in any programming language. For example, it will interpret 1.2.3.a.b.c as a numeric constant. Because Data Desk never evaluates numeric constants, it does not enforce their validity.
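
To make the permissive rule concrete, here is a small C sketch that measures how many characters a numeric constant would span under the definition above. It is written from the prose only and is not Data Desk's actual lexer; numeric_constant_length is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the number of characters at the start of `text` that form a
   numeric constant, or 0 if `text` does not begin with a digit. */
static size_t numeric_constant_length(const char *text)
{
    size_t length = 0;
    if (!isdigit((unsigned char)text[0])) {
        return 0;
    }
    /* Digits, letters, and periods are all accepted; nothing is evaluated. */
    while (isalnum((unsigned char)text[length]) || text[length] == '.') {
        ++length;
    }
    return length;
}
```

With this scan, all 11 characters of 1.2.3.a.b.c count as a single numeric constant, which matches the note above.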

String Constants

String constants (Strings) can be single-line or multi-line.

A single-line string constant is defined similarly to those in C. It begins with a double-quote character, and ends with a non-escaped double-quote character. Double-quote characters can be escaped with a backslash.

A multi-line string constant is defined as beginning with three double-quote characters ("""), and ending with three double-quote characters (""").
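
The two string forms can be sketched as a single C scanning function. This is an illustration based on the description above, with assumptions that the document does not confirm: a backslash is treated as escaping whatever character follows it, a single-line string that reaches a newline is treated as unterminated, and escapes are not processed inside multi-line strings. The name string_constant_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the string constant at the start of `text`,
   including its delimiters, or 0 if there is none or it is unterminated. */
static size_t string_constant_length(const char *text)
{
    /* Multi-line form: opens and closes with three double quotes. */
    if (strncmp(text, "\"\"\"", 3) == 0) {
        const char *end = strstr(text + 3, "\"\"\"");
        return end ? (size_t)(end - text) + 3 : 0;
    }
    if (text[0] != '"') {
        return 0;
    }
    /* Single-line form: ends at the first non-escaped double quote. */
    for (size_t i = 1; text[i] != '\0' && text[i] != '\n'; ++i) {
        if (text[i] == '\\' && text[i + 1] != '\0') {
            ++i; /* Skip the escaped character (assumption: any character). */
            continue;
        }
        if (text[i] == '"') {
            return i + 1;
        }
    }
    return 0;
}
```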

Character Constants

Character constants (Chars) are defined almost identically to single-line string constants, but with single-quote beginning and ending characters instead of double-quote characters.

Unary Operators

Data Desk defines a subset of the unary operators found in C:

-: Numeric Negation
!: Boolean Not
~: Bitwise Negate

Binary Operators

Data Desk defines a subset of the binary operators found in C. It does not define shorthand assignment operators, like += or >>=, because there is no execution in Data Desk. The following binary operators are defined (in order of ascending precedence):

+: Addition
-: Subtraction
*: Multiplication
/: Division
%: Modulus
<<: Left Bitshift
>>: Right Bitshift
&: Bitwise And
|: Bitwise Or
&&: Boolean And
||: Boolean Or

Expressions

An expression (Expr) in Data Desk is defined as:

Expr <- Identifier | Number | String | Char | Expr Binary Operator Expr

Types

Types are used in declarations. They are defined as being the following:

A group of 0 or more * characters, representing the number of layers of indirection.
A type name, which can be:
- Some Identifier referring to a type name.
- A Struct definition.
- A Union definition.
A group of 0 or more array size specifiers, being defined as: [ Expr ]

Declarations

Declarations are defined as follows:

Identifier : Type ;

Structs

Structs are groups of zero or more declarations. They are defined as:

Identifier :: struct { Zero or more Declarations, each followed by , or ; characters. }

Unions

Unions are groups of zero or more declarations. They look almost identical to Structs, but are meant to correspond to unions in C, which imply a different memory layout and intended usage. They are defined as:

Identifier :: union { Zero or more Declarations, each followed by , or ; characters. }
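
To show what the different memory layout means on the C side, the sketch below places a struct and a union with the same members side by side. The Data Desk input in the comment and the type names are hypothetical, and how a custom layer actually emits C for these constructs is up to that layer.

```c
#include <stddef.h>

/* Hypothetical Data Desk input:
 *   NumberStruct :: struct { as_int : int; as_float : float; }
 *   NumberUnion  :: union  { as_int : int; as_float : float; }
 */

/* In a struct, each member gets its own storage. */
typedef struct NumberStruct
{
    int as_int;
    float as_float;
} NumberStruct;

/* In a union, all members share the same storage, starting at offset 0. */
typedef union NumberUnion
{
    int as_int;
    float as_float;
} NumberUnion;
```

Here offsetof(NumberUnion, as_float) is 0, while offsetof(NumberStruct, as_float) is not, because the struct member is placed after as_int.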

Enums

Enums are groups of one or more identifiers. They are defined as:

Identifier :: enum { One or more Identifiers, each followed by , or ; characters. }

When transpiled to C, these will be defined as a normal C enum; that is, the first one will be defined as a constant that evaluates to 0, the next to 1, and so on.
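
As an illustration of the numbering described above, a Data Desk enum might correspond to C along these lines. The input in the comment is hypothetical, and the Fruit_ prefix on the generated names is an assumption; the actual naming depends on the code that performs the transpilation.

```c
/* Hypothetical Data Desk input:
 *   Fruit :: enum { Apple, Banana, Cherry, }
 */
typedef enum Fruit
{
    Fruit_Apple,  /* first enumerator: 0 */
    Fruit_Banana, /* next: 1 */
    Fruit_Cherry  /* next: 2 */
} Fruit;
```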

Flags

Flags are groups of one or more identifiers. They are defined as:

Identifier :: flags { One or more Identifiers, each followed by , or ; characters. }

When transpiled to C, these will be defined as several C preprocessor macros that evaluate to unique bits inside of an integral value. These are similar to Enums, but their purpose is to define unique bits instead of unique integral values for a set of constants.
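
A sketch of what "unique bits" could look like in C follows. The Data Desk input in the comment, the macro names, and the has_flag helper are all hypothetical; the exact macros produced depend on the code that performs the transpilation.

```c
/* Hypothetical Data Desk input:
 *   EntityFlags :: flags { Visible, Solid, Burning, }
 */
#define ENTITY_FLAG_VISIBLE (1u << 0)
#define ENTITY_FLAG_SOLID   (1u << 1)
#define ENTITY_FLAG_BURNING (1u << 2)

/* Returns nonzero when every bit in `flag` is set in `value`. */
static int has_flag(unsigned int value, unsigned int flag)
{
    return (value & flag) == flag;
}
```

Because each constant occupies its own bit, several flags can be combined in one integral value with | and tested independently, which plain enum values (0, 1, 2, ...) do not allow.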

Constant Expressions

Constant expressions (Consts) are defined as:

Identifier :: Expression

Procedure Headers

Procedure headers are defined as:

Identifier :: proc ( Declaration, Declaration, ..., Declaration ) [ -> Type ]

Comments

Comments are ignored by the parser. They can be single-line or multi-line.

Single-line comments can be defined with two / characters. They are terminated by a newline character.

Multi-line comments can be defined with a /* pattern. They are terminated by a */ pattern. They can also be nested. For example, if there exists the pattern /*/*, it will require */*/ to terminate.
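
Nested multi-line comments are usually handled with a depth counter, as in the C sketch below. It is written from the description above and is not Data Desk's actual implementation; nested_comment_length is a hypothetical name.

```c
#include <stddef.h>

/* Returns the length of the (possibly nested) multi-line comment at the
   start of `text`, or 0 if there is none or it is never terminated. */
static size_t nested_comment_length(const char *text)
{
    size_t i = 0;
    int depth = 0;
    if (text[0] != '/' || text[1] != '*') {
        return 0;
    }
    while (text[i] != '\0') {
        if (text[i] == '/' && text[i + 1] == '*') {
            ++depth; /* Each opener adds one level of nesting. */
            i += 2;
        } else if (text[i] == '*' && text[i + 1] == '/') {
            --depth; /* Each closer removes one level. */
            i += 2;
            if (depth == 0) {
                return i;
            }
        } else {
            ++i;
        }
    }
    return 0; /* Ran out of input before every level was closed. */
}
```

Given the input /*/* a */*/, the scan only returns after the second closer, since the first one merely brings the depth back to 1.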
