data_desk's Introduction

Data Desk v1.0.0

Description

Data Desk is a project utility that parses a simple C-like data description format. Input files in this data description format are parsed to create corresponding abstract syntax trees which represent the information extracted from the files. These abstract syntax trees are then sent to project-specific custom code that is written by the user. This custom code is simply a dynamic library with a few exported functions that are used as callbacks for the parser. Below is a list of the callbacks.

  • DataDeskCustomInitCallback(void) is called when the parser starts.
  • DataDeskCustomParseCallback(DataDeskNode *root, char *filename) is called for every top-level construct that is parsed.
  • DataDeskCustomCleanUpCallback(void) is called before the parser shuts down.

The abstract syntax tree is formed completely by DataDeskNode structures. This structure can be found in the data_desk.h file.

Data Desk also offers a number of utility functions for introspecting on abstract syntax trees it passes to your custom code. A list of these is in the data_desk.h file, which can be included into your custom layer.
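As a rough illustration of the three callbacks listed above, here is a minimal custom-layer sketch. It makes several assumptions: the callbacks return void, DataDeskNode is forward-declared as an opaque stand-in so the snippet compiles without data_desk.h, and the export decoration a real DLL or .so needs is left out. The constructs_seen counter is hypothetical. Check data_desk.h for the actual declarations.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in so this sketch compiles on its own. A real custom layer would
   include data_desk.h instead, which provides the actual DataDeskNode. */
typedef struct DataDeskNode DataDeskNode;

static int constructs_seen = 0; /* hypothetical bookkeeping for this sketch */

/* Called when the parser starts. */
void DataDeskCustomInitCallback(void)
{
    constructs_seen = 0;
}

/* Called for every top-level construct that is parsed. */
void DataDeskCustomParseCallback(DataDeskNode *root, char *filename)
{
    assert(filename != NULL);
    (void)root; /* the node is opaque here, so it is not inspected */
    constructs_seen += 1;
    printf("parsed a top-level construct from %s\n", filename);
}

/* Called before the parser shuts down. */
void DataDeskCustomCleanUpCallback(void)
{
    printf("saw %d top-level construct(s)\n", constructs_seen);
}
```

A real layer would walk the tree under root in the parse callback, typically to generate code or tables, rather than just counting constructs.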

Usage

To use Data Desk, you'll need to do a few things:

  1. Get Data Desk
  2. Make or get some Data Desk format files (.ds)
  3. Make a project-specific custom layer
  4. Run Data Desk

Step 1: Get Data Desk

  1. Run the command git clone https://github.com/ryanfleury/data_desk
  2. cd data_desk
  3. Run build on Windows or ./build.sh on Mac/Linux

NOTE: The build.bat script on Windows expects cl (MSVC) to be available in your environment. The easiest way to arrange this is to use one of the Visual Studio command prompts (titled x64 Native Tools Command Prompt for VS<version>, or x86 Native Tools Command Prompt for VS<version>). Otherwise, you can call vcvarsall.bat, which is packaged with Visual Studio, in your terminal environment.

Step 2: Make or get Data Desk format files (.ds)

Grab an example or make your own.

Step 3: Make a project-specific custom layer

  1. An easy way to write the code for this is to check out the custom layer template, located here. Fill out the functions in your custom layer code however you want to. There are some helper functions available in data_desk.h that might be useful for you here. This can be dropped into your code and used.

  2. To build a custom layer, you just need to build a DLL (or .so file) with the function callbacks you've written as the appropriate exported symbols. data_desk.h outlines what symbols are used for each callback.

Step 4: Run Data Desk

To run Data Desk with your custom layer, you can use the following command template:

data_desk --custom /path/to/custom/layer /file/to/parse/1 /file/to/parse/2 ...

Data Desk (.ds) File Documentation

A valid Data Desk file is defined as a set of zero or more Declarations, Structs, Unions, Enums, Flags, Consts, Procedure Headers, or Comments. Each of the following sections defines these (and what they are composed of).

  1. Identifiers
  2. Keywords
  3. Numeric Constants
  4. String Constants
  5. Character Constants
  6. Unary Operators
  7. Binary Operators
  8. Expressions
  9. Types
  10. Declarations
  11. Structs
  12. Unions
  13. Enums
  14. Flags
  15. Constant Expressions
  16. Procedure Headers
  17. Comments
  18. Tags

Identifiers

Identifiers are defined as a non-keyword sequence of characters that begin with either an underscore or an alphabetic character, and contain numeric characters, alphabetic characters, or underscores (similar to C).

Keywords

There are a number of reserved keywords in Data Desk files that cannot be used as identifiers. Here is a list of them:

  • struct: Used for defining structs.
  • union: Used for defining unions.
  • enum: Used for defining enums.
  • flags: Used for defining flags.
  • proc: Used for defining procedure headers.
  • _: Reserved for blank/unused identifiers.
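The identifier rule and the keyword list above can be combined into a small check. This is only a sketch of the rules as written in this document, not Data Desk's own tokenizer; is_keyword and is_identifier are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if s is one of the reserved keywords listed above. */
static bool is_keyword(const char *s)
{
    static const char *keywords[] = { "struct", "union", "enum", "flags", "proc", "_" };
    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); ++i) {
        if (strcmp(s, keywords[i]) == 0) return true;
    }
    return false;
}

/* Returns true if s starts with '_' or a letter, continues with letters,
   digits, or '_', and is not a reserved keyword. */
static bool is_identifier(const char *s)
{
    assert(s != NULL);
    if (s[0] != '_' && !isalpha((unsigned char)s[0])) return false;
    for (size_t i = 1; s[i] != '\0'; ++i) {
        if (s[i] != '_' && !isalnum((unsigned char)s[i])) return false;
    }
    return !is_keyword(s);
}
```

With these rules, _foo1 is accepted, while 1abc, struct, and a lone _ are rejected.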

Numeric Constants

Numeric constants (Numbers) are defined as a sequence of characters that begin with a numeric character, and contain only numeric characters, periods, or alphabetic characters.

NOTE: Data Desk does not guarantee the correctness as defined by programming languages of your numeric constants. For example, the following will be interpreted by Data Desk as a numeric constant: 1.2.3.a.b.c. Because Data Desk does not do any evaluation of numeric constants, it will not enforce validity of numeric constants.
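To make the permissiveness of this rule concrete, here is a sketch of a scanner that follows the definition above literally. scan_number is a hypothetical helper, not a function from data_desk.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the numeric constant at the start of s, or 0 if s
   does not begin with a numeric character. */
static size_t scan_number(const char *s)
{
    assert(s != NULL);
    if (!isdigit((unsigned char)s[0])) return 0;

    size_t n = 1;
    /* Digits, letters, and periods are all accepted; nothing is evaluated. */
    while (isdigit((unsigned char)s[n]) || isalpha((unsigned char)s[n]) || s[n] == '.') {
        ++n;
    }
    return n;
}
```

Under this sketch, all 11 characters of 1.2.3.a.b.c are consumed as one numeric constant, which matches the behavior described in the note.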

String Constants

String constants (Strings) can be single-line or multi-line.

A single-line string constant is defined similarly to those in C. It begins with a double-quote character, and ends with a non-escaped double-quote character. Double-quote characters can be escaped with a backslash.

A multi-line string constant is defined as beginning with three double-quote characters ("""), and ending with three double-quote characters (""").
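Both string forms can be recognized with a short scanner. The sketch below relies on assumptions the text above does not settle: it treats the first """ after the opening one as the terminator of a multi-line string (no escapes), and it does not reject newlines inside a single-line string. scan_string is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length of the string constant at the start of s, including its
   delimiters, or 0 if s does not start with a terminated string constant. */
static size_t scan_string(const char *s)
{
    assert(s != NULL);
    if (s[0] != '"') return 0;

    /* Multi-line form: opens and closes with three double quotes. */
    if (strncmp(s, "\"\"\"", 3) == 0) {
        const char *end = strstr(s + 3, "\"\"\"");
        return end ? (size_t)(end - s) + 3 : 0;
    }

    /* Single-line form: ends at the first non-escaped double quote. */
    for (size_t i = 1; s[i] != '\0'; ++i) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            ++i; /* skip the escaped character */
        } else if (s[i] == '"') {
            return i + 1;
        }
    }
    return 0;
}
```

Checking for the triple-quote opener first matters: otherwise """ would be read as an empty single-line string followed by a stray quote.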

Character Constants

Character constants (Chars) are defined almost identically to single-line string constants, but with single-quote beginning and ending characters instead of double-quote characters.

Unary Operators

Data Desk defines a subset of the unary operators found in C:

  • -: Numeric Negation
  • !: Boolean Not
  • ~: Bitwise Negate

Binary Operators

Data Desk defines a subset of the binary operators found in C. It does not define shorthand assignment operators, like += or >>=, because there is no execution in Data Desk. The following binary operators are defined (in order of ascending precedence):

  • +: Addition
  • -: Subtraction
  • *: Multiplication
  • /: Division
  • %: Modulus
  • <<: Left Bitshift
  • >>: Right Bitshift
  • &: Bitwise And
  • |: Bitwise Or
  • &&: Boolean And
  • ||: Boolean Or

Expressions

An expression (Expr) in Data Desk is defined as:

Expr <- Identifier | Number | String | Char | Expr Binary Operator Expr

Types

Types are used in declarations. They are defined as being the following:

  • A group of 0 or more * characters, representing the number of layers of indirection.
  • A type name, which can be:
  • A group of 0 or more array size specifiers, being defined as: [ Expr ]

Declarations

Declarations are defined as follows:

Identifier : Type ;

Structs

Structs are groups of zero or more declarations. They are defined as:

Identifier :: struct { Zero or more Declarations, each followed by , or ; characters. }

Unions

Unions are groups of zero or more declarations. They look almost identical to Structs, but are meant to correspond to unions in C, which imply a different memory layout and intended usage. They are defined as:

Identifier :: union { Zero or more Declarations, each followed by , or ; characters. }

Enums

Enums are groups of one or more identifiers. They are defined as:

Identifier :: enum { One or more Identifiers, each followed by , or ; characters. }

When transpiled to C, these will be defined as a normal C enum; that is, the first one will be defined as a constant that evaluates to 0, the next to 1, and so on.
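As an illustration, the C below is one plausible result for a hypothetical enum named Color. The input shown in the comment and the exact shape of the output are assumptions, since the generated text depends on the custom layer.

```c
#include <assert.h>

/* Hypothetical Data Desk input:
       Color :: enum { COLOR_RED, COLOR_GREEN, COLOR_BLUE, }
   One possible C output; the exact text depends on the custom layer. */
typedef enum Color
{
    COLOR_RED,   /* 0 */
    COLOR_GREEN, /* 1 */
    COLOR_BLUE   /* 2 */
} Color;

/* Compile-time check (C11) that the values count up from 0. */
static_assert(COLOR_RED == 0 && COLOR_BLUE == 2, "enum values count up from 0");
```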

Flags

Flags are groups of one or more identifiers. They are defined as:

Identifier :: flags { One or more Identifiers, each followed by , or ; characters. }

When transpiled to C, these will be defined as several C preprocessor macros that evaluate to unique bits inside of an integral value. These are similar to Enums, but their purpose is to define unique bits instead of unique integral values for a set of constants.
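A sketch of what such macros might look like for a hypothetical flags group named EntityFlags follows. The macro form (1u << n) and the has_flag helper are assumptions for illustration, not Data Desk's guaranteed output.

```c
#include <assert.h>

/* Hypothetical Data Desk input:
       EntityFlags :: flags { ENTITY_VISIBLE, ENTITY_SOLID, ENTITY_STATIC, }
   One possible C output; the exact macro text depends on the custom layer. */
#define ENTITY_VISIBLE (1u << 0)
#define ENTITY_SOLID   (1u << 1)
#define ENTITY_STATIC  (1u << 2)

/* Because every flag owns a distinct bit, flags combine with | and test with &. */
static int has_flag(unsigned int value, unsigned int flag)
{
    assert(flag != 0);
    return (value & flag) != 0;
}
```

Unlike enum values, several flags can be stored in one integer at once, for example ENTITY_VISIBLE | ENTITY_STATIC.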

Constant Expressions

Constant expressions (Consts) are defined as:

Identifier :: Expression

Procedure Headers

Procedure headers are defined as:

Identifier :: proc ( Declaration, Declaration, ..., Declaration ) [ -> Type ]

Comments

Comments are ignored by the parser. They can be single-line or multi-line.

Single-line comments can be defined with two / characters. They are terminated by a newline character.

Multi-line comments can be defined with a /* pattern. They are terminated by a */ pattern. They can also be nested. For example, if there exists the pattern /*/*, it will require */*/ to terminate.
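The nesting rule can be implemented with a depth counter. The following is a sketch of that idea rather than Data Desk's actual implementation; scan_comment is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the comment at the start of s, or 0 if s does not
   start with a comment or a multi-line comment is never terminated. */
static size_t scan_comment(const char *s)
{
    assert(s != NULL);
    if (s[0] != '/') return 0;

    if (s[1] == '/') {
        size_t n = 2;
        while (s[n] != '\0' && s[n] != '\n') ++n;
        return n; /* the terminating newline is not counted */
    }

    if (s[1] == '*') {
        size_t n = 2;
        int depth = 1; /* number of comment openers still waiting for a closer */
        while (s[n] != '\0') {
            if (s[n] == '/' && s[n + 1] == '*') {
                ++depth;
                n += 2;
            } else if (s[n] == '*' && s[n + 1] == '/') {
                --depth;
                n += 2;
                if (depth == 0) return n;
            } else {
                ++n;
            }
        }
    }
    return 0;
}
```

For the input /*/* a */*/x the sketch consumes the 11 characters up to, but not including, the x; with only one closing */ it reports an unterminated comment by returning 0.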

Tags

Structs, Unions, Declarations (including those within Structs, Unions, and Procedure Headers), Consts, Identifiers within Enums or Flags, and Procedure Headers can be preceded with one or more Tags. A Tag is defined as beginning with a @ character, followed by an Identifier, with an optional set of parentheses, with an optional set of comma-separated expressions. These are used to annotate meta-information about various things. They will be passed to custom-layer code.

data_desk's People

Contributors

ryanfleury, debiatan, jstimpfle, sasluca, theaifam5, said6289