ndsev / zserio

zero sugar, zero fat, zero serialization overhead

Home Page: https://zserio.org/

License: BSD 3-Clause "New" or "Revised" License

Java 36.70% CMake 1.45% FreeMarker 5.99% C++ 36.55% Shell 2.42% Python 14.01% ANTLR 0.13% ZenScript 2.74% HTML 0.01% C 0.01%
schema-language serialization-framework code-generation cpp java grpc sqlite wire-format serialization compactness

zserio's People

Contributors

0xdead, fklebert, johannes-wolf, mi-la, mikir, mistergc, nlohmann

zserio's Issues

Improve documentation comment parsing

Documentation comment parsing is buggy. Consider rewriting the documentation comment grammar.

For example, the following multiple line documentation comment

/** Comment
 **/

fails to compile with the error

unexpected token: null

Implement zserio runtime library for Python

Python runtime library

  • read/write bits (signed/unsigned)
  • alignTo, getBitPosition
  • read/write string
  • read/write variable integers
  • read/write floats
  • builtin operators
  • arrays
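The first two items of this list (reading/writing signed and unsigned bits, alignTo and getBitPosition) could look roughly like the sketch below. It is only an illustration of the idea, not the actual zserio runtime: the class and method names are hypothetical, and big-endian bit order is an assumption.

```python
class BitStreamWriter:
    """Hypothetical sketch: collects bits, most significant bit first."""

    def __init__(self):
        self._value = 0          # all bits written so far, as one big integer
        self._bit_position = 0   # number of bits written so far

    @property
    def bit_position(self):
        return self._bit_position

    def write_bits(self, value, num_bits):
        """Write an unsigned value using exactly num_bits bits."""
        if value < 0 or value >= (1 << num_bits):
            raise ValueError("value does not fit into %d bits" % num_bits)
        self._value = (self._value << num_bits) | value
        self._bit_position += num_bits

    def write_signed_bits(self, value, num_bits):
        """Write a signed value as two's complement using num_bits bits."""
        low = -(1 << (num_bits - 1))
        high = (1 << (num_bits - 1)) - 1
        if not low <= value <= high:
            raise ValueError("value does not fit into %d signed bits" % num_bits)
        self.write_bits(value & ((1 << num_bits) - 1), num_bits)

    def align_to(self, alignment):
        """Pad with zero bits up to the next multiple of alignment."""
        padding = -self._bit_position % alignment
        if padding:
            self.write_bits(0, padding)

    def to_bytes(self):
        """Return the written bits, zero-padded to a whole number of bytes."""
        padding = -self._bit_position % 8
        num_bytes = (self._bit_position + padding) // 8
        return (self._value << padding).to_bytes(num_bytes, "big")


class BitStreamReader:
    """Hypothetical sketch: reads bits in the order the writer produced them."""

    def __init__(self, data):
        self._value = int.from_bytes(data, "big")
        self._bit_size = len(data) * 8
        self._bit_position = 0

    @property
    def bit_position(self):
        return self._bit_position

    def read_bits(self, num_bits):
        """Read an unsigned value of num_bits bits."""
        end = self._bit_position + num_bits
        if end > self._bit_size:
            raise EOFError("reading behind the end of the stream")
        shift = self._bit_size - end
        result = (self._value >> shift) & ((1 << num_bits) - 1)
        self._bit_position = end
        return result

    def read_signed_bits(self, num_bits):
        """Read a two's complement signed value of num_bits bits."""
        raw = self.read_bits(num_bits)
        if raw >= 1 << (num_bits - 1):
            raw -= 1 << num_bits
        return raw

    def align_to(self, alignment):
        """Skip bits up to the next multiple of alignment."""
        self._bit_position += -self._bit_position % alignment
```

Keeping the whole stream in one Python integer is a simplification that keeps the sketch short; a real runtime would more likely work on a byte buffer.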

Add support for code documentation

I would like to document my zserio code so that the documentation also appears in the generated code. In the end I want to be able to generate documentation (of the generated code!) with Doxygen or Javadoc respectively.

Something like the following should be properly considered by the emitter:

/*
 * @brief Experience instance
 * @since 0.1.0
 */
struct Experience
{
    bit:6       yearsOfExperience;
    Language    programmingLanguage;
};

/*
 * @brief Programming language enum
 * @since 0.2.0
 */
enum bit:2 Language
{
    CPP     = 0,
    JAVA    = 1,
    PYTHON  = 2,
    JS      = 3
};

Consider command line argument to disable "optional clause warnings"

When an optional field depends on another optional field or on an optional parameter, zserio tries to check that the optional clause for both fields is the same. However, we can only compare the two expressions as strings, and when the strings are not equal we fire the warning, even if the expressions are semantically the same.
The warning is still very useful to inform a user that something could be wrong, so it's not a good idea to remove it. However, to be able to write warning-free zserio sources, we should be able to disable this warning at least from the command line.

Add possibility to build out-of-source

scripts/build.sh always builds into the fixed directories /build and deploys into /distr. That will not work on read-only source checkouts.

Please add an option to specify a build directory. Creating the subdirectories build / distr inside the specified directory would be fine, in my opinion.

Introduce C++ runtime library version

The Java runtime library has its version stored in the Manifest file. The C++ runtime library doesn't have any version. Such a version could be stored somewhere in a C++ header file, just to have a reference during bug reporting.

We are not able to check the correct version of the runtime library automatically.

Consider creating a new script update_version.sh which updates the version in all zserio packages (core, extensions, runtime libraries).

NullPointerException when reading an empty file

When running this command in the shell:
touch test.ds && java -jar ./zserio.jar test.ds

zserio throws an unhandled NullPointerException:

[ERROR] Internal error
java.lang.NullPointerException
	at zserio.tools.ZserioTool.checkPackageName(ZserioTool.java:336)
	at zserio.tools.ZserioTool.parsePackage(ZserioTool.java:303)
	at zserio.tools.ZserioTool.parse(ZserioTool.java:179)
	at zserio.tools.ZserioTool.process(ZserioTool.java:161)
	at zserio.tools.ZserioTool.execute(ZserioTool.java:155)
	at zserio.tools.ZserioTool.runTool(ZserioTool.java:66)
	at zserio.tools.ZserioTool.main(ZserioTool.java:47)

Implement Python generator

Implement the Python generator as a zserio extension:

  • add Python extension configuration
  • expression formatter
  • native types
  • configuration for python tests

Add emitters for:

  • constants
  • enumerations
  • subtypes
  • structures
  • choices
  • unions

Add python tests

  • language
  • arguments
  • complex

Additional compatibility data

In the current plain implementation zserio streams are only backward compatible if new content is added to the end of the stream. It is not possible to change structures in the middle of the stream without adding additional data.
This additional data, which we will call compatibility data for now, shall not be added to the stream directly. Instead it shall be a separate stream which older parsers may process. The footprint of the original stream shall not be touched, since we still want to maintain a wire frame free format.

C++ generator fails on recursive definitions without parameters

The parser and the Java generator handle the following schema fine, but the C++ emitter throws an error.

Schema:

package tutorial;


struct Employee
{
  uint8           age : age <= 65; // max age is 65
  string          name;
  uint16          salary;
  optional uint16 bonus;
  Title           title;
 
  // if employee is a team lead, list the team members
  Employee      teamMember[] if title == Title.TEAM_LEAD;
};
 
enum uint8 Title
{
  DEVELOPER = 0,
  TEAM_LEAD = 1,
  CTO       = 2
};

Error generated by C++ generator:

Emitting C++ code
[ERROR] Internal error
java.lang.StackOverflowError
        at java.util.ArrayList$Itr.<init>(Unknown Source)
        at java.util.ArrayList.iterator(Unknown Source)
        at zserio.ast.CompoundType.needsChildrenInitialization(CompoundType.java:184)

Consider implementing Closeable interface in BitStreamReader and BitStreamWriter

Currently we use a custom BitStreamCloseable interface to prevent warnings when users don't close a reader / writer properly. This is because our readers and writers do nothing in the close method.
The only exception is FileBitStreamWriter which flushes the buffer to a file in the close method.

We can either extend Closeable interface in BitStreamReader and BitStreamWriter and thus force users to close the readers / writers properly, or we can just implement Closeable interface in FileBitStreamWriter. Implementing Closeable only in a single writer might bring inconsistency in our stream reader / writer classes however.

Introduce command line argument to enable "unused warnings"

The current zserio compiler throws warnings for unused structures. This is basically a good thing to keep an eye on unused parts of the schema.
But of course there will be always one warning we have to ignore, being the one for a root element.

I think the SQLite extension does work with it when using sql_database or sql_table. Those do not trigger warnings.

If we could prefix a structure with a keyword such as root or similar, the zserio compiler would not need to throw the warning.

Fields in choice or union expressions are wrongly resolved

In choice and union types, not all fields are available at the same time. Therefore the following should not compile:

package bad_constraint_error;

enum uint8 Selector
{
    BLACK,
    GREY,
    RED
};

choice EnumParamChoice(Selector selector) on selector
{
    case Selector.BLACK:
        int8 black;

    case GREY:
        int16 grey : black > 0 && grey > 0; // ERROR because 'black' is not available!

    default:
        int64 other;
};

Add new language element for variable integers

Currently zserio supports variable integer encoding with fixed sizes of the resulting bytes.
So a varuint64 in zserio cannot really hold the complete range of 64 bit but has a payload of 57 bit. The other bits are used for the variable encoding.
Other serialization languages do feature complete range with variable encoding. This can result in byte sizes greater than 8 bytes for varuint64 for example.

We should also add this capability to zserio so that we can retrofit it onto other serializations as well.

Proposal is to have:

type          payload         comment
varuint       up to 64 bit
varint        up to 64 bit
var(u)int64   up to 57 bit    keep for backward compatibility
var(u)int32   up to 29 bit    keep for backward compatibility
var(u)int16   up to 15 bit    keep for backward compatibility

Keeping the current variable integer encodings may also be useful for people who rather want to stick with the sizes rather than total variable encoding.
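The payload sizes in the proposal can be reproduced with a simple scheme, sketched below for the unsigned case only. The layout is an assumption made for illustration, not a statement about the exact zserio wire format: every byte spends one bit on a "more bytes follow" flag and carries 7 payload bits, except the last permitted byte, which carries 8 payload bits. With at most 8 bytes this gives 7 * 7 + 8 = 57 bits, with 4 bytes 29 bits, with 2 bytes 15 bits, and a ninth byte is needed to reach the full 64 bits. The function names are hypothetical.

```python
def payload_bits(num_bytes, max_bytes):
    """Number of payload bits available when num_bytes bytes are used."""
    if num_bytes < max_bytes:
        return 7 * num_bytes
    # The last permitted byte needs no continuation flag: 8 payload bits.
    return 7 * (max_bytes - 1) + 8


def encode_varuint(value, max_bytes):
    """Encode a non-negative integer, most significant bits first."""
    if value < 0:
        raise ValueError("value must not be negative")
    for num_bytes in range(1, max_bytes + 1):
        if value < (1 << payload_bits(num_bytes, max_bytes)):
            break
    else:
        raise ValueError("value does not fit into %d bytes" % max_bytes)

    out = bytearray()
    remaining = payload_bits(num_bytes, max_bytes)
    for index in range(num_bytes):
        is_last = index == num_bytes - 1
        if is_last and num_bytes == max_bytes:
            remaining -= 8
            out.append((value >> remaining) & 0xFF)
        else:
            remaining -= 7
            chunk = (value >> remaining) & 0x7F
            out.append(chunk if is_last else chunk | 0x80)
    return bytes(out)


def decode_varuint(data, max_bytes):
    """Decode a value; returns (value, number of bytes consumed)."""
    value = 0
    for index in range(max_bytes):
        byte = data[index]
        if index == max_bytes - 1:
            return (value << 8) | byte, index + 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("max_bytes must be at least 1")
```

Under these assumptions, encode_varuint(value, 8) rejects anything at or above 2^57, while encode_varuint(value, 9) accepts the complete unsigned 64-bit range, which is the trade-off the issue describes.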

Optimize AnyHolder inplace creation

static const bool fitsInPlace = sizeof(Holder<T>) <= sizeof(UntypedHolder::MaxInPlaceType);
if (fitsInPlace)
{
    holder = new (&m_untypedHolder.inPlace) Holder<T>();
    m_isInPlace = true;
}
else
{
    holder = new Holder<T>();
    m_untypedHolder.heap = holder;
}

The if in the code above could be replaced with a template.

Improve C++ native type mapping

The definition of native array types in the C++ generator unnecessarily requires their element types to be specified. The element type can be deduced automatically from the Zserio type definitions.

Clarify built-in operator numbits

Built-in operator numbits is not clearly defined in the documentation:

The numbits(value) operator is defined for unsigned integers as minimum number of bits required to encode value-1. The returned number is of type uint8. The numbits operator returns 1 if applied to value 0 or 1.

Such a definition forces users to ask additional questions, such as why there is an exception for value 0, or why there is no operator which returns the number of bits needed to encode value.

Possible solutions:

  1. Change the numbits(value) operator to return the number of bits required to encode value. This would be very similar to Python's int.bit_length(), which is documented as follows:

    Return the number of bits necessary to represent an integer in binary, excluding the sign and leading zeros.

  2. Change the description of numbits in the documentation to: numbits(num) is the minimum number of bits required to encode num different values. This change would remove the exception numbits(0) = 1, so numbits(0) would be 0.

  3. Do both 1. and 2. together. The newly introduced operator could be named bitlength.
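To make the difference concrete, the following sketch compares the definition quoted from the documentation with Python's int.bit_length(). The function numbits_documented is a hypothetical helper that only restates the quoted wording (bits required to encode value-1, with 1 returned for 0 and 1). It is not taken from the zserio sources.

```python
def numbits_documented(value):
    """numbits as described in the quoted documentation text."""
    if value < 0:
        raise ValueError("numbits is defined for unsigned integers only")
    if value <= 1:
        return 1  # the documented exception for 0 and 1
    return (value - 1).bit_length()


if __name__ == "__main__":
    print("value  numbits (documented)  bit_length")
    for value in (0, 1, 2, 3, 4, 5, 255, 256):
        print("%5d  %20d  %10d"
              % (value, numbits_documented(value), value.bit_length()))
```

For 256 the two disagree: 8 bits are enough to distinguish 256 different values (0 to 255), while 9 bits are needed to represent the number 256 itself. This is the ambiguity the issue asks to resolve.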

Java generated code for uint64 offsets doesn't compile

struct Test
{
    uint64 offsets[];
offsets[@index]:
    uint32 data[];
};

Generates the following offsets setter (similar for offsets checker):

    private final class __OffsetSetter_data implements zserio.runtime.array.OffsetSetter
    {
        @Override
        public void setOffset(int __index, long __byteOffset)
        {
            final java.math.BigInteger __value = (java.math.BigInteger)__byteOffset;
            getOffsets().setElementAt(__value, __index);
        }
    }

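A cast from long __byteOffset to BigInteger is not possible! The value has to be converted instead, for example with java.math.BigInteger.valueOf(__byteOffset).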

Make -withWriterCode the default in zserio.jar

Currently, zserio.jar only generates reading classes by default. We should make -withWriterCode the default, so that people interested in read-only code need to specify a command line switch, not the other way around.

initializeOffsets method does not resize offset arrays

The initializeOffsets method assumes that the offset array has already been set to the correct size by the application.

If the application fails to resize the offset arrays, an out-of-bounds exception is thrown. This can be improved: initializeOffsets can resize the offset arrays automatically to make life easier for the application.
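The proposed behaviour could be sketched as follows. This is a simplified, hypothetical model rather than the generated code: offsets is a plain list, element_byte_sizes gives the serialized size of each indexed element in bytes, and the function resizes offsets to the number of elements before filling in the byte offsets.

```python
def initialize_offsets(offsets, element_byte_sizes, start_byte_offset):
    """Resize offsets to match the data, then store each element's offset.

    Returns the byte offset just behind the last element.
    """
    needed = len(element_byte_sizes)
    # Resize in place so that the caller's list object is updated.
    del offsets[needed:]
    offsets.extend([0] * (needed - len(offsets)))

    position = start_byte_offset
    for index, size in enumerate(element_byte_sizes):
        offsets[index] = position
        position += size
    return position
```

With this approach an application can pass an empty or wrongly sized offset list and still get consistent offsets, instead of an out-of-bounds error.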

Choice / Union fires "unchecked" warning when contains an array

struct Data8
{
    int8 first;
    int8 second;
};

struct Data16
{
    int16 first;
    int16 second;
};

choice Test(int32 numBits) on numBits
{
case 8:
    Data8 array8[];
case 16:
    Data16 array16[];
};

Getters cast Object to ObjectArray which fires an "unchecked cast" warning in Java.

optional support for initializer_list

It would be great to add constructors that allow the use of a C++11 initializer_list, at least for simple structures without optional fields and for arrays.

Example:

struct Wgs84
{
  float32 longitude;
  float32 latitude;
};

generates a simple C++ class, that needs to be filled in your code like this:

Wgs84 coordinate;
coordinate.setLongitude(11.);
coordinate.setLatitude(50.);

It would be great if one could use:

Wgs84 coordinate(11., 50.);

or

Wgs84 coordinate({11., 50.});

instead.
The same is valid for arrays that contain simple types, like an array of integers or floats.

The easiest way to allow std::initializer_list for sequences is probably to group the members into a structure and add an additional constructor that accepts a const reference to this (internal) struct. The rest will be done by the compiler automatically. Here is what the generated class could look like (shortened):

class Wgs84
{
public:
    // Plain aggregate holding all fields, so it can be brace-initialized.
    struct members
    {
        float longitude;
        float latitude;
    };

    Wgs84(const members& m) : m_members(m) {}

    // ... generated methods as before ...

private:
    members m_members;
};

CMAKE_GENERATOR other than make doesn't work

If you set CMAKE_GENERATOR to something different than make, you would also need to overwrite MAKE (which is actually the binary that is called for compilation). If you don't do this, the build will fail.

I would expect at least a warning in this case. Even better would be to let cmake handle this: calling cmake with the option --build <dir> finds the right build tool and calls it.

To reproduce this issue on Linux, install Ninja and run:

CMAKE_GENERATOR=Ninja ./scripts/build.sh all-linux64

Invisible array constraints

It would be good to have constraints on invisible arrays, simply as follows:

uint8 value[1..10]          the array must have at least one and at most ten entries
uint8 value[2..]            the array must have at least two entries
uint8 value[..23]           the array must have at most 23 entries (equivalent to uint8 value[0..23])
uint8 value[5]              the array must have exactly 5 entries
Possibly also:
uint8 value[2 .. 200 % 2]   the array must have at least 2 and at most 200 entries, and they must come in pairs

Example:

invisible zserio

struct Company
{
    string employees[3..5];
};

classic zserio

struct Company
{
    varuint64 numEntries : numEntries >= 3 && numEntries <= 5;
    string    employees[numEntries];
};

Adapting the type of numEntries to the constraint is, in my opinion, not necessary, so there is no need for bit:3 numEntries in this example.
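The length check implied by the proposed syntax could be sketched as below. The helper and its parameter names are hypothetical. They only model the semantics listed above: a minimum, an optional maximum, and the "entries come in groups" variant written as % 2 in the proposal.

```python
def check_array_length(values, min_length=0, max_length=None, multiple_of=1):
    """Raise ValueError if the array length violates the constraint."""
    length = len(values)
    if length < min_length:
        raise ValueError("array has %d entries, at least %d required"
                         % (length, min_length))
    if max_length is not None and length > max_length:
        raise ValueError("array has %d entries, at most %d allowed"
                         % (length, max_length))
    if length % multiple_of != 0:
        raise ValueError("array length %d is not a multiple of %d"
                         % (length, multiple_of))
    return values


if __name__ == "__main__":
    # Corresponds to: string employees[3..5];
    check_array_length(["Ann", "Bob", "Eve"], min_length=3, max_length=5)
```

A generator could emit such a check both when reading and before writing, in the same place where the classic numEntries constraint is checked today.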

Enums with a base type larger than 32 bits don't work in C++

enum bit:63 Enum63
{
    ENUM63_VALUE1 = 0
};

MSVC doesn't allow base type other than int for enums (prior to C++11). Gcc probably chooses the bigger base type based on the highest value in the enum. We should investigate what C99 standard says about enums.

Expression formatter doesn't solve casting at all

Neither C++ nor Java solves casting in expression formatter.

struct BitStructureParameter( bit:1 a, bit:15 b, bit:29 c )
{
    bit< a >    value1;
    bit< b >    value2;
    bit< c >    value3;
};

Writer and reader parts pass parameter values (e.g. getB()) without any casting to the bit stream reader write / read methods.

Generated code:

void BitStructureParameter::write(zserio::BitStreamWriter& _out, zserio::PreWriteAction _preWriteAction)
{
    if ((_preWriteAction & zserio::PRE_WRITE_CHECK_RANGES) != 0)
        checkRanges();

    _out.writeBits(m_value1, getA());
    _out.writeBits64(m_value2, getB()); // possible loss of data due to conversion
    _out.writeBits64(m_value3, getC());
}

Subtypes to parameterized types are not handled correctly in C++

  1. A subtype of a parameterized type doesn't fire an error when used as a non-parameterized type!
subtype Data D;
struct Data(int32 param)
{
    int32 data : data < param;
};
struct Test
{
    D data; // doesn't fire an error!
};
  2. Subtype (typedef) is not used in the generated code
subtype Data D;
struct Data(int32 param)
{
    int32 data : data < param;
};
struct Test
{
    D(10) data; // doesn't fire an error!
};

D is not used in the Test class!

GRPC Support

  • #23 C++ Emitter
  • #25 Java Emitter
  • #31 Doc Emitter
  • Documentation, User Guide, etc.
  • #38 Streaming RPC methods

Support MSVC compiler

MSVC is not officially supported yet. We need MSVC support to test gRPC on Windows, because gRPC supports only MSVC (officially).
Also MSVC fires different warnings and some of them could be relevant.

Disable if clauses which use the same field

The following compiles:

struct SomethingIsWrong
{
    varuint64 value if value > 0;
    varuint64 after;
};

If clauses are correctly checked so that the field after cannot be used here. However, there is no check that the field the clause belongs to (value in this example) is not used either. Unlike in constraints, a field's own value is not available in its if clause.

Don't resolve element type for arrays in C++

Currently, the arrays in the C++ runtime library (except for the object array) have a fixed element type, for example Float16Array. Such an array is used even if the schema defines a subtype of float16, as in the following example:

subtype float16 ElementType;

struct Something
{
    ElementType array[];
};

So the generated C++ code uses the resolved element type for all arrays. This is a pity, because C++ supports typedefs.

As a consequence of this, ArrayType returns name of a resolved element type, for example float16[].

BoolArray doesn't work with bool elements and uses uint8_t instead

To prevent the use of std::vector<bool>, our BoolArray is based on uint8_t. This might be good for performance, but a user would expect to get a bool when accessing an element, e.g. boolArray.elementAt(0).
If we want to use an underlying type other than bool, we should still provide bool on the container's interface.

Currently we disabled MSVC warning C4800: forcing value to bool 'true' or 'false' (performance warning), because the generated code fires the warning when a BoolArray is used as a parameter in a parameterized type.

Imports are resolved wrongly

The following two packages are compiled without any problem even if the constant ConstraintsConstant is not visible in the package constraint_table:

package constraint_table;

import constraint_constant.SomeStructure;

sql_table ConstraintsTable
{
    int32  withoutSql;
    uint16 sqlCheckConstant sql "CHECK(sqlCheckConstant < @ConstraintsConstant)";
};

package constraint_constant;

const uint16 ConstraintsConstant = 123;

struct SomeStructure
{
    uint32  someValue;
};

Add new language: Python

Currently zserio only supports generating code for JAVA and C++.
We should add support for Python for better coverage of used languages.

  • #40 Python runtime library
  • #52 Python generator
  • #88 Python SQLite Support
  • #93 Python GRPC Support

EndOfFile-token is called 'null' by parser

If a token is missing at the end of a parsed zserio-file, 'null' is reported to be found.
Could we rename this token to EndOfFile? That would make it easier to understand what is missing.

Parsing test.ds
[ERROR] test.ds:2:1: expecting SEMICOLON, found 'null'

Improve parsing error message if 'struct' is missing

If keyword 'struct' is missing the parser prints the following error:

expecting EOF, found 'Experience'

Consider improving this error message, for example by printing something like

expecting zserio keyword (struct, enum, etc...), found 'Experience'
