parsi's People

Contributors

cthulhu-irl

parsi's Issues

make charset constexpr friendly

Description

What to do?

Add a constexpr-friendly equivalent of std::bitset for the Charset class.

Why to do?

std::bitset is not fully constexpr (its modifying members such as set() only became constexpr in C++23), and this keeps many combinations from producing a parser at compile time.

Making sure that a parser can be built at compile time is crucial: it helps avoid unnecessary, redundant captures and makes it easy to write custom parsers that are composed of small core parsers.

Goals

  • keep the library fast and compile-time friendly.

Definition of Done

  • Charset constructor and methods are constexpr.
  • constexpr Charset instance can be created.
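A minimal sketch of one possible approach, assuming the charset only needs to cover the 256 possible values of a char: the bits are kept in a std::array of four 64-bit words, whose element access is usable in constant expressions from C++17 on, instead of in a std::bitset. The set and contains member names are illustrative, not parsi's actual API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical constexpr-friendly charset: 256 bits stored in four 64-bit words.
class Charset {
public:
    constexpr Charset() noexcept = default;

    constexpr explicit Charset(std::string_view chars) noexcept {
        for (char ch : chars) {
            set(ch);
        }
    }

    constexpr void set(char ch) noexcept {
        const std::size_t index = static_cast<unsigned char>(ch);
        words_[index / 64] |= std::uint64_t{1} << (index % 64);
    }

    [[nodiscard]] constexpr bool contains(char ch) const noexcept {
        const std::size_t index = static_cast<unsigned char>(ch);
        return (words_[index / 64] >> (index % 64)) & 1u;
    }

private:
    std::array<std::uint64_t, 4> words_{};  // all bits start cleared
};

// Both construction and lookup can now happen at compile time.
constexpr Charset digits("0123456789");
static_assert(digits.contains('7'));
static_assert(!digits.contains('a'));
```

Casting through unsigned char keeps characters above 0x7f in range on platforms where char is signed.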

organize the units into different files and test cases

The project source code is currently just an unorganized prototype, which made it possible to iterate quickly, try out different designs, and shape its core/basic API and expectations.

Now that its design has been somewhat established and its core is growing fast, it's better to move toward a codebase that others can collaborate on. That requires a proper directory structure and code organization.

Goals:

  • Improve maintainability and readability.
  • Get out of prototype form and become open to contributions.

Definition of Done:

  • Each parser/combinator must be in its own header/source file.
  • Basic parsers/combinators tests should be grouped into their own separate test case.

add micro benchmark for core parsers

Description

For a parser library, performance is essential. Currently there is a micro-benchmark template sample, but there are no actual benchmarks of the core utilities to track performance improvements or regressions in vital parts.

There should be micro-benchmarks that measure the efficiency of the core parsers, especially the expect family.

Goals

  • automate common development processes for faster development.
  • provide measurements to guide the development team.

Definition of Done

  • the benchmarks must cover all the core parsers under the parsi::fn namespace.
  • the benchmarks must have data patterns for both best-case and worst-case scenarios.
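A small sketch of what best-case and worst-case data patterns could look like, using a hypothetical expect_prefix function as a stand-in for a core expect parser and a hand-rolled std::chrono timing loop. A real suite would more likely use a dedicated framework such as Google Benchmark, which handles warm-up, repetitions, and optimizer barriers more robustly.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a core expect parser (not parsi's real API):
// succeeds when `input` starts with `expected`.
bool expect_prefix(std::string_view input, std::string_view expected) {
    return input.substr(0, expected.size()) == expected;
}

// Calls `fn` `iterations` times and returns the average nanoseconds per call.
// Storing each result in a volatile sink discourages the compiler from
// discarding the calls, but it is not a full optimization barrier.
template <typename Fn>
double average_ns(Fn&& fn, std::size_t iterations) {
    volatile bool sink = false;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = fn();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    (void)sink;
    return std::chrono::duration<double, std::nano>(elapsed).count() /
           static_cast<double>(iterations);
}

// Data patterns for a 1 KiB expected string.
const std::string expected_text(1024, 'a');
// Best case: mismatch on the first character, so the parser can fail fast.
const std::string best_case = 'b' + std::string(1023, 'a');
// Worst case: mismatch only on the last character, so every byte is compared.
const std::string worst_case = std::string(1023, 'a') + 'b';
```

Reporting the best-case and worst-case timings side by side makes a regression in either path visible.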

add charset expect

Description

Currently, the expect core parser can check whether a given fixed string is at the cursor of the stream being parsed, but it can't handle a single character, nor a charset that expects any one of the characters in a set.

This is required to make simple parsers that parse strings, variable names, numbers, etc.

Goals

  • providing an alternative to regex.
  • providing a minimal but sufficiently complete parser library for common use.

Definition of Done

  • there is an expect variant that can parse a given expected single character.
  • there is an expect variant that can parse a single character from a given charset.
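A minimal sketch of both variants, assuming a simplified model where a parser is a callable taking the remaining input as a std::string_view and returning a small Result. The Result struct and the expect_char/expect_charset names are hypothetical; parsi's real Stream and Result types may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical simplified result type.
struct Result {
    bool valid;
    std::size_t consumed;  // number of characters matched
};

// Expects exactly one given character at the cursor.
inline auto expect_char(char expected) {
    return [expected](std::string_view stream) -> Result {
        if (!stream.empty() && stream.front() == expected) {
            return {true, 1};
        }
        return {false, 0};
    };
}

// Expects any single character that belongs to `chars`.
inline auto expect_charset(std::string_view chars) {
    std::bitset<256> set;
    for (char ch : chars) {
        set.set(static_cast<unsigned char>(ch));
    }
    return [set](std::string_view stream) -> Result {
        if (!stream.empty() && set.test(static_cast<unsigned char>(stream.front()))) {
            return {true, 1};
        }
        return {false, 0};
    };
}
```

Building the lookup table once, outside the returned lambda, keeps each match a single bit test.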

add roadmap to readme

Description

There is no clear roadmap or planned future for this library.

Adding a comprehensive overview of what's planned ahead to the README document would be nice.

add android ci

Description

What to do?

Add Android CI.

Why to do?

To officially support Android.

Goals

  • support common platforms.

Definition of Done

  • there should be a CI job that builds the library and tests for Android targets using presets.
  • the README document should show a badge for Android support.

add documentation generator to build

Description

There is no documentation automation in the project, and it is not clear how the documentation should be generated.

There should be documentation generation automation that can be used both by developers and, later, by CI/CD to automatically update and upload the docs.

CMake must be used, and a combination of Doxygen, Sphinx, and Breathe, similar to fmtlib, is suggested.

Goals

  • establish base common project requirements for public usage and collaboration.
  • automate documentation generation for CI-based updates.

Definition of Done

  • a CMake target named docs must be available and able to generate the API docs.

Comparing vs famous parsers

Description

Add new benchmark functions (under the same conditions) to the benchmark directory to compare against well-known parser libraries.

Goals

compare against:

  • rapidjson
  • simdjson
  • nlohmann
  • TODO: add more ...

Definition of Done

[ ] result of comparing with rapidjson
[ ] result of comparing with simdjson
[ ] result of comparing with nlohmann
...

add anyof combinator

Is your feature request related to a problem? Please describe.

There is no way to fall through to another path when a subparser fails. This makes it impossible to parse things like a json item, where a value can be null, a number, a string, an array, or an object, each of which has a different starting sequence and a different set of rules.

Describe the solution you'd like

A combinator that can retract the parser cursor and fall back to the next possible match could be quite useful. For example, a string literal could be parsed with an anyof combinator:

auto expect_string() {
    const auto escaped_charset = parsi::Charset("\"'nra");
    return parsi::sequence(
        parsi::expect('"'),
        parsi::repeat(
            parsi::anyof(
                parsi::sequence(parsi::expect('\\'), parsi::expect(escaped_charset)),
                parsi::expect(normal_charset)
            )
        ),
        parsi::expect('"')
    );
}

or a json value as:

const auto item_parser = parsi::anyof(
  expect_null(visitor),
  expect_decimal(visitor),
  expect_integer(visitor),
  expect_string(visitor),
  [=](parsi::Stream stream) -> parsi::Result {  // recursive
    return expect_array(visitor)(stream);
  },
  [=](parsi::Stream stream) -> parsi::Result {  // indirect recursive
    return expect_object(visitor)(stream);
  }
);

Describe alternatives you've considered

None; there must be a retracting combinator.
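A minimal sketch of the retracting behaviour described above, under a simplified model where each parser is a callable that takes the remaining input as a std::string_view by value. Because every alternative receives the same original view, a failed attempt never moves the cursor, so retraction comes for free. The Result struct and the expect_literal helper are hypothetical, not parsi's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical simplified result type.
struct Result {
    bool valid;
    std::size_t consumed;
};

// Tries each parser on the same original stream and returns the first
// successful result, or the failed result of the last parser.
template <typename... Parsers>
auto anyof(Parsers... parsers) {
    static_assert(sizeof...(Parsers) >= 2, "anyof needs at least two parsers");
    return [=](std::string_view stream) -> Result {
        Result result{false, 0};
        // Fold over the parsers; || stops at the first success.
        (void)((result = parsers(stream), result.valid) || ...);
        return result;
    };
}

// Hypothetical helper: expects a fixed literal at the cursor.
inline auto expect_literal(std::string_view literal) {
    return [literal](std::string_view stream) -> Result {
        if (stream.substr(0, literal.size()) == literal) {
            return {true, literal.size()};
        }
        return {false, 0};
    };
}
```

For example, anyof(expect_literal("null"), expect_literal("true"), expect_literal("false")) matches any of the three json keywords. Alternatives are tried in order, so when one alternative is a prefix of another, the longer one should be listed first.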

add combine method to charset

Currently, combining two charsets requires too much manual work, even though it is a very useful and common operation. For example, given numeric, lowercase, and uppercase alphabet charsets, making an alphanumeric charset should simply be a matter of combining the three charsets we already have.

auto numeric = Charset("0123456789");
auto lowercase = Charset("abcdefghijklmnopqrstuvwxyz");
auto uppercase = Charset("ABCDEFGHIJKLMNOPQRSTUVWXYZ");

auto alphanum = numeric + lowercase + uppercase;  // or numeric.combined(lowercase).combined(uppercase)

assert(alphanum == Charset("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"));
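A minimal sketch of the proposed combine operation, assuming a Charset backed by std::bitset<256>: combining is a bitwise OR (set union), exposed both as combined() and as operator+ to match the example above. The internals are hypothetical; parsi's real Charset may store its bits differently.

```cpp
#include <bitset>
#include <cassert>
#include <string_view>

// Hypothetical Charset backed by std::bitset<256>.
class Charset {
public:
    explicit Charset(std::string_view chars) {
        for (char ch : chars) {
            bits_.set(static_cast<unsigned char>(ch));
        }
    }

    // Union: a character is in the result if it is in either charset.
    [[nodiscard]] Charset combined(const Charset& other) const {
        Charset result = *this;
        result.bits_ |= other.bits_;
        return result;
    }

    [[nodiscard]] bool contains(char ch) const {
        return bits_.test(static_cast<unsigned char>(ch));
    }

    friend Charset operator+(const Charset& lhs, const Charset& rhs) {
        return lhs.combined(rhs);
    }

    friend bool operator==(const Charset& lhs, const Charset& rhs) {
        return lhs.bits_ == rhs.bits_;
    }

private:
    std::bitset<256> bits_;
};
```

Since union is commutative and associative, numeric + lowercase + uppercase yields the same set regardless of the order of the operands.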

add match negation for charset expect

Is your feature request related to a problem? Please describe.
When trying to match anything other than the characters of a charset (or other than a single character), I have to write out all the other characters, which is far too hard. For this reason, an additional parameter to negate the match of the given charset would help.

Describe the solution you'd like
Being able to pass an additional parameter to the single-character and charset parsi::expect overloads to indicate whether to negate the match.

For example, to match anything other than a newline: parsi::expect('\n', parsi::expect_negated_tag{true})

Or, to match anything other than digits: parsi::expect(parsi::Charset("0123456789"), parsi::expect_negated_tag{true})

Describe alternatives you've considered

The frustrating path of manually writing every character except the unwanted ones into a parsi::Charset, which is infeasible.
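A minimal sketch of the proposed negation flag, assuming the simplified string_view-based parser model. The expect_negated_tag struct mirrors the proposal but is defined here for illustration only, as are the expect_char and expect_charset functions. The match condition is "in the set XOR negated", and a negated parser still fails at the end of the input so that it never matches a missing character.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical simplified result type.
struct Result {
    bool valid;
    std::size_t consumed;
};

// Hypothetical tag mirroring the proposal; not an existing parsi type.
struct expect_negated_tag {
    bool negated = false;
};

inline auto expect_charset(std::string_view chars, expect_negated_tag tag = {}) {
    std::bitset<256> set;
    for (char ch : chars) {
        set.set(static_cast<unsigned char>(ch));
    }
    return [set, negated = tag.negated](std::string_view stream) -> Result {
        if (stream.empty()) {
            return {false, 0};  // end of input never matches, negated or not
        }
        const bool in_set = set.test(static_cast<unsigned char>(stream.front()));
        // A negated parser accepts exactly the characters outside the set.
        if (in_set != negated) {
            return {true, 1};
        }
        return {false, 0};
    };
}

// Single-character overload implemented as a one-element charset.
inline auto expect_char(char ch, expect_negated_tag tag = {}) {
    return expect_charset(std::string_view(&ch, 1), tag);
}
```

Routing the single-character case through the charset overload keeps the negation logic in one place.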

add anyof combinator

Is your feature request related to a problem? Please describe.

Currently the parsi library provides utilities for linear parsing along with an optional parser that backtracks.

But there are cases where that's not enough, and you might want to try several different options, backtracking between them, and continue with the one that matches.

For example, when parsing a json array item, it can be null, an integer number, a decimal number, a string, an object, or another array.

Describe the solution you'd like

Examining the json item example, the parser can choose which path to take depending on the leading character:

  • if it starts with n, it's definitely null
  • if it starts with +, -, or a digit, it's either an integer or a decimal
  • if it starts with ", it's definitely a string
  • if it starts with [, it's an array
  • if it starts with {, it's an object
  • otherwise, it must raise an error

This could seemingly be modeled as a set of predecessor/successor pairs, which works fine for the most part, but for numbers it can't easily distinguish integers from decimals.

This becomes problematic with the extract utility, which here is better suited to operate on the whole number rather than on two separate portions of it.

There are two suggestions here:

  1. Define an anyof combinator that works similarly to optional but takes two or more parsers; upon parsing, it tries these parsers one by one, and at least one of them must succeed. It returns the result of the first one that succeeds, or the failed result of the last parser.
  2. Define a select combinator that takes two or more pairs of parsers (predecessor and successor); upon parsing, it iterates over the predecessors, and when a predecessor succeeds, it runs the corresponding successor on the original stream and returns its result.

For anyof, it would look like this:

auto expect_string() {
    const auto escaped_charset = parsi::Charset("\"'nra");
    return parsi::sequence(
        parsi::expect('"'),
        parsi::repeat(
            parsi::anyof(
                parsi::sequence(parsi::expect('\\'), parsi::expect(escaped_charset)),
                parsi::expect(normal_charset)
            )
        ),
        parsi::expect('"')
    );
}

Or, for the json example:

const auto item_parser = parsi::anyof(
  expect_null(visitor),
  expect_decimal(visitor),
  expect_integer(visitor),
  expect_string(visitor),
  [=](auto&&, parsi::Stream stream) -> parsi::Result {  // recursive
    return expect_array(visitor)(stream);
  },
  [=](auto&&, parsi::Stream stream) -> parsi::Result {  // indirect recursive
    return expect_object(visitor)(stream);
  }
);

For select, it would look like:

...

Describe alternatives you've considered
I couldn't come up with any.

use remove_cvref on template types for members

When forwarding deduced types to class templates, the types may also carry const/volatile qualifiers and references, so the member variables end up with an unintended type, which becomes problematic for copy/move construction and assignment.

Use std::remove_cvref_t to define those member variables so they don't contain const/volatile qualifiers or references.

make a json parser example

Description

JSON is one of the most common data formats used for inter-process communication, e.g. by RESTful services.

As a library for easily making parsers, parsi should be able to build a parser for a simple and common data format like json; it would also be a great demonstration of the library's usage and a good benchmark test.

Goals

  • demonstrate parsi's usage example.
  • have a real-world data format parser for benchmarking.

Definition of Done

  • the parser should be able to parse json data samples that cover most of the json standard.

generalize stream structure

The stream structure should support both string and binary data and provide a unified public API instead of allowing direct member variable access.

add clang format config

Currently there is no globally defined formatting style for the project's source code other than a basic editorconfig.

To unify the C++ coding style across the project for every participant, a clang-format (the de facto standard formatter for C++) config must be added.

Goals:

  • Improve maintainability and readability.
  • Become open to contributions.

Definition of Done:

  • There must be a .clang-format config at the root of this project.
  • The codebase should be reformatted to conform to the suggested format config.

add msgpack library

Description

Parsi is a parser combinator library with core building blocks for making parsers, but it would be much nicer to also provide serializers/deserializers for common, simple data formats.

MsgPack, a binary alternative to json, is simple, powerful, and easier to parse than json. It is also a good candidate for benchmarking.

Goals

  • bring parsi to real world use cases.
  • provide built-in parsers for very common yet simple data formats.

Definition of Done

  • there must be a mini library that can serialize and deserialize the standard MessagePack format.
  • parsers must be visit-based and preferably allocation-free.
  • nested parsers must not rely on call stack and recursion.

add test coverage to build system

Description

Add test coverage to the build system so that a test coverage report can be produced.

Goals

  • easily produce coverage report with one command.
  • be able to add the coverage report to CI later.

Definition of Done

  • the CMake target test-coverage produces HTML output in the build/coverage directory.
