cthulhu-irl / parsi
A declarative parser combinator library for C++20.
Home Page: https://cthulhu-irl.github.io/parsi/
License: MIT License
Add constexpr friendly equivalent of std::bitset for Charset class.
std::bitset is not fully constexpr in C++20, which prevents many parser combinations from being built at compile-time.
Making sure that a parser can be constructed at compile-time is crucial: it helps avoid redundant captures and makes it easy to build custom parsers out of small core parsers.
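A minimal sketch of what such a replacement could look like, assuming a fixed 256-entry table indexed by `unsigned char`. `ConstexprBitset`, its `set`/`test` methods, and the `Charset` shown here are hypothetical stand-ins for illustration, not parsi's actual types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical constexpr stand-in for std::bitset<N>; not parsi's actual type.
template <std::size_t N>
class ConstexprBitset {
    static constexpr std::size_t word_bits = 64;
    static constexpr std::size_t word_count = (N + word_bits - 1) / word_bits;
    std::array<std::uint64_t, word_count> words_{};

public:
    constexpr ConstexprBitset() = default;

    // Sets or clears the bit at `pos` (the caller guarantees pos < N).
    constexpr void set(std::size_t pos, bool value = true) {
        const std::uint64_t mask = std::uint64_t{1} << (pos % word_bits);
        if (value) {
            words_[pos / word_bits] |= mask;
        } else {
            words_[pos / word_bits] &= ~mask;
        }
    }

    // Reports whether the bit at `pos` is set.
    constexpr bool test(std::size_t pos) const {
        return (words_[pos / word_bits] & (std::uint64_t{1} << (pos % word_bits))) != 0;
    }
};

// Hypothetical Charset built on top of the bitset above.
class Charset {
    ConstexprBitset<256> bits_{};

public:
    constexpr explicit Charset(const char* chars) {
        for (; *chars != '\0'; ++chars) {
            bits_.set(static_cast<unsigned char>(*chars));
        }
    }

    constexpr bool contains(char c) const {
        return bits_.test(static_cast<unsigned char>(c));
    }
};

// The whole charset is built and queried at compile-time.
constexpr Charset digits("0123456789");
static_assert(digits.contains('7'));
static_assert(!digits.contains('x'));
```

Storing the bits in a `std::array` of words keeps every operation usable in constant expressions, which is the property the issue asks for.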
Charset constructor and methods are constexpr.
Charset instance can be created.
The project source code is currently just a prototype and is unorganized; this made it possible to iterate quickly, try out different designs, and shape the core API and its expectations.
Now that the design is somewhat established and the core is growing fast, it is better to move towards a codebase that can be collaborated on. This requires a proper directory structure and code organization.
Goals:
Definition of Done:
For a parser library, performance is essential. Currently there is a micro benchmark template sample, but no actual benchmarks of the core utilities to track performance improvements or regressions in vital parts.
There should be micro benchmarks to measure the efficiency of core parsers, especially the expect family.
parsi::fn namespace.
Currently the expect core parser can check whether a given fixed string is at the cursor of the stream being parsed, but it can handle neither a single character nor a charset from which any one character is expected.
This is required to make simple parsers that parse strings, variable names, numbers, etc.
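The two requested variants could be sketched as follows. This assumes a parser is a callable from the remaining input to a result; `Result`, `expect_char`, and `expect_one_of` are hypothetical names used for illustration, and a `std::string_view` stands in for both the stream and the charset.

```cpp
#include <string_view>

// Hypothetical minimal result type: `rest` is the unconsumed input.
struct Result {
    std::string_view rest;
    bool valid;
};

// Matches exactly one given character at the cursor.
constexpr auto expect_char(char expected) {
    return [expected](std::string_view input) -> Result {
        if (!input.empty() && input.front() == expected) {
            return {input.substr(1), true};
        }
        return {input, false};
    };
}

// Matches one character that appears anywhere in `allowed`.
constexpr auto expect_one_of(std::string_view allowed) {
    return [allowed](std::string_view input) -> Result {
        if (!input.empty() && allowed.find(input.front()) != std::string_view::npos) {
            return {input.substr(1), true};
        }
        return {input, false};
    };
}

static_assert(expect_char('"')("\"text\"").valid);
static_assert(expect_one_of("0123456789")("42").rest == "2");
```

On failure both parsers return the input untouched, so a caller can try something else from the same position.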
An expect variant that can parse a given single expected character.
An expect variant that can parse a single expected character from a given charset.
There is no clear roadmap or future plan for this library.
Adding a comprehensive overview of what is planned to the README would be nice.
Add Android CI, in order to officially support Android.
There is no documentation automation in the project, and it is not clear how the documentation should be generated.
There should be a documentation generator that can be used both by developers and, later, by CI/CD to automatically update and upload the docs.
CMake must be used, and a combination of doxygen, sphinx, and breathe is suggested, similar to fmtlib.
docs must be available and able to generate API docs.
New benchmark functions (covering the same cases) need to be added to the benchmark dir to compare against well-known parser libraries.
Compare against:
[ ] result of comparing with rapidjson
[ ] result of comparing with simdjson
[ ] result of comparing with nlohmann
...
Is your feature request related to a problem? Please describe.
There is no way to fall through to another path when a subparser fails. This makes it impossible to parse things like a JSON item, where a value can be null, a number, a string, an array, or an object, each of which has a different starting sequence and a different ruleset.
Describe the solution you'd like
A combinator that can retract the parser cursor and fall back to the next possible match would be quite useful. For example, a string literal could be parsed with the anyof combinator:
auto expect_string() {
    // normal_charset is not shown in this snippet.
    const auto escaped_charset = parsi::Charset("\"'nra");
    return parsi::sequence(
        parsi::expect('"'),
        parsi::repeat(
            parsi::anyof(
                parsi::sequence(parsi::expect('\\'), parsi::expect(escaped_charset)),
                parsi::expect(normal_charset)
            )
        ),
        parsi::expect('"')
    );
}
or a json value as:
const auto item_parser = parsi::anyof(
    expect_null(visitor),
    expect_decimal(visitor),
    expect_integer(visitor),
    expect_string(visitor),
    [=](parsi::Stream stream) -> parsi::Result { // recursive
        return expect_array(visitor)(stream);
    },
    [=](parsi::Stream stream) -> parsi::Result { // indirect recursive
        return expect_object(visitor)(stream);
    }
);
Describe alternatives you've considered
None. A retracting combinator is required.
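One way the retracting behaviour could be implemented is sketched below, assuming parsers are callables that take the remaining input by value, so "retracting" simply means handing every alternative the same original input. `Result`, `literal`, and this `anyof` are hypothetical illustrations, not parsi's implementation.

```cpp
#include <string_view>

// Hypothetical minimal result type: `rest` is the unconsumed input.
struct Result {
    std::string_view rest;
    bool valid;
};

// Helper parser for the demo: matches a fixed piece of text.
constexpr auto literal(std::string_view text) {
    return [text](std::string_view input) -> Result {
        if (input.substr(0, text.size()) == text) {
            return {input.substr(text.size()), true};
        }
        return {input, false};
    };
}

// Tries each parser in order on the same original input and stops at the
// first success. If all fail, the last parser's failed result is returned.
template <typename... Parsers>
constexpr auto anyof(Parsers... parsers) {
    return [=](std::string_view input) -> Result {
        Result result{input, false};
        // Short-circuiting fold over `||`: later parsers are not run once
        // one of them has produced a valid result.
        (void)((result = parsers(input)).valid || ...);
        return result;
    };
}

static_assert(anyof(literal("null"), literal("true"))("true,").rest == ",");
static_assert(!anyof(literal("null"), literal("true"))("nope").valid);
```

Because the input is a cheap value type, no explicit cursor bookkeeping is needed to backtrack; a stream with mutable state would have to save and restore its position instead.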
Currently, combining two charsets requires too much manual work, even though it is a common and useful operation. For example, given numeric, lowercase, and uppercase charsets, making an alphanumeric charset should be as simple as combining the three charsets that already exist.
auto numeric = Charset("0123456789");
auto lowercase = Charset("abcdefghijklmnopqrstuvwxyz");
auto uppercase = Charset("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
auto alphanum = numeric + lowercase + uppercase; // or numeric.combined(lowercase).combined(uppercase)
assert(alphanum == Charset("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"));
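A sketch of how both spellings from the example above could be supported, assuming the charset is stored as a 256-entry membership table. The `Charset` below is a hypothetical stand-in written for illustration; only the names `combined`, `operator+`, and `operator==` come from the request.

```cpp
#include <array>
#include <cstddef>

// Hypothetical table-based Charset, not parsi's actual class.
class Charset {
    std::array<bool, 256> table_{};

public:
    constexpr Charset() = default;

    constexpr explicit Charset(const char* chars) {
        for (; *chars != '\0'; ++chars) {
            table_[static_cast<unsigned char>(*chars)] = true;
        }
    }

    constexpr bool contains(char c) const {
        return table_[static_cast<unsigned char>(c)];
    }

    // Union of two charsets; neither operand is modified.
    constexpr Charset combined(const Charset& other) const {
        Charset result{};
        for (std::size_t i = 0; i < table_.size(); ++i) {
            result.table_[i] = table_[i] || other.table_[i];
        }
        return result;
    }

    friend constexpr Charset operator+(const Charset& lhs, const Charset& rhs) {
        return lhs.combined(rhs);
    }

    // Element-wise comparison (std::array's operator== is only constexpr since C++20).
    friend constexpr bool operator==(const Charset& lhs, const Charset& rhs) {
        for (std::size_t i = 0; i < lhs.table_.size(); ++i) {
            if (lhs.table_[i] != rhs.table_[i]) {
                return false;
            }
        }
        return true;
    }
};

static_assert(Charset("01") + Charset("ab") == Charset("a0b1"));
```

Equality compares membership rather than the order in which characters were given, so the combined set matches a charset written out by hand.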
Is your feature request related to a problem? Please describe.
When trying to match anything other than a given charset or a single character, I have to write out all the other characters, which is too cumbersome. For this reason I would like an additional parameter that negates the match of the given character or charset.
Describe the solution you'd like
To be able to pass an additional parameter to the single-character parsi::expect overloads to indicate whether to negate the match or not.
For example, to match anything other than a newline: parsi::expect('\n', parsi::expect_negated_tag{true})
or to match anything other than digits: parsi::expect(parsi::Charset("0123456789"), parsi::expect_negated_tag{true})
Describe alternatives you've considered
The frustrating path of manually writing every character other than those I don't want to match into a parsi::Charset, which is infeasible.
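The requested parameter could behave as in the sketch below. The tag name `expect_negated_tag` is taken from the request; its `negated` member, the `Result` type, and the use of `std::string_view` in place of `parsi::Charset` are assumptions made for illustration. One design choice is stated explicitly: a negated match still has to consume one character, so it fails on empty input.

```cpp
#include <string_view>

// Hypothetical minimal result type: `rest` is the unconsumed input.
struct Result {
    std::string_view rest;
    bool valid;
};

// Tag from the request; the member name is an assumption.
struct expect_negated_tag {
    bool negated = false;
};

// Single character: succeeds when (front == expected) differs from `negated`.
constexpr auto expect(char expected, expect_negated_tag tag = {}) {
    return [=](std::string_view input) -> Result {
        if (input.empty()) {
            return {input, false};  // nothing to consume, even when negated
        }
        const bool matched = input.front() == expected;
        if (matched != tag.negated) {
            return {input.substr(1), true};
        }
        return {input, false};
    };
}

// Charset stand-in: succeeds when membership in `allowed` differs from `negated`.
constexpr auto expect(std::string_view allowed, expect_negated_tag tag = {}) {
    return [=](std::string_view input) -> Result {
        if (input.empty()) {
            return {input, false};
        }
        const bool matched = allowed.find(input.front()) != std::string_view::npos;
        if (matched != tag.negated) {
            return {input.substr(1), true};
        }
        return {input, false};
    };
}

static_assert(expect('\n', expect_negated_tag{true})("abc").valid);
static_assert(!expect("0123456789", expect_negated_tag{true})("42").valid);
```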
Is your feature request related to a problem? Please describe.
Currently the parsi library provides utilities for linear parsing along with an optional parser that backtracks.
But there are cases where that is not enough, and you might want to backtrack across several different options and continue with the one that matches.
For example, when parsing a JSON array item, it can be null, an integer number, a decimal number, a string, an object, or another array.
Describe the solution you'd like
Examining the JSON item example, depending on the successor it can choose which path to go:
n, then it's definitely null
+, -, or a digit, then it's either an integer or a decimal
", then it's for sure a string
[, then it's an array
{, then it's an object
It seemingly can be a set of predecessor and successor pairs, which works fine for the most part, but for numbers it can't easily distinguish an integer from a decimal.
This becomes problematic when considering the extract utility, which here is better suited to operating on the whole number instead of two different portions of it.
There are two suggestions here:
An anyof combinator that works similarly to optional but takes two or more parsers; upon parsing it tries these parsers one by one, and at least one of them must succeed. It returns the result of the one that succeeds, or the failed result of the last parser.
A select combinator that takes two or more pairs of parsers (predecessor and successor); upon parsing it iterates over the predecessors, and if a predecessor succeeds it runs the corresponding successor on the original stream and returns the result.
For anyof it would look like this:
auto expect_string() {
    // normal_charset is not shown in this snippet.
    const auto escaped_charset = parsi::Charset("\"'nra");
    return parsi::sequence(
        parsi::expect('"'),
        parsi::repeat(
            parsi::anyof(
                parsi::sequence(parsi::expect('\\'), parsi::expect(escaped_charset)),
                parsi::expect(normal_charset)
            )
        ),
        parsi::expect('"')
    );
}
or for the JSON example:
const auto item_parser = parsi::anyof(
    expect_null(visitor),
    expect_decimal(visitor),
    expect_integer(visitor),
    expect_string(visitor),
    [=](auto&&, parsi::Stream stream) -> parsi::Result { // recursive
        return expect_array(visitor)(stream);
    },
    [=](auto&&, parsi::Stream stream) -> parsi::Result { // indirect recursive
        return expect_object(visitor)(stream);
    }
);
For select it would look like:
...
Describe alternatives you've considered
I couldn't come up with any.
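The issue leaves the select usage unspecified, so the following is only one possible reading of the description given earlier in this issue: each branch pairs a predecessor with a successor, the first predecessor that succeeds picks the branch, and the successor then runs on the original input. `Branch`, `literal`, `Result`, and this `select` are hypothetical names for illustration, not a proposed parsi API.

```cpp
#include <string_view>

// Hypothetical minimal result type: `rest` is the unconsumed input.
struct Result {
    std::string_view rest;
    bool valid;
};

// Helper parser for the demo: matches a fixed piece of text.
constexpr auto literal(std::string_view text) {
    return [text](std::string_view input) -> Result {
        if (input.substr(0, text.size()) == text) {
            return {input.substr(text.size()), true};
        }
        return {input, false};
    };
}

// A (predecessor, successor) pair; the predecessor only decides the branch.
template <typename Predecessor, typename Successor>
struct Branch {
    Predecessor predecessor;
    Successor successor;
};

template <typename Predecessor, typename Successor>
Branch(Predecessor, Successor) -> Branch<Predecessor, Successor>;

template <typename... Branches>
constexpr auto select(Branches... branches) {
    return [=](std::string_view input) -> Result {
        Result result{input, false};
        // The first branch whose predecessor accepts the input is chosen; its
        // successor runs on the original input and its result is final,
        // whether it succeeds or fails. No further branches are tried.
        (void)((branches.predecessor(input).valid &&
                (result = branches.successor(input), true)) || ...);
        return result;
    };
}

static_assert(select(Branch{literal("n"), literal("null")},
                     Branch{literal("["), literal("[]")})("null,").rest == ",");
```

Unlike anyof, a failing successor does not fall through to the next branch here, which keeps the work per input bounded by one predecessor scan plus one successor.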
When forwarding the deduced types to classes, the types also carry const/volatile qualifiers and references, so member variables end up defined with an unintended type, which becomes problematic for copy/move construction and assignment.
Use remove_cvref when defining those member variables so that they contain no const/volatile qualifiers or references.
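The problem and the fix can be illustrated with a small wrapper. `Optional` and `optional` below are hypothetical names for illustration; `cvref_free_t` is a local alias equivalent to C++20's `std::remove_cvref_t`, spelled out so the sketch also compiles as C++17.

```cpp
#include <type_traits>
#include <utility>

// Equivalent to C++20's std::remove_cvref_t<T>.
template <typename T>
using cvref_free_t = std::remove_cv_t<std::remove_reference_t<T>>;

// Hypothetical combinator class that stores a parser by value.
template <typename Parser>
class Optional {
    Parser parser_;

public:
    using parser_type = Parser;

    constexpr explicit Optional(Parser parser) : parser_(std::move(parser)) {}
};

// Factory taking a forwarding reference. For an lvalue argument, `Parser`
// is deduced as `T&` or `const T&`; stripping cv/ref before instantiating
// the class keeps the stored member a plain value type.
template <typename Parser>
constexpr auto optional(Parser&& parser) {
    return Optional<cvref_free_t<Parser>>(std::forward<Parser>(parser));
}

static_assert(std::is_same_v<cvref_free_t<const int&>, int>);
static_assert(std::is_same_v<
    decltype(optional(std::declval<const int&>()))::parser_type, int>);
```

Without the alias, `Optional<const int&>` would hold a reference member, which deletes the implicit copy and move assignment operators.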
JSON is one of the most common data formats used for inter-process communication, e.g. in RESTful services.
Parsi, as a library that eases making parsers, should be able to express a parser for a simple and common data format like JSON; it would also be a great demonstration of the library's usage and a good benchmark test.
The stream structure should support both string and binary data, and provide a unified public API instead of allowing direct member variable access.
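A rough sketch of what such a stream could look like, assuming an immutable value type whose methods return new streams. The member names (`size`, `empty`, `starts_with`, `advanced`) are hypothetical; the buffer is a `std::string_view` because it can carry arbitrary bytes, including `'\0'`, when constructed with an explicit length.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stream with no publicly accessible data members.
class Stream {
    std::string_view buffer_;

public:
    // Text input.
    constexpr explicit Stream(std::string_view text) : buffer_(text) {}

    // Binary input: the explicit size allows embedded '\0' bytes.
    constexpr Stream(const char* bytes, std::size_t size) : buffer_(bytes, size) {}

    constexpr std::size_t size() const { return buffer_.size(); }
    constexpr bool empty() const { return buffer_.empty(); }

    constexpr bool starts_with(std::string_view prefix) const {
        return buffer_.substr(0, prefix.size()) == prefix;
    }

    // Returns a stream with the cursor moved forward, clamped to the end.
    constexpr Stream advanced(std::size_t count) const {
        return Stream(buffer_.substr(count < buffer_.size() ? count : buffer_.size()));
    }
};

static_assert(Stream("null,1").starts_with("null"));
static_assert(Stream("\0\x01\x02", 3).size() == 3);
```

Keeping the stream immutable means a failed parser cannot leave the cursor in a half-advanced state, at the cost of returning a new object from every step.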
Currently there is no globally defined formatting style for the project's source code other than a basic editorconfig.
To unify the C++ coding style across the project for every participant, a clang-format (the de facto standard formatter for C++) config must be added.
Goals:
Definition of Done:
.clang-format config at the root of this project.
Parsi is a parser combinator library and has the core building blocks to make parsers, but it would be much nicer to also provide serializers/deserializers for common simple data formats.
MsgPack, as a binary alternative to JSON, is simple, powerful, and easier to parse than JSON. It is also a good candidate to benchmark.
To have a report on test coverage, add test coverage to the build system.
test-coverage produces HTML output in the build/coverage directory.