
parser-c


Rust module for parsing C code. Port of Haskell's language-c, semi-automatically translated using Corollary.

This port is a work in progress. A lot of work remains before it can parse anything but very simple C files; while most source code has been translated from Haskell, errors in translation prevent it from matching language-c's functionality yet. Here are the next steps for achieving parity, in order:

  1. Build up a test bed equivalent to language-c's, then automatically cross-check against it
  2. Fix errors in the ported code to support those test cases
  3. Convert portions of the code into Rust idioms without breaking tests
  4. Figure out a porting story for the Alex/Happy generated parser output

parser-c requires nightly (for now). See tests/ for some working examples, or try this example:

extern crate parser_c;

use parser_c::parse;

const INPUT: &'static str = r#"

int main() {
    printf("hello world!\n");
    return 0;
}

"#;

fn main() {
    match parse(INPUT, "simple.c") {
        Err(err) => {
            panic!("error: {}", err);
        }
        Ok(ast) => {
            println!("success: {:#?}", ast);
        }
    }
}

Result is:

success: Right(
    CTranslationUnit(
        [
            CFDefExt(
                CFunctionDef(
                    [
                        CTypeSpec(
                            CIntType(
                                ..
                            )
                        )
                    ],
                    CDeclarator(
                        Some(
                            Ident(
                                "main",
                                124382170,
                                ..
                            )
                        ),
                        ...

Development

Clone this crate:

git clone --recursive https://github.com/tcr/parser-c

Hacking on the lexer and parser requires building and running the Haskell dependencies using:

./regen.sh

The test suite is still being ported. It lives in a separate crate because it requires a build script with prerequisites, so to run it, use this script:

./test.sh

License

MIT


Issues

src/analysis isn't actually imported

src/analysis has a trivial Haskell-to-Rust conversion but isn't actually included yet, because Corrode doesn't need it to work. There are some questions to figure out about monads (see parser_monad.rs for one solution).

Next steps: include mod analysis in src/lib.rs, and start fixing up the code until it compiles. Then we'll need the proper test bench from language-c to test its functionality.
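A minimal sketch of that first step, assuming a hypothetical analysis feature flag so the crate keeps compiling while the module is being fixed up:

```rust
// In src/lib.rs (sketch): expose the ported analysis module.
// The feature name "analysis" is an assumption, not an existing flag.
#[cfg(feature = "analysis")]
pub mod analysis;
```

Gating the module means cargo build keeps working for everyone, while cargo build --features analysis surfaces the remaining compile errors to fix.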

Find appropriate settings for recursion_limit, stack size

Stack size is important (see the parse function at the crate root) to avoid stack overflows. I picked an arbitrary value here, but there might be a more scientific way of choosing one, or even of figuring out why the code needs an increased stack size at all.

recursion_limit might not need to be set at all. This should be tested against the test bench.

Split core with lexer/parser into separate crate?

The generated parts are huge and lead to unpleasant compile times. It may be possible to split mostly those parts into a subcrate, so that while working on e.g. analysis the compile cycles are much shorter. E.g. parser-c-core <- parser-c
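The dependency direction could look like this (a sketch; the crate and path names are hypothetical):

```toml
# Cargo.toml of the top-level parser-c crate. The generated lexer.rs
# and parser.rs would live in parser-c-core, so edits to analysis
# code no longer trigger a rebuild of the generated code.
[dependencies]
parser-c-core = { path = "parser-c-core" }
```

Cargo caches the subcrate's build artifacts, so the huge generated modules compile once per change to themselves rather than once per change to anything.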

Auto-generating lexer.rs and parser.rs

It's important that parser-c be able to port fixes from upstream (language-c) even though they weren't written in Rust. For the most part, source code changes can be ported over manually, assuming that most patches will be small.

But there are exceptions. lexer.rs and parser.rs are converted from Lexer.x and Parser.y, which are inputs to Haskell-specific lexer and parser generators (Alex and Happy, respectively). Because these generate a large amount of Haskell code, the corresponding changes to the Rust code must also be massive, so porting them over manually is prohibitive.

Corollary will not be capable of a conversion this complex any time soon. (The current lexer.rs and parser.rs were heavily edited by hand.) These are the solutions I came up with instead:

Short-term solution: We use Haskell's Happy and Alex libraries to do the code generation, but modify their code-emitting modules to output Rust instead: ProduceCode.lhs and Output.hs. Inline Haskell code in the source .x and .y files must also be changed to Rust.

The benefit of this setup is that it's simple, and we can keep using it indefinitely. Requiring Haskell for the build step will not mean Haskell is required for consumers of the library—they'll only get the generated Rust code.

Long term solution: After this, I see three obvious choices:

  1. Do nothing, just leave our Happy/Alex Haskell binaries with output==rust as part of the source tree. This introduces a Haskell dependency, but only when these files need to be regenerated.
  2. Pursue a wholly Rust solution. Convert Happy and Alex into Rust libraries, with a mixture of Corollary and manual editing. This is a lot of work to support just a single-use toolchain though.
  3. Convert the parser and lexer source files into an equivalent parser and lexer generator in the Rust ecosystem (think nom, LALRPOP, etc.) This is the dream solution as it allows us to leverage tools inside the Rust community, and allows the source files to be editable by anyone in that community. But on the other hand, we then cannot easily pull updates from upstream when they are made, creating a lot of additional work (and potential for more bugs!) for possibly little gain.

There may be better solutions than any of the above!

Clean up String handling

Since Haskell doesn't use string slices, there are a lot of places with unnecessary .to_string() calls, and functions taking String instead of &str in their signatures.

Make reduce! a macro-by-example to work on stable

In the rush to get this compiling, there are some regex(!) hacks in the parser-c-macro crate to get code compiling.

https://github.com/tcr/parser-c/blob/master/parser-c-macro/src/lib.rs#L18-L56

These regexes should be run once and their output committed directly into the code that uses the reduce!() macro. Then the reduce!() macro can focus exclusively on converting a pattern-matching function like fn test(Some(value): Option<i32>) { .. } into its deconstructed form fn test(_0: Option<i32>) { match _0 { Some(value) => { .. }, _ => panic!("Irrefutable pattern in Rust"), } }.

This would also allow it to be rewritten as a procedural macro and compiled on stable Rust (see #7).

rustfmt?

Corollary doesn't format its output properly, but I didn't want to do any reformatting too early (in case I'd just need to regenerate it again). Since no files are being auto-generated anymore, it'd probably be appropriate to do one massive PR using rustfmt. Or are there reasons this is a bad idea?

One more question: since parser-c requires nightly (for now), should we use rustfmt 0.9 or rustfmt-nightly? I'm not totally aware of how much they've diverged yet.

Either should be changed to Result

Either<A, B> is a type in support.rs that made porting from Haskell easier. In most cases it is equivalent to Rust's Result, so it should be changed to Result in all locations and, if no cases remain, Either should be deleted from support.rs.
