
parser-c


Rust module for parsing C code. Port of Haskell's language-c, semi-automatically translated using Corollary.

This port is a work in progress. A lot of work remains before it can parse anything but very simple C files; while most source code has been translated from Haskell, errors in translation prevent it from matching language-c's functionality yet. Here are the next steps for achieving parity, in order:

  1. Build up a test bed equivalent to language-c's, then automatically cross-check against it
  2. Fix errors in the ported code to support those test cases
  3. Convert portions of the code into Rust idioms without breaking tests
  4. Figure out a porting story for the Alex/Happy generated parser output

parser-c requires nightly (for now). See tests/ for some working examples, or try this example:

extern crate parser_c;

use parser_c::parse;

const INPUT: &'static str = r#"

int main() {
    printf("hello world!\n");
    return 0;
}

"#;

fn main() {
    match parse(INPUT, "simple.c") {
        Err(err) => {
            panic!("error: {}", err);
        }
        Ok(ast) => {
            println!("success: {:#?}", ast);
        }
    }
}

Result is:

success: Right(
    CTranslationUnit(
        [
            CFDefExt(
                CFunctionDef(
                    [
                        CTypeSpec(
                            CIntType(
                                ..
                            )
                        )
                    ],
                    CDeclarator(
                        Some(
                            Ident(
                                "main",
                                124382170,
                                ..
                            )
                        ),
                        ...

Development

Clone this crate:

git clone --recursive https://github.com/tcr/parser-c

Hacking on the lexer and parser requires building and running the Haskell dependencies using:

./regen.sh

The test suite is still being ported. It lives in a separate crate because it requires a build script with prerequisites, so to run it, use this script:

./test.sh

License

MIT


Issues

src/analysis isn't actually imported

src/analysis has a trivial Haskell-to-Rust conversion but isn't actually included yet, because Corrode doesn't need it to work. There are some questions to figure out about monads (see parser_monad.rs for one solution).

Next steps: include mod analysis in src/lib.rs, and start fixing up the code until it compiles. Then we'll need the proper test bench from language-c to test its functionality.
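A minimal sketch of that first step, assuming a hypothetical analysis feature flag so the crate keeps compiling while the module is being fixed up:

```rust
// In src/lib.rs (sketch): expose the ported analysis module.
// The feature name "analysis" is an assumption, not an existing flag.
#[cfg(feature = "analysis")]
pub mod analysis;
```

Gating the module means cargo build keeps working for everyone, while cargo build --features analysis surfaces the remaining compile errors to fix.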

Find appropriate settings for recursion_limit, stack size

Stack size is important (see the parse function at the crate root) to avoid stack overflows. I picked an arbitrary value here, but there might be a more scientific way of choosing one, or even of figuring out why the code needs an increased stack size at all.

recursion_limit might not need to be set at all. This should be tested against the test bench.

Split core with lexer/parser into separate crate?

The generated parts are huge and lead to unpleasant compile times. It may be possible to split mostly those parts into a subcrate, so that while working on e.g. analysis the compile cycles are much shorter. E.g. parser-c-core <- parser-c
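The dependency direction could look like this (a sketch; the crate and path names are hypothetical):

```toml
# Cargo.toml of the top-level parser-c crate. The generated lexer.rs
# and parser.rs would live in parser-c-core, so edits to analysis
# code no longer trigger a rebuild of the generated code.
[dependencies]
parser-c-core = { path = "parser-c-core" }
```

Cargo caches the subcrate's build artifacts, so the huge generated modules compile once per change to themselves rather than once per change to anything.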

Auto-generating lexer.rs and parser.rs

It's important that parser-c be able to port fixes from upstream (language-c) even though they weren't written in Rust. For the most part, source code changes can be ported over manually, assuming that most patches will be small.

But there are exceptions. lexer.rs and parser.rs are converted from Lexer.x and Parser.y, which are inputs to Haskell-specific lexer and parser generators (Alex and Happy, respectively). Because these generate a large amount of Haskell code, the corresponding changes to the Rust code must also be massive, so porting them over manually is prohibitive.

Corollary will not be capable of a conversion this complex any time soon. (The current lexer.rs and parser.rs were heavily edited by hand.) These are the solutions I came up with instead:

Short-term solution: We use Haskell's Happy and Alex libraries to do the code generation, but modify their code-emitting modules to output Rust instead: ProduceCode.lhs and Output.hs. Inline Haskell code in the source .x and .y files must also be changed to Rust.

The benefit of this setup is that it's simple, and we can keep using it indefinitely. Requiring Haskell for the build step will not mean Haskell is required for consumers of the library—they'll only get the generated Rust code.

Long term solution: After this, I see three obvious choices:

  1. Do nothing, just leave our Happy/Alex Haskell binaries with output==rust as part of the source tree. This introduces a Haskell dependency, but only when these files need to be regenerated.
  2. Pursue a wholly Rust solution. Convert Happy and Alex into Rust libraries, with a mixture of Corollary and manual editing. This is a lot of work to support just a single-use toolchain though.
  3. Convert the parser and lexer source files into an equivalent parser and lexer generator in the Rust ecosystem (think nom, LALRPOP, etc.) This is the dream solution as it allows us to leverage tools inside the Rust community, and allows the source files to be editable by anyone in that community. But on the other hand, we then cannot easily pull updates from upstream when they are made, creating a lot of additional work (and potential for more bugs!) for possibly little gain.

There may be better solutions than any of the above!

Clean up String handling

Since Haskell doesn't use string slices, there are a lot of places with unnecessary .to_string() calls, and functions taking String instead of &str in their signatures.

Make reduce! a macro-by-example to work on stable

In the rush to get this compiling, there are some regex(!) hacks in the parser-c-macro crate to get code compiling.

https://github.com/tcr/parser-c/blob/master/parser-c-macro/src/lib.rs#L18-L56

These regexes should be run once and their output committed directly into the code that uses the reduce!() macro. Then the reduce!() macro can focus exclusively on converting a pattern-matching function like fn test(Some(value): Option<i32>) { .. } into its deconstructed form fn test(_0: Option<i32>) { match _0 { Some(value) => { .. }, _ => panic!("Irrefutable pattern in Rust"), } }.

This would also allow it to be rewritten as a procedural macro and compiled on stable Rust (see #7).

rustfmt?

Corollary doesn't format its output properly, but I didn't want to do any reformatting too early (in case I'd just need to regenerate it again). Since no files are being auto-generated anymore, it'd probably be appropriate to do one massive PR using rustfmt. Or are there reasons this is a bad idea?

One more question: since parser-c requires nightly (for now), should we use rustfmt 0.9 or rustfmt-nightly? I'm not totally aware of how much they've diverged yet.

Either should be changed to Result

Either<A, B> is a type in support.rs that made porting from Haskell easier. In most cases it is equivalent to Rust's Result, so it should be changed to Result in all locations and, if no cases remain, Either should be deleted from support.rs.
