langston-barrett / treeedb

Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.

License: MIT License

Makefile 2.76% Nix 0.40% Rust 92.88% Java 0.37% C 0.24% Shell 2.71% JavaScript 0.10% C# 0.53%

datalog souffle static-analysis tree-sitter

treeedb's Issues

Java crates are not on crates.io

... because they depend on a git version of tree-sitter-java.

Publish v0.1.0 to crates.io

Only release candidates are available at the moment because I kept hitting the rate limit, which appears to allow only 10 crates to be uploaded at once: rust-lang/crates.io#1596. I think I'll modify scripts/publish.sh to wait 30 seconds to 1 minute between uploads. I may also email to ask for the ability to publish more crates at once.
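The wait-between-uploads idea could look roughly like the following. This is a minimal sketch in Rust rather than a change to scripts/publish.sh; the function names `publish_all` and `cargo_publish` are hypothetical, and the only external command assumed is `cargo publish --package <name>`.

```rust
use std::process::Command;
use std::thread;
use std::time::Duration;

/// Publishes each crate in order, sleeping for `delay` between uploads.
/// `publish` returns true on success; the names of failed crates are returned.
pub fn publish_all<F>(crates: &[&str], delay: Duration, mut publish: F) -> Vec<String>
where
    F: FnMut(&str) -> bool,
{
    let mut failed = Vec::new();
    for (i, &name) in crates.iter().enumerate() {
        // No need to wait before the very first upload.
        if i > 0 {
            thread::sleep(delay);
        }
        if !publish(name) {
            failed.push(name.to_string());
        }
    }
    failed
}

/// Hypothetical helper: runs `cargo publish --package <name>` and reports success.
pub fn cargo_publish(name: &str) -> bool {
    Command::new("cargo")
        .args(["publish", "--package", name])
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

A caller would pass the workspace's crate names, a delay such as `Duration::from_secs(45)`, and `cargo_publish` as the closure. Taking the publish step as a closure keeps the pacing logic testable without contacting crates.io.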

Swift support

ci: Install and test with Souffle

gen/souffle: Accessors for more node fields

There's an inline node_text relation that retrieves a node's text based on its ID. There should be one such relation for each field in the node table.
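One way the generator might emit such accessors is sketched below. The column names and types passed in are assumptions for illustration, since the actual layout of the node table is not listed here; the function name `accessor` is also hypothetical.

```rust
/// Emits a Soufflé inline accessor for the column at `index` of a wide `node`
/// relation. `columns` holds (column name, Soufflé type) pairs, and column 0
/// is assumed to be the node ID.
pub fn accessor(columns: &[(&str, &str)], index: usize) -> String {
    assert!(index > 0 && index < columns.len(), "index must name a non-ID column");
    let (id_name, id_ty) = columns[0];
    let (col_name, col_ty) = columns[index];
    // Bind only the ID and the requested column; every other column is `_`.
    let args: Vec<&str> = columns
        .iter()
        .enumerate()
        .map(|(i, (name, _))| if i == 0 || i == index { *name } else { "_" })
        .collect();
    format!(
        ".decl node_{col}({id}: {id_ty}, {col}: {col_ty}) inline\nnode_{col}({id}, {col}) :- node({args}).\n",
        col = col_name,
        id = id_name,
        id_ty = id_ty,
        col_ty = col_ty,
        args = args.join(", ")
    )
}
```

Calling this once for every non-ID column would yield one accessor per field, with node_text as just one instance of the pattern.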

ci: Release to crates.io

ci: Provide executables for other operating systems

The CI system builds static executables for Linux. We should do the same for other OSes.

Haskell support

gen/souffle: Configurable field suffix

To avoid name collisions, the Souffle generator suffixes field relations with _f. This should be configurable, and the suffix should be omitted for grammars that have no such collisions.
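A configurable suffix could be as small as one extra parameter on whatever function builds the relation name. The sketch below assumes names have the shape `<node kind>_<field><suffix>`, which matches relations such as function_definition_declarator_f; the function `field_relation` is hypothetical and not taken from the generator.

```rust
/// Builds the name of a field relation. Pass "_f" to keep the current
/// behaviour, or "" for grammars where no collision can occur.
pub fn field_relation(node_kind: &str, field: &str, suffix: &str) -> String {
    format!("{}_{}{}", node_kind, field, suffix)
}
```

With the suffix `"_f"` this reproduces the existing names, and with an empty suffix the same call yields `function_definition_declarator`.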

Rust support

gen/souffle: Configurable case conventions

The Souffle generator uses CamelCase for types and snake_case for relations, but this could easily be configurable.
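As a rough illustration, the convention could be modelled as an enum that is applied to tree-sitter's snake_case node kinds. The `Case` type and `apply_case` function are hypothetical names, not part of the current generator.

```rust
/// Naming convention to apply to a snake_case tree-sitter node kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Case {
    Camel,
    Snake,
}

/// Renders `kind` (assumed to be snake_case) in the requested convention.
pub fn apply_case(kind: &str, case: Case) -> String {
    match case {
        // The input is already snake_case, so it is passed through unchanged.
        Case::Snake => kind.to_string(),
        // Capitalize the first character of each underscore-separated part.
        Case::Camel => kind
            .split('_')
            .filter(|part| !part.is_empty())
            .map(|part| {
                let mut chars = part.chars();
                match chars.next() {
                    Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                    None => String::new(),
                }
            })
            .collect(),
    }
}
```

The generator could then take one `Case` for types and one for relations, so that the current behaviour is simply the default pair (`Camel` for types, `Snake` for relations).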

Trying to get all sub-nodes of a FunctionDeclarator

Hey Langston! 👋

I've been mucking around with the C lang, and I think I may be misunderstanding how tree-sitter works. I'm trying to get all the nodes that fall under a particular function declaration. This is my Datalog:

.decl parent_func(func: FunctionDeclarator, node: Node)
parent_func(as(f, FunctionDeclarator), n) :-
    function_definition_declarator_f(n, f).
parent_func(as(f, FunctionDeclarator), child) :-
    parent_func(f, parent),
    field(parent, _, child).

.decl show_parent_func(function: NodeText, node_type: symbol)
show_parent_func(function, ty) :-
    parent_func(f, n),
    node_text(f, function),
    node(n, ty, _, _, _, _, _, _, _, _, _, _, _).

.output show_parent_func(IO=stdout)

My test program:

int add(int a, int b) {
    return a + b;
}

The Souffle output:

---------------
show_parent_func
===============
add(int a, int b)       function_declarator
add(int a, int b)       function_definition
add(int a, int b)       compound_statement
add(int a, int b)       identifier
add(int a, int b)       parameter_list
add(int a, int b)       primitive_type
===============

And the tree-sitter AST:

(translation_unit [0, 0] - [3, 0]
  (function_definition [0, 0] - [2, 1]
    type: (primitive_type [0, 0] - [0, 3])
    declarator: (function_declarator [0, 4] - [0, 21]
      declarator: (identifier [0, 4] - [0, 7])
      parameters: (parameter_list [0, 7] - [0, 21]
        (parameter_declaration [0, 8] - [0, 13]
          type: (primitive_type [0, 8] - [0, 11])
          declarator: (identifier [0, 12] - [0, 13]))
        (parameter_declaration [0, 15] - [0, 20]
          type: (primitive_type [0, 15] - [0, 18])
          declarator: (identifier [0, 19] - [0, 20]))))
    body: (compound_statement [0, 22] - [2, 1]
      (return_statement [1, 4] - [1, 17]
        (binary_expression [1, 11] - [1, 16]
          left: (identifier [1, 11] - [1, 12])
          right: (identifier [1, 15] - [1, 16]))))))

As you can see, the nodes under the add function stop at the compound_statement; they don't seem to go any deeper. I'm guessing this is because the return_statement, etc. are all children of compound_statement, rather than fields (I see this when I look at node-types.json)? If this assumption is correct, what would be the correct way of writing such a query?

Thanks for your help and your efforts developing cool tools!
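The hypothesis in this question can be checked against the example tree with a small model. The Rust sketch below is not treeedb's API: it hand-transcribes the function_definition subtree from the AST above, marks which edges carry a field name, and compares a traversal that follows only field edges with one that follows every edge. All type and function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// A parent-to-child edge; `field` is `None` for children without a field name.
pub struct Edge {
    pub parent: usize,
    pub child: usize,
    pub field: Option<&'static str>,
}

/// Collects the kinds of all nodes reachable from `root`, including `root`.
/// With `fields_only`, edges without a field name are skipped.
pub fn reachable_kinds(
    kinds: &[&'static str],
    edges: &[Edge],
    root: usize,
    fields_only: bool,
) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        seen.insert(kinds[node]);
        for edge in edges.iter().filter(|e| e.parent == node) {
            if !fields_only || edge.field.is_some() {
                stack.push(edge.child);
            }
        }
    }
    seen
}

/// The `function_definition` subtree of the `add` example, transcribed by hand.
pub fn example() -> (Vec<&'static str>, Vec<Edge>) {
    let kinds = vec![
        "function_definition",   // 0
        "primitive_type",        // 1
        "function_declarator",   // 2
        "identifier",            // 3
        "parameter_list",        // 4
        "parameter_declaration", // 5
        "primitive_type",        // 6
        "identifier",            // 7
        "parameter_declaration", // 8
        "primitive_type",        // 9
        "identifier",            // 10
        "compound_statement",    // 11
        "return_statement",      // 12
        "binary_expression",     // 13
        "identifier",            // 14
        "identifier",            // 15
    ];
    let raw = vec![
        (0, 1, Some("type")),
        (0, 2, Some("declarator")),
        (2, 3, Some("declarator")),
        (2, 4, Some("parameters")),
        (4, 5, None),
        (5, 6, Some("type")),
        (5, 7, Some("declarator")),
        (4, 8, None),
        (8, 9, Some("type")),
        (8, 10, Some("declarator")),
        (0, 11, Some("body")),
        (11, 12, None),
        (12, 13, None),
        (13, 14, Some("left")),
        (13, 15, Some("right")),
    ];
    let edges: Vec<Edge> = raw
        .iter()
        .map(|&(parent, child, field)| Edge { parent, child, field })
        .collect();
    (kinds, edges)
}
```

Starting from node 0 with `fields_only` set, the traversal reaches the same six kinds as the Souffle output above and stops at compound_statement and parameter_list, which is consistent with the guess that return_statement and parameter_declaration are attached as plain children rather than fields. Following every edge also reaches parameter_declaration, return_statement and binary_expression. In Datalog terms this suggests adding a second recursive rule over whichever relation records non-field children; the name of that relation is not confirmed here.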

Generating "narrow" instead of "wide" tables, e.g., node_kind, node_start_byte, etc. instead of just node.
Splitting the tables based on the node kind - the generated Datalog code does this already, so it might be more efficient to do it up front.
