Giter Club home page Giter Club logo

cst's Introduction

Build Status

JavaScript CST implementation

CST

Check out code samples and rest of the wiki for more.

CST means Concrete Syntax Tree. Unlike an AST (Abstract Syntax Tree), a CST contains all the information from the JavaScript source file: whitespace, punctuators, comments. This information is extremely useful for code style checkers and other code linters. CST is also useful for cases when you need to apply modifications to existing JavaScript files while preserving the initial file formatting.

This CST implementation is designed to be 100% compatible with JS AST (https://github.com/estree/estree).

Main principles:

  • CST contains all the information from a parsed file (including whitespace and comments).
  • Compatible with AST (https://github.com/estree/estree).
  • Requires tokens to modify CST structure.
  • The tree is always valid (it protects itself against breaking changes).
  • CST can be rendered to valid JS at any time.

Let's see an example:

x = 0;
if (x) x++;

The CST for this example:

  • Blue text — CST Tokens.
  • White text in blue blocks — CST Nodes (their structure is equal to an AST).
  • Blue lines — CST Structure.
  • Red lined — AST Links.

Classes

Element

Element is the base class for Node and Token.

declare class Element {

  // traversal for children
  childElements: Array<Element>;
  firstChild: ?Element;
  lastChild: ?Element;

  // traversal for parent
  parentElement: ?Element;

  // traversing between siblings
  nextSibling: ?Element;
  previousSibling: ?Element;

  // traversing to first/last tokens (not only direct tokens)
  getFirstToken(): ?Token;
  getLastToken(): ?Token;

  // traversing to next/previous tokens (not only siblings)
  getNextToken(): ?Token;
  getPreviousToken(): ?Token;

  // Code properties
  type: string;
  isToken: boolean;
  isNode: boolean;
  isExpression: boolean;
  isStatement: boolean;
  isWhitespace: boolean;
  isFragment: boolean;
  isModuleDeclaration: boolean;
  isModuleSpecifier: boolean;

  // Code methods
  getSourceCode(): string;
  getSourceCodeLength(): number;

  // Mutation methods

  // appends child to the end of the `Element`
  appendChild(newElement: Element): void;
  // prepends child to the end of the `Element`
  prependChild(newElement: Element): void;
  // inserts child before `referenceChild`
  insertChildBefore(newElement: Element, referenceChild: Element): void;
  // replaces specified child interval (from `firstChildRef` to lastChildRef`) with specified child.
  replaceChildren(newElement: Element, firstRefChild: Element, lastRefChild: Element): void;

  // Location methods
  getRange(): Range;
  getLoc(): Location;
}

declare class Token extends Element {
  // token value
  value: string;
}

type Range = [
    start: number;
    end: number;
];

type Position = {
  line: number,
  column: number
};

type Location = {
  start: Position,
  end: Position
};

Node

Node extends Element. The Nodes are the "AST part of a CST". If you drop everything but Nodes from a CST, you will get a pure AST from the Node structure. So it is fair to say that Nodes provide the AST logic for a CST. Currently only Nodes can contain children.

The Node property isNode always returns true.

Token

Token extends Element. The purpose of a CST is to have tokens in the tree. By only manipulating tokens, we can change code formatting without any effect on the behaviour.

The Token property isToken always returns true.

cst's People

Contributors

forivall avatar graingert avatar hzoo avatar ikokostya avatar markelog avatar mdevils avatar paazmaya avatar playermet avatar yeti-or avatar yihongang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cst's Issues

ES6+ Support

Are we also going to implement es6+ stuff? Maybe stage > 2?

https://github.com/tc39/ecma262#current-proposals

Candidate

  • Stage 3: ExponentiationOperator #46
  • Stage 3: Async/AwaitExpression #49

Draft

  • Stage 2: SpreadProperty (and RestProperty) #51
  • Stage 2: Trailing commas in function parameter lists and calls #55

Proposal

  • Stage 1: Additional export-from Statements #70
  • Stage 1: Class and Property Decorators
  • Stage 1: Class Property Declarations

Strawman

  • Stage 0: BindExpression (function bind)
  • Stage 0: DoExpression

Other

- [ ] Stage 1: Call constructor

  • Stage 0: ComprehensionBlock/ComprehensionExpression/GeneratorExpression

Perf/Benchmarks

Maybe can run benchmarks between each release/PR to see if anything becomes too slow?

babel-node test/benchmarks/benchmarks.js on ee1ed9e with [email protected]

CST building costs
   - Parse (babel) x 10.42 ops/sec ±4.17% (30 runs sampled)
   - Parse, fix tokens x 8.18 ops/sec ±4.73% (25 runs sampled)
   - Parse, fix tokens, build CST x 3.08 ops/sec ±8.83% (12 runs sampled)
  Fastest is Parse (babel)

Building structure
   - NO INDEX, Array children x 637 ops/sec ±1.31% (88 runs sampled)
   - NO INDEX, List children x 570 ops/sec ±1.11% (89 runs sampled)
   - NO INDEX, List children, non recursive x 300 ops/sec ±1.56% (86 runs sampled)
   - NO INDEX, List children, closure recursive x 302 ops/sec ±0.82% (87 runs sampled)
   - INDEX, using sets x 286 ops/sec ±1.48% (87 runs sampled)
   - INDEX, using hashes x 127 ops/sec ±7.29% (60 runs sampled)
   - INDEX, using arrays x 609 ops/sec ±0.70% (93 runs sampled)
  Fastest is NO INDEX, Array children

Searching by type
   - NO INDEX, Array children x 507 ops/sec ±1.28% (90 runs sampled)
   - NO INDEX, List children x 572 ops/sec ±1.05% (90 runs sampled)
   - NO INDEX, List children, non recursive x 418 ops/sec ±0.92% (90 runs sampled)
   - NO INDEX, List children, closure recursive x 632 ops/sec ±1.24% (91 runs sampled)
   - INDEX, using sets x 4,750 ops/sec ±1.00% (91 runs sampled)
   - INDEX, using hashes x 1,618 ops/sec ±1.05% (92 runs sampled)   - INDEX, using arrays x 40,021 ops/sec ±0.76% (93 runs sampled)
  Fastest is INDEX, using arrays

Removing, then readding
   - NO INDEX, Array children x 263 ops/sec ±1.13% (84 runs sampled)
   - NO INDEX, List children x 507 ops/sec ±0.95% (89 runs sampled)
   - NO INDEX, List children, non recursive x 385 ops/sec ±0.90% (90 runs sampled)
   - NO INDEX, List children, closure recursive x 508 ops/sec ±1.07% (91 runs sampled)
   - INDEX, using sets x 118 ops/sec ±2.54% (76 runs sampled)
   - INDEX, using hashes x 53.21 ops/sec ±1.75% (60 runs sampled)
   - INDEX, using arrays x 120 ops/sec ±1.66% (77 runs sampled)
  Fastest is NO INDEX, List children, closure recursive,NO INDEX, List children

ES6: Template Strings

We need to implement nodes for:

    /*--*/ TaggedTemplateExpression: ['tag', 'quasi'],
    /*--*/ TemplateElement: [],
    /*--*/ TemplateLiteral: ['quasis', 'expressions'],

ES6: Import/Export/Modules

    /*--*/ ExportAllDeclaration: ['source'],
    /*--*/ ExportDefaultDeclaration: ['declaration'],
    /*--*/ ExportNamedDeclaration: ['declaration', 'specifiers', 'source'],
    /*--*/ ExportSpecifier: ['exported', 'local'],

    /*--*/ ImportDeclaration: ['specifiers', 'source'],
    /*--*/ ImportDefaultSpecifier: ['local'],
    /*--*/ ImportNamespaceSpecifier: ['local'],
    /*--*/ ImportSpecifier: ['imported', 'local'],

Also

ModuleDeclaration
ModuleSpecifier

removeChild vs removeToken

It seems we need a removeToken method, just like the one we have in JSCS.

Right now, probably as expected, this -

let program = parseAndGetProgram('var first = 1; var second = 2;');
program.removeChild(program.firstToken); //  Error: The element to be removed is not a child of this element.

Doesn't work.

@mdevils thoughts?

Error: Token expected but "JSXAttribute" found

Hello! I'm getting the error: Error: Token expected but "JSXAttribute" found when parsing the following JSX: <input type="text" id="input" />. When I remove id attribute, parsing is done without errors.

Is it a known bug or am I doing wrong something?

Here's a snippet of code I'm running.

var Parser = require('cst').Parser;
var parser = new Parser();

parser.parse('<input type="text" id="input" />');

Running tests

I think you can just do "mocha --compilers js:babel/register"

instead of mocha --require test/babelhook so you won't need the file anymore.

Question about customizing parsing

I was elated to discover this project. It does exactly what I need for a project I just started two days ago (https://github.com/mark-hahn/jsw). I have spent the last two days investigating how to do a round trip from JS code to AST and back while preserving the original text. This was needed to avoid unnecessary git diffs.

A question:

I need to modify the CST to output a slightly different syntax than JS and this project will make that trivial. However, I will also need to later parse my modified syntax back into a CST. It is not obvious how I should do this.

Can anyone offer any suggestions for customizing the parsing into a CST?

Create context select methods

i.e so methods like selectNodesByType would be exist on other types of entities.

Even though they can be slow, it could be terribly useful.

Pluggable parser support

Would it be possible to decouple the main logic from parsing itself and provide support for pluggable parsers similarly to ESLint? As far as I understand, they're not hardly linked so far, so should be possible to do as soon as parser emits same AST and tokens format.

Windows support

I wanted to look at #87 to see if updating some libraries could help.
But I found that Window is not supported:
flow-bin
No binary found matching your system. It's probably not supported.
It seems there is some work to support Windows platform on flow-bin side, so I leave this issue mostly as a tracking one.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.