syntax-parser's Introduction

syntax-parser

syntax-parser is a parser written in pure JavaScript, so it runs in both the browser and Node.js.

syntax-parser supports:

  • lexer.
  • parser.

Lexer

createLexer helps you create a lexer from a list of token rules.

Example

import { createLexer } from 'syntax-parser';

const myLexer = createLexer([
  {
    type: 'whitespace',
    regexes: [/^(\s+)/],
    ignore: true
  },
  {
    type: 'word',
    regexes: [/^([a-zA-Z0-9]+)/]
  },
  {
    type: 'operator',
    regexes: [/^(\+)/]
  }
]);

myLexer('a + b');
// [
//   { "type": "word", "value": "a", "position": [0, 1] },
//   { "type": "operator", "value": "+", "position": [2, 3] },
//   { "type": "word", "value": "b", "position": [4, 5] }
// ]

type

The token type name. Any value is allowed here; you will refer to it in the parser stage.

regexes

The regexes used to match this token type.

ignore

When set to true, matching tokens are not added to the token result queue.

In general, whitespace can be ignored in syntax parsing.
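
To make the roles of type, regexes, and ignore concrete, here is a minimal sketch of how a rule-based lexer of this shape could work. It is not syntax-parser's implementation: createSimpleLexer is a hypothetical name, and the [start, end) position convention is an assumption chosen to reproduce the example output above.

```javascript
// Hypothetical stand-in for createLexer, for illustration only.
function createSimpleLexer(rules) {
  return function lex(input) {
    const tokens = [];
    let offset = 0;
    while (offset < input.length) {
      const rest = input.slice(offset);
      let matched = false;
      for (const rule of rules) {
        // Pick the first regex of this rule that matches at the current offset.
        const regex = rule.regexes.find(r => r.test(rest));
        if (!regex) continue;
        const value = rest.match(regex)[1];
        if (value.length === 0) continue; // Guard against empty matches.
        if (!rule.ignore) {
          // Ignored rules consume input but produce no token.
          tokens.push({
            type: rule.type,
            value,
            position: [offset, offset + value.length]
          });
        }
        offset += value.length;
        matched = true;
        break;
      }
      if (!matched) {
        throw new Error(`Unexpected character at offset ${offset}: ${rest[0]}`);
      }
    }
    return tokens;
  };
}

const lex = createSimpleLexer([
  { type: 'whitespace', regexes: [/^(\s+)/], ignore: true },
  { type: 'word', regexes: [/^([a-zA-Z0-9]+)/] },
  { type: 'operator', regexes: [/^(\+)/] }
]);

console.log(lex('a + b'));
```

Because every regex is anchored with ^, each rule is only ever tried against the start of the remaining input, which is why rule order matters in a design like this.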

Parser

createParser helps you create a parser. A parser requires a lexer.

import { createParser, chain, matchTokenType, many } from 'syntax-parser';

const root = () => chain(addExpr)(ast => ast[0]);

const addExpr = () =>
  chain(matchTokenType('word'), many(addPlus))(ast => ({
    left: ast[0].value,
    operator: ast[1] && ast[1][0].operator,
    right: ast[1] && ast[1][0].term
  }));

const addPlus = () =>
  chain('+', root)(ast => ({
    operator: ast[0].value,
    term: ast[1]
  }));

const myParser = createParser(
  root, // Root grammar.
  myLexer // Created in lexer example.
);

myParser('a + b');
// ast:
// [{
//   "left": "a",
//   "operator": "+",
//   "right": {
//     "left": "b",
//     "operator": null,
//     "right": null
//   }
// }]

chain

The basic grammar element. It supports four kinds of parameters:

string

A string matches a token:

chain('select', 'table'); // Match 'select table'

array

Array means 'or':

chain('select', ['table', 'chart']); // Match both 'select table' and 'select chart'

matchTokenType

matchTokenType lets you match a token type defined in the lexer.

chain('select', matchTokenType('word')); // Match 'select [any word!]'

function

It's easy to call another chain function:

const a = () => chain('select', b);
const b = () => chain('table');

many/optional

These behave as their names suggest:

const a = () => chain('select', optional('table')); // Match both 'select' and 'select table'
const b = () => chain('select', many(',', matchTokenType('word'))); // Match both 'select' and 'select a' and 'select a, b' .. and so on.

optional and many can also take a chain as a parameter: many(chain(..)).

The trailing callback lets you redefine the local AST produced by the chain:

chain('select', 'table')(
  ast => ast[0] // return 'select'
);
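
The semantics described above (a string matches a token, an array means "or", optional and many wrap other elements) can be sketched with a few plain functions over an array of tokens. This is only an illustration of the idea, not how syntax-parser is implemented: seq, anyOf, optionalOf, manyOf, and matches are hypothetical names, tokens are plain strings split on whitespace, and grammar functions are matchers rather than the lazy () => chain(...) form used by the library.

```javascript
// A matcher takes (tokens, index) and returns { ast, next } or null on failure.
const token = value => (tokens, i) =>
  i < tokens.length && tokens[i] === value ? { ast: tokens[i], next: i + 1 } : null;

// string -> token match, array -> ordered choice, function -> used as a matcher.
const toMatcher = el => {
  if (typeof el === 'string') return token(el);
  if (Array.isArray(el)) return anyOf(el.map(toMatcher));
  return el;
};

// Array semantics: the first alternative that matches wins.
const anyOf = matchers => (tokens, i) => {
  for (const m of matchers) {
    const r = m(tokens, i);
    if (r) return r;
  }
  return null;
};

// Chain semantics: every element must match, in order.
const seq = (...els) => (tokens, i) => {
  const ast = [];
  let next = i;
  for (const m of els.map(toMatcher)) {
    const r = m(tokens, next);
    if (!r) return null;
    ast.push(r.ast);
    next = r.next;
  }
  return { ast, next };
};

// Zero or one occurrence.
const optionalOf = el => (tokens, i) =>
  toMatcher(el)(tokens, i) || { ast: null, next: i };

// Zero or more occurrences of the given sequence.
const manyOf = (...els) => (tokens, i) => {
  const ast = [];
  let next = i;
  const m = seq(...els);
  for (let r = m(tokens, next); r && r.next > next; r = m(tokens, next)) {
    ast.push(r.ast);
    next = r.next;
  }
  return { ast, next };
};

// True when the matcher consumes the whole whitespace-separated input.
const matches = (matcher, input) => {
  const tokens = input.split(/\s+/).filter(Boolean);
  const r = matcher(tokens, 0);
  return r !== null && r.next === tokens.length;
};

console.log(matches(seq('select', ['table', 'chart']), 'select chart')); // true
console.log(matches(seq('select', optionalOf('table')), 'select'));      // true
console.log(matches(seq('select', manyOf('and', 'table')), 'select and table and table')); // true
```

The trailing-callback idea maps onto the ast array that seq returns: a callback such as ast => ast[0] would simply pick one entry out of it.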

Tests

npm test

Monaco Editor SQL Editor

If you want to see this demo, run this command:

npm run docs

Then select the Monaco Editor demo.


syntax-parser's Issues

Question

I slightly modified the arithmetic example to declare the grammar recursively, and it went into an infinite loop. Is my grammar wrong?

const myLexer = createLexer([
    {
      type: 'whitespace',
      regexes: [/^(\s+)/],
      ignore: true
    },
    {
      type: 'word',
      regexes: [/^([a-zA-Z0-9]+)/]
    },
    {
      type: 'add',
      regexes: [/^(\+)/]
    },
    {
      type: 'minus',
      regexes: [/^(\-)/]
    },
    {
      type: 'times',
      regexes: [/^(\*)/]
    },
    {
      type: 'div',
      regexes: [/^(\/)/]
    },
    {
      type: 'lparen',
      regexes: [/^(\()/]
    },
    {
      type: 'rparen',
      regexes: [/^(\))/]
    }
  ]);
const addExpr= () => chain([
  chain(matchTokenType('word'), '+', matchTokenType('word'))(ast => ({
    left: ast[0],
    operator: '+',
    right: ast[2]
  })),
  chain(matchTokenType('word'), '-', matchTokenType('word')) (ast => ({
    left: ast[0],
    operator: '-',
    right: ast[2]
  })),
  chain(addExpr, '+', matchTokenType('word')) (ast => ({
    left: ast[0],
    operator: '+',
    right: ast[2]
  })),
  chain(addExpr, '-', matchTokenType('word')) (ast => ({
    left: ast[0],
    operator: '-',
    right: ast[2]
  }))
]) (ast => ast[0])

const root = () => chain(addExpr)(ast => ast[0]);

const myParser = createParser(
  root, // Root grammar.
  myLexer // Created in lexer example.
);

console.log(JSON.stringify(myParser('a + b - c').ast))
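
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the third and fourth alternatives of addExpr start with addExpr itself, which makes the grammar left-recursive. Top-down parsers without special left-recursion handling usually recurse forever on such rules. The usual workaround is to rewrite the rule as "word followed by zero or more (operator word) pairs" and fold the pairs to the left. The sketch below shows that rewrite on a plain token array; parseAddExpr is a hypothetical helper and does not use syntax-parser.

```javascript
// Parses: addExpr := word (('+' | '-') word)*
// and builds a left-associative tree without any left recursion.
function parseAddExpr(tokens) {
  const isWord = t => typeof t === 'string' && /^[a-zA-Z0-9]+$/.test(t);
  let index = 0;

  if (!isWord(tokens[index])) {
    throw new Error(`Expected a word at token ${index}`);
  }
  let left = tokens[index++];

  // Each loop iteration consumes one "operator word" pair.
  while (tokens[index] === '+' || tokens[index] === '-') {
    const operator = tokens[index++];
    const right = tokens[index++];
    if (!isWord(right)) {
      throw new Error(`Expected a word after '${operator}'`);
    }
    // Fold to the left: the tree built so far becomes the new left operand.
    left = { left, operator, right };
  }

  if (index !== tokens.length) {
    throw new Error(`Unexpected token '${tokens[index]}' at token ${index}`);
  }
  return left;
}

console.log(JSON.stringify(parseAddExpr(['a', '+', 'b', '-', 'c'])));
```

With the library itself, the same shape would presumably be expressed as a word followed by many(chain(...)), in the style of the README's addExpr example.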

Error: Cannot find module 'syntax-parser'

Node version: v8.11.3

Steps:

  1. npm i syntax-parser --save

  2. Code:

const { createLexer } = require('syntax-parser');

const myLexer = createLexer([
    {
        type: 'whitespace',
        regexes: [/^(\s+)/],
        ignore: true
    },
    {
        type: 'word',
        regexes: [/^([a-zA-Z0-9]+)/]
    },
    {
        type: 'operator',
        regexes: [
            /^(\(|\))/, // '(' ')'.
            /^(\+|\-)/ // operators for + -.
        ]
    }
]);
console.log(myLexer('a + b'));
  3. Result:
module.js:549
    throw err;
    ^

Error: Cannot find module 'syntax-parser'
    at Function.Module._resolveFilename (module.js:547:15)
    at Function.Module._load (module.js:474:25)
    at Module.require (module.js:596:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (~/sqlweb/test.js:2:25)
    at Module._compile (module.js:652:30)
    at Object.Module._extensions..js (module.js:663:10)
    at Module.load (module.js:565:32)
    at tryModuleLoad (module.js:505:12)
    at Function.Module._load (module.js:497:3)

Add LICENSE

English:

Hello,

I'm developing an application based on your repository on GitHub. I respect and appreciate your work greatly. I've noticed that your repository does not include a LICENSE file, and I'm a bit unsure about how I can properly and legally use and modify your code. If it's possible, could you please add an open-source license to clearly outline the terms of use and distribution of your code? This will not only protect your rights but also provide clear guidance for other users. Thank you very much for your understanding and support.

Looking forward to your response.

Thank you

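A problem parsing arithmetic expressions

I ran the arithmetic expression example from the home page and found that the printed syntax tree is incomplete.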

import { createLexer } from './lexer'
import { chain, createParser, many, matchTokenType } from './parse'
const myLexer = createLexer([
    {
      type: 'whitespace',
      regexes: [/^(\s+)/],
      ignore: true
    },
    {
      type: 'word',
      regexes: [/^([a-zA-Z0-9]+)/]
    },
    {
      type: 'operator',
      regexes: [/^(\+)/]
    }
  ]);
  
  
  const root = () => chain(addExpr)(ast => ast[0]);

const addExpr = () =>
  chain(matchTokenType('word'), many(addPlus))(ast => ({
    left: ast[0].value,
    operator: ast[1] && ast[1][0].operator,
    right: ast[1] && ast[1][0].term
  }));

const addPlus = () =>
  chain('+', root)(ast => ({
    operator: ast[0].value,
    term: ast[1]
  }));

const myParser = createParser(
  root, // Root grammar.
  myLexer // Created in lexer example.
);

console.log(myParser('a + b + c + d + f + e + h').ast)


The printed result is:
{ left: 'a',
  operator: '+',
  right:
   { left: 'b',
     operator: '+',
     right: { left: 'c', operator: '+', right: [Object] } } }
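
The [Object] placeholder in this output comes from how Node.js prints nested objects: console.log formats values with util.inspect, whose default depth is 2, so deeper levels are collapsed. It does not by itself show that the tree is incomplete. A small sketch with a hand-built object of the same shape (the nested value is made up for illustration):

```javascript
const util = require('util');

// Hand-built object with the same nesting as the parser output above.
const nested = {
  left: 'a',
  operator: '+',
  right: {
    left: 'b',
    operator: '+',
    right: {
      left: 'c',
      operator: '+',
      right: { left: 'd', operator: null, right: null }
    }
  }
};

// Default depth (2): the innermost object is collapsed to [Object].
console.log(util.inspect(nested));

// depth: null removes the limit and prints every level.
console.log(util.inspect(nested, { depth: null }));

// JSON.stringify also serializes the full tree.
console.log(JSON.stringify(nested, null, 2));
```

Applied to the code above, printing JSON.stringify(myParser('a + b + c + d + f + e + h').ast, null, 2) should show the whole tree.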

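[Question] Practices around SQL autocompletion

I have recently been looking into web-based SQL editors, and it looks like you have done a lot of related work. I would like to ask:

  1. Could we exchange notes on related practices, such as syntax parsing, suggestions for syntax and for database and table names, and error parsing (which may be the same thing as syntax parsing)?
  2. Would you be interested in writing a SQL autocompletion demo based on this library? From the demo in the docs alone, I still do not quite understand how to use it.
  3. Could we connect on WeChat?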

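When a SQL statement is preceded by blank lines, the error is not shown at the correct position

When a SQL statement is preceded by blank lines, error.token in the parse result is null, so the position of the error cannot be reported correctly.

The parse result is as follows: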

{
  "success": false,
  "ast": null,
  "cursorKeyPath": [],
  "nextMatchings": [
    {
      "type": "string",
      "value": "UPDATE"
    },
    {
      "type": "string",
      "value": "set"
    },
    {
      "type": "string",
      "value": "insert"
    },
    {
      "type": "string",
      "value": "create"
    },
    {
      "type": "string",
      "value": "select"
    }
  ],
  "error": {
    "suggestions": [
      {
        "type": "string",
        "value": "select"
      },
      {
        "type": "string",
        "value": "create"
      },
      {
        "type": "string",
        "value": "insert"
      },
      {
        "type": "string",
        "value": "set"
      },
      {
        "type": "string",
        "value": "UPDATE"
      }
    ],
    "token": null,
    "reason": "incomplete"
  },
  "debugInfo": {
    "tokens": [
      {
        "type": "word",
        "value": "selec",
        "position": [
          2,
          6
        ]
      }
    ],
    "callVisiterCount": 38,
    "costs": {
      "lexer": 0,
      "parser": 0
    }
  }
}

How the demo handles the error position:

const errorPosition = parseResult.error.token
  ? {
      startLineNumber: model.getPositionAt(parseResult.error.token.position[0]).lineNumber,
      startColumn: model.getPositionAt(parseResult.error.token.position[0]).column,
      endLineNumber: model.getPositionAt(parseResult.error.token.position[1]).lineNumber,
      endColumn: model.getPositionAt(parseResult.error.token.position[1]).column + 1
    }
  : {
      startLineNumber: 0,
      startColumn: 0,
      endLineNumber: 0,
      endColumn: 0
    };

Expected result:
For an incomplete statement, error.token should carry the token at the error position, for example {type: "string", value: "selec", position: [1, 6]}
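
Until the parser reports a token here, one possible workaround on the editor side is to fall back to the last lexed token in debugInfo.tokens when error.token is null. The sketch below is an assumption about how that could look, not code from the demo: resolveErrorRange and makeGetPositionAt are hypothetical helpers, and makeGetPositionAt only imitates model.getPositionAt with 1-based lines and columns for "\n"-separated text.

```javascript
// Imitates model.getPositionAt for text that uses '\n' line endings.
function makeGetPositionAt(text) {
  return offset => {
    const before = text.slice(0, offset);
    const lineNumber = before.split('\n').length;
    const column = offset - (before.lastIndexOf('\n') + 1) + 1;
    return { lineNumber, column };
  };
}

// Prefer error.token; otherwise fall back to the last token the lexer produced.
function resolveErrorRange(parseResult, getPositionAt) {
  const tokens = (parseResult.debugInfo && parseResult.debugInfo.tokens) || [];
  const token = parseResult.error.token || tokens[tokens.length - 1];
  if (!token) return null; // Nothing to point at.

  const start = getPositionAt(token.position[0]);
  const end = getPositionAt(token.position[1]);
  return {
    startLineNumber: start.lineNumber,
    startColumn: start.column,
    endLineNumber: end.lineNumber,
    endColumn: end.column + 1
  };
}

// Trimmed-down copy of the parse result shown above.
const parseResult = {
  error: { token: null, reason: 'incomplete' },
  debugInfo: { tokens: [{ type: 'word', value: 'selec', position: [2, 6] }] }
};

const range = resolveErrorRange(parseResult, makeGetPositionAt('\n\nselec'));
console.log(range);
// { startLineNumber: 3, startColumn: 1, endLineNumber: 3, endColumn: 6 }
```

Returning null instead of an all-zero range leaves it to the caller to decide whether to draw a marker at all.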

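The grammar for the insert statement seems to have a problem

In src/demo/sql-parser/mysql/parser.ts, the author's grammar looks like this:
chain('insert', optional('ignore'), 'into', tableName, optional(selectFieldsInfo),[selectStatement])
This does not seem to match the MySQL syntax:
INSERT INTO table_name (column1,column2,column3,...) VALUES (value1,value2,value3,...)
When you have time, could you please fix this, or take a look at the pull request I opened (#9)? Any feedback is welcome.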

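[Question] Real-world use of this project

Hello, I have a few questions:
1. Is this project being used in any real-world projects?
2. I noticed that the Alibaba Cloud DataWorks editor also has syntax parsing. Is it related to this project?
:D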

The mysql lexer in the demos cannot match \r\n line endings

When matching comments, the mysql lexer uses \n as the line ending:

{
  type: 'comment',
  regexes: [
    /^((?:#|--).*?(?:\n|$))/, // # --
    /^(\/\*[^]*?(?:\*\/|$))/, // /* */
  ],
  ignore: true,
},

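In monaco-editor, the default line ending follows the content (EndOfLinePreference). In my case the default is \r\n, which causes the comment to go unmatched.

Code to reproduce (online example):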

const editor = monaco.editor.create(document.getElementById("container"), {
	value: "--\r\n",
	language: "javascript"
});

function test(input) {
    console.log(input.split(''), input.match(/^((?:#|--).*?(?:\n|$))/))
}

test(editor.getModel()?.getValue());
test(editor.getModel()?.getValue(monaco.editor.EndOfLinePreference.CRLF));
test(editor.getModel()?.getValue(monaco.editor.EndOfLinePreference.LF));

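Solutions:

  • Option 1: When reading content from monaco-editor, pass the EndOfLinePreference.LF option so that the returned content uses \n line endings.
  • Option 2: Change the mysql lexer regex to /^((?:#|--).*?(?:\r\n|\n|$))/ so that it accepts both \n and \r\n line endings.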
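
The difference between the two regexes can be checked without monaco-editor. In JavaScript, "." does not match "\r" or "\n", and "$" without the m flag only matches at the end of the input, so the original pattern cannot get past the "\r" in a CRLF-terminated comment. The variable names below are made up for this sketch:

```javascript
// Original comment regex from the demo lexer: only '\n' or end of input ends a comment.
const lfOnly = /^((?:#|--).*?(?:\n|$))/;

// Proposed regex from option 2: also accepts '\r\n'.
const crlfAware = /^((?:#|--).*?(?:\r\n|\n|$))/;

const crlfInput = '-- comment\r\nselect 1';
const lfInput = '-- comment\nselect 1';

console.log(lfOnly.exec(crlfInput));       // null: '.' cannot consume '\r'
console.log(crlfAware.exec(crlfInput)[1]); // '-- comment\r\n'
console.log(lfOnly.exec(lfInput)[1]);      // '-- comment\n'
console.log(crlfAware.exec(lfInput)[1]);   // '-- comment\n'

// Option 1 has the same effect as normalizing line endings before lexing.
const normalized = crlfInput.replace(/\r\n/g, '\n');
console.log(lfOnly.exec(normalized)[1]);   // '-- comment\n'
```

Option 1 keeps the lexer unchanged but means token positions refer to the LF-normalized text, while option 2 lets the lexer work directly on the editor's raw content.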
