parser-ts's Issues

feature request: `between` and `surroundedBy`

Do you think it would be a good idea to add these two combinators to Parser?
They could be useful for parsing things like, for example, a phone number in the format (123)-456-789.

https://github.com/ekmett/parsers/blob/0bbae073a36c338f8205f529db3bbd94f297ba91/src/Text/Parser/Combinators.hs#L123-L135

In TS, it might look like

// `Do` here is the do-notation helper from fp-ts-contrib and `parser` is the parser-ts
// Monad instance; both would need to be imported.
export function between<I, A>(begin: Parser<I, A>, end: Parser<I, A>, p: Parser<I, A>): Parser<I, A> {
  return Do(parser)
    .do(begin)
    .bind('content', p)
    .do(end)
    .return(({ content }) => content);
}

export function surroundedBy<I, A>(bound: Parser<I, A>, p: Parser<I, A>): Parser<I, A> {
  return between(bound, bound, p);
}
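
For reference, a pipe-style sketch with the current pipeable API might look like the following (the curried shape is just one possible signature, and it assumes apFirst/apSecond from parser-ts/lib/Parser):

import { pipe } from 'fp-ts/lib/function'
import * as P from 'parser-ts/lib/Parser'

// Run `begin`, keep only the result of `p`, then require `end`.
export const between = <I, A>(begin: P.Parser<I, A>, end: P.Parser<I, A>) => <B>(
  p: P.Parser<I, B>
): P.Parser<I, B> => pipe(begin, P.apSecond(p), P.apFirst(end))

export const surroundedBy = <I, A>(bound: P.Parser<I, A>) => <B>(p: P.Parser<I, B>): P.Parser<I, B> =>
  between(bound, bound)(p)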

P.either should fail with a fatal error if the second parser failed fatally, or should it?

Take a look at the following test (which currently fails):

const fatalParser: P.Parser<Char, Char> = i => error(i, ['expected'], true)
const parser5 = P.either(C.char('a'), () => fatalParser)
assert.deepStrictEqual(S.run('c')(parser5), error(stream(['c']), ['"a"', 'expected'], true))

I believe the Semigroup for ParseResult should use getLastSemigroup on the fatal property, not getFirstSemigroup. I admit it is somewhat a philosophical question, as one could argue that I'm mistaken.

@gcanti thoughts?

If you think this should indeed be fixed, I have a PR ready to go

How to handle left-recursive grammar with parser-ts

I would like to use parser-ts to parse PromQL, whose formal definition may be found below:

  1. https://github.com/prometheus/prometheus/blob/main/promql/parser/generated_parser.y.go
  2. https://github.com/prometheus/lezer-promql/blob/main/src/promql.grammar

You may notice that BinaryExpr recursively references Expr, in which BinaryExpr is itself listed with a high priority.

But with the following code,

export const BinaryExprParser: P.Parser<C.Char, BinaryExpr> = pipe(
  ExprParser,
  P.bindTo('l'),
  P.apFirst(S.spaces),
  P.bind('op', () => binaryOpParser),
  P.apFirst(S.spaces),
  P.bind('r', () => ExprParser),
  P.map(a => ({ _tag: 'binary_expr', left: a.l, right: a.r, op: a.op })),
);

we encounter a "Maximum call stack size exceeded" error.

As suggested in https://stackoverflow.com/questions/29158248/parser-combinators-and-left-recursion, we could memoize the (position, parser) pair to avoid this issue. Is it possible to do this with parser-ts?
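
An alternative to memoization is the classic workaround of rewriting the left-recursive rule into an iterative one and folding the results back into a left-nested tree. A rough sketch of that approach (assuming an `Expr` union containing a `BinaryExpr` with the shape used above, plus `primaryParser` for the non-recursive alternatives and `binaryOpParser` as in the snippet):

import { pipe } from 'fp-ts/lib/function'
import * as P from 'parser-ts/lib/Parser'
import * as C from 'parser-ts/lib/char'
import * as S from 'parser-ts/lib/string'

// Expr := Primary (op Primary)*   instead of   Expr := Expr op Expr | Primary
const binaryTail = pipe(
  S.spaces,
  P.chain(() => binaryOpParser),
  P.bindTo('op'),
  P.apFirst(S.spaces),
  P.bind('r', () => primaryParser)
)

export const ExprParser: P.Parser<C.Char, Expr> = pipe(
  primaryParser,
  P.chain((first) =>
    pipe(
      P.many(binaryTail),
      // Fold the collected (op, right) pairs into a left-associative tree.
      P.map((tails) =>
        tails.reduce<Expr>(
          (left, { op, r }) => ({ _tag: 'binary_expr', left, right: r, op }),
          first
        )
      )
    )
  )
)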

[feature] repeat – many in terms of semigroup/monoid

Given a parser, the resulting parser must match at least once. If it does, it returns success with the combination of all immediately repeated matches.

I think this is essentially parser.many1, but it combines the results with a Semigroup instead of collecting them into an array.
I found it handy in what I'm doing; it might be useful for someone else.

import * as E from 'fp-ts/lib/Either'
import { Endomorphism, pipe } from 'fp-ts/lib/function'
import * as SE from 'fp-ts/lib/Semigroup'
import * as parser from 'parser-ts/lib/Parser'

/** Repeats a value until it no longer matches. Must match at least once or it fails. */
export function repeat<A>(
  semigroup: SE.Semigroup<A>,
): Endomorphism<parser.Parser<string, A>> {
  return (fa) =>
    pipe(
      fa,
      parser.chain((a) =>
        parser.ChainRec.chainRec(a, (a) =>
          pipe(
            fa,
            parser.map((b) => semigroup.concat(a, b)),
            parser.map(E.left),
            parser.alt(() => parser.succeed(E.right(a))),
          ),
        ),
      ),
      parser.alt(() => parser.fail()),
    )
}
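
For instance (a small usage sketch, assuming semigroupString from fp-ts/Semigroup and the code-frame run helper), concatenating repeated matches of a single-character parser:

import { semigroupString } from 'fp-ts/lib/Semigroup'
import * as C from 'parser-ts/lib/char'
import { run } from 'parser-ts/lib/code-frame'

// Matches one or more 'a's and concatenates them into a single string.
const repeatedA = repeat(semigroupString)(C.char('a'))

run(repeatedA, 'aaab') // => { _tag: 'Right', right: 'aaa' }
run(repeatedA, 'b')    // => fails, because `repeat` requires at least one match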

Add hexDigit combinator

🚀 Feature request

Current Behavior

Currently there is no hexadecimal digit combinator available in parser-ts/char.

Desired Behavior

The "core" library export a hexDigit combinator since it's quite common e.g. Haskell and Purescript

Suggested Solution

import * as P from 'parser-ts/lib/Parser'
import { Char } from 'parser-ts/lib/char'

const isHexDigit: (c: Char) => boolean = (c) =>
  "0123456789abcdef".indexOf(c.toLowerCase()) !== -1;

const hexDigit: P.Parser<Char, Char> = P.expected(P.sat(isHexDigit), "a hex digit");

Who does this impact? Who is this for?

All users, regardless of skill level, would benefit from this. It is not dependent on TS version.

Describe alternatives you've considered

It's easy enough to build this yourself and put it into your own codebase. I've done so while attempting to port over my own UUID and MAC address parsers written in PureScript.

Additional context

Your environment

Software Version(s)
fp-ts 2.16.1
parser-ts 0.7.0
TypeScript 5.2.2

Working with Excessively Long Strings

Thanks for this neat library! For fun I'm attempting to port Pandoc's markdown parser to parser-ts, but I'm noticing that the library really struggles with longer strings. Each of the following examples results in Uncaught RangeError: Maximum call stack size exceeded. The limit is even lower once I start combining parsers.

run(S.many(C.space),     " ".repeat(1000));
run(S.many(C.alphanum),  "x".repeat(1000));
run(S.many(C.digit),     "9".repeat(1000));
run(S.many(C.char("x")), "x".repeat(1000));

Do you have any tips for working with longer strings? Is this a fundamental limitation of this approach, or is there some hope of working around it by modifying the recursion primitives?
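
One possible workaround (a sketch, not an official API) is to hand-roll hot parsers as a loop over the Stream instead of building them from many/alt, so the recursion depth no longer grows with the input length. The sketch below returns the matched characters joined into a string:

import * as O from 'fp-ts/lib/Option'
import * as P from 'parser-ts/lib/Parser'
import { success } from 'parser-ts/lib/ParseResult'
import { getAndNext, Stream } from 'parser-ts/lib/Stream'
import { Char } from 'parser-ts/lib/char'

// Consumes characters while `pred` holds, iteratively rather than recursively.
const manyCharWhile = (pred: (c: Char) => boolean): P.Parser<Char, string> => (start) => {
  let next: Stream<Char> = start
  let out = ''
  for (;;) {
    const r = getAndNext(next)
    if (O.isNone(r) || !pred(r.value.value)) break
    out += r.value.value
    next = r.value.next
  }
  return success(out, next, start)
}

// e.g. manyCharWhile((c) => c === 'x') handles 'x'.repeat(100000) without
// touching the call-stack limit.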

(related to #41)

$ npm list | grep parser-ts
1:markdown-parser-ts@ /home/benjamin/projects/markdown-parser
20:├── [email protected]

Thanks!

Can @babel/code-frame be avoided?

📖 Documentation

Hello.
I noticed that most of the examples use the run function, which in turn requires @babel/code-frame. Can @babel/code-frame be avoided? My parser is meant to run in the browser, and this dependency is quite heavy (and I'm not even sure it can run properly there).
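
If it helps, a parser can be run without the code-frame helper by applying it to a Stream directly; a minimal sketch:

import * as S from 'parser-ts/lib/string'
import { stream } from 'parser-ts/lib/Stream'

// A Parser<Char, A> is just a function from Stream<Char> to ParseResult<Char, A>,
// so we can build the stream ourselves and skip parser-ts/lib/code-frame entirely.
const input = 'hello world'
const result = S.string('hello')(stream(input.split('')))
// `result` is the raw Either-based ParseResult; only the pretty error formatting
// from @babel/code-frame is lost.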

The whitespace parser only accepts space

The comment claims that it parses any whitespace character, but the code only parses space:

parser-ts/src/char.ts

Lines 51 to 56 in 01e0b93

function isSpace(c: Char): boolean {
  return c === ' '
}

/** Matches a single whitespace character. */
export const space = p.expected(p.sat(isSpace), 'whitespace')

The comments in lib/string are off in the same manner.
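
A sketch of a fix (assuming a regex test over a single Char is acceptable) would be:

const isSpace = (c: Char): boolean => /^\s$/.test(c)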

(Ab)use await syntax to get do-notation-style chain?

Is it possible to augment the API so that await can be used for the chain function argument?

Consider this as an example:

const space_padded = <A>(f: Parser<A>) =>
  pstr.spaces.chain(_ =>
  f.chain(a =>
  pstr.spaces.chain(_ => p.of(a))))

Then this could be written with async/await instead, making it look like Haskell do-notation:

const space_padded = async <A>(f: Parser<A>) => {
  await pstr.spaces.chain
  const a = await f.chain
  await pstr.spaces.chain
  return a
}

Perhaps Parser could extend Thenable or Promise to be able to skip chain altogether:

const space_padded = async <A>(f: Parser<A>) => {
  await pstr.spaces
  const a = await f
  await pstr.spaces
  return a
}

Would this be possible?

How to skip some parts

For example, I have a multi-line string with three types of lines:

  • comments (starting with //)
  • empty lines
  • some useful commands

How do I skip the empty lines and comment lines?
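
One possible approach (a sketch, assuming lines separated by '\n'; the parser names are hypothetical): parse each line as a comment, an empty line, or a command, map the first two to O.none, and drop them afterwards with A.compact.

import { pipe } from 'fp-ts/lib/function'
import * as A from 'fp-ts/lib/Array'
import * as O from 'fp-ts/lib/Option'
import * as C from 'parser-ts/lib/char'
import * as S from 'parser-ts/lib/string'
import * as P from 'parser-ts/lib/Parser'

const restOfLine = C.many(C.notChar('\n'))

// '// ...' up to the end of the line, producing nothing.
const commentLine: P.Parser<C.Char, O.Option<string>> = pipe(
  S.string('//'),
  P.chain(() => restOfLine),
  P.map(() => O.none)
)

// A non-empty line that is not a comment, kept as a command.
const commandLine: P.Parser<C.Char, O.Option<string>> = pipe(
  C.many1(C.notChar('\n')),
  P.map((s) => O.some(s))
)

// Whatever is left on the line (possibly ''), also dropped.
const emptyLine: P.Parser<C.Char, O.Option<string>> = pipe(
  restOfLine,
  P.map(() => O.none)
)

const line = pipe(
  commentLine,
  P.alt(() => commandLine),
  P.alt(() => emptyLine)
)

// Lines separated by newlines; comments and empty lines are compacted away.
export const commands: P.Parser<C.Char, Array<string>> = pipe(
  P.sepBy(C.char('\n'), line),
  P.map(A.compact)
)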

Design Question: Stream with Positioned Tokens

@gcanti - If you are willing, I would like to have a design discussion about how the problem described below can potentially be solved using parser-ts in the most efficient manner.

Question

How can I create a Stream in which I have access to the position of a given token (i.e. the line and column number) in the most efficient manner possible?

Description of the Issue

I would like to create a group of parsers that are aware of and can respond to the current indentation level, similar to how a Python parser needs to detect when the indentation level increases or decreases. Because the parser I am trying to create is context-sensitive, I am not sure the problem can be easily solved with parser-ts. However, I can think of two approaches that would at least allow me access to the position of tokens within a Stream.

Current State

The problem proposed above inherently lends itself to a Parser with a string source that parses a Stream of Chars. Therefore, when referring to parsers below I am actually referring to a parser of string characters:

import { Char } from 'parser-ts/Char'

type StringParser<A> = Parser<Char, A>

Within a Stream, we have access to the current cursor position which allows us to determine our current position within the buffer of unparsed character tokens. However, the cursor is unaware of the actual position of a character token within a given piece of source text.

Possible Solutions

Both solutions involve adding an additional piece of data to the Stream generated for a given Parser. In this case, the additional piece of data is the Pos interface described below.

/**
 * Represents the position of a token within the source text.
 */
export interface Pos {
  readonly line: number
  readonly col: number
}

Given a character and the previous position, the position of the character in question can be determined using the functions below:

/**
 * Constructs a new `Pos` given a line number and a column number.
 */
export const pos = (line: number, col: number): Pos => ({
  line,
  col
})

/**
 * The initial `Pos` within any source text.
 */
export const initialPos = pos(1, 1)

/**
 * Updates the `Pos` given a character and the previous `Pos` in the source text.
 */
export const updatePosChar: (c: Char, pos: Pos) => Pos = (char, { line, col }) => {
  switch (char) {
    case '\n':
      return pos(line + 1, 1)
    case '\r':
      return pos(line + 1, 1)
    case '\t':
      return pos(line, col + 8 - ((col - 1) % 8))
    default:
      return pos(line, col + 1)
  }
}

Solution 1

The first solution that I came up with involves modifying the type of data stored in the buffer of a Stream.

import { Char } from 'parser-ts/Char'
import { Pos } from 'parser-ts/Pos'
import { Stream } from 'parser-ts/Stream'

/**
 * Represents a character and its associated position within the source text.
 */
export interface PositionedChar {
  readonly char: Char
  readonly pos: Pos
}

/**
 * Represents a stream of positioned characters.
 */
export type CharStream = Stream<PositionedChar>

The buffer of positioned characters must be created ahead of time by first splitting the source text into an array of character tokens, and then mapping the resulting array into an array of PositionedChars. The inefficiency of the solution follows from the two separate passes through the stream of character tokens.
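
For concreteness, building that buffer might look like the following sketch (using the Pos helpers defined above):

const toPositionedChars = (source: string): Array<PositionedChar> => {
  let current = initialPos
  return source.split('').map((char) => {
    const positioned: PositionedChar = { char, pos: current }
    current = updatePosChar(char, current)
    return positioned
  })
}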

Solution 2

The second solution involves modification of the source code of parser-ts to include the Pos module as well as a new CharStream module given how common it is to use parser-ts on a stream of character tokens.

The proposed code for the CharStream module can be found below.

import { lookup } from 'fp-ts/lib/Array'
import { map, Option } from 'fp-ts/lib/Option'
import { pipe } from 'fp-ts/lib/function'

import { Char } from './char'
import { Pos, updatePosChar } from './Pos'

export interface CharStream {
  readonly buffer: Array<Char>
  readonly pos: Pos
  readonly cursor: number
}

export const charStream: (buffer: Array<Char>, pos: Pos, cursor?: number) => CharStream = (
  buffer,
  pos,
  cursor = 0
) => ({
  buffer,
  pos,
  cursor
})

export const get: (s: CharStream) => Option<Char> = s => lookup(s.cursor, s.buffer)

export const atEnd: (s: CharStream) => boolean = s => s.cursor >= s.buffer.length

export const getAndNext: (s: CharStream) => Option<{ readonly value: Char; readonly next: CharStream }> = s =>
  pipe(
    get(s),
    map(a => ({ value: a, next: charStream(s.buffer, updatePosChar(a, s.pos), s.cursor + 1) }))
  )

Please let me know if you think that this type of context-sensitive parsing is out of scope for parser-ts. I am very curious to hear your thoughts!

[Feature Request] Expose modules without lib/es6 prefix

🚀 Feature request

Current Behavior

As a parser-ts user, I will need to import modules like parser-ts/lib/Parser or parser-ts/es6/Parser.

Desired Behavior

Ideally, it would be great to import the module directly without the prefix, e.g. parser-ts/Parser, as fp-ts has already shipped in 2.8.0.

Suggested Solution

Maybe we can follow what fp-ts has done in gcanti/fp-ts#1241 to fix this.

Who does this impact? Who is this for?

parser-ts users

Describe alternatives you've considered

Additional context

Your environment

Software Version(s)
fp-ts 2.8.0
parser-ts 0.6.3
TypeScript 3.9.2

(I also filed another similar issue for fp-ts-contrib at gcanti/fp-ts-contrib#67)

New Feature: notString and notOneOf (string)

Apologies for opening yet another issue, but as I work on a parser for the GraphQL query language I keep coming across use cases not covered by parser-ts that might be beneficial to include in the main library.

Example

From the GraphQL specification:

EnumValue:
Name but not true or false or null

where a valid Name (in theory) could be the identifiers true, false, and null.

Proposal

I propose that we add the two additional parsers defined below to the string module. For examples, see the tests in the feature branch of my fork.

Let me know what you think!

/**
 * Fails if the specified string is matched, otherwise succeeds with an empty result and
 * consumes no input.
 *
 * @example
 * import { run } from 'parser-ts/lib/code-frame'
 * import * as S from 'parser-ts/lib/string'
 *
 * const parser = S.notString('foo')
 *
 * run(parser, 'bar')
 * // { _tag: 'Right', right: '' }
 *
 * run(parser, 'foo')
 * // { _tag: 'Left', left: '> 1 | foo\n    | ^ Expected: not "foo"' }
 * @category constructors
 * @since 0.6.8
 */
// Note: `charAt` (an Option-returning character lookup), `success` (from ParseResult),
// and `fold` (from the string module) are assumed to be in scope; their imports are omitted here.
export const notString: (s: string) => P.Parser<C.Char, string> = s => i => {
  const _string: (s2: string) => P.Parser<C.Char, string> = s2 =>
    pipe(
      charAt(0, s2),
      O.fold(
        () => P.succeed(''),
        c =>
          pipe(
            C.notChar(c),
            P.chain(() => _string(s2.slice(1))),
            P.chain(() => P.succeed(''))
          )
      )
    )
  return pipe(
    P.expected(_string(s), `not ${JSON.stringify(s)}`)(i),
    E.chain(next => success(next.value, i, i))
  )
}
/**
 * Fails if any of the specified strings are matched, otherwise succeeds with an empty result and
 * consumes no input.
 *
 * @category constructors
 * @since 0.6.8
 */
export function notOneOf<F extends URIS>(
  F: Functor1<F> & Foldable1<F>
): (ss: Kind<F, string>) => P.Parser<C.Char, string>
export function notOneOf<F>(F: Functor<F> & Foldable<F>): (ss: HKT<F, string>) => P.Parser<C.Char, string>
export function notOneOf<F>(F: Functor<F> & Foldable<F>): (ss: HKT<F, string>) => P.Parser<C.Char, string> {
  return ss => F.reduce(ss, P.succeed(''), (p, s) => fold([p, notString(s)]))
}

Import path problem in using ESM feature typescript@next

๐Ÿ› Bug report

Current Behavior

I am trying to build ECMAScript modules (ESM), which are supported by the TypeScript nightly build.

https://devblogs.microsoft.com/typescript/announcing-typescript-4-5-beta/

tsc fails on a *.mts file (the new source file extension for ESM) that imports the parser-ts module.

Error log:

node_modules/parser-ts/lib/Parser.d.ts(7,27): error TS2307: Cannot find module 'fp-ts/ChainRec' or its corresponding type declarations.
node_modules/parser-ts/lib/Parser.d.ts(410,16): error TS2665: Invalid module name in augmentation. Module 'fp-ts/lib/HKT' resolves to an untyped module at 'node_modules/fp-ts/lib/HKT/index.js', which cannot be augmented.
node_modules/parser-ts/lib/Parser.d.ts(429,40): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(434,48): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(439,36): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(449,32): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(454,48): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(459,37): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/Parser.d.ts(459,57): error TS2344: Type '"Parser"' does not satisfy the constraint 'keyof URItoKind2<any, any>'.
node_modules/parser-ts/lib/string.d.ts(6,33): error TS7016: Could not find a declaration file for module 'fp-ts/lib/HKT'. 'node_modules/fp-ts/lib/HKT/index.js' implicitly has an 'any' type.
  If the 'fp-ts' package actually exposes this module, try adding a new declaration (.d.ts) file containing `declare module 'fp-ts/lib/HKT';`

Expected behavior

tsc succeeds.

Reproducible example

main.mts

import { parser as P, string as S } from 'parser-ts';
console.log(P, S);

tsconfig.json

{
  "compilerOptions": {
    "strict": true,
    "module": "nodenext",
  }
}

package.json

{
  "name": "test_esm",
  "type": "module",
  "devDependencies": {
    "typescript": "^4.6.0-dev.20211113"
  },
  "dependencies": {
    "fp-ts": "^2.11.5",
    "parser-ts": "^0.6.16"
  },
  "scripts": {
    "build": "tsc"
  }
}

Suggested solution(s)

This issue seems to be fixable by changing the import paths:

  • 'fp-ts/ChainRec' to 'fp-ts/lib/ChainRec'
  • 'fp-ts/lib/HKT' to 'fp-ts/HKT'

Additional context

Your environment

Which versions of parser-ts are affected by this issue? Did this work in previous versions of parser-ts?

Software Version(s)
fp-ts 2.11.5
parser-ts 0.6.16
TypeScript next

RangeError: Maximum call stack size exceeded

import { pipe } from "fp-ts/lib/function"
import { char, digit, oneOf } from "parser-ts/lib/char"
import { run } from "parser-ts/lib/code-frame"
import {
  between,
  bind,
  bindTo,
  many,
  optional,
  Parser,
} from "parser-ts/lib/Parser"

const opP = oneOf("+*")

const exprP = pipe(
  digit,
  bindTo("digitLeft"),
  bind("op", () => opP),
  bind("digitRight", () => digit)
)

const maybeWithParenths = <T extends unknown>(p: Parser<string, T>) =>
  optional(between(char("("), char(")"))(p))

const mainP = many(maybeWithParenths(exprP))

const result = run(mainP, "(2+2)(3*3)")

console.log(result)

Am I doing something wrong or is this a bug?

Combination of alt and many runs infinitely

const keyword = C.many1(C.notSpace);
const whitespace = S.spaces;

const lang = pipe(
  keyword,
  P.alt(() => whitespace)
);

const input = `hello world mickey mouse`;
const res = run(P.many(lang), input);

expectation: ['hello', ' ', 'world', ' ', 'mickey', ' ', 'mouse']
actual: runs forever...

Feature Request: lookAhead and takeUntil

I have reached for the following utility parsers extensively in projects that use parser-ts. I was wondering if inclusion in the main library would be of interest. Any feedback on the implementation or function signatures is welcome!

Inspiration taken from

import * as E from 'fp-ts/lib/Either'
import { Parser } from 'parser-ts/lib/Parser'
import { success } from 'parser-ts/lib/ParseResult'

/**
 * Takes a `Parser` and tries to match it without consuming any input.
 *
 * @example
 * import { run } from 'parser-ts/lib/code-frame'
 * import * as P from 'parser-ts/lib/Parser'
 * import * as S from 'parser-ts/lib/string'
 *
 * const parser = S.fold([
 *   S.string('hello '),
 *   P.lookAhead(S.string('world')),
 *   S.string('wor')
 * ])
 *
 * run(parser, 'hello world')
 * // { _tag: 'Right', right: 'hello worldwor' }
 */
export const lookAhead = <I, A>(p: Parser<I, A>): Parser<I, A> => (i) =>
  E.either.chain(p(i), (next) => success(next.value, i, i))

Inspiration taken from

import { not, Predicate } from 'fp-ts/lib/function'
import { many, sat, Parser } from 'parser-ts/lib/Parser'

/**
 * Takes a `Predicate` and continues parsing until the given `Predicate` is satisfied.
 *
 * @example
 * import { run } from 'parser-ts/lib/code-frame'
 * import * as P from 'parser-ts/lib/Parser'
 * import * as C from 'parser-ts/lib/char'
 *
 * const parser = P.takeUntil((c: C.Char) => c === 'w')
 *
 * run(parser, 'hello world')
 * // { _tag: 'Right', right: [ 'h', 'e', 'l', 'l', 'o', ' ' ] }
 */
export const takeUntil = <I>(predicate: Predicate<I>): Parser<I, I[]> =>
  many(sat(not(predicate)))

Fatal error from filter

I'd like to be able to raise a fatal error from a filter combinator. Perhaps we could add some optional parameters for filter which pass through options to error.
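
Something along these lines might work (a sketch only; the name filterWith and the extra parameters are hypothetical, not part of parser-ts):

import * as E from 'fp-ts/lib/Either'
import { Predicate } from 'fp-ts/lib/function'
import * as P from 'parser-ts/lib/Parser'
import { error } from 'parser-ts/lib/ParseResult'

// Like `filter`, but lets the caller pass the `expected` messages and the
// `fatal` flag straight through to `error` when the predicate rejects.
const filterWith = <I>(expected: Array<string>, fatal: boolean) => <A>(predicate: Predicate<A>) => (
  p: P.Parser<I, A>
): P.Parser<I, A> => (i) => {
  const result = p(i)
  if (E.isLeft(result)) {
    return result
  }
  return predicate(result.right.value) ? result : error(i, expected, fatal)
}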

Happy to contribute code if you accept the feature.

export run from test/helpers.ts

The run function is handy for users writing their own tests, but it is not exposed to users consuming the module.

Would you accept a PR to expose it?

`doubleQuotedString` hits recursion limit on longer strings

Running the string.doubleQuotedString parser on long-ish strings causes it to hit the recursion limit. I tried getting around this by writing a regex to match such a string, but since streams of strings are arrays of characters, there is no way to match that regex to a string longer than a single character.

Is there some official workaround for this? Thanks!

Feature request: n-ary alts combinator

Do you think this combinator would fit?

export function alts<A>(...ps: Parser<A>[]): Parser<A> {
  return ps.reduce((p1, p2) => p1.alt(p2), p.fail)
}

I find it improves readability over chaining .alt calls.

NB: Would it be more efficient to use reduceRight and swap the order of the parsers to alt?
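
With the current pipeable API, an equivalent might look like this (a sketch, not an official addition):

import * as P from 'parser-ts/lib/Parser'

// Try each parser in order, falling back to the next on failure.
export const alts = <I, A>(...ps: Array<P.Parser<I, A>>): P.Parser<I, A> =>
  ps.reduce((p1, p2) => P.alt(() => p2)(p1), P.fail<I, A>())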

Support for NodeJS Buffer

🚀 Feature request

Firstly, great work on this lib and fp-ts in general. It has definitely changed the way I think about writing software. Secondly, the reason I'm here: I'm hoping you can help me use parser-ts more easily/efficiently for binary messages in a Node environment.

Any help or advice greatly appreciated!

Current Behavior

Currently it's not easy to use a Stream<A> with a buffer of the Node Buffer type. I'm able to make it work by casting to an Array<number> on construction, which works fine since the lib code is, I believe, only doing lookups, length checks, and slices, all of which are supported by Buffer and/or its parent interface Uint8Array.

Desired Behavior

It'd be nice to support the Buffer type as well. It'd also be nice to have getMany/getManyAndNext functions for extracting a slice of a buffer. For example, in the wire protocol I'm looking at (PostgreSQL), some messages have 32-bit big-endian length prefixes that indicate how many of the following bytes belong to that message.

Suggested Solution

I've been able to get some stuff working with a bit of casting here and there. Here's a test program to demonstrate:

import { pipe } from 'fp-ts/lib/function';
import * as O from 'fp-ts/lib/Option';
import * as E from 'fp-ts/lib/Either';
import * as P from 'parser-ts/Parser';
import * as PR from 'parser-ts/ParseResult';
import * as S from 'parser-ts/Stream';

type Byte = number;

const stream: (buf: Buffer, cursor?: number) => S.Stream<Byte> = (
  buf,
  cursor
) => S.stream(buf as unknown as Array<number>, cursor); // Buffer behaves enough like an Array for use by parser-ts - that is, supports: slice, at/[], length

function buffer(i: Buffer): P.Parser<Byte, Buffer>;
function buffer(i: string, enc?: BufferEncoding): P.Parser<Byte, Buffer>;
function buffer(
  i: string | Buffer,
  enc?: BufferEncoding
): P.Parser<Byte, Buffer> {
  let buf: Buffer;
  if (typeof i === 'string') {
    buf = Buffer.from(i, enc);
  } else {
    buf = i;
  }
  return P.expected(
    P.ChainRec.chainRec<Byte, Buffer, Buffer>(buf, (acc) =>
      pipe(
        O.fromNullable(acc.at(0)),
        O.fold(
          () => P.of(E.right(buf)),
          (c) =>
            pipe(
              P.sat((b: Byte) => b === c),
              P.chain(() => P.of(E.left(acc.slice(1))))
            )
        )
      )
    ),
    JSON.stringify(buf)
  );
}

const getManyAndNext: <A>(
  i: S.Stream<A>,
  count: number
) => O.Option<{
  value: A[];
  next: S.Stream<A>;
}> = (i, count) => {
  const endIndex = count + i.cursor;
  if (endIndex <= i.buffer.length) {
    return O.some({
      value: i.buffer.slice(i.cursor, endIndex), // our buffer using Buffer actually outputs a Buffer here, not an Array<number>
      next: S.stream(i.buffer, endIndex)
    });
  }
  return O.none;
};

const items: <A>(count: number) => P.Parser<A, Array<A>> =
  (count: number) => (i) =>
    pipe(
      getManyAndNext(i, count),
      O.fold(
        () => PR.error(i),
        ({ value, next }) => PR.success(value, next, i)
      )
    );

const uint32BE = pipe(
  items<number>(4),
  P.map((buf) => Buffer.from(buf).readUInt32BE()) // not ideal - creating a buf here is not necessary
  // P.map((buf) => (buf as unknown as Buffer).readUInt32BE()) // this is more efficient/ works too but relies on casting
);

const bufParser = pipe(
  buffer('x'),
  P.chain(() => uint32BE),
  P.chain((n) => items(n))
);

const buf = Buffer.allocUnsafe(11);
buf.write('x');
buf.writeUint32BE(5, 1);
buf.writeUint8(1, 5);
buf.writeUint8(2, 6);
buf.writeUint8(3, 7);
buf.writeUint8(4, 8);
buf.writeUint8(5, 9);
buf.writeUint8(6, 10);

console.log(bufParser(stream(buf)));

// Output:
// {
//   _tag: 'Right',
//   right: {
//     value: <Buffer 01 02 03 04 05>,
//     next: { buffer: <Buffer 78 00 00 00 05 01 02 03 04 05 06>, cursor: 10 },
//     start: { buffer: <Buffer 78 00 00 00 05 01 02 03 04 05 06>, cursor: 0 }
//   }
// }

Who does this impact? Who is this for?

This is for being able to create a parser that deals easily and efficiently with binary content.

Software Version(s)
fp-ts 2.11.9
parser-ts 0.6.16
TypeScript 4.5.2

[Example] Parsing a statement into a structured AST

I may be missing something fairly obvious, but I continue to struggle with using parser-ts to compose parsers to build up a structured abstract syntax tree.

My suggestion would be to collaborate here to provide a simple example that can then be displayed in an examples directory, similar to other fp-ts ecosystem libraries.

The example I have been working on involves parsing simple command line statements. It should be capable of handling a command, positional arguments, and flags/named arguments, in that order. An example statement might look like:

hello ./foo -b --bar=baz

I would like to parse this into the following structure:

{
  command: 'hello',
  args: { 
    named: {
      bar: "baz",
    }, 
    positional: {
      0: "./foo",
    } 
  },
  flags: {
    b: true
  }
}

Here is what I have so far:

import { pipe } from 'fp-ts/lib/function'
import * as C from 'parser-ts/lib/char'
import * as S from 'parser-ts/lib/string'
import * as P from 'parser-ts/lib/Parser'

// -------------------------------------------------------------------------------------
// models
// -------------------------------------------------------------------------------------

export type Statement = Command | Argument

export interface Command {
  _tag: 'Command'
  value: string
}

export type Argument = Flag | Named | Positional

export interface Flag {
  _tag: 'Flag'
  value: string
}

export interface Named {
  _tag: 'Named'
  name: string
  value: string
}

export interface Positional {
  _tag: 'Positional'
  value: string
}

// -------------------------------------------------------------------------------------
// constructors
// -------------------------------------------------------------------------------------

export const Command = (value: string): Command => ({ _tag: 'Command', value })

export const Flag = (value: string): Flag => ({ _tag: 'Flag', value })

export const Named = (name: string, value: string): Named => ({ _tag: 'Named', name, value })

export const Positional = (value: string): Positional => ({ _tag: 'Positional', value })

// -------------------------------------------------------------------------------------
// parsers
// -------------------------------------------------------------------------------------

const whitespaceSurrounded = P.surroundedBy(S.spaces)

const dash = C.char('-')

const doubleDash = S.string('--')

const equals = C.char('=')

const identifier = C.many1(C.alphanum)

const command = (cmd: string): P.Parser<string, Command> => pipe(S.string(cmd), P.map(Command))

const flag: P.Parser<string, Flag> = pipe(
  dash,
  P.chain(() => identifier),
  P.map(Flag)
)

const named: P.Parser<string, Named> = pipe(
  doubleDash,
  P.chain(() => P.sepBy1(equals, identifier)),
  P.map(([name, value]) => Named(name, value))
)

const positional: P.Parser<string, Positional> = pipe(C.many1(C.notSpace), P.map(Positional))

const argument = P.either<string, Argument>(flag, () =>
  P.either<string, Named | Positional>(named, () => positional)
)

Ideally, these parsers should be able to be composed to create the abstract syntax tree, but I am having trouble finding a clean solution.

One idea I had was to derive a Monoid instance for an array of Statements and then further derive a Monoid instance for a Parser of arrays of Statements. The array of parsed statements can then be used to build the AST.

import * as A from 'fp-ts/lib/Array'
import { fold } from 'fp-ts/lib/Monoid'
import { run } from 'parser-ts/lib/code-frame'

const monoidStatements = A.getMonoid<Statement>()

const monoidStatementsParser = P.getMonoid<string, Array<Statement>>(monoidStatements)

const statement = (cmd: string) =>
  fold(monoidStatementsParser)([
    pipe(command(cmd), P.map(A.of)),
    P.many(whitespaceSurrounded(argument))
  ])

run(statement('hello'), 'hello ./foo -b --bar=baz')
// {
//   _tag: 'Right',
//   right: [
//     { _tag: 'Command', value: 'hello' },
//     { _tag: 'Positional', value: './foo' },
//     { _tag: 'Flag', value: 'b' },
//     { _tag: 'Named', name: 'bar', value: 'baz' }
//   ]
// }

However, this solution seems somewhat awkward to me and I am not sure it is the best approach.

Any guidance would be much appreciated, and I think the library would benefit from having a simple yet robust example for people to check out!

Feature Request: manyTill and many1Till

Problem Statement

The manyTill and many1Till combinators from purescript-string-parsers allow for writing some interesting parsers.

-- | Parse values until a terminator.
manyTill :: forall a end. Parser a -> Parser end -> Parser (List a)

-- | Parse values until the terminator matches, requiring at least one match.
many1Till :: forall a end. Parser a -> Parser end -> Parser (NonEmptyList a)

For example, if we were writing an HTML parser and wanted to define a parser that could parse HTML comments, per the spec we would have to adhere to the following:

12.1.6 Comments
Comments must have the following format:

  1. The string "<!--".
  2. Optionally, text, with the additional restriction that the text must not start with the string ">", nor start with the string "->", nor contain the strings "<!--", "-->", or "--!>", nor end with the string "<!-".
  3. The string "-->".

Note: The text is allowed to end with the string "<!", as in "<!--My favorite operators are > and ".

My initial thought was to utilize the new filter combinator that we added in #30. This would work for rejecting invalid characters or strings encountered within the comment. However, because any other character is valid we would not have a reliable way of telling the parser to stop after the "-->" sequence.

Discussion

Under the hood, purescript-string-parsers uses the MonadRec type class to handle recursively iterating through the stream and accumulating values until the terminator parser is encountered.

I implemented a similar solution using the ChainRec type class from fp-ts.

import { tailRec } from 'fp-ts/ChainRec'
import * as E from 'fp-ts/Either'
import { error, success, ParseResult, ParseSuccess } from 'parser-ts/ParseResult'
import * as P from 'parser-ts/Parser'
import { Stream } from 'parser-ts/Stream'

interface Next<I, A> {
  readonly value: A
  readonly stream: Stream<I>
}

const chainRec = <I, A, B>(a: A, f: (a: A) => P.Parser<I, E.Either<A, B>>): P.Parser<I, B> => {
  const split = (result: ParseSuccess<I, E.Either<A, B>>): E.Either<Next<I, A>, ParseResult<I, B>> =>
    E.isLeft(result.value)
      ? E.left({ value: result.value.left, stream: result.next })
      : E.right(success(result.value.right, result.next, result.start))

  return (i) => tailRec({ value: a, stream: i }, (state) => {
    const result = f(state.value)(state.stream)
    if (E.isLeft(result)) {
      return E.right(error(state.stream))
    }
    return split(result.right)
  })
}

Solution

This allows us to define manyTill and many1Till in terms of chainRec.

import { pipe } from 'fp-ts/function'
import * as RA from 'fp-ts/ReadonlyArray'
import * as RNEA from 'fp-ts/ReadonlyNonEmptyArray'

export const manyTill = <I, A, B>(
  parser: P.Parser<I, A>,
  terminator: P.Parser<I, B>
): P.Parser<I, ReadonlyArray<A>> =>
  pipe(
    terminator,
    P.map<B, ReadonlyArray<A>>(() => RA.empty),
    P.alt<I, ReadonlyArray<A>>(() => many1Till(parser, terminator))
  )

export const many1Till = <I, A, B>(
  parser: P.Parser<I, A>,
  terminator: P.Parser<I, B>
): P.Parser<I, RNEA.ReadonlyNonEmptyArray<A>> => {
  return pipe(
    parser,
    P.chain((x) =>
      chainRec(RNEA.of(x), (acc) =>
        pipe(
          terminator,
          P.map(() => E.right(RNEA.reverse(acc))),
          P.alt(() =>
            pipe(
              parser,
              P.map((a) => E.left(RNEA.cons(a, acc)))
            )
          )
        )
      )
    )
  )
}
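
A rough usage sketch for the HTML-comment motivation above (it ignores the finer restrictions from the spec and only shows the shape of the combinator):

import { pipe } from 'fp-ts/function'
import * as C from 'parser-ts/char'
import * as S from 'parser-ts/string'
import * as P from 'parser-ts/Parser'

// Everything between '<!--' and the first '-->' becomes the comment text.
const htmlComment: P.Parser<C.Char, string> = pipe(
  S.string('<!--'),
  P.chain(() => manyTill(P.item<C.Char>(), S.string('-->'))),
  P.map((chars) => chars.join(''))
)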

@gcanti - I am curious what you think of my approach above. I know that ChainRec is marked for removal in fp-ts@v3 but it worked nicely for this use case. What would your thoughts be on adding these to the main library? Do you think these can be improved/cleaned up further?

How to compose and handle Optional types in parsers?

Suppose we have a get function of the type:

const get: (s: string[]) => Option<string>

Then we compose a parser like:

pipe(
  P.chain(() => P.many1(C.alphanum)),
  P.map(get),
  // fail if we get None, else extract value from Option monad,
  // so the expected type of the pipe is Parser<string, string> and not Parser<string, Option<string>>
)

How am I supposed to do what the comment specifies?
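
One way to do it (a sketch; assumes the get function declared above) is to chain into the Option and turn None into a failing parser:

import { pipe } from 'fp-ts/lib/function'
import * as O from 'fp-ts/lib/Option'
import * as C from 'parser-ts/lib/char'
import * as P from 'parser-ts/lib/Parser'

const keyword: P.Parser<string, string> = pipe(
  P.many1(C.alphanum),
  P.map(get),
  // None becomes a parse failure; Some(value) becomes a parser succeeding with the value.
  P.chain(
    O.fold(
      () => P.fail<string, string>(),
      (value) => P.succeed<string, string>(value)
    )
  )
)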
