
skulpt_parser's Introduction

Welcome to Skulpt

Join the chat at https://gitter.im/skulpt/skulpt

Skulpt is a JavaScript implementation of Python 2.x. Python that runs in your browser! Python that runs on your iPad! It's being used in several projects, including Interactive Python Textbooks, where you can see Skulpt in action. Try out some turtle graphics examples to see more of what Skulpt can do.


Origins

Skulpt is the brainchild of Scott Graham. See Skulpt.org for some early demos of skulpt in action.

Brad Miller has been project maintainer since sometime in 2010/2011, along with core contributors Albert-Jan Nijburg, Scott Rixner, Meredydd Luff, and others.

Current Priorities

  • We have updated our development toolchain to include nodejs and webpack. If you have been a developer in the past make sure you check out the documentation for the current procedures on building skulpt.
  • Work on Python3 - With python 2 coming to the end of its life at the end of this year, being more and more Python3 compliant is a high priority. Of course how Python3 you need to be depends on the situation. For many uses, Skulpt is already there. But for more advanced work we are not. You can keep up with this work by configuring skulpt to run in Python3 mode.
Sk.configure({
    // ... other settings
    __future__: Sk.python3
});

How can I help?

Welcome to the Skulpt developer community! We welcome new developers of all levels and abilities. Check out the ideas list below, and then the practical notes on getting started after that.

Ideas List

  1. Python 3 -- see above.

  2. Expand the Skulpt standard library to include more modules from the CPython standard library. So far we have math, random, turtle, unittest, image, and partial implementations of time, urllib, DOM, and re. Any of the partial modules could be completed, or many other CPython modules could be added. Potential new modules from the standard library include functools, itertools, collections, datetime, operator, and string. Many of these would be relatively easy projects for a less experienced student to take on.

  3. Over time we have had numerous requests for more advanced Python modules to be included in Skulpt, including portions of matplotlib, tkinter, and numpy. These are much more challenging because their implementations contain C code, but if a reasonable subset could be implemented in JavaScript, it would become much easier to add many more Python modules that rely on these three. It would also allow Skulpt to be used in teaching an even broader set of topics.

  4. Expand and clean up the foreign function API. This API is critical for implementing parts of the standard library.

  5. Do a better job of supporting Python 3 semantics, while keeping Python 2/Python 3 behavior configurable with a single flag. Sk.python3 already exists for this purpose. Another positive step in this direction would be to update our grammar to Python 2.7. Updating the grammar would allow us to add set literals, dictionary comprehensions, and other features present in 2.7.x and Python 3.3.x. This would be an excellent project for a student interested in language design, parsing, and the use of abstract syntax trees.

  6. Make DOM access fully workable, and expand its support as part of the standard library.

  7. Expand and improve overall language coverage. Currently Skulpt does an excellent job of meeting the 80/20 rule: we cover the vast majority of the language features used by 80% (maybe even 90%) of code. But there are builtins that are not implemented at all, and builtins with only partial implementations.

  8. Implement the hooks for a debugger. This may be a half step towards 1, or may go in a completely different direction, but allowing students to debug a program they have written line by line would have real benefit.

Building Skulpt

Building Skulpt is straightforward:

  1. Clone the repository from GitHub, ideally using your own fork if you're planning on making any contributions
  2. Install node.js
  3. Install the required dependencies using npm install
  4. Navigate to the repository and run npm run dist
  5. The tests should run and you will find skulpt.min.js and skulpt-stdlib.js in the dist folder

Contributing

There is plenty of work still to do in making improvements to Skulpt. If you would like to contribute:

  1. Create a Github account if you don't already have one
  2. Create a fork of the Skulpt repository -- this will make a clone of the repository in your account. DO NOT clone this one. Once you've made the fork, clone the forked version from your account to your local machine for development.
  3. Read the HACKING.md file to get the "lay of the land". If you plan to work on creating a module then you may also find this blog post helpful.
  4. Check the issues list for something to do.
  5. Follow the instructions above to get Skulpt building
  6. Fix bugs or add your own features. Commit and push to your forked version of the repository. When everything is tested and ready to be incorporated into the master version...
  7. Make a Pull Request to get your feature(s) added to the main repository.

Community

Check out the mailing list: https://groups.google.com/forum/?fromgroups#!forum/skulpt

Acknowledgements

As time goes on, it's getting more dangerous to try to acknowledge everyone who has contributed to the project. And, after all, this is git, so their names are all in the historical record. But there are a few to call out.

  • First and foremost to Scott Graham for starting the original project.
  • Bob Lacatena for lots of work on Python longs
  • Charles Severence for bug fixes and the re module.
  • Leszek Swirski and Meredydd Luff for Suspensions
  • Albert-Jan Nijburg for countless bug fixes and process improvements
  • Ben Wheeler for the new and improved turtle module
  • Scott Rixner and students for many bug fixes and improvements
  • Of course, The complete list is here: https://github.com/skulpt/skulpt/graphs/contributors

skulpt_parser's People

Contributors

albertjan, pre-commit-ci[bot], s-cork


skulpt_parser's Issues

licence

We should probably add a license file?

implement EXTRA

EXTRA keeps track of the lineno, col_offset, end_lineno, and end_col_offset. At the moment this is just a temporary empty list, to stop the compiler breaking.

Searching for EXTRA in cpython's c_generator.py/parse.c might help with implementing this and including it in the generated code.
The python_generator that skulpt_generator descends from doesn't keep track of these variables, since (afaik) it didn't get that far.
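
A minimal sketch of what this might look like (the Token shape and helper are assumptions, loosely following what CPython's generated C code computes per rule):

// Assumed tokenize-style token shape: (row, col) pairs.
interface Token {
    start: [number, number];
    end: [number, number];
}

// The attributes CPython attaches to each AST node via EXTRA.
interface Attrs {
    lineno: number;
    col_offset: number;
    end_lineno: number;
    end_col_offset: number;
}

// Hypothetical helper: derive EXTRA from the first and last tokens
// consumed by a rule.
function extra(first: Token, last: Token): Attrs {
    return {
        lineno: first.start[0],
        col_offset: first.start[1],
        end_lineno: last.end[0],
        end_col_offset: last.end[1],
    };
}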

Handling sequences with null

def f(*a, b):
    pass

This ends up with kw_defaults=[null]
but it should be kw_defaults=[None]

I think this is correct at the point of the AST; it's just a matter of working out the best place to convert it to None.

This particular path could be solved in:

export class NameDefaultPair {
    arg: arg;
    value: expr | pyNoneType;
    constructor(arg: arg, value: expr | null) {
        this.arg = arg;
        this.value = value ?? pyNone;
    }
}

and then fixing all the TypeScript problems that result... but I don't know if that's a good idea.
...it might mean generating our own asdl patch.

Clarify grammar workflow and move grammar to this repo

I vote 3...
so then the workflow for working with the grammar could be something like:

Generate patch

  1. Copy python.gram into skulpt_parser somewhere (maybe tools/grammar_patch).
  2. git diff with skulpt.dev.gram and copy the output to the same folder.
  3. Apply the patch to skulpt.gram somewhere (maybe src/grammar), either as the apply_grammar_patch or just as step 3 in the above.

Then I think we can abandon #12, and it can just live in the tools/grammar_patch folder as a future helper/reference function.
(I'll adjust that pr after this pr)

optimized ast

Before running the symtable in compile.c there is an AST optimization phase that we should explore.

It seems to do some constant folding of literal values.

See ast_opt.c

python errors

Same issue we had with the constant types.

Maybe we should move our error classes into a similar namespace for now.

incorrect return statement

return newline, indent;

When I turned off //@ts-nocheck, there were 6 instances of this return statement that were complained about.

Seems strange.
It might be OK, because from what I can tell a truthy return value ends up as a raised error.
But there's probably a better way to do the same thing without TypeScript complaining.

Here's the line in the python generator that we need to keep in mind:

                action = node.action
                if not action:
                    if is_gather:
                        assert len(self.local_variable_names) == 2
                        action = (
                            f"[{self.local_variable_names[0]}] + {self.local_variable_names[1]}"
                        )
                    else:
                        action = f"[{', '.join(self.local_variable_names)}]"
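
For reference, the JS/TS comma operator evaluates both operands and returns only the last, so return newline, indent; returns just indent. Since the generator's default action above builds a list of the local variables, the array form is probably what's intended; a hedged sketch (rule and types hypothetical):

// Return the local variables as a sequence, matching the Python
// generator's default `[newline, indent]` action, instead of the
// comma-operator form that TypeScript flags.
function ruleBody<T>(newline: T | null, indent: T | null): T[] | null {
    if (newline !== null && indent !== null) {
        return [newline, indent];
    }
    return null; // the rule failed to match
}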

verbose

Ultimately we should move the verbosity stuff out of the memoize functions, because we probably don't need to ship a verbose parser outside of this repo.

Maybe we could instead generate a verbose parser like

vr gen_parser --verbosity=0 # no log statements
vr gen_parser --verbosity=1 # logs anytime we call a pegen function
vr gen_parser --verbosity=2 # logs anytime we try to find a cached token

Or something like that
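
A rough sketch of how the generator could gate log emission (all names here are hypothetical, not current code):

// Write console.log calls into the generated parser only when the
// requested verbosity is high enough, so a verbosity=0 parser ships
// with no logging at all.
function emitLog(verbosity: number, atLevel: number, message: string): string {
    return verbosity >= atLevel
        ? `console.log(${JSON.stringify(message)});\n`
        : "";
}

// e.g. while generating a memoized rule:
// out += emitLog(opts.verbosity, 2, "cache lookup: expr_rule");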

Interactive mode

We may never get round to this (adding wontfix), but currently the tokenizer isn't set up for interactive mode.
So parsing in interactive mode will fail for anything other than a single-line statement.

implement pegen actions

A list of functions we need to implement that are called from the generated_parser.ts:

  • make_module
  • seq_append_to_end
  • singleton_seq
  • seq_flatten
  • interactive_exit
  • set_expr_context
  • NEW_TYPE_COMMENT
  • augoperator
  • map_names_to_ids
  • seq_count_dots
  • alias_for_star
  • join_names_with_dot
  • function_def_decorators
  • empty_arguments
  • make_arguments
  • slash_with_default
  • star_etc
  • add_type_comment_to_arg
  • name_default_pair
  • class_def_decorators
  • seq_insert_in_front
  • get_cmpops
  • get_exprs
  • cmpop_expr_pair
  • concatenate_strings
  • get_keys
  • get_values
  • key_value_pair
  • collect_call_seqs
  • dummy_name
  • seq_extract_starred_exprs
  • seq_delete_starred_exprs
  • join_sequences
  • keyword_or_starred
  • nonparen_genexp_in_call
  • arguments_parsing_error
  • get_expr_name

It looks a little less intimidating here than it does when looking at the C file!
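
For a feel of the scale, many of these become one-liners once sequences are plain arrays. A sketch of two of the simpler ones (signatures assumed, mirroring the CPython pegen helpers of the same names):

// Wrap a single node in a sequence (cf. _PyPegen_singleton_seq).
function singleton_seq<T>(a: T): T[] {
    return [a];
}

// Prepend a node to a possibly-null sequence
// (cf. _PyPegen_seq_insert_in_front).
function seq_insert_in_front<T>(a: T, seq: T[] | null): T[] {
    return seq ? [a, ...seq] : [a];
}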

cache

The cache seems to be quite a slow point in the code.

The python parser uses a dictionary cache, which is fine for generating the grammar.
#79 improves the performance, but there's probably a better way to do this.

Code to explore in pegen: is_memo, update_memo, and insert_memo.

Accessing and inserting into the cache seems to be a fairly expensive operation.
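
One possibility, sketched under the assumption that memo entries can hang off each token position (roughly how CPython chains memos off tokens, not what this repo does today):

// A bucket of memo entries per token position, so lookups avoid
// building string keys. `type` identifies the rule; `endMark` is the
// token position after a successful parse.
interface Memo {
    type: number;
    node: unknown; // null for a memoized failure
    endMark: number;
}

const memos: (Memo[] | undefined)[] = [];

function getMemo(mark: number, type: number): Memo | undefined {
    return memos[mark]?.find((m) => m.type === type);
}

function setMemo(mark: number, type: number, node: unknown, endMark: number): void {
    (memos[mark] ??= []).push({ type, node, endMark });
}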


enum Astnodes

typedef enum _expr_context { Load=1, Store=2, Del=3 } expr_context_ty;

typedef enum _boolop { And=1, Or=2 } boolop_ty;

typedef enum _operator { Add=1, Sub=2, Mult=3, MatMult=4, Div=5, Mod=6, Pow=7,
                         LShift=8, RShift=9, BitOr=10, BitXor=11, BitAnd=12,
                         FloorDiv=13 } operator_ty;

typedef enum _unaryop { Invert=1, Not=2, UAdd=3, USub=4 } unaryop_ty;

typedef enum _cmpop { Eq=1, NotEq=2, Lt=3, LtE=4, Gt=5, GtE=6, Is=7, IsNot=8,
                      In=9, NotIn=10 } cmpop_ty;

We generate constructors for each of these and call the constructor every time we need an instance, e.g.

export class expr_context extends AST {}
expr_context.prototype.tp$name = "expr_context";

export type expr_contextKind = typeof expr_context | typeof Load | typeof Store | typeof Del;

export class Load extends expr_context {}
Load.prototype.tp$name = "Load";
export class Store extends expr_context {}
Store.prototype.tp$name = "Store";
export class Del extends expr_context {}
Del.prototype.tp$name = "Del";

But we could just create const references to avoid having to instantiate these each time.

These are generated by TypeDefVisitor(f)
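
A sketch of the const-reference idea (assuming the generated Load, Store, and Del classes above are in scope):

// Module-level singletons, created once and shared by every AST node,
// instead of calling `new Load()` per node.
export const LOAD = new Load();
export const STORE = new Store();
export const DEL = new Del();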


Todo

  • work out what the best abstraction is

keywords

We need to work out where keyword checks get implemented.

Search for ->keywords in the pegen directory of CPython.

connect to skulpt

Postpone making Python types until we're in the compiler.

Intermediate types can toString to the exact CPython value, while maintaining the metadata needed to construct Skulpt types.

Drawbacks:

  β€’ our AST differs from CPython's because we'll use JavaScript types (maybe with annotations); we would need extra information to construct the right Python type.

Benefit:

  β€’ not tied to Skulpt's implementation

deno ffi

Deno recently added FFI support.

This package works great: https://github.com/denosaurs/deno_python

A proof of concept takes the python->deno->python peg_parser tests from minutes to seconds.

We should rewrite the tests to take advantage of this!
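
A sketch of what a rewritten test could look like (import path and API as shown in deno_python's README; the toString on the returned proxy is an assumption):

// Drive CPython's ast module in-process from Deno, instead of shelling
// out to a Python subprocess for every test case.
import { python } from "https://deno.land/x/python/mod.ts";

const ast = python.import("ast");
const tree = ast.parse("x = 1");
console.log(ast.dump(tree).toString()); // compare against our AST dump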

parser_str

This is the reason we fail all the string tests at the moment, with the extra quotes in the AST output.

Before we concatenate, we need to parse each string token and determine its mode.

Most/all of the skulpt code for parsing a string can be used here.

See parse_str.c for CPython's implementation.
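
A sketch of the first step (hypothetical helper; the real logic can follow Skulpt's string parser and parse_str.c): read the literal's prefix to decide how to process its body before concatenation:

// Determine the mode of a STRING token like rb'...' or f"..." from the
// letters before the opening quote.
interface StrMode {
    raw: boolean;
    bytes: boolean;
    fstring: boolean;
}

function getMode(tokenText: string): StrMode {
    const prefix = (tokenText.match(/^[A-Za-z]*/) ?? [""])[0].toLowerCase();
    return {
        raw: prefix.includes("r"),
        bytes: prefix.includes("b"),
        fstring: prefix.includes("f"),
    };
}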

bigints

We need to not use bigints unless they're available; if they are, use them.
If not, then use the JSBI that Skulpt adds to the global scope?

Pseudo code:

declare const JSBI: any; // added to the global scope by Skulpt when needed

class pyInt {
    v: number | bigint | object;
    constructor(v: number) {
        if (v > Number.MAX_SAFE_INTEGER) {
            if (typeof BigInt !== "undefined") {
                this.v = BigInt(v); // native bigints available
            } else if (typeof JSBI !== "undefined") {
                // we're in skulpt world and in a browser that doesn't
                // support BigInt, so use JSBI
                this.v = JSBI.BigInt(v);
            }
        } else {
            this.v = v;
        }
    }
}

Py types

We'll need to work out how to handle these for the Constant node type.

It might be nice to use plain JavaScript objects so that we can completely isolate the parser from the Sk namespace and generate the real Python objects at compile time.

Linking the comment from CPython that suggests this might be challenging:

https://github.com/python/cpython/blob/v3.9.5/Include/asdl.h#L10-L15

Problem with token types

The issue arises in:
"from .a import b"

the '.' gets tokenized as an OP, but it's really a DOT.
This is the correct output from tokenize.py too.

In the CPython version of the parser they don't pass the string to this.expect; they pass the type.

We need to work out how to fix OPs like this in our own parser.
I imagine it's a bit like how we changed the type of a NAME when we found a keyword.
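
A sketch of one approach (all names hypothetical), mirroring the keyword retyping: map the exact operator strings to their own token types before the parser sees them:

// Token-type tags for illustration; the real parser's numbering may differ.
enum TokType { OP, DOT, COMMA, NAME }

interface Tok {
    type: TokType;
    string: string;
}

// Exact-token remapping, analogous to retyping a NAME that matches a keyword.
const EXACT_TOKEN_TYPES: Record<string, TokType> = {
    ".": TokType.DOT,
    ",": TokType.COMMA,
};

function normalize(tok: Tok): Tok {
    if (tok.type === TokType.OP && tok.string in EXACT_TOKEN_TYPES) {
        return { ...tok, type: EXACT_TOKEN_TYPES[tok.string] };
    }
    return tok;
}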

parsenumber

Not yet implemented.
It can probably be mostly ported over from Skulpt.

RAISE

Implement raise behaviour.

Tests - dream big

How CPython tested the new parser:

  • download the most popular pypi packages
  • compare the asts

Once we get through the run-tests directory we can try to implement this πŸ˜‰

See: cpython/Tools/peg_generator/scripts/test_pypi_packages.py

Parse Tests

I think it's time we make a test that runs the run-tests directory πŸ˜„

filename

This needs to be inserted somewhere for error reporting

astral code points and utf8 strings

"'δΈ­ζ–‡πŸ•'"

Using tokenize.py:
TokenInfo(type=3 (STRING), string="'δΈ­ζ–‡πŸ•'", start=(1, 0), end=(1, 5), line="'δΈ­ζ–‡πŸ•'")
we get:
TokenInfo(type=3 (STRING), string="'δΈ­ζ–‡πŸ•'", start=(1, 0), end=(1, 6), line="'δΈ­ζ–‡πŸ•'")
So the pizza emoji means we're off by one in the col_offset (πŸ• is an astral code point, which a JavaScript string counts as two UTF-16 code units).

But strangely this should result in the following AST:

      Module(
        body=[
          Expr(
            value=Constant(
              value='δΈ­ζ–‡πŸ•',
              lineno=1,
              col_offset=0,
              end_lineno=1,
-             end_col_offset=6), # we say 6
+             end_col_offset=12), # python says 12
            lineno=1,
            col_offset=0,
            end_lineno=1,
-           end_col_offset=6)],
+           end_col_offset=12)],
        type_ignores=[])

We say 6 but python says 12

No astral code points
Without the pizza emoji our tokenizer output matches Python's tokenize (no astral code points), but our AST is still incorrect.

"'δΈ­ζ–‡'"

Using tokenize.py:
TokenInfo(type=3 (STRING), string="'δΈ­ζ–‡'", start=(1, 0), end=(1, 4), line="'δΈ­ζ–‡'")
we get:
TokenInfo(type=3 (STRING), string="'δΈ­ζ–‡'", start=(1, 0), end=(1, 4), line="'δΈ­ζ–‡'")

but

      Module(
        body=[
          Expr(
            value=Constant(
              value='δΈ­ζ–‡',
              lineno=1,
              col_offset=0,
              end_lineno=1,
-             end_col_offset=4), # we say 4 - this is correct no??
+             end_col_offset=8), # python says 8
            lineno=1,
            col_offset=0,
            end_lineno=1,
-           end_col_offset=4)],
+           end_col_offset=8)],
        type_ignores=[])

Is this a bug or a feature of CPython?!
See t542.py
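
For what it's worth, CPython documents col_offset and end_col_offset as UTF-8 byte offsets, which matches the numbers above: 'δΈ­ζ–‡πŸ•' plus quotes is 3+3+4+2 = 12 bytes, and 'δΈ­ζ–‡' plus quotes is 8. If we match that, a sketch of converting our UTF-16 column to a byte column:

// Convert a UTF-16 column (what JS string indexing gives us) into a
// UTF-8 byte column (what CPython's AST reports).
function utf8Col(line: string, utf16Col: number): number {
    return new TextEncoder().encode(line.slice(0, utf16Col)).length;
}

console.log(utf8Col("'δΈ­ζ–‡πŸ•'", 6)); // 12
console.log(utf8Col("'δΈ­ζ–‡'", 4)); // 8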
