
 __    __     __     __   __     __     ______   __  __    
/\ "-./  \   /\ \   /\ "-.\ \   /\ \   /\  == \ /\ \_\ \   
\ \ \-./\ \  \ \ \  \ \ \-.  \  \ \ \  \ \  _-/ \ \____ \  
 \ \_\ \ \_\  \ \_\  \ \_\\"\_\  \ \_\  \ \_\    \/\_____\ 
  \/_/  \/_/   \/_/   \/_/ \/_/   \/_/   \/_/     \/_____/ 
                                                           

minipy


A mini Python interpreter: a compiler implemented in Python plus a VM implemented in C.

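Note: this project is mainly intended for learning how compilers work. For production use, consider the projects listed under Related Projects.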

Getting Started

Build the interpreter

cd minipy
make && make test

# and enjoy yourself ^_^

Package as an executable

# Edit pack/main.py

# Package
python3 build.py pack

# Run the packaged file
./pack_main

Features

Compiler

Located in /src/python

  • mp_opcode.py: bytecode (opcode) definitions
  • mp_tokenize.py: lexer; converts source code into tokens
    • Run python mp_tokenize.py {script.py} to print the tokens
  • mp_parse.py: parser; converts tokens into a syntax tree
    • Run python mp_parse.py {script.py} to print the syntax tree
  • mp_encode.py: code generator; converts the syntax tree into bytecode (opcodes)
    • Run python mp_encode.py {script.py} to print the bytecode (unprocessed)
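To make the three stages concrete, here is a toy version of the same pipeline for arithmetic expressions only. It is a sketch under stated assumptions, not minipy's code: `tokenize`, `parse`, `encode`, the tuple-shaped tree, and the opcode names (`LOAD_CONST`, `LOAD_NAME`, `ADD`, ...) are all hypothetical; the real opcodes are the ones defined in mp_opcode.py.

```python
import re

# Hypothetical token classes: numbers, names, and any other single non-space char.
TOKEN_RE = re.compile(r"(\d+(?:\.\d*)?)|([A-Za-z_]\w*)|(\S)")


def tokenize(src):
    """Lexer: turn source text into (type, value) tokens."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(src):
        if number:
            tokens.append(("number", float(number)))  # numbers as doubles
        elif name:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
    return tokens


def parse(tokens):
    """Parser: recursive descent over + - * / and parentheses."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def atom():
        kind, val = advance()
        if kind == "number":
            return ("const", val)
        if kind == "name":
            return ("name", val)
        if (kind, val) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {val!r}")

    def term():  # * and / bind tighter than + and -
        node = atom()
        while peek() in (("op", "*"), ("op", "/")):
            node = ("binop", advance()[1], node, atom())
        return node

    def expr():
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            node = ("binop", advance()[1], node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing tokens")
    return tree


OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def encode(node, out=None):
    """Code generator: post-order walk emitting stack-machine opcodes."""
    out = [] if out is None else out
    if node[0] == "const":
        out.append(("LOAD_CONST", node[1]))
    elif node[0] == "name":
        out.append(("LOAD_NAME", node[1]))
    else:
        _, op, left, right = node
        encode(left, out)
        encode(right, out)
        out.append((OPS[op],))
    return out


print(encode(parse(tokenize("1 + 2 * x"))))
# [('LOAD_CONST', 1.0), ('LOAD_CONST', 2.0), ('LOAD_NAME', 'x'), ('MUL',), ('ADD',)]
```

The encoder emits operands before their operator, which is the order a stack-based VM needs: operands are pushed, then the operator pops them and pushes its result.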

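Feature list

  • Stack-based machine model; the bytecode is defined in src/python/mp_opcode.py
  • Exception handling, implemented with setjmp/longjmp
  • Native method extensions
  • Common Python types
  • Function definitions and simple class definitions
  • Mark-sweep garbage collection
  • String constant pool
  • Tail-call optimization
  • [ ] Debugging support
  • [ ] User-level threads
  • [ ] Class inheritance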
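The mark-sweep collector listed above works in two phases: mark every object reachable from the roots, then free every object that was not marked. The sketch below models those phases in Python. The `Obj` and `Heap` classes, the single list of all allocated objects, and the choice of roots are illustrative assumptions, not the data structures in gc.c.

```python
class Obj:
    """A heap object that may reference other heap objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references (list items, dict values, ...)
        self.marked = False


class Heap:
    def __init__(self):
        self.objects = []    # every allocated object
        self.roots = []      # e.g. globals, frame locals, the operand stack

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def mark(self):
        # Mark phase: flag everything reachable from the roots. An explicit
        # stack is used instead of recursion so long reference chains are safe.
        pending = list(self.roots)
        while pending:
            obj = pending.pop()
            if not obj.marked:
                obj.marked = True
                pending.extend(obj.refs)

    def sweep(self):
        # Sweep phase: keep marked objects (clearing the flag for the next
        # cycle) and drop the rest; returns the names of freed objects.
        survivors, freed = [], []
        for obj in self.objects:
            if obj.marked:
                obj.marked = False
                survivors.append(obj)
            else:
                freed.append(obj.name)
        self.objects = survivors
        return freed

    def collect(self):
        self.mark()
        return self.sweep()


heap = Heap()
a, b, c = heap.alloc("a"), heap.alloc("b"), heap.alloc("c")
a.refs.append(b)       # a -> b
c.refs.append(c)       # c refers to itself, but nothing reaches c
heap.roots.append(a)
print(heap.collect())  # ['c']
```

The unreachable self-referencing object shows why mark-sweep, unlike plain reference counting, also reclaims cycles.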

Tools

  1. minipy -dis {test.py}: print the bytecode (after constant processing)

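Code structure

  1. main.c: program entry point
  2. vm.c: VM entry point
  3. execute.c: interpreter
  4. builtins.c: common built-in functions
  5. obj_ops.c: operator implementations for objects
  6. argument.c: API for function-call arguments
  7. exception.c: exception handling
  8. gc.c: garbage collector
  9. string.c: string handling
  10. number.c: number handling
  11. list.c: list handling
  12. dict.c: dict handling
  13. function.c: function/method handling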

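Type system

  1. string, an immutable object
  2. number, always stored as a double
  3. list, a dynamic array
  4. dict, a hash table
  5. function, covering both native C functions and user-defined Python functions
  6. class, a user-defined Python type
  7. None, the None type
  8. data, a user-defined C type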
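One practical consequence of storing every number as a double: integers are exact only up to 2**53, and beyond that neighbouring integers collapse into the same value. Python's float is a C double (IEEE 754 on mainstream platforms), so it can preview the behaviour. The helper names below are hypothetical and this does not exercise minipy itself.

```python
import sys


def to_mp_number(n):
    """Model a minipy number: every numeric value is stored as a double."""
    return float(n)


def is_exact(n):
    """True if the integer n survives a round trip through a double."""
    return int(to_mp_number(n)) == n


print(sys.float_info.mant_dig)  # 53 on IEEE 754 platforms
print(is_exact(2 ** 53))        # True
print(is_exact(2 ** 53 + 1))    # False: rounds to 2**53
print(to_mp_number(2 ** 53 + 1) == to_mp_number(2 ** 53))  # True
```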

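Related Projects

Other Python implementations

  • CPython: the official Python implementation
  • micropython: a version for embedded systems
  • tinypy: a minimal version in 64K of code, supporting a subset of Python
  • cython: a superset of Python that translates Python into C and compiles it for faster execution; it also simplifies writing C extensions for Python

Other embeddable scripting language implementations

More interesting compiler projects

  • ShivyC: a C compiler written in Python
  • NASM: a cross-platform x86 assembler and disassembler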

License

  • MIT

Contributors

  • xupingmao
  • yangluoshen


minipy's Issues

Running python setup.py gcc fails

omu:/opt/swm/github/subpy/src # python setup.py gcc
In file included from vm.c:6:0,
from main.c:1:
builtins.c: In function 'bfInspectPtr':
builtins.c:545:17: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
char* ptr = (char*)(long long)_ptr;
^
/tmp/cczd1qDq.o: In function `bf_pow':
main.c:(.text+0x44cc): undefined reference to `pow'
collect2: error: ld returned 1 exit status

run: python setup.py gcc ERROR

Running python setup.py gcc fails with the following error:

Traceback (most recent call last):
  File "setup.py", line 109, in <module>
    main()
  File "setup.py", line 100, in main
    buildOne(cc)
  File "setup.py", line 89, in buildOne
    build(ccompiler, libs)
  File "setup.py", line 29, in build
    dstMtime = mtime(dstPath)
  File "/home/yangluo/github/subpy/src/boot.py", line 98, in mtime
    return os.path.getmtime(fname)
  File "/usr/lib/python2.7/genericpath.py", line 54, in getmtime
    return os.stat(filename).st_mtime
OSError: [Errno 2] No such file or directory: 'bin.c'

Locating where bin.c is used:

yangluo@yangluoPC:~/github/subpy/src$ grep -rwn 'bin.c' .
./setup.py:13:def build(cc="tcc", libs=None, dstPath = "bin.c"):
./setup.py:55:        #remove("../bin.c")
./tags:420:build    setup.py    /^def build(cc="tcc", libs=None, dstPath = "bin.c"):$/;"
./vm.c:14:#include "bin.c"

It looks like a default parameter, but the file (bin.c) cannot be found anywhere:

yangluo@yangluoPC:~/github/subpy/src$ sudo find / -name 'bin.c'
yangluo@yangluoPC:~/github/subpy/src$ 

Was it left out of the repository?
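The traceback shows build() calling mtime(dstPath) on bin.c before that file exists, and the grep output shows vm.c including bin.c, which suggests bin.c is a generated file that a fresh checkout does not contain (an inference, not confirmed in the issue). Assuming that, one possible workaround is to treat a missing output file as infinitely old so it gets regenerated instead of raising OSError. `needs_rebuild` is a hypothetical helper, not a function in setup.py; catching `OSError` keeps the code compatible with the Python 2.7 interpreter shown in the traceback.

```python
import os


def mtime(fname):
    """Return fname's modification time, or 0 if it does not exist yet.

    Treating a missing file as infinitely old means a fresh checkout, in
    which the generated file has never been written, triggers a rebuild
    instead of an OSError.
    """
    try:
        return os.path.getmtime(fname)
    except OSError:
        return 0


def needs_rebuild(dst_path, src_paths):
    """Hypothetical helper: True if any source is newer than dst_path
    (a missing dst_path counts as time 0)."""
    dst_mtime = mtime(dst_path)
    return any(mtime(src) > dst_mtime for src in src_paths)
```

This only helps if build() really does write the file whose timestamp it checks.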

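GC problem in tm2c

  1. Add a mark function pointer to the TmModule struct, and check each module's mark function during GC.
  2. When converting to C code, add a tm_define_module(name, globals, _py_mark) call to the *_py_main function.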
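The proposal can be modelled in a few lines of Python. Objects that tm2c-generated C code keeps in its own static variables are invisible to the collector, so each module registers a mark callback that reports them. Everything here (`Module`, `gc_roots`, the Python stand-in for `tm_define_module`, `_static_consts`) is a hypothetical illustration of the idea, not minipy's C API.

```python
class Module:
    """Stand-in for TmModule, extended with an optional mark callback."""

    def __init__(self, name, globals_, mark=None):
        self.name = name
        self.globals = globals_
        self.mark = mark  # plays the role of the proposed mark function pointer


modules = []


def tm_define_module(name, globals_, mark=None):
    """Python stand-in for the proposed tm_define_module(name, globals, _py_mark)."""
    modules.append(Module(name, globals_, mark))


def gc_roots():
    """Gather GC roots: every module's globals, plus whatever its mark callback reports."""
    roots = []
    for mod in modules:
        roots.extend(mod.globals.values())
        if mod.mark is not None:  # step 1: check the mark function during GC
            mod.mark(roots.append)
    return roots


# What a generated *_py_main might do (step 2); objects that C code would
# hold in static variables are modelled as a module-level Python list.
_static_consts = ["cached string", [1, 2, 3]]


def _py_mark(mark_obj):
    for obj in _static_consts:
        mark_obj(obj)


tm_define_module("demo", {"x": 42}, _py_mark)
print(gc_roots())  # [42, 'cached string', [1, 2, 3]]
```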

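dict optimization

Consider a hash map implementation with a common overflow area, where all colliding entries are stored in one shared overflow table.
Advantages:

  • Hash lookup, with acceptable speed
  • Compact memory layout
  • Simple to implement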
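A minimal sketch of the idea, in Python for readability: each primary slot holds at most one entry, and every key whose slot is already taken by a different key goes into a single shared overflow list that is searched linearly. `OverflowDict` and its fixed capacity are assumptions for illustration, not minipy's dict.c.

```python
class OverflowDict:
    """Hash map with a common (shared) overflow area.

    The primary area has one entry per slot. A key whose slot is already
    held by a different key is appended to one overflow list shared by all
    slots, which is searched linearly.
    """

    _EMPTY = object()

    def __init__(self, capacity=8):
        self.slots = [self._EMPTY] * capacity  # primary area: (key, value) or _EMPTY
        self.overflow = []                     # common overflow area: [key, value]

    def _index(self, key):
        return hash(key) % len(self.slots)

    def __setitem__(self, key, value):
        i = self._index(key)
        slot = self.slots[i]
        if slot is self._EMPTY or slot[0] == key:
            self.slots[i] = (key, value)       # free slot, or update in place
            return
        for entry in self.overflow:            # collision: use the overflow area
            if entry[0] == key:
                entry[1] = value
                return
        self.overflow.append([key, value])

    def __getitem__(self, key):
        slot = self.slots[self._index(key)]
        if slot is not self._EMPTY and slot[0] == key:
            return slot[1]
        for k, v in self.overflow:
            if k == key:
                return v
        raise KeyError(key)


d = OverflowDict(capacity=1)  # capacity 1 forces every key after the first to collide
d["a"] = 1
d["b"] = 2
d["b"] = 20
print(d["a"], d["b"], d.overflow)  # 1 20 [['b', 20]]
```

Deletion and resizing are left out. Deletion needs care: if a primary slot is simply emptied, re-inserting a key that already sits in the overflow area would create a duplicate. Because overflow lookups are linear, this layout stays fast only while collisions are rare, that is, while the load factor stays low.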

cachepy test fails

test-function error

The last instruction executed was OP_NEXT; the crash occurs in import_func in test.py.

Running test-function.py directly does not crash.

Analyze with valgrind.

Cannot handle big lists

When a list has more members than the recursion depth limit, the compiler fails. The problem lies in encode.py, which handles the list syntax tree recursively.
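A sketch of the problem and one fix, assuming (the issue does not confirm this) that the parser builds `a, b, c` as a left-nested chain of comma nodes, so tree depth grows with the number of list members. The recursive walk needs one call frame per member, while the loop version walks the same spine with constant stack depth. `build_comma_tree`, `flatten_recursive` and `flatten_iterative` are hypothetical names.

```python
import sys


def build_comma_tree(items):
    """Hypothetical tree shape: [a, b, c] -> (",", (",", a, b), c).
    Depth grows by one for every additional list member."""
    tree = items[0]
    for item in items[1:]:
        tree = (",", tree, item)
    return tree


def is_comma(node):
    return isinstance(node, tuple) and node[0] == ","


def flatten_recursive(node):
    # Like a recursive encoder: one call frame per nesting level.
    if is_comma(node):
        return flatten_recursive(node[1]) + [node[2]]
    return [node]


def flatten_iterative(node):
    # Walk down the left spine in a loop; stack depth stays constant.
    reversed_items = []
    while is_comma(node):
        reversed_items.append(node[2])
        node = node[1]
    reversed_items.append(node)
    reversed_items.reverse()
    return reversed_items


items = list(range(2 * sys.getrecursionlimit()))
tree = build_comma_tree(items)
print(flatten_iterative(tree) == items)  # True
try:
    flatten_recursive(tree)
except RecursionError:
    print("flatten_recursive: maximum recursion depth exceeded")
```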

Design for nested try-except

try:
    expr_list
except:
    handle_exc()

This can be turned into a special kind of function call that runs in a new frame while keeping the same operand stack:

if (!try(expr_list)) handle_exc();

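Would calling setjmp on every tm_eval be too expensive? Alternatively, tm_eval could always check return values, so that no C API function throws exceptions. That matches C conventions, but it requires changing a lot of code and needs careful thought.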
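The shape proposed above, where a try block runs as a special call in a new frame but on the same operand stack, can be sketched as follows. Python exceptions stand in for setjmp/longjmp, and `VM`, `VMError` and `run_try` are hypothetical names; this illustrates the design and is not minipy's implementation.

```python
class VMError(Exception):
    """An error raised inside the VM; unwinding to run_try plays the role of longjmp."""


class VM:
    def __init__(self):
        self.stack = []   # operand stack, shared by every frame
        self.frames = []  # call frames; each try block gets its own frame

    def run_try(self, body, handler):
        """Run body in a fresh frame on the shared operand stack.

        Returns True if body finished normally, False if handler ran instead.
        """
        height = len(self.stack)   # saved once per try block, like setjmp
        depth = len(self.frames)
        self.frames.append("try-frame")
        try:
            body(self)
        except VMError as exc:
            del self.frames[depth:]  # unwind the try frame and anything body pushed
            del self.stack[height:]  # drop partial results left by the failed body
            handler(self, exc)
            return False
        self.frames.pop()            # body is responsible for popping its own frames
        return True


def body(vm):
    vm.stack.append(1)
    vm.stack.append(2)
    raise VMError("boom")


def handler(vm, exc):
    vm.stack.append("handled: %s" % exc)


vm = VM()
vm.stack.append("outer")
print(vm.run_try(body, handler), vm.stack)  # False ['outer', 'handled: boom']
```

In this shape the saved state (operand-stack height and frame depth) is taken once per try block, not on every evaluation.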

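Merge files into tm.c

Merge all header and C files into a single file, tm.c, so that code converted by tm2c depends only on this one file.

The current files are: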

tm.h, instructions.h, object.h
string.c, list.c, dict.c, number.c, function.c, ops.c, builtins.c, tmarg.c, util.c, vm.c, interp.c, gc.c, exception.c
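A small build step could produce tm.c, much as SQLite ships its sources as a single amalgamated sqlite3.c. The script below is a hypothetical sketch: the file lists are copied from this issue, but the header order, the function name `amalgamate`, and the rule of dropping `#include "x"` only when x is itself being inlined are assumptions. It does not resolve clashes between `static` names defined in different files.

```python
import os
import re

# File lists from the issue; the header order may need adjusting to
# satisfy their dependencies.
HEADERS = ["tm.h", "instructions.h", "object.h"]
SOURCES = ["string.c", "list.c", "dict.c", "number.c", "function.c", "ops.c",
           "builtins.c", "tmarg.c", "util.c", "vm.c", "interp.c", "gc.c",
           "exception.c"]

LOCAL_INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def amalgamate(src_dir, files, out_path):
    """Concatenate files into out_path in order.

    A line like #include "object.h" is dropped when object.h is one of the
    inlined files; every other include (system headers, files not in the
    list) is kept as-is.
    """
    inlined = set(files)
    with open(out_path, "w") as out:
        for name in files:
            out.write("/* ---- begin %s ---- */\n" % name)
            with open(os.path.join(src_dir, name)) as src:
                for line in src:
                    match = LOCAL_INCLUDE.match(line)
                    if match and match.group(1) in inlined:
                        continue
                    if not line.endswith("\n"):
                        line += "\n"  # keep the end marker on its own line
                    out.write(line)
            out.write("/* ---- end %s ---- */\n" % name)


# Example (paths are assumptions): amalgamate("src", HEADERS + SOURCES, "tm.c")
```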

Segmentation fault when bootstrapping on Mac

$> source build.sh mp
...
use python interpreter: ./mp
++ ./mp encode.py init.py
Segmentation fault: 11
...

The error was introduced by changing Token to a class.

# The original Token
def Token(type='symbol',val=None,pos=None):
    self = newobj()
    self.pos=pos
    self.type=type
    self.val=val
    return self
