Midterm project for Compiler course
This is a lexical analyzer for a subset of the C language (VC), implemented in Python 3. It recognizes tokens and comments and raises errors for invalid tokens. The language is defined in VC Language Definition.
- Python 3.9 (other versions may also work).
- Hydra to read config files
- A .dat file describing the deterministic finite automaton (DFA) used by the lexical analyzer. The file format is defined in Data File.
- A source code file written in the VC language (a .vc file).
The data file is in JSON format but uses the .dat extension. A sample data file is in the root directory of this project. The data file contains the following fields:
- keywords: a list of keywords in the target language.
- special_literals: a list of special literals used in the target language.
- separators: a list of characters used to separate tokens in the target language.
- terminal_types: a list of token types in the target language.
- nodes: the nodes of the DFA, where:
  - the key of each node is the name of the node.
  - children is a list of the node's children; each child is a map from a list of characters to the name of the child node.
  - if the node is the starting node, it includes a field start with value true.
  - if the node is terminal, it includes a field terminal with value true and a field terminal_type with the type of the token, taken from terminal_types; otherwise it has a field terminal with value false.
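To make the layout concrete, here is a minimal sketch of how such a data file might be loaded and walked. The field names come from the description above. Everything else is an assumption made for illustration: the node names (q0, ident, int), the token types, the choice to write each child's key as a string of accepted characters, the "keyword" label, and the helper functions. The sample data file in the project root is the authoritative reference for the real format.

```python
import json

# Hypothetical miniature of the data file, written as a Python dict.
# Assumption: each child maps a string of accepted characters to a node name.
SAMPLE_DFA = {
    "keywords": ["if", "else", "while", "return"],
    "special_literals": ["true", "false"],
    "separators": [" ", "\t", "\n"],
    "terminal_types": ["identifier", "intliteral"],
    "nodes": {
        "q0": {
            "start": True,
            "terminal": False,
            "children": [
                {"abcdefghijklmnopqrstuvwxyz_": "ident"},
                {"0123456789": "int"},
            ],
        },
        "ident": {
            "terminal": True,
            "terminal_type": "identifier",
            "children": [{"abcdefghijklmnopqrstuvwxyz_0123456789": "ident"}],
        },
        "int": {
            "terminal": True,
            "terminal_type": "intliteral",
            "children": [{"0123456789": "int"}],
        },
    },
}


def load_data_file(path):
    """The .dat file is plain JSON, so json.load reads it despite the extension."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_start(nodes):
    """Return the name of the node flagged with start = true."""
    for name, node in nodes.items():
        if node.get("start"):
            return name
    raise ValueError("no start node in DFA")


def step(nodes, state, ch):
    """Follow the transition for ch from state, or return None if there is none."""
    for child in nodes[state].get("children", []):
        for chars, target in child.items():
            if ch in chars:
                return target
    return None


def longest_match(dfa, text, pos):
    """Return (lexeme, token_type, end) for the longest token starting at pos."""
    nodes = dfa["nodes"]
    state = find_start(nodes)
    last_accept = None
    i = pos
    while i < len(text):
        nxt = step(nodes, state, text[i])
        if nxt is None:
            break
        state = nxt
        i += 1
        if nodes[state]["terminal"]:
            last_accept = (i, nodes[state]["terminal_type"])
    if last_accept is None:
        raise ValueError(f"invalid token at position {pos}")
    end, kind = last_accept
    lexeme = text[pos:end]
    # Assumption: identifiers listed in keywords are relabelled as "keyword".
    if lexeme in dfa["keywords"]:
        kind = "keyword"
    return lexeme, kind, end


def tokenize(dfa, text):
    """Skip separators and collect (lexeme, token_type) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in dfa["separators"]:
            pos += 1
            continue
        lexeme, kind, pos = longest_match(dfa, text, pos)
        tokens.append((lexeme, kind))
    return tokens


print(tokenize(SAMPLE_DFA, "while x1 42"))
```

The sketch remembers the last terminal node it passed through, so a token is always the longest prefix the DFA accepts. Input that never reaches a terminal node raises an error, which mirrors the invalid-token behaviour described earlier.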
The config files are in YAML format and are read by the Hydra framework. There are three variables:
- file_name (string): the .vc file to scan (default: data/example_fib.vc)
- data_file (string): the .dat file that stores the DFA's states (default: dfa.dat)
- no_comments (boolean): whether to omit comments from the output (default: True)
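The following sketch illustrates how the three variables and their defaults relate to key=value overrides given on the command line. It is a stand-in written with the standard library only, not Hydra's implementation: the ScannerConfig class and the apply_overrides helper are hypothetical names, and Hydra performs the real parsing in the project.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ScannerConfig:
    # Defaults as documented above.
    file_name: str = "data/example_fib.vc"
    data_file: str = "dfa.dat"
    no_comments: bool = True


def apply_overrides(cfg, overrides):
    """Return a copy of cfg with key=value strings applied (illustrative only)."""
    known = {f.name for f in fields(cfg)}
    changes = {}
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in known:
            raise ValueError(f"unknown or malformed override: {item!r}")
        if isinstance(getattr(cfg, key), bool):
            # Boolean fields accept only true/false, case-insensitively.
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"expected a boolean for {key!r}, got {raw!r}")
            changes[key] = raw.lower() == "true"
        else:
            changes[key] = raw
    return replace(cfg, **changes)


cfg = apply_overrides(ScannerConfig(), ["file_name=data/example_gcd.vc", "no_comments=False"])
print(cfg)
```

With no overrides the defaults apply, so comments are left out of the output unless no_comments is set to False.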
To run the lexical scanner, run the following command in the terminal:
python src/lexical.py
To scan a different .vc file, override file_name:
python src/lexical.py file_name=<source_code_file>
For example:
python src/lexical.py file_name=data/example_gcd.vc
To include comments in the output, set no_comments to False:
python src/lexical.py no_comments=False
To see more information about the command and its options, run:
python src/lexical.py -h