Giter Club home page Giter Club logo

minishell's Introduction

Welcome to minishell 🐚

This project is about creating a shell called Minishell based on bash with a reduced set of rules.

Minishell should be able to parse commands (simple commands, pipes, redirections), execute them properly, handle environment variables, keep an history of the commands and of course handle all kind of errors and interruptions.

⚠️ Before even starting to work on a shell which should reproduce the behavior of bash, I highly recommend reading the bash manual. ⚠️

Table of Contents

  1. Parser
  2. Interpreter

Parser

The parser is the part of the program responsible to read, understand and translate a command to a format the interpreter is able to work with. It is also responsible to check the validity of the input regarding a defined set of rules and to rise an error if the command is not valid.

Some interesting reads on the subject :

LR Parser principle

After some research, we decided to set up an LR parser (LR for Left-to-right, Rightmost derivation in reverse : reads input text from left to right without backing up, and produces a rightmost derivation in reverse).

The LR Parser is a bottom-up parser. That means that it will try to recognize the input's lowest-level small details first, before its mid-level structures, and leaving the highest-level overall structure of the command line to last.

Some interesting reads on the subject :

Below is a global view of the parser. We will detail these parts in the next steps.

LR Parser Schema


Lexer - Lexical analysis

LR Parser Schema

The work of the lexical analyser or lexer is to identify token in the input string according to certain lexical rules.

For bash, you can find some documentation about token recognition online : Shell token recognition (opengroup.org)

NOTE : Working with token instead of "words" will be much more comfortable when it will come to recognize grammar structure into the command line.


Parser - Syntax analysis

An important step for parsing is to define the set of grammar rules you will try to match your command line with.

As the project consists of creating a simple shell based on bash, we chose to simplify the shell grammar available here :

pipe_sequence	: simple_command 
				| pipe_sequence PIPE simple_command
				;
simple_command	: cmd_prefix cmd_word cmd_suffix
				| cmd_prefix cmd_word
				| cmd_prefix
				| cmd_name cmd_suffix
				| cmd_name
				;
cmd_name		: WORD
				;
cmd_word		: WORD
				;
cmd_prefix		: io_redirect
				| cmd_prefix io_redirect
				;
cmd_suffix		: io_redirect
				| cmd_suffix io_redirect
				| WORD
				| cmd_suffix WORD
				;
io_redirect		: io_file
				| io_here
				;
io_file			: RET_TO filename
				| RET_FROM filename
				| DGREAT filename
				;
filename		: WORD
				;
io_here			: DLESS here_end
				;
here_end		: WORD
				;

Parsing table

In order to create the parsing table, we used a specific program called Bison. Bison is the GNU version of yacc. It is a parser generator that can generate c code from a grammar.

As this code is not ok at all with the school norm and as it is not very right to get a generated code for a school project, we just used bison to create a textual automaton based on our grammar. From that automaton, we will deduce the parsing table.

Install bison :

$> apt-get install bison

Automaton creation :

$> bison -o gram.y

gram.y is the grammar in a specific format (the actual format from this readme with some additions).

From the above command, we get a textual automaton generated :

state 0

    0 $accept: . pipe_sequence $end

    WORD      shift, and go to state 1
    RET_TO    shift, and go to state 2
    RET_FROM  shift, and go to state 3
    DLESS     shift, and go to state 4
    DGREAT    shift, and go to state 5

    pipe_sequence   go to state 6
    simple_command  go to state 7
    cmd_name        go to state 8
    cmd_prefix      go to state 9
    io_redirect     go to state 10
    io_file         go to state 11
    io_here         go to state 12


state 1

    8 cmd_name: WORD .

    $default  reduce using rule 8 (cmd_name)


state 2

   18 io_file: RET_TO . filename

    WORD  shift, and go to state 13

    filename  go to state 14

...

Then, we just use this automaton to create a parsing table file (manually, yes ...) that can be read using get next line function and feeded to a LR parser algorithm :

State | token type | action | next state | number of reduced tokens

0	0	0	1	-1
0	1	0	2	-1
0	2	0	3	-1
0	3	0	4	-1
0	4	0	5	-1
0 	100	-1	6	-1
0	101	-1	7	-1
0	102	-1	8	-1
0	104	-1	9	-1
0	106	-1	10	-1
0	107	-1	11	-1
0	109	-1	12	-1
1	-1	1	102	1
2	0	0	13	-1
2	108	-1	14	-1
3	0	0	13	-1
3	108	-1	15	-1
4	0	0	16	-1
...

To make this file easier to compute, we just used a small range of int values that match enumerations in the header files :

typedef enum e_actions
{
	SHIFT,
	REDUCE,
	ACCEPT
}	t_actions;

typedef enum e_token_types
{
	T_END = -2,
	T_WORD = 0,
	T_RED_TO,
	T_RED_FROM,
	T_DLESS,
	T_DGREAT,
	T_PIPE	
}	t_token_types;

typedef enum e_rules
{
	R_PIPE_SEQUENCE = 100,
	R_SIMPLE_COMMAND,
	R_CMD_NAME,
	R_CMD_WORD,
	R_CMD_PREFIX,
	R_CMD_SUFFIX,
	R_IO_REDIRECT,
	R_IO_FILE,
	R_FILENAME,
	R_IO_HERE,
	R_HERE_END
}	t_rules;

For example, the line :

0	0	0	1	-1

can be read : When you are in the STATE 0, if the token you read from input is a WORD, you must SHIFT this token from the input into the stack, then change the STATE to 1. The last "-1" just means that the last parameter is not relevant for this line.


Interpreter

Interpreter principle

the principle of the interpretor is to browse the tree in POST ORDER, and to collect the right information in the right order to execute commands and pipes and redirection.


Tree visit

We chose to visit the tree in POST ORDER, that means that we will start at the root of the tree, the explore left branch to the bottom, then right, and then have an action on the current node.

The algorythm looks like this:

visit(node)
    if node == null
        return ;
    visit(node->left)
    visit(node->right)
    <do something>

POST ORDER Visit

For a command like

ls -l | grep .. | wc -l > file

the tree should be something like :

POST ORDER Visit real


Pipeline execution

Useful document:

This document explain the principle of creating a subshell to handle every command of the pipeline, then to pipe and fork before each command (except the last one which is executed in the subshell process and returns its exit status to minishell).

The principe with the tree is to browse the tree like in the schemas above and to stop on each PIPE node. When we get a PIPE node, we fork the subshell then we execute the tree from the pipe node (which is considered as the root for this command).

Pipeline exec

When we browse a part of the tree from the root or from a pipe, we consider that we stop browsing the left branch if we encounter another pipe, to avoid executing a command twice.

Before launching a command, we have of course to create a pipe between this command and the previous and/or next one. This is done by connecting the STDOUT of a command to the STDIN of the next command. We use pipe and dup2 for that.


Redirection

The redirection is done in this project with dup and dup2. Nothing too difficult but you need to think about saving STDIN and STDOUT if you execute the redirection within the main process (for builtins when you are not in a pipeline for example).


Installation

Clone the repository:

git clone [email protected]:twagger/minishell.git

Make to create the minishell program:

make

Launch minishell

./minishell

Authors

👨 Thomas WAGNER

👩 Wen Tzu Lo

minishell's People

Contributors

iamwen1023 avatar twagger avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Forkers

iamwen1023 bks-lo

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.