MIPS Compiler

In the spring of 2014, Robert Smayda and I wrote our own computer language. The language has an LL(1) grammar, which means input is parsed from left to right and constructs a leftmost syntax tree (derivation). An LL(1) language means that our grammar is context-free, which greatly decreased the difficulty in building an accompanying parser.

A parser is a tool that iterates through a given set of code and ensures it follows the required grammatical structure. For instance, the english language has a set of requirements, like a capital letter at the beginning of the sentence and a period, question mark, or exclamation point at the end. If either of those were missing, a parser would “throw an error” and identify the improper grammar.

In the same vein, our grammar has rules, like the requirement of the words “begin” and “end” at the start and finish of a program. The requirements for our language’s grammar can be found a few tabs over.

In addition to a parser, there is an additional tool used to verify a program structure, known as a lexer. A lexer, or lexical analyzer is used to validate the tokens in a program. Jumping back to our comparison with the English language we can see why this tool is needed. Imagine we had the sentence, “I like dogs”. The parser verifies there is a capital letter in the beginning, a period in the end, and a subject, verb and object. This is a valid sentence. Now imagine it was changed to “I like asdasd”. This sentence has a capital letter, a period, and a subject (I), verb (like), and object (asdasd). The parser would then validate this sentence as well. However, asdasd is meaningless, and as such this sentence shouldn’t be accepted. That’s where the lexer comes in. The lexer takes a sequence of characters (e.g. every letter) and converts them into a sequence of tokens (e.g. words). If a token does not match a set of rules, then the lexer will throw an error and invalidate the program. In the English language example, the simplest definition of requirements would be “words in the english language”.

When these two tools are used together, they can be used to validate a program. Once the grammar has a parser and lexer, we are able to generate machine operations to execute our commands.

We wrote our parser as a recurse descent parser, which interacts with our lexer and code generation modules. All code can be found on my github site. I’ve ported it to the web for you to demo below. Enjoy!

Our compiler compiles our invented language into the MIPS assembly language instruction set.

Name	Syntax	Name	Syntax
BEGIN	begin	END	end
READ	read	WRITE	write
INT	int	STRING	string
BOOL	bool	IF	if
ELSE	else	WHILE	while
TRUE	True	FALSE	False
AND	and	OR	or
NOT	not

Name	Syntax
ID	\w(\w\|\d)*
INTLITERAL	\d+
STRINGLITERAL	“.*?”

Name	Syntax	Name	Syntax
PLUS	+	MINUS	–
DIV	/	MOD	%
EQUALTO	==	LESS_EQUALS	<=
GREATER_EQUALS	>=	LESSTHAN	<
GREATERTHAN	>	AND	&&
OR	\|\|	NOT	!
LPAREN	(	RPAREN	)
SEMICOLON	;	COMMA	,
ASSIGNOP	:=	LCURLY	{
RCURLY	}

Term		Definition
<program>	>	#start begin <statement list> end #finish
<content>	>	{<statement list}
<statement list>		<statement> {statement list}
<statement>	>	<type> <ident> #declare ;
	>	<ident> := <expression> #assign ;
	>	read(<id list> #read_ids ) ;
	>	write(<expr list> #write_exprs) ;
	>	while (<expression>) <content>
		if (<expression>) <content> { else <content> }
<id list>		<ident> {, <ident>}
<expr list>	>	<expression> {, <expression> }
<expression>	>	<bool> { <bool_op> <bool> #bool_op }
<bool>	>	<clause> { <num_op> <clause> #num_op }
<clause>	>	<term> { <plus_op> <term> #add_op }
<term>	>	<factor> { <mult_op> <factor> #mult_op }
<factor>	>	<neg_op> <primary> #negate_term
	>	<primary>
<primary>	>	(<expression>)
	>	<ident>
	>	INTLITERAL #process_int_literal
	>	STRINGLITERAL #process_string_literal
	>	BOOLLITERAL # process_bool_literal
<ident>	>	ID # process_id
<type>	>	int
	>	string
	>	bool
<plus_op>	>	PLUS
	>	MINUS
<mult_op>	>	MULT
	>	DIV
	>	MOD
<neg_op>	>	MINUS
	>	NOT
<num_op>	>	EQUALTO
	>	LESSTHAN
	>	GREATERTHAN
	>	LESS_EQUALS
	>	GREATER_EQUALS
<bool_op>	>	AND
	>	OR

Enter program:

Compiled Output:

begin
int x;
x:=1;
write(x<20);
end