
MIPS Compiler
In the spring of 2014, Robert Smayda and I wrote our own computer language. The language has an LL(1) grammar, which means input is parsed from left to right and constructs a leftmost syntax tree (derivation). An LL(1) language means that our grammar is context-free, which greatly decreased the difficulty in building an accompanying parser.
A parser is a tool that iterates through a given set of code and ensures it follows the required grammatical structure. For instance, the english language has a set of requirements, like a capital letter at the beginning of the sentence and a period, question mark, or exclamation point at the end. If either of those were missing, a parser would āthrow an errorā and identify the improper grammar.
In the same vein, our grammar has rules, like the requirement of the words ābeginā and āendā at the start and finish of a program. The requirements for our languageās grammar can be found a few tabs over.
In addition to a parser, there is an additional tool used to verify a program structure, known as a lexer. A lexer, or lexical analyzer is used to validate the tokens in a program. Jumping back to our comparison with the English language we can see why this tool is needed. Imagine we had the sentence, āI like dogsā. The parser verifies there is a capital letter in the beginning, a period in the end, and a subject, verb and object. This is a valid sentence. Now imagine it was changed to āI like asdasdā. This sentence has a capital letter, a period, and a subject (I), verb (like), and object (asdasd). The parser would then validate this sentence as well. However, asdasd is meaningless, and as such this sentence shouldnāt be accepted. Thatās where the lexer comes in. The lexer takes a sequence of characters (e.g. every letter) and converts them into a sequence of tokens (e.g. words). If a token does not match a set of rules, then the lexer will throw an error and invalidate the program. In the English language example, the simplest definition of requirements would be āwords in the english languageā.
When these two tools are used together, they can be used to validate a program. Once the grammar has a parser and lexer, we are able to generate machine operations to execute our commands.
We wrote our parser as a recurse descent parser, which interacts with our lexer and code generation modules. All code can be found on my github site. Iāve ported it to the web for you to demo below. Enjoy!
Our compiler compiles our invented language into the MIPS assembly language instruction set.
| Name | Syntax | Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā | Name | Syntax |
| BEGIN | begin | END | end | |
| READ | read | WRITE | write | |
| INT | int | STRING | string | |
| BOOL | bool | IF | if | |
| ELSE | else | WHILE | while | |
| TRUE | True | FALSE | False | |
| AND | and | OR | or | |
| NOT | not |
| Name | Syntax |
| ID | \w(\w|\d)* |
| INTLITERAL | \d+ |
| STRINGLITERAL | ā.*?ā |
| Name | Syntax | Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā Ā | Name | Syntax |
| PLUS | + | MINUS | ā | |
| DIV | / | MOD | % | |
| EQUALTO | == | LESS_EQUALS | <= | |
| GREATER_EQUALS | >= | LESSTHAN | < | |
| GREATERTHAN | > | AND | && | |
| OR | || | NOT | ! | |
| LPAREN | ( | RPAREN | ) | |
| SEMICOLON | ; | COMMA | , | |
| ASSIGNOP | := | LCURLY | { | |
| RCURLY | } |
| Term | Definition | |
|---|---|---|
| <program> | > | #start begin <statement list> end #finish |
| <content> | > | {<statement list} |
| <statement list> | <statement> {statement list} | |
| <statement> | > | <type> <ident> #declare ; |
| > | <ident> := <expression> #assign ; | |
| > | read(<id list> #read_ids ) ; | |
| > | write(<expr list> #write_exprs) ; | |
| > | while (<expression>) <content> | |
| if (<expression>) <content> { else <content> } | ||
| <id list> | <ident> {, <ident>} | |
| <expr list> | > | <expression> {, <expression> } |
| <expression> | > | <bool> { <bool_op> <bool> #bool_op } |
| <bool> | > | <clause> { <num_op> <clause> #num_op } |
| <clause> | > | <term> { <plus_op> <term> #add_op } |
| <term> | > | <factor> { <mult_op> <factor> #mult_op } |
| <factor> | > | <neg_op> <primary> #negate_term |
| > | <primary> | |
| <primary> | > | (<expression>) |
| > | <ident> | |
| > | INTLITERAL #process_int_literal | |
| > | STRINGLITERAL #process_string_literal | |
| > | BOOLLITERAL # process_bool_literal | |
| <ident> | > | ID # process_id |
| <type> | > | int |
| > | string | |
| > | bool | |
| <plus_op> | > | PLUS |
| > | MINUS | |
| <mult_op> | > | MULT |
| > | DIV | |
| > | MOD | |
| <neg_op> | > | MINUS |
| > | NOT | |
| <num_op> | > | EQUALTO |
| > | LESSTHAN | |
| > | GREATERTHAN | |
| > | LESS_EQUALS | |
| > | GREATER_EQUALS | |
| <bool_op> | > | AND |
| > | OR | |
int x;
x:=1;
write(x<20);
end