12011619 Liquan Wang
12111744 Wenhui Tao
12111611 Ruixiang Jiang
Score: 100/100
In this phase, we generate the syntax tree using the node
class during the bottom-up parsing process of Bison
class node { //syntax tree node
public:
string type; //eg: "ID","VarDec","COMMA"
vector<node*> children;
string value; //only for leaf node: INT, ID
};
Then we can generate IR by traversing the syntax tree and using the provided translation scheme
For each nonterminal X, we define the function translate_X()
and invoke the functions recursively, for example
string translate_Args(node* Args, vector<string> &arg_list){
vector<node*> nodes = Args->children;
string children = expression(Args);
if(children=="Exp"){
string tp = NEW_PLACE;
arg_list.push_back(tp);
return translate_Exp(nodes[0],tp);
}else if(children=="Exp COMMA Args"){
string tp = NEW_PLACE;
arg_list.push_back(tp);
return concat_ir(
translate_Exp(nodes[0],tp),
translate_Args(nodes[2],arg_list)
);
}else{return "";} //never
}
The two macros below are used to generate new variables and labels
#define NEW_PLACE "t"+to_string(place_count++)
#define NEW_LABEL "label"+to_string(label_count++)
int place_count = 1;
int label_count = 1;
We apply IR optimization to translate_Exp()
function, the optimization is enabled by default.
However, in some cases, optimization will cause error, so we should disable it when invoking the function
string translate_Exp(node* Exp, string* place, bool optimize=true);
The key idea is eliminating the IR only for production Exp: INT|ID|CHAR|FLOAT
So it is extremely easy to implement, the core code is just 5 lines as shown below
string translate_Exp(node* Exp, string* place, bool optimize){
if(nodes.size()==1){ //Exp:INT|ID|CHAR|FLOAT
if(optimize){
*place = nodes[0]->value; //value of INT|ID|CHAR|FLOAT
return ""; //No need to generate IR for constant
}else{return *place+" := "+nodes[0]->value;} //without optimization, need to generate IR for constant
}
...
}
Although it is easy, it can eliminate nearly half redundant IR instructions according to the test, so it is a very cost-effective method for IR optimization
To support the compound assignment operators: += -= *= /= ++ --
, we first added six match rules in lex.l
"+=" {yylval=new node("PLUS_ASSIGN"); return PLUS_ASSIGN;}
"-=" {yylval=new node("MINUS_ASSIGN"); return MINUS_ASSIGN;}
"*=" {yylval=new node("MUL_ASSIGN"); return MUL_ASSIGN;}
"/=" {yylval=new node("DIV_ASSIGN"); return DIV_ASSIGN;}
"++" {yylval=new node("SELF_PLUS"); return SELF_PLUS;}
"--" {yylval=new node("SELF_MINUS"); return SELF_MINUS;}
Then we just need to convert the compound expression to normal expression during the parsing process of Bison, by changing the syntax tree's structure:
-
convert
Exp1 [+-*/]= Exp2
toExp1 = Exp1 [+-*/] Exp2
node* convert_assign(node* Exp1, node* Exp2, string _operator){ node* op = new node(_operator); //`operator` is an internal keyword in C++ node* res = new node("Exp",vector{Exp1,op,Exp2}); node* assign = new node("ASSIGN"); return new node("Exp",vector{Exp1,assign,res}); } %% Exp: ... | Exp PLUS_ASSIGN Exp {$$=convert_assign($1,$3,"PLUS");} | Exp MINUS_ASSIGN Exp {$$=convert_assign($1,$3,"MINUS");} | Exp MUL_ASSIGN Exp {$$=convert_assign($1,$3,"MUL");} | Exp DIV_ASSIGN Exp {$$=convert_assign($1,$3,"DIV");} %%
-
convert
Exp[++|--]
toExp = Exp [+-] 1
node* convert_self_assign(node* Exp, string _operator){ //`operator` is an internal keyword in C++ node* one = new node("Exp", vector{new node("INT", "1")}); return convert_assign(Exp, one, _operator); } %% Exp: ... | Exp SELF_PLUS {$$=convert_self_assign($1,"PLUS");} | Exp SELF_MINUS {$$=convert_self_assign($1,"MINUS");} %%
After the preprocess of compound assignment operator, we can normally apply the translation scheme during the top-down traverse
To support for loop, we added four productions in syntax.y
Stmt:
...
//for(...;...;...){...}
| FOR LP Def Exp SEMI Exp RP Stmt {$$=new node("Stmt",vector{$1,$2,$3,$4,$5,$6,$7,$8});}
//for(...;...;){...}
| FOR LP Def Exp SEMI RP Stmt {$$=new node("Stmt",vector{$1,$2,$3,$4,$5,$6,$7});}
//for(;...;...){...}
| FOR LP SEMI Exp SEMI Exp RP Stmt {$$=new node("Stmt",vector{$1,$2,$3,$4,$5,$6,$7,$8});}
//for(;...;){...}
| FOR LP SEMI Exp SEMI RP Stmt {$$=new node("Stmt",vector{$1,$2,$3,$4,$5,$6,$7});}
While traversing the syntax tree, we just need to translate the for loop like while loop, below is the example translation scheme for for(...;...;...){...}
if(children=="FOR LP Def Exp SEMI Exp RP Stmt"){
string lb1 = NEW_LABEL;
string lb2 = NEW_LABEL;
string lb3 = NEW_LABEL;
string ir1 = translate_Def(nodes[2]);
string ir2 = "LABEL "+lb1+" :";
string ir3 = translate_cond_Exp(nodes[3],lb2,lb3);
string ir4 = "LABEL "+lb2+" :";
string ir5 = translate_Stmt(nodes[7]);
string ir6 = translate_Exp(nodes[5],nullptr); //no need to store the result
string ir7 = "GOTO "+lb1;
string ir8 = "LABEL "+lb3+" :";
return concat_ir(ir1,ir2,ir3,ir4,ir5,ir6,ir7,ir8);
}
The arrays' information are stored in a map(like symbol table), which stores the array size of each level
unordered_map<string,vector<int>> arrays = {}; //array_name -> {size1,size2,...}
For example:
- If
a[6]
is declared, then the entry isa->{1}
- If
b[6][8]
is declared, then the entry isb->{1,8}
- If
c[6][8][7]
is declared, then the entry isc->{1,7,7*8}
, i.e:c->{1,7,56}
- If
d[6][8][7][3]
is declared, then the entry isd->{1,3,3*7,3*7*8}
, i.e:d->{1,3,21,168}
By querying this map, we can compute the offset of each entry conveniently
The declaration of array appears in the production VarDec: VarDec LB INT RB
We can get the size of each dimension by iterating the syntax tree, then store the array information into the map, and generate the DEC
IR code: DEC {array_name} 4*{entry_count}
if(children=="VarDec LB INT RB"){
int count = 1;
vector<int> sizes= {};
do{//iterate multi-level array
sizes.push_back(count);
count *= stoi(nodes[2]->value.substr(1)); //delete the first '#' in INT
VarDec = VarDec->children[0];
nodes = VarDec->children;
}while(expression(VarDec)=="VarDec LB INT RB");
string array_name = nodes[0]->value; //here, nodes[0] must be `ID`
arrays.insert({array_name,sizes}); //store the array information into the map
return "DEC "+array_name+" "+to_string(count*4); //each int's size is 4 byte
}
The access of the array appears in the production Exp: Exp LB Exp RB
We can get the index of each dimension by iterating the syntax tree, then compute the offset using the following formula
Then we can get the array access IR below
address := &{array_name} + offset
place := *address