[Phase1] Add README.md

liqwang · Oct 29, 2023 · 0871bc3 · 0871bc3
1 parent 26c3fef
commit 0871bc3
Showing 1 changed file with 233 additions and 0 deletions.
diff --git a/phase1/README.md b/phase1/README.md
@@ -0,0 +1,233 @@
+# CS323 Project Phase1
+
+**12011619 Liquan Wang**
+
+**12111744 Wenhui Tao**
+
+**12111611 Ruixiang Jiang**
+
+
+
+## Run
+
+```shell
+> make splc #compile the splc compiler for SPL language
+> make test #compile the compiler and run test cases
+> make clean #clean the directory
+> make lexer #using lex.l to compile the lexer for debug 
+```
+
+
+
+## Flex
+
+In the `lex.l` file
+
+- `yylval` returns the `<name>\n` of the lexical unit, for example
+
+  ``` c
+  "{" {yylval=strdup("LC\n"); return LC;}
+
+- We detect tokens in the correct format and tokens in the wrong format.
+
+  - The correct token can help the **Bison** part to do syntax analysis
+
+  - The wrong token can also help the bison to do error recovery and analyze the entire program
+
+    ```c
+    /* For example: invalid decimal and hex INT: starting with 0 */
+    0[0-9]+|0[xX]0[0-9a-fA-F]+ {lexical_error(yytext,yylineno); return INT;}
+    ```
+
+
+
+
+## Bison
+
+In the `syntax.y` file
+
+- The various operators are defined from top to bottom in order of precedence from low to high.
+
+  The priority is specified by text `%prec` to eliminate some ambiguity.
+
+- Generate syntax tree:
+
+  Instead of defining  `struct Node`, we use the function `concat_shift()` to truncate the next level of string by text `\n` and then add two spaces
+  before each part
+
+  In each production, we just need to use the `concat_shift()` as below to merge the syntax information in `$n` to `$$`
+
+  This method's code implementation is simplier than the traditional `struct Node` method
+
+  ```c
+  FunDec:
+    ID LP VarList RP {asprintf(&$$,"FunDec (%d)\n%s\n", @$.first_line, concat_shift($1,$2,$3,$4));}
+  ```
+  
+  The function uses Variadic Template and Fold Expressions in C++17, so we should compile the `splc` use option `-std=c++17` in g++
+  
+  ```c++
+  //concat the strings and shift two spaces
+  template<typename... Args>
+  char* concat_shift(Args... strs) {
+      //1.concat the strings
+      char* concat = strdup("");
+      (asprintf(&concat,"%s%s",concat,strs), ...);
+  
+      //2.shift two spaces in each line
+      char* result = strdup("");
+      char* line=strtok(concat,"\n");
+      while(line!=NULL){
+          asprintf(&result,"%s  %s\n",result,line);
+          line=strtok(NULL,"\n");
+      }
+      return result;
+  }
+  ```
+
+  When the `splc` parse the program recursively, we can get the syntax tree as below
+
+  ```
+  Program (1)
+    ExtDefList (1)
+      ExtDef (1)
+        Specifier (1)
+          TYPE: int
+        FunDec (1)
+          ID: test_1_o02
+          LP
+          RP
+        CompSt (2)
+          LC
+          DefList (3)
+        ...
+  ```
+
+
+
+
+## Error detection
+
+- In `lex.l` we detect the **lexical error**, and pass the right token to the parser as if the lexeme is right, so the syntax analysis process can continue without error
+
+  For example, the code below recognizes the invalid decimal and hex `INT` token starting with 0
+
+  ```c
+  0[0-9]+|0[xX]0[0-9a-fA-F]+ {lexical_error(yytext,yylineno); return INT;}
+  ```
+
+  Moreover, we can also recognize the invalid `CHAR` token like below
+
+  ```c
+  '([^']|\\[xX](0|[1-9a-fA-F][0-9a-fA-F]?))' {asprintf(&yylval,"CHAR: %s\n",yytext); return CHAR;}
+  '[^']*' {lexical_error(yytext,yylineno); return CHAR;} /*Invalid CHAR: not matched by the above valid CHAR regex will fail to this regex*/
+  ```
+
+- In `syntax.y` we detect the **syntax error** by matching the wrong sequence of terminals and nonterminals as below
+
+  ```c
+  ExtDef:
+    Specifier ExtDecList SEMI {asprintf(&$$,"ExtDef (%d)\n%s\n", @$.first_line, concat_shift($1,$2,$3));}
+  | Specifier ExtDecList error {syntax_error("semicolon \';\'",@2.last_line);}
+
+  CompSt:
+  ...
+  /*The two error cases below indicate that declaration must before definition*/
+  | LC DefList StmtList DefList error {syntax_error("specifier",@3.first_line);}
+  | LC StmtList DefList error {syntax_error("specifier",@2.first_line);}
+  ```
+
+  
+
+## Bonus
+
+Beyond the basic requirement, we also implement some extra features
+
+Because of aesthetic issues, the code is partially commented, but the actual program is not commented
+
+To make the code more intuitive, we’ve attached test samples and results that you can skip if you don’t want to read them
+
+
+
+### Single- and multi-line comment
+
+```c
+/*in lex.l*/
+"//".*
+"/*"((("*"[^/])?)|[^*])*"*/"
+```
+
+
+
+### Macro preprocessor and file inclusion
+
+Enter the `<macro>` state after a special beginning is recognized to distinguish it from the exception recognition capture
+
+```c
+/*in lex.l*/
+"#include" {yylval=strdup("INCLUDE\n"); BEGIN( macro); return INCLUDE;}
+"#define" {yylval=strdup("DEFINE\n"); BEGIN(macro); return DEFINE;}
+"#ifdef" {yylval=strdup("IFDEF\n"); BEGIN(macro); return IFDEF;}
+"#else" {yylval=strdup("MACROELSE\n"); return MACROELSE;}
+"#endif" {yylval=strdup("ENDIF\n"); return ENDIF;}
+<macro>{
+	"," {yylval=strdup("COMMA\n"); return COMMA;}
+	"<" {return LT;}
+	">" {return GT;}
+	// \" {return DQUOT;}
+	\n {BEGIN(INITIAL);}
+    [ \t]+ /* ignore word splits */{}
+	// [^," "<>"\n]+ {asprintf (&yylval ,"%s\n",yytext); return MACRO;}
+}
+```
+
+Macro and file introduction automatically changes to the beginning of the file, and parallel output, to achieve `#define` `#ifdef` `#else` `#endif` `#include`
+
+In the SPL file, we can define multiple macros on a single line separated by commas
+
+```c
+/*in syntax.y*/
+RES:
+  Program {if(!has_error){printf("%s", $1);}}
+| MACROStmt Program {if(!has_error){printf("%s", $1);}}
+
+MACROStmt:
+  INCLUDEStmt {$$=$1;}
+| DEFINEStmt {$$=$1;}
+| DEFINEStmt MACROStmt {asprintf(&$$,"%s%s", $1,$2);}
+| INCLUDEStmt MACROStmt {asprintf(&$$,"%s%s", $1,$2);}
+
+INCLUDEStmt:
+  INCLUDE LT MACRO GT {asprintf(&$$,"INCLUDE (%d)\n%s", @1.first_line ,concat_shift($3));}
+| INCLUDE DQUOT MACRO DQUOT {asprintf(&$$,"INCLUDE (%d)\n%s", @1.first_line ,concat_shift($3));}
+
+DEFINEStmt:
+  DEFINE MACRO MACRO {asprintf(&$$,"DEFINE (%d)\n%s", @1.first_line ,concat_shift($2,$3));}
+| DEFINEStmt COMMA MACRO MACRO {asprintf(&$$,"%sDEFINE (%d)\n%s",$1, @1.first_line ,concat_shift($3,$4));}
+
+Stmt:
+...
+| IFDEF Stmt ENDIF {asprintf(&$$,"Stmt (%d)\n%s\n", @$.first_line, concat_shift($1,$2,$3));}
+| IFDEF MACRO Stmt MACROELSE Stmt ENDIF {asprintf(&$$,"Stmt (%d)\n%s\n", @$.first_line, concat_shift($1,$2,$3,$4,$5));}
+```
+
+
+
+### For loop statement
+
+```c
+/*in lex.l*/
+for {yylval=strdup("FOR\n"); return FOR;}
+```
+
+In the parser, the logic of `for` is very similar to `if` and `while`
+
+```c
+/*in syntax.y*/
+Stmt:
+...
+| FOR LP Exp SEMI Exp SEMI Exp RP Stmt {
+    asprintf(&$$,"Stmt (%d)\n%s\n", @$.first_line, concat_shift($1,$2,$3,$4,$5,$6,$7,$8,$9));
+}
+```
+