-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreadme.txt
231 lines (189 loc) · 9.21 KB
/
readme.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
Joshua Baker jkb150030
Dr. Gupta
CS 4386.001
4/25/18
================================================================
INTRODUCTION
================================================================
This project is a compiler for the TL13 language according to
the specification given by Dr. Gupta. If the input code file is
valid, the compiler outputs C which can then be compiled to
machine code with a C compiler, such as gcc.
Below is information about how to compile this project and use
the resulting compiler. Then follows a detailed description of
each component of the compiler and how it was designed.
================================================================
USAGE
================================================================
DEPENDENCIES
make
Used to build each component of the project.
flex
Used to construct the lexical analyser.
bison
Used to construct the parser.
gcc
Used to compile both the compiler and the output C from
the compiler (from the TL13 source).
BUILDING THE COMPILER
To create the compiler executable, from the project root
folder execute 'make'. This will populate the intermediate
folder with all the object files and the outputs from flex
and bison, and the compiler executable will be placed in
the project root folder and called 'tl13compiler'.
To delete the contents of the intermediate folder and the
compiled folder as well as tl13compiler, from the project
root folder execute 'make clean'.
USING THE COMPILER
The compiler executable reads TL13 input code from the
standard input and prints the corresponding C code on the
standard output. If an error is encountered, that will be
printed to the standard error and the compiler will exit
with a non-zero status.
If compiling succeeds, the resulting C code can be
compiled to machine code with simply
'gcc <file containing output from tl13compiler>'.
To simplify using the compiler, I've created a helper script
called compileTL13CodeFile.sh. Note that this script relies
on bash as a dependency. Using compileTL13CodeFile.sh, you
can compile a TL13 codefile with the following command:
'./compileTL13CodeFile.sh <code file>'.
If an error occurs along the way, the helper script will
print an error message to the standard error stream.
Otherwise, if compilation succeeds, the output C code
is printed to the standard output along with a message
explaining where to find the compiled executable binary.
================================================================
PROJECT STRUCTURE
================================================================
DIRECTORIES
In an effort to stay organized, I created mutliple top-level
directories.
src
This folder contains all of the C code files as well as
the definitions for the lexical analyser and the parser.
headers
This folder contains all the header files except for the
one that is created by bison during build. Almost every
C code file in src has a corresponding header file
declaring its functions and structures.
intermediate
This folder exists to keep the temporary mid-build files
(object files from compiling C code files and C files
from bison and flex) away from everything else.
examples
This folder contains all the TL13 code files I wrote to
test the compiler. It contains 2 subdirectories: good
and bad.
good contains the TL13 code files that are valid and
ought to compile successfully.
bad contains code files that have been malformed in
specific ways (corresponding to their names) to test
that the compiler notices and reports errors properly.
compiled
This folder contains both the C output from tl13compiler
as well as the executable binary after compiling a TL13
code file using compileTL13CodeFile.sh.
LEXICAL ANALYSER
The lexical analyser is created by flex according to the
flex code file src/lexer.l. This file contains definitions
for each valid token, storing values where appropriate and
returning the corresponding enum value every time a token is
recognized.
A few of the tokens require storing additional data before
flex returns. The NUM token stores the integer value
corresponding to the recognized text; the BOOLLIT token
stores the boolean value; the IDENT token stores the string
value; and the OP tokens store the enum value of the
particular operator that was recognized.
If a character is fed to the standard input that is not in
TL13's alphabet, flex simply ignores that character.
The only error that can arise from this module is the int
out of bounds error, which is thrown if an integer literal
is greater than 2147483647.
PARSER
The parser defines the structure of the language's context-
free grammar and constructs the parse tree. It is defined
by src/parser.y
The grammar is defined in exact congruence with the language
specification provided by Dr. Gupta.
To improve readability, most of the code that constructs
each node in the tree was written in functions inside
src/makeStructs.c. The code for rules which do not involve
creating new node structures is written plainly in parser.y.
Once the whole parse tree has been constructed, the parser
initiates compilation.
Most of the errors that can be encountered during execution
are seen here (or in src/makeStructs.c). In the case of a
syntax error, the bison-generated parser will call yyerror
which outputs an error message and immediately stops the
program execution. All the other errors are discovered
while creating the parse tree, and yyerror is then called
manually.
SYMBOL TABLE
The symbol table keeps track of currently defined variables
in the TL13 code, storing each variable's name and type.
Its code consists of sdbm.c and symbolTable.c in src.
The symbol table is implemented with a hash table
(hash map) which makes use of the sdbm hash function. I
copied this hash function from
http://www.cse.yorku.ca/~oz/hash.html.
The symbol table is a simple array of pointers to symbol
structures. Its default length is 16, but it expands by
doubling in size every time it runs out of space to store
new variables. Each symbol structure contains the string
variable name and the type of that variable.
The hash table is an array of pointers to nodes, 31 long,
and uses linked lists to handle hash collisions. Each node
contains an integer which refers to the index of its
variable in the symbol table as well as a pointer to the
next node in the linked list. Each node is placed at the
end of the linked list corresponding to the index of the
hash table that matches the hash value of the variable
name.
The index of each variable in the symbol table is used as
a handle to refer to that variable throughout the parse
tree. Information about a variable can be quickly found
using its handle, and a variable's handle can be quickly
derived by taking the hash value of the variable's name
and searching the hash table.
COMPILATION
All the functionality that produces the output C code from
the parse tree is defined in src/compile.c. Each type of
node in the parse tree has its own compilation function,
which prints out just the C code for that type of node.
Each of these functions also frees the memory associated
with its node.
compile
This function is called by the parser to initiate
compilation. It calls printHeader, declareVars,
printStatements, and printFooter.
printHeader
This function prints include statements (to get
stdlib.h, which is needed for free(), size_t, NULL, and
atoi(); stdio.h, which provides getline() and printf();
and stdbool.h, which allows me to use the keywords
"bool", "true," and "false" in the C code (to improve
readability).
This function also defines int readInt(), which reads
an integer value from the stdin at runtime (no error
handling functionality is built in).
Finally, printHeader prints the beginning of the main
function so that the program can get started.
declareVars
This function has the simple task of defining each
variable that will be used in the program. The names
given in the TL13 code file are preserved, but the
prefix "tl13_" is added to each variable name to avoid
collisions with terms in the output C, such as "NULL".
printStatements
This function walks through the list of statements
and calls the proper function to print the C code
associated with each one.
printFooter
This function prints the end of the main method so the
output C compiles correctly.
printTabs
This function is declared in utility.h. It helps make
The output C more readable by keeping track of proper
indentation.