-
Notifications
You must be signed in to change notification settings - Fork 1
/
final-work.tex
201 lines (149 loc) · 11.5 KB
/
final-work.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
% --------------------------------------------------------------
% This is all preamble stuff that you don't have to worry about.
% Head down to where it says "Start here"
% --------------------------------------------------------------
\documentclass[12pt,spanish]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{mathtools}
\usepackage{enumitem}
% Para eliminar la numeración de secciones
\setcounter{secnumdepth}{0}
\frenchspacing
% Solución temporal al problema de los tildes, funciona bien sin instalar nada, pero habría que saber instalar polyglossia.
\usepackage[T1]{fontenc}
\usepackage{selinput}
\SelectInputMappings{%
aacute={á},
eacute={é},
iacute={í},
oacute={ó},
uacute={ú},
ntilde={ñ}
}
% Para simbolos de conjuntos usualmente usados.
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\K}{\mathbb{K}}
% Para estructurar el documento en teoremas, lemas, ejercicios, reflexiones, proposiciones, corolarios, y demostraciones.
\newenvironment{teorema}[2][Teorema]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{lema}[2][Lema]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{ejercicio}[2][Ejercicio]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{reflexion}[2][Reflexión]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{proposicion}[2][Proposición]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{corolario}[2][Corolario]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}\hskip \labelsep {\bfseries #2.}]}{\end{trivlist}}
\newenvironment{demostracion}{{\emph{Demostración.}}}{\hfill $\blacksquare$ \\}
% Para poder escribir \iff en math mode fácil.
\DeclareRobustCommand\iff{\;\Leftrightarrow\;}
\begin{document}
% --------------------------------------------------------------
% Start here
% --------------------------------------------------------------
\title{Final Work - Pilot \\
\large Computer Architecture
}
\date{January 8th, 2018}
\author{Gonzalo Turconi - T-2820/7 \\
Ignacio Cattoni - C-6415/7 \\
Maximiliano Redigonda - R-4079/7}
\maketitle
\section{Introduction}
The aim of this draft is to develop a compiler from the Pilot language to the ARM assembler language. \textbf{Pilot} is a simple language; here is provided an accurate description of this language, based on the given model.
Every sentence on Pilot is written on a single line. Each line begins with a letter or a symbol that designates the operation to be executed, or with the name of a variable to assign its value.
Variables are alphanumeric strings that start with a lower-case letter, and have to be followed by a label number\footnote{A label number is a non-negative integer number, represented by a signed 32-bits integer (i.e. any integer between $0$ and $2^{31}-1$, without leading zeros)}. Thus:
\begin{itemize}
\item{a0, z24, x231, p2147483647, are valid names, whereas:}
\item{a-1, b, romeosantos, 23z, w2147483648, r00, are not valid names.}
\end{itemize}
\noindent The supported operations are the following:
\begin{itemize}
\item{Assign: it begins with a variable's name and is followed by an expression. Its effect is to assign the expression's value to the variable.}
\item{Input: it starts with an R, followed by the name of a variable. Its effect is to read an integer from the standard input, and assign it to the variable.}
\item{Output: it begins with an O, followed by an expression. Its outcome is the expression's value printed on the screen.}
\item{End: it consists of an E, which means the end of the program.}
\item{Jump: it is first represented by a G, followed by a label number. This sentence modifies the control flow, and resumes the execution on the label given by the number.}
\item{Label: it consists of an L and a label number. This defines the label characterized by the specified integer.}
\item{Conditional jump: it is represented by a letter I, then it is followed by an expression, and ends with a label number. Its effect is to jump to the corresponding label number if the value of the expression is not zero.}
\item{Comment: it is represented by a symbol \#, and it has the effect to tell the compiler to ignore everything until the end of line.}
\item{Function definition: it is represented by a F and it is followed by the name of the function.}
\item{Call a function: it is represented by a C, and it is followed by the name of a function, it has the effect to execute the function.}
\item{Function return: it is represented by a word RET, followed by the name of the function from which to return.}
\end{itemize}
\noindent A component is a constant or a variable. An expression consists of a single component, or an operator followed by one or two components (depending on the operation).
\noindent Blank lines are not allowed: the compiler interprets them as the end of the program.
\section{Process}
To complete this project:
\begin{enumerate}
\item{A precise definition of the Pilot language was derived.}
\item{A research on how to structure and split the program in various files was carried out.}
\item{A research on tools and methods to document the files was carried out.}
\item{A reader and classifier of instructions was developed (\texttt{instruction-decoder.c}).}
\item{A data structure to map variable names into memory addresses was designed (\texttt{storage.c}).}
\item{A system to read and evaluate expressions was created (\texttt{expressions.c}).}
\item{A system to print useful data during the development process was created (\texttt{debug.c}).}
\item{A system in charge of writing the resulting ARM program was developed (\texttt{writer.c}).}
\item{More than 20 Pilot programs were created to test the compiler (\texttt{tests/*}).}
\end{enumerate}
\section{Note}
From the beginning, the objective of this project has been to develop a compiler that, given a valid Pilot code, it generates a valid ARM assembly code that represents it. Due to this fact, if the program receives an invalid Pilot code, the compiler's behavior is undefined.
\section{Problems}
During the development process several challenges arose. Some of them are showed next.
\subsubsection{How to store the variables' values?}
It was researched the possibility of storing the variables' values into registers. This would be efficient unless the number of variables was big enough to turn this operation tedious and chaotic. In this case, it would be needed some extra storage.
It was also considered the usage of the stack, which meant assigning an offset from the top of the stack to every variable. This option also has limitations when the number of variables is big. Besides, it is likely that the variables were not the only values stored in the stack.
Finally, an array with the minimum amount of memory for storing the value of every variable was determined as the best option. This method is easy and handy, a position in the array is assigned for each variable. Then, the array is only composed of variables, and the memory is used optimally.
\subsubsection{Once an expression is stored, how to obtain its value and how to store it?}
Since an expression has at most 2 components, it was developed an expression-evaluator that uses the registers \texttt{r2} and \texttt{r3} (this can be configured) to store auxiliary values, and saves the result in the register \texttt{r1}.
It was decided to store the result in the register \texttt{r1} since it is convenient for future calls to the \texttt{printf} function, avoiding an unnecessary \texttt{mov} instruction.
\subsubsection{¿How to support the integer division operator? (\texttt{/})}
One of the drawbacks arose when adding the integer division operator. The ARM set of instructions contains \texttt{sdiv} and \texttt{udiv}: signed and unsigned division, respectively.
When executing these instructions, the following error message was displayed: \texttt{«Error: selected processor does not support `sdiv r1,r2,r3' in ARM mode»}.
Said instructions were the only ones with which the compiler showed such error. After a careful research it was observed that integer division operations have been added since ARMv8, and that all emulators run ARMv6 by default.
This issue was resolved, then, by adding \texttt{-march=armv8-a} as an option in the compilation process, which tells the compiler to run ARMv8 making division operators available.
\section{Next Steps}
The compiler supports all additional features suggested. It also supports comments, and it has a documentation made with \texttt{Doxygen}.
Besides from these extensions, the project can continue by:
\begin{itemize}
\item{Adding more tests}
A compiler is a particularly difficult program to test. Adding more tests is one of the best options to discover flaws and make the compiler more robust. Future changes will then be less likely to produce problems that go unnoticed.
\item{Adding a \texttt{for} to Pilot}
The current code allows the integration of new instructions with relative ease. Because of this, implementing a \texttt{for} is a good idea to continue this project.
A possible syntax can be \texttt{FOR var from to}, where \texttt{var} is the iterator variable, and \texttt{from} and \texttt{to} are components. To end the for, \texttt{END FOR} could be used.
Given that they could be nested, and that functions are already implemented in Pilot, it might be necessary to change the way the compiler stores variables in the generated ARM program to allow this feature.
\item{Determine if the entered program is a valid Pilot program}
It was agreed that this feature is not critical, since it doesn't affect the correctness of the compiler. However, it would make developer's life easier.
To do this, it is require to at least:
\begin{enumerate}
\item{Save the labels and functions defined so far, to verify that there were no two definitions of the same label or function.}
\item{Verify the number of words of each instruction, depending on the type of instruction.}
\item{Avoid the declaration of a function inside another.}
\end{enumerate}
\end{itemize}
\begin{itemize}
\item{Optimize the translation}
When inspecting an \texttt{ARM} code generated by the compiler, it is clear that the efficiency overhead introduced by the compiler is high.
By maintaining the values stored in the registers, and using more of them, it is possible to prevent the compiler from using redundant operations, making the generated program more efficient.
However, conditional jumps, labels and functions can make the exact values of the registers difficult to track.
\end{itemize}
\section{Results}
More than 20 programs in Pilot have been created to test the correctness of the compiler.
\noindent
Among the most prominent Pilot programs created are:
\begin{enumerate}
\item{\texttt{test7.txt}: Calculates the n-th Fibonacci number.}
\item{\texttt{test8.txt}: Reverses the digits of a number.}
\item{\texttt{test16.txt}: Prints all prime numbers in a given range.}
\item{\texttt{test17.txt}: Multiplies two number using the russian peasant algorithm.}
\item{\texttt{test19.txt}: Implements a 'guess my number' game.}
\end{enumerate}
% --------------------------------------------------------------
% You don't have to mess with anything below this line.
% --------------------------------------------------------------
\end{document}