-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathJSVM.tex
181 lines (153 loc) · 7.16 KB
/
JSVM.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
\documentclass{article}
\usepackage{fullpage}
\author{Alexandre Passos and Vijayaraghavan Apparsundaram}
\title{JSVM: A implementing a JVM in javascript}
\begin{document}
\maketitle
This paper describes a prototype java virtual machine implemented in
javascript. We used the rhino implementation of javascript, which runs
on top of the JVM and gives access to all the java standard libraries
plus JVM debuggers and profilers. The implementation fits in a single
source file with around 1300 lines of code.
\section{Overall approach}
\label{sec:overall-approach}
The main goal guiding this implementation was to avoid doing as much
work as possible, at the penalty of performance and strict observation
of the specification. We reasoned that it was acceptable to sacrifice
performance because everything is already running under the many
overheads of javascript, and by making the core simple this
interpreter can be easily extended to correctly run any desired
program.
The interpreter is structured as follows. A big array of javascript
functions (\texttt{INST}) is maintained. This array is indexed by
bytes and in each byte there is a javascript function responsible for
executing that bytecode. This function has access to the local scope,
the current class, the code block, the program counter, and
the stack, and it is responsible to modify the stack and local
variables (and global state, such as static fields) as it sees fit,
returning in the end either another setting for the program counter or
one of a few special tokens. The code block is kept internally as-is
(essentially a list of bytes) and each instruction is responsible for
accessing any extra operands it needs from the code block. While this
decision enhanced practicality it makes it annoying to implement the
``wide'' bytecode, which is left as future work in our
implementation. These bytecode functions are allowed to call the
interpreter recursively.
The code is structured essentially like a C program, and objects are
only used to represent JVM classes and objects, rather than organizing
the interpreter code itself.
Exceptions are handled by the throwing bytecode returning a token
indicating that an exception has been thrown. Then, the interpreter
walks through the handlers, executing them as appropriate, and if no
handler is found it returns ``throw'' so interpreters higher up the
stack can handle the exception. try..finally blocks are handled by the
java compiler, and the virtual machine only has to implement the
``jsr'' and ``ret'' instructions.
In our implementation both classes and objects have vtables assigned
to them, and these vtables are responsible for virtual and interface
method resolution.
The java standard library was mostly left unimplemented, and we have
javascript code for class stubs for only four basic classes: Object,
Throwable, Serializable, and Exception. These classes are not fully
implemented. As we're running on top of the JVM, when possible we
redirect access to fields and methods in unknown classes to
appropriate fields and methods in the base JVM. This means programs
can use System libraries, but, for example, cannot place objects
inside collections or pass nonprimitive objects to standard library
methods. The indirection is handled by the javascript function eval
which is implemented using java reflection libraries.
Even though we were not concerned with performance ease of
implementation dictated that most bytecodes operate in constant
time. Some nonconstant operations are type casting (and, hence,
exception handling), switch statements (which are repeatedly
parsed every time they are executed), and multidimensional array
initialization.
\subsection{Individual contributions}
\label{sec:indiv-contr}
Alexandre wrote the classfile parser and made the initial
architectural decisions about where to base the implementation, how to
organize the interpreter, which data structures to use,
etc. Vijayaraghavan implemented many of the instructions, including
those responsible for type checking, arrays, and switch. He also did
most of the testing and support infrastructure.
\section{Key challenges}
\label{sec:key-challenges}
The hardest challenge in implementing a JVM in javascript is
performance, which we admitedly decided to ignore. This, however,
freed us to use higher-order functions to implement sets of highly
similar bytecodes only once. Not worrying about compactness of
in-memory datastructures also freed us to implement everything as a
bunch of hash tables.
Another challenge is that the specification presupposes that the
in-memory data structures used to store code and data are tightly
packed, and hence requires special code for the case where it
isn't. Instructions such as dup\_x2, for example, which have different
behavior for doubles versus floats, are implemented incorrectly. This
issue also complicated the classfile parsing, which on the other hand
was simpler than it would be otherwise due to our reliance on the java
standard libraries.
\section{Adaptations to javascript (and other purposeful incompatibilities)}
\label{sec:adapt-javascr-and}
There are many incompatibilities between the JVM spec and our
implementations. We stuck as much as possible to the javascript data
model, so internally our JVM does not do as much as it could to
differentiate between data types. We also did not implement classfile
validation and anything else which somehow verified whether the
compiler was performing its job correctly.
TODO: add more
\section{Example sections}
\label{sec:example-sections}
Our main interpreter loop is as follows:
\begin{verbatim}
function runMethod(method, cls, locals) {
var stack = []
if (method.jscode) {
// call a js version of the method, if exists
return method.jscode(method, cls, locals)
} else {
var code = method.code
var ip = 0
while (1) {
var res = INST[code[ip]](code, ip, stack, cls, locals)
if (res == "return") {
return "void"
} else if (res == "ireturn") {
return stack.pop()
} else if (res == "throw") {
var ex = stack.pop()
var handlers = method.handtable
var found = false
for (var i = 0; i < handlers.length; ++i) {
if (!((ip > handlers[i].start)&&(ip < handlers[i].end))) continue
if (! matchType(ex[1].cls, handlers[i].type)) continue
ip = handlers[i].handler
stack.push(ex)
print(" --- debug -- going to handler at instr "+ip)
found = true
break
}
if (found) continue
// all else failing we throw it one level up
locals[0] = ex
return "throw"
} else {
ip = res
}
}
}
}
\end{verbatim}
Using higher-order functions, we implemented bytecodes sometimes
generally, as follows':
\begin{verbatim}
function iconst(i) {
return function(c,p, s, cls, l) {
s.push(["int", i])
return p+1
}
}
\end{verbatim}
\section{Numbers and limitations}
\label{sec:numbers-limitations}
TODO: put some numbers here
\end{document}