EVM has a Harvard architecture (code and data live in separate memories) and consists of:
- 32 signed 64-bit registers (`r0`..`r31`),
- linear data memory addressed from `0x0`,
- a call stack (used by the `call` and `ret` instructions).
Arithmetic operations use two's complement representation.
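The specification does not say what happens when a result overflows 64 bits. A minimal sketch, assuming wrap-around (the natural behavior of two's complement hardware), is to do the arithmetic on `uint64_t`, which C defines to wrap modulo 2^64. The function name `wrapping_add` is hypothetical.

```c
#include <stdint.h>

/* Adds two register values with two's complement wrap-around.
 * Assumption: overflow wraps; the spec only states the representation.
 * Signed overflow is undefined behavior in C, so the sum is computed on
 * uint64_t. Converting back to int64_t is implementation-defined before
 * C23 but yields the two's complement value on mainstream compilers. */
static int64_t wrapping_add(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}
```

The same cast pattern works for `sub` and `mul`, because the low 64 bits of an unsigned product equal those of the signed two's complement product.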
Instructions are placed in a consecutive block of memory, addressed from `0x0`.
Execution starts at instruction 0. After every instruction, the instruction pointer (`ip`) is incremented; jump instructions may additionally modify `ip`.
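The machine state described so far can be sketched as a single C structure. This is only an illustration: the names `vm_state` and `vm_code_offset` and the call-stack depth limit are hypothetical, since the specification defines neither an API nor a maximum call depth. Code and data are kept behind separate pointers to mirror the Harvard split, and `ip` counts whole instructions, each of which is 3 bytes long.

```c
#include <stdint.h>
#include <stddef.h>

#define VM_REGISTER_COUNT 32
#define VM_MAX_CALL_DEPTH 1024 /* assumption: the spec gives no limit */

/* Hypothetical container for the machine state described above. */
struct vm_state {
    int64_t regs[VM_REGISTER_COUNT];       /* r0..r31, signed 64-bit */
    const uint8_t *code;                   /* code memory, 3 bytes per instruction */
    uint32_t code_size;                    /* number of instructions */
    uint8_t *data;                         /* data memory, byte-addressed from 0x0 */
    uint32_t data_size;                    /* data memory size in bytes */
    int64_t call_stack[VM_MAX_CALL_DEPTH]; /* saved return instruction indices */
    size_t call_depth;                     /* entries currently on the call stack */
    int64_t ip;                            /* index of the current instruction */
};

/* ip counts instructions, not bytes: instruction n starts at byte 3 * n. */
static size_t vm_code_offset(const struct vm_state *vm)
{
    return (size_t)vm->ip * 3;
}
```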
Every instruction is 3 bytes long. Most instructions have the following format:

Opcode | Destination | Source |
---|---|---|
8 bits | 8 bits | 8 bits |
Instructions that take an `imm16` argument use this format instead:

Opcode | Argument |
---|---|
8 bits | 16 bits |
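Both formats can be decoded from the same 3 bytes. The sketch below, with a hypothetical `struct instr` and `decode` function, fills in both readings of bytes 1 and 2; which one applies depends on the opcode. It assumes `imm16` is read as a signed little-endian value, as the file-format section describes for jump offsets.

```c
#include <stdint.h>

/* Hypothetical decoded form of one 3-byte instruction. */
struct instr {
    uint8_t opcode; /* byte 0 */
    uint8_t dst;    /* byte 1: destination register or first operand */
    uint8_t src;    /* byte 2: source register or imm8 */
    int16_t imm16;  /* bytes 1-2 read as a little-endian signed 16-bit value */
};

/* Decodes the instruction at byte offset 3 * index of the code block.
 * Both interpretations of bytes 1-2 are filled in; the opcode decides
 * which one the executing instruction actually uses. */
static struct instr decode(const uint8_t *code, uint32_t index)
{
    const uint8_t *p = code + 3u * index;
    struct instr in;
    in.opcode = p[0];
    in.dst = p[1];
    in.src = p[2];
    /* The uint16_t -> int16_t conversion is implementation-defined for
     * values above 0x7FFF but gives the two's complement value in practice. */
    in.imm16 = (int16_t)(uint16_t)(p[1] | (p[2] << 8));
    return in;
}
```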
Available instructions (opcodes in decimal, with hexadecimal in parentheses):

Instruction | Opcode | Description | Pseudo-code |
---|---|---|---|
`nop` | 32 (0x20) | no operation | |
`in r0` | 40 (0x28) | read a hexadecimal value from standard input and store it in register `r0` | `r0 <- stdin` |
`out r0` | 41 (0x29) | write the value of register `r0` to standard output in hexadecimal | `stdout <- r0` |
`store r0, r1` | 48 (0x30) | store the value of `r1` in memory at the address held in `r0` | `[r0] = r1` |
`load r0, r1` | 49 (0x31) | load the value from memory at the address held in `r1` into register `r0` | `r0 = [r1]` |
`ldc r0, imm8` | 50 (0x32) | load an 8-bit immediate value into `r0` | `r0 = imm8` |
`mov r0, r1` | 64 (0x40) | copy the value of `r1` to `r0` | `r0 = r1` |
`add r0, r1` | 65 (0x41) | add the value of `r1` to `r0`, storing the result in `r0` | `r0 += r1` |
`sub r0, r1` | 66 (0x42) | subtract the value of `r1` from `r0`, storing the result in `r0` | `r0 -= r1` |
`mul r0, r1` | 67 (0x43) | multiply `r0` by `r1`, storing the result in `r0` | `r0 *= r1` |
`div r0, r1` | 68 (0x44) | divide `r0` by `r1`, storing the quotient in `r0` | `r0 /= r1` |
`mod r0, r1` | 69 (0x45) | compute the remainder of dividing `r0` by `r1`, storing it in `r0` | `r0 %= r1` |
`jz r0, imm8` | 97 (0x61) | jump relatively by `imm8` only if `r0` is zero | `if r0 == 0: ip += imm8` |
`jl r0, imm8` | 98 (0x62) | jump relatively by `imm8` only if `r0` is less than zero | `if r0 < 0: ip += imm8` |
`jump imm16` | 99 (0x63) | jump relatively by `imm16` | `ip += imm16` |
`call imm16` | 100 (0x64) | push the address of the next instruction onto the internal call stack and jump relatively by `imm16` | `push ip; ip += imm16` |
`ret` | 101 (0x65) | pop an absolute `ip` from the call stack and jump to it, returning to the instruction after the corresponding `call` | `ip = pop()` |
`hlt` | 126 (0x7E) | terminate program execution | |
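The table above can be turned into a small fetch-decode-execute loop. This is a sketch, not a reference implementation: `struct vm`, `vm_step`, `vm_run` and the status codes are hypothetical; `in` and `out` are omitted because the exact text format of their hexadecimal I/O is not specified; and several behaviors the table leaves open are assumptions marked in the comments (wrap-around arithmetic, C truncating division, division by zero and out-of-range memory access treated as errors, `ldc` treating `imm8` as unsigned, a fixed call-stack depth).

```c
#include <stdint.h>

enum { VM_OK = 0, VM_HALT = 1, VM_ERROR = -1 };

struct vm {
    int64_t regs[32];
    const uint8_t *code;  /* code_size * 3 bytes */
    int64_t code_size;    /* in instructions */
    uint8_t *data;
    int64_t data_size;    /* in bytes */
    int64_t stack[256];   /* assumed maximum call depth */
    int sp;               /* number of saved return addresses */
    int64_t ip;           /* index of the current instruction */
};

static int vm_step(struct vm *vm)
{
    if (vm->ip < 0 || vm->ip >= vm->code_size)
        return VM_ERROR;
    const uint8_t *p = vm->code + vm->ip * 3;
    uint8_t op = p[0], a = p[1], b = p[2];
    int16_t imm16 = (int16_t)(uint16_t)(p[1] | (p[2] << 8));
    int8_t imm8 = (int8_t)b;          /* jump offsets are signed */
    int64_t *r = vm->regs;
    vm->ip++;                         /* offsets are relative to the next instruction */

    /* Reject register indices above 31 for opcodes that use registers. */
    int a_is_reg = (op >= 48 && op <= 50) || (op >= 64 && op <= 69) || op == 97 || op == 98;
    int b_is_reg = op == 48 || op == 49 || (op >= 64 && op <= 69);
    if ((a_is_reg && a > 31) || (b_is_reg && b > 31))
        return VM_ERROR;

    switch (op) {
    case 32: return VM_OK;                                   /* nop */
    case 48: case 49: {                                      /* store, load */
        int64_t addr = (op == 48) ? r[a] : r[b];
        if (addr < 0 || addr > vm->data_size - 8)
            return VM_ERROR;                                 /* assumed: out of range is an error */
        if (op == 48) {
            for (int i = 0; i < 8; i++)                      /* little-endian store */
                vm->data[addr + i] = (uint8_t)((uint64_t)r[b] >> (8 * i));
        } else {
            uint64_t v = 0;
            for (int i = 0; i < 8; i++)                      /* little-endian load */
                v |= (uint64_t)vm->data[addr + i] << (8 * i);
            r[a] = (int64_t)v;
        }
        return VM_OK;
    }
    case 50: r[a] = b; return VM_OK;                         /* ldc: imm8 assumed unsigned */
    case 64: r[a] = r[b]; return VM_OK;                      /* mov */
    case 65: r[a] = (int64_t)((uint64_t)r[a] + (uint64_t)r[b]); return VM_OK; /* add, wraps */
    case 66: r[a] = (int64_t)((uint64_t)r[a] - (uint64_t)r[b]); return VM_OK; /* sub, wraps */
    case 67: r[a] = (int64_t)((uint64_t)r[a] * (uint64_t)r[b]); return VM_OK; /* mul, wraps */
    case 68: case 69:                                        /* div, mod */
        if (r[b] == 0)
            return VM_ERROR;                                 /* assumed: division by zero is an error */
        if (r[b] == -1)                                      /* avoids INT64_MIN / -1 overflow in C */
            r[a] = (op == 68) ? (int64_t)(0 - (uint64_t)r[a]) : 0;
        else
            r[a] = (op == 68) ? r[a] / r[b] : r[a] % r[b];  /* C truncation assumed */
        return VM_OK;
    case 97: if (r[a] == 0) vm->ip += imm8; return VM_OK;    /* jz */
    case 98: if (r[a] < 0) vm->ip += imm8; return VM_OK;     /* jl */
    case 99: vm->ip += imm16; return VM_OK;                  /* jump */
    case 100:                                                /* call */
        if (vm->sp >= 256)
            return VM_ERROR;
        vm->stack[vm->sp++] = vm->ip;                        /* return address = next instruction */
        vm->ip += imm16;
        return VM_OK;
    case 101:                                                /* ret */
        if (vm->sp == 0)
            return VM_ERROR;
        vm->ip = vm->stack[--vm->sp];
        return VM_OK;
    case 126: return VM_HALT;                                /* hlt */
    default:  return VM_ERROR;                               /* unknown, or omitted in/out */
    }
}

/* Steps until hlt or an error. */
static int vm_run(struct vm *vm)
{
    int rc;
    while ((rc = vm_step(vm)) == VM_OK)
        ;
    return rc;
}
```

For example, a program that counts `r0` down from 5 while accumulating it into `r1` (using `jz` to leave the loop and `jump` with offset -4 to repeat it) halts with `r1` equal to 15.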
Memory is linear and byte-addressed, starting at address 0. Instructions that access memory always read or write 64-bit values. Data is stored in and read from memory in little-endian format.
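For example, assume memory starts with the following bytes (the first one at address 0):

```
0xAA 0xBB 0xCC 0xDD 0x11 0x22 0x33 0x44 0x55 0x66 0x77 0x88 0x99 0x00 0xEE 0xFF
```

The following program

```
ldc r0, 1
load r1, r0
```

loads the value `0x5544332211DDCCBB` into register `r1`: the eight bytes at addresses 1 through 8, read in little-endian order.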
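The little-endian read in this example can be checked with a short helper. The name `load_le64` is hypothetical; the function simply assembles eight consecutive bytes, with the byte at the lowest address becoming the least significant.

```c
#include <stdint.h>
#include <stddef.h>

/* Reads 8 bytes starting at byte address addr and combines them in
 * little-endian order: the byte at addr becomes the least significant. */
static uint64_t load_le64(const uint8_t *memory, size_t addr)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)memory[addr + i] << (8 * i);
    return value;
}
```

Called on the example memory with `addr = 1`, it returns `0x5544332211DDCCBB`, matching the value loaded into `r1`.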
The file format consists of three sections: a header, the code section, and the initial values of the data section.
The header consists of the 8-byte magic value "ESET-VM1" followed by three 32-bit values: the size of the code (in instructions), the size of the whole data section (in bytes), and the size of the initialized data (in bytes). It can be described by the following C structure:
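```c
#include <stdint.h>

struct header
{
    char magic[8];              /* "ESET-VM1", not NUL-terminated */
    uint32_t code_size;         /* number of instructions */
    uint32_t data_size;         /* size of the whole data section, in bytes */
    uint32_t initial_data_size; /* bytes of initialized data stored in the file */
};
```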
All header values are stored in little-endian byte order.
A valid file satisfies all of the following:
- `data_size >= initial_data_size`
- `magic == "ESET-VM1"`
- `code_size * 3 + initial_data_size + 20 == size of file` (20 is the size of the header in bytes)
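The three validity rules can be checked directly on the raw file bytes. In this sketch, `read_le32` and `header_is_valid` are hypothetical helpers. Reading each field byte by byte avoids depending on the host's endianness or on how a compiler lays out `struct header`, and widening the sizes to `uint64_t` keeps `code_size * 3 + initial_data_size + 20` from overflowing.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Reads a little-endian uint32_t from 4 bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Checks the validity rules for a whole file held in memory.
 * Returns 1 if valid, 0 otherwise. */
static int header_is_valid(const uint8_t *file, size_t file_size)
{
    if (file_size < 20)
        return 0;                                /* too short for a header */
    if (memcmp(file, "ESET-VM1", 8) != 0)
        return 0;                                /* wrong magic */
    uint64_t code_size = read_le32(file + 8);
    uint64_t data_size = read_le32(file + 12);
    uint64_t initial_data_size = read_le32(file + 16);
    if (data_size < initial_data_size)
        return 0;
    return code_size * 3 + initial_data_size + 20 == (uint64_t)file_size;
}
```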
The instruction block follows the header. It contains exactly `code_size` instructions, so this section is `3 * code_size` bytes long.
Each instruction starts with an opcode byte (see the "Opcode" column in the instruction table). Two bytes follow, and their interpretation depends on the argument type:
- A register reference (`rN` in the table) is stored as a single byte, where 0 denotes the first register and 31 the last. Any value above 31 is invalid.
- An immediate (`imm8`) is stored as a single byte.
- A long immediate (`imm16`) is stored as two consecutive bytes in little-endian order.
For example, `ldc r5, 33` is encoded as `0x32 0x05 0x21`.

Some instructions do not use all argument bytes (e.g. `out` uses only the first one, while `nop`, `ret` and `hlt` use none); unused bytes are ignored.
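Encoding the register/`imm8` formats is a matter of writing three bytes and rejecting register indices above 31. The function `encode_reg_op` below is a hypothetical helper; its `second_is_reg` flag tells it whether the last byte is a register (as in `mov`) or an `imm8` (as in `ldc`), since only registers have the 0..31 limit.

```c
#include <stdint.h>

/* Packs an opcode, a register byte and a register-or-imm8 byte.
 * Returns 0 on success, -1 if a register index is out of range. */
static int encode_reg_op(uint8_t out[3], uint8_t opcode, uint8_t first,
                         uint8_t second, int second_is_reg)
{
    if (first > 31 || (second_is_reg && second > 31))
        return -1;          /* only r0..r31 exist */
    out[0] = opcode;
    out[1] = first;         /* register index */
    out[2] = second;        /* register index or imm8 */
    return 0;
}
```

With opcode 50, register 5 and immediate 33, it produces `0x32 0x05 0x21`, the encoding of `ldc r5, 33`.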
The data section may be initialized with data loaded from the file.
If `initial_data_size > 0`, then `initial_data_size` bytes are read from the file (they follow the code section) and copied to the beginning of memory.
Any remaining data (`data_size - initial_data_size` bytes) is initialized to zero.
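A possible way to set up data memory, assuming the header has already been validated, is to allocate zeroed memory and copy the initial bytes over its start. `init_data` is a hypothetical name; `calloc` provides the zero fill for the non-initialized part, and a size of at least 1 is requested because `calloc(0, 1)` may return `NULL`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocates data_size bytes of zeroed data memory and copies the
 * initial_data_size bytes that follow the code section to its start.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *init_data(const uint8_t *file, uint32_t code_size,
                          uint32_t data_size, uint32_t initial_data_size)
{
    uint8_t *data = calloc(data_size ? data_size : 1, 1);
    if (data == NULL)
        return NULL;
    const uint8_t *initial = file + 20 + (size_t)code_size * 3; /* skip header and code */
    if (initial_data_size > 0)
        memcpy(data, initial, initial_data_size);
    return data;
}
```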
Jump and call targets are always relative. The immediate of a jump or call instruction is interpreted as a signed integer of the corresponding size (`signed char` for `imm8`, `signed short` for `imm16`) and added to the address of the next instruction (that is, to `ip` after it has been incremented) to obtain the target instruction.
For example:
- A `jump` at instruction 4 that targets instruction 5 (offset 0) is encoded as `0x63 0x00 0x00`.
- A `jump` at instruction 4 that targets instruction 6 (offset 1) is encoded as `0x63 0x01 0x00`.
- A `jump` at instruction 4 that targets instruction 4, looping in place (offset -1), is encoded as `0x63 0xFF 0xFF`.
- A `jump` at instruction 4 that targets instruction 3 (offset -2) is encoded as `0x63 0xFE 0xFF`.
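The offset rule can be expressed as `imm16 = target - (at + 1)`. The hypothetical helper `encode_jump` below computes it, checks that it fits in a signed 16-bit value, and writes it low byte first; it reproduces the four encodings listed above.

```c
#include <stdint.h>

/* Encodes a jump or call placed at instruction index `at` that should
 * transfer control to instruction index `target`. The offset is relative
 * to the next instruction (at + 1). Returns -1 if it does not fit in 16 bits. */
static int encode_jump(uint8_t out[3], uint8_t opcode, int32_t at, int32_t target)
{
    int32_t offset = target - (at + 1);
    if (offset < INT16_MIN || offset > INT16_MAX)
        return -1;
    uint16_t u = (uint16_t)(int16_t)offset;   /* two's complement bit pattern */
    out[0] = opcode;
    out[1] = (uint8_t)(u & 0xFF);             /* little-endian: low byte first */
    out[2] = (uint8_t)(u >> 8);
    return 0;
}
```

The same function serves `call` (opcode 100), since both instructions use the same relative `imm16` encoding.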