
Add decoder (and disassembler) cache #7

Open
sulir opened this issue Apr 19, 2012 · 5 comments
Labels: feature, optimization (Optimization or refactoring)
Milestone: 1.6
@sulir (Collaborator) commented Apr 19, 2012

The cache should work like this: every time an instruction is decoded, it is placed into a cache of some capacity. Next time, if the supplied bytes are exactly the same, the decoded instruction is returned directly from the cache, without performing the decoding process.

There is one problem: the size of an instruction is unknown before decoding. So the cache should be a combination of a tree and hash maps.

Something similar could also be done for the disassembler.

If insertion and lookup were fast enough, this would significantly improve the emulator's speed.

@ghost assigned sulir Apr 19, 2012
@vbmacher (Collaborator)
This reminds me of a prefix tree (trie) used in text searching. I imagine the decoder could work like this:

  • Each trie node has:

    • a function explore(input)
    • a property annotation = the decoded instruction (if any)
  • On an input, the decoding algorithm matches the longest prefix present in the trie. If it is matched and the node carries an annotation, we have the instruction. If it has no annotation, we are expecting more input (cannot decide yet).

  • If the input is longer than the longest prefix in the trie (we are at its end), we call explore with the unmatched input. This function expands the trie further from the node where it is called. If that is not possible, we throw a runtime exception (unknown instruction).

But I think we don't need to tie ourselves to the cache approach (I mean dynamically expanding the tree). Instead we can build the whole tree in advance from the spec file and use it for searching during decoding. Decoding would then mean just finding the longest prefix of the input and emitting the instruction.
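A minimal Java sketch of the trie idea above, just to make the shape concrete. TrieNode, annotation, explore and DecodedInstruction are hypothetical names (not the actual emuStudio decoder API), and explore here works byte by byte rather than on the whole unmatched input:

```java
import java.util.HashMap;
import java.util.Map;

class DecodedInstruction { /* opcode, operands, ... */ }

class TrieNode {
    final Map<Byte, TrieNode> children = new HashMap<>();
    DecodedInstruction annotation; // set only when a complete instruction ends at this node

    // Walks the trie along the input, remembering the last node that carried
    // an annotation (i.e. the longest matched prefix that is a complete instruction).
    DecodedInstruction decode(byte[] input) {
        TrieNode node = this;
        DecodedInstruction lastMatch = null;
        for (byte b : input) {
            TrieNode next = node.children.get(b);
            if (next == null) {
                // End of known prefixes: try to expand the trie from this node.
                next = node.explore(b);
                if (next == null) {
                    break;
                }
            }
            node = next;
            if (node.annotation != null) {
                lastMatch = node.annotation;
            }
        }
        if (lastMatch == null) {
            throw new RuntimeException("Unknown instruction");
        }
        return lastMatch;
    }

    // Hypothetical hook that grows the trie from the decoder specification for
    // input bytes not yet covered; returns null if no rule matches.
    TrieNode explore(byte b) {
        return null;
    }
}
```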

@vbmacher (Collaborator)

Actually, maybe even better is to use a DFA (deterministic finite automaton) as used in parsing, or, instead of generating Java code, generate a parser grammar for some parser generator (JavaCC or ANTLR) which can do the job.

@vbmacher (Collaborator) commented Jul 7, 2021

Hm, all the approaches I described above seem too complicated, both to implement and in complexity. I think the advantage of caching should be much faster decoding, which means we need to know the input size in advance. I'm coming back to the idea originally suggested by @sulir.

Can we assume an instruction has a size limit? I mean, can we always guarantee that the byte sequence of an instruction is finite? Let's assume yes.

Then we can have several caches keyed by expected instruction size: Map[Integer, Map[Array[Byte], DecodedInstruction]].
By the assumption above, we know all possible instruction sizes. The number of instruction sizes is also finite (simply because the specification file doesn't allow an infinite number of rules). With these assumptions, the cache structure can be prepared in advance. If we want to save memory, we can also limit the cache to only certain instruction sizes.

The algorithm can work as follows:

  1. Read input up to the longest instruction size and check whether it's in the cache.
  2. If not, cut the suffix down to the next-longest instruction size and check whether it's in the cache.
  3. Repeat step 2 until we find the instruction.
  4. If we didn't find the instruction, fall back to the decoding process.

When considering algorithmic complexity, the cache lookup is O(k), where k is the number of instruction sizes (assuming that looking up an Array[Byte] in a hash map is O(1)).

Decoding itself has complexity O(n * h * m), where n is the number of rules, h is the number of masks in a rule and m is the number of patterns in a mask.

So the final worst-case complexity using the cache solution can be up to O(k) + O(n * h * m).
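A rough Java sketch of the size-keyed lookup described above (steps 1-4), assuming the set of instruction sizes is known in advance; the class and method names are made up for illustration. One practical note: a raw byte[] can't be used directly as a HashMap key in Java (arrays hash by identity), so the sketch wraps the bytes in a ByteBuffer, which compares by content:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

class SizeKeyedDecoderCache {
    static class DecodedInstruction { /* opcode, operands, ... */ }

    // instruction size -> (instruction bytes -> decoded instruction)
    private final Map<Integer, Map<ByteBuffer, DecodedInstruction>> cache = new HashMap<>();
    private final int[] sizesDescending; // all instruction sizes, known from the spec file

    SizeKeyedDecoderCache(int[] sizesDescending) {
        this.sizesDescending = sizesDescending;
        for (int size : sizesDescending) {
            cache.put(size, new HashMap<>());
        }
    }

    // Steps 1-3: try the longest size first, then cut the suffix and retry.
    DecodedInstruction lookup(byte[] input) {
        for (int size : sizesDescending) {
            if (size > input.length) {
                continue;
            }
            // ByteBuffer hashes/compares by content, unlike a raw byte[].
            ByteBuffer key = ByteBuffer.wrap(input, 0, size);
            DecodedInstruction hit = cache.get(size).get(key);
            if (hit != null) {
                return hit;
            }
        }
        return null; // step 4: cache miss -> run the full decoding process
    }

    void store(byte[] instructionBytes, DecodedInstruction decoded) {
        // Clone the bytes so the cached key cannot be mutated by the caller.
        ByteBuffer key = ByteBuffer.wrap(instructionBytes.clone());
        cache.computeIfAbsent(instructionBytes.length, s -> new HashMap<>()).put(key, decoded);
    }
}
```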

@vbmacher (Collaborator) commented Jul 7, 2021

I think we're getting into a problem the JVM HotSpot has to deal with. HotSpot interprets bytecode until the hit count of an instruction (or a block) reaches a threshold. When the threshold is reached, the block is compiled to native code and cached. From then on, every time that address is hit, the cached version is used.

Taking inspiration from this, we can improve our decoding based on the memory location of an instruction, and then we don't need to know instruction sizes at all. The only thing we need to take care of is monitoring whether any memory location backing a cached instruction has changed.

The cache can be of the form Map[Integer, DecodedInstruction]. The algorithm can work as follows:

  1. We know the memory location from which we want to read the input, so we check whether that location is present in the cache.
  2. If yes, return the cached instruction.
  3. If not, decode the instruction and insert it into the cache.

The requirement is that we implement a memory listener and invalidate cache entries whose locations have changed. For the map implementation we can use an LRU-style eviction policy (e.g. https://github.com/ben-manes/caffeine).
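A small sketch of the address-keyed cache described above, using Caffeine for the bounded cache (its size-based eviction approximates LRU). The onMemoryWrite hook and decodeAt() are hypothetical placeholders for the emulator's memory-listener and decoder APIs:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

class AddressKeyedDecoderCache {
    static class DecodedInstruction { /* opcode, operands, ... */ }

    private final Cache<Integer, DecodedInstruction> cache = Caffeine.newBuilder()
            .maximumSize(10_000) // bounded size; Caffeine evicts the least valuable entries
            .build();

    // Steps 1-3: check the cache by memory location, decode and store on a miss.
    DecodedInstruction decode(int address) {
        DecodedInstruction cached = cache.getIfPresent(address);
        if (cached != null) {
            return cached;
        }
        DecodedInstruction decoded = decodeAt(address); // the existing (slow) decoding process
        cache.put(address, decoded);
        return decoded;
    }

    // Memory-listener requirement: any write invalidates the cached instruction
    // at that location (a real listener would also invalidate neighbouring
    // addresses covered by multi-byte instructions).
    void onMemoryWrite(int address) {
        cache.invalidate(address);
    }

    private DecodedInstruction decodeAt(int address) {
        throw new UnsupportedOperationException("placeholder for the real decoder");
    }
}
```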

@vbmacher (Collaborator) commented Jul 7, 2021

The algorithmic complexity of the above is no worse than the current decoding algorithm in the worst case. However, in the best case it is O(1).

@vbmacher added the optimization (Optimization or refactoring) label and removed the low-priority label Aug 6, 2021
@vbmacher added this to the 1.6 milestone Apr 23, 2022