In this version, we fixed a bug (commit) and provide instructions for reproducing the experimental results in our USENIX paper.
Note that 64-bit program binaries require their data sections to be 16B aligned, and SSE instructions check this alignment. If a memory access through an SSE instruction is not 16B aligned, the program crashes immediately.
We used to have routines that added the ".align 16" directive to force this 16B alignment. However, they were mistakenly dropped during our internal merge of the 32-bit and 64-bit codebases.
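
As a rough illustration of the idea only, here is a minimal Python sketch of a pass that forces 16B alignment by adding ".align 16" after each data section directive in a list of assembly lines. The function name `force_align16`, the regular expression, and the set of section names it treats as data sections are assumptions made for this example; they are not the routines from our codebase.

```python
import re

# Section directives treated as data sections (an assumption for this sketch).
DATA_SECTION = re.compile(
    r"^\s*\.section\s+\.(data|rodata|bss)\b|^\s*\.(data|bss)\b"
)


def force_align16(asm_lines):
    """Return a copy of asm_lines with '.align 16' after each data section directive.

    Hypothetical helper: a directive that is already followed by '.align 16'
    is left alone, so running the pass twice gives the same result.
    """
    out = []
    for i, line in enumerate(asm_lines):
        out.append(line)
        if DATA_SECTION.match(line):
            # Look at the next line to avoid inserting a duplicate directive.
            nxt = asm_lines[i + 1].strip() if i + 1 < len(asm_lines) else ""
            if nxt != ".align 16":
                out.append(".align 16")
    return out


if __name__ == "__main__":
    example = [".section .text", "main:", "ret", ".data", "buf: .zero 32"]
    print("\n".join(force_align16(example)))
```

Code sections are left untouched in this sketch; only the directives matched by `DATA_SECTION` receive the alignment line.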
Please find the instructions here.