Improve generated assembly for pflang filters #27

Open
wingo opened this issue Aug 8, 2014 · 2 comments

wingo commented Aug 8, 2014

The generated assembly is not ideal. For example, for a loop that reads packets from a pcap-format savefile and matches them against "tcp port 5555", we generate the following Lua function:

return function(P,length)
   if not (24 <= length) then return false end
   local v1 = ffi.cast("uint16_t*", P+12)[0]
   if not (v1 == 8) then goto L2 end
   do
      local v2 = P[23]
      if not (v2 == 6) then return false end
      do
         local v3 = ffi.cast("uint16_t*", P+20)[0]
         local v4 = bit.band(v3,65311)
         if not (v4 == 0) then return false end
         local v5 = P[14]
         local v6 = bit.band(v5,15)
         local v7 = bit.lshift(v6,2)
         local v8 = v7+16
         if not (v8 <= length) then return false end
         local v9 = v7+14
         local v10 = ffi.cast("uint16_t*", P+v9)[0]
         if v10 == 45845 then return true end
         do
            local v11 = v7+18
            if not (v11 <= length) then return false end
            local v12 = ffi.cast("uint16_t*", P+v8)[0]
            do return v12 == 45845 end
         end
      end
   end
::L2::
   do
      if not (56 <= length) then return false end
      if not (v1 == 56710) then return false end
      do
         local v13 = P[20]
         if v13 == 6 then goto L5 end
         do
            if not (v13 == 44) then return false end
            do
               local v14 = P[54]
               if not (v14 == 6) then return false end
            end
         end
      end
::L5::
      do
         if not (v1 == 56710) then return false end
         local v15 = P[20]
         if v15 == 6 then goto L9 end
         do
            if not (v15 == 44) then return false end
            do
               local v16 = P[54]
               if v16 == 6 then goto L9 end
               do
                  if not (v16 == 6) then return false end
               end
            end
         end
::L9::
         do
            local v17 = ffi.cast("uint16_t*", P+54)[0]
            if v17 == 45845 then return true end
            do
               if not (58 <= length) then return false end
               local v18 = ffi.cast("uint16_t*", P+56)[0]
               do return v18 == 45845 end
            end
         end
      end
   end
end
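
The constants in this function look odd until byte order is taken into account. A rough reading, assuming the filter was compiled for a little-endian host: the `uint16_t` loads return network-order fields with their two bytes swapped, so the constants they are compared against are swapped as well. The sketch below checks this with plain Lua arithmetic; `swap16` is a hypothetical helper for illustration and does not appear in the generated code.

```lua
-- Swap the two bytes of a 16-bit value using plain arithmetic, so this runs
-- on stock Lua as well as LuaJIT. `swap16` is a hypothetical helper name.
local function swap16(x)
   local hi = math.floor(x / 256)  -- upper byte
   local lo = x % 256              -- lower byte
   return lo * 256 + hi
end

print(swap16(5555))    --> 45845 (0xb315), the TCP port being matched
print(swap16(0x0800))  --> 8, the IPv4 ethertype
print(swap16(0x86dd))  --> 56710, the IPv6 ethertype
print(swap16(0x1fff))  --> 65311 (0xff1f), the IPv4 fragment-offset mask
```

The same swapped values show up as immediates in the machine code, for example `cmp r9d, 0xb315` and `and edi, 0xff1f`.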

Here is the IR trace of the loop:

0099 ------ LOOP ------------
0100 >  p32 UREFO  bench.lua:110  #3  
0101 >  tab ULOAD  0100
0102    int FLOAD  0101  tab.hmask
0103 >  int EQ     0102  +31 
0104    p32 FLOAD  0101  tab.node
0105 >  p32 HREFK  0104  "cast" @6
0106 >  fun HLOAD  0105
0107 >  fun EQ     0106  ffi.cast
0108 }  cdt CNEWI  +186  0095
0109    p64 ADD    0095  +16 
0110 }  cdt CNEWI  +184  0109
0111    p64 ADD    0095  +8  
0112    u32 XLOAD  0111  
0113    num CONV   0112  num.u32
0114 >  num GE     0113  +24 
0115    p64 ADD    0095  +28 
0116    u16 XLOAD  0115  
0117 >  int EQ     0116  +8  
0118    p64 ADD    0095  +39 
0119    u8  XLOAD  0118  
0120 >  int EQ     0119  +6  
0121    p64 ADD    0095  +36 
0122    u16 XLOAD  0121  
0123    int BAND   0122  +65311
0124 >  int EQ     0123  +0  
0125    p64 ADD    0095  +30 
0126    u8  XLOAD  0125  
0127    int BAND   0126  +15 
0128    int BSHL   0126  +2  
0129    int BAND   0128  +60 
0130 >  int ADDOV  0129  +16 
0131    num CONV   0130  num.int
0132 >  num LE     0131  0113
0133 >  int ADDOV  0129  +14 
0134    i64 CONV   0133  i64.int sext
0135    p64 ADD    0134  0109
0136    u16 XLOAD  0135  
0137 >  int NE     0136  +45845
0138 >  int ADDOV  0129  +18 
0139    num CONV   0138  num.int
0140 >  num LE     0139  0113
0141    i64 CONV   0130  i64.int sext
0142    p64 ADD    0141  0109
0143    u16 XLOAD  0142  
0144 >  int EQ     0143  +45845
0145  + num ADD    0092  +1  
0146  + num ADD    0094  +1  
0147  + p64 ADD    0112  0109
0148 }+ cdt CNEWI  +184  0147
0149 >  p64 ULT    0147  +140160267072216
0150 }  cdt PHI    0096  0148
0151    p64 PHI    0095  0147
0152    num PHI    0092  0145
0153    num PHI    0094  0146
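
One detail worth noting in this IR: the `BAND 0126 +15` at 0127 is never referenced again. The compiler appears to have rewritten `lshift(band(v5,15),2)` as a shift followed by a mask with 60 (0128 and 0129). The sketch below checks that the two forms agree for every possible byte value, using plain Lua arithmetic in place of the `bit` library; both function names are made up for illustration.

```lua
-- (byte AND 15) shifted left by 2, written with arithmetic:
-- keep the low 4 bits, then multiply by 4.
local function header_len_as_written(byte)
   return (byte % 16) * 4
end

-- (byte shifted left by 2) AND 60, written with arithmetic:
-- multiply by 4, then keep the low 6 bits (the low 2 are always zero).
local function header_len_as_compiled(byte)
   return (byte * 4) % 64
end

local mismatches = 0
for byte = 0, 255 do
   if header_len_as_written(byte) ~= header_len_as_compiled(byte) then
      mismatches = mismatches + 1
   end
end
print(mismatches)  --> 0
```

Either way the result is the IPv4 header length in bytes, a multiple of 4 between 0 and 60.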

Anything with type "num" in the loop is suboptimal: all of those values can be proven to be integers. The table lookups, FFI type checks, and related guards are also unnecessary. As a result, the body of the loop has lots of memory accesses and floating-point comparisons:

->LOOP:
0bca231f  mov ebx, [0x41c452e0]
0bca2326  cmp dword [rbx+0x4], -0x0c
0bca232a  jnz 0x0bca0054        ->17
0bca2330  mov ebx, [rbx]
0bca2332  cmp dword [rbx+0x1c], +0x1f
0bca2336  jnz 0x0bca0054        ->17
0bca233c  mov ebx, [rbx+0x14]
0bca233f  mov rdi, 0xfffffffb41c74b80
0bca2349  cmp rdi, [rbx+0x98]
0bca2350  jnz 0x0bca0054        ->17
0bca2356  cmp dword [rbx+0x94], -0x09
0bca235d  jnz 0x0bca0054        ->17
0bca2363  cmp dword [rbx+0x90], 0x41c84058
0bca236d  jnz 0x0bca0054        ->17
0bca2373  mov r15, rbp
0bca2376  add rbp, +0x10
0bca237a  mov ebx, [r15+0x8]
0bca237e  xorps xmm5, xmm5
0bca2381  cvtsi2sd xmm5, rbx
0bca2386  ucomisd xmm5, xmm1
0bca238a  jb 0x0bca0058 ->18
0bca2390  movzx r14d, word [r15+0x1c]
0bca2395  cmp r14d, +0x08
0bca2399  jnz 0x0bca005c        ->19
0bca239f  movzx r13d, byte [r15+0x27]
0bca23a4  cmp r13d, +0x06
0bca23a8  jnz 0x0bca0060        ->20
0bca23ae  movzx r12d, word [r15+0x24]
0bca23b3  mov edi, r12d
0bca23b6  and edi, 0xff1f
0bca23bc  jnz 0x0bca0064        ->21
0bca23c2  movzx esi, byte [r15+0x1e]
0bca23c7  mov edx, esi
0bca23c9  and edx, +0x0f
0bca23cc  mov ecx, esi
0bca23ce  shl ecx, 0x02
0bca23d1  and ecx, +0x3c
0bca23d4  mov r11d, ecx
0bca23d7  add r11d, +0x10
0bca23db  jo 0x0bca0068 ->22
0bca23e1  xorps xmm4, xmm4
0bca23e4  cvtsi2sd xmm4, r11d
0bca23e9  ucomisd xmm5, xmm4
0bca23ed  jb 0x0bca006c ->23
0bca23f3  mov r10d, ecx
0bca23f6  add r10d, +0x0e
0bca23fa  jo 0x0bca0070 ->24
0bca2400  movsxd rax, r10d
0bca2403  movzx r9d, word [rax+rbp]
0bca2408  cmp r9d, 0xb315
0bca240f  jz 0x0bca0074 ->25
0bca2415  mov r8d, ecx
0bca2418  add r8d, +0x12
0bca241c  jo 0x0bca0078 ->26
0bca2422  xorps xmm4, xmm4
0bca2425  cvtsi2sd xmm4, r8d
0bca242a  ucomisd xmm5, xmm4
0bca242e  jb 0x0bca007c ->27
0bca2434  movsxd rax, r11d
0bca2437  movzx eax, word [rax+rbp]
0bca243b  mov [rsp+0x8], eax
0bca243f  mov rax, 0x00007f799aee32d8
0bca2449  cmp dword [rsp+0x8], 0xb315
0bca2451  jnz 0x0bca0080        ->28
0bca2457  addsd xmm6, xmm0
0bca245b  addsd xmm7, xmm0
0bca245f  add rbp, rbx
0bca2462  cmp rbp, rax
0bca2465  jb 0x0bca231f ->LOOP
0bca246b  jmp 0x0bca0088        ->30
---- TRACE 108 stop -> loop

Sub-optimal. To fix this, we can hack on LuaJIT, or look to emit assembly ourselves. I would try the former before the latter, as it's not far off from what needs to happen.


wingo commented Dec 17, 2014

With the new backend from #95, the code looks like:

return function(P,length)
   if length < 34 then return false end
   local var1 = cast("uint16_t*", P+12)[0]
   if var1 == 8 then
      if P[23] ~= 6 then return false end
      if band(cast("uint16_t*", P+20)[0],65311) ~= 0 then return false end
      local var7 = lshift(band(P[14],15),2)
      local var8 = (var7 + 16)
      if var8 > length then return false end
      if cast("uint16_t*", P+(var7 + 14))[0] == 45845 then return true end
      if (var7 + 18) > length then return false end
      return cast("uint16_t*", P+var8)[0] == 45845
   else
      if length < 56 then return false end
      if var1 ~= 56710 then return false end
      local var24 = P[20]
      if var24 == 6 then goto L22 end
      do
         if var24 ~= 44 then return false end
         if P[54] == 6 then goto L22 end
         return false
      end
::L22::
      if cast("uint16_t*", P+54)[0] == 45845 then return true end
      if length < 58 then return false end
      return cast("uint16_t*", P+56)[0] == 45845
   end
end
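
For checking the generated code against hand-built packets, the IPv4 branch above can be restated in plain Lua that reads fields in network byte order. This is only an illustrative sketch under stated assumptions: `u16be`, `ipv4_tcp_port`, and `make_packet` are hypothetical names that are not part of pflua, packets are Lua strings rather than FFI pointers, and the IPv6 branch is left out.

```lua
-- Read a big-endian 16-bit value at zero-based offset `off`.
local function u16be(pkt, off)
   local hi, lo = pkt:byte(off + 1, off + 2)
   return hi * 256 + lo
end

-- Reference version of the IPv4 branch of "tcp port N" (hypothetical helper).
local function ipv4_tcp_port(pkt, port)
   local length = #pkt
   if length < 34 then return false end
   if u16be(pkt, 12) ~= 0x0800 then return false end      -- ethertype is not IPv4
   if pkt:byte(23 + 1) ~= 6 then return false end         -- protocol is not TCP
   if u16be(pkt, 20) % 0x2000 ~= 0 then return false end  -- non-zero fragment offset
   local ihl = (pkt:byte(14 + 1) % 16) * 4                -- IPv4 header length in bytes
   if ihl + 16 > length then return false end
   if u16be(pkt, ihl + 14) == port then return true end   -- source port
   if ihl + 18 > length then return false end
   return u16be(pkt, ihl + 16) == port                    -- destination port
end

-- Build a minimal Ethernet + IPv4 + TCP-ports packet for testing (hypothetical).
local function make_packet(src_port, dst_port)
   local function be16(n)
      return string.char(math.floor(n / 256), n % 256)
   end
   local eth = string.rep("\0", 12) .. string.char(0x08, 0x00)
   local ip = string.char(0x45) .. string.rep("\0", 8) ..
              string.char(6) .. string.rep("\0", 10)
   return eth .. ip .. be16(src_port) .. be16(dst_port)
end

print(ipv4_tcp_port(make_packet(5555, 80), 5555))  --> true
print(ipv4_tcp_port(make_packet(80, 443), 5555))   --> false
```

Because it reads bytes explicitly, the sketch compares against 5555 directly instead of the byte-swapped 45845 used by the generated code.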

There are three traces:

---- TRACE 49 start pflua-match:23
0007  TGETV    8   0   7
0008  GSET     8   0      ; "packet"
0009  ADDVN    2   2   0  ; 1
0010  MOV      8   1
0011  GGET     9   0      ; "packet"
0012  TGETS    9   9   0  ; "packet"
0013  GGET    10   0      ; "packet"
0014  TGETS   10  10   1  ; "len"
0015  CALL     8   2   3
0000  . FUNCF    8          ; "tcp port 5555":1
0001  . KSHORT   2  34
0002  . ISGE     1   2
0003  . JMP      2 => 0006
0006  . GGET     2   0      ; "cast"
0007  . KSTR     3   1      ; "uint16_t*"
0008  . ADDVN    4   0   0  ; 12
0000  . . . FUNCC               ; ffi.meta.__add
0009  . CALL     2   2   3
0000  . . FUNCC               ; ffi.cast
0010  . TGETB    2   2   0
0000  . . . FUNCC               ; ffi.meta.__index
0011  . ISNEN    2   1      ; 8
0012  . JMP      3 => 0069
0013  . TGETB    3   0  23
0000  . . . FUNCC               ; ffi.meta.__index
0014  . ISEQN    3   2      ; 6
0015  . JMP      3 => 0018
0018  . GGET     3   2      ; "band"
0019  . GGET     4   0      ; "cast"
0020  . KSTR     5   1      ; "uint16_t*"
0021  . ADDVN    6   0   3  ; 20
0000  . . . FUNCC               ; ffi.meta.__add
0022  . CALL     4   2   3
0000  . . FUNCC               ; ffi.cast
0023  . TGETB    4   4   0
0000  . . . FUNCC               ; ffi.meta.__index
0024  . KNUM     5   4      ; 65311
0025  . CALL     3   2   3
0000  . . FUNCC               ; bit.band
0026  . ISEQN    3   5      ; 0
0027  . JMP      3 => 0030
0030  . GGET     3   3      ; "lshift"
0031  . GGET     4   2      ; "band"
0032  . TGETB    5   0  14
0000  . . . FUNCC               ; ffi.meta.__index
0033  . KSHORT   6  15
0034  . CALL     4   2   3
0000  . . FUNCC               ; bit.band
0035  . KSHORT   5   2
0036  . CALL     3   2   3
0000  . . FUNCC               ; bit.lshift
0037  . ADDVN    4   3   6  ; 16
0038  . ISGE     1   4
0039  . JMP      5 => 0042
0042  . GGET     5   0      ; "cast"
0043  . KSTR     6   1      ; "uint16_t*"
0044  . ADDVN    7   3   7  ; 14
0045  . ADDVV    7   0   7
0000  . . . FUNCC               ; ffi.meta.__add
0046  . CALL     5   2   3
0000  . . FUNCC               ; ffi.cast
0047  . TGETB    5   5   0
0000  . . . FUNCC               ; ffi.meta.__index
0048  . ISNEN    5   8      ; 45845
0049  . JMP      5 => 0052
0050  . KPRI     5   2
0051  . RET1     5   2
0016  ISF          8
0017  JMP      9 => 0019
0018  ADDVN    3   3   0  ; 1
0019  FORL     4 => 0007
---- TRACE 49 IR
....              SNAP   #0   [ ---- ]
0001 rax   >  int SLOAD  #6    CRI
0002       >  int LE     0001  +2147483646
0003 rbp      int SLOAD  #5    CI
0004 rcx   >  tab SLOAD  #1    T
0005          int FLOAD  0004  tab.asize
0006       >  p32 ABC    0005  0001
0007 rdx      p32 FLOAD  0004  tab.array
0008          p32 AREF   0007  0003
0009 rbx   >  tab ALOAD  0008
0010 rcx      fun SLOAD  #0    R
0011 rdi      tab FLOAD  0010  func.env
0012          int FLOAD  0011  tab.hmask
0013       >  int EQ     0012  +63 
0014 rcx      p32 FLOAD  0011  tab.node
0015 rcx   >  p32 HREFK  0014  "packet" @32
0016          tab FLOAD  0011  tab.meta
0017       >  tab EQ     0016  [NULL]
0018          tab HSTORE 0015  0009
0019          nil TBAR   0011
....              SNAP   #1   [ ---- ---- ---- ---- ---- 0003 0001 ---- 0003 0009 ]
0020 xmm6  >  num SLOAD  #3    T
0021 xmm6   + num ADD    0020  +1  
0022       >  fun SLOAD  #2    T
0023          int FLOAD  0009  tab.hmask
0024       >  int EQ     0023  +1  
0025 rdi      p32 FLOAD  0009  tab.node
0026       >  p32 HREFK  0025  "packet" @1
0027 r8    >  cdt HLOAD  0026
0028       >  p32 HREFK  0025  "len" @0
0029 xmm2  >  num HLOAD  0028
0030       >  fun EQ     0022  "tcp port 5555":1
....              SNAP   #2   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ]
0031       >  num UGE    0029  +34 
....              SNAP   #3   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 ]
0032 rbx      tab FLOAD  "tcp port 5555":1  func.env
0033          int FLOAD  0032  tab.hmask
0034       >  int EQ     0033  +15 
0035 rdi      p32 FLOAD  0032  tab.node
0036       >  p32 HREFK  0035  "cast" @6
0037       >  fun HLOAD  0036
0038 rbx      u16 FLOAD  0027  cdata.ctypeid
0039       >  int EQ     0038  +181
0040 rbx      p64 FLOAD  0027  cdata.ptr
0041          p64 ADD    0040  +12 
0043       >  fun EQ     0037  ffi.cast
0045 r9       u16 XLOAD  0041  
....              SNAP   #4   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0046       >  int EQ     0045  +8  
0047          p64 ADD    0040  +23 
0048          u8  XLOAD  0047  
....              SNAP   #5   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ]
0049       >  int EQ     0048  +6  
....              SNAP   #6   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0050       >  p32 HREFK  0035  "band" @15
0051       >  fun HLOAD  0050
0052          p64 ADD    0040  +20 
0055 r10      u16 XLOAD  0052  
0056       >  fun EQ     0051  bit.band
0057          int BAND   0055  +65311
....              SNAP   #7   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ]
0058       >  int EQ     0057  +0  
....              SNAP   #8   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 ]
0059       >  p32 HREFK  0035  "lshift" @13
0060       >  fun HLOAD  0059
0061          p64 ADD    0040  +14 
0062 r10      u8  XLOAD  0061  
0064       >  fun EQ     0060  bit.lshift
0065 r10      int BSHL   0062  +2  
0066 r10      int BAND   0065  +60 
0067 r11   >  int ADDOV  0066  +16 
0068 xmm3     num CONV   0067  num.int
....              SNAP   #9   [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0069       >  num ULE    0068  0029
....              SNAP   #10  [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 0066 0067 ]
0070 rdi   >  int ADDOV  0066  +14 
0071 rdi      i64 CONV   0070  i64.int sext
0072          p64 ADD    0071  0040
0075 rbx      u16 XLOAD  0072  
....              SNAP   #11  [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|0027 0029 0045 0066 0067 ]
0076       >  int EQ     0075  +45845
....              SNAP   #12  [ ---- ---- ---- 0021 ---- 0003 0001 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0077 xmm7  >  num SLOAD  #4    T
0078 xmm7   + num ADD    0077  +1  
0079 rbp    + int ADD    0003  +1  
....              SNAP   #13  [ ---- ---- ---- 0021 0078 ]
0080       >  int LE     0079  0001
....              SNAP   #14  [ ---- ---- ---- 0021 0078 0079 0001 ---- 0079 ]
0081 ------------ LOOP ------------
0082          p32 AREF   0007  0079
0083 r15   >  tab ALOAD  0082
0084          tab HSTORE 0015  0083
....              SNAP   #15  [ ---- ---- ---- 0021 0078 0079 0001 ---- 0079 0083 ]
0085 xmm6   + num ADD    0021  +1  
0086          int FLOAD  0083  tab.hmask
0087       >  int EQ     0086  +1  
0088 r14      p32 FLOAD  0083  tab.node
0089       >  p32 HREFK  0088  "packet" @1
0090 rbx   >  cdt HLOAD  0089
0091       >  p32 HREFK  0088  "len" @0
0092 xmm5  >  num HLOAD  0091
....              SNAP   #16  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ]
0093       >  num UGE    0092  +34 
....              SNAP   #17  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 ]
0094 r15      u16 FLOAD  0090  cdata.ctypeid
0095       >  int EQ     0094  +181
0096 r12      p64 FLOAD  0090  cdata.ptr
0097          p64 ADD    0096  +12 
0098 r15      u16 XLOAD  0097  
....              SNAP   #18  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 ]
0099       >  int EQ     0098  +8  
0100          p64 ADD    0096  +23 
0101          u8  XLOAD  0100  
....              SNAP   #19  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ]
0102       >  int EQ     0101  +6  
0103          p64 ADD    0096  +20 
0104 r14      u16 XLOAD  0103  
0105          int BAND   0104  +65311
....              SNAP   #20  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ]
0106       >  int EQ     0105  +0  
....              SNAP   #21  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 ]
0107          p64 ADD    0096  +14 
0108 r14      u8  XLOAD  0107  
0109 r14      int BSHL   0108  +2  
0110 r14      int BAND   0109  +60 
0111 r13   >  int ADDOV  0110  +16 
0112 xmm4     num CONV   0111  num.int
....              SNAP   #22  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0113       >  num ULE    0112  0092
....              SNAP   #23  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 0110 0111 ]
0114 rdi   >  int ADDOV  0110  +14 
0115 rdi      i64 CONV   0114  i64.int sext
0116          p64 ADD    0115  0096
0117 r12      u16 XLOAD  0116  
....              SNAP   #24  [ ---- ---- ---- 0085 0078 0079 0001 ---- 0079 "tcp port 5555":1|0090 0092 0098 0110 0111 ]
0118       >  int EQ     0117  +45845
0119 xmm7   + num ADD    0078  +1  
0120 rbp    + int ADD    0079  +1  
....              SNAP   #25  [ ---- ---- ---- 0085 0119 ]
0121       >  int LE     0120  0001
0122 rbp      int PHI    0079  0120
0123 xmm6     num PHI    0021  0085
0124 xmm7     num PHI    0078  0119
0125 xmm4     nil RENAME 0021  #15 
---- TRACE 49 mcode 999
0bcaa8fd  mov dword [0x416854a0], 0x31
0bcaa908  mov esi, edx
0bcaa90a  movsd xmm1, [0x4159a2c0]
0bcaa913  movsd xmm0, [0x4159a288]
0bcaa91c  movsd xmm7, [rsi+0x28]
0bcaa921  cvttsd2si eax, xmm7
0bcaa925  xorps xmm6, xmm6
0bcaa928  cvtsi2sd xmm6, eax
0bcaa92c  ucomisd xmm7, xmm6
0bcaa930  jnz 0x0bca0010    ->0
0bcaa936  jpe 0x0bca0010    ->0
0bcaa93c  cmp eax, 0x7ffffffe
0bcaa942  jg 0x0bca0010 ->0
0bcaa948  cvtsd2si ebp, [rsi+0x20]
0bcaa94d  cmp dword [rsi+0x4], -0x0c
0bcaa951  jnz 0x0bca0010    ->0
0bcaa957  mov ecx, [rsi]
0bcaa959  cmp eax, [rcx+0x18]
0bcaa95c  jnb 0x0bca0010    ->0
0bcaa962  mov edx, [rcx+0x8]
0bcaa965  cmp dword [rdx+rbp*8+0x4], -0x0c
0bcaa96a  jnz 0x0bca0010    ->0
0bcaa970  mov ebx, [rdx+rbp*8]
0bcaa973  mov ecx, [rsi-0x8]
0bcaa976  mov edi, [rcx+0x8]
0bcaa979  cmp dword [rdi+0x1c], +0x3f
0bcaa97d  jnz 0x0bca0010    ->0
0bcaa983  mov ecx, [rdi+0x14]
0bcaa986  mov r15, 0xfffffffb41691100
0bcaa990  cmp r15, [rcx+0x308]
0bcaa997  jnz 0x0bca0010    ->0
0bcaa99d  add ecx, 0x300
0bcaa9a3  cmp dword [rdi+0x10], +0x00
0bcaa9a7  jnz 0x0bca0010    ->0
0bcaa9ad  mov dword [rcx+0x4], 0xfffffff4
0bcaa9b4  mov [rcx], ebx
0bcaa9b6  test byte [rdi+0x4], 0x4
0bcaa9ba  jz 0x0bcaa9d3
0bcaa9bc  and byte [rdi+0x4], 0xfb
0bcaa9c0  mov r15d, [0x416853f4]
0bcaa9c8  mov [0x416853f4], edi
0bcaa9cf  mov [rdi+0xc], r15d
0bcaa9d3  cmp dword [rsi+0x14], 0xfffeffff
0bcaa9da  jnb 0x0bca0014    ->1
0bcaa9e0  movsd xmm6, [rsi+0x10]
0bcaa9e5  addsd xmm6, xmm0
0bcaa9e9  cmp dword [rsi+0xc], -0x09
0bcaa9ed  jnz 0x0bca0014    ->1
0bcaa9f3  cmp dword [rbx+0x1c], +0x01
0bcaa9f7  jnz 0x0bca0014    ->1
0bcaa9fd  mov edi, [rbx+0x14]
0bcaaa00  mov r15, 0xfffffffb41691100
0bcaaa0a  cmp r15, [rdi+0x20]
0bcaaa0e  jnz 0x0bca0014    ->1
0bcaaa14  cmp dword [rdi+0x1c], -0x0b
0bcaaa18  jnz 0x0bca0014    ->1
0bcaaa1e  mov r8d, [rdi+0x18]
0bcaaa22  mov r15, 0xfffffffb4168a640
0bcaaa2c  cmp r15, [rdi+0x8]
0bcaaa30  jnz 0x0bca0014    ->1
0bcaaa36  cmp dword [rdi+0x4], 0xfffeffff
0bcaaa3d  jnb 0x0bca0014    ->1
0bcaaa43  movsd xmm2, [rdi]
0bcaaa47  cmp dword [rsi+0x8], 0x4a6f6a60
0bcaaa4e  jnz 0x0bca0014    ->1
0bcaaa54  ucomisd xmm1, xmm2
0bcaaa58  ja 0x0bca0018 ->2
0bcaaa5e  mov ebx, [0x4a6f6a68]
0bcaaa65  cmp dword [rbx+0x1c], +0x0f
0bcaaa69  jnz 0x0bca001c    ->3
0bcaaa6f  mov edi, [rbx+0x14]
0bcaaa72  mov rbx, 0xfffffffb41598a80
0bcaaa7c  cmp rbx, [rdi+0x98]
0bcaaa83  jnz 0x0bca001c    ->3
0bcaaa89  cmp dword [rdi+0x94], -0x09
0bcaaa90  jnz 0x0bca001c    ->3
0bcaaa96  movzx ebx, word [r8+0x6]
0bcaaa9b  cmp ebx, 0xb5
0bcaaaa1  jnz 0x0bca001c    ->3
0bcaaaa7  mov rbx, [r8+0x8]
0bcaaaab  cmp dword [rdi+0x90], 0x41598a58
0bcaaab5  jnz 0x0bca001c    ->3
0bcaaabb  movzx r9d, word [rbx+0xc]
0bcaaac0  cmp r9d, +0x08
0bcaaac4  jnz 0x0bca0020    ->4
0bcaaaca  cmp byte [rbx+0x17], 0x6
0bcaaace  jnz 0x0bca0024    ->5
0bcaaad4  mov r15, 0xfffffffb4168c128
0bcaaade  cmp r15, [rdi+0x170]
0bcaaae5  jnz 0x0bca0028    ->6
0bcaaaeb  cmp dword [rdi+0x16c], -0x09
0bcaaaf2  jnz 0x0bca0028    ->6
0bcaaaf8  movzx r10d, word [rbx+0x14]
0bcaaafd  cmp dword [rdi+0x168], 0x4168c100
0bcaab07  jnz 0x0bca0028    ->6
0bcaab0d  test r10d, 0xff1f
0bcaab14  jnz 0x0bca002c    ->7
0bcaab1a  mov r15, 0xfffffffb4168bfc0
0bcaab24  cmp r15, [rdi+0x140]
0bcaab2b  jnz 0x0bca0030    ->8
0bcaab31  cmp dword [rdi+0x13c], -0x09
0bcaab38  jnz 0x0bca0030    ->8
0bcaab3e  movzx r10d, byte [rbx+0xe]
0bcaab43  cmp dword [rdi+0x138], 0x4168bf98
0bcaab4d  jnz 0x0bca0030    ->8
0bcaab53  shl r10d, 0x02
0bcaab57  and r10d, +0x3c
0bcaab5b  mov r11d, r10d
0bcaab5e  add r11d, +0x10
0bcaab62  jo 0x0bca0030 ->8
0bcaab68  xorps xmm3, xmm3
0bcaab6b  cvtsi2sd xmm3, r11d
0bcaab70  ucomisd xmm3, xmm2
0bcaab74  ja 0x0bca0034 ->9
0bcaab7a  mov edi, r10d
0bcaab7d  add edi, +0x0e
0bcaab80  jo 0x0bca0038 ->10
0bcaab86  movsxd rdi, edi
0bcaab89  movzx ebx, word [rdi+rbx]
0bcaab8d  cmp ebx, 0xb315
0bcaab93  jnz 0x0bca003c    ->11
0bcaab99  cmp dword [rsi+0x1c], 0xfffeffff
0bcaaba0  jnb 0x0bca0040    ->12
0bcaaba6  movsd xmm7, [rsi+0x18]
0bcaabab  addsd xmm7, xmm0
0bcaabaf  add ebp, +0x01
0bcaabb2  cmp ebp, eax
0bcaabb4  jg 0x0bca0044 ->13
->LOOP:
0bcaabba  cmp dword [rdx+rbp*8+0x4], -0x0c
0bcaabbf  jnz 0x0bca0048    ->14
0bcaabc5  mov r15d, [rdx+rbp*8]
0bcaabc9  mov dword [rcx+0x4], 0xfffffff4
0bcaabd0  mov [rcx], r15d
0bcaabd3  movaps xmm4, xmm6
0bcaabd6  addsd xmm6, xmm0
0bcaabda  cmp dword [r15+0x1c], +0x01
0bcaabdf  jnz 0x0bca004c    ->15
0bcaabe5  mov r14d, [r15+0x14]
0bcaabe9  mov rdi, 0xfffffffb41691100
0bcaabf3  cmp rdi, [r14+0x20]
0bcaabf7  jnz 0x0bca004c    ->15
0bcaabfd  cmp dword [r14+0x1c], -0x0b
0bcaac02  jnz 0x0bca004c    ->15
0bcaac08  mov ebx, [r14+0x18]
0bcaac0c  mov rdi, 0xfffffffb4168a640
0bcaac16  cmp rdi, [r14+0x8]
0bcaac1a  jnz 0x0bca004c    ->15
0bcaac20  cmp dword [r14+0x4], 0xfffeffff
0bcaac28  jnb 0x0bca004c    ->15
0bcaac2e  movsd xmm5, [r14]
0bcaac33  ucomisd xmm1, xmm5
0bcaac37  ja 0x0bca0050 ->16
0bcaac3d  movzx r15d, word [rbx+0x6]
0bcaac42  cmp r15d, 0xb5
0bcaac49  jnz 0x0bca0054    ->17
0bcaac4f  mov r12, [rbx+0x8]
0bcaac53  movzx r15d, word [r12+0xc]
0bcaac59  cmp r15d, +0x08
0bcaac5d  jnz 0x0bca0058    ->18
0bcaac63  cmp byte [r12+0x17], 0x6
0bcaac69  jnz 0x0bca005c    ->19
0bcaac6f  movzx r14d, word [r12+0x14]
0bcaac75  test r14d, 0xff1f
0bcaac7c  jnz 0x0bca0060    ->20
0bcaac82  movzx r14d, byte [r12+0xe]
0bcaac88  shl r14d, 0x02
0bcaac8c  and r14d, +0x3c
0bcaac90  mov r13d, r14d
0bcaac93  add r13d, +0x10
0bcaac97  jo 0x0bca0064 ->21
0bcaac9d  xorps xmm4, xmm4
0bcaaca0  cvtsi2sd xmm4, r13d
0bcaaca5  ucomisd xmm4, xmm5
0bcaaca9  ja 0x0bca0068 ->22
0bcaacaf  mov edi, r14d
0bcaacb2  add edi, +0x0e
0bcaacb5  jo 0x0bca006c ->23
0bcaacbb  movsxd rdi, edi
0bcaacbe  movzx r12d, word [rdi+r12]
0bcaacc3  cmp r12d, 0xb315
0bcaacca  jnz 0x0bca0070    ->24
0bcaacd0  addsd xmm7, xmm0
0bcaacd4  add ebp, +0x01
0bcaacd7  cmp ebp, eax
0bcaacd9  jle 0x0bcaabba    ->LOOP
0bcaacdf  jmp 0x0bca0074    ->25
---- TRACE 49 stop -> loop

---- TRACE 50 start 49/24 "tcp port 5555":11
0052  . ADDVN    5   3   9  ; 18
0053  . ISGE     1   5
0054  . JMP      5 => 0057
0057  . GGET     5   0      ; "cast"
0058  . KSTR     6   1      ; "uint16_t*"
0059  . ADDVV    7   0   4
0000  . . . FUNCC               ; ffi.meta.__add
0060  . CALL     5   2   3
0000  . . FUNCC               ; ffi.cast
0061  . TGETB    5   5   0
0000  . . . FUNCC               ; ffi.meta.__index
0062  . ISEQN    5   8      ; 45845
0063  . JMP      5 => 0066
0066  . KPRI     5   2
0067  . RET1     5   2
0016  ISF          8
0017  JMP      9 => 0019
0018  ADDVN    3   3   0  ; 1
0019  JFORL    4  49
---- TRACE 50 IR
0001 xmm6     num SLOAD  #3    PI
0002 xmm7     num SLOAD  #4    PI
0003 rbp      int SLOAD  #5    PI
0004 rax      int SLOAD  #6    PRI
0005 rbx      cdt SLOAD  #10   PI
0006 xmm5     num SLOAD  #11   PI
0007 r15      u16 SLOAD  #12   PI
0008 r14      int SLOAD  #13   PI
0009 r13      int SLOAD  #14   PI
....              SNAP   #0   [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0010 r12   >  int ADDOV  0008  +18 
0011 xmm4     num CONV   0010  num.int
....              SNAP   #1   [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0012       >  num ULE    0011  0006
....              SNAP   #2   [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0013 r12      tab FLOAD  "tcp port 5555":1  func.env
0014          int FLOAD  0013  tab.hmask
0015       >  int EQ     0014  +15 
0016 rsi      p32 FLOAD  0013  tab.node
0017       >  p32 HREFK  0016  "cast" @6
0018       >  fun HLOAD  0017
0019 r12      u16 FLOAD  0005  cdata.ctypeid
0020       >  int EQ     0019  +181
0021 r12      p64 FLOAD  0005  cdata.ptr
0022 rdi      i64 CONV   0009  i64.int sext
0023          p64 ADD    0022  0021
0024  {sink}  cdt CNEWI  +181  0023
0025       >  fun EQ     0018  ffi.cast
0026  {sink}  cdt CNEWI  +184  0023
0027 r12      u16 XLOAD  0023  
....              SNAP   #3   [ ---- ---- ---- 0001 0002 0003 0004 ---- 0003 "tcp port 5555":1|0005 0006 0007 0008 0009 ]
0028       >  int EQ     0027  +45845
0029 xmm7     num ADD    0002  +1  
0030 rbp      int ADD    0003  +1  
....              SNAP   #4   [ ---- ---- ---- 0001 0029 ]
0031       >  int LE     0030  0004
0032 xmm5     num CONV   0030  num.int
....              SNAP   #5   [ ---- ---- ---- 0001 0029 0032 0004 ---- 0032 ]
---- TRACE 50 mcode 220
0bcaa81a  mov dword [0x416854a0], 0x32
0bcaa825  mov edx, esi
0bcaa827  mov r12d, r14d
0bcaa82a  add r12d, +0x12
0bcaa82e  jo 0x0bca0010 ->0
0bcaa834  xorps xmm4, xmm4
0bcaa837  cvtsi2sd xmm4, r12d
0bcaa83c  ucomisd xmm4, xmm5
0bcaa840  ja 0x0bca0014 ->1
0bcaa846  mov r12d, [0x4a6f6a68]
0bcaa84e  cmp dword [r12+0x1c], +0x0f
0bcaa854  jnz 0x0bca0018    ->2
0bcaa85a  mov esi, [r12+0x14]
0bcaa85f  mov rdi, 0xfffffffb41598a80
0bcaa869  cmp rdi, [rsi+0x98]
0bcaa870  jnz 0x0bca0018    ->2
0bcaa876  cmp dword [rsi+0x94], -0x09
0bcaa87d  jnz 0x0bca0018    ->2
0bcaa883  movzx r12d, word [rbx+0x6]
0bcaa888  cmp r12d, 0xb5
0bcaa88f  jnz 0x0bca0018    ->2
0bcaa895  mov r12, [rbx+0x8]
0bcaa899  movsxd rdi, r13d
0bcaa89c  cmp dword [rsi+0x90], 0x41598a58
0bcaa8a6  jnz 0x0bca0018    ->2
0bcaa8ac  movzx r12d, word [rdi+r12]
0bcaa8b1  cmp r12d, 0xb315
0bcaa8b8  jnz 0x0bca001c    ->3
0bcaa8be  movsd xmm5, [0x4159a288]
0bcaa8c7  addsd xmm7, xmm5
0bcaa8cb  add ebp, +0x01
0bcaa8ce  cmp ebp, eax
0bcaa8d0  jg 0x0bca0020 ->4
0bcaa8d6  xorps xmm5, xmm5
0bcaa8d9  cvtsi2sd xmm5, ebp
0bcaa8dd  movsd [rdx+0x38], xmm5
0bcaa8e2  movsd [rdx+0x20], xmm5
0bcaa8e7  movsd [rdx+0x18], xmm7
0bcaa8ec  movsd [rdx+0x10], xmm6
0bcaa8f1  jmp 0x0bcaa8fd
---- TRACE 50 stop -> 49

---- TRACE 51 start 49/11 "tcp port 5555":11
0052  . ADDVN    5   3   9  ; 18
0053  . ISGE     1   5
0054  . JMP      5 => 0057
0057  . GGET     5   0      ; "cast"
0058  . KSTR     6   1      ; "uint16_t*"
0059  . ADDVV    7   0   4
0000  . . . FUNCC               ; ffi.meta.__add
0060  . CALL     5   2   3
0000  . . FUNCC               ; ffi.cast
0061  . TGETB    5   5   0
0000  . . . FUNCC               ; ffi.meta.__index
0062  . ISEQN    5   8      ; 45845
0063  . JMP      5 => 0066
0066  . KPRI     5   2
0067  . RET1     5   2
0016  ISF          8
0017  JMP      9 => 0019
0018  ADDVN    3   3   0  ; 1
0019  JFORL    4  49
---- TRACE 51 IR
0001 xmm6     num SLOAD  #3    PI
0002 rbp      int SLOAD  #5    PI
0003 rax      int SLOAD  #6    PRI
0004 r8       cdt SLOAD  #10   PI
0005 xmm2     num SLOAD  #11   PI
0006 r9       u16 SLOAD  #12   PI
0007 r10      int SLOAD  #13   PI
0008 r11      int SLOAD  #14   PI
....              SNAP   #0   [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0009 rbx   >  int ADDOV  0007  +18 
0010 xmm7     num CONV   0009  num.int
....              SNAP   #1   [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0011       >  num ULE    0010  0005
....              SNAP   #2   [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0012 rbx      tab FLOAD  "tcp port 5555":1  func.env
0013          int FLOAD  0012  tab.hmask
0014       >  int EQ     0013  +15 
0015 r14      p32 FLOAD  0012  tab.node
0016       >  p32 HREFK  0015  "cast" @6
0017       >  fun HLOAD  0016
0018 rbx      u16 FLOAD  0004  cdata.ctypeid
0019       >  int EQ     0018  +181
0020 rbx      p64 FLOAD  0004  cdata.ptr
0021 r15      i64 CONV   0008  i64.int sext
0022          p64 ADD    0021  0020
0023  {sink}  cdt CNEWI  +181  0022
0024       >  fun EQ     0017  ffi.cast
0025  {sink}  cdt CNEWI  +184  0022
0026 rbx      u16 XLOAD  0022  
....              SNAP   #3   [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|0004 0005 0006 0007 0008 ]
0027       >  int EQ     0026  +45845
....              SNAP   #4   [ ---- ---- ---- 0001 ---- 0002 0003 ---- 0002 "tcp port 5555":1|---- ---- ---- ---- ---- ]
0028 xmm7  >  num SLOAD  #4    T
0029 xmm7     num ADD    0028  +1  
0030 rbp      int ADD    0002  +1  
....              SNAP   #5   [ ---- ---- ---- 0001 0029 ]
0031       >  int LE     0030  0003
0032 xmm5     num CONV   0030  num.int
....              SNAP   #6   [ ---- ---- ---- 0001 0029 0032 0003 ---- 0032 ]
---- TRACE 51 mcode 232
0bcaa72b  mov dword [0x416854a0], 0x33
0bcaa736  mov edx, esi
0bcaa738  movsd xmm5, [0x4159a288]
0bcaa741  mov ebx, r10d
0bcaa744  add ebx, +0x12
0bcaa747  jo 0x0bca0010 ->0
0bcaa74d  xorps xmm7, xmm7
0bcaa750  cvtsi2sd xmm7, ebx
0bcaa754  ucomisd xmm7, xmm2
0bcaa758  ja 0x0bca0014 ->1
0bcaa75e  mov ebx, [0x4a6f6a68]
0bcaa765  cmp dword [rbx+0x1c], +0x0f
0bcaa769  jnz 0x0bca0018    ->2
0bcaa76f  mov r14d, [rbx+0x14]
0bcaa773  mov rdi, 0xfffffffb41598a80
0bcaa77d  cmp rdi, [r14+0x98]
0bcaa784  jnz 0x0bca0018    ->2
0bcaa78a  cmp dword [r14+0x94], -0x09
0bcaa792  jnz 0x0bca0018    ->2
0bcaa798  movzx ebx, word [r8+0x6]
0bcaa79d  cmp ebx, 0xb5
0bcaa7a3  jnz 0x0bca0018    ->2
0bcaa7a9  mov rbx, [r8+0x8]
0bcaa7ad  movsxd r15, r11d
0bcaa7b0  cmp dword [r14+0x90], 0x41598a58
0bcaa7bb  jnz 0x0bca0018    ->2
0bcaa7c1  movzx ebx, word [r15+rbx]
0bcaa7c6  cmp ebx, 0xb315
0bcaa7cc  jnz 0x0bca001c    ->3
0bcaa7d2  cmp dword [rdx+0x1c], 0xfffeffff
0bcaa7d9  jnb 0x0bca0020    ->4
0bcaa7df  movsd xmm7, [rdx+0x18]
0bcaa7e4  addsd xmm7, xmm5
0bcaa7e8  add ebp, +0x01
0bcaa7eb  cmp ebp, eax
0bcaa7ed  jg 0x0bca0024 ->5
0bcaa7f3  xorps xmm5, xmm5
0bcaa7f6  cvtsi2sd xmm5, ebp
0bcaa7fa  movsd [rdx+0x38], xmm5
0bcaa7ff  movsd [rdx+0x20], xmm5
0bcaa804  movsd [rdx+0x18], xmm7
0bcaa809  movsd [rdx+0x10], xmm6
0bcaa80e  jmp 0x0bcaa8fd
---- TRACE 51 stop -> 49


wingo commented Dec 17, 2014

The assembly's still kinda trash, to be honest. Oh well, the performance is certainly fine though.
