performance tests #2

HBehrens · 2013-02-09T21:23:48Z

Similar to the benchmark by trigger.io it is interesting to compare

larger/smaller payload
blocked and async calls (how to do async?)
PhoneGap, trigger.io

Here's the code of the benchmark mentioned above triggr.io, phonegap, iOS

HBehrens · 2013-02-10T13:12:23Z

Here are the result of my iPhone 5 on iOS 6.0.1 (there's room for improvement)

1000 sequential requests with 10 bytes of data: 4.141 seconds
1000 sequential requests with 100 bytes of data: 4.141 seconds
1000 sequential requests with 1000 bytes of data: 4.34 seconds
1000 real sequential requests with 10 bytes of data: 2.952 seconds
1000 real sequential requests with 100 bytes of data: 2.924 seconds
1000 real sequential requests with 1000 bytes of data: 3.117 seconds
1000 concurrent requests with 10 bytes of data: 4.191 seconds
1000 concurrent requests with 100 bytes of data: 3.974 seconds
1000 concurrent requests with 1000 bytes of data: 4.151 seconds

HBehrens · 2013-02-10T14:04:07Z

Results from trigger.io 1.4 on same device

1000 sequential requests with 10 bytes of data: 8.962 seconds
1000 sequential requests with 100 bytes of data: 8.152 seconds
1000 sequential requests with 1000 bytes of data: 9.108 seconds
1000 concurrent requests with 10 bytes of data: 2.096 seconds
1000 concurrent requests with 100 bytes of data: 2.239 seconds
1000 concurrent requests with 1000 bytes of data: 3.949 seconds

HBehrens · 2013-02-10T14:24:07Z

And the same tests with Cordova 2.4.0 (great performance!)

1000 sequential requests with 10 bytes of data: 1.832 seconds
1000 sequential requests with 100 bytes of data: 1.783 seconds
1000 sequential requests with 1000 bytes of data: 2.044 seconds
1000 concurrent requests with 10 bytes of data: 0.292 seconds
1000 concurrent requests with 100 bytes of data: 0.29 seconds
1000 concurrent requests with 1000 bytes of data: 0.394 seconds

HBehrens · 2013-02-10T14:24:19Z

These reference projects should be part of the repo

HBehrens · 2013-02-10T15:44:58Z

Here's the result in numbers

To update this diagram with changed numbers, edit the values for the last query parameter chd where you can find 18 values separated by | for each group of 6 values.

HBehrens · 2013-02-11T11:26:05Z

Here are the results after improvements a353502 and 872b622

1000 real sequential requests with 10 bytes of data: 1.981 seconds
1000 real sequential requests with 100 bytes of data: 1.971 seconds
1000 real sequential requests with 1000 bytes of data: 2.193 seconds
1000 concurrent requests with 10 bytes of data: 3.121 seconds
1000 concurrent requests with 100 bytes of data: 3.2 seconds
1000 concurrent requests with 1000 bytes of data: 3.395 seconds

That's on a par with Cordova :)

…rformance increase (refs #2)

HBehrens · 2013-02-11T13:27:11Z

and the figures from f24b2bb

1000 real sequential requests with 10 bytes of data: 1.974 seconds
1000 real sequential requests with 100 bytes of data: 2.112 seconds
1000 real sequential requests with 1000 bytes of data: 2.214 seconds
1000 concurrent requests with 10 bytes of data: 2.644 seconds
1000 concurrent requests with 100 bytes of data: 2.771 seconds
1000 concurrent requests with 1000 bytes of data: 2.967 seconds

…is needed -> performance now better than ever :) (refs #2)

HBehrens · 2013-02-13T01:22:51Z

and the figures from cbecfe5

1000 real sequential requests with 10 bytes of data: 1.845 seconds
1000 real sequential requests with 100 bytes of data: 2.08 seconds
1000 real sequential requests with 1000 bytes of data: 2.181 seconds
1000 concurrent requests with 10 bytes of data: 2.513 seconds
1000 concurrent requests with 100 bytes of data: 2.649 seconds
1000 concurrent requests with 1000 bytes of data: 2.785 seconds

…ctions (closes #17) -> dramatic performance increase on concurrent calls (refs #2)

HBehrens · 2013-02-13T19:29:06Z

And here are the latests figures after suporting async calls in both directions as described in #12 and #11 + #17:

1000 real sequential requests with 10 bytes of data: 1.904 seconds
1000 real sequential requests with 100 bytes of data: 1.999 seconds
1000 real sequential requests with 1000 bytes of data: 2.209 seconds
1000 concurrent requests with 10 bytes of data: 0.403 seconds
1000 concurrent requests with 100 bytes of data: 0.42 seconds
1000 concurrent requests with 1000 bytes of data: 0.565 seconds

mrcljx · 2013-02-18T15:16:30Z

The figures for b72cb39 (no reference benchmarks)

1000 real sequential requests with 10 bytes of data: 3.858 seconds
1000 real sequential requests with 100 bytes of data: 3.901 seconds
1000 real sequential requests with 1000 bytes of data: 4.504 seconds
1000 concurrent requests with 10 bytes of data: 7.543 seconds
1000 concurrent requests with 100 bytes of data: 7.276 seconds
1000 concurrent requests with 1000 bytes of data: 6.411 seconds

Bad (but expected) performance of the concurrent calls since their complete-callbacks must be dispatched on a different thread (the UI thread) to call webView.loadUrl(...) which sequential calls don't have to.

HBehrens · 2013-02-18T15:20:34Z

Great to see some numbers :) The values for sequential calls are not too bad, actually. Would be interesting to compare those with Cordova and trigger.io. Let's do that on Thursday. For the concurrent test: You don't have implemented transit.doHandleInvocationQueue yet, do you?

mrcljx · 2013-02-18T15:29:28Z

In fact I implemented it. See

Transit/source/java-android/src/com/getbeamapp/transit/prompt/TransitPromptAdapter.java

Lines 146 to 165 in 8c93e73

    
           case BATCH_INVOKE: 
        
               assert !isActive(); 
        
               Object invocationsObject = unmarshal(payload); 
        
               final List<?> invocations = (List<?>)invocationsObject; 
        
               result.resolve(); 
        
               runOnNonUiThread(new Runnable() { 
        
                   @Override 
        
                   public void run() { 
        
                       for(Object invocation : invocations) { 
        
                           try { 
        
                               doInvokeNativeAsync((TransitJSObject) context.proxify(invocation)); 
        
                           } catch (Exception e) { 
        
                               Log.e(TAG, String.format("[%s] Invocation of `%s` failed", request, invocation)); 
        
                           } 
        
                       } 
        
                   } 
        
               }); 
        
               return true;

and

Transit/source/java-android/res/raw/runtime.js

Lines 69 to 77 in 8c93e73

    
           transit.doHandleInvocationQueue = function(invocationDescriptions) { 
        
             batch = true; 
        
             try { 
        
               return post("__TRANSIT_MAGIC_BATCH_INVOKE", invocationDescriptions); 
        
             } finally { 
        
               batch = false; 
        
             } 
        
           };

.

However I could use a threadpool to increase performance. Trying that right now.

HBehrens · 2013-02-18T15:40:45Z

But this line should only be called once for 1000 concurrent calls, right? What about the other the other direction (bulk calls from native to JS)? Could you safe roundtrips over there? On ObjC there's a callAync on JSFunction that puts itself in a queue.

mrcljx · 2013-02-18T15:59:05Z

The line is only called once, indeed. There also exists a .callAsync function on JSFunction. The invocation is then added to the global UI-thread dispatch queue. I could further group these into a single loadUrl but for now I try using a threadpool. Results incoming...

mrcljx · 2013-02-18T16:03:06Z

The figures for fa8affc with a ThreadPool (still no reference benchmarks)

1000 real sequential requests with 10 bytes of data: 4.149 seconds
1000 real sequential requests with 100 bytes of data: 3.831 seconds
1000 real sequential requests with 1000 bytes of data: 4.344 seconds
1000 concurrent requests with 10 bytes of data: 2.092 seconds
1000 concurrent requests with 100 bytes of data: 2.098 seconds
1000 concurrent requests with 1000 bytes of data: 2.166 seconds

Much better!

HBehrens · 2013-02-18T16:11:24Z

Indeed! So 2,17s (=4.34-2.17) / 1,73s (=3.83-2.1) is the time needed to to 1.000 calls from JS to native?

mrcljx · 2013-02-18T16:54:52Z

I doubt that the diff of the measurements can be used as a measurement itself. Sequential calls and concurrent calls have quite different codepaths within the native code because of the limitations the Android SDK put upon us. Just for example: The first call from native to JS is more expensive than a reentrant call.

HBehrens · 2013-02-18T17:05:01Z

...which would balancing out in this particular example. Anyway, I am still curous why "sequential" is that much slower then "concurrent". Is it really just the calls from JS to native?

mrcljx · 2013-02-18T17:21:00Z

I'd rather assume that this is 2-core-multithread vs single-thread (my ThreadPool size defaults to numOfProcessors).

HBehrens · 2013-02-18T17:25:41Z

Can you verify this be decreasing the number to 1?

mrcljx · 2013-02-18T17:37:34Z

I was wrong indeed. PoolSize doesn't have an impact on concurrent benchmarks.

mrcljx · 2013-02-19T10:48:15Z

I did some tweaks and recompiled the app with Android's ProGuard (to get rid of Log.d and Log.v messages and perform general optimization).

The figures for fff739d built with ant release install show equal numbers for concurrent and sequential calls.

1000 real sequential requests with 10 bytes of data: 1.703 seconds
1000 real sequential requests with 100 bytes of data: 1.62 seconds
1000 real sequential requests with 1000 bytes of data: 1.568 seconds
1000 concurrent requests with 10 bytes of data: 1.608 seconds
1000 concurrent requests with 100 bytes of data: 1.73 seconds
1000 concurrent requests with 1000 bytes of data: 1.8 seconds

Much better!

HBehrens · 2013-02-19T10:53:51Z

Impressive! Any noteworthy insights beyond "general optimization"?

Those numbers appear to be similar to this optimization above where concurrent calls are basically as fast as sequential calls. For Objective-C "real" async calls with only 1 roundtrip for 1000 calls from JS and back resulted in a performance increase by factor 5.

HBehrens · 2013-02-19T10:59:42Z

Another note on ProGuard: I do not fully understand the implications of the ProGuard config you used. Do you think scarifying the readability of call stacks is worth the performance gain? What happens if you disable the aforementioned log statements only without any obfuscation?

mrcljx · 2013-02-19T11:22:33Z

I disabled obfuscation in 96cc021 and the results remain the same. Using ProGuard in this configuration should be similar to -O3 in C-world with macros to disable log-statements.

HBehrens · 2013-02-19T11:31:26Z

Perfect :) Any thoughts on the remarks on aync calls?

mrcljx · 2013-02-19T13:11:33Z

The real async calls (1 roundtrip for 1000 calls) were introduced in 8c93e73ad3ffb4ea436988182267560a726dee91 (before my first benchmark). Overall optimization was essentially pre-checking for "__T" prefix.

For more insight I did some profiling of 100 concurrent requests (we can go into this on Thursday). Sadly I can't run this on my devices since free memory runs out after 5 seconds (but 100 requests take 16s with profiling enabled). Might retry this with fewer requests.

Row 1 = UI thread
Row 2 = WebView thread
Row 4 = AsyncInvokerPoolThread (only one, since running in Emulator = 1 core)
Row 5 = FinalizerDaemon (kicking in due to GC, starting releaseProxy)

2400ms-6800ms = JSON parsing
6800ms-7800ms = JSON postprocessing (convert Android's JSONObjects to Maps and JSONArrays to Lists)
rest: 1000 times native invoke, followed by loadUrl on UI Thread and some handling inside WebView

Detail view of nativeInvoke

Unmarked area = proxify(jsonData) including String -> Func etc.
Marked area = doNativeInvokeAsync

So we are CPU bound for now.

HBehrens · 2013-02-19T13:30:25Z

Not completely sure about the second diagram but as far as I understand, the 4th row of the first diagrams is the actual native implementation that does not do much more than passing back the argument it receives, right? Why does this take so long? Also, row 1 and 2 appear to do some work during this period while before PP on row 2 there's no activity at all. Shouldn't those threads idle while thread 4 is preparing the bulk result?

If calling loadUrl and prompt is no bottleneck the whole idea of bulk responses are useless, anyway. I'd be happy to look into these resuls on Thursday. Also, we should try to get the Cordova version up and running to verify our results or learn from them.

In any case: Having more than 500 round trips per second is great!

mrcljx · 2013-02-19T14:18:54Z

No, pool-1-thread-1 also has to proxify the input arguments (resolve _TRANSIT...), call jsExpressionFromCode (since it must call the callback-JSFunction of the benchmark) and loadUrl(...) the expression. This is the code that is run on the thread:

// 2nd diagram, 13,260ms - 13,290ms
TransitJSObject invocation = (TransitJSObject) context.proxify(invocation);

// 2nd diagram, 13,290ms - 13,335ms (interrupted by UI thread, woken by `handler.post`)
doInvokeNativeAsync(invocation);

But from the second diagram we get the cost of loadUrl at 13,322ms - 13,335ms (~10% of time) which is cheap compared to jsonParsing+postProcessing+proxifying (~60% of time) and jsExpressionFromCode (~30% of time).

mrcljx · 2013-02-19T14:22:37Z

With fef4b98 Transit.java now does batch calls against loadUrl. Results in 20% better performance for concurrent requests. Stopping here until we get Cordova as a reference benchmark on Thursday. :)

1000 real sequential requests with 10 bytes of data: 1.645 seconds
1000 real sequential requests with 100 bytes of data: 1.698 seconds
1000 real sequential requests with 1000 bytes of data: 1.778 seconds
1000 concurrent requests with 10 bytes of data: 1.331 seconds
1000 concurrent requests with 100 bytes of data: 1.332 seconds
1000 concurrent requests with 1000 bytes of data: 1.501 seconds

mrcljx · 2013-02-20T11:37:00Z

A few optimization ideas from a good nights sleep can be found in 67264e6 (including faster Jackson JSON parser, iterative/yielding parsing for Batch-Invokes, optimizations for trivial situations like @() and smaller stuff)... improving overall performance and concurrent performance even more (up to twice as fast).

1000 real sequential requests with 10 bytes of data: 1.39 seconds
1000 real sequential requests with 100 bytes of data: 1.322 seconds
1000 real sequential requests with 1000 bytes of data: 1.577 seconds
1000 concurrent requests with 10 bytes of data: 0.605 seconds
1000 concurrent requests with 100 bytes of data: 0.698 seconds
1000 concurrent requests with 1000 bytes of data: 0.767 seconds

HBehrens · 2013-02-20T11:47:14Z

Yiiiieeha!

mrcljx · 2013-02-20T13:54:11Z

Also looks nice when Profiling the current version 😄

and compared to the old one

Less waiting, more working.

HBehrens · 2013-02-21T13:44:59Z

Here are the results from trigger.io running on the same android device

1000 sequential requests with 10 bytes of data: 1.864 seconds
1000 sequential requests with 100 bytes of data: 1.683 seconds
1000 sequential requests with 1000 bytes of data: 3.08 seconds
1000 concurrent requests with 10 bytes of data: 1.591 seconds
1000 concurrent requests with 100 bytes of data: 1.655 seconds
1000 concurrent requests with 1000 bytes of data: 2.782 seconds

I am completing the diagram

HBehrens · 2013-02-21T15:33:16Z

and the results from a3b410b

1000 sequential requests with 10 bytes of data: 0.881 seconds
1000 sequential requests with 100 bytes of data: 0.731 seconds
1000 sequential requests with 1000 bytes of data: 1.643 seconds
1000 concurrent requests with 10 bytes of data: 0.501 seconds
1000 concurrent requests with 100 bytes of data: 0.64 seconds
1000 concurrent requests with 1000 bytes of data: 1.381 seconds

HBehrens added a commit that referenced this issue Feb 10, 2013

tiny benchmark refs #2

ddb2834

HBehrens closed this as completed in 8e4877b Feb 10, 2013

HBehrens reopened this Feb 10, 2013

HBehrens added a commit that referenced this issue Feb 10, 2013

source of trigger.io benchmark (refs #2)

727d4ce

HBehrens added a commit that referenced this issue Feb 10, 2013

source of cordova iOS benchmark (refs #2)

d9e0411

HBehrens closed this as completed Feb 10, 2013

HBehrens mentioned this issue Feb 10, 2013

automate benchmarks #3

Closed

HBehrens added a commit that referenced this issue Feb 11, 2013

performance improvements for calling JSFunctions, in benchmark 18% pe…

f24b2bb

…rformance increase (refs #2)

HBehrens added a commit that referenced this issue Feb 13, 2013

flattened calls to JSFunction and saving JSON.stringify if no result …

cbecfe5

…is needed -> performance now better than ever :) (refs #2)

HBehrens added a commit that referenced this issue Feb 13, 2013

support for TransitNativeFunc.async and native support for queued fun…

d76fe22

…ctions (closes #17) -> dramatic performance increase on concurrent calls (refs #2)

mrcljx added a commit that referenced this issue Feb 18, 2013

Async call from Native->JS (refs #2)

b72cb39

HBehrens reopened this Feb 18, 2013

mrcljx added a commit that referenced this issue Feb 18, 2013

performance improvements (refs #2)

fa8affc

HBehrens mentioned this issue Feb 20, 2013

Reconsider polling approach for performance reasons #23

Open

HBehrens added a commit that referenced this issue Feb 21, 2013

cordova benchmark for android refs #2

a3b410b

performance tests #2

performance tests #2

Comments

HBehrens commented Feb 9, 2013

HBehrens commented Feb 10, 2013

HBehrens commented Feb 10, 2013

HBehrens commented Feb 10, 2013

HBehrens commented Feb 10, 2013

HBehrens commented Feb 10, 2013

HBehrens commented Feb 11, 2013

HBehrens commented Feb 11, 2013

HBehrens commented Feb 13, 2013

HBehrens commented Feb 13, 2013

mrcljx commented Feb 18, 2013

HBehrens commented Feb 18, 2013

mrcljx commented Feb 18, 2013

HBehrens commented Feb 18, 2013

mrcljx commented Feb 18, 2013

mrcljx commented Feb 18, 2013

HBehrens commented Feb 18, 2013

mrcljx commented Feb 18, 2013

HBehrens commented Feb 18, 2013

mrcljx commented Feb 18, 2013

HBehrens commented Feb 18, 2013

mrcljx commented Feb 18, 2013

mrcljx commented Feb 19, 2013

HBehrens commented Feb 19, 2013

HBehrens commented Feb 19, 2013

mrcljx commented Feb 19, 2013

HBehrens commented Feb 19, 2013

mrcljx commented Feb 19, 2013

HBehrens commented Feb 19, 2013

mrcljx commented Feb 19, 2013

mrcljx commented Feb 19, 2013

mrcljx commented Feb 20, 2013

HBehrens commented Feb 20, 2013

mrcljx commented Feb 20, 2013

HBehrens commented Feb 21, 2013

HBehrens commented Feb 21, 2013