Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance tests #2

Open
HBehrens opened this issue Feb 9, 2013 · 35 comments
Open

performance tests #2

HBehrens opened this issue Feb 9, 2013 · 35 comments

Comments

@HBehrens
Copy link
Member

HBehrens commented Feb 9, 2013

Similar to the benchmark by trigger.io it is interesting to compare

  • larger/smaller payload
  • blocked and async calls (how to do async?)
  • PhoneGap, trigger.io

Here's the code of the benchmark mentioned above triggr.io, phonegap, iOS

HBehrens added a commit that referenced this issue Feb 10, 2013
@HBehrens
Copy link
Member Author

Here are the result of my iPhone 5 on iOS 6.0.1 (there's room for improvement)

1000 sequential requests with 10 bytes of data: 4.141 seconds
1000 sequential requests with 100 bytes of data: 4.141 seconds
1000 sequential requests with 1000 bytes of data: 4.34 seconds
1000 real sequential requests with 10 bytes of data: 2.952 seconds
1000 real sequential requests with 100 bytes of data: 2.924 seconds
1000 real sequential requests with 1000 bytes of data: 3.117 seconds
1000 concurrent requests with 10 bytes of data: 4.191 seconds
1000 concurrent requests with 100 bytes of data: 3.974 seconds
1000 concurrent requests with 1000 bytes of data: 4.151 seconds

@HBehrens
Copy link
Member Author

Results from trigger.io 1.4 on same device

1000 sequential requests with 10 bytes of data: 8.962 seconds
1000 sequential requests with 100 bytes of data: 8.152 seconds
1000 sequential requests with 1000 bytes of data: 9.108 seconds
1000 concurrent requests with 10 bytes of data: 2.096 seconds
1000 concurrent requests with 100 bytes of data: 2.239 seconds
1000 concurrent requests with 1000 bytes of data: 3.949 seconds

@HBehrens
Copy link
Member Author

And the same tests with Cordova 2.4.0 (great performance!)

1000 sequential requests with 10 bytes of data: 1.832 seconds
1000 sequential requests with 100 bytes of data: 1.783 seconds
1000 sequential requests with 1000 bytes of data: 2.044 seconds
1000 concurrent requests with 10 bytes of data: 0.292 seconds
1000 concurrent requests with 100 bytes of data: 0.29 seconds
1000 concurrent requests with 1000 bytes of data: 0.394 seconds

@HBehrens
Copy link
Member Author

These reference projects should be part of the repo

@HBehrens
Copy link
Member Author

Here's the result in numbers

Speed Comparison

To update this diagram with changed numbers, edit the values for the last query parameter chd where you can find 18 values separated by | for each group of 6 values.

@HBehrens
Copy link
Member Author

Here are the results after improvements a353502 and 872b622

1000 real sequential requests with 10 bytes of data: 1.981 seconds
1000 real sequential requests with 100 bytes of data: 1.971 seconds
1000 real sequential requests with 1000 bytes of data: 2.193 seconds
1000 concurrent requests with 10 bytes of data: 3.121 seconds
1000 concurrent requests with 100 bytes of data: 3.2 seconds
1000 concurrent requests with 1000 bytes of data: 3.395 seconds

Speed Comparison

That's on a par with Cordova :)

HBehrens added a commit that referenced this issue Feb 11, 2013
@HBehrens
Copy link
Member Author

and the figures from f24b2bb

1000 real sequential requests with 10 bytes of data: 1.974 seconds
1000 real sequential requests with 100 bytes of data: 2.112 seconds
1000 real sequential requests with 1000 bytes of data: 2.214 seconds
1000 concurrent requests with 10 bytes of data: 2.644 seconds
1000 concurrent requests with 100 bytes of data: 2.771 seconds
1000 concurrent requests with 1000 bytes of data: 2.967 seconds

Speed Comparison

HBehrens added a commit that referenced this issue Feb 13, 2013
…is needed -> performance now better than ever :) (refs #2)
@HBehrens
Copy link
Member Author

and the figures from cbecfe5

1000 real sequential requests with 10 bytes of data: 1.845 seconds
1000 real sequential requests with 100 bytes of data: 2.08 seconds
1000 real sequential requests with 1000 bytes of data: 2.181 seconds
1000 concurrent requests with 10 bytes of data: 2.513 seconds
1000 concurrent requests with 100 bytes of data: 2.649 seconds
1000 concurrent requests with 1000 bytes of data: 2.785 seconds

Speed Comparison

HBehrens added a commit that referenced this issue Feb 13, 2013
…ctions (closes #17) -> dramatic performance increase on concurrent calls (refs #2)
@HBehrens
Copy link
Member Author

And here are the latests figures after suporting async calls in both directions as described in #12 and #11 + #17:

1000 real sequential requests with 10 bytes of data: 1.904 seconds
1000 real sequential requests with 100 bytes of data: 1.999 seconds
1000 real sequential requests with 1000 bytes of data: 2.209 seconds
1000 concurrent requests with 10 bytes of data: 0.403 seconds
1000 concurrent requests with 100 bytes of data: 0.42 seconds
1000 concurrent requests with 1000 bytes of data: 0.565 seconds

Speed Comparison

mrcljx added a commit that referenced this issue Feb 18, 2013
@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

The figures for b72cb39 (no reference benchmarks)

1000 real sequential requests with 10 bytes of data: 3.858 seconds
1000 real sequential requests with 100 bytes of data: 3.901 seconds
1000 real sequential requests with 1000 bytes of data: 4.504 seconds
1000 concurrent requests with 10 bytes of data: 7.543 seconds
1000 concurrent requests with 100 bytes of data: 7.276 seconds
1000 concurrent requests with 1000 bytes of data: 6.411 seconds

Speed Comparison

Bad (but expected) performance of the concurrent calls since their complete-callbacks must be dispatched on a different thread (the UI thread) to call webView.loadUrl(...) which sequential calls don't have to.

@HBehrens
Copy link
Member Author

Great to see some numbers :) The values for sequential calls are not too bad, actually. Would be interesting to compare those with Cordova and trigger.io. Let's do that on Thursday. For the concurrent test: You don't have implemented transit.doHandleInvocationQueue yet, do you?

@HBehrens HBehrens reopened this Feb 18, 2013
@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

In fact I implemented it. See

case BATCH_INVOKE:
assert !isActive();
Object invocationsObject = unmarshal(payload);
final List<?> invocations = (List<?>)invocationsObject;
result.resolve();
runOnNonUiThread(new Runnable() {
@Override
public void run() {
for(Object invocation : invocations) {
try {
doInvokeNativeAsync((TransitJSObject) context.proxify(invocation));
} catch (Exception e) {
Log.e(TAG, String.format("[%s] Invocation of `%s` failed", request, invocation));
}
}
}
});
return true;
and
transit.doHandleInvocationQueue = function(invocationDescriptions) {
batch = true;
try {
return post("__TRANSIT_MAGIC_BATCH_INVOKE", invocationDescriptions);
} finally {
batch = false;
}
};
.

However I could use a threadpool to increase performance. Trying that right now.

@HBehrens
Copy link
Member Author

But this line should only be called once for 1000 concurrent calls, right? What about the other the other direction (bulk calls from native to JS)? Could you safe roundtrips over there? On ObjC there's a callAync on JSFunction that puts itself in a queue.

mrcljx added a commit that referenced this issue Feb 18, 2013
@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

The line is only called once, indeed. There also exists a .callAsync function on JSFunction. The invocation is then added to the global UI-thread dispatch queue. I could further group these into a single loadUrl but for now I try using a threadpool. Results incoming...

@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

The figures for fa8affc with a ThreadPool (still no reference benchmarks)

1000 real sequential requests with 10 bytes of data: 4.149 seconds
1000 real sequential requests with 100 bytes of data: 3.831 seconds
1000 real sequential requests with 1000 bytes of data: 4.344 seconds
1000 concurrent requests with 10 bytes of data: 2.092 seconds
1000 concurrent requests with 100 bytes of data: 2.098 seconds
1000 concurrent requests with 1000 bytes of data: 2.166 seconds

Speed Comparison

Much better!

@HBehrens
Copy link
Member Author

Indeed! So 2,17s (=4.34-2.17) / 1,73s (=3.83-2.1) is the time needed to to 1.000 calls from JS to native?

@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

I doubt that the diff of the measurements can be used as a measurement itself. Sequential calls and concurrent calls have quite different codepaths within the native code because of the limitations the Android SDK put upon us. Just for example: The first call from native to JS is more expensive than a reentrant call.

@HBehrens
Copy link
Member Author

...which would balancing out in this particular example. Anyway, I am still curous why "sequential" is that much slower then "concurrent". Is it really just the calls from JS to native?

@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

I'd rather assume that this is 2-core-multithread vs single-thread (my ThreadPool size defaults to numOfProcessors).

@HBehrens
Copy link
Member Author

Can you verify this be decreasing the number to 1?

@mrcljx
Copy link
Member

mrcljx commented Feb 18, 2013

I was wrong indeed. PoolSize doesn't have an impact on concurrent benchmarks.

@mrcljx
Copy link
Member

mrcljx commented Feb 19, 2013

I did some tweaks and recompiled the app with Android's ProGuard (to get rid of Log.d and Log.v messages and perform general optimization).

The figures for fff739d built with ant release install show equal numbers for concurrent and sequential calls.

1000 real sequential requests with 10 bytes of data: 1.703 seconds
1000 real sequential requests with 100 bytes of data: 1.62 seconds
1000 real sequential requests with 1000 bytes of data: 1.568 seconds
1000 concurrent requests with 10 bytes of data: 1.608 seconds
1000 concurrent requests with 100 bytes of data: 1.73 seconds
1000 concurrent requests with 1000 bytes of data: 1.8 seconds

Speed Comparison

Much better!

@HBehrens
Copy link
Member Author

Impressive! Any noteworthy insights beyond "general optimization"?

Those numbers appear to be similar to this optimization above where concurrent calls are basically as fast as sequential calls. For Objective-C "real" async calls with only 1 roundtrip for 1000 calls from JS and back resulted in a performance increase by factor 5.

@HBehrens
Copy link
Member Author

Another note on ProGuard: I do not fully understand the implications of the ProGuard config you used. Do you think scarifying the readability of call stacks is worth the performance gain? What happens if you disable the aforementioned log statements only without any obfuscation?

@mrcljx
Copy link
Member

mrcljx commented Feb 19, 2013

I disabled obfuscation in 96cc021 and the results remain the same. Using ProGuard in this configuration should be similar to -O3 in C-world with macros to disable log-statements.

@HBehrens
Copy link
Member Author

Perfect :) Any thoughts on the remarks on aync calls?

@mrcljx
Copy link
Member

mrcljx commented Feb 19, 2013

The real async calls (1 roundtrip for 1000 calls) were introduced in 8c93e73ad3ffb4ea436988182267560a726dee91 (before my first benchmark). Overall optimization was essentially pre-checking for "__T" prefix.

For more insight I did some profiling of 100 concurrent requests (we can go into this on Thursday). Sadly I can't run this on my devices since free memory runs out after 5 seconds (but 100 requests take 16s with profiling enabled). Might retry this with fewer requests.

Trace of 100 concurrent requests with 10 bytes each

Row 1 = UI thread
Row 2 = WebView thread
Row 4 = AsyncInvokerPoolThread (only one, since running in Emulator = 1 core)
Row 5 = FinalizerDaemon (kicking in due to GC, starting releaseProxy)

2400ms-6800ms = JSON parsing
6800ms-7800ms = JSON postprocessing (convert Android's JSONObjects to Maps and JSONArrays to Lists)
rest: 1000 times native invoke, followed by loadUrl on UI Thread and some handling inside WebView

Detail view of nativeInvoke

Native invoke

Unmarked area = proxify(jsonData) including String -> Func etc.
Marked area = doNativeInvokeAsync

So we are CPU bound for now.

@HBehrens
Copy link
Member Author

Not completely sure about the second diagram but as far as I understand, the 4th row of the first diagrams is the actual native implementation that does not do much more than passing back the argument it receives, right? Why does this take so long? Also, row 1 and 2 appear to do some work during this period while before PP on row 2 there's no activity at all. Shouldn't those threads idle while thread 4 is preparing the bulk result?

If calling loadUrl and prompt is no bottleneck the whole idea of bulk responses are useless, anyway. I'd be happy to look into these resuls on Thursday. Also, we should try to get the Cordova version up and running to verify our results or learn from them.

In any case: Having more than 500 round trips per second is great!

@mrcljx
Copy link
Member

mrcljx commented Feb 19, 2013

No, pool-1-thread-1 also has to proxify the input arguments (resolve _TRANSIT...), call jsExpressionFromCode (since it must call the callback-JSFunction of the benchmark) and loadUrl(...) the expression. This is the code that is run on the thread:

// 2nd diagram, 13,260ms - 13,290ms
TransitJSObject invocation = (TransitJSObject) context.proxify(invocation);

// 2nd diagram, 13,290ms - 13,335ms (interrupted by UI thread, woken by `handler.post`)
doInvokeNativeAsync(invocation);

But from the second diagram we get the cost of loadUrl at 13,322ms - 13,335ms (~10% of time) which is cheap compared to jsonParsing+postProcessing+proxifying (~60% of time) and jsExpressionFromCode (~30% of time).

@mrcljx
Copy link
Member

mrcljx commented Feb 19, 2013

With fef4b98 Transit.java now does batch calls against loadUrl. Results in 20% better performance for concurrent requests. Stopping here until we get Cordova as a reference benchmark on Thursday. :)

1000 real sequential requests with 10 bytes of data: 1.645 seconds
1000 real sequential requests with 100 bytes of data: 1.698 seconds
1000 real sequential requests with 1000 bytes of data: 1.778 seconds
1000 concurrent requests with 10 bytes of data: 1.331 seconds
1000 concurrent requests with 100 bytes of data: 1.332 seconds
1000 concurrent requests with 1000 bytes of data: 1.501 seconds

Speed Comparison

@mrcljx
Copy link
Member

mrcljx commented Feb 20, 2013

A few optimization ideas from a good nights sleep can be found in 67264e6 (including faster Jackson JSON parser, iterative/yielding parsing for Batch-Invokes, optimizations for trivial situations like @() and smaller stuff)... improving overall performance and concurrent performance even more (up to twice as fast).

1000 real sequential requests with 10 bytes of data: 1.39 seconds
1000 real sequential requests with 100 bytes of data: 1.322 seconds
1000 real sequential requests with 1000 bytes of data: 1.577 seconds
1000 concurrent requests with 10 bytes of data: 0.605 seconds
1000 concurrent requests with 100 bytes of data: 0.698 seconds
1000 concurrent requests with 1000 bytes of data: 0.767 seconds

Speed Comparison

@HBehrens
Copy link
Member Author

Yiiiieeha!

@mrcljx
Copy link
Member

mrcljx commented Feb 20, 2013

Also looks nice when Profiling the current version 😄

New Trace

and compared to the old one

Old Trace

Less waiting, more working.

@HBehrens
Copy link
Member Author

Here are the results from trigger.io running on the same android device

1000 sequential requests with 10 bytes of data: 1.864 seconds
1000 sequential requests with 100 bytes of data: 1.683 seconds
1000 sequential requests with 1000 bytes of data: 3.08 seconds
1000 concurrent requests with 10 bytes of data: 1.591 seconds
1000 concurrent requests with 100 bytes of data: 1.655 seconds
1000 concurrent requests with 1000 bytes of data: 2.782 seconds

I am completing the diagram
Speed Comparison

HBehrens added a commit that referenced this issue Feb 21, 2013
@HBehrens
Copy link
Member Author

and the results from a3b410b

1000 sequential requests with 10 bytes of data: 0.881 seconds
1000 sequential requests with 100 bytes of data: 0.731 seconds
1000 sequential requests with 1000 bytes of data: 1.643 seconds
1000 concurrent requests with 10 bytes of data: 0.501 seconds
1000 concurrent requests with 100 bytes of data: 0.64 seconds
1000 concurrent requests with 1000 bytes of data: 1.381 seconds

Speed Comparison

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants