Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark example #2

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Benchmark example #2

wants to merge 1 commit into from

Conversation

puuu
Copy link
Contributor

@puuu puuu commented Dec 18, 2017

Hi, I did some benchmarking of this lib. Writing single bytes or PROGMEM is rather slow compared to String. In addition, as a poc, I tried to implement a prebuffer to avoid iteration of all handler in ArduinoSimpleLogging::log(). You can find it in my prebuffer branch (puuu/ArduinoSimpleLogging@dd8cef7). The benchmark results (see below), show no big improvement (may be for a high number of handler, but this is unlikely a use case). Thus, the iteration is not the problem. The overhead likely comes from the dynamic binding within the Print class.

Anyhow, the benchmark example can be helpful.

Original:

------------------------
No Handler ...
char string
duration 4311 us, 0 calls
single chars
duration 20594 us, 0 calls
PROGMEM string
duration 25170 us, 0 calls
String object
duration 2971 us, 0 calls
------------------------

------------------------
Handler for error, logging to debug ...
char string
duration 4526 us, 0 calls
single chars
duration 24075 us, 0 calls
PROGMEM string
duration 28651 us, 0 calls
String object
duration 3214 us, 0 calls
------------------------

------------------------
Handler for error, logging to error ...
char string
duration 4905 us, 2 calls
single chars
duration 29326 us, 28 calls
PROGMEM string
duration 33901 us, 28 calls
String object
duration 3588 us, 2 calls
------------------------

Prebuffer

------------------------
No Handler ...
char string
duration 4436 us, 0 calls
single chars
duration 25311 us, 0 calls
PROGMEM string
duration 29820 us, 0 calls
String object
duration 3096 us, 0 calls
------------------------

------------------------
Handler for error, logging to debug ...
char string
duration 4651 us, 0 calls
single chars
duration 25539 us, 0 calls
PROGMEM string
duration 30176 us, 0 calls
String object
duration 3338 us, 0 calls
------------------------

------------------------
Handler for error, logging to error ...
char string
duration 5030 us, 2 calls
single chars
duration 25914 us, 2 calls
PROGMEM string
duration 30739 us, 3 calls
String object
duration 3714 us, 2 calls
------------------------

@janLo
Copy link
Owner

janLo commented Dec 18, 2017

Thank you.

I'm generally not a fan of buffering, especially if there is no clear flush-support in the interface you adapt. It always ends u implementing some magic (level-change plus newline in you testcase) that give hard-to-debug issues in some corner-cases.

Your performance test should be normalized I think. Writing a string from flash vs. writing a string bytewise from any other location in ram does not say anything about the performance of the lib, as long as you don't substract the overhead of the flash-read which you have anyway with data comming from flash, regardless what you use to pump it to your log target. If you do so, the numbers should be similar to the byte-wise-test except for the overhead of the extra function calls to size_t Print::println(const __FlashStringHelper *ifsh), size_t Print::print(const __FlashStringHelper *ifsh) and size_t Print::println(void).

The difference between String and const char* might result from different placement in memory or more likely from the extra strlen() in print. That's also something we can't really do anything about.

The general (and huge) difference between single-byte-writes and bulk-writes might come indeed from the (virtual-)function-call-overhead. That's just as it is and we can't do much about it. The only thing is, to tell people, that bulk-writes should be preferred and to make clear, that it is a library for logging messages from a program to various sources that are configurable at runtime, not for high throughput data processing.

I also think, that when you log to a network target or a serial connection, this overhead is not really a big deal.

@puuu
Copy link
Contributor Author

puuu commented Dec 20, 2017

The idea of this example is to see what impact any code change has to this library.
With this I realised that buffering will not increase the performance. I showed the output of the prebuffer brnach only to share this experience.

Now I added an output of the time used by the PrintMock:

------------------------
Only the the handler ...
char string
duration 3295 us, 2 calls
single chars
duration 9034 us, 28 calls
PROGMEM string
duration 13561 us, 28 calls
String object
duration 1996 us, 2 calls
------------------------

------------------------
No Handler ...
char string
duration 4263 us, 0 calls
single chars
duration 20588 us, 0 calls
PROGMEM string
duration 25150 us, 0 calls
String object
duration 2963 us, 0 calls
------------------------

------------------------
Handler for error, logging to debug ...
char string
duration 4501 us, 0 calls
single chars
duration 24076 us, 0 calls
PROGMEM string
duration 28651 us, 0 calls
String object
duration 3214 us, 0 calls
------------------------

------------------------
Handler for error, logging to error ...
char string
duration 4876 us, 2 calls
single chars
duration 29326 us, 28 calls
PROGMEM string
duration 33901 us, 28 calls
String object
duration 3588 us, 2 calls
------------------------

You can now subtract them from the time of your Logger:

No Handler Not matching Handler Matching Handler
char 968 1206 1581
single chars 11554 15042 20292
PROGMEM 11589 15090 20340
String 967 1218 1592

From this you see that the char and String have almost the same overhead as well as single chars and PROGMEM. But the total times are still important, since they tell how the lib performance compared to a preprocessed controlled logger.

Anyhow, at the moment I do not see a way to optimize the speed of this lib further more. And if you log to Serial then, the overhead of this lib is negligible.

By the way, all test were performed with a WeMos D1 mini at 80 MHz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants