Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance enhancements for streaming #150

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

mroth
Copy link

@mroth mroth commented Dec 15, 2014

I made a few changes to the way streaming parsing is done to improve performance and throughput.

Here are some ad-hoc benchmarks from my workstation, consuming the streaming API with a query that generates ~1000 tweets per second for 35 seconds.

Before:

RESULTS  10.18s user 0.83s system 31% cpu 35.008 total
-> user:                      10.18s
-> system:                    0.83s
-> cpu:                       31%
-> max memory:                68064
-> page faults from disk:     0
-> other page faults:         18426
-> socket msgs received:      82913
-> socket msgs sent:          10

After:

RESULTS  6.70s user 1.06s system 22% cpu 35.007 total
-> user:                      6.70s
-> system:                    1.06s
-> cpu:                       22%
-> max memory:                69004
-> page faults from disk:     0
-> other page faults:         19171
-> socket msgs received:      104496
-> socket msgs sent:          10

In addition, the new streaming parser adds support for sending the delimited: length parameter to the Twitter API streaming endpoint, which causes it to prefix each tweet object with its length in bytes. This enables more efficient parsing since you don't have to scan ahead for an EOL.

Using {delimited: length}:

RESULTS  5.75s user 0.94s system 19% cpu 35.006 total
-> user:                      5.75s
-> system:                    0.94s
-> cpu:                       19%
-> max memory:                69336
-> page faults from disk:     0
-> other page faults:         18433
-> socket msgs received:      103859
-> socket msgs sent:          10

I rarely use Javascript or NodeJS so I would strongly suggest a code review on this before merging.

the most common case when parsing elements from the streaming API will
be tweets, thus, it should come first in the if/else branch rather than
last so that in more scenarios less checks will be performed.

this is a small difference but actually is highly visible when
streaming at high throughput.
add support for parsing `delimited: length` from twitter, and perform
less redundant comparisons in all cases.
@mroth
Copy link
Author

mroth commented Dec 15, 2014

By the way, I suspect there is still a large degree of room to further improve streaming performance. For example, using an actual buffer parser such as node-strtok could make a massive difference over my hacky method of using a local string variable which has to be constantly reassigned.

In addition, some profiling and looking for unnecessary operations in hot code paths could go a long way.

I'd love to encourage others who know Javascript better than me to look into this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant