Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements #65

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ercpe
Copy link
Contributor

@ercpe ercpe commented May 27, 2021

In my hunt for a bottleneck in one of our applications, I've found another few hotspots. Not that it was slow before, but when you push a few thousand messages per minute, some things just pop up.

This PR contains two small but measurable improvements. My approach was a simple timeit script, run standalone and with a profiler:

    result = timeit.timeit(stmt="""formatter.format(lr)""",
                           setup="""
from logstash_async.formatter import LogstashFormatter
from logging import LogRecord, Logger
import datetime
import logging

now = datetime.datetime.now()
formatter = LogstashFormatter()
lr = Logger('dummy-logger').makeRecord(
    'Some name', logging.ERROR, 'blubb.py', 123, 'Some message',
    args=(), exc_info=None,
    extra={'foo': 'bar', 'baz': 123, 'somedatetime': now})
""",
                           number=100000)
    print(result)

With the current 2.3.0, this script completed (on my machine) in around 2.6 seconds.

The first change was to change FORMATTER_RECORD_FIELD_SKIP_LIST to a set: this list is used in the fashion of x in y which is way faster for sets than list. This brought the execution time down to 2.2 seconds.

The other change was to exclude fields from FORMATTER_RECORD_FIELD_SKIP_LIST in _get_record_fields: The formatter pulled all fields from the message and the extra into the dict and removed the fields afterwards in _remove_excluded_fields. Since we're iterating over __dict__.items() in _get_record_fields anyway, we can skip the fields right there and save us the del later on.
With both changes, I'm down to 1.7 seconds.

I'm working on another improvement involving the datetime handling, but that will come in another PR as it involves a little bit more code.

Johann Schmitz added 2 commits May 25, 2021 06:28
The `FORMATTER_RECORD_FIELD_SKIP_LIST` is used heavily to remove fields from the resulting document in the fashion `x in FORMATTER_RECORD_FIELD_SKIP_LIST`. This operation is `O(n)` for lists, but `O(1)` to `O(n)` for sets.
The `_get_record_fields` method extracts and formats the items of the `LogRecord` object into the message dict. After collecting the static extra fields and adding the per-record extra fields, `_remove_excluded_fields` is called to remove any keys we don't want.
`_remove_excluded_fields` is called for the two `dict`s `message` and `extra_fields` and removes items if the key is present in `FORMATTER_RECORD_FIELD_SKIP_LIST`. We can improve this by not adding those fields in the first place since we're iterating over `record.__dict__` anyways.
@@ -32,11 +32,11 @@ class Constants:
# Usually this list does not need to be modified. Add/Remove elements to
# exclude/include them in the Logstash event, for the full list see:
# http://docs.python.org/library/logging.html#logrecord-attributes
FORMATTER_RECORD_FIELD_SKIP_LIST = [
FORMATTER_RECORD_FIELD_SKIP_LIST = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I wasn't aware there is such a difference on lookup performance between lists and sets.

It would be even better if we keep the constant a list here and convert it to a set later in the Formatter class, probably best in its __init__(). Then people can still append/remove items as they wish. Even if named constants, they are meant to be customised if necessary.
A set is also mutable but with a different API and so users had to adjust their code.

I think converting the list to a set later when creating a Formatter instance, should be ok as this happens only once (or only a few times if the formatter is configured multiple times or on forking or so but still not as often as log records are to be formatted).

@@ -133,7 +133,9 @@ def _format_timestamp(self, time_):

# ----------------------------------------------------------------------
def _get_record_fields(self, record):
return {k: self._value_repr(v) for k, v in record.__dict__.items()}
return {k: self._value_repr(v)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
Since we removed now all unwanted fields from the record, we could skip trying to remove them again in _remove_excluded_fields by only passing extra_fields to the method.
Or maybe we incoporate the field removing code into _get_extra_fields as it remains the only place where the filtering is necessary yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought of that. However, subclasses may depend on _remove_excluded_fields to be called.

Another optimization would be to fold the _get_record_fields, _get_extra_fields and _remove_excluded_fields into one method: It iterates over record.__dict__ and builds a dict that can be message.updated in the format method. This would eliminate some iterations, checks and eventually del calls as well.
However, this would be a larger break in API and behaviour.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about API breakage.

Maybe it's worth for these changes to bump to 3.0 and announce the API changes, so people will notice.

oschulzSam added a commit to SamhammerAG/python-logstash-async that referenced this pull request Jul 15, 2021
gkreuzer pushed a commit to SamhammerAG/python-logstash-async that referenced this pull request Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants