Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teuthology/scrape.py: Remove empty string in _get_service_types #1889

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion teuthology/scrape.py
Original file line number Diff line number Diff line change
Expand Up @@ -410,7 +410,10 @@ def _get_service_types(self, job):
result = defaultdict(list)
# Lines like:
# 2014-08-22T20:07:18.668 ERROR:tasks.ceph:saw valgrind issue <kind>Leak_DefinitelyLost</kind> in /var/log/ceph/valgrind/osd.3.log.gz
for line in grep(os.path.join(job.path, "teuthology.log"), "</kind> in "):
valgrind_err_line = grep(os.path.join(job.path, "teuthology.log"), "</kind> in ")
# removes blank space, empty string and None
valgrind_err_line = [line for line in valgrind_err_line if line and line.strip()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teuthology logs are usually huge, and grep function is already returns list of lines, using of list comprehension is quite expensive here, maybe it is better to use a generator expression?
but I've got more general question, is there any sense to call grep in Call out to native grep rather than feeding massive log files through python line by line, are python regex are that slow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my benchmarks, using a list comprehension is ~2x as fast as grep for a statement like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My points were:

  1. use generator expression instead of list comprehension here.
  2. remove scrape.grep methods and replace with pure python equivalent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's more clear. I don't disagree, but I think replacing grep() can be considered a separate issue from this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's more clear. I don't disagree, but I think replacing grep() can be considered a separate issue from this PR.

There are just two places where grep is used, this place and another one with similar logic. At least with this PR in this place just drop grep usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or, do you want to merge this PR as is?

for line in valgrind_err_line:
match = re.search("<kind>(.+)</kind> in .+/(.+)", line)
if not match:
log.warning("Misunderstood line: {0}".format(line))
Expand Down