-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
leader election framework for performance metric collection #589
leader election framework for performance metric collection #589
Conversation
b6df07d
to
f90dc54
Compare
Codecov Report
@@ Coverage Diff @@
## master #589 +/- ##
==========================================
- Coverage 72.10% 71.18% -0.92%
==========================================
Files 141 144 +3
Lines 12814 12794 -20
Branches 1535 1528 -7
==========================================
- Hits 9240 9108 -132
- Misses 3050 3167 +117
+ Partials 524 519 -5
|
678091c
to
cf34f14
Compare
|
||
scheduler_mgr = SchedulerManager() | ||
|
||
if plugin == "tooz": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plugin name can be a configurable name, so that it is easy to switch from one plugin type to other in future based on configuration..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its already a configurable parameter with name leader_election_plugin.
|
||
while not self._stop.is_set(): | ||
with timeutils.StopWatch() as w: | ||
LOG.info("sending heartbeats for leader election") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log can be DEBUG leval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logged as debug as suggested
# adjust time for leader | ||
has_to_sleep_for = has_to_sleep_for / 2 | ||
|
||
LOG.info('resting after leader watch as leader=%(leader)s ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log.debug ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logged as debug as suggested
self.group = None | ||
|
||
def start(self): | ||
"""Connect to coordination back end.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it switchable coordination back end based on config, current implimentaion can be only for redis with toox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created an issue to address the issue as future improvement #610
class LeaderElectionFactory: | ||
|
||
@staticmethod | ||
def construct_elector(plugin, leader_key=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we give a docstring here to describe what the function does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added docstring. Thanks
def set_leader_callback(self, *args, **kwargs): | ||
self.leader = True | ||
|
||
def cleanup(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we give a docstring here to describe what the function does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup is abstract method of LeaderElector
interface. The docstring is already specified there.
7994321
to
c012b7b
Compare
c012b7b
to
a1ea770
Compare
@@ -80,7 +82,7 @@ def wait_random(low, high): | |||
def _wait(f, *a, **k): | |||
rd = random.randint(0, 100) | |||
secs = low + (high - low) * rd / 100 | |||
time.sleep(secs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between time.sleep
and greenthread.sleep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
greenthread.sleep
Yield control to another eligible coroutine until at least seconds have elapsed but with time.sleep
call do not do same. It eventually keep multiple coroutine busy when we test performance metric collection with load test (register more than 100 storage).
delfin/coordination.py
Outdated
# check if group exist | ||
groups.index(group) | ||
except Exception: | ||
# create a group if not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put a debug message here? or for that sake there should be a debug message in any exception block, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added debug message. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment starts with uppercase letter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated comment as per suggestion
Changes looks good |
key = leader_election_key.encode('ascii') | ||
super(Elector, self).__init__(callbacks, key) | ||
|
||
self._coordinator = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason, why are we using _ before some variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python takes "_"ed variable as private variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the intention. But with single underscore(_), privatization can not be supported in python. If really you want, then either __(double undrscore) or getter(), setter() way required.
Anyway, otherthings looks good to me. Thanks
45d0afc
to
d5bf9ad
Compare
d5bf9ad
to
e9372ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
842b499
38f5aa5
to
afc6212
Compare
@@ -521,7 +523,7 @@ def get_capabilities(context): | |||
"description": "Average time taken for an IO " | |||
"operation in ms" | |||
}, | |||
"requests": { | |||
"iops": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes are merged using #609 you may rebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased. Thanks
afc6212
to
90edc9a
Compare
@ThisIsClark Please take a look. |
|
||
self.group = group | ||
|
||
def join_group(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw there is create group and join group, do we need remove a node from a group when a node is down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we need not remove node for leader election once node is down. But for cleanup reasons we need to remove the node from the group.
Its done in code
delfin/leader_election/interface.py
Outdated
# Copyright 2010 United States Government as represented by the | ||
# Administrator of the National Aeronautics and Space Administration. | ||
# Copyright 2011 Justin Santa Barbara |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add copyright of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copyright string is copied from task cmd
I hope i got your question right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got that, but task cmd has some other reason to have that copyright, so for these files in your PR, you could refer to ssl_utils.py
to set the copyright
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated copyright string as per suggestion. Thanks
@ThisIsClark Addressed your comments. PTAL. |
90edc9a
to
e2bfef3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please follow the python comment style
delfin/coordination.py
Outdated
self.started = True | ||
|
||
def ensure_group(self, group): | ||
# request groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated as per comment
groups = req.get() | ||
try: | ||
# check if group exist | ||
groups.index(group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment starts with uppercase letter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated as per comment
delfin/coordination.py
Outdated
# check if group exist | ||
groups.index(group) | ||
except Exception: | ||
# create a group if not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the comment
delfin/coordination.py
Outdated
# check if group exist | ||
groups.index(group) | ||
except Exception: | ||
# create a group if not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment starts with uppercase letter
delfin/coordination.py
Outdated
self.group = group | ||
|
||
def join_group(self): | ||
# Join a group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remvoe comments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment removed. Thanks
self._stop.clear() | ||
|
||
self._coordinator = LeaderElectionCoordinator() | ||
# start leader coordinator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment here
self._coordinator.start() | ||
|
||
self._coordinator.ensure_group(self.election_key) | ||
# join coordinator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment removed
# join coordinator | ||
self._coordinator.join_group() | ||
|
||
# register callback for elected leader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment removed
callbacks.on_started_leading) | ||
|
||
# register internal callback to notify being a leader | ||
self._coordinator. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comments here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is special case for internal callback to notify current coordinator is selected as leader.
IMO the comment is required for maintainability.
" happen (which will not end well).", ran_for, | ||
ran_for - wait_until_next_beat) | ||
|
||
# check if coordinator is still a leader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment starts with uppercase letter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment
9804a91
3155865
to
9804a91
Compare
9804a91
to
e0ec7d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it:
The PR introduce leased based leader election for metric job scheduling in distributed performance metric collection framework
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #fixes #560
Special notes for your reviewer:
Release note: