Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics module to DeepVariant Runner #19

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ RUN curl -L -o /usr/bin/kubectl https://storage.googleapis.com/kubernetes-releas
ADD LICENSE /
ADD gcp_deepvariant_runner.py /opt/deepvariant_runner/src/
ADD gke_cluster.py /opt/deepvariant_runner/src/
ADD metrics.py /opt/deepvariant_runner/src/
ADD process_util.py /opt/deepvariant_runner/src/
ADD run_and_verify.sh /opt/deepvariant_runner/bin/
ADD cancel /opt/deepvariant_runner/bin/
Expand Down
185 changes: 185 additions & 0 deletions metrics.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,185 @@
# Copyright 2019 Google LLC.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from this
# software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
"""Used to collect anonymous DeepVariant metrics."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import atexit
import functools
import json
import logging
import time
import uuid

import requests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please keep it sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving requests before time causes a lint error. I think because requests is considered a 3rd party module.
I added a new line before requests to make it clear it's in a new 'list'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see! Yes, for 3rd party module we add them in a separate group. And a blank line between each group is the right way to go. Can you move from typing import Dict, Optional to the first group then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving that line to the first group caused a lint error, why do you think typing is not third party module?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting. We don't have typing in the setup.py file (in Variant Transforms), so it should be python build in. I don't know whether it is different for Python 3. But if the lint complains, let's follow that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Allie for your comments and attention to details.

from typing import Dict, Optional, Text

_CLEARCUT_ENDPOINT = 'https://play.googleapis.com/log'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional, but, in general, constants that are (1) only used once and (2) not expected to be tweaked could just be inlined.

So, for example, we are not likely to change anything in this list except maybe the request timeout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess another benefit of having all these constant values here is the ease of finding them instead of scattered at different parts of code.
So if you don't mind I am going to keep them as is.

_CLOUD_HCLS_OSS = 'CLOUD_HCLS_OSS'
_CONCORD = 'CONCORD'
_DEEP_VARIANT_RUN = 'DeepVariantRun'
_HTTP_REQUEST_TIMEOUT_SEC = 10
_PYTHON = 'PYTHON'
_VIRTUAL_HCLS_DEEPVARIANT = 'virtual.hcls.deepvariant'


def capture_exceptions(func):
"""Function decorator to capture and log any exceptions."""

@functools.wraps(func)
def wrapper(*args, **kwds):
try:
return func(*args, **kwds)
# pylint:disable=broad-except
except Exception as e:
logging.error('Exception captured in %s : %s', func.__name__, e)

return wrapper


class _ConcordEvent(object):
"""Encapsulates information representing a Concord event."""

def __init__(self,
event_name: Text,
event_type: Text,
project_number: int,
console_type: Text,
page_hostname: Text,
event_metadata: Optional[Dict[Text, Text]] = None) -> None:
self._event_name = event_name
self._event_type = event_type
self._project_number = project_number
self._console_type = console_type
self._page_hostname = page_hostname
self._event_metadata = event_metadata or {}

def to_json(self, **kwargs):
"""Encodes data in json."""
event_dict = {
'project_number': str(self._project_number),
'event_name': self._event_name,
'event_type': self._event_type,
'console_type': self._console_type,
'page_hostname': self._page_hostname,
'event_metadata': self._event_metadata_as_kv(),
}
return json.dumps(event_dict, **kwargs)

def _event_metadata_as_kv(self):
kv_list = []
for k, v in sorted(self._event_metadata.items()):
kv_list.append({'key': k, 'value': str(v)})

return kv_list


class _MetricsCollector(object):
"""A class that collects and submits metrics.

Instances of this class share the same internal state, and thus behave the
same all the time.
"""
_events = []
_session_identifier = uuid.uuid4().hex

def add_metrics(self, project_number: int,
metrics_name: Text, **metrics_kw: Text) -> None:
concord_event = _ConcordEvent(
event_name=metrics_name,
event_type=_DEEP_VARIANT_RUN,
project_number=project_number,
console_type=_CLOUD_HCLS_OSS,
page_hostname=_VIRTUAL_HCLS_DEEPVARIANT,
event_metadata={k: v for k, v in metrics_kw.items()})
self._events.append(concord_event)

def submit_metrics(self):
"""Submits all the collected metrics to Concord endpoint.

Raises:
HTTPError if http request doesn't succeed (status code != 200).
"""
request_data = json.dumps(self._clearcut_request(), sort_keys=True)
requests.post(
url=_CLEARCUT_ENDPOINT,
data=request_data,
headers=None,
timeout=_HTTP_REQUEST_TIMEOUT_SEC).raise_for_status()

def _clearcut_request(self):
# We dont have (or want to have) any cookies. So, using a random ID for
# zwieback_cookie is ok for tracking purposes.
return {
'client_info': {
'client_type': _PYTHON,
},
'log_source_name':
_CONCORD,
'zwieback_cookie':
self._session_identifier,
'request_time_ms':
_now_ms(),
'log_event': [{
'source_extension_json': e.to_json(sort_keys=True)
} for e in self._events]
}


def _now_ms():
"""Returns current time in milliseconds."""
return int(round(time.time() * 1000))


def add(project_number: int, metrics_name: Text, **metrics_kw: Text) -> None:
"""Adds the given metric to the metrics to be submitted to Concord.

Note: All metrics are submitted at exit.
Note: Do not rely on thread safety of this method.

Args:
project_number(int): GCP project number.
allieychen marked this conversation as resolved.
Show resolved Hide resolved
metrics_name(str): metrics name.
**metrics_kw: key-values of metrics. For example, for
metrics_name="MakeExamplesSuccess", metrics_kw can be
duration_seconds=1000, wait_duration_seconds=100.
"""
metrics_collector = _MetricsCollector()
metrics_collector.add_metrics(project_number, metrics_name, **metrics_kw)


# Exceptions are captured and logged to avoid crashing callers.
@capture_exceptions
@atexit.register
def shutdown():
"""Reports all metrics that were collected."""
metrics_collector = _MetricsCollector()
metrics_collector.submit_metrics()
Loading