Commit

Merge remote-tracking branch 'refs/remotes/origin/hide-members-records' into hide-members-records
panh99 committed Jun 22, 2024
2 parents d24e4e0 + fb513ff commit d61da15
Showing 22 changed files with 1,164 additions and 50 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/datasets.yml
@@ -16,6 +16,9 @@ concurrency:
  group: ${{ github.workflow }}-${{ github.ref == 'refs/heads/main' && github.run_id || github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  FLWR_TELEMETRY_ENABLED: 0

defaults:
  run:
    working-directory: datasets
2 changes: 1 addition & 1 deletion README.md
@@ -153,7 +153,7 @@ Other [examples](https://github.com/adap/flower/tree/main/examples):
- [Flower with KaplanMeierFitter from the lifelines library](https://github.com/adap/flower/tree/main/examples/federated-kaplan-meier-fitter)
- [Sample Level Privacy with Opacus](https://github.com/adap/flower/tree/main/examples/opacus)
- [Sample Level Privacy with TensorFlow-Privacy](https://github.com/adap/flower/tree/main/examples/tensorflow-privacy)
- [Flower with a Tabular Dataset] (https://github.com/adap/flower/tree/main/examples/fl-tabular)
- [Flower with a Tabular Dataset](https://github.com/adap/flower/tree/main/examples/fl-tabular)

## Community

4 changes: 4 additions & 0 deletions datasets/doc/source/index.rst
@@ -47,7 +47,11 @@ Information-oriented API reference and other reference material.

   flwr_datasets

.. toctree::
   :maxdepth: 1
   :caption: Reference docs

   ref-telemetry

Main features
-------------
66 changes: 66 additions & 0 deletions datasets/doc/source/ref-telemetry.md
@@ -0,0 +1,66 @@
# Telemetry

The Flower Datasets open-source project collects **anonymous** usage metrics to make well-informed decisions to improve Flower Datasets. Doing this enables the Flower team to understand how Flower Datasets is used and what challenges users might face.

**Flower is a friendly framework for collaborative AI and data science.** Staying true to this statement, Flower makes it easy to disable telemetry for users that do not want to share anonymous usage metrics.

## Principles

We follow strong principles guarding anonymous usage metrics collection:

- **Optional:** You will always be able to disable telemetry; read on to learn “[How to opt-out](#how-to-opt-out)”.
- **Anonymous:** The reported usage metrics are anonymous and do not contain any personally identifiable information (PII). See “[Collected metrics](#collected-metrics)” to understand what metrics are being reported.
- **Transparent:** You can easily inspect what anonymous metrics are being reported; see the section “[How to inspect what is being reported](#how-to-inspect-what-is-being-reported)”.
- **Open for feedback:** You can always reach out to us if you have feedback; see the section “[How to contact us](#how-to-contact-us)” for details.

## How to opt-out

When Flower Datasets starts, it will check for an environment variable called `FLWR_TELEMETRY_ENABLED`. Telemetry can easily be disabled by setting `FLWR_TELEMETRY_ENABLED=0`. Assuming you are using Flower Datasets in a Flower server or client, simply do so by prepending your command as in:

```bash
FLWR_TELEMETRY_ENABLED=0 python server.py # or client.py
```

Alternatively, you can export `FLWR_TELEMETRY_ENABLED=0` in, for example, `.bashrc` (or whatever configuration file applies to your environment) to disable Flower Datasets telemetry permanently.
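
The same opt-out can also be applied from inside a Python script. The sketch below assumes, as in the `telemetry.py` module added by this commit, that the variable is read once when the telemetry module is first imported, so it has to be set before `flwr_datasets` is imported. The helper `telemetry_enabled` is hypothetical and only mirrors that check; it is not part of Flower Datasets.

```python
import os

# Set the variable before importing flwr_datasets: the telemetry module
# reads it once, at import time.
os.environ["FLWR_TELEMETRY_ENABLED"] = "0"


def telemetry_enabled() -> bool:
    """Hypothetical helper: only the exact string "1" enables sending."""
    return os.getenv("FLWR_TELEMETRY_ENABLED", "1") == "1"


print(telemetry_enabled())  # prints False
```

Because the comparison is against the string `"1"`, any other value (not only `0`) would disable sending under this reading of the code.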

## Collected metrics

Flower telemetry collects the following metrics:

**Flower version.** Understand which versions of Flower Datasets are currently being used. This helps us to decide whether we should invest effort into releasing a patch version for an older version of Flower Datasets or instead use the bandwidth to build new features.

**Operating system.** Enables us to answer questions such as: *Should we create more guides for Linux, macOS, or Windows?*

**Python version.** Knowing the Python version helps us, for example, to decide whether we should invest effort into supporting old versions of Python or stop supporting them and start taking advantage of new Python features.

**Hardware properties.** Understanding the hardware environment that Flower Datasets is being used in helps to decide whether we should, for example, put more effort into supporting low-resource environments.

**Dataset and Partitioner names.** Knowing which datasets and Partitioners are used enables us to provide more detailed code examples and tutorials and better prioritize work on development and support for them.

**Cluster.** Flower telemetry assigns a random in-memory cluster ID each time a Flower workload starts. This allows us to understand which device types not only start Flower workloads but also successfully complete them.

**Source.** Flower telemetry tries to store a random source ID in `~/.flwr/source` the first time a telemetry event is generated. The source ID is important to identify whether an issue is recurring or whether an issue is triggered by multiple clusters running concurrently (which often happens in simulation). For example, if a device runs multiple workloads at the same time, and this results in an issue, then, in order to reproduce the issue, multiple workloads must be started at the same time.

You may delete the source ID at any time. If you wish for all events logged under a specific source ID to be deleted, you can send a deletion request mentioning the source ID to `[email protected]`. All events related to that source ID will then be permanently deleted.
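
As a minimal sketch of how the local source ID could be inspected or removed, the snippet below works on the `~/.flwr/source` path described above. The function names `read_source_id` and `delete_source_id` are hypothetical helpers written for illustration; they are not part of Flower Datasets, and the optional `home` parameter exists only so the functions can be tried against a scratch directory.

```python
import uuid
from pathlib import Path
from typing import Optional


def read_source_id(home: Optional[Path] = None) -> Optional[str]:
    """Return the stored source ID, or None if it is missing or not a UUID."""
    source_file = (home or Path.home()) / ".flwr" / "source"
    if not source_file.exists():
        return None
    text = source_file.read_text(encoding="utf-8").strip()
    try:
        uuid.UUID(text)  # validate the format only
    except ValueError:
        return None
    return text


def delete_source_id(home: Optional[Path] = None) -> bool:
    """Delete the source ID file; return True if a file was removed."""
    source_file = (home or Path.home()) / ".flwr" / "source"
    if source_file.exists():
        source_file.unlink()
        return True
    return False
```

Deleting the file only removes the local ID. Events already logged under that ID would still need the deletion request described above.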

We will not collect any personally identifiable information. If you think any of the metrics collected could be misused in any way, please [get in touch with us](#how-to-contact-us). We will update this page to reflect any changes to the metrics collected and publish changes in the changelog.

If you think other metrics would be helpful for us to better guide our decisions, please let us know! We will carefully review them; if we are confident that they do not compromise user privacy, we may add them.

## How to inspect what is being reported

We wanted to make it very easy for you to inspect what anonymous usage metrics are reported. You can view all the reported telemetry information by setting the environment variable `FLWR_TELEMETRY_LOGGING=1`. Logging is disabled by default. You may use logging independently from `FLWR_TELEMETRY_ENABLED` so that you can inspect the telemetry feature without sending any metrics.

```bash
FLWR_TELEMETRY_LOGGING=1 python server.py # or client.py
```

To inspect Flower telemetry without sending any anonymous usage metrics, use both environment variables:

```bash
FLWR_TELEMETRY_ENABLED=0 FLWR_TELEMETRY_LOGGING=1 python server.py # or client.py
```
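
To get a feel for the kind of values that appear in the logged payload, the sketch below assembles a small subset of the hardware and platform fields locally, using only the Python standard library. `preview_context` is a hypothetical helper, not part of Flower Datasets. It sends nothing, and it does not reproduce the full payload.

```python
import json
import os
import platform
from typing import Any, Dict


def preview_context() -> Dict[str, Any]:
    """Hypothetical helper: build a subset of the reported context locally."""
    return {
        "hw": {"cpu_count": os.cpu_count()},
        "platform": {
            "system": platform.system(),
            "python_implementation": platform.python_implementation(),
            "python_version": platform.python_version(),
            "machine": platform.machine(),
        },
    }


# Pretty-print the values for this machine; nothing leaves the process.
print(json.dumps(preview_context(), indent=2))
```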

## How to contact us

We want to hear from you. If you have any feedback or ideas on how to improve the way we handle anonymous usage metrics, reach out to us via [Slack](https://flower.ai/join-slack/) (channel `#telemetry`) or email (`[email protected]`).
9 changes: 9 additions & 0 deletions datasets/flwr_datasets/common/__init__.py
@@ -13,3 +13,12 @@
# limitations under the License.
# ==============================================================================
"""Common components in Flower Datasets."""


from .telemetry import EventType as EventType
from .telemetry import event as event

__all__ = [
    "EventType",
    "event",
]
224 changes: 224 additions & 0 deletions datasets/flwr_datasets/common/telemetry.py
@@ -0,0 +1,224 @@
# Copyright 2023 Flower Labs GmbH. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Flower telemetry."""


import datetime
import json
import logging
import os
import platform
import urllib.request
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from enum import Enum, auto
from pathlib import Path
from typing import Any, Dict, List, Optional, Union, cast

from flwr_datasets.common.version import package_name, package_version

FLWR_TELEMETRY_ENABLED = os.getenv("FLWR_TELEMETRY_ENABLED", "1")
FLWR_TELEMETRY_LOGGING = os.getenv("FLWR_TELEMETRY_LOGGING", "0")

TELEMETRY_EVENTS_URL = "https://telemetry.flower.ai/api/v1/event"

LOGGER_NAME = "flwr-datasets-telemetry"
LOGGER_LEVEL = logging.DEBUG


def _configure_logger(log_level: int) -> None:
    console_handler = logging.StreamHandler()
    console_handler.setLevel(log_level)
    console_handler.setFormatter(
        logging.Formatter(
            "%(levelname)s %(name)s %(asctime)s | %(filename)s:%(lineno)d | %(message)s"
        )
    )

    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(log_level)
    logger.addHandler(console_handler)


_configure_logger(LOGGER_LEVEL)


def log(msg: Union[str, Exception]) -> None:
    """Log message using logger at DEBUG level."""
    logging.getLogger(LOGGER_NAME).log(LOGGER_LEVEL, msg)


def _get_home() -> Path:
    return Path().home()


def _get_source_id() -> str:
    """Get existing or new source ID."""
    source_id = "unavailable"
    # Check if .flwr in home exists
    try:
        home = _get_home()
    except RuntimeError:
        # If the home directory can’t be resolved, RuntimeError is raised.
        return source_id

    flwr_dir = home.joinpath(".flwr")
    # Create .flwr directory if it does not exist yet.
    try:
        flwr_dir.mkdir(parents=True, exist_ok=True)
    except PermissionError:
        return source_id

    source_file = flwr_dir.joinpath("source")

    # If no source_file exists create one and write it
    if not source_file.exists():
        try:
            source_file.touch(exist_ok=True)
            source_file.write_text(str(uuid.uuid4()), encoding="utf-8")
        except PermissionError:
            return source_id

    source_id = source_file.read_text(encoding="utf-8").strip()

    try:
        uuid.UUID(source_id)
    except ValueError:
        source_id = "invalid"

    return source_id


# Using str as first base type to make it JSON serializable as
# otherwise the following exception will be thrown when serializing
# the event dict:
# TypeError: Object of type EventType is not JSON serializable
class EventType(str, Enum):
    """Types of telemetry events."""

    # This method combined with auto() will set the property value to
    # the property name e.g.
    # `START_CLIENT = auto()` becomes `START_CLIENT = "START_CLIENT"`
    # The type signature is not compatible with mypy, pylint and flake8
    # so each of those needs to be disabled for this line.
    # pylint: disable-next=no-self-argument,arguments-differ,line-too-long
    def _generate_next_value_(name: str, start: int, count: int, last_values: List[Any]) -> Any:  # type: ignore # noqa: E501
        return name

    PING = auto()

    LOAD_PARTITION_CALLED = auto()
    LOAD_SPLIT_CALLED = auto()
    PLOT_LABEL_DISTRIBUTION_CALLED = auto()
    PLOT_COMPARISON_LABEL_DISTRIBUTION_CALLED = auto()


# Use the ThreadPoolExecutor with max_workers=1 to have a queue
# and also ensure that telemetry calls are not blocking.
state: Dict[str, Union[Optional[str], Optional[ThreadPoolExecutor]]] = {
    # Will be assigned ThreadPoolExecutor(max_workers=1)
    # in event() the first time it's required
    "executor": None,
    "source": None,
    "cluster": None,
}


# In Python 3.7 pylint will throw an error stating that
# "Value 'Future' is unsubscriptable".
# This pylint disable line can be removed when dropping support
# for Python 3.7
# pylint: disable-next=unsubscriptable-object
def event(
    event_type: EventType,
    event_details: Optional[Dict[str, Any]] = None,
) -> Future:  # type: ignore
    """Submit create_event to ThreadPoolExecutor to avoid blocking."""
    if state["executor"] is None:
        state["executor"] = ThreadPoolExecutor(max_workers=1)

    executor: ThreadPoolExecutor = cast(ThreadPoolExecutor, state["executor"])

    result = executor.submit(create_event, event_type, event_details)
    return result


def create_event(event_type: EventType, event_details: Optional[Dict[str, Any]]) -> str:
    """Create telemetry event."""
    if state["source"] is None:
        state["source"] = _get_source_id()

    if state["cluster"] is None:
        state["cluster"] = str(uuid.uuid4())

    if event_details is None:
        event_details = {}

    date = datetime.datetime.now(tz=datetime.timezone.utc).isoformat()
    context = {
        "source": state["source"],
        "cluster": state["cluster"],
        "date": date,
        "package": {
            "package_name": package_name,
            "package_version": package_version,
        },
        "hw": {
            "cpu_count": os.cpu_count(),
        },
        "platform": {
            "system": platform.system(),
            "release": platform.release(),
            "platform": platform.platform(),
            "python_implementation": platform.python_implementation(),
            "python_version": platform.python_version(),
            "machine": platform.machine(),
            "architecture": platform.architecture(),
            "version": platform.uname().version,
        },
    }
    payload = {
        "event_type": event_type,
        "event_details": event_details,
        "context": context,
    }
    payload_json = json.dumps(payload)
    if FLWR_TELEMETRY_LOGGING == "1":
        log(" - ".join([date, "POST", payload_json]))

    # If telemetry is not disabled with setting FLWR_TELEMETRY_ENABLED=0
    # create a request and send it to the telemetry backend
    if FLWR_TELEMETRY_ENABLED == "1":
        request = urllib.request.Request(
            url=TELEMETRY_EVENTS_URL,
            data=payload_json.encode("utf-8"),
            headers={
                "User-Agent": f"{package_name}/{package_version}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                result = response.read()

            response_json: str = result.decode("utf-8")

            return response_json
        except urllib.error.URLError as ex:
            if FLWR_TELEMETRY_LOGGING == "1":
                log(ex)

    return "disabled"
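
The non-blocking submission pattern used by `event()` can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the module's code: `slow_send` is a hypothetical stand-in for the HTTP request, and the event names are invented for the example. It shows that a `ThreadPoolExecutor` with `max_workers=1` returns a `Future` immediately and processes submissions one at a time, in submission order.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List

# A single worker gives a FIFO queue: callers never wait on the work itself.
executor = ThreadPoolExecutor(max_workers=1)
sent: List[str] = []


def slow_send(name: str) -> str:
    """Hypothetical stand-in for a slow network call."""
    time.sleep(0.05)
    sent.append(name)
    return name


# submit() returns a Future right away; the work happens on the worker thread.
futures: List[Future] = [
    executor.submit(slow_send, name) for name in ["first", "second", "third"]
]

# Block only when (and if) the result is actually needed.
results = [future.result() for future in futures]
executor.shutdown()

print(results)  # prints ['first', 'second', 'third']
```

A caller that ignores the returned `Future` never blocks on the request, at the cost of not learning whether it succeeded.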