Should the system leave out the great_expectations anonymous_usage_statistics identifiers? #34

Open
MattTriano opened this issue Jan 2, 2023 · 1 comment

@MattTriano
Owner

Per this GE docs page, the great_expectations team added a bit of code that lets them track usage of their library, and this tracking can be disabled in the great_expectations.yml file. That page says there's more information in a blog post from 2020, but the given link is dead. Still, per the Wayback Machine copy of that post, the GE team states:

"We do not track credentials, validation results, or arguments passed to Expectations. We consider these private, and frankly none of our business. User-created names are always hashed, to create a longitudinal record without leaking any private information. We track types of Expectations, to understand which are most useful to the community."

This is very reasonable, and I'm keen to provide the GE team with information that helps them figure out which features are worth working on. However, my project is intended to be both a specific project and a platform that other people can fork to build their own pipelines (although the traffic page shows people are mainly cloning the repo without forking). Every copy that keeps the committed UUID would report under the same identifier and pollute the longitudinal record, so I don't know whether I should strip out the UUID.

So I should experiment with stripping out this UUID (both in /great_expectations/expectations/.ge_store_backend_id and .../great_expectations.yml files; per grep, all other appearances of the UUID are in the /.uncommitted/ dir) and see if anything complains when I run checkpoints.
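For reference, the grep step could be sketched in Python roughly as follows. This is only an illustration: the function name `find_uuid_occurrences` and the `skip_dirs` parameter are made up for this sketch, and it assumes the identifier is written in the usual 8-4-4-4-12 hex form in UTF-8 text files.

```python
import re
from pathlib import Path

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout (an assumption
# about how the identifier is written in the project files).
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_uuid_occurrences(root, skip_dirs=(".git",)):
    """Map each UUID-shaped string under `root` to the files that contain it.

    `skip_dirs` is a hypothetical knob: any path with one of these directory
    names among its components is ignored.
    """
    root = Path(root)
    hits = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or any(part in skip_dirs for part in rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        # Record each distinct UUID once per file, normalized to lowercase.
        for found in sorted({m.lower() for m in UUID_RE.findall(text)}):
            hits.setdefault(found, []).append(rel.as_posix())
    return hits
```

Calling `find_uuid_occurrences("great_expectations")` before and after stripping the UUID would show which files still carry it, and adding the uncommitted directory's name to `skip_dirs` limits the report to files that are actually committed.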

@MattTriano
Owner Author

After watching this video from one of the leading Rust evangelists about a Go improvement plan to use this kind of anonymous telemetry, I think it's unambiguously good to provide this kind of telemetry info back to the great_expectations maintainers. But I also don't want to send them polluted signals, which is what would happen if potentially many different systems sent back telemetry info with the same not-really-unique UUID (IIRC, Universally Unique IDentifier).

As those goals are in conflict, I'll have to weigh what I think is better (sending possibly diluted/polluted feedback or sending no feedback).
