Querying ClickHouse database for fun and profit
ClickHouse is an open-source, column-oriented relational database. The PyTorch Dev Infra team uses it to store all open source data from the PyTorch org, including GitHub events, test stats, benchmark results, and more. The database is hosted on ClickHouse Cloud, which is also where you log in and start querying the data.
Skip this part if you already have access to the PyTorch Dev Infra ClickHouse cluster at https://console.clickhouse.cloud/
For metamates, go to https://console.clickhouse.cloud/ and log in with your Meta email. The portal uses SSO, so you just need to follow the steps in your browser to request access. We grant read-only access by default.
Note that the permission takes from half an hour to an hour to propagate, so feel free to grab a coffee in the meantime.
The list of all databases and tables on ClickHouse is at https://github.com/pytorch/test-infra/wiki/Available-databases-on-ClickHouse. If you need something that isn't there, take a look at https://github.com/pytorch/test-infra/wiki/How-to-add-a-new-custom-table-on-ClickHouse and reach out to us (POCs: @clee2000, @huydhn) to chat about your new use case.
After logging in, you should see the Dev Infra ClickHouse service listed there. Open it.
The SQL console and the two databases, default and benchmark, are probably where you want to start. Select a database to start writing SQL queries against it.
With read-only permission, you can only run SELECT queries. If you want to experiment with write queries such as CREATE TABLE or INSERT INTO, switch to the fortesting database, which grants write permission to everyone.
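The same read-only rule applies outside the web console. As a minimal sketch, assuming you have a username and password for the service (the SSO login above covers the console; programmatic credentials are an assumption here) and using a placeholder hostname, the Python below builds a request for ClickHouse's HTTP interface with only the standard library, and refuses write statements outside fortesting before anything is sent. The helper names is_read_only and build_query_request, and the SELECT/WITH prefix check, are hypothetical and only approximate what the server actually enforces.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder host (assumption): replace with your service's hostname from the console.
HOST = "your-service.clickhouse.cloud"
# ClickHouse Cloud typically serves its HTTPS interface on port 8443.
PORT = 8443

# Only fortesting grants write permission to everyone; other databases are read-only.
WRITABLE_DATABASES = {"fortesting"}
# Hypothetical, rough check: statements starting with these keywords are treated as reads.
READ_ONLY_PREFIXES = ("SELECT", "WITH")


def is_read_only(sql: str) -> bool:
    """Rough client-side guess at whether a statement is a read (SELECT) query."""
    return sql.lstrip().upper().startswith(READ_ONLY_PREFIXES)


def build_query_request(sql: str, database: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the ClickHouse HTTP interface."""
    if database not in WRITABLE_DATABASES and not is_read_only(sql):
        raise ValueError(
            f"write queries are only allowed in {sorted(WRITABLE_DATABASES)}, not {database!r}"
        )
    # `database` and `default_format` are passed as URL parameters.
    params = urllib.parse.urlencode({"database": database, "default_format": "JSONEachRow"})
    url = f"https://{HOST}:{PORT}/?{params}"
    # HTTP basic auth with the (assumed) database credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=sql.encode("utf-8"),  # the SQL statement goes in the POST body
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request("SELECT 1", "default", "me", "secret")
    print(req.get_method(), req.full_url)
    # To actually run it: urllib.request.urlopen(req).read().decode()
```

Passing the request to urllib.request.urlopen sends it; with default_format set to JSONEachRow, each result row comes back as one JSON object per line. The client-side check is only a convenience for catching mistakes early. The server-side permissions are what actually block writes outside fortesting.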
Here is a fun example query to see how long you wait for signals on your PR: