blink_features.usage with ranks and partitioned #39

Merged
merged 10 commits into from
Feb 25, 2025

max-ostapenko
Contributor

@max-ostapenko max-ostapenko commented Dec 21, 2024

  1. Added rank column.
    Resolves blink_features.usage has null rank column #25

  2. Removed the blink_features.features table, as it duplicates the pages.features column.
    Usage data is now loaded directly into the partitioned blink_features.usage table.

Table migration script:

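-- 1. Create the new table, partitioned by date and clustered by client, rank, and feature.
CREATE TABLE `blink_features.usage_partitioned` (
  date DATE,
  client STRING,
  rank INT64,
  id STRING,
  feature STRING,
  type STRING,
  num_urls INT64,
  total_urls INT64,
  pct_urls FLOAT64,
  sample_urls ARRAY<STRING>
)
PARTITION BY date
CLUSTER BY client, rank, feature;

-- 2. Backfill from the existing table. The old rows have no rank, so it is set to NULL.
INSERT INTO `blink_features.usage_partitioned`
SELECT
  PARSE_DATE('%Y%m%d', yyyymmdd) AS date,
  client,
  NULL AS rank,
  id,
  feature,
  type,
  num_urls,
  total_urls,
  pct_urls,
  sample_urls
FROM `blink_features.usage`;

-- 3. Keep the old table as a backup.
ALTER TABLE `blink_features.usage` RENAME TO `blink_features.usage_backup`;

-- 4. Recreate blink_features.usage as a copy of the partitioned table.
CREATE TABLE `blink_features.usage`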
COPY `blink_features.usage_partitioned`;
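
Once the copy is in place, queries can filter on the partition column so that only the matching partitions are scanned. A minimal sketch, assuming a `2025-02-01` crawl date and a `mobile` client value (both are illustrative and not taken from this PR):

```sql
-- Top features for one crawl date and client.
-- The date filter prunes partitions; client is the first clustering column.
-- Rows backfilled by the migration script have rank = NULL, so this
-- example does not filter on rank.
SELECT
  feature,
  num_urls,
  total_urls,
  pct_urls
FROM `blink_features.usage`
WHERE date = DATE '2025-02-01'  -- assumed crawl date
  AND client = 'mobile'         -- assumed client value
ORDER BY pct_urls DESC
LIMIT 10;
```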

@max-ostapenko
Contributor Author

@tunetheweb do you have edit access to the Looker report using this data?

@max-ostapenko max-ostapenko marked this pull request as draft December 21, 2024 16:42
@max-ostapenko max-ostapenko marked this pull request as ready for review December 21, 2024 17:05
@max-ostapenko max-ostapenko changed the title from "blink_features.usage partitioned" to "blink_features.usage with ranks and partitioned" Dec 21, 2024
@tunetheweb
Member

@tunetheweb do you have edit access to the Looker report using this data?

Yes I do. You can see it here if you wanna clone it to try it against any changed data source.

@max-ostapenko
Contributor Author

@tunetheweb do you have edit access to the Looker report using this data?
Yes I do. You can see it here if you wanna clone it to try it against any changed data source.

The data sources are not accessible, so I can't clone the report.

Do you want to migrate the table and adjust the report yourself? Otherwise I can do it if I get edit permission for the data sources and the dashboard.

@max-ostapenko
Contributor Author

@tunetheweb, need your assistance to continue.
How do we update the dashboard?

@tunetheweb
Member

Yeah, it's on my todo list. Trying to get the reports finished first, and then this is the next thing.

@tunetheweb
Member

This works, but I wasted a lot of time figuring out why it wouldn't import into Looker Studio. It was due to the Partition filter: Required setting. You can't set this on a Looker Studio connection, so I had to switch to Custom SQL rather than a direct table connection.

I think the Partition filter: Required setting makes sense for the larger tables, where we want to avoid users running up large bills, but I don't think it's necessary here.

Can we recreate the table without that setting?
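
For reference, BigQuery lets you change this option on an existing table, which may avoid a full recreate. A minimal sketch, assuming the setting is currently enabled on `blink_features.usage_partitioned` (the table name used in the migration script):

```sql
-- Stop requiring a partition filter on queries against this table.
-- Assumes the option was enabled when the table was created.
ALTER TABLE `blink_features.usage_partitioned`
SET OPTIONS (require_partition_filter = false);
```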

@max-ostapenko
Contributor Author

max-ostapenko commented Feb 14, 2025

Updated the code.
But to clarify, I thought we'd continue using httparchive.blink_features.usage, just with the updated schema.

Let me know when I can replace it, or run a replacement script yourself:

ALTER TABLE `blink_features.usage` RENAME TO `blink_features.usage_backup`;

CREATE TABLE `blink_features.usage`
COPY `blink_features.usage_partitioned`;

@tunetheweb
Member

But to clarify, I thought we'd continue using httparchive.blink_features.usage, just with the updated schema.

Agreed.

Let me know when I can replace it, or run a replacement script yourself:

Will run this when I'm back, rather than change something and then leave for the holidays! Let's leave this PR open as a reminder and merge after the crawl is finished and I'm back.

@tunetheweb
Member

OK the chromestatus charts have been migrated over. Guess we can merge this then?

@max-ostapenko
Contributor Author

@tunetheweb Thanks for recreating the table.
BTW I noticed it has 7k fewer rows than the backup. Is that expected?

@max-ostapenko max-ostapenko merged commit 42208c4 into main Feb 25, 2025
19 checks passed
@max-ostapenko max-ostapenko deleted the partition_usage branch February 25, 2025 20:56
@tunetheweb
Member

BTW I noticed it has 7k fewer rows than the backup. Is that expected?

Oh, good spot! That was the Feb 2025 data. I've copied it across now and the row counts match.
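
A per-date comparison is one way to locate this kind of gap. A minimal sketch, assuming the backup still has the original `yyyymmdd` string column read by the migration script and that the new table does not require a partition filter:

```sql
-- List the dates where the backup and the new table disagree on row count.
SELECT
  COALESCE(b.date, n.date) AS date,
  b.row_count AS backup_rows,
  n.row_count AS new_rows
FROM (
  -- Backup table: derive the date from the yyyymmdd string column.
  SELECT PARSE_DATE('%Y%m%d', yyyymmdd) AS date, COUNT(*) AS row_count
  FROM `blink_features.usage_backup`
  GROUP BY 1
) AS b
FULL OUTER JOIN (
  -- New table: already has a DATE column.
  SELECT date, COUNT(*) AS row_count
  FROM `blink_features.usage`
  GROUP BY 1
) AS n
  ON b.date = n.date
WHERE IFNULL(b.row_count, 0) != IFNULL(n.row_count, 0)
ORDER BY 1;
```

An empty result means every date has the same number of rows in both tables.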
