Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to filter by date in users analysis table #1010

Merged
merged 14 commits into from
Mar 21, 2024

Conversation

benzuzu
Copy link
Member

@benzuzu benzuzu commented Aug 2, 2023

We can now select time range in the users tab of the usage analysis page. Most importantly, this allows us to see recent usage, rather than just cumulative.

This was done by fetching dayUsage and monthUsage from the backend. It does take a bit longer, but the difference is not actually too significant. The longest part is actually fetching the files themselves from S3, not parsing through. Some JHipster analysis may be good.

this fixes https://github.com/oncokb/oncokb-pipeline/issues/169

Some other small changes:

  • "Top Users" and "All Users" filter is removed. This doesn't change render/fetch time, so there is really not much purpose.
  • Most used endpoint also includes proportion of all usage.
  • Default sort is reset when returning to a specific time range

Some ideas for future:

  • The UsageText component we've made has the limits hardcoded. I think it could be good to accept a string for type of usage (daily, monthly, all-time), and adjust the color limits accordingly (1 mil in 1 day definitely more concerning than 1 mil all time).
  • Month usage can be slightly unclear, especially at the beginning of months. The usage is very small as it is only from a couple of days, but we don't display this context anywhere (e.g., all months just have format "2023-08"). Also, the s3 summary files are at least 1-2 days behind based on our cronjob (potentially more if cronjob fails), so using the current date is not always accurate. It may be nice to add an (x days) text next to months that are not fully analyzed, or maybe even all months.
  • JHipster runtime analysis to potentially speed up the fetching/rendering.

@benzuzu
Copy link
Member Author

benzuzu commented Aug 2, 2023

Regarding clearing up the confusion on S3 summary files and our backend methods:

S3 currently stores 4 different types of files:

Monthly Usage
Yearly Usage Monthly Resource Yearly Resource
Fmt <user>:
  month:
   <resource>: <amt>
   <resource>: <amt>
  day:
   <day>:
    <resource>: <amt>
<user>: ...
<user>:
  year:
   <resource>: <amt>
   <resource>: <amt>
  month:
   <month>:
    <resource>: <amt>
<user>: ...
month:
  <resource>: <amt>
  <resource>: <amt>
day:
  <day>:
   <resource>: <amt>
  <day>:...
year:
  <resource>: <amt>
  <resource>: <amt>
month:
  <month>:
   <resource>: <amt>
  <month>:...
Info This pretty much contains everything (specific resource usage, grouped by day). Can manipulate to create all other files. We now use this for all user-based tables. More general version of Monthly Usage. Used for summarizing cumulative usage, as the year data is very convenient for that, specifically in the all time usage tables. Contains usage amounts for each resource (grouped by month and day), but no user data. Currently not used at all. Can consider removing this. Contains usage amounts for each resource (grouped by year and month), but no user data. Currently used for resource pages.

So, I think the somewhat significant downfall is that our only useful day-grouped data is in the Monthly Usage file. Consequently, regardless of our use case, if we want to get user-specific, day-grouped data we must parse through the massive monthly usage files (12 if we want past year). However, it seems we haven't reached too big of a user base, so the fetching of the file from S3 still contributes more to the processing time than the backend parsing (but not by much, both are nontrivial).

Definitely something to look into regardless, maybe different formatting of our summary files to have more versatility and efficiency. Specifically, maybe we can look at a <user>: <amt> format instead of a <resource>: <amt> format for the monthly resource. This has much more utility as we can group resource usage by user and date.

@benzuzu benzuzu requested a review from zhx828 August 2, 2023 15:04
@benzuzu benzuzu force-pushed the usage-analysis-group-by-date branch 2 times, most recently from 6271a94 to 0e2f67e Compare August 4, 2023 20:08
@zhx828
Copy link
Member

zhx828 commented Aug 10, 2023

@benzuzu thanks for the detailed comments. I agree doing <user>: <amt> format instead of a <resource>: <amt> seems more reasonable. But we might add annotate/bySample, annotate/clinicalTrials in the future which makes latter more attractive structure. Let's revisit in a year when we have more endpoints available.

this.filterToggled = value;
}

@computed get calculateData(): UserOverviewUsage[] | UsageRecord[] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather for you to calculate the data separately and render two tables so the code looks more organized. Attache a key to differentiate. Any performance concern here?

if (this.timeTypeToggleValue === ToggleValue.RESULTS_BY_MONTH) {
this.props.data.forEach((userOverviewUsage: UserOverviewUsage) => {
for (const curMonth in userOverviewUsage.monthUsage) {
if (Object.prototype.hasOwnProperty.call(userOverviewUsage.monthUsage, curMonth)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the usage of the condition here. Could you elaborate?

Header: usageHeader,
accessor: 'totalUsage',
Cell(props: { original: UserOverviewUsage }) {
return <UsageText usage={props.original.totalUsage ? props.original.totalUsage : 0} />;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as a primitive data type, they all have default values. Why we are checking here?

@benzuzu benzuzu requested a review from zhx828 August 15, 2023 14:59
Copy link
Member

@zhx828 zhx828 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The icon under Details is not clickable except for user public_website@localhost. But it should be enabled for all.
  • Can calendar tooltip be closed if use clicks anywhere on the page?
  • Could you add a link on the email to user page? It should apply to both Users tab and resource page Example : oncokb.org/admin/usage-analysis/resources/!api!v1!annotate!mutations!byProteinChange

@benzuzu
Copy link
Member Author

benzuzu commented Aug 21, 2023

  • The icon under Details is not clickable except for user public_website@localhost. But it should be enabled for all.
  • Can calendar tooltip be closed if use clicks anywhere on the page?
  • Could you add a link on the email to user page? It should apply to both Users tab and resource page Example : oncokb.org/admin/usage-analysis/resources/!api!v1!annotate!mutations!byProteinChange

Thanks for the review @zhx828. Regarding the first item, I'm pretty sure this is just because we are using the fake data to run locally. The backend searches the database for userid since we do not store in s3. Should be all good in production.

I'll work on the two frontend components you had comments on now.

@benzuzu benzuzu force-pushed the usage-analysis-group-by-date branch from d5478e5 to 87f25c7 Compare March 18, 2024 09:04
benzuzu added 5 commits March 18, 2024 02:05
Some logic was wrong in the default sorting rules. Also, the day and month usage hashmap does not have userId stored, so this fix adds an optional property that can be used for the info icon link.
Render two table instead of one based on data type (date grouping)
@benzuzu benzuzu force-pushed the usage-analysis-group-by-date branch from 87f25c7 to dec2980 Compare March 18, 2024 09:06
@benzuzu benzuzu requested a review from zhx828 March 20, 2024 18:37
@benzuzu benzuzu added enhancement Enhancement tag for release Ready to merge labels Mar 20, 2024
@zhx828
Copy link
Member

zhx828 commented Mar 20, 2024

@benzuzu for the same user on the same day, why there are two entries?
Screenshot 2024-03-20 at 3 49 29 PM

@benzuzu
Copy link
Member Author

benzuzu commented Mar 20, 2024

@benzuzu for the same user on the same day, why there are two entries? Screenshot 2024-03-20 at 3 49 29 PM

@zhx828 thanks for catching. all the users in fake data had the same email. just fixed

@@ -104,7 +104,7 @@ export const SHORTEN_TEXT_FROM_LIST_THRESHOLD = 5;
export const TOKEN_ABOUT_2_EXPIRE_NOTICE_IN_DAYS = 14;
export const USAGE_TOP_USERS_LIMIT = 10000;
export const USAGE_ALL_TIME_KEY = 'All';
export const USAGE_ALL_TIME_VALUE = 'This year';
export const USAGE_ALL_TIME_VALUE = 'Past Year';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I see Past Year, I think about 2023. But the stats is for 2024. Or am I using Past Year wrong?

Copy link
Member Author

@benzuzu benzuzu Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-03-20 at 4 35 41 PM

It does the past year, i.e. one year back from today. Is this confusing/not what we wanted?

maybe "last 12 months" is more clear - thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benzuzu oh I see. Last 12 Months is definitely better.

Copy link
Member Author

@benzuzu benzuzu Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhx828 Should we also change By Year tab on the top to

  • All Time (Last 12 Months)
  • Last 12 Months
  • All (Last 12 Months)
  • All Time

or does By Year make sense? I think one of the first 3 makes more sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benzuzu I'm a bit confused. Looking at the stats for the user, it's past 12 month. But for resource, it's from the starting of the year to current?

Copy link
Member Author

@benzuzu benzuzu Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhx828 ah, sorry. I was wrong - it's been a bit since I wrote the logic. Both the user and resource stats are beginning of year to current (not last 12 months). This is because we store the analysis in yearly files, not constantly updating to last 12 months. If I remember correctly, we decided to change backed logic in userOverviewUsageGet specifically so that we could view specific user usage for the past year (not just year to date). I've changed the tab now below:

Screenshot 2024-03-20 at 5 52 15 PM Screenshot 2024-03-20 at 5 52 22 PM

I can change the backend logic for resourceUsageGet as well to add all 12 months, but I'm not sure if it is worth the trouble.

@zhx828
Copy link
Member

zhx828 commented Mar 21, 2024

@benzuzu looks good. Let me know once this is ready to be merged.

@benzuzu
Copy link
Member Author

benzuzu commented Mar 21, 2024

@benzuzu looks good. Let me know once this is ready to be merged.

@zhx828 good to go now. Thanks

@zhx828 zhx828 merged commit 4ea7581 into oncokb:master Mar 21, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement tag for release Ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants