Bundle Analysis: associate past assets to current parsed bundle #231

Merged (7 commits) on Jun 3, 2024

@JerrySentry (Contributor) commented May 29, 2024

Asset names change from one bundler build to the next across commits, so we cannot tell whether a current asset is new or a continuation of a previous one. This PR adds a function that uses a heuristic to associate each current asset with an asset from the previous commit.

The rules are as follows. If the hashed asset name exists in the previous bundle report, the asset is considered the same asset. Similarly, if all the modules of the asset are the same as those of any asset in the previous bundle report, it is also considered the same asset. We track assets through a generated UUID; when two assets are associated, they share the same UUID.
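
The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the `Asset` dataclass, its field names, and `associate_assets` are hypothetical stand-ins for the real models in `shared/bundle_analysis`.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List
from uuid import uuid4


@dataclass
class Asset:
    """Hypothetical stand-in for a parsed bundle asset (not the real model)."""

    hashed_name: str
    modules: FrozenSet[str]
    asset_uuid: str = field(default_factory=lambda: str(uuid4()))


def associate_assets(curr_assets: List[Asset], prev_assets: List[Asset]) -> None:
    """Give each current asset the UUID of its matching previous asset, if any."""
    prev_by_name = {a.hashed_name: a for a in prev_assets}
    # Assumption: assets without modules are never matched by module set.
    prev_by_modules = {a.modules: a for a in prev_assets if a.modules}

    for asset in curr_assets:
        # Rule 1: the hashed name already exists in the previous report.
        match = prev_by_name.get(asset.hashed_name)
        # Rule 2: the asset has exactly the same modules as a previous asset.
        if match is None and asset.modules:
            match = prev_by_modules.get(asset.modules)
        if match is not None:
            asset.asset_uuid = match.asset_uuid


prev = [
    Asset("index-1a2b.js", frozenset({"src/a.js", "src/b.js"})),
    Asset("vendor-9f8e.js", frozenset({"node_modules/x.js"})),
]
curr = [
    # Renamed by the bundler, but built from the same modules (rule 2).
    Asset("index-7c7d.js", frozenset({"src/a.js", "src/b.js"})),
    # Same hashed name, different modules (rule 1).
    Asset("vendor-9f8e.js", frozenset({"node_modules/x.js", "node_modules/y.js"})),
    # No match under either rule: keeps its freshly generated UUID.
    Asset("new-0000.js", frozenset({"src/new.js"})),
]
associate_assets(curr, prev)
```

After the call, the first two current assets carry the UUIDs of their previous counterparts, while the third keeps its own.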

This mechanism is important for providing analytics on an asset's size trend over the course of its existence. That analytics component will be implemented in coming iterations.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@JerrySentry JerrySentry marked this pull request as ready for review May 29, 2024 19:13

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.48%. Comparing base (bef1d46) to head (458a941).
Report is 3 commits behind head on main.

Current head 458a941 differs from pull request most recent head 31ab2ca

Please upload reports for the commit 31ab2ca to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #231      +/-   ##
==========================================
- Coverage   89.52%   89.48%   -0.04%     
==========================================
  Files         328      324       -4     
  Lines       10480    10375     -105     
  Branches     1915     1904      -11     
==========================================
- Hits         9382     9284      -98     
+ Misses       1025     1020       -5     
+ Partials       73       71       -2     
Flag                     Coverage Δ
shared-docker-uploader   ?

Flags with carried forward coverage won't be shown.

@giovanni-guidini (Contributor) left a comment

I personally think that associate_previous_assets is quite complex as it is. I'd encourage you to refactor it by breaking up parts of the big loop into helper functions.
Easier to read and to test. But the comments and docstrings are quite helpful, thanks for that.

Otherwise LGTM... left some other comments.

def associate_previous_assets(self, prev_bundle_analysis_report: Any) -> None:
    """
    Note: prev_bundle_analysis_report is of type BundleAnalysisReport,
    typing.Self is not available in 3.10

@giovanni-guidini (Contributor):

I think you can write "BundleAnalysisReport" (in double quotes) as the annotation and it will still be recognized; I've seen that somewhere in the code.
In any case, shared requires at least Python 3.11, so maybe typing.Self is available there? See:

python_requires=">=3.11",

@JerrySentry (Contributor Author):

Oh interesting, I didn't know about the double-quote trick!
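
For reference, the double-quote trick is a string forward reference: the class name is not bound yet while the class body is being executed, so the annotation is written as a string and resolved later by type checkers. A minimal sketch, with a hypothetical stripped-down `BundleAnalysisReport` (the `previous` attribute is invented for illustration):

```python
from typing import Optional


class BundleAnalysisReport:
    """Hypothetical minimal class; only the annotation style matters here."""

    def __init__(self) -> None:
        self.previous: Optional["BundleAnalysisReport"] = None

    # The quoted name is a forward reference to the class being defined,
    # so no `Any` placeholder is needed for the parameter type.
    def associate_previous_assets(
        self, prev_bundle_analysis_report: "BundleAnalysisReport"
    ) -> None:
        self.previous = prev_bundle_analysis_report


curr_report = BundleAnalysisReport()
prev_report = BundleAnalysisReport()
curr_report.associate_previous_assets(prev_report)
```

On Python 3.11 and later, `typing.Self` is also available and could replace the quoted class name in a signature like this one.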

shared/bundle_analysis/report.py (resolved)


def test_asset_association():
    try:

@giovanni-guidini (Contributor):

Why have a try: without an except?

"To have the finally: block."
But how is that different from doing the cleanup at the end of the try block?

@JerrySentry (Contributor Author) commented Jun 3, 2024:

This part basically simulates what the worker does when it processes a bundle file, except that in the worker the except block actually handles errors (for example by retrying) before it enters finally. In the tests there's really nothing to do when an exception occurs, so I just skip the except and go straight to finally. Otherwise the code would be:

try:
  # do things
  cleanup()
except:
  cleanup()
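
By contrast, the try/finally form used in the test can be sketched like this. The names `process`, `cleanup`, and `events` are hypothetical and only serve to show the order of execution:

```python
events = []


def cleanup() -> None:
    events.append("cleanup")


def process(should_fail: bool) -> None:
    try:
        events.append("work")
        if should_fail:
            raise ValueError("simulated processing error")
    finally:
        # Runs on both the success path and the failure path;
        # an exception, if any, keeps propagating afterwards.
        cleanup()


process(should_fail=False)

try:
    process(should_fail=True)
except ValueError:
    events.append("error reached caller")
```

The cleanup is written once and runs in both cases, and the exception still surfaces to the caller (here, the test runner would report it as a failure).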

@JerrySentry (Contributor Author) commented:

> I personally think that associate_previous_assets is quite complex as it is.

Separated the 2 rules out into 2 separate helper functions.

@@ -170,6 +171,104 @@ def ingest(self, path: str) -> int:
        self.db_session.commit()
        return session_id

    def _associate_bundle_report_assets_by_name(
        self, curr_bundle_report, prev_bundle_report
    ) -> Set:

@giovanni-guidini (Contributor):

[very nit] You can be more explicit by saying Set[Tuple], or even more explicit with Set[Tuple[str, str]]. It starts to get a bit wild fast, but it's possible. I personally think Set[Tuple] is an improvement over Set.
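
As an illustration of the more explicit annotation, here is a small sketch. The function `pair_assets_by_name`, its dictionary arguments, and the choice of (previous UUID, current UUID) as the tuple contents are all assumptions made for the example; the source only shows that the helper returns a set.

```python
from typing import Dict, Set, Tuple


def pair_assets_by_name(
    curr_name_to_uuid: Dict[str, str], prev_name_to_uuid: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Return (previous UUID, current UUID) pairs for assets sharing a hashed name."""
    return {
        (prev_name_to_uuid[name], curr_uuid)
        for name, curr_uuid in curr_name_to_uuid.items()
        if name in prev_name_to_uuid
    }


pairs = pair_assets_by_name(
    {"index-1a2b.js": "curr-1", "new-0000.js": "curr-2"},
    {"index-1a2b.js": "prev-1"},
)
```

With `Set[Tuple[str, str]]` a reader (and a type checker) can see that each element is a pair of strings, which a bare `Set` does not convey.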

@@ -170,6 +171,104 @@ def ingest(self, path: str) -> int:
        self.db_session.commit()
        return session_id

    def _associate_bundle_report_assets_by_name(
        self, curr_bundle_report, prev_bundle_report

@giovanni-guidini (Contributor):

The args don't have a known type?

associated_assets_found = set()

# Rule 1 check
associated_assets_found |= (

@giovanni-guidini (Contributor):

Even the code is going "|=" (i.e. 😐)
(Jokes aside I do think this is an improvement, thanks)
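
For readers less familiar with it, `|=` on a set is an in-place union: it adds the elements of the right-hand set to the existing set object. A minimal sketch, where the two result sets are hypothetical placeholders for what the rule helpers might return:

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

# Hypothetical results of the two rule checks.
found_by_name: Set[Pair] = {("prev-1", "curr-1")}
found_by_modules: Set[Pair] = {("prev-1", "curr-1"), ("prev-2", "curr-2")}

associated_assets_found: Set[Pair] = set()
alias = associated_assets_found  # second reference to the same set object

associated_assets_found |= found_by_name     # rule 1 results
associated_assets_found |= found_by_modules  # rule 2 results; duplicates collapse
```

Because the union happens in place, `alias` still refers to the same, now updated, set.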

@JerrySentry JerrySentry added this pull request to the merge queue Jun 3, 2024
Merged via the queue into main with commit 57e53e8 Jun 3, 2024
6 checks passed
@JerrySentry JerrySentry deleted the may_29_ba_asset_assoc branch June 3, 2024 16:11