Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize space usage of ExplorationReport before saving #279

Merged
merged 5 commits into from
Jan 8, 2025

Conversation

zjgemi
Copy link
Collaborator

@zjgemi zjgemi commented Jan 7, 2025

Summary by CodeRabbit

  • New Features

    • Added no_candidate() method across multiple exploration report classes to check candidate availability.
    • Enhanced get_candidate_ids() method with optional clear parameter for memory management.
  • Improvements

    • Optimized ratio calculations in exploration report classes.
    • Introduced more efficient state tracking for candidate configurations.
  • Technical Updates

    • Updated method signatures in exploration report classes to include new parameters.
    • Refined candidate selection and reporting mechanisms.

Copy link

coderabbitai bot commented Jan 7, 2025

📝 Walkthrough

Walkthrough

The pull request introduces modifications to several exploration report classes across the dpgen2/exploration/report/ directory. Key changes include the addition of a new abstract method no_candidate() to the ExplorationReport class, the introduction of new attributes for tracking candidate states in the ExplorationReportAdaptiveLower and ExplorationReportTrustLevels classes, and updates to the get_candidate_ids() method signatures to include an optional clear parameter for memory management. These changes enhance the flexibility and efficiency of exploration reporting mechanisms in the codebase.

Changes

File Change Summary
dpgen2/exploration/report/report.py Added abstract method no_candidate() to ExplorationReport class
dpgen2/exploration/report/report_adaptive_lower.py - Added new attributes: _no_candidate, _failed_ratio, _accurate_ratio, _candidate_ratio
- Updated record() method to calculate new attributes
- Added no_candidate() method
- Modified get_candidate_ids() to include clear parameter
dpgen2/exploration/report/report_trust_levels_base.py - Added new private attributes: _no_candidate, _failed_ratio, _accurate_ratio, _candidate_ratio
- Updated record() method to compute new attributes
- Added no_candidate() method
dpgen2/exploration/report/report_trust_levels_max.py - Updated get_candidate_ids() method to include clear parameter
- Added type assertion in converged() method
dpgen2/exploration/report/report_trust_levels_random.py - Updated get_candidate_ids() method to include clear parameter
- Added type assertion in converged() method
tests/exploration/test_report_adaptive_lower.py Updated test cases to accommodate new get_candidate_ids() method signature

Sequence Diagram

sequenceDiagram
    participant Report as ExplorationReport
    participant SubReport as SubclassReport
    
    Report->>Report: Define abstract no_candidate()
    SubReport->>SubReport: Implement no_candidate()
    SubReport->>SubReport: Calculate candidate state
    SubReport-->>Report: Return boolean indicating candidates
Loading

This sequence diagram illustrates the new no_candidate() method's role in the exploration report hierarchy, showing how subclasses implement the method to determine their candidate state.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af8ee2f and 48a5c47.

📒 Files selected for processing (1)
  • dpgen2/exploration/report/report_trust_levels_max.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dpgen2/exploration/report/report_trust_levels_max.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
dpgen2/exploration/report/report_trust_levels_random.py (1)

49-49: Good memory optimization with clear parameter!

The addition of the clear parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.

Consider moving the comment to the method's docstring:

    def get_candidate_ids(
        self,
        max_nframes: Optional[int] = None,
        clear: bool = True,
    ) -> List[List[int]]:
+        """Get indexes of candidate configurations
+
+        Note: This method should only be called once as it may clear memory when clear=True.
+
+        Parameters
+        ----------
+        max_nframes : Optional[int], optional
+            The maximal number of frames of candidates.
+        clear : bool, optional
+            Whether to free memory after getting candidates, by default True
+
+        Returns
+        -------
+        List[List[int]]
+            The frame indices of candidate configurations.
+        """
        ntraj = len(self.traj_nframes)
        id_cand = self._get_candidates(max_nframes)
        id_cand_list = [[] for ii in range(ntraj)]
        for ii in id_cand:
            id_cand_list[ii[0]].append(ii[1])
-        # free the memory, this method should only be called once
        if clear:
            self.clear()
        return id_cand_list

Also applies to: 56-58

dpgen2/exploration/report/report_trust_levels_max.py (1)

49-49: Good memory optimization with clear parameter!

The addition of the clear parameter allows for better memory management. However, the comment about single usage should be moved to the method's docstring for better visibility.

Consider moving the comment to the method's docstring, similar to the suggestion for report_trust_levels_random.py.

Also applies to: 56-58

dpgen2/exploration/report/report_adaptive_lower.py (1)

377-386: Add parameter documentation for better clarity.

The clear parameter's purpose and behavior should be documented in the method's docstring.

     def get_candidate_ids(
         self,
         max_nframes: Optional[int] = None,
         clear: bool = True,
     ) -> List[List[int]]:
+        """
+        Parameters
+        ----------
+        max_nframes : Optional[int]
+            The maximal number of frames to return
+        clear : bool
+            If True, frees memory by clearing internal state after retrieving candidates
+        """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc72c85 and 2bcb683.

📒 Files selected for processing (6)
  • dpgen2/exploration/report/report.py (1 hunks)
  • dpgen2/exploration/report/report_adaptive_lower.py (3 hunks)
  • dpgen2/exploration/report/report_trust_levels_base.py (3 hunks)
  • dpgen2/exploration/report/report_trust_levels_max.py (1 hunks)
  • dpgen2/exploration/report/report_trust_levels_random.py (1 hunks)
  • tests/exploration/test_report_adaptive_lower.py (2 hunks)
🔇 Additional comments (9)
dpgen2/exploration/report/report.py (1)

63-66: Well-structured abstract method addition!

The new no_candidate() method is properly defined with correct type annotations and documentation.

dpgen2/exploration/report/report_trust_levels_base.py (3)

67-70: Good optimization: Caching computed values!

Adding cached attributes for frequently accessed values is a good optimization that prevents redundant calculations.


140-149: Well-structured ratio calculations!

The ratio calculations are properly moved to the record() method where the data is first processed. This is more efficient than calculating on-demand.


222-234: Efficient use of cached values!

The ratio methods now return cached values instead of recalculating, which is more efficient.

dpgen2/exploration/report/report_adaptive_lower.py (3)

130-133: LGTM! Good optimization for caching computed values.

The addition of these instance variables helps avoid redundant calculations by caching the computed values.


281-284: LGTM! Efficient caching of computed values.

The calculations are performed at the right time, after all processing is complete, and the results are properly cached.


357-372: LGTM! Efficient use of cached values.

The ratio methods now return pre-calculated values instead of computing them on each call, and the new no_candidate method provides a clean interface for checking candidate availability.

tests/exploration/test_report_adaptive_lower.py (2)

91-91: LGTM! Test properly verifies the new clear parameter.

The test correctly sets clear=False to ensure state is preserved for subsequent assertions.


224-225: LGTM! Improved test reliability.

Moving the mock.patch to encompass the get_candidate_ids call ensures reliable mocking behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
dpgen2/exploration/report/report_trust_levels_random.py (1)

44-46: LGTM! Consider adding a more descriptive assertion message.

The type assertion is a good addition for runtime type safety.

Consider making the assertion message more descriptive:

-        assert isinstance(accurate_ratio, float)
+        assert isinstance(accurate_ratio, float), f"accurate_ratio must be float, got {type(accurate_ratio)}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bcb683 and af8ee2f.

📒 Files selected for processing (1)
  • dpgen2/exploration/report/report_trust_levels_random.py (1 hunks)
🔇 Additional comments (1)
dpgen2/exploration/report/report_trust_levels_random.py (1)

51-51: 🛠️ Refactor suggestion

Consider improving memory management design and documentation.

While the memory optimization is valuable, there are several concerns to address:

  1. The method is documented as "should only be called once" but there's no mechanism to enforce this.
  2. The memory clearing operation is tightly coupled with candidate retrieval.
  3. The new clear parameter lacks documentation.

Consider these improvements:

     def get_candidate_ids(
         self,
         max_nframes: Optional[int] = None,
-        clear: bool = True,
+        clear: bool = True,
     ) -> List[List[int]]:
+        """Get candidate IDs for each trajectory.
+
+        Parameters
+        ----------
+        max_nframes : Optional[int]
+            Maximum number of frames to return
+        clear : bool, default=True
+            If True, clears internal state after retrieval.
+            Note: This method is designed to be called only once.
+
+        Returns
+        -------
+        List[List[int]]
+            List of candidate IDs for each trajectory
+        """
+        if hasattr(self, '_candidates_retrieved'):
+            raise RuntimeError("get_candidate_ids should only be called once")
         ntraj = len(self.traj_nframes)
         id_cand = self._get_candidates(max_nframes)
         id_cand_list = [[] for ii in range(ntraj)]
         for ii in id_cand:
             id_cand_list[ii[0]].append(ii[1])
-        # free the memory, this method should only be called once
         if clear:
             self.clear()
+        setattr(self, '_candidates_retrieved', True)
         return id_cand_list

Let's verify if this method is called multiple times in the codebase:

Also applies to: 58-60

Signed-off-by: zjgemi <[email protected]>
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 84.33%. Comparing base (fc72c85) to head (48a5c47).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...gen2/exploration/report/report_trust_levels_max.py 40.00% 3 Missing ⚠️
dpgen2/exploration/report/report.py 50.00% 1 Missing ⚠️
dpgen2/exploration/report/report_adaptive_lower.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #279   +/-   ##
=======================================
  Coverage   84.32%   84.33%           
=======================================
  Files         104      104           
  Lines        6003     6031   +28     
=======================================
+ Hits         5062     5086   +24     
- Misses        941      945    +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wanghan-iapcm wanghan-iapcm merged commit d60b4e9 into deepmodeling:master Jan 8, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants