Add HDF5 support for trajs and model_devis #259

zjgemi · 2024-09-03T08:54:43Z

Summary by CodeRabbit

New Features
- Introduced new optional arguments for improved data handling and multitasking capabilities.
- Added support for HDF5 formatted data in various modules.
- Enhanced flexibility in input handling for multiple data formats.
Bug Fixes
- Improved robustness in handling validation data structures.
Documentation
- Updated documentation to clarify new parameters and their intended use.

Signed-off-by: zjgemi <[email protected]>

coderabbitai · 2024-09-03T08:54:58Z

Walkthrough

The changes enhance argument handling, data processing capabilities, and flexibility across various modules of the dpgen2 package. New optional parameters are introduced to functions, enabling better configuration and support for HDF5 datasets. The logic for handling valid data and model freezing is refined, and new methods are implemented to improve data writing processes.

Changes

Files	Change Summary
`dpgen2/entrypoint/args.py`	Added `use_hdf5` argument to `run_diffcsp_args`.
`dpgen2/entrypoint/submit.py`	Introduced `RunRelaxHDF5`; updated `make_concurrent_learning_op` to include `explore_config`; restructured `workflow_concurrent_learning` for multitasking data handling.
`dpgen2/exploration/render/traj_render.py`, `dpgen2/exploration/render/traj_render_lammps.py`	Updated `get_model_devi` and `get_confs` methods to accept `Union[List[Path], List[HDF5Dataset]]` as parameters.
`dpgen2/exploration/selector/conf_selector.py`, `dpgen2/exploration/selector/conf_selector_frame.py`	Modified `select` method to accept `Union[List[Path], List[HDF5Dataset]]` for `trajs` and `model_devis`.
`dpgen2/op/select_confs.py`	Updated `get_input_sign` method to accept `Artifact(Union[List[Path], HDF5Datasets])` for `trajs` and `model_devis`.
`dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py`, `dpgen2/exploration/scheduler/scheduler.py`, `dpgen2/exploration/scheduler/stage_scheduler.py`	Updated `plan_next_iteration` method to accept `Union[List[Path], List[HDF5Dataset]]` for `trajs`.
`pyproject.toml`	Updated `pydflow` version from `>=1.6.57` to `>=1.8.88`.
`tests/op/test_run_relax.py`	Added empty dictionary under `"expl_config"` in the `OPIO` constructor within `testRunRelax`.

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 687b9c5 and 0499be9.

Files selected for processing (4)

dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py (2 hunks)
dpgen2/exploration/scheduler/scheduler.py (3 hunks)
dpgen2/exploration/scheduler/stage_scheduler.py (3 hunks)
dpgen2/flow/dpgen_loop.py (3 hunks)

Additional comments not posted (7)

dpgen2/exploration/scheduler/stage_scheduler.py (2)

11-12: Approved import changes.

The addition of Union and HDF5Dataset is necessary for the new functionality to handle both paths and HDF5 datasets in the trajs parameter.

Also applies to: 14-15

Line range hint 95-106: Approved method changes with a suggestion to verify integration.

The update to the trajs parameter type in plan_next_iteration enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of HDF5Dataset is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

dpgen2/exploration/scheduler/convergence_check_stage_scheduler.py (2)

8-8: Approved import changes.

The addition of Union and HDF5Dataset is necessary for the new functionality to handle both paths and HDF5 datasets in the trajs parameter.

Also applies to: 14-16

74-74: Approved method changes with a suggestion to verify integration.

The update to the trajs parameter type in plan_next_iteration enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of HDF5Dataset is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

dpgen2/exploration/scheduler/scheduler.py (2)

8-8: Approved import changes.

The addition of Union and HDF5Dataset is necessary for the new functionality to handle both paths and HDF5 datasets in the trajs parameter.

Also applies to: 15-17

117-117: Approved method changes with a suggestion to verify integration.

The update to the trajs parameter type in plan_next_iteration enhances the method's flexibility to handle different data sources. The documentation is updated accordingly, which is good for clarity.

Please ensure that the integration of HDF5Dataset is tested thoroughly to confirm that the system handles these datasets correctly across different scenarios.

Also applies to: 126-126

dpgen2/flow/dpgen_loop.py (1)

96-96: Type annotation change approved, verify handling of new type.

The change in type annotation for the trajs key in the get_input_sign method is approved as it enhances flexibility. However, it's crucial to verify that the rest of the codebase correctly handles the new HDF5Datasets type to ensure compatibility.

Run the following script to verify the handling of the new type:

Verification successful

Handling of HDF5Datasets type is consistent across the codebase.

The codebase shows consistent handling of the trajs variable with the new type annotation Union[List[Path], HDF5Datasets]. The changes are integrated correctly, and the code is prepared to handle both types effectively.

Instances in dpgen2/exploration/selector/conf_selector_frame.py and other files confirm the correct handling of the new type.

Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify handling of the new HDF5Datasets type in the codebase.

# Test: Search for usage of `trajs` and check if it handles both types correctly. Expect: Proper handling of both List[Path] and HDF5Datasets.
rg --type python -A 5 $'trajs'
Length of output: 60

Script:
#!/bin/bash
# Description: Verify handling of the new HDF5Datasets type in the codebase.

# Test: Search for usage of `trajs` and check if it handles both types correctly. Expect: Proper handling of both List[Path] and HDF5Datasets.
rg --type py -A 5 $'trajs'
Length of output: 47826

coderabbitai

Actionable comments posted: 4

Outside diff range, codebase verification and nitpick comments (3)

dpgen2/op/__init__.py (1)
42-42: LGTM!

The new import statement for RunRelaxHDF5 is consistent with the existing import style in the file.

Regarding the unused import warning from Ruff, it's likely a false positive in this case. Importing an entity in __init__.py allows it to be accessed directly from the package level, even if it's not used within the __init__.py file itself.

If desired, you can resolve the warning by adding RunRelaxHDF5 to the __all__ list to explicitly mark it as part of the public interface:
__all__ = [
    ..., 
    "RunRelaxHDF5",
]
However, this is not strictly necessary if the project doesn't define __all__ for other entities.
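
To illustrate why the re-export matters, here is a minimal sketch of how `__all__` controls what a star import exposes. The module name `demo_pkg` and the `Helper` class are hypothetical stand-ins, not part of dpgen2; only the `RunRelaxHDF5` name comes from the comment above.

```python
import sys
import types

# Build a throwaway module that plays the role of a package __init__.py.
# "demo_pkg" and "Helper" are hypothetical names used only for this sketch.
demo_pkg = types.ModuleType("demo_pkg")
exec(
    "class RunRelaxHDF5:\n"
    "    pass\n"
    "\n"
    "class Helper:\n"
    "    pass\n"
    "\n"
    "__all__ = ['RunRelaxHDF5']\n",
    demo_pkg.__dict__,
)
sys.modules["demo_pkg"] = demo_pkg

# A star import copies only the names listed in __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Names left out of `__all__` (here `Helper`) remain importable explicitly; `__all__` only narrows star imports and signals the intended public interface to linters.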

Tools

Ruff

42-42: .run_relax.RunRelaxHDF5 imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)
dpgen2/exploration/render/traj_render_lammps.py (1)
55-58: Consider simplifying the if-else block using a ternary operator.

The static analysis tool suggests using a ternary operator instead of the if-else block. This can simplify the code without changing its behavior.

Apply this diff to simplify the code:
-if isinstance(fname, HDF5Dataset):
-    dd = fname.get_data()
-else:
-    dd = np.loadtxt(fname)
+dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)
Tools

Ruff

55-58: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

Replace if-else-block with dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)

(SIM108)
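
The two forms in the suggestion above should behave identically. The sketch below checks that under stated assumptions: `FakeHDF5Dataset` is a hypothetical stand-in exposing only `get_data()` (the one method the diff relies on), and `load_text` is a stdlib-only substitute for `np.loadtxt`, so neither reflects the real dflow or NumPy implementations.

```python
import tempfile
from pathlib import Path
from typing import List, Union


class FakeHDF5Dataset:
    """Hypothetical stand-in that exposes only get_data()."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = rows

    def get_data(self) -> List[List[float]]:
        return self._rows


def load_text(path: Path) -> List[List[float]]:
    """Stdlib-only substitute for np.loadtxt on whitespace-separated numbers."""
    rows = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and comment headers
            rows.append([float(tok) for tok in line.split()])
    return rows


Source = Union[Path, FakeHDF5Dataset]


def load_if_else(fname: Source) -> List[List[float]]:
    # Original form from the diff: explicit branches.
    if isinstance(fname, FakeHDF5Dataset):
        dd = fname.get_data()
    else:
        dd = load_text(fname)
    return dd


def load_ternary(fname: Source) -> List[List[float]]:
    # Suggested form: a single conditional expression.
    return fname.get_data() if isinstance(fname, FakeHDF5Dataset) else load_text(fname)


rows = [[0.0, 0.1], [1.0, 0.2]]
with tempfile.TemporaryDirectory() as tmp:
    txt = Path(tmp) / "model_devi.out"
    txt.write_text("# step max_devi_f\n0.0 0.1\n1.0 0.2\n")
    from_file = (load_if_else(txt), load_ternary(txt))
from_dataset = (load_if_else(FakeHDF5Dataset(rows)), load_ternary(FakeHDF5Dataset(rows)))
```

Both loaders return the same rows for either input kind, so the choice between them is purely stylistic.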
dpgen2/op/run_lmp.py (1)
296-318: LGTM with a nitpick!

The freeze_model function implementation looks good.

Improve the error message.

Consider providing more context in the error message to help with debugging.

Apply this diff to improve the error message:
 def freeze_model(input_model, frozen_model, head=None):
     freeze_args = "-o %s" % frozen_model
     if head is not None:
         freeze_args += " --head %s" % head
     freeze_cmd = "dp --pt freeze -c %s %s" % (input_model, freeze_args)
     ret, out, err = run_command(freeze_cmd, shell=True)
     if ret != 0:
         logging.error(
             "".join(
                 (
                     "freeze failed\n",
-                    "command was",
+                    "command was: ",
                     freeze_cmd,
-                    "out msg",
+                    "\nout msg: ",
                     out,
                     "\n",
-                    "err msg",
+                    "err msg: ",
                     err,
                     "\n",
                 )
             )
         )
-        raise TransientError("freeze failed")
+        raise TransientError(f"Failed to freeze model {input_model} with command: {freeze_cmd}")
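
As a rough sketch of what the improved message would look like, the command construction and message formatting can be separated from the actual `dp` invocation. The helper names `build_freeze_cmd` and `format_freeze_failure` and the file names used below are hypothetical; the command template and message pieces are copied from the diff above.

```python
from typing import Optional


def build_freeze_cmd(input_model: str, frozen_model: str, head: Optional[str] = None) -> str:
    """Assemble the freeze command string the same way the diff does."""
    freeze_args = f"-o {frozen_model}"
    if head is not None:
        freeze_args += f" --head {head}"
    return f"dp --pt freeze -c {input_model} {freeze_args}"


def format_freeze_failure(cmd: str, out: str, err: str) -> str:
    """Join the log pieces using the separators proposed in the review."""
    return "".join(
        (
            "freeze failed\n",
            "command was: ",
            cmd,
            "\nout msg: ",
            out,
            "\n",
            "err msg: ",
            err,
            "\n",
        )
    )


# Hypothetical inputs; nothing is executed, only strings are built.
cmd = build_freeze_cmd("model.pt", "frozen.pth", head="water")
message = format_freeze_failure(cmd, "", "no such head")
print(message)
```

With the added separators, each field of the log lands on its own labeled line instead of running into the next.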

dpgen2/exploration/selector/conf_selector.py

coderabbitai · 2024-09-03T09:01:18Z

dpgen2/op/run_relax.py

@@ -1,3 +1,4 @@
+import logging


Remove unused import.

The logging module is imported but not used in the code. Please remove it.

-import logging

Tools

Ruff

1-1: logging imported but unused

Remove unused import: logging

(F401)

coderabbitai · 2024-09-03T09:01:19Z

dpgen2/op/run_relax.py

+    @staticmethod
+    def normalize_config(data={}):
+        ta = RunRelax.relax_args()
+        base = Argument("base", dict, ta)
+        data = base.normalize_value(data, trim_pattern="_*")
+        base.check_value(data, strict=False)
+        return data


Replace the mutable default argument with None.

Using a mutable default argument can lead to unexpected behavior. Please replace it with None and initialize it within the function.

-def normalize_config(data={}):
+def normalize_config(data=None):
+    if data is None:
+        data = {}

Tools

Ruff

223-223: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)
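
For context on rule B006, here is a minimal, generic sketch of the pitfall. The function names are hypothetical and unrelated to `normalize_config`; whether `normalize_config` ever mutates its default in place is not shown in this thread.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, at function definition time,
    # and then reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared(1)
shared_second = append_shared(2)  # same list object as shared_first

fresh_first = append_fresh(1)
fresh_second = append_fresh(2)  # independent list
```

After these calls, `shared_first` and `shared_second` are the same list holding both items, while `fresh_first` and `fresh_second` each hold only their own item.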

coderabbitai · 2024-09-03T09:01:19Z

dpgen2/op/run_lmp.py

+            Argument(
+                "model_frozen_head", str, optional=True, default=None, doc=doc_head
+            ),


Update the argument documentation.

The argument has been renamed from head to model_frozen_head, but the documentation still refers to the old name.

Apply this diff to update the documentation:

-doc_head = "Select a head from multitask"
+doc_model_frozen_head = "Select a head from the multitask model to freeze"
 return [
     Argument("command", str, optional=True, default="lmp", doc=doc_lmp_cmd),
     Argument(
         "teacher_model_path",
         [BinaryFileInput, str],
         optional=True,
         default=None,
         doc=doc_teacher_model,
     ),
     Argument(
         "shuffle_models",
         bool,
         optional=True,
         default=False,
         doc=doc_shuffle_models,
     ),
     Argument(
-        "model_frozen_head", str, optional=True, default=None, doc=doc_head
+        "model_frozen_head", str, optional=True, default=None, doc=doc_model_frozen_head
     ),
 ]

Committable suggestion was skipped due to low confidence.

codecov · 2024-09-03T09:20:58Z

Codecov Report

Attention: Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.

Project coverage is 83.65%. Comparing base (ce4ab3e) to head (0499be9).
Report is 13 commits behind head on master.

Files with missing lines	Patch %	Lines
dpgen2/op/run_relax.py	71.42%	6 Missing ⚠️
dpgen2/exploration/render/traj_render_lammps.py	77.77%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #259      +/-   ##
==========================================
- Coverage   83.70%   83.65%   -0.05%     
==========================================
  Files         104      104              
  Lines        5958     5990      +32     
==========================================
+ Hits         4987     5011      +24     
- Misses        971      979       +8
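
The figures in this report can be cross-checked with a few lines of arithmetic. This is only a sketch: the patch size of 37 lines is an inference (29 covered out of 37 reproduces 78.37838%), and the observation that the project percentages match truncation rather than rounding comes from these numbers alone, not from any documented Codecov rule.

```python
def truncated_percent(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


# Project coverage, using the Hits and Lines rows of the table.
base = truncated_percent(4987, 5958)
head = truncated_percent(5011, 5990)
delta = round(head - base, 2)

# Patch coverage: 8 missing lines; 37 total is an inferred value.
patch_missing = 8
patch_lines = 37
patch = round((patch_lines - patch_missing) / patch_lines * 100, 5)

print(base, head, delta, patch)
```

Rounding the head coverage instead of truncating it would give 83.66%, which is why truncation is assumed here.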

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)

dpgen2/exploration/render/traj_render_lammps.py (1)
55-58: Consider using a ternary operator for conciseness.

The if-else block can be replaced with a ternary operator to make the code more concise without changing the functionality.

Apply this diff to refactor the code:
-if isinstance(fname, HDF5Dataset):
-    dd = fname.get_data()
-else:
-    dd = np.loadtxt(fname)
+dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)
Tools

Ruff

55-58: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

Replace if-else-block with dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)

(SIM108)

coderabbitai

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)

dpgen2/exploration/render/traj_render_lammps.py (1)
62-65: LGTM! Consider using a ternary operator for conciseness.

The change to the logic for loading data allows the _load_one_model_devi method to handle HDF5 datasets directly, improving its capability to work with different data formats.

As suggested by Ruff, consider replacing the if-else block with a ternary operator for conciseness:
-if isinstance(fname, HDF5Dataset):
-    dd = fname.get_data()
-else:
-    dd = np.loadtxt(fname)
+dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)
Tools

Ruff

62-65: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

Replace if-else-block with dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname)

(SIM108)

coderabbitai · 2024-09-04T08:56:28Z

dpgen2/exploration/selector/conf_selector.py

@@ -10,9 +10,13 @@
    Optional,
    Set,
    Tuple,
+    Union,
 )

 import dpdata


Remove unused import.

The dpdata module is imported but unused.

Apply this diff to remove the unused import:

-import dpdata

Tools

Ruff

16-16: dpdata imported but unused

Remove unused import: dpdata

(F401)

coderabbitai · 2024-09-04T08:56:29Z

dpgen2/exploration/selector/conf_selector_frame.py

 import dpdata
 import numpy as np


Remove unused imports.

The static analysis tool suggests that the following imports are unused:

dpdata (line 15)

numpy (line 16)

Please remove these unused imports to keep the code clean.

Tools

Ruff

15-15: dpdata imported but unused

Remove unused import: dpdata

(F401)

16-16: numpy imported but unused

Remove unused import: numpy

(F401)

njzjz

It seems Ruff hasn't been enabled in the dpgen2 repository.

coderabbitai

@njzjz, you're correct that Ruff hasn't been enabled in the dpgen2 repository. If you want to enforce linting rules like removing unused imports, you might consider enabling Ruff in the future. For now, you can disregard the suggestion to remove the unused imports.

wanghan-iapcm

Could you please also support run_lmp, which seems to be straightforward.

zjgemi · 2024-09-05T01:30:17Z

Could you please also support run_lmp, which seems to be straightforward.

Sure.

zjgemi · 2024-09-05T01:55:06Z

Could you please also support run_lmp, which seems to be straightforward.

I realize that for run_lmp, a task outputs only a single trajectory and a single model_devi file. Since the outputs of each task must be stored in a separate file, merging the outputs of each task into an HDF5 file would bring little benefit.

On the other hand, in HDF5 mode, users cannot conveniently preview file contents in the UI. That is why HDF5 mode is off by default and is meant to be enabled only when a performance bottleneck is encountered.

zjgemi and others added 10 commits August 27, 2024 12:13

Support valid data for multitask training

f539428

Signed-off-by: zjgemi <[email protected]>

Merge branch 'master' into multitask-valid

864dcf9

add command to RunDPTrain

93e508c

Signed-off-by: zjgemi <[email protected]>

fix list + str

18d2b70

Signed-off-by: zjgemi <[email protected]>

Support multitask for diffcsp engine

c90e5dd

Signed-off-by: zjgemi <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

1666e91

for more information, see https://pre-commit.ci

share freeze_model among run_lmp and run_relax; rename head -> model_…

6acdbef

…frozen_head in run_lmp and run_relax Signed-off-by: zjgemi <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

ea87d06

for more information, see https://pre-commit.ci

add args to RunRelax

16c30d6

Signed-off-by: zjgemi <[email protected]>

Add HDF5 support for trajs and model_devis

0e36fb5

Signed-off-by: zjgemi <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

6e7e795

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Sep 3, 2024

zjgemi added 3 commits September 4, 2024 09:28

Merge branch 'master' into hdf5-trajs

967ade6

Merge branch 'hdf5-trajs' of github.com:zjgemi/dpgen2 into hdf5-trajs

a8e82ab

fix type check

5e2c2eb

Signed-off-by: zjgemi <[email protected]>

coderabbitai bot reviewed Sep 4, 2024

Merge branch 'master' into hdf5-trajs

687b9c5

coderabbitai bot reviewed Sep 4, 2024

wanghan-iapcm reviewed Sep 4, 2024

zjgemi and others added 2 commits September 6, 2024 15:05

add HDF5Datasets format of trajs to exploration scheduler

ee022f7

Signed-off-by: zjgemi <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

0499be9

for more information, see https://pre-commit.ci

wanghan-iapcm approved these changes Sep 9, 2024

wanghan-iapcm merged commit 3501db4 into deepmodeling:master Sep 10, 2024
9 checks passed
