-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Develop a unified class for water structure and dynamics #9
base: devel
Are you sure you want to change the base?
Conversation
* analyze time-averaged water structure, water dipole autocorrelation, and survival probability within the same class * class collects data, then user can choose which analyses to perform by running methods * methods interface to general functions in modules (waterstructure and waterdynamics) * can be extended to include z-MSD * utils now includes a function to automatically identify surface indices
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe pull request introduces a new class Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant WA as WaterAnalysis
participant T as Trajectory Data
U->>WA: Initialize analysis with simulation parameters
WA->>WA: _prepare() - Set up analysis environment
loop Process each frame
WA->>T: Retrieve frame data
WA->>WA: _single_frame(frame) - Process current frame
end
WA->>WA: _conclude() - Finalize calculations and average results
WA->>U: Return computed analysis results
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (3)
WatAnalysis/utils.py (2)
166-199
: Clarify argument shape constraints in docstring.While this function effectively creates boolean masks based on given intervals, it would be helpful to explicitly state in the docstring or via type hints that:
z_coords
must have shape(n_frames, n_atoms)
or(n_atoms,)
if operating on a single frame.z1
andz2
must each be one-dimensional arrays with the same length asz_coords
’s first dimension (if multi-frame).This helps future maintainers avoid shape mismatch mistakes.
201-251
: Add safeguards to prevent infinite loops.The
while _crosses_z_boundary(atoms[atoms.symbols == element]):
loop translates the slab by +1 Å each iteration until it no longer crosses the boundary. Although this approach works under typical slab configurations, a pathological case could cause the loop never to exit (e.g., extremely large or malformed cell dimensions). Consider:
- Adding a maximum iteration count as a fallback.
- Logging or printing the number of iterations to give users feedback if the loop runs too often.
This will make the function more robust.
WatAnalysis/waterstructure.py (1)
2-5
: Expand the docstring to reflect new functionalities
These lines reference "time-averaged water structure properties" but do not clarify the additional angle/dipole autocorrelation features now offered by the module. Consider updating the docstring to briefly mention the new orientation and angular distribution analyses introduced below.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
WatAnalysis/analysis.py
(1 hunks)WatAnalysis/utils.py
(2 hunks)WatAnalysis/waterdynamics.py
(1 hunks)WatAnalysis/waterstructure.py
(2 hunks)
🔇 Additional comments (6)
WatAnalysis/utils.py (1)
2-8
: Use consistent import style and minimize redundancy where possible.All these imports look correct and are commonly used in the subsequent code. If you plan to expand this module further, consider grouping imports from the same library together. For now, this is fine.
WatAnalysis/analysis.py (2)
74-102
: Confirm thatsurf_ids
is not None and is valid.If
surf_ids
is missing or incorrectly shaped, downstream logic (e.g., accessingsurf_ids[0]
in_single_frame
) may fail. Consider raising an exception or providing a fallback to computesurf_ids
automatically, such as viautils.guess_surface_indices
, to improve robustness.
195-196
: Validate assumption about “valid dipoles.”When
only_valid_dipoles=True
, you filter out NaN values incos_theta
to identify valid water molecules. Ensure there are no corner cases wherecos_theta
might be NaN for other reasons (e.g., a transient geometry issue) that do not necessarily mean “invalid water.” You may want to confirm that each molecule truly has exactly 2 hydrogens.WatAnalysis/waterdynamics.py (1)
75-129
: Validate indexing logic in the loop.The nested loop:
for k in range(tau): intersection *= mask[k : -tau + k : step] intersection *= mask[tau::step]could risk out-of-bounds errors if
tau
approaches the total number of frames, or it might unintentionally double-count the final slice. Ensure that the step-based slicing is correct for your intended definition of “continuous presence” in the region.WatAnalysis/waterstructure.py (2)
64-114
: Check bin calculation to avoid potential zero or negative bins
When(z_surf[1] - z_surf[0]) < dz
, the computed bin count might be zero or negative, causing a runtime warning or error. It would be prudent to validate these parameters or handle the scenario more gracefully.You could, for instance, add a small check:
if (z_surf[1] - z_surf[0]) <= dz: + raise ValueError("Bin width dz is too large or surfaces are too close.") counts, bin_edges = np.histogram(...)
226-263
: Logic looks good for counting water molecules
This function correctly combines masks for the lower and upper surfaces. The code is structured and readable, with meaningful variable names.
def calc_vector_autocorrelation( | ||
max_tau: int, | ||
delta_tau: int, | ||
step: int, | ||
vectors: np.ndarray, | ||
mask: np.ndarray, | ||
): | ||
""" | ||
Calculate the autocorrelation function for a vector quantity over time. | ||
|
||
Parameters | ||
---------- | ||
max_tau : int | ||
Maximum lag time to calculate ACF for | ||
delta_tau : int | ||
Time interval between lag times (points on the C(tau) vs. tau curve) | ||
step : int | ||
Step size for time origins. If equal to max_tau, there is no overlap between | ||
time windows considered in the calculation (so more uncorrelated). | ||
vectors : numpy.ndarray | ||
Array of vectors with shape (num_timesteps, num_particles, 3) | ||
mask : numpy.ndarray | ||
Boolean mask array indicating which particles to include, shape | ||
(num_timesteps, num_particles) | ||
|
||
Returns | ||
------- | ||
tau : numpy.ndarray | ||
Array of lag times | ||
acf : numpy.ndarray | ||
Normalized autocorrelation function values for each lag time | ||
""" | ||
tau = np.arange(start=0, stop=max_tau, step=delta_tau) | ||
acf = np.zeros(tau.shape) | ||
mask = np.expand_dims(mask, axis=2) | ||
|
||
# Calculate ACF for each lag time | ||
for i, t in enumerate(tau): | ||
n_selected_vectors = None | ||
if t == 0: | ||
# For t=0, just calculate the dot product with itself | ||
dot_products = np.sum( | ||
vectors * vectors * mask, axis=2 | ||
) # Shape: (num_timesteps, num_molecules) | ||
n_selected_vectors = np.count_nonzero(mask) | ||
else: | ||
# For t > 0, calculate the dot products between shifted arrays | ||
_vectors_0 = vectors[:-t:step] * mask[:-t:step] # dipole(t=0) | ||
_vectors_t = vectors[t::step] * mask[t::step] # dipole(t=tau) | ||
dot_products = np.sum( | ||
_vectors_0 * _vectors_t, axis=2 | ||
) # Shape: ((num_timesteps - t)//step, num_molecules) | ||
n_selected_vectors = np.count_nonzero(mask[:-t:step] * mask[t::step]) | ||
|
||
# Average over molecules and time origins | ||
acf[i] = np.sum(dot_products) / n_selected_vectors | ||
|
||
# Normalize the ACF | ||
acf /= acf[0] # Normalize by the zero-lag value | ||
return tau, acf | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle potential division by zero for ACF normalization.
When normalizing by acf[0]
, if no vectors are selected or if the dot product is zero at t=0, acf[0]
will be 0, leading to NaN. Consider adding a check before normalization:
if acf[0] == 0:
# Handle edge case, e.g., set acf[...] = np.nan or skip normalization
acf /= acf[0]
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
WatAnalysis/waterstructure.py (2)
64-71
: Possible integer rounding for bin count.When computing
bins=int((z_surf[1] - z_surf[0]) / dz)
, floating-point inaccuracies may cause off-by-one issues. Consider rounding withmath.floor
ormax(1, int(...))
to avoid an empty bin scenario if(z_surf[1] - z_surf[0]) < dz
.
200-224
: Validate zero-vector cases incalc_costheta_profile
.Like the dipole normalization concern elsewhere, NaN values can arise if norms are zero. Although you filter out
np.isnan
, consider systematically handling zero-length vectors or logging warnings for molecules missing hydrogens.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
WatAnalysis/analysis.py
(1 hunks)WatAnalysis/waterstructure.py
(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
WatAnalysis/waterstructure.py (1)
Learnt from: lucasdekam
PR: ChiahsinChu/WatAnalysis#9
File: WatAnalysis/waterstructure.py:170-224
Timestamp: 2025-02-25T16:12:43.768Z
Learning: The cosine theta profile in water analysis is an odd function across surfaces, meaning the proper symmetrization formula is (f(x) - f(-x))/2 which is implemented as `avg_cos_theta = (avg_cos_theta - avg_cos_theta[::-1]) / 2` in the code. This differs from even functions like density profiles which use addition for symmetrization.
🔇 Additional comments (4)
WatAnalysis/analysis.py (3)
74-92
: Consider validatingsurf_ids
and related atom indices.If
surf_ids
isNone
or empty, or if the selected surfaces contain few atoms, the references in_single_frame
may trigger unexpected behavior or indexing errors. You might add a check that raises a helpful exception if these indices are invalid.
95-101
: Handle potential empty or invalid output fromidentify_water_molecules
.If the system lacks proper water molecules or if detection fails,
self.water_dict
could be empty. You may want to guard against downstream usage of an empty dictionary, e.g., by logging a warning or raising an exception.
183-210
: Check consistency ofonly_valid_dipoles
usage.
only_valid_dipoles
filters out NaN entries incos_theta
, which is derived directly from normalized dipoles. Ensure it aligns with your analysis goals when dealing with incomplete or broken water geometries (sometimes H is missing). If partial water molecules can appear in your system, consider alternative filtering or logging.WatAnalysis/waterstructure.py (1)
165-167
: Confirm that subtractive symmetrization is intentional for an odd function.The use of
(rho_cos_theta - rho_cos_theta[::-1]) / 2
is valid if you are explicitly modeling an odd function across the surface centerline. If instead you intend to average symmetrical data (an even function), use addition instead. Verify the conceptual basis here to ensure correctness.
# Calculate dipoles and project on self.axis | ||
dipole = waterstructure.calc_water_dipoles( | ||
coords_hydrogen, | ||
coords_oxygen, | ||
self.water_dict, | ||
box=ts_box, | ||
mic=self.min_vector, | ||
) | ||
dipole /= np.linalg.norm(dipole, axis=-1, keepdims=True) | ||
np.copyto(self.results.dipoles[self._frame_index], dipole) | ||
cos_theta = dipole[:, self.axis] | ||
np.copyto(self.results.cos_theta[self._frame_index], cos_theta) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent division by zero in normalized dipole calculation.
When calling dipole /= np.linalg.norm(dipole, axis=-1, keepdims=True)
, water molecules missing atoms (or incorrectly detected) might yield zero vectors. This produces NaNs or raises warnings. Consider adding a small epsilon check or filtering out zero-length vectors to avoid potential runtime issues.
dipole_lengths = np.linalg.norm(dipole, axis=-1, keepdims=True)
+nonzero_mask = dipole_lengths[:,0] > 1e-8
+dipole[nonzero_mask] /= dipole_lengths[nonzero_mask]
Committable suggestion skipped: line range outside the PR's diff.
also remove spurious attribute self.save_velocities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
WatAnalysis/waterdynamics.py (2)
70-71
: 🛠️ Refactor suggestionAddress division by zero for ACF normalization.
Directly normalizing by
acf[0]
can cause division by zero ifacf[0]
equals zero. This was flagged in a past review as well.-acf /= acf[0] # Normalize by the zero-lag value +if acf[0] != 0: + acf /= acf[0] +else: + # Handle edge case: set acf[...] = np.nan or skip normalization
132-132
: 🛠️ Refactor suggestionAvoid dividing by zero for survival probability normalization.
As with the autocorrelation, normalizing with:
acf /= acf[0]risks division by zero if
acf[0]
is zero. Consider the same fix as above to handle this edge case.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
WatAnalysis/analysis.py
(1 hunks)WatAnalysis/waterdynamics.py
(1 hunks)WatAnalysis/waterstructure.py
(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
WatAnalysis/waterstructure.py (1)
Learnt from: lucasdekam
PR: ChiahsinChu/WatAnalysis#9
File: WatAnalysis/waterstructure.py:170-224
Timestamp: 2025-02-25T16:12:43.768Z
Learning: The cosine theta profile in water analysis is an odd function across surfaces, meaning the proper symmetrization formula is (f(x) - f(-x))/2 which is implemented as `avg_cos_theta = (avg_cos_theta - avg_cos_theta[::-1]) / 2` in the code. This differs from even functions like density profiles which use addition for symmetrization.
🔇 Additional comments (4)
WatAnalysis/analysis.py (2)
149-150
: Guard against zero dipole norm.If a water molecule’s two O–H vectors collapse to a zero vector (e.g. incomplete or incorrectly assigned water), normalizing the dipole at lines 149–150:
dipole /= np.linalg.norm(dipole, axis=-1, keepdims=True)can produce NaNs or inf. Consider adding an epsilon or filtering zero-length vectors before normalization.
19-456
: Overall implementation looks solid.The
WaterAnalysis
class is cohesive and well-documented, with clear methods for density, orientation, dipole autocorrelation, and survival probability. No major concerns beyond the zero-vector edge case. Good job!WatAnalysis/waterstructure.py (2)
292-293
: Verify angle inputs forarccos
.If floating-point rounding pushes
cos_theta[mask_lo]
or-cos_theta[mask_hi]
slightly outside [-1, 1],np.arccos
can produce NaNs or raise warnings. Consider clamping the values to the valid range:val = np.clip(val, -1.0, 1.0)before calling
arccos
.
64-304
: All new water structure methods appear correctly integrated.These five functions (
calc_density_profile
,calc_orientation_profile
,calc_costheta_profile
,count_water_in_region
, andcalc_angular_distribution
) align well with the code inanalysis.py
. Symmetrization for odd/even functions is clearly documented, and overall logic is consistent with the user’s design. LGTM!
# Average over molecules and time origins | ||
acf[i] = np.sum(dot_products) / n_selected_vectors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle potential zero division in n_selected_vectors
.
When t == 0
or if the boolean mask is empty for some reason, n_selected_vectors
may end up being zero. This will raise an error or produce NaNs at line 68:
acf[i] = np.sum(dot_products) / n_selected_vectors
Consider adding a safety check or skipping the iteration if n_selected_vectors == 0
.
# N(t,tau), shape: (num_timesteps - tau, ) | ||
n_t_tau = np.sum(intersection, axis=1) | ||
|
||
acf[i] = np.mean(n_t_tau / n_t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent potential division by zero when computing survival probability.
In line 127:
acf[i] = np.mean(n_t_tau / n_t)
If n_t
is zero, a division by zero occurs or NaNs are produced. Add a guard check for empty or zero denominators.
Added a notebook by accident
This PR contains a new module
analysis.py
with a classWaterAnalysis
that replaces the oldWaterStructure
classes. It analyzes the water structure, but also collects data to compute the dipole autocorrelation and water survival probability. The water survival probability calculation takes only seconds, whereas thewaterdynamics
-basedSP
class takes minutes.I also added a function for automatically identifying surface indices,
utils.guess_surface_indices
.Comparison to previous code
Survival probability

Angle distribution

count_in_region
: also verified withcalc_sel-water
Water structure profiles

New features
Dipole autocorrelation, quite confident of correctness because the vector autocorrelation was also tested with the VACF code
Future tasks
waterstructure.py
andwaterdynamics.py
modules, I left the old code there for comparison. The files contain a bit too much code now. Maybe also move hbonds to a separate fileLet me know if you want/need the comparison notebook, I wasn't sure whether to place it in the tests folder.
Summary by CodeRabbit
Summary by CodeRabbit