Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use reduction function in pair_coalescence_stat #2975

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

nspope
Copy link
Contributor

@nspope nspope commented Aug 6, 2024

Followup to #2932

  • Use a reduction function in pair_coalescence_stat, and change some naming conventions to be less confusing
  • The reduction function is applied when each window is flushed, and is of the form f(weights, values, params)
  • Because node weights may be aggregated into bins prior to reduction, values (which are node ages) also need to be aggregated. This is done by taking the weighted average of ages in each bin, which reduces to node ages if each node is in its own bin.
  • There's some matrix transposition done to ensure that the inputs to the reduction function are contiguous in memory, which makes for nicer semantics

@nspope
Copy link
Contributor Author

nspope commented Aug 6, 2024

Because of the transposition, the output dimensions are now (genomic windows, sample set indexes, time windows), which seems like a nicer layout to me because it makes indexing a given population pair trajectory "nice", e.g. output[window][pair] gives a trajectory through time.

But, this doesn't agree with the time windowed stats draft #2948, so would be good to sort out now (@petrelharp). Maybe the pair coalescence stuff doesn't need to have the same output layout.

Copy link

codecov bot commented Aug 6, 2024

Codecov Report

Attention: Patch coverage is 98.34711% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.66%. Comparing base (1f28c11) to head (22a8f66).

Files Patch % Lines
c/tskit/trees.c 97.70% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2975      +/-   ##
==========================================
- Coverage   89.90%   89.66%   -0.24%     
==========================================
  Files          28       29       +1     
  Lines       23148    30435    +7287     
  Branches     4693     5905    +1212     
==========================================
+ Hits        20811    27290    +6479     
- Misses       1302     1799     +497     
- Partials     1035     1346     +311     
Flag Coverage Δ
c-tests 86.36% <97.80%> (+0.05%) ⬆️
lwt-tests 80.78% <ø> (ø)
python-c-tests 88.81% <100.00%> (?)
python-tests 99.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
c/tskit/core.c 95.63% <100.00%> (ø)
c/tskit/core.h 100.00% <ø> (ø)
python/_tskitmodule.c 88.81% <100.00%> (ø)
python/tskit/trees.py 98.75% <100.00%> (ø)
c/tskit/trees.c 90.68% <97.70%> (+0.12%) ⬆️

@nspope
Copy link
Contributor Author

nspope commented Aug 6, 2024

More numpy 2 issues in the CI:

tests/test_file_format.py:34: in <module>
    import h5py
../../.local/lib/python3.10/site-packages/h5py/__init__.py:25: in <module>
    from . import _errors
h5py/_errors.pyx:1: in init h5py._errors
    ???
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I'm guessing this is because the circleci workflow is using h5py 3.9.0, when we need 3.11.0 for numpy 2 compatibility. Why this didn't get hit yesterday on #2932 is mysterious ...

@nspope nspope marked this pull request as ready for review August 6, 2024 22:42
@nspope
Copy link
Contributor Author

nspope commented Aug 6, 2024

Ready for a look whenever @jeromekelleher . Not sure where I'm going wrong with "codecov/project"?

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very pretty, great stuff! I just have one comment about division by zero.

Regarding the order of dimensions, I don't really have an opinion here and happy to go with what you and @petrelharp think. I'm happy to merge this and keep plugging forward while we figure it out.

c/tskit/trees.c Outdated Show resolved Hide resolved
@nspope nspope force-pushed the pair-coalescence-use-reduction branch from 4135408 to 22a8f66 Compare August 7, 2024 16:52
@nspope
Copy link
Contributor Author

nspope commented Aug 7, 2024

Let's get this in then-- thanks!

@jeromekelleher jeromekelleher added the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Aug 7, 2024
@mergify mergify bot merged commit 0403de0 into tskit-dev:main Aug 7, 2024
20 of 21 checks passed
@mergify mergify bot removed the AUTOMERGE-REQUESTED Ask Mergify to merge this PR label Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants