Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support distributed read for other acquisition files #30

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

tristpinsm
Copy link
Contributor

@tristpinsm tristpinsm commented Jun 21, 2022

I made these changes in order to be able to read CalibrationGainData files into distributed containers.

@tristpinsm
Copy link
Contributor Author

the other part of the unicode index map fix is in radiocosmology/caput#201

@tristpinsm tristpinsm requested a review from jrs65 June 21, 2022 00:14
@jrs65
Copy link
Contributor

jrs65 commented Jun 21, 2022

Ah interesting. I was facing similar issues with the HFB code. In the end I opted to go a slightly different route and write a draco style container that could read the archived HFB code (chime-experiment/ch_pipeline#99). I figured I could probably do the same for many of the other formats to (e.g. CorrData).

I'll have a look at this later. It is a massive pain to need to be maintaining multiple implementations for distributed IO, so we should see if there are reasons to keep it separate, and if we can combine any of it together.

@tristpinsm
Copy link
Contributor Author

Ah interesting. I was facing similar issues with the HFB code. In the end I opted to go a slightly different route and write a draco style container that could read the archived HFB code (chime-experiment/ch_pipeline#99). I figured I could probably do the same for many of the other formats to (e.g. CorrData).

Yeah I thought about taking this approach, but decided I didn't want to tackle changing the whole framework, especially because the CorrData code is so complicated. In the end doing it this way took more effort than I anticipated anyways...

I'll have a look at this later. It is a massive pain to need to be maintaining multiple implementations for distributed IO, so we should see if there are reasons to keep it separate, and if we can combine any of it together.

I agree that it makes no sense the way the code is organised at the moment, and it would be much better if everything used a common approach. It might be painful to try and move things over seamlessly and support all the old formats, but just making containers for the current acquisition formats would be an improvement.

**kwargs,
)

# Datasets that we should convert into distribute ones
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this routine is moving to BaseData maybe this list should be a class attribute so it can easily be modified by derived classes.


Examples
--------
Examples are analogous to those of :meth:`CorrData.from_acq_h5`.

"""

if distributed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of think classes should need to opt in to distributed support. See comments below for why,

datasets=None,
out_group=None,
**kwargs,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the sake of people maintaining the module there should be a bit more clarity about what these various base methods are for. One thing that would help would be calling this method something else, e.g. _from_acq_h5_single, or somesuch that makes it clearer how this differs. It might also be good to put some extensive comments explaining the relationship between everything.

comm = MPI.COMM_WORLD

# Determine the total number of frequencies
nfreq = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems problematic as not all datasets have a frequency axis, e.g. doing WeatherData.from_acq_h5(..., distributed=True) and I think this would crash on you complaining about the absence of a frequency axis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option would be to allow subclasses to name an axis to distributed the read over. If it's present this routine uses it, if it's not present then you can't do a distributed read and it exits elegantly.

@tristpinsm tristpinsm force-pushed the tpm/dist_read branch 2 times, most recently from f0b982c to 4f94fec Compare August 31, 2022 00:07
@tristpinsm
Copy link
Contributor Author

ok, I think I've addressed your comments @jrs65 (thanks!). It feels like making this work is a hack, so I'm not sure what the right path forward is. I guess getting all of the CorrData code integrated with draco will not be trivial, but probably moving the other containers in here to that model would be straightforward. And this PR only addresses the latter...

@tristpinsm tristpinsm changed the title Support distributed read for other acquisition files [WIP] Support distributed read for other acquisition files May 4, 2023
@tristpinsm
Copy link
Contributor Author

It's not clear this is the way forward to enable distributed reading of other acquisition files. There is a similar problem with the HFB files, so we can shelf this for now and see how the HFB approach pans out and consider generalising it to all non-visibility acquisition files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants