Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get code up to date #37

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Get code up to date #37

wants to merge 17 commits into from

Conversation

adampbeardsley
Copy link

@adampbeardsley adampbeardsley commented Jun 28, 2022

This is an all-in-one PR to grab changes made over the last couple years. Some things included here:

  • py3 support
  • equinox fix
  • fits writer
  • Hari's new modules

This supersedes #20 and #36.

Copy link
Author

@adampbeardsley adampbeardsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR looks pretty straightforward. Just a few cleanup tasks, and a couple quick discussions to have.

Comment on lines +1050 to +1055
#try:
# bf_romein.execute(udata, gdata)
#except NameError:
# bf_romein = Romein()
# bf_romein.init(self.locs, gphases, self.grid_size, polmajor=False)
# bf_romein.execute(udata, gdata)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove blocks of commented code.

Suggested change
#try:
# bf_romein.execute(udata, gdata)
#except NameError:
# bf_romein = Romein()
# bf_romein.init(self.locs, gphases, self.grid_size, polmajor=False)
# bf_romein.execute(udata, gdata)

Comment on lines +1149 to +1154
#bifrost.map(
# "a(i,j,k,l) += (b(i,j,k,l/2) * b(i,j,k,l%2).conj())",
# {"a": autocorrs, "b": udata, "t": self.ntime_gulp},
# axis_names=("i", "j", "k", "l"),
# shape=(self.ntime_gulp, nchan, nstand, npol ** 2),
#)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blocks of commented code.

Suggested change
#bifrost.map(
# "a(i,j,k,l) += (b(i,j,k,l/2) * b(i,j,k,l%2).conj())",
# {"a": autocorrs, "b": udata, "t": self.ntime_gulp},
# axis_names=("i", "j", "k", "l"),
# shape=(self.ntime_gulp, nchan, nstand, npol ** 2),
#)

Comment on lines +1166 to +1171
#bifrost.map(
# "a(i,j,p,k,l) += b(0,i,j,p/2,k,l)*b(0,i,j,p%2,k,l).conj()",
# {"a": crosspol, "b": gdata},
# axis_names=("i", "j", "p", "k", "l"),
# shape=(self.ntime_gulp, nchan, npol ** 2, self.grid_size, self.grid_size),
#)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blocks of commented code

Suggested change
#bifrost.map(
# "a(i,j,p,k,l) += b(0,i,j,p/2,k,l)*b(0,i,j,p%2,k,l).conj()",
# {"a": crosspol, "b": gdata},
# axis_names=("i", "j", "p", "k", "l"),
# shape=(self.ntime_gulp, nchan, npol ** 2, self.grid_size, self.grid_size),
#)

Comment on lines +1198 to +1205
#try:
# bf_romein_autocorr.execute(autocorrs_av, autocorr_g)
#except NameError:
# bf_romein_autocorr = Romein()
# bf_romein_autocorr.init(
# autocorr_lo, autocorr_il, self.grid_size, polmajor=False
# )
# bf_romein_autocorr.execute(autocorrs_av, autocorr_g)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blocks of commented code

Suggested change
#try:
# bf_romein_autocorr.execute(autocorrs_av, autocorr_g)
#except NameError:
# bf_romein_autocorr = Romein()
# bf_romein_autocorr.init(
# autocorr_lo, autocorr_il, self.grid_size, polmajor=False
# )
# bf_romein_autocorr.execute(autocorrs_av, autocorr_g)

"--duration",
type=int,
default=3600,
help="Duration of EPIC (seconds)",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help="Duration of EPIC (seconds)",
help="Total duration of EPIC observation (seconds)",

Comment on lines -2168 to +2213
if args.removeautocorrs:
raise NotImplementedError(
"Removing autocorrelations is not yet properly implemented."
)
#if args.removeautocorrs:
# raise NotImplementedError(
# "Removing autocorrelations is not yet properly implemented."
# )
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been some changes to the autocorr code, but has it been tested? As far as I know we don't run with this option, so I'm not sure this NotImplementedError should be removed yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure if the auto-correlation removal ever worked.

Comment on lines +750 to +751
self.iring.resize(igulp_size, buffer_factor= 8)
self.oring.resize(ogulp_size, buffer_factor= 128) # , obuf_size)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @jaycedowell had some concerns about this. IIRC, this looks like we're making large buffers to account for a slow startup, when really we should be aiming to fix the startup issue instead. I don't have a solution, just flagging this for discussion.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that buffer_factor of 128 is really large. I had done some simple benchmarking and it looked like the problem was with the GenerateLocations call in MOFFCorrelatorOp.main being slow. I don't know if that was all of it but it is a factor.

For solutions we could think about something like compute-and-cache. The first time a particular frequency range is encountered we take the hit on computing and cache the results to disk. On subsequent encounters with that frequency setup we use what is on disk.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense. Perhaps we could also reorder so that even if we need to generate the locations it's done before the buffer is allocated/used, so we don't need to be juggling data while we're still setting up.

self.iring.resize(igulp_size)
self.oring.resize(ogulp_size, buffer_factor=5)
self.iring.resize(igulp_size, buffer_factor=128)
self.oring.resize(ogulp_size, buffer_factor=256)
Copy link

@jaycedowell jaycedowell Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is another large buffer in MOFFCorrelatorOp. The buffer_factor of 256 on the output ring could be one of the big things driving the GPU memory utilization.

cores = [0, 2, 3, 4, 5, 6, 7]
gpus = [0, 0, 0, 0, 0, 0, 0]
cores = [3, 4, 5, 6, 7]
gpus = [0, 0, 0, 0, 0]
Copy link

@jaycedowell jaycedowell Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Control of cores and gpus should probably be set by a command line flag. gpus could probably be reduced to just a single gpu argument that is used for everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants