DM-42870: Add ModelRebuilder #9

Merged: 1 commit merged into main on Apr 12, 2024
Conversation

@taranu (Collaborator) commented Mar 28, 2024

No description provided.

@fred3m left a comment:

I've seen a lot of different solutions for organizing catalog data for plotting, but I really like your solution using dataclasses, and how clean it makes your notebook.

Comment on lines +145 to +178:

```python
def get_is_extended(self) -> np.ndarray:
    return self.table["refExtendedness"] >= 0.5
```

@fred3m: I'm surprised that you're still using the current extendedness model, or is this a different extendedness column that MPF creates?

@taranu (author): No, it's propagated from the objectTable_tract. If you mean why use that column instead of refSizeExtendedness, I just haven't updated the downstream tasks to do that yet.

Comment on lines 191 to 193:

```python
# This is a bit of an ugly hack, refactor this method?
kwargs_table = {name: getattr(table, name) for name in table.__pydantic_fields__ if name != "table"}
table_within = type(table)(table=table.table[within], **kwargs_table)
```

@fred3m: Yeah, this is a nice hack, but I wonder if just creating an updatedTable method that returns a new instance of the ObjectTable with a new table would be cleaner.

@taranu (author): I think I will make this a method on the base class, with these same lines as a default implementation, and let subclasses override it if the kwargs_table hack doesn't work.

Also, it turns out slicing an astropy table makes a copy... always? I can't see a way to get a view. Oh well, it's not important for this ticket.
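The base-class method being discussed could look roughly like the following sketch. It uses standard-library dataclasses in place of pydantic purely for illustration, and the names (ObjectTable, updated, band) are hypothetical, not taken from the PR:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ObjectTable:
    # `table` stands in for the wrapped catalog; a plain dict here
    table: dict
    band: str = "r"

    def updated(self, table_new: dict) -> "ObjectTable":
        """Return a new instance of the same (sub)class with `table`
        replaced, carrying every other field over unchanged."""
        kwargs = {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "table"
        }
        return type(self)(table=table_new, **kwargs)

obj = ObjectTable(table={"x": [1, 2, 3]}, band="i")
obj_within = obj.updated({"x": [1, 2]})
```

Because the constructor call goes through `type(self)`, subclasses get the copying behaviour for free and only need to override `updated` if they add fields that can't be passed through as keyword arguments.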

Comment on the following lines:

```python
fit_results: astropy.table.Table = pydantic.Field(doc="Multiprofit model fit results")
task_fit: MultiProFitSourceTask = pydantic.Field(doc="The task")

@cached_property
```

@fred3m: Thanks for introducing me to the @cached_property decorator!

@taranu (author): It's great! It improves readability and performance, and makes properties practically immutable. I don't know if I'd be bold enough to use it on non-frozen dataclasses or other easily mutable classes, though.
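A minimal illustration of the pattern under discussion: functools.cached_property works even on a frozen dataclass, because it stores the computed value directly in the instance `__dict__` rather than going through the (blocked) `__setattr__`. The Circle class here is a toy example, not code from the PR:

```python
from dataclasses import dataclass
from functools import cached_property

calls = []  # track how many times the property body actually runs

@dataclass(frozen=True)
class Circle:
    radius: float

    @cached_property
    def area(self) -> float:
        # Runs once per instance; the result is written into the
        # instance __dict__, bypassing the frozen __setattr__.
        calls.append(self.radius)
        return 3.141592653589793 * self.radius**2

c = Circle(2.0)
a1 = c.area
a2 = c.area  # served from the cache; the body does not run again
```

The one caveat is that this relies on instances having a `__dict__`, so it does not combine with `slots=True`.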

Comment on the following lines:

```python
    parameter values.
    """

    catexps: list[CatalogExposurePsfs] = pydantic.Field(doc="List of catalog-exposure-psf objects")
```

@fred3m: Maybe add a little more info about what a catalog-exposure-psf object is, perhaps in the docstring?

Comment on lines +104 to +108:

```python
# I have no idea what to put for initInputs.
# quantum.initInputs looks wrong - the values can be lists
# quantumgraph.initInputRefs(taskDef) returns a list of DatasetRefs...
# ... but I'm not sure how to map that to connection names?
task: CoaddMultibandFitTask = taskDef.taskClass(config=config, initInputs={})
```

@fred3m: Yeah, I haven't dived deep enough into the middleware to know what this does, but I guess you don't need it?

@taranu (author): It works as is, but when I originally looked into it, I couldn't work out under which circumstances I might add a connection to the task and break this line. I guess we'll find out, and I'll leave this comment here to jog my memory if that happens.

Comment on the following lines:

```python
            catexps=catexps, task_fit=task_fit, catalog_multi=inputs["cat_ref"], fit_results=cat_output,
        )

    def make_config_data(self):
```

@fred3m: Please add docstrings for the methods in this class.

Comment on lines +152 to +188:

```python
offsets[g2f.CentroidXParameterD] = -offset_cen
offsets[g2f.CentroidYParameterD] = -offset_cen
```

@fred3m: So the models are always square? Is there any chance that will change in the future?

@taranu (author): Ah, no, these aren't bounding-box offsets but coordinate-system offsets. MultiProFit defines the bottom-left corner of the bottom-left pixel as (0, 0)*, whereas it's (-0.5, -0.5) in the stack. I left that parameter general, but I found it even less likely that someone would need to make it asymmetric than to give it a value other than 0 or 0.5.

*I might regret this choice in the future, but it's a bit late to change now.
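The convention difference described above amounts to a constant half-pixel shift of each centroid coordinate. A sketch of the conversion, where OFFSET_CEN and the function names are illustrative rather than taken from MultiProFit:

```python
# Offset between the two pixel-origin conventions described above:
# the stack places the bottom-left corner of the bottom-left pixel at
# (-0.5, -0.5), i.e. pixel centers on integers, while MultiProFit
# places that corner at (0, 0), i.e. pixel centers on half-integers.
OFFSET_CEN = 0.5

def stack_to_multiprofit(x: float, y: float) -> tuple[float, float]:
    """Convert a stack centroid to MultiProFit's convention."""
    return x + OFFSET_CEN, y + OFFSET_CEN

def multiprofit_to_stack(x: float, y: float) -> tuple[float, float]:
    """Inverse conversion; matches the sign of the -offset_cen
    applied in the quoted PR snippet."""
    return x - OFFSET_CEN, y - OFFSET_CEN
```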

Comment on the following lines:

```python
FigureAxes = tuple[Figure, Axes]


@dataclass(frozen=True, config=pydantic.ConfigDict(arbitrary_types_allowed=True))
```

@timj (Member): Naive question, but what's the gain in using a pydantic dataclass with lots of explicit pydantic code over using pydantic.BaseModel directly? The @dataclass decorator is confusing to me at first glance when this is clearly a data model.

@taranu (author): It's really just force of habit, as I started using pydantic dataclasses to replace dataclasses.dataclass, which is what they were made for. I don't think I've ever actually extended BaseModel.

Looking into this, there is some discussion of the small differences in behaviour between dataclass and BaseModel in pydantic/pydantic#710. I don't think any of those issues have impacted me yet, so I'm uncertain as to whether it's best to use one or the other. In the same way that using dataclass over BaseModel looks strange to someone familiar with pydantic, dataclass should look familiar to anyone who only uses standard-library dataclasses (though I suppose by that argument I should use from pydantic import Field rather than pydantic.Field).

@timj (Member) commented Apr 11, 2024: My main comment would be that if you want to switch to pydantic for new code, it would be better to use BaseModel; the ConfigDict you are using is also non-standard for dataclasses. The pydantic dataclasses support exists to ease migration of existing code, and is not something we should use for new code unless we are worried that we might drop pydantic in the future. cc @TallJimbo

@taranu (author): Fair enough. I changed to BaseModel and it was painless.
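The migration agreed on above is typically mechanical: the decorator's config keyword moves into model_config, and frozen becomes a config entry rather than a decorator argument. A hedged before/after sketch, assuming pydantic v2; the class and field names are illustrative, not from the PR (note the PR's `doc=` keyword is replaced here with the standard `description=`):

```python
import pydantic
from pydantic.dataclasses import dataclass

# Before: a pydantic dataclass carrying an explicit ConfigDict
@dataclass(frozen=True, config=pydantic.ConfigDict(arbitrary_types_allowed=True))
class PlotSpecOld:
    name: str = pydantic.Field(description="Plot name")

# After: the equivalent BaseModel; the config moves into model_config
class PlotSpec(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(frozen=True, arbitrary_types_allowed=True)

    name: str = pydantic.Field(description="Plot name")

spec = PlotSpec(name="chi")
```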

@taranu merged commit 92eda9c into main on Apr 12, 2024 (1 check passed).

3 participants