Baseline use cases #36
I agree that the simpler the output, the more it will be adopted by clients.

Why Mango? With Mango, you can annotate with a generic measure or you can use a DataModel (in my examples I use both: for position and photometry I use a DM, and for the other quantities GenericMeasure). Note that I am not a fan of "everything in DM", because my feeling is that DMs are more complex to manage/parse – but a minimum, why not? (It may be needed for photometry, and always with the condition that nothing is required.)

About complex structures (like time series): for me they are a bonus – possible with Mango!

You have an astropy version that parses VODML instances? That is good news – it could be used to demonstrate the benefit of the current work – could we imagine a possible Mango extension :-) (limited in a first step to the Mango core)? |
On Fri, Apr 16, 2021 at 09:25:12AM -0700, gilleslandais wrote:
I agree that the simpler the output, the more it will be adopted
by clients. However, I don't think that just gathering a quantity
with its error is enough – in the example I sent previously
(https://github.com/ivoa/dm-usecases/wiki/mango), I listed some
examples of tables having columns with associated errors, flags,
references...
Yeah, but as I wrote there: we need concrete *use* cases to figure
out if and how to annotate such columns. Annotation is there for
clients, and if they can't do anything with the annotation, it's
worse than useless (because it may lead to confusion when, at a later
time, an actual use case is identified; cf. caproles).
Let me try for these flag-like columns:
(1) Error-like flags (e.g., "uncertain"). These should probably be
dealt with in Meas; doing anything machine-readably with them might,
however, be so hard that it may not be worth it.
(2) For anything else, the only real use case I can imagine would be
"When showing a subset of columns, show and hide this set of columns
as one". This sounds like "presentation-oriented" annotation, for
which we so far have no DM. Perhaps there should be one?
What other cases are there for these groups? What purpose would the
machine-readable annotation have for them?
Oh, and also note that for VizieR-internal use (where interoperable
clients can just look the other way) you can always add private
annotation like, perhaps:
```
<INSTANCE dmrole="viz-pres:ColGroup">
<ATTRIBUTE dmrole="header" value="All about V mag"/>
<ATTRIBUTE dmrole="group_leader" ref="vmag"/>
<ATTRIBUTE dmrole="group_members">
<COLLECTION>
<ITEM ref="vmag_quality"/>
<ITEM ref="vmag_source"/>
<ITEM ref="vmag_err"/>
<ITEM ref="vmag_flag"/>
</COLLECTION>
</ATTRIBUTE>
</INSTANCE>
```
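A VizieR-internal client could consume such a block with nothing more than a stock XML parser. A minimal sketch (the `viz-pres:ColGroup` roles are the hypothetical ones invented above, not any standard vocabulary):

```python
import xml.etree.ElementTree as ET

# The hypothetical private annotation from above, embedded as a string.
ANNOTATION = """
<INSTANCE dmrole="viz-pres:ColGroup">
  <ATTRIBUTE dmrole="header" value="All about V mag"/>
  <ATTRIBUTE dmrole="group_leader" ref="vmag"/>
  <ATTRIBUTE dmrole="group_members">
    <COLLECTION>
      <ITEM ref="vmag_quality"/>
      <ITEM ref="vmag_source"/>
      <ITEM ref="vmag_err"/>
      <ITEM ref="vmag_flag"/>
    </COLLECTION>
  </ATTRIBUTE>
</INSTANCE>
"""

def read_col_group(xml_text):
    """Return (leader ref, member refs) for a viz-pres:ColGroup instance."""
    root = ET.fromstring(xml_text)
    leader = root.find("ATTRIBUTE[@dmrole='group_leader']").get("ref")
    members = [item.get("ref")
               for item in root.findall(".//COLLECTION/ITEM")]
    return leader, members

leader, members = read_col_group(ANNOTATION)
```

A client that doesn't know `viz-pres` simply never asks for that role and is unaffected, which is the point of private annotation.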
why Mango?
For me, the most important point is that Mango is capable of exposing
measures – measures which can be composed of a column and associated
columns like errors, flags, etc. (and each measure, even if it is a
collection of columns, is defined with a UCD+semantics pair –
easily understandable by clients). It is well adapted for VizieR
tables – and generic enough. Is it also adapted for other
data centers?
Well, if, as you say, machines should be aware of these things when
processing a measurement, then I argue whatever it is should go into
Meas -- if it's needed interoperably at all.
With Mango, you can annotate with a generic measure or you can use a
DataModel (in my examples I use both: for position and
photometry I use a DM, and for the other quantities GenericMeasure).
Yes -- except I still don't understand what you win by saying "this
is a special measurement that can have photometry metadata" over just
attaching the PhotDM metadata when it's there to a normal
Measurement. My usual question: what can a machine *do* with this
extra annotation that it cannot do with just the PhotDM annotation?
For me the question is more: what are the (trusted) DMs expected in Mango?
You mean: what DMs can a client rely on? I am totally sure that no
client can ever rely on any particular annotation in everything it
sees.
As I point out in https://github.com/msdemlei/astropy (look at the
README), it's a lot better if clients look for annotation they need
when they need it (and fail in a defined way when it's not there).
And of course you can convey the notion of, say, time series in
that scheme rather plainly (look at the time-series, md-proposal).
About complex structure (like timeseries): for me it is a bonus –
possible with Mango!
...as with anything else currently discussed.
You have an astropy version that parses VODML instances? That is good
Well, not the gruesome 25-element proposal, but something with six
new elements (MODEL and children should go, too: they're just
INSTANCE-s, too). And yes, that's what
https://github.com/msdemlei/astropy is. Again, see its README for
how this is intended to be used.
news! It could be used to demonstrate the benefit of the current
work – could we imagine a possible Mango extension :-) (limited in
a first step to the Mango core)?
Since I have doubts there is such a benefit, I'd love to look into
this. What kind of functionality would you be interested in?
|
I believe that the use-case agenda hasn't changed:
1. Showing whether MCT/PhotDM cover our need
2. Adopting a mapping strategy
3. Adopting a mapping syntax

Honestly, IMO putting forward a library prototype does not help that much for the discussion. I won't rehash my argument in this post, but I'll ask the real question, which relates to the exact scope of Markus's proposal. The annotation block starts with a list of the models on which the data are mapped.
This clearly shows that technically both approaches can live together.
|
On Mon, Apr 19, 2021 at 05:57:02AM -0700, Laurent MICHEL wrote:
I believe that the use-case agenda hasn't changed:
1. Showing whether MCT/PhotDM cover our need
2. Adopting a mapping strategy
3. Adopting a mapping syntax
Honestly, IMO putting forward a library prototype does not help
that much for the discussion.
Well, it would for me; as I said, I cannot see at this point how
ordinary clients will consume these complex data models *in the VO*
(which is a slow and dirty place of long-lived services and clients)
to satisfy these simple use cases.
But if I get shown plausible ways how they can do that with a
reasonable effort, I'll stop whining immediately.
By the way, writing such a thing is effort well spent: After all,
we're doing all this for the consumers, and seeing how they'll do
the consuming in common cases is very informative all around (let me
also mention that getting an impression of the ease of implementation
is a good thing, too).
The annotation block starts with a list of the models on which data are mapped.
- If that list contains `Meas` and `Coord`, clients will understand they can find both `Meas` and `Coord` instances in the mapping block.
- If that list contains `Meas`, `Coord` and `Cube`, clients will understand they can find `Meas`, `Coord` and `Cube` instances in the mapping.
Oh... you mean, clients should go for the MODEL declarations?
Frankly, I doubt that's a good idea, because a single model may
contain all kinds of classes, and a model declaration as such doesn't
tell you which of them might be used (since I've just played with it
today, see ProvDM: what would a client learn from knowing there's
ProvDM annotation in a VOTable?).
Conversely, I'd be interested in your opinions about the API I'm
proposing over at https://github.com/msdemlei/astropy
(get_annotations(dmtype), iter_annotations()). Are you saying
something like that won't work for MANGO? Or just isn't suitable?
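For concreteness, here is a rough sketch of the access pattern that API implies, with a mock table standing in for the real astropy extension; only the `get_annotations`/`iter_annotations` names come from the proposal, everything else is invented for illustration:

```python
class MockAnnotatedTable:
    """Stand-in for a VOTable with attached DM annotations."""

    def __init__(self, annotations):
        # annotations: list of (dmtype, instance) pairs
        self._annotations = annotations

    def get_annotations(self, dmtype):
        """Return all annotation instances of the given dmtype."""
        return [inst for t, inst in self._annotations if t == dmtype]

    def iter_annotations(self):
        """Iterate over all (dmtype, instance) pairs."""
        yield from self._annotations


# A client asks for exactly the annotation it needs and fails in a
# defined way (an empty list) when that annotation is absent:
table = MockAnnotatedTable([
    ("meas:Measurement", {"value": "vmag", "error": "vmag_err"}),
])

measurements = table.get_annotations("meas:Measurement")
positions = table.get_annotations("coords:LonLatPoint")  # not annotated
```

The point of the pattern: no client ever needs to know the full model list up front; it probes for the classes relevant to its current task.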
- They are free to ignore classes of one of these models (likely `Cube` for Markus's client)
Well, I'd put it the other way round: They'll look at the annotation
they need for a given task; and of course "my" client would look at
cube annotation, too, when figuring out what to plot against what (say).
This clearly shows that technically both approaches can live together.
I come to the conclusion that the real question asked by @msdemlei is whether the IVOA has to get rid of integrated models (CUBE, MANGO, ..) or not.
This is *related* to the question I'm asking, yes. But I'd not worry
about God models (note: Cube doesn't need to be one; if it just points
to the dependent and independent axes and leaves the rest to other
DMs, it'll be a very compact and tidy DM) if there is a clear way in
which to satisfy these basic use cases in spite of them, and with a
clear vision of what happens if we do have a major version change in
one of our fundamental DMs.
- If YES: this must be set as a headline of the workshop.
- If NO: let's consider that both are valid and let's refine the syntax to facilitate the job for all.
This question is not only a matter of annotation
I also suspect that entangled DMs cannot be fixed in the annotation,
which is where the thread last October came from.
But perhaps that is just me -- which is why some sample
implementation would help me a lot. If entangled models can be done
in a way that doesn't impact client operations for the use cases
presented, there's not much I'm worried about (but I'm still rather
convinced programmers like Lego bricks better than pre-assembled
models).
So, yes, I *would* propose "Bricks or Death Stars" as the workshop's
headline.
|
Markus,
Have you looked at the implementations done so far? I feel that a lot of
these questions are addressed there.
Again, and that is important: Clients must not need to know a full data
model (e.g., "timeseries") to do that. The rationale behind that is that
client writers will curse us if they will have to re-implement "find an
error for a value" again and again just because there's a new product type.
And the data providers will curse us if they cannot annotate a value/error
pair just because it's, say, a redshift and there's no "data model" for
that.
There is no 're-implement "find an error for a value" again and again'.
With the model re-use, if you understand meas:Measure, you know where the
errors are in any model that uses it.
These examples become rather too trivial to be interesting...
Examples (all assuming "if present in the table", of course):
If I see an ra, figure out the corresponding dec
these are part of the same instance, so if you've found the instance
containing RA, you already have the DEC.
```
pos = doc.find_instances(LonLatPoint)[0]
ra = pos.longitude
dec = pos.latitude
```
If I have an ra/dec, figure out the reference system, the reference
position, the epoch
```
frame = pos.coord_sys.frame.space_ref_frame
epoch = pos.coord_sys.frame.equinox
```
If I have a position, figure out an associated proper motion
I know Astropy combines the position and proper motion elements together in
the same object..
So far, the models define no specific association between them. With
'Mango', you could have a Source with Position and ProperMotion as
associated properties.
In the rama package, it converts meas:Position to astropy SkyCoord whenever
possible, so if there is an associated proper motion element as well, it
could be incorporated into that instance.
If I have any sort of quantity, figure out the/an error (and, eventually
its sort, perhaps one day distribution parameters, etc.)
I'm assuming you don't mean the ivoa:Quantity.. which is not something that
would be grabbed individually
```
prop = doc.find_instance(<any Measure type>)
err_type = type(prop.error.stat_error)
```
If I have a flux or magnitude, figure out the band it's in, perhaps a
zeropoint, etc.
This is a bit complicated by the fact that PhotDM is not tied into the
other models, so there is no coordinate system which includes photometry
metadata. That was a target of the SpectralDM work. Mango has a stub in
place to work with..
```
prop = doc.find_instance(Photometry)
band = prop.coord.coord_sys.frame.name
zeropoint = prop.coord.coord_sys.frame.zeroPointFlux
```
Re: Prototype library..
* Omar's rama package is a prototype library, available on GitHub; it uses
the standard VOTable parser to interpret the VOTable, and a custom parser
for the annotation.
* it auto-converts Position and Time instances to astropy SkyCoord and
Time objects when possible.
* presents an API to the user (see above commands; and any of the
example implementation pages)
* as I mentioned in your other thread, has a similar front-end to
yours.
* often following the model class structure
* but can be extended with helpers for convenience.
* eg implemented in rama: conversion to astropy, plotters,
shortcuts from cube axes -> data
* I've always expected it to become publicly available once the
model/annotation questions are resolved.. the way things are going,
probably as a PyVO package.
One of the questions I've had with the 'just search for what you are
interested in' approach is illustrated in the standard_properties case:
* Table with Master Source properties
* Table with Detections (also Source instances)
If I open that file and do
```
sources = doc.find_instances(Source)
```
I get ALL the Source instances.. master and detections.. there is very
little that I can reasonably do with that.
Same for many of the base elements:
```
pos = doc.find_instances(Position)
```
returns all the Source positions, Detection positions, Target Position,
possibly the Reference Position of the satellite, etc.. any Position in the
file.
The approach seems too loose for anything but the simplest cases.
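One conceivable way out (purely a sketch; this is not the rama API or the astropy proposal, and all names here are invented) is to let queries be scoped to a table rather than the whole file:

```python
# Hypothetical multi-table document: each annotation remembers which
# table it belongs to, so a client can scope its query instead of
# getting every Source in the file at once.
annotations = [
    {"dmtype": "mango:Source", "table": "master",     "ref": "src_m"},
    {"dmtype": "mango:Source", "table": "detections", "ref": "src_d1"},
    {"dmtype": "mango:Source", "table": "detections", "ref": "src_d2"},
]

def find_instances(dmtype, table=None):
    """Find annotation instances, optionally restricted to one table."""
    return [a for a in annotations
            if a["dmtype"] == dmtype
            and (table is None or a["table"] == table)]

all_sources = find_instances("mango:Source")            # master + detections
master_only = find_instances("mango:Source", "master")  # scoped query
```

Whether scoping by table is enough, or whether one also needs roles ("master" vs. "detection"), is exactly the open question this case raises.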
|
On Tue, Apr 20, 2021 at 03:22:57PM -0700, Mark Cresitello-Dittmar wrote:
Have you looked at the implementations done so far? I feel that a lot of
these questions are addressed there.
Presumably not nearly the latest ones, more like Mango of a year ago
and Omar's code as of Shanghai or so, and I have to admit I have lost
track of what goes on where.
Perhaps you could point us to the current version(s) of the packages
you have mentioned? Having a concise overview over where to look (in
which branches where applicable) is a good thing with a view to the
workshop, too. Of course, I'd be particularly grateful if brief docs
on solving the mentioned use cases (which clearly are not trivial, as
evinced by the fact that we still can't do them 20 years in the VO)
could easily be located.
|
@msdemlei wrote
Cube is much more than that, as far as I remember. Dependent and independent axes fit well with the TimeSeries and Spectrum data product types, but in N-D cubes all axes but one are independent, and in an event list everything is independent.
Of course, if you change one of the models, the annotation will change and clients will have to adapt. But this is true whether the models are entangled or not. Or there is something I do not understand |
On Wed, Apr 21, 2021 at 3:57 AM msdemlei ***@***.***> wrote:
On Tue, Apr 20, 2021 at 03:22:57PM -0700, Mark Cresitello-Dittmar wrote:
> Have you looked at the implementations done so far? I feel that a lot of
> these questions are addressed there.
Presumably not nearly the latest ones, more like Mango of a year ago
and Omar's code as of Shanghai or so, and I have to admit I have lost
track of what goes on where.
I was referring to these workshop cases...
Perhaps you could point us to the current version(s) of the packages
you have mentioned? Having a concise overview over where to look (in
which branches where applicable) is a good thing with a view to the
workshop, too.
Each of my implementation pages has a summary of the packages used, with
pointers.
e.g.,
https://github.com/ivoa/dm-usecases/tree/main/usecases/time-series/mcd-implementation
Of course, I'd be particularly grateful if brief docs
on solving the mentioned use cases (which clearly are not trivial, as
evinced by the fact that we still can't do them 20 years in the VO)
could easily be located.
Perhaps my point is that with the data properly annotated to the current
models, these operations become trivial.
With the 'Death Star' approach, there is no 'figuring out' where something
is, you simply follow the path (modulo API helpers).
With the 'Bricks' approach, you have Coordinate A using FIELD x.. you have
to find Measure B also using FIELD x.
I'd be happy to put a page together that goes through that set of actions
specifically, but I'm sure each is covered within the various cases (modulo
position-propermotion maybe).
|
On Wed, Apr 21, 2021 at 06:37:48AM -0700, Mark Cresitello-Dittmar wrote:
> Of course, I'd be particularly grateful if brief docs
> on solving the mentioned use cases (which clearly are not trivial, as
> evinced by the fact that we still can't do them 20 years in the VO)
> could easily be located.
>
Perhaps my point is that with the data properly annotated to the current
models, these operations become trivial.
With the 'Death Star' approach, there is no 'figuring out' where something
is, you simply follow the path (modulo API helpers).
But that's different each time depending on whether you have a time
series, a catalogue, or yet something else, right? So, a client that
doesn't know SparseCube but just PhotDM would be out of luck when
trying to work out the, say, band of the magnitudes in
https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/mcd-implementation/gavo_summary.md
right?
It is this kind of "new code per high-level DM" thing that I'd really
love to avoid, because it will be a hard sell on our clients (I'll
mention again I'm only here because we failed to sell STC1). And
it'll be an even tougher sell once we change the first basic DM and
all these high-level DMs start to break.
I'd be happy to put a page together that goes through that set of actions
specifically, but I'm sure each is covered within the various cases (modulo
position-propermotion maybe).
Yeah, that would help me a lot. Just putting up the various steps
I'll have to do until the code on the gavo_summary.md actually runs
would give me a much better idea of what all this entails.
But, frankly, what I'm really after is code that satisfies the
precise use cases in the original issue: "use annotation X" if it's
there, where X is Meas, Coord, or Phot -- and do that regardless of
what "sort" of data one is dealing with.
|
On Wed, Apr 21, 2021 at 06:13:10AM -0700, Bonnarel wrote:
@msdemlei wrote
Of course, if you change one of the models, the annotation will
change and clients will have to adapt. But this is true whether the
models are entangled or not. Or there is something I do not
understand
The question, again, is: When you change, say, Meas to Meas2, will
this only break uses of Meas, or will it take Coord, Dataset, Cube,
Spectrum and everything else with it?
With entangled models, Meas2 will immediately mean Coord2, Dataset2,
Cube2, Spectrum2, and so on, so a legacy client will no longer
understand *any* annotation at all. It won't, for instance, find the
reference system for a position, although if it found it and boldly
interpreted it it would actually work. Client developers will not be
amused.
With isolated models, a client still can figure out that ra/dec is in
ICRS even after a full Meas2 migration. It will still work fine for
anything that doesn't require figuring out the errors. Overall:
*much* better user experience in many common cases (and graceful
failure in the others).
Oh, and of course it's simple to provide Meas and Meas2 annotation at
the same time, and it's straightforward to write clients to "use the
best one"; see
https://github.com/msdemlei/astropy#choosing-the-most-expressive-annotation
for how clients would do that.
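"Use the best one" boils down to walking a preference list; a minimal sketch (the model identifiers and annotation contents are invented for illustration):

```python
def get_best_annotation(annotations_by_type, preferences):
    """Return the first available annotation in preference order,
    falling back to older model versions when newer ones are absent."""
    for dmtype in preferences:
        if dmtype in annotations_by_type:
            return dmtype, annotations_by_type[dmtype]
    return None, None


# A table annotated with both the legacy model and its successor:
available = {
    "meas:Measurement": {"error": "e_vmag"},
    "meas2:Measurement": {"error": "e_vmag", "distribution": "normal"},
}

# A modern client prefers meas2 but still understands meas; a legacy
# client would simply pass ["meas:Measurement"] and keep working.
dmtype, ann = get_best_annotation(
    available, ["meas2:Measurement", "meas:Measurement"])
```

Because the models are isolated, a provider can emit both annotations side by side and every client, old or new, picks the richest one it understands.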
Sure, you could have multiple annotations with entangled models, too
-- but then you'll have a tag soup explosion, because you will have to
repeat *all* annotations in *all* the God blocks, most of them
identical except for the DM names. Annotators will not be amused.
|
On Thu, Apr 22, 2021 at 3:56 AM msdemlei ***@***.***> wrote:
On Wed, Apr 21, 2021 at 06:37:48AM -0700, Mark Cresitello-Dittmar wrote:
> > Of course, I'd be particularly grateful if brief docs
> > on solving the mentioned use cases (which clearly are not trivial, as
> > evinced by the fact that we still can't do them 20 years in the VO)
> > could easily be located.
> >
>
> Perhaps my point is that with the data properly annotated to the current
> models, these operations become trivial.
> With the 'Death Star' approach, there is no 'figuring out' where
something
> is, you simply follow the path (modulo API helpers).
But that's different each time depending on whether you have a time
series, a catalogue, or yet something else, right? So, a client that
doesn't know SparseCube but just PhotDM would be out of luck when
trying to work out the, say, band of the magnitudes in
https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/mcd-implementation/gavo_summary.md
right?
No..
If you are starting at the top (SparseCube vs Mango), the path to the Meas
would, of course be different.
If you only understand Meas and PhotDM, you can (if using the rama package):
```
mags = doc.find_instances(Photometry)  # find the photometry Measure
band = mags[0].coord.coord_sys.frame.filter.name  # band name of the PhotometryFilter
```
* assuming photDM:PhotometryFilter is off the CoordFrame.
This is always true, regardless of the data product in which the Photometry
resides.
You can do this because you know it is a Photometry type Measure, and
therefore, the PhotometryFilter resides in the associated CoordFrame.
If you only understand PhotDM (and not Meas), I'm not sure how you found
'mag', but you can still find all PhotometryFilters:
```
filters = doc.find_instances(PhotometryFilter)
band = filters[0].name
```
If there is only 1 filter, that must apply to your mag; otherwise you have
some trouble.
|
On Thu, Apr 22, 2021 at 07:21:19AM -0700, Mark Cresitello-Dittmar wrote:
On Thu, Apr 22, 2021 at 3:56 AM msdemlei ***@***.***> wrote:
> On Wed, Apr 21, 2021 at 06:37:48AM -0700, Mark Cresitello-Dittmar wrote:
If you are starting at the top (SparseCube vs Mango), the path to the Meas
would, of course be different.
If you only understand Meas and PhotDM, you can (if using the rama package):
```
mags = doc.find_instances(Photometry)  # find the photometry Measure
band = mags[0].coord.coord_sys.frame.filter.name  # band name of the PhotometryFilter
```
* assuming photDM:PhotometryFilter is off the CoordFrame.
Can I see this in a complete script? I'm wondering at the moment
where the Photometry symbol comes from.
If you only understand PhotDM (and not Meas), I'm not sure how you found
'mag', but you can still find all PhotometryFilters:
```
filters = doc.find_instances(PhotometryFilter)
band = filters[0].name
```
If there is only 1 filter, that must apply to your mag; otherwise you have
some trouble.
I hope you'll understand that this statement unnerves me a bit given
we're talking about the solution to our 20-years-old coordinates
troubles here...
So, why not attach the PhotometryFilter directly to the magnitude?
It's clearly more robust and works just as well, no?
…-- Markus
|
I spent the last two days on an implementation aiming to satisfy your request. This implementation, based on Mango, can be used at different model friendship levels. |
My opinion is that we roughly agree on HOW to consume annotated data
but we definitely disagree on WHAT to consume.
|
The Standard Properties case: Annotated CSC file
Python Script:
Produces: Summary:
What is unnerving? The model associates the 'col15' flux Measure with the proper PhotometryFilter. With that relation defined, finding the band becomes trivial; without it there is no way to know which of the 4 bands is relevant.. you really can't do it.
That could be done, but the models try to organize the concepts. The PhotometryFilter is not a direct property of the Photometry Measure, but rather associated metadata about the environment which produced the value. This sort of information is relegated to the associated coordinate Frame in all Measures. |
On Fri, Apr 23, 2021 at 10:18:23AM -0700, Mark Cresitello-Dittmar wrote:
Python Script:
```
#!/usr/bin/env python
from rama.reader import Reader
from rama.reader.votable import Votable
from rama.models.mango import Photometry
```
Ah, that's where the Photometry symbol comes from. So... that's
generated code? I'll frankly say that anything requiring code
generation gets large demerits in my book, and I think in many other
peoples' books.
> So, why not attach the PhotometryFilter directly to the
> magnitude? It's clearly more robust and works just as well, no?
That could be done, but the models try to organize the concepts. The
PhotometryFilter is not a direct property of the Photometry
Measure, but rather associated metadata about the environment which
produced the value. This sort of information is relegated to the
associated coordinate Frame in all Measures. I could certainly see
adding a helper method to the Photometry class which short-cuts
this for users.
If "organize the concepts" actually makes things more difficult (and
may even require adding helper methods), perhaps that's another
indicator that we should think again whether this "organize the
concepts" is actually useful. So: Do we get a benefit (which would
be: "Things we can do that we could not otherwise do") for the cost
of having to have a lot of extra classes, needing new classes
whenever new sorts of measurements come up (although many existing
classes would already work) and having an extra indirection?
Standard reminder: We'll have to sell this to the community.
Enabling things is what will make them buy this. Making things more
complicated just because we feel it's more beautiful will make them
frown.
|
On Fri, Apr 23, 2021 at 07:54:38AM -0700, Laurent MICHEL wrote:
> But if I get shown plausible ways how they can do that with a
> reasonable effort, I'll stop whining immediately.
I spent the last two days on an [implementation](https://github.com/ivoa/dm-usecases/wiki/mango#lm-mango-parser) aiming to satisfy your request. This implementation, based on Mango, can be used at different _model friendship_ levels:
1- Doing nothing with the model
2- Just taking the frames/filters in the globals, as you do.
3- Picking MCT instances by exploring the map of the MANGO parameters, as I suggested to you.
4- Getting the binding between model leaves and fields (e.g. pos + error is mapped onto columns 2, 3, 9). That way, clients can read the data with their usual APIs and get, on demand, the data's counterpart in the model. This is close to what you are doing as well.
5- Directly getting data rows as model instances. I know that this way to proceed won't replace what is working very well so far, but I bet it will be attractive when data get more complex (e.g. multiple objects in one table, joined data tables)
Hm -- what you show doesn't quite solve what I think we most dearly
need: I have a column (say, the user picked it in a plot widget), and
now I simply want to figure out what's the reference frame if it's
part of a position, or the bandpass if it's photometry.
I'd like to try this myself, though -- could you briefly point at
what code I need to run this?
|
So that you get connections between model parameters and VOTable columns. ex: https://github.com/ivoa/modelinstanceinvot-code/blob/main/python/client/demo/ivoa_csc2_example.py |
I don't see the relevance of the code origin to this discussion. The system in place has proved very useful for keeping the code current with the model changes. There is no requirement for using code generators. You asked to see the script which gets the filter name from the magnitude..
It occurred to me today that you probably were not advocating this approach, but merely being critical of where we DID decide to put the filters. To do as this suggests would mean generating an extension to Measurement which adds the reference to the PhotometryFilters.. This would go against both of your base assertions that 1 Measurement is sufficient, and that models should be decoupled.
well, one can trivially find the Photometry data and its corresponding Filter metadata..
|
On Mon, May 03, 2021 at 09:43:55PM -0700, Mark Cresitello-Dittmar wrote:
>> On Fri, Apr 23, 2021 at 10:18:23AM -0700, Mark Cresitello-Dittmar wrote:
> Ah, that's where the Photometry symbol comes from. So... that's
> generated code? I'll frankly say that anything requiring code
> generation gets large demerits in my book, and I think in many
> other peoples' books.
I don't see the relevance of the code origin to this discussion.
The system in place has proved very useful for keeping the code
current with the model changes. There is no requirement for using
code generators.
If that is true, that's reassuring. What would the client code look
like without the generated code?
You asked to see the script which gets the filter name from the
magnitude..
What is the corresponding thread in your implementation?
Since it's the same logic as
https://github.com/msdemlei/astropy#getting-an-error-for-a-column,
I've not added it in the README, but it's (assuming you want to
figure out the band name for a column col):
```
try:
ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier
except IndexError:
raise Exception(f"No photometry annotation on {col}")
```
* With your proposal of a single Measurement type and uncoupled
models, what makes the connection between the 'flux' Measurement
and the corresponding Filter?
The value reference in phot:PhotCal.
* I see in your TS example the PhotCal instance has added an
attribute to the PhotCal object for 'value' pointing to the flux
column.. which is not part of the photDM spec (values are not part
of PhotCal). I think we hit on this before.
Yeah, that needs to be fixed; some of our DMs have to be taught to
talk about what they annotate. But if we want to de-entangle them,
some minor adjustments will be necessary anyway.
>> So, why not attach the PhotometryFilter directly to the magnitude?
It occurred to me today that you probably were not advocating this
approach, but merely being critical of where we DID decide to put
Ummm... of course I'd like to attach the filter to the column (or
param) it applies to. Why would I not?
the filters. To do as this suggests would mean generating an
extension to Measurement which adds the reference to the
PhotometryFilters.. This would go against both of your base
assertions that 1 Measurement is sufficient, and that models should
be decoupled.
Right. That's one reason to directly annotate the column with the
PhotCal information.
> If "organize the concepts" actually makes things more difficult
> (and may even require adding helper methods), perhaps that's
> another indicator that we should think again whether this
> "organize the concepts" is actually useful. So: Do we get a
> benefit (which would be: "Things we can do that we could not
> otherwise do")
well, one can trivially find the Photometry data and its
corresponding Filter metadata..
Isn't what I show above about as trivial?
I think you'll find it's not: building a large and arguably fragile
tree (network?) of models to hide what complexity is unavoidable
will, I expect, backfire. All my programming practice has taught me
that getting such abstractions right is *really* hard. Let's see how
far we get without having to introduce them.
|
From #15: May 4, 2021 - Markus wrote:
This is the 'find all measures' case.
Produces:
I'll note that this is not finding the Measures defined under the mango model (Photometry, HardnessRatio), but I'd consider this a bug in the package. |
I have no idea..
The case is different in that the PhotCal and Measure are in different models.
With your proposal, you can only accomplish this task because you've made an ad hoc change to the PhotDM model which is conceptually wrong. Photometric filters conceptually stand on their own; they do not know anything about the data that was taken using them. Many flux/mag columns can use the same filter. And presumably, many other sorts of objects can reference a PhotCal instance.
Attaching it to the column is not the same as attaching it to the magnitude (Measure) |
I'm no expert in the Rama package, but I guess that the generated code is basically object classes matching the model elements with appropriate accessors (setters and getters).
It is hard to imagine that generated code will be able to handle all possible cases. For this reason I prefer to allow clients to store annotations as dictionaries (Python) and to let them pick data from there with appropriate selectors (see here). |
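The dictionary-plus-selectors approach described just above could look something like the following minimal sketch. All names here (`annotations`, `select_by_model`, `select_by_column`, the model identifiers) are hypothetical illustrations, not an existing API:

```python
# Minimal sketch of the dictionary-based client store suggested above:
# annotations are kept as plain dicts keyed by a model identifier, and
# clients pick what they need with simple selector functions.
# All names here are hypothetical.

annotations = {
    "mango:GenericMeasure": [
        {"column": "pmra", "error": "pmra_err", "unit": "mas/yr"},
    ],
    "phot:PhotCal": [
        {"column": "mag_g", "filterIdentifier": "SDSS/g"},
    ],
}

def select_by_model(annotations, model_id):
    """Return all annotation dicts attached to a given model."""
    return annotations.get(model_id, [])

def select_by_column(annotations, column):
    """Return (model, annotation) pairs that apply to one column."""
    return [
        (model, ann)
        for model, anns in annotations.items()
        for ann in anns
        if ann.get("column") == column
    ]

phot_anns = select_by_model(annotations, "phot:PhotCal")
mag_g_anns = select_by_column(annotations, "mag_g")
```

The point of the sketch is that a client needs no generated classes at all: plain dicts plus two small selectors already answer "what annotations does this column carry?".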
On Tue, May 04, 2021 at 09:52:05AM -0700, Mark Cresitello-Dittmar wrote:
> Since it's the same logic as https://github.com/msdemlei/astropy#getting-an-error-for-a-column, I've not added it in the README, but it's (assuming you want to figure out the band name for a column col):
> ```
> try:
> ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier
> except IndexError:
> raise Exception(f"No photometry annotation on {col}")
> ```
The case is different in that the PhotCal and Measure are in different models.
* given a 'flux' Measure, how do you find the **corresponding** filter band?
What you show merely finds the first PhotCal instance, right?
This is another case where entangling DMs is pernicious: Photometry
shouldn't have anything to do with errors (i.e., Meas). The whole
problem disappears when you drop the notion that there should be
something like a FluxMeasure (or whatever, I can't find it in the
2020-04-13 PR).
When you drop classes and things work even better, to me that's a
clear indication that the class shouldn't be there.
> Yeah, that needs to be fixed; some of our DMs have to be taught
> to talk about what they annotate. But if we want to de-entangle
> them, some minor adjustments will be necessary anyway.
> With your proposal, you can only accomplish this task because
> you've made an ad hoc change to the PhotDM model which is
Well, yes, this thread is about changes to our models to make them
work better (and more likely to be adopted).
> conceptually wrong. Photometric Filters conceptually stand on
> their own, they do not know anything about the data that was taken
> using them. Many flux/mag columns can use the same filter. And
> presumably, many other sorts of objects can reference a PhotCal
> instance.
That's how the model works right now, and I argue that we shouldn't
be doing it that way.
You see, it's conceptually as valid to say "there's photometry
metadata attached to this column" -- that in reality there actually
is a filter (as in the concrete thing that you put somewhere into
your optical path) somewhere that has some relation to the photometry
metadata doesn't mean it makes operational sense to model that
particular artefact. Not in principle, and here in particular
because between that filter and the flux you have in a column there's
a lot of additional stuff happening.
Concrete example, as for instance in SDSS: you have PSF and aperture
photometry. You could perhaps shave off a few elements by sharing
the "z filter that was in the optical path" between the PSF and
aperture photometry annotation, but there's a lot of complexity
involved in that, which is a lot more expensive in implementation than
just keeping that piece of information (and perhaps a few additional
infos characterising it) local to both the PSF and aperture
annotations.
So, again, I argue you're paying a high price for a small benefit.
> Ummm... of course I'd like to attach the filter to the column (or param) it applies to. Why would I not?
> Attaching it to the column is not the same as attaching it to the
> magnitude (Measure)
No, it's not, because there's an additional indirection, and you're
entangling Meas and Phot. Suppose for a moment that there is -- as I
claim -- a price for that entangling, and there's a price for the
extra Meas class: having it on the column saves us from having to pay
both.
|
On Tue, May 04, 2021 at 08:25:35AM -0700, Mark Cresitello-Dittmar wrote:
From #15: May 4, 2021 - Markus wrote:
> With just one Measurement class (or perhaps a few when we add proper
> distributions), with the API of https://github.com/msdemlei/astropy,
> all TOPCAT would need to do is:
>
> ```
> ann = col.get_annotations("meas:Measurement")
> if ann:
> associated_error = ann.naive_error
> ```
> That's a high price to pay; it would be worth paying if we got a
> major benefit from it. But I can't even see a minor one.
This is the 'find all measures' case.
Perhaps in a somewhat roundabout sense; what TOPCAT would *really*
want to do in this use case is "Find a Measurement instance on the
column I'm about to plot", which arguably can do with a shortcut over
"enumerate all measurements and see if I can find one on my column".
With the rama package, since these are all children of meas:Measure:
Yeah, that's the problem: You need a complex package (with, as you
say, bugs of its own) that needs to access and parse VO-DML.
Without the per-physics classes, all you need is a simple,
model-independent extension to a VOTable parser (it's a 400-line
diff in astropy/io/votable/tree.py of
https://github.com/msdemlei/astropy) and have it right there, without
having to deal with VO-DML as such (which, don't get me wrong, still
is great, and validators will need to deal with it; it's just that
normal clients shouldn't have to).
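To illustrate why no VO-DML machinery is needed on the client side: finding the annotation GROUPs that reference a given FIELD is a few lines of generic XML walking. This is NOT the code of the astropy fork linked above, just a standard-library sketch of the idea (namespaces are omitted from the toy VOTable for brevity; a real parser would have to handle them):

```python
# Sketch of a model-independent annotation lookup: for a given FIELD,
# find the VOTable GROUPs (annotation blocks) that reference it via
# FIELDref. Namespaces are left out of this toy document for brevity.
import xml.etree.ElementTree as ET

VOTABLE = """<VOTABLE><RESOURCE><TABLE>
  <FIELD ID="mag_g" name="mag_g" datatype="float"/>
  <GROUP utype="phot:PhotCal">
    <PARAM name="filterIdentifier" datatype="char"
           arraysize="*" value="SDSS/g"/>
    <FIELDref ref="mag_g"/>
  </GROUP>
</TABLE></RESOURCE></VOTABLE>"""

def annotations_for(tree, field_id):
    """Return all GROUP elements containing a FIELDref to field_id."""
    return [
        group for group in tree.iter("GROUP")
        if any(ref.get("ref") == field_id
               for ref in group.iter("FIELDref"))
    ]

tree = ET.fromstring(VOTABLE)
groups = annotations_for(tree, "mag_g")
```

A client that only wants "the PhotCal for this column" never has to know what other models exist; it just filters GROUPs by utype and FIELDref.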
I'm here because I'm convinced this sort of simplicity will
eventually decide whether after we're done here we'll have everyday
clients consuming our annotations -- or whether we'll go on as it's
been for the past 15 years, with people ignoring our DMs and still
no usable way to automatically plot error bars (or have epoch
sliders without guessing).
|
Let's not side-track this issue with a critique of the Rama package.
The community has had a LOT of questions about how compatible the models are with AstroPy, one of the main goals of the implementation was to leverage AstroPy as much as possible. So it uses QTable, Quantity, has converters for SkyCoord and Time. |
Thank you for these clarifications. |
Everyone who has worked on IVOA models to date has taken the current approach. That is a LOT of history by a lot of people. The models should define the entities involved and their relations. They need to support several different uses, from DAL protocols, to Applications, and Serializations. Your approach focuses solely on the Serialization aspect, has no formal model backing, and has only been applied to very simple data structures and cases. It is a VERY big ask to expect the working group to change its entire approach based on this.
I'm not entangling Meas and Phot, I'm defining an association (in the model) between flux data and photometry metadata. These last bits are off the issue topic, and back on the old circular debate, so I think I'll try to leave it here. |
Markus.. this isn't the problem. FWIW: the rama package uses astropy.io.votable in the parser code, though not sure which pieces at the moment. |
On Wed, May 05, 2021 at 09:08:35AM -0700, Mark Cresitello-Dittmar wrote:
> This is another case where entangling DMs is pernicious:
> Photometry shouldn't have anything to do with errors (i.e.,
> Meas). The whole problem disappears when you drop the notion that
> there should be something like a FluxMeasure (or whatever, I
> can't find it in the 2020-04-13 PR). When you drop classes and
> things work even better, to me that's a clear indication that the
> class shouldn't be there.
1. The Measure containing 'flux' does have errors. In your
proposal, this would be a Measure annotating the appropriate
columns 'flux', 'flux_err'.
Yes.
2. Your use case is: 'If I have a flux or magnitude, figure out
the band it's in, perhaps a zeropoint, etc.'
Yes -- which, in my world, is unrelated to the flux or magnitude also
having errors.
3. So, given that you've found one of several annotations for a
'flux' measure; what is your thread for finding the
corresponding band? All you've shown is that you can find a
PhotCal, not necessarily the correct one.
No. When you say:
```
table["mag_g"].get_annotations("phot:PhotCal")
```
you get the annotation(s) *exactly* for mag_g and for nothing else.
No need to guess.
4. What qualifies Photometry to be a special type (rather than
being handled by GenericMeasure) is that there is associated
metadata.. namely the PhotometryFilter. The specialized model
class tells the provider and client what the expected associated
metadata is, and where to find it. And that is the crux of
different approaches.
Perhaps, but I still wonder what motivates you to prefer this to simply
associating that extra metadata to the annotated column itself. What
sort of client would prefer to first go to a measurement annotation
and then to the PhotCal rather than go to the PhotCal directly? Why
would data providers want to have that intermediate class?
Is it really just that you'd like to model Filter as some sort of
external thing? If so, one could of course make the filter a
reference in the Photometry annotation, and again, there would be no
need for cross-model references.
> That's how the model works right now, and I argue that we
> shouldn't be doing it that way. You see, it's conceptually as
> valid to say "there's photometry metadata attached to this
> column" -- that in reality there actually is a filter (as in the
> concrete thing that you put somewhere into your optical path)
> somewhere that has some relation to the photometry metadata
> doesn't mean it makes operational sense to model that particular
> artefact.
Everyone who has worked IVOA models to date has taken the current
approach. That is a LOT of history by a lot of people. The models
I hate to be blunt, but: that has given us essentially zero uptake.
I wouldn't be here if we had a way to solve these basic use cases,
which DM should have provided by (at least) VOTable 1.2 in 2009.
I'm not blaming anyone, and I've been part of the failure myself,
but: If something hasn't worked out for more than 10 years, wouldn't
you agree it's an indication that we've been doing it wrong?
Applications, and Serializations. Your approach focuses solely on
the Serialization aspect, has no formal model backing, and has only
What would count as "formal model backing"?
been applied to very simple data structures and cases. It is a
VERY big ask to expect the working group to change its entire
approach based on this.
The "very simple data structures" argument has been levelled a few
times, but nobody has pointed me towards a case where disentangled
models really work worse than entangled ones.
Sure, there is the case of severely denormalised tables that I, well,
refuse to do, but that is completely independent of model
architecture and a question of how complex we want our annotation to
be.
So... what kind of complex data are you concerned about?
I'm not entangling Meas and Phot, I'm defining an association (in
the model) between flux data and photometry metadata.
Well, call it what you want, but you have a reference from Meas to
Phot if you say "In the PhotometryMeasure, there is a filterDef that
points to photdm:PhotCal", right? And with that, we're back to the
old thing that going to Photdm2 will require an incompatible change
to Meas, too, which is what I, perhaps somewhat derogatorily, call
entangling (I'll happily use some other term if you prefer).
|
Ah.. now I see where 'col' comes from..
It might be good to add a similar line in your README.rst as well to show what your starting point is. That makes sense... thanks. |
Yes.. and the work that's been done with VODML, the revamp of these models, the annotation syntax instead of UTypes has all been directed at resolving those problems.
Datamodels, written out, with consideration of their use cases.
These workshop cases have several levels of complexity in the data structure. Implementing these with your proposal would go a long way to supporting your argument. Just with the TimeSeries case we now have:
So far, we haven't seen that the decoupled models approach works at all in these other cases. |
On Thu, May 06, 2021 at 08:04:39AM -0700, Mark Cresitello-Dittmar wrote:
> What would count as "formal model backing"?
Datamodels, written out, with consideration of their use cases.
Since I can't stand the existing tooling for VO-DML, and writing something
that works nicely for me is a bit of a chore: Would it suffice if I
have, say, python class statements?
> refuse to do, but that is completely independent of model
> architecture and a question of how complex we want our annotation
> to be. So... what kind of complex data are you concerned about?
These workshop cases have several levels of complexity in the data structure. Implementing these with your proposal would go a long way to supporting your argument. Just with the TimeSeries case we now have:
1. the simple TimeSeries that you provided: 1 Table = 1 LightCurve
2. ZTF TimeSeries : 1 Table = n LightCurves (one per source in the field-of-view)
3. GAIA multiband: 1 Table = m*n LightCurves (one per band per source) with compact structure
So far, we haven't seen that the decoupled models approach works at
all in these other cases.
As I said, the Gaia serialisation is so denormalised that it's simply
not a table, and the DPAC folks will happily change their serialisation
format if we tell them what to do and what to avoid. Actually, Gaia
CU5 (which produces these time series) unsurprisingly uses properly
normalised tables internally.
Now, we can add some normalisation layer (preferably SQL-based) for
denormalised, non-tabular data, but that's neither annotation nor
modelling, and as that's complex and difficult, it's prudent to not
try this on the first go after quite a few failures. Conversely: try
to make a spec work for broken things and you'll get a broken
spec.
The ZTF thing I can't find in the dm-usecases, but again this sounds
more like a normalisation issue; if you deliver the equivalent of a
```
SELECT * FROM
ts_1
UNION
ts_2
UNION
ts_3
```
it stands to reason that pulling these tables apart again is best
done using relational algebra (which produced them) rather than
writing your model to account for this kind of thing.
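The "pulling apart" step is just an ordinary GROUP BY, and a client that insists on doing it outside a database needs only a few lines. This is a plain-Python sketch with made-up column names (oid, hjd, mag), not code from any of the packages under discussion:

```python
# Split a denormalised "many sources in one table" time series into
# one series per source -- an ordinary GROUP BY on the object id.
# Column names (oid, hjd, mag) are made up for illustration.
from collections import defaultdict

rows = [
    {"oid": 1, "hjd": 2458001.5, "mag": 17.2},
    {"oid": 2, "hjd": 2458001.6, "mag": 18.9},
    {"oid": 1, "hjd": 2458002.5, "mag": 17.3},
]

def split_per_source(rows, key="oid"):
    """Return a dict mapping each source id to its own row list."""
    series = defaultdict(list)
    for row in rows:
        series[row[key]].append(row)
    return dict(series)

per_source = split_per_source(rows)
```

The point is that this grouping is a structural transformation of the table, logically separate from (and doable without) any annotation.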
More importantly, though: You can still annotate photometry, errors
and positions on this, even if you can't do a time series annotation
(because it isn't a time series). And that's the big bonus of
isolated DMs: they can be produced and consumed whenever something
appropriate is in view, not only if "the whole thing is a time series"
(and regardless of how many bells and whistles you attach to either
your models or your annotation syntax, there will always be plenty of
tables that look just like a time series but in the end aren't
quite).
|
On Thu, May 06, 2021 at 07:29:11AM -0700, Mark Cresitello-Dittmar wrote:
> No. When you say `table["mag_g"].get_annotations("phot:PhotCal")` you get the annotation(s) *exactly* for mag_g and for nothing else. No need to guess.
Ah.. now I see where 'col' comes from..
```
ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier
```
It might be good to add a similar line in your README.rst as well to show what your starting point is.
Question: table["mag_g"] returns the FIELD? and get_annotations() returns the PhotCal block associated with the same FIELD.
Yes on both. I'll see if I can make the README a bit more explicit.
|
here it is. GROUPING, JOINING and FILTERING rows are basic operations that do not need a complex algebra. |
Saying that DM is a 10-year failure is an overstatement. STC 1.33 was not an integrated model; still, it has been used as STC-S (a great idea before the MOCs) and as a documentation base. Markus's proposal to serialize it didn't succeed; I don't know why. The main reason for this situation is more likely to be found in the lack of enthusiasm from both providers and developers to work with annotated VOTables. I won't rewrite the history here, but I just hope we will find a good (*) exit. (*) anything other than a definitive "let's get rid of this modeling stuff" |
On Fri, May 07, 2021 at 01:28:30AM -0700, Laurent MICHEL wrote:
> The ZTF thing I can't find in the dm-usecases
[here](https://github.com/ivoa/dm-usecases/blob/main/usecases/time-series/raw_data/TimeSeriesZTF.xml) it is
Ok -- except for the transformation necessary to actually make this a
time series (i.e., splitting it up per oid, which I keep
maintaining is beyond what annotation should do), I don't see a
particular complication -- this basically works like my gaia time
series example, except that you have two independent axes, so you'd
have two cube groups, somewhat like this (again, SIL shorthand):
```
(ndcube:Cube) {
independent_axes: [hjd]
dependent_axes: [mag]
}
(ndcube:Cube) {
independent_axes: [mjd]
dependent_axes: [mag]
}
```
Well, and the photCal wouldn't be immediate:
```
(phot:PhotCal) {
filterIdentifier: @FilterCode
zeroPointFlux: @magzp
magnitudeSystem: ???
effectiveWavelength: ???
value: @mag
}
```
Or am I missing something?
GROUPING, JOINING and FILTERING rows are basic operations that do
not need a complex algebra.
Well, things become complex when you consider the various conditions
you need to express here -- think about inner and outer joins, etc. --
and, really, that *is* an algebra. Be that as it may: fortunately
annotation can be cleanly separated from structural modification, and
we really should use this opportunity. Let's not encumber takeup of
the basic features by requiring our adopters to re-implement a
significant part of a SQL engine.
Once we've got people hooked, selling them structural transformations
(in whatever way) is going to be much easier.
These are 3 statements set up to resolve well-identified cases:
1- data of different objects mixed in one table (if you dislike
ZTF, Gilles showed VizieR
[examples](#23 (comment))).
2- data spread over multiple tables. IMO it is wise to consider
now that some data providers will soon take advantage of the
multi-table support in VOTable to publish complex data set (with
1-n relations)
Oh, case 2 is actually sane and non-legacy. It can be handled just
fine with just annotation (saying something like "x is a foreign key
in y").
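Resolving such a declared foreign key on the client side needs no SQL engine either: it's a plain dictionary join. The following sketch uses hypothetical table and column names (sources, photometry, source_id), purely to illustrate what "x is a foreign key in y" buys a client:

```python
# If the annotation says "column source_id in the photometry table is
# a foreign key into the sources table", a client can resolve it with
# a plain dict join. All table and column names here are hypothetical.

sources = [
    {"source_id": 1, "ra": 10.5, "dec": -2.3},
    {"source_id": 2, "ra": 11.0, "dec": -2.4},
]
photometry = [
    {"source_id": 1, "mag_g": 17.2},
    {"source_id": 2, "mag_g": 18.9},
]

def resolve_foreign_key(child_rows, parent_rows, key):
    """Attach the referenced parent row to each child row."""
    parents = {row[key]: row for row in parent_rows}
    return [
        {**row, "_parent": parents[row[key]]}
        for row in child_rows
    ]

joined = resolve_foreign_key(photometry, sources, "source_id")
```

So the annotation only has to name the key columns; the join itself stays a trivial, generic operation.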
|
The conditions are perfectly set in the spec.
Namely a JOIN |
I've always assumed the following has been the (rough) consensus agenda behind the effort, but I'm starting to realise that that is not necessarily the case, so I'm trying to raise the following (IMHO) baseline use case as an explicit issue.
"Let a client figure out non-VOTable metadata for a column independently of whether they are in a time series, a spectrum, or some generic table."
Examples (all assuming "if present in the table", of course):
Again, and that is important: Clients must not need to know a full data model (e.g., "timeseries") to do that. The rationale is that client writers will curse us if they have to re-implement "find an error for a value" again and again just because there's a new product type. And data providers will curse us if they cannot annotate a value/error pair just because it's, say, a redshift and there's no "data model" for that.
I'll note here that I will have a very hard time signing off on anything that does not show code that actually lets clients do these things and that will credibly make it to astropy and/or STIL.
I'll shamelessly link again to https://github.com/msdemlei/astropy, which is an astropy fork that can pull such information from annotation produced right now by DaCHS' time series code. An analogous astropy (or STIL, if you prefer) fork for the other proposals would really help to give me some confidence in them.
I'll freely admit that my fork clearly isn't ready for a PR yet (it was hacked together in 1 1/2 afternoons). But it can, I claim, be made ready for a PR with about a week of concentrated work, with write and programmatic annotation support possible in perhaps another week; something like that would of course be enough to convince me of the other proposal's viability (as long as all the proposed features are exercised, which is particularly true for the SQL-in-XML part if you do insist on that).
Sorry for being a bit obstinate here, but I have apologised far too often that 20 years into the VO we still cannot give a reliable recipe for "what's the dec for this ra", and if we again fail to create that it would be a real shame.
After the introduction: Do people roughly agree these are use cases that our annotation, and the models, ought to support?