Baseline use cases #36

Open
msdemlei opened this issue Apr 16, 2021 · 38 comments

Comments

@msdemlei
Contributor

I've always assumed the following has been the (rough) consensus agenda behind the effort, but I'm starting to realise that that is not necessarily the case, so I'm trying to raise the following (IMHO) baseline use case as an explicit issue.

"Let a client figure out non-VOTable metadata for a column independently of whether they are in a time series, a spectrum, or some generic table."

Examples (all assuming "if present in the table", of course):

  • If I see an ra, figure out the corresponding dec
  • If I have an ra/dec, figure out the reference system, the reference position, the epoch
  • If I have a position, figure out an associated proper motion
  • If I have any sort of quantity, figure out the (or an) error (and, eventually, its kind, perhaps one day distribution parameters, etc.)
  • If I have a flux or magnitude, figure out the band it's in, perhaps a zeropoint, etc.

Again, and that is important: clients must not need to know a full data model (e.g., "timeseries") to do that. The rationale is that client writers will curse us if they have to re-implement "find an error for a value" again and again just because there's a new product type. And data providers will curse us if they cannot annotate a value/error pair just because it's, say, a redshift and there's no "data model" for that.
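To make the requirement concrete, here is a minimal Python sketch of a model-independent lookup. The `Annotation` and `Column` classes and the `naive_error`/`filterIdentifier` attribute keys are assumptions for illustration (loosely modelled on the `col.get_annotations(...)` calls quoted later in this thread), not the API of astropy, STIL or any fork. What it shows is that `find_error` and `find_band` only inspect the column's own annotations, so they work the same whether the column sits in a time series, a spectrum or a generic table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Annotation:
    """Hypothetical annotation: a dmtype plus the attributes it carries."""
    dmtype: str
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Column:
    """Hypothetical column holding the annotations that reference it."""
    name: str
    annotations: list[Annotation] = field(default_factory=list)

    def get_annotations(self, dmtype: str) -> list[Annotation]:
        # Only annotations attached to this very column are returned.
        return [a for a in self.annotations if a.dmtype == dmtype]


def first_attribute(col: Column, dmtype: str, key: str) -> Optional[Any]:
    """Return `key` from the first annotation of `dmtype` on `col` that has it, else None."""
    for ann in col.get_annotations(dmtype):
        if key in ann.attributes:
            return ann.attributes[key]
    return None


def find_error(col: Column) -> Optional[str]:
    # Name of the associated error column, whatever product the table belongs to.
    return first_attribute(col, "meas:Measurement", "naive_error")


def find_band(col: Column) -> Optional[str]:
    # Band of a flux or magnitude column, independent of any error annotation.
    return first_attribute(col, "phot:PhotCal", "filterIdentifier")


# Hypothetical columns from two different kinds of tables.
mag_g = Column("mag_g", [
    Annotation("meas:Measurement", {"naive_error": "e_mag_g"}),
    Annotation("phot:PhotCal", {"filterIdentifier": "SDSS/g"}),
])
redshift = Column("z", [Annotation("meas:Measurement", {"naive_error": "e_z"})])

print(find_error(mag_g), find_band(mag_g))        # e_mag_g SDSS/g
print(find_error(redshift), find_band(redshift))  # e_z None
```

Nothing in the two lookup functions refers to a product-level model, which is the property the use case asks for.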

I'll note here that I will have a very hard time signing off on anything that does not show code that actually lets clients do these things and that will credibly make it into astropy and/or STIL.

I'll shamelessly link again to https://github.com/msdemlei/astropy, which is an astropy fork that can pull such information from annotation produced right now by DaCHS' time series code. An analogous astropy (or STIL, if you prefer) fork for the other proposals would really help to give me some confidence in them.

I'll freely admit that my fork clearly isn't ready for a PR yet (it was hacked together in one and a half afternoons). But I claim it can be made ready for a PR with about a week of concentrated work, with write and programmatic annotation support possible in perhaps another week. Something like that for the other proposals would of course be enough to convince me of their viability (as long as all the proposed features are exercised, which particularly applies to the SQL-in-XML part if you do insist on that).

Sorry for being a bit obstinate here, but I have apologised far too often that 20 years into the VO we still cannot give a reliable recipe for "what's the dec for this ra", and if we again fail to create that it would be a real shame.

After the introduction: Do people roughly agree these are use cases that our annotation, and the models, ought to support?

@gilleslandais
Collaborator

I agree that the simpler the output is, the more it will be adopted by clients.
However, I don’t think that just gathering a quantity with its error is enough: in the example I sent previously (https://github.com/ivoa/dm-usecases/wiki/mango), I listed some examples of tables having columns with associated errors, flags, refs, etc.

Why Mango?
For me, the most important thing is that Mango can expose measures: a measure can be composed of a column and associated columns like errors, flags, etc. (and each measure, even if it is a collection of columns, is defined by a UCD+semantic pair that clients can easily understand). It is well adapted to VizieR tables, and also generic enough. Is it also adapted to other data centers?

With Mango, you can annotate with a generic measure or you can use a data model (in my examples, I use both: for position and photometry I use a DM, and for other quantities GenericMeasure).
For me the question is rather: which (trusted) DMs are expected in Mango?

Note that I am not a fan of “everything in DM” because I feel such models are more complex to manage and parse; but a minimum, why not? (It may be needed for photometry, always on condition that nothing is required.)

About complex structures (like time series): for me they are a bonus, and possible with Mango!

You have an astropy version that parses vodmlinstance? That is good news. It could be used to demonstrate the benefit of the current work. Could we imagine a Mango extension :-) (limited at first to the Mango core)?

@msdemlei
Contributor Author

msdemlei commented Apr 19, 2021 via email

@lmichel
Collaborator

lmichel commented Apr 19, 2021

I believe that the use-case agenda hasn't changed:

  1. Showing whether MCT/PhotDM cover our need
  2. Adopting a mapping strategy
  3. Adopting a mapping syntax

Honestly, IMO putting forward a library prototype does not help the discussion that much.

I won't rehash my argument in this post, but I'll ask the real question, which relates to the exact scope of Markus's proposal.

The annotation block starts with a list of the models on which data are mapped.

  • If that list contains Meas and Coord, clients will understand they can find both Meas and Coord instances in the mapping block.
  • If that list contains Meas, Coord and Cube, clients will understand they can find Meas, Coord and Cube instances in the mapping.
    • They are free to ignore the classes of any of these models (likely Cube for Markus's client)
    • The syntax is clean enough to easily retrieve any object by type or role or both (e.g. looking for POSITION instances within a CUBE). It's a matter of XPath.
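As a rough illustration of the "matter of XPath" point, Python's standard `xml.etree.ElementTree` supports enough of XPath to select instances by type, by role, or by nesting. The element names, `dmtype`/`dmrole` values and IDs below are hypothetical and simplified; they do not follow the exact syntax of any of the annotation proposals under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation block (not the syntax of any specific proposal).
ANNOTATION = """
<MODEL_INSTANCE>
  <INSTANCE dmtype="cube:SparseCube">
    <ATTRIBUTE dmrole="cube:SparseCube.data">
      <INSTANCE dmtype="meas:Position" ID="pos_in_cube"/>
    </ATTRIBUTE>
  </INSTANCE>
  <INSTANCE dmtype="meas:Position" ID="standalone_pos"/>
</MODEL_INSTANCE>
"""

root = ET.fromstring(ANNOTATION.strip())

# By type, anywhere in the block (a client ignoring Cube only needs this).
all_positions = root.findall(".//INSTANCE[@dmtype='meas:Position']")

# By type, restricted to positions nested inside a cube instance.
cube_positions = root.findall(
    ".//INSTANCE[@dmtype='cube:SparseCube']//INSTANCE[@dmtype='meas:Position']"
)

# By role.
data_attributes = root.findall(".//ATTRIBUTE[@dmrole='cube:SparseCube.data']")

print([e.get("ID") for e in all_positions])   # ['pos_in_cube', 'standalone_pos']
print([e.get("ID") for e in cube_positions])  # ['pos_in_cube']
```

A client that does not know Cube simply uses the first query; a Cube-aware client can narrow its search with the second.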

This clearly shows that technically both approaches can live together.
I come to the conclusion that the real question asked by @msdemlei is whether the IVOA has to get rid of integrated models (CUBE, MANGO, ...) or not.

  • If YES: this must be set as a headline of the workshop.
  • If NO: let's consider that both are valid and let's refine the syntax to facilitate the job for all.
    This question is not only a matter of annotation.

@msdemlei
Contributor Author

msdemlei commented Apr 19, 2021 via email

@mcdittmar
Collaborator

mcdittmar commented Apr 20, 2021 via email

@msdemlei
Contributor Author

msdemlei commented Apr 21, 2021 via email

@Bonnarel
Contributor

@msdemlei wrote

This is related to the question I'm asking, yes. But I'd not worry about God models (note: Cube doesn't need to be one; if it just points to the dependent and independent axes and leaves the rest to other DMs, it'll be a very compact and tidy DM) if there is a clear way in which to satisfy these basic use cases in spite of them, and with a clear vision of what happens if we do have a major version change in one of our fundamental DMs.

Cube is much more than that, as far as I remember. Dependent and independent axes fit well with the TimeSeries and Spectrum dataproduct types, but in ND cubes all but one axis are independent, and in event lists everything is independent.
An important and different question: are the axes sparse or regularly sampled? Event lists are sparse, TimeSeries may be, ND cubes are regular, etc.

  • If YES: this must be set as a headline of the workshop.
  • If NO: let's consider that both are valid and let's refine the syntax to facilitate the job for all. This question is not only a matter of annotation.

I also suspect that entangled DMs cannot be fixed in the annotation, which is where the thread last October came from. But perhaps that is just me, which is why some sample implementation would help me a lot. If entangled models can be done in a way that doesn't impact client operations for the use cases presented, there's not much I'm worried about (but I'm still rather convinced programmers like Lego bricks better than pre-assembled models). So, yes, I would propose "Bricks or Death Stars" as the workshop's headline.

Of course, if you change one of the models, the annotation will change and clients will have to adapt. But this is true whether the models are entangled or not. Or is there something I do not understand?

@mcdittmar
Collaborator

mcdittmar commented Apr 21, 2021 via email

@msdemlei
Contributor Author

msdemlei commented Apr 22, 2021 via email

@msdemlei
Contributor Author

msdemlei commented Apr 22, 2021 via email

@mcdittmar
Collaborator

mcdittmar commented Apr 22, 2021 via email

@msdemlei
Contributor Author

msdemlei commented Apr 23, 2021 via email

@lmichel
Collaborator

lmichel commented Apr 23, 2021

@msdemlei

But if I get shown plausible ways how they can do that with a
reasonable effort, I'll stop whining immediately.

I spent the last 2 days on an implementation aimed at satisfying your request. This implementation, based on Mango, can be used at different levels of model friendliness:
1- Doing nothing with the model
2- Just take the frames/filters in the globals, as you do.
3- Picking MCT instances by exploring the map of the MANGO parameters as I suggested for you.
4- Get the binding between model leaves and fields (e.g. pos + error is mapped on columns 2, 3, 9). That way, clients can read the data with their usual APIs and provide on demand data counterpart onto the model. This is close to what you are doing as well.
5- Directly getting data rows as model instances. I know that this way of proceeding won't replace what is working very well so far, but I bet it will be attractive when data get more complex (e.g. multiple objects in one table, joined data tables)
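Levels 4 and 5 can be contrasted with a small Python sketch. The binding dictionary, its leaf names and the `Position` class are hypothetical stand-ins, not the actual Mango API; column indices are 0-based here, and the 2, 3, 9 mapping only mirrors the example in level 4.

```python
from dataclasses import dataclass

# Hypothetical level-4 binding: model leaf -> column index (pos + error on columns 2, 3, 9).
BINDING = {"position.ra": 2, "position.dec": 3, "position.error": 9}


@dataclass
class Position:
    """Illustrative stand-in for a model class; not the actual Mango/Meas class."""
    ra: float
    dec: float
    error: float


def leaf_value(row, leaf):
    """Level 4: the client reads rows with its usual API and asks the binding where a leaf lives."""
    return row[BINDING[leaf]]


def row_to_position(row):
    """Level 5: turn one data row directly into a model instance."""
    return Position(
        ra=float(leaf_value(row, "position.ra")),
        dec=float(leaf_value(row, "position.dec")),
        error=float(leaf_value(row, "position.error")),
    )


# A hypothetical 10-column row; only columns 2, 3 and 9 are bound to the model.
row = ("src1", 0.5, 150.1, 2.2, 0, 0, 0, 0, 0, 0.001)

print(leaf_value(row, "position.dec"))  # 2.2
print(row_to_position(row))             # Position(ra=150.1, dec=2.2, error=0.001)
```

At level 4 the client keeps full control over how values are read; level 5 moves that work into the library, which is where the convenience (and the risk of mismatched data) comes from.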

@lmichel
Collaborator

lmichel commented Apr 23, 2021

@msdemlei

Conversely, I'd be interested in your opinions about the API I'm
proposing over at https://github.com/msdemlei/astropy

My opinion is that we roughly agree on HOW to consume annotated data

  • having getters (selectors indeed) parametrized with model elements (or mapping keys).
  • I presented such an API in Java in a TDIG session at Victoria (AFAIR)

but we definitely disagree on WHAT to consume.

  • e.g. Cube instances vs sparse sets of properties

@mcdittmar
Collaborator

if you only understand Meas and PhotDM, you can (if using the rama package):
mags = doc.find_instances(Photometry)  # find the photometry Measure
band = mags[0].coord.coord_sys.frame.filter.name  # band name of the PhotometryFilter
* assuming the photDM:PhotometryFilter is off the CoordFrame.

Can I see this in a complete script? I'm wondering at the moment where the Photometry symbol comes from.

The Standard Properties case: Annotated CSC file
The file has several PhotometryFilter bands, and FIELD 'col15' annotated as a meas:Photometry instance from EB1.
NOTE: These PhotometryCoordSys elements point to the PhotometryFilter instances as per the models.

  <INSTANCE dmtype="mango:coordinates.PhotometryCoordSys" ID="_photsys_EB1">
  <INSTANCE dmtype="mango:coordinates.PhotometryCoordSys" ID="_photsys_EB2">
  <INSTANCE dmtype="mango:coordinates.PhotometryCoordSys" ID="_photsys_EB3">
  <INSTANCE dmtype="mango:coordinates.PhotometryCoordSys" ID="_photsys_EB4">

          <INSTANCE dmtype="mango:measures.Photometry">
            <ATTRIBUTE dmrole="mango:measures.Photometry.coord">
              <INSTANCE dmtype="mango:coordinates.PhotometryCoord">
                <ATTRIBUTE dmrole="mango:coordinates.PhotometryCoord.luminosity">
                 <COLUMN dmtype="ivoa:RealQuantity" ref="col15"/>
                </ATTRIBUTE>
                <REFERENCE dmrole="coords:Coordinate.coordSys">
                  <IDREF>_photsys_EB1</IDREF>
                </REFERENCE>
              </INSTANCE>
            </ATTRIBUTE>
          </INSTANCE>

Python Script:

#!/usr/bin/env python
from rama.reader import Reader
from rama.reader.votable import Votable
from rama.models.mango import Photometry

infile = "ivoa_csc2_example_annotated.vot"
doc = Reader( Votable(infile) )

flux = doc.find_instances(Photometry)[0]
band = flux.coord.coord_sys.frame.name

print("Flux: {} [band: {}]".format( str(flux.coord.luminosity[0]), band) )

Produces:
Flux: 1.7599881487057e-14 erg/s/cm^2 [band: CHANDRA/ACIS.broad]

Summary:
The rama package interprets the annotation and generates python classes.
As far as accessing the instances and using them goes, it makes NO DIFFERENCE whether the Photometry data are in a TimeSeries or a Source, or are stand-alone. The package finds the nodes annotating the class and returns the instances.

If you only understand PhotDM (and not Meas), I'm not sure how you found 'mag', but... you can still find all PhotometryFilters:
filters = doc.find_instances(PhotometryFilter)
band = filters[0].name
If there is only 1 filter, that must apply to your mag; otherwise you have some trouble.

I hope you'll understand that this statement unnerves me a bit given we're talking about the solution to our 20-years-old coordinates troubles here...

What is unnerving? The model associates the 'col15' flux Measure with the proper PhotometryFilter. With that relation defined, finding the band becomes trivial; without it, there is no way to know which of the 4 bands is relevant... you really can't do it.

So, why not attach the PhotometryFilter directly to the magnitude? It's clearly more robust and works just as well, no?

That could be done, but the models try to organize the concepts. The PhotometryFilter is not a direct property of the Photometry Measure, but rather associated metadata about the environment which produced the value. This sort of information is relegated to the associated coordinate Frame in all Measures.
I could certainly see adding a helper method to the Photometry class which short-cuts this for users.
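A minimal sketch of such a helper, using plain dataclasses that follow the attribute path quoted at the top of this comment (`coord.coord_sys.frame.filter.name`). These classes are hypothetical illustrations, not the rama-generated classes; the sketch also has two Photometry measures sharing one coordinate system, and hence one filter, to reflect that the filter lives in the frame rather than on each value.

```python
from dataclasses import dataclass


@dataclass
class PhotometryFilter:
    name: str


@dataclass
class PhotometryFrame:
    filter: PhotometryFilter


@dataclass
class PhotometryCoordSys:
    frame: PhotometryFrame


@dataclass
class PhotometryCoord:
    luminosity: float
    coord_sys: PhotometryCoordSys  # a reference: many coords may share one coord_sys


@dataclass
class Photometry:
    coord: PhotometryCoord

    @property
    def band(self) -> str:
        """Hypothetical shortcut hiding the coord -> coord_sys -> frame -> filter path."""
        return self.coord.coord_sys.frame.filter.name


broad = PhotometryCoordSys(PhotometryFrame(PhotometryFilter("CHANDRA/ACIS.broad")))
flux_a = Photometry(PhotometryCoord(1.76e-14, broad))
flux_b = Photometry(PhotometryCoord(3.2e-15, broad))

print(flux_a.band, flux_b.band)  # CHANDRA/ACIS.broad CHANDRA/ACIS.broad
```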

@msdemlei
Contributor Author

msdemlei commented Apr 28, 2021 via email

@msdemlei
Contributor Author

msdemlei commented Apr 28, 2021 via email

@lmichel
Collaborator

lmichel commented Apr 28, 2021

  • mango_browser.get_parameters() returns a dictionary of all parameters with their frames. This dictionary embeds FIELD identifiers of the mapped columns (id = FIELD id, index = FIELD position)

see

  • mango_browser.get_data(limit=1) returns an object that (among other things) binds the column numbers to the keys of the above dictionary.

That way you get connections between model parameters and VOTable columns.

ex: https://github.com/ivoa/modelinstanceinvot-code/blob/main/python/client/demo/ivoa_csc2_example.py

@mcdittmar
Collaborator

On Fri, Apr 23, 2021 at 10:18:23AM -0700, Mark Cresitello-Dittmar wrote:
Python Script:
(snip)

Ah, that's where the Photometry symbol comes from. So... that's generated code? I'll frankly say that anything requiring code generation gets large demerits in my book, and I think in many other people's books.

I don't see the relevance of the code origin to this discussion. The system in place has proved very useful for keeping the code current with the model changes. There is no requirement for using code generators.

You asked to see the script which gets the filter name from the magnitude..
What is the corresponding thread in your implementation?

  • With your proposal of a single Measurement type and uncoupled models, what makes the connection between the 'flux' Measurement and the corresponding Filter?
  • I see in your TS example the PhotCal instance has added an attribute to the PhotCal object for 'value' pointing to the flux column.. which is not part of the photDM spec (values are not part of PhotCal). I think we hit on this before.

So, why not attach the PhotometryFilter directly to the magnitude?

It occurred to me today that you probably were not advocating this approach, but merely being critical of where we DID decide to put the filters. To do as this suggests would mean generating an extension to Measurement which adds the reference to the PhotometryFilters.. This would go against both of your base assertions that 1 Measurement is sufficient, and that models should be decoupled.

If "organize the concepts" actually makes things more difficult (and may even require adding helper methods), perhaps that's another indicator that we should think again whether this "organize the concepts" is actually useful. So: Do we get a benefit (which would be: "Things we can do that we could not otherwise do")

well, one can trivially find the Photometry data and its corresponding Filter metadata..

"which clearly are not trivial, as evinced by the fact that we still can't do them 20 years in the VO"

@msdemlei
Contributor Author

msdemlei commented May 4, 2021 via email

@mcdittmar
Collaborator

From #15: May 4, 2021 - Markus wrote:

With just one Measurement class (or perhaps a few when we add proper
distributions), with the API of https://github.com/msdemlei/astropy,
all TOPCAT would need to do is:

ann = col.get_annotations("meas:Measurement")
if ann:
 associated_error = ann.naive_error

(or it would try a few attributes it knows how to plot).

With current Meas, it will, as far as I can see, have to do something like

MEAS_CLASSES = ["meas:GenericMeasure", "meas:Time",
    "meas:Position", "meas:Velocity", "meas:ProperMotion"]
# I'm leaving out Polarisation because it really doesn't belong here

for class_name in MEAS_CLASSES:
  ann = col.get_annotations(class_name)
  if ann:
    associated_error = ann.naive_error

And, worse, each time we invent a new Measure subclass, it will have
to amend MEAS_CLASSES.

That's a high price to pay; it would be worth paying if we got a
major benefit from it. But I can't even see a minor one.

This is the 'find all measures' case.
With rama package, since these are all children of meas:Measure:

doc = Reader( Votable(infile) )
measures = doc.find_instances(Measure)

for item in measures:
    print("item type == {}, has Error == {}".format(str(type(item)), "FALSE" if item.error is None else "TRUE" ))

produces
item type == <class 'rama.models.measurements.Time'>, has Error == FALSE
item type == <class 'rama.models.measurements.GenericMeasure'>, has Error == FALSE

I'll note that this does not find the Measures defined under the mango model (Photometry, HardnessRatio), but I'd consider this a bug in the package.
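The difference between the two lookups can be sketched in plain Python. The classes below are hypothetical stand-ins, not rama's generated classes or the astropy fork's annotations: an inheritance-aware `find_instances` (in the spirit of rama's `doc.find_instances(Measure)`) picks up a subclass added later, while a name list like `MEAS_CLASSES` misses it until someone amends the list. This does not settle the design question; Markus's alternative of a single Measurement class avoids both the hierarchy and the list.

```python
class Measure:
    """Hypothetical stand-in for meas:Measure."""


class Time(Measure):
    pass


class GenericMeasure(Measure):
    pass


class Photometry(Measure):
    """Imagine this subclass being added later by another model (e.g. mango)."""


def find_instances(objects, cls):
    # Inheritance-aware: any subclass of cls matches, including ones added later.
    return [obj for obj in objects if isinstance(obj, cls)]


# Name-based alternative: every new subclass has to be added here by hand.
MEAS_CLASSES = ["Time", "GenericMeasure"]


def find_by_name(objects):
    return [obj for obj in objects if type(obj).__name__ in MEAS_CLASSES]


doc = [Time(), GenericMeasure(), Photometry(), "unannotated value"]

print([type(m).__name__ for m in find_instances(doc, Measure)])  # ['Time', 'GenericMeasure', 'Photometry']
print([type(m).__name__ for m in find_by_name(doc)])             # ['Time', 'GenericMeasure']
```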

@mcdittmar
Collaborator

If that is true, that's reassuring. What would the client code look like without the generated code?

I have no idea..
My expectation is that the rama package will become production ready once we've settled on models and annotation.
At that point, clients can use it, or write their own parser, or use someone else's package.
Given your reaction to the code genesis, maybe you can consider it a 'worst case experience' for clients

You asked to see the script which gets the filter name from the magnitude.. What is the corresponding thread in your implementation?

Since it's the same logic as https://github.com/msdemlei/astropy#getting-an-error-for-a-column, I've not added it in the README, but it's (assuming you want to figure out the band name for a column col):

try:
    ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier
except IndexError:
    raise Exception(f"No photometry annotation on {col}")

The case is different in that the PhotCal and Measure are in different models.

  • given a 'flux' Measure, how do you find the corresponding filter band?
    What you show merely finds the first PhotCal instance, right?
  • With your proposal of a single Measurement type and uncoupled models, what makes the connection between the 'flux' Measurement and the corresponding Filter?

The value reference in phot:PhotCal.

  • I see in your TS example the PhotCal instance has added an attribute to the PhotCal object for 'value' pointing to the flux column.. which is not part of the photDM spec (values are not part of PhotCal). I think we hit on this before.

Yeah, that needs to be fixed; some of our DMs have to be taught to talk about what they annotate. But if we want to de-entangle them, some minor adjustments will be necessary anyway.

With your proposal, you can only accomplish this task because you've made an adhoc change to the PhotDM model which is conceptually wrong. Photometric Filters conceptually stand on their own, they do not know anything about the data that was taken using them. Many flux/mag columns can use the same filter. And presumably, many other sorts of objects can reference a PhotCal instance.

So, why not attach the PhotometryFilter directly to the magnitude?

It occurred to me today that you probably were not advocating this approach, but merely being critical of where we DID decide to put

Ummm... of course I'd like to attach the filter to the column (or param) it applies to. Why would I not?

Attaching it to the column is not the same as attaching it to the magnitude (Measure)

@lmichel
Collaborator

lmichel commented May 5, 2021

I'm not an expert on the rama package, but I guess that the generated code is basically object classes matching the model elements with appropriate accessors (setters and getters).
This works fine once your object is properly set.
My concern about this approach is how to set these instances properly.
In the archival data world, you have to consider that some VOTable data won't ever match exactly what generated code is expecting:
e.g.

  • Numerical value given as string
  • positions given in unexpected format.

It is hard to imagine that generated code will be able to tackle all possible cases.
Under these conditions, I think that a library based on generated code should be used behind some hand-made wrapper.

For this reason I prefer to allow clients to store annotations as dictionaries (Python) and to let them pick data from there with appropriate selectors (see here).
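The kind of hand-made wrapper described here can be sketched in a few lines of Python: the annotation is kept as a plain dictionary, a selector picks the referenced column from a row, and a lenient converter accepts floats, numeric strings and sexagesimal strings. The dictionary layout (`"ref"` keys), the `RAJ2000`/`DEJ2000` column names and the helper names are assumptions for illustration, not lmichel's actual code.

```python
import re

# Hypothetical annotation stored as a plain dictionary; keys are illustrative only.
ANNOTATION = {
    "dmtype": "mango:Position",
    "ra": {"ref": "RAJ2000"},
    "dec": {"ref": "DEJ2000"},
}

# Sign captured separately so that e.g. "-00 30 00" keeps its sign.
SEXAGESIMAL = re.compile(r"^\s*([+-]?)(\d+)[:\s]+(\d+)[:\s]+(\d+(?:\.\d*)?)\s*$")


def to_degrees(value, hours=False):
    """Coerce floats, numeric strings or 'HH MM SS' / 'DD:MM:SS' strings to decimal degrees."""
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value)
    match = SEXAGESIMAL.match(text)
    if match is None:
        return float(text)  # plain numeric string such as "150.25"; raises ValueError otherwise
    sign, a, b, c = match.groups()
    magnitude = int(a) + int(b) / 60 + float(c) / 3600
    degrees = magnitude * 15 if hours else magnitude  # hours -> degrees for RA
    return -degrees if sign == "-" else degrees


def select(row, annotation, role, hours=False):
    """Pick the column named by annotation[role]['ref'] from a dict-like row and normalise it."""
    return to_degrees(row[annotation[role]["ref"]], hours=hours)


row = {"RAJ2000": "10 01 00", "DEJ2000": "-02:30:00"}
print(select(row, ANNOTATION, "ra", hours=True), select(row, ANNOTATION, "dec"))
```

Keeping the annotation as data and the normalisation in a separate function means new archival quirks only touch `to_degrees`, not the model classes.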

@msdemlei
Contributor Author

msdemlei commented May 5, 2021 via email

@msdemlei
Contributor Author

msdemlei commented May 5, 2021 via email

@mcdittmar
Collaborator

I'm not expert in Rama package but I guess that the generated code is basically object classes matching the model elements with appropriate accessors (setter and getter).
In the archival data world, you have to consider that some VOTable data won't ever match exactly what generated code is expecting:
e.g.

  • Numerical value given as string
  • positions given in unexpected format.

Let's not sidetrack this issue with a critique of the Rama package.

  • the auto-generated model classes are merely end-points presented to the user; instantiated with the find_instances() command.
  • there is 'regular' parser code which interprets the annotation and VOTable content
    • uses AstroPy QTable to interpret the VOTable data content. 
      • which conveniently creates AstroPy Quantity types for the data, along with all its Unit handling features. 
      • I did hit a bug in QTable, where it could not understand RA/DEC in HMS/DMS notation. It turns out this was reported and fixed a couple of weeks ago, and is slated for a future release.
    • it is at this level that the data inconsistency handling occurs

The community has had a LOT of questions about how compatible the models are with AstroPy; one of the main goals of the implementation was to leverage AstroPy as much as possible. So it uses QTable and Quantity, and has converters for SkyCoord and Time.

@lmichel
Collaborator

lmichel commented May 5, 2021

Thank you for these clarifications.

@mcdittmar
Collaborator

On Tue, May 04, 2021 at 09:52:05AM -0700, Mark Cresitello-Dittmar wrote:
Since it's the same logic as https://github.com/msdemlei/astropy#getting-an-error-for-a-column, I've not added it in the README, but it's (assuming you want to figure out the band name for a column col):

try: 
  ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier 
except IndexError: 
  raise Exception(f"No photometry annotation on {col}") 

The case is different in that the PhotCal and Measure are in different models.

  • given a 'flux' Measure, how do you find the corresponding filter band? What you show merely finds the first PhotCal instance, right?

This is another case where entangling DMs is pernicious: Photometry shouldn't have anything to do with errors (i.e., Meas). The whole problem disappears when you drop the notion that there should be something like a FluxMeasure (or whatever, I can't find it in the 2020-04-13 PR). When you drop classes and things work even better, to me that's a clear indication that the class shouldn't be there.

  1. The Measure containing 'flux' does have errors. In your proposal, this would be a Measure annotating the appropriate columns 'flux', 'flux_err'.
    1. Your use case is: 'If I have a flux or magnitude, figure out the band it's in, perhaps a zeropoint, etc.'
    2. so, given that you've found one of several annotations for a 'flux' measure; what is your thread for finding the corresponding band? All you've shown is that you can find a PhotCal, not necessarily the correct one.
  2. I'm not sure Photometry has been a component in any of the Meas/Coord drops.. it was in Spectral and an obvious candidate for expansion from the basics since it has an association with the PhotometryFilter.
  3. What qualifies Photometry to be a special type (rather than being handled by GenericMeasure) is that there is associated metadata.. namely the PhotometryFilter. The specialized model class tells the provider and client what the expected associated metadata is, and where to find it. And that is the crux of different approaches.

conceptually wrong. Photometric Filters conceptually stand on their own, they do not know anything about the data that was taken using them. Many flux/mag columns can use the same filter. And presumably, many other sorts of objects can reference a PhotCal instance.

That's how the model works right now, and I argue that we shouldn't be doing it that way. You see, it's conceptually as valid to say "there's photometry metadata attached to this column" -- that in reality there actually is a filter (as in the concrete thing that you put somewhere into your optical path) somewhere that has some relation to the photometry metadata doesn't mean it makes operational sense to model that particular artefact.

Everyone who has worked on IVOA models to date has taken the current approach. That is a LOT of history by a lot of people. The models should define the entities involved and their relations. They need to support several different uses, from DAL protocols, to Applications, and Serializations. Your approach focuses solely on the Serialization aspect, has no formal model backing, and has only been applied to very simple data structures and cases. It is a VERY big ask to expect the working group to change its entire approach based on this.

Ummm... of course I'd like to attach the filter to the column (or param) it applies to. Why would I not?

Attaching it to the column is not the same as attaching it to the magnitude (Measure)

No, it's not, because there's an additional indirection, and you're entangling Meas and Phot. Suppose for a moment that there is -- as I claim -- a price for that entangling, and there's a price for the extra Meas class: having it on the column saves us from having to pay both.

I'm not entangling Meas and Phot, I'm defining an association (in the model) between flux data and photometry metadata.
"column" is not a model construct, it is a serialization construct (see above).

These last bits are off the issue topic, and back on the old circular debate, so I think I'll try to leave it here.

@mcdittmar
Collaborator

This is the 'find all measures' case.

Perhaps in a somewhat roundabout sense; what TOPCAT would really want to do in this use case is "Find a Measurement instance on the column I'm about to plot", which arguably can do with a shortcut over "enumerate all measurements and see if I can find one on my column".

With rama package, since these are all children of meas:Measure:

Yeah, that's the problem: you need a complex package (with, as you say, bugs of its own) that needs to access and parse VO-DML. Without the per-physics classes, all you need is a simple, model-independent extension to a VOTable parser (it's a 400-line diff in astropy/io/votable/tree.py of https://github.com/msdemlei/astropy) and you have it right there, without having to deal with VO-DML as such (which, don't get me wrong, still is great, and validators will need to deal with it; it's just that normal clients shouldn't have to). I'm here because I'm convinced this sort of simplicity will eventually decide whether, after we're done here, we'll have everyday clients consuming our annotations, or whether we'll go on as it's been for the past 15 years, with people ignoring our DMs and still no usable way to automatically plot error bars (or have epoch sliders without guessing).

Markus.. this isn't the problem.
I'm showing how this would be done IF someone were using the rama package. This does not set any requirements on anyone else to use the same approach. If TOPCAT has this job to do, and the two options are available, they'd select the best approach for their requirements.

FWIW: the rama package uses astropy.io.votable in the parser code, though not sure which pieces at the moment.

@msdemlei
Contributor Author

msdemlei commented May 6, 2021 via email

@mcdittmar
Collaborator

The Measure containing 'flux' does have errors. In your proposal, this would be a Measure annotating the appropriate columns 'flux', 'flux_err'.

Yes.

  1. Your use case is: 'If I have a flux or magnitude, figure out the band it's in, perhaps a zeropoint, etc.'

Yes -- which, in my world, is unrelated to the flux or magnitude also having errors.

  1. so, given that you've found one of several annotations for a 'flux' measure; what is your thread for finding the corresponding band? All you've shown is that you can find a PhotCal, not necessarily the correct one.

No. When you say: table["mag_g"].get_annotations("phot:PhotCal") you get the annotation(s) exactly for mag_g and for nothing else. No need to guess.

Ah.. now I see where 'col' comes from..

ann = col.get_annotations("phot:PhotCal")[0].filterIdentifier 

It might be good to add a similar line in your README.rst as well to show what your starting point is.
Question: does table["mag_g"] return the FIELD, and does get_annotations() return the PhotCal block associated with that same FIELD?

That makes sense... thanks.

@mcdittmar
Collaborator

I hate to be blunt, but: that has given us essentially zero uptake. I wouldn't be here if we had a way to solve these basic use cases, which DM should have provided by (at least) VOTable 1.2 in 2009. I'm not blaming anyone, and I've been part of the failure myself, but: if something hasn't worked out for more than 10 years, wouldn't you agree it's an indication that we've been doing it wrong?

Yes.. and the work that's been done with VODML, the revamp of these models, the annotation syntax instead of UTypes has all been directed at resolving those problems.

Applications, and Serializations. Your approach focuses solely on the Serialization aspect, has no formal model backing, and has only
What would count as "formal model backing"?

Datamodels, written out, with consideration of their use cases.
We know you've changed the models (Meas, Coords, PhotDM).. how? what do these new models look like?
PhotDM, for one, is a REC.. what are the ramifications of adding the data to PhotCal? What other objects would need to change?

The "very simple data structures" argument has been levelled a few times, but nobody has pointed me towards a case where disentangled models really work worse than entangled ones. Sure, there is the case of severely denormalised tables that I, well, refuse to do, but that is completely independent of model architecture and a question of how complex we want our annotation to be. So... what kind of complex data are you concerned about?

These workshop cases have several levels of complexity in the data structure. Implementing these with your proposal would go a long way to supporting your argument. Just with the TimeSeries case we now have:

  1. the simple TimeSeries that you provided: 1 Table = 1 LightCurve
  2. ZTF TimeSeries : 1 Table = n LightCurves (one per source in the field-of-view)
  3. GAIA multiband: 1 Table = m*n LightCurves (one per band per source) with compact structure

So far, we haven't seen that the decoupled models approach works at all in these other cases.

@msdemlei
Contributor Author

msdemlei commented May 7, 2021 via email

@msdemlei
Contributor Author

msdemlei commented May 7, 2021 via email

@lmichel
Collaborator

lmichel commented May 7, 2021

The ZTF thing I can't find in the dm-usecases

here it is

GROUPING, JOINING and FILTERING rows are basic operations that do not need a complex algebra.
These are 3 statements set up to resolve well-identified cases:
1- data of different objects mixed in one table (if you dislike ZTF, Gilles showed VizieR examples).
2- data spread over multiple tables. IMO it is wise to consider now that some data providers will soon take advantage of the multi-table support in VOTable to publish complex data sets (with 1-n relations).
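In plain Python terms, the three operations amount to something like the sketch below, applied to rows a client has already read. The tables, column names and values are hypothetical, and the functions only illustrate the semantics; in lmichel's proposal these are expressed as statements in the annotation rather than as client code.

```python
from collections import defaultdict

# Hypothetical ZTF-like table: several sources' light curves mixed in one table.
PHOTOMETRY = [
    {"source_id": 1, "mjd": 59000.1, "mag": 17.2},
    {"source_id": 2, "mjd": 59000.1, "mag": 15.9},
    {"source_id": 1, "mjd": 59001.3, "mag": 17.4},
]
# Hypothetical second table; source_id in PHOTOMETRY is a foreign key into it.
SOURCES = [
    {"source_id": 1, "ra": 150.25, "dec": -2.5},
    {"source_id": 2, "ra": 150.31, "dec": -2.4},
]


def group_rows(rows, key):
    """GROUPING: split one table into per-key row lists (e.g. one light curve per source)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def join_rows(left, right, key):
    """JOINING: merge each left row with the right row sharing its key (inner join)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def filter_rows(rows, predicate):
    """FILTERING: keep only the rows a predicate accepts."""
    return [row for row in rows if predicate(row)]


light_curves = group_rows(PHOTOMETRY, "source_id")
joined = join_rows(PHOTOMETRY, SOURCES, "source_id")
bright = filter_rows(PHOTOMETRY, lambda r: r["mag"] < 16)

print(sorted(light_curves))                 # [1, 2]
print([r["mjd"] for r in light_curves[1]])  # [59000.1, 59001.3]
print(joined[1]["ra"])                      # 150.31
print([r["source_id"] for r in bright])     # [2]
```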

@lmichel
Collaborator

lmichel commented May 7, 2021

Saying that DM is a 10-year failure is an overstatement.
Suggesting that the reason for this lack of success (a phrase I prefer) is the development of integrated models is (hmm) biased.

STC 1.33 was not an integrated model; however, it has only been used as STC-S (a great idea before MOCs) and as a documentation base. Markus's proposal to serialize it didn't succeed. I don't know why.

The main reason for this situation is more likely to be found in the lack of enthusiasm from both providers and developers for working with annotated VOTables.
I take my share of responsibility for not having succeeded in dramatically changing this situation.
This workshop is, however, a good step forward.

I won't rewrite the history here, but I just hope we will find a good (*) way out.

(*) anything other than a definitive "let's get rid of this modeling stuff"

@msdemlei
Contributor Author

msdemlei commented May 7, 2021 via email

@lmichel
Collaborator

lmichel commented May 7, 2021

Well, things become complex when you consider the various conditions

The conditions are perfectly set in the spec.

Oh, case 2 is actually sane and non-legacy. It can be handled just
fine with just annotation (saying something like "x is a foreign key
in y").

Namely a JOIN

5 participants