BUG: fix gaia query #387
OK, the function now finishes without errors, but the notebook still complains in the plotting cell; some dtypes may still be off.
Thanks for taking this on, Brigitta! I can see why there is confusion here about a) Gaia only having one band and b) the datalink VOTable stuff. In the COMBINED data structure, Gaia used to return flux and band as two separate columns. It looks like the other data structures are different. This all seems unnecessarily complicated to me, but I probably don't know the full reasons for these data structures and the datalink confusion in the first place. Anyway, here is the Gaia tutorial on how to download large amounts of data: https://www.cosmos.esa.int/web/gaia-users/archive/datalink-products#datalink_jntb_get_above_lim I would suggest we follow that and use the INDIVIDUAL data structure (mostly because that was Gaia The columns we want are listed in their 'plot_e_phot' function: ['g_transit_time', 'g_transit_mag', 'bp_obs_time', 'bp_mag', 'rp_obs_time', 'rp_mag']. I do not know why they called the g band times transit times and the other bands' times obs times, but there must be a reason.
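To make the suggestion concrete, here is a minimal sketch of how those six columns could be stacked into the flux-plus-band layout that COMBINED used to give us. It assumes the per-source table has already been read into a plain dict of column name to list of values; `stack_bands`, the band labels "G"/"BP"/"RP", and the output keys `time`/`mag`/`band` are names made up for illustration, not anything from the Gaia tutorial or from this repo.

```python
import math

# (time column, magnitude column, band label) for the three Gaia bands.
# Column names are the ones quoted above; the band labels are an assumption.
BAND_COLUMNS = [
    ("g_transit_time", "g_transit_mag", "G"),
    ("bp_obs_time", "bp_mag", "BP"),
    ("rp_obs_time", "rp_mag", "RP"),
]


def _is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def stack_bands(columns):
    """Turn per-band columns into long-format records, one per measurement.

    `columns` is a hypothetical dict mapping column name -> list of values.
    """
    records = []
    for time_col, mag_col, band in BAND_COLUMNS:
        for t, m in zip(columns[time_col], columns[mag_col]):
            if _is_missing(t) or _is_missing(m):
                continue  # skip epochs without a usable measurement
            records.append({"time": t, "mag": m, "band": band})
    return records


# Tiny made-up example: two epochs, with one missing BP magnitude.
example = {
    "g_transit_time": [1700.1, 1701.2],
    "g_transit_mag": [15.2, 15.3],
    "bp_obs_time": [1700.1, 1701.2],
    "bp_mag": [15.9, float("nan")],
    "rp_obs_time": [1700.1, 1701.2],
    "rp_mag": [14.6, 14.7],
}
long_records = stack_bands(example)
```

A list of flat records like this should be straightforward to hand to pandas, since every cell is a scalar.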
I suppose the naming is a focal-plane thing: the g mags are obtained while the sources are transiting across multiple astrometry detectors.
But otherwise I agree that COMBINED was the most intuitive and useful data structure.
This should be good to go now.
The Gaia server has changed the data products on the server side, and `COMBINED` is not available any more.

However, neither RAW nor INDIVIDUAL does exactly what we would like. The latter returns multiple VOTables in the response, each under a key that contains the `source_id`. So I went with the other option and used `RAW`. However, in this case there is one row per object, with multidimensional cells for the flux/time/etc. values.

I spent some time trying to massage it into a form that makes pandas happy: I removed the usage of np.ma.MaskedArray, and then also removed the masks at a much earlier step. Nevertheless, pandas is still unhappy, so I suspect it is just unable to handle multidimensional cells.
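For reference, one way around the multidimensional cells would be to flatten them before anything reaches pandas. The following is only a sketch under assumptions: each RAW row is represented as a plain dict, the array-valued columns of a row all have the same length, and the column names `time` and `flux` as well as the helper `flatten_raw_rows` are hypothetical, not the actual RAW schema or code from this PR.

```python
def flatten_raw_rows(rows, array_columns):
    """Expand rows with array-valued cells into one record per array element.

    `rows` is a list of dicts (one per object); `array_columns` names the
    cells that hold sequences. Scalar cells are repeated on every record.
    """
    flat = []
    for row in rows:
        lengths = {len(row[col]) for col in array_columns}
        if len(lengths) != 1:
            raise ValueError("array-valued cells in a row must share one length")
        scalars = {k: v for k, v in row.items() if k not in array_columns}
        for values in zip(*(row[col] for col in array_columns)):
            record = dict(scalars)
            record.update(zip(array_columns, values))
            flat.append(record)
    return flat


# Made-up data: two objects with two and one epochs respectively.
raw_rows = [
    {"source_id": 1, "time": [1.0, 2.0], "flux": [10.0, 20.0]},
    {"source_id": 2, "time": [3.0], "flux": [30.0]},
]
flat_records = flatten_raw_rows(raw_rows, ["time", "flux"])
```

Every resulting record holds only scalars, which is the shape pandas generally copes with best; whether this is cheaper than switching to `INDIVIDUAL` is a separate question.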
Opening this PR so that others with a much deeper understanding of pandas can chime in. Ultimately I feel that it's less hacky to parse the source_id out of the keys and go with `INDIVIDUAL`...

The PR also has some unrelated commits for cleanup, but I kept the commits separate and well described, so it should be easy to select which ones to review.
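As a rough illustration of the key-parsing option, a sketch could look like the one below. The key format used in the example (`EPOCH_PHOTOMETRY-Gaia DR3 <source_id>.xml`) is an assumption about what the datalink response keys look like and should be checked against a real response; `source_id_from_key` is a hypothetical helper name.

```python
import re

# Assumption: the source_id is the last run of digits in the key,
# optionally followed by a file extension such as ".xml".
_TRAILING_ID = re.compile(r"(\d+)(?:\.\w+)?$")


def source_id_from_key(key):
    """Extract the integer source_id from a datalink response key."""
    match = _TRAILING_ID.search(key)
    if match is None:
        raise ValueError(f"no source_id found in key: {key!r}")
    return int(match.group(1))


# Example with an assumed key format, not taken from a real response.
example_key = "EPOCH_PHOTOMETRY-Gaia DR3 4911590910260264960.xml"
example_id = source_id_from_key(example_key)
```

Raising on keys that do not match keeps a silent format change on the server side from producing wrong IDs.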
closes #375