
BUG: fix gaia query #387

Open: wants to merge 9 commits into main


@bsipocz (Member) commented Mar 7, 2025

The Gaia server has changed its data products on the server side, and COMBINED is no longer available.

However, neither RAW nor INDIVIDUAL does exactly what we would like. INDIVIDUAL returns multiple VOTables in the response, each under a key that contains the source_id. So I went with the other option and used RAW. In that case, however, there is one row per object, with multidimensional cells for the flux, time, etc. values.
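For readers less familiar with the shape problem, here is a minimal sketch of turning RAW-style rows (one row per object, array-valued cells) into one row per epoch with pandas. The column names and values are illustrative assumptions, not the exact Gaia schema, and this is not the code in this PR.

```python
import numpy as np
import pandas as pd

# Hypothetical RAW-style result: one row per source, array-valued cells.
# Column names are illustrative, not the exact Gaia schema.
raw = pd.DataFrame({
    "source_id": [1001, 1002],
    "g_transit_time": [np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.5])],
    "g_transit_mag": [np.array([15.1, 15.2, 15.0]), np.array([17.3, 17.4])],
})


def explode_raw(df, array_cols):
    """Turn one-row-per-object array cells into one row per epoch."""
    # Exploding several columns at once needs pandas >= 1.3, and the arrays in
    # each row must have the same length across the listed columns.
    out = df.explode(array_cols, ignore_index=True)
    # explode() leaves the exploded columns as object dtype; cast back to float.
    return out.astype({col: float for col in array_cols})


long_df = explode_raw(raw, ["g_transit_time", "g_transit_mag"])
```

The non-array columns (here `source_id`) are repeated for every epoch, which gives the flat, one-value-per-cell layout pandas is happiest with.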

I spent some time trying to massage it into a form that makes pandas happy: I removed the usage of np.ma.MaskedArray, and then also removed the masking at a much earlier step. Nevertheless, pandas is still unhappy, so I suspect it is simply unable to handle multidimensional cells.
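One generic way to drop masks before handing data to pandas is to fill masked entries with NaN and keep a plain float array. This is a sketch with a hypothetical helper name (`unmask_to_nan`), not necessarily what the PR ended up doing.

```python
import numpy as np


def unmask_to_nan(values):
    """Replace masked entries with NaN and return a plain float ndarray."""
    # np.ma.asarray accepts both plain and masked input; casting to float
    # first makes NaN representable, and filled() swaps masked slots for NaN.
    arr = np.ma.asarray(values)
    return np.ma.filled(arr.astype(float), np.nan)


mags = np.ma.MaskedArray([15.1, 99.0, 15.3], mask=[False, True, False])
clean = unmask_to_nan(mags)
```

Because the result is an ordinary `ndarray`, pandas sees regular float values with NaN for missing data rather than a masked container.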

I'm opening this PR so that others with a much deeper understanding of pandas can chime in. Ultimately, I feel it would be less hacky to parse the source_id out of the keys and go with INDIVIDUAL...
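A sketch of what parsing the source_id out of the INDIVIDUAL keys could look like. The key layout used here (for example `EPOCH_PHOTOMETRY-Gaia DR3 4111834567779557376.xml`) is an assumption and should be checked against the keys the archive actually returns; the helper names are hypothetical.

```python
import re

# Assumed key layout: "<product>-Gaia DR3 <source_id>[.ext]". The regex takes
# the final run of digits, optionally followed by a file extension.
_SOURCE_ID_RE = re.compile(r"(\d+)(?:\.[A-Za-z]+)?$")


def source_id_from_key(key):
    """Extract the integer source_id from a datalink result key."""
    match = _SOURCE_ID_RE.search(key)
    if match is None:
        raise ValueError(f"no source_id found in key {key!r}")
    return int(match.group(1))


def tables_by_source_id(datalink_result):
    """Re-key an INDIVIDUAL-style {key: tables} dict by integer source_id."""
    return {source_id_from_key(k): v for k, v in datalink_result.items()}
```

Anchoring the match at the end of the key keeps the "3" in "DR3" from being mistaken for the ID.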

The PR also has some unrelated cleanup commits, but I kept the commits separate and well described, so it should be easy to select which ones to review.

closes #375

@bsipocz (Member, Author) commented Mar 7, 2025

OK, the function now finishes without errors, but the notebook still complains in the plotting cell; some dtypes may still be off.
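When dtypes are "still off", a common culprit is numeric data stored as object dtype (for example after `DataFrame.explode`). Below is a hypothetical helper that converts object columns to numeric only when no real values would be lost; it is a diagnostic sketch, not the fix applied in this PR.

```python
import pandas as pd


def coerce_object_columns(df, exclude=()):
    """Convert object-dtype columns to numeric where that loses no values."""
    out = df.copy()
    for col in out.columns:
        if col in exclude or out[col].dtype != object:
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        # Keep the conversion only if it did not turn real values into NaN,
        # so genuine string columns (e.g. a band label) are left alone.
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out


example = pd.DataFrame({
    "mag": pd.Series([15.1, 15.2], dtype=object),  # numbers stored as objects
    "band": ["G", "BP"],
})
fixed = coerce_object_columns(example)
```

Checking `df.dtypes` before and after is a quick way to see which columns the plotting cell may be tripping over.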

@jkrick (Contributor) commented Mar 7, 2025

Thanks for taking this on, Brigitta! I can see where the confusion comes from regarding (a) Gaia having only one band and (b) the datalink VOTable stuff. In the COMBINED data structure, Gaia used to return flux and band as two separate columns; the other data structures appear to be laid out differently. This all seems unnecessarily complicated to me, but I probably don't know the full extent of why they have these data structures and the datalink confusion in the first place.

Anyway, here is the Gaia tutorial on how to download large amounts of data: https://www.cosmos.esa.int/web/gaia-users/archive/datalink-products#datalink_jntb_get_above_lim
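The general idea behind fetching above a per-request limit is to split the source IDs into batches and merge the per-batch results. Here is a minimal sketch of that batching, assuming a caller-supplied `fetch` function that returns a dict; the helper names are hypothetical and the batch size is left to the caller rather than guessed.

```python
from itertools import islice


def batched_ids(ids, batch_size):
    """Yield successive lists of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(ids)
    while batch := list(islice(it, batch_size)):
        yield batch


def fetch_in_batches(ids, fetch, batch_size):
    """Call fetch(batch) for each batch and merge the returned dicts."""
    merged = {}
    for batch in batched_ids(ids, batch_size):
        merged.update(fetch(batch))
    return merged


# Offline stand-in for a real archive call, just to show the control flow.
def fake_fetch(batch):
    return {i: f"products for {i}" for i in batch}


results = fetch_in_batches(range(7), fake_fetch, batch_size=3)
```

With astroquery, `fetch` could wrap something like `Gaia.load_data(ids=batch, data_release="Gaia DR3", retrieval_type="EPOCH_PHOTOMETRY", data_structure="INDIVIDUAL")`, with `batch_size` set to whatever limit the linked tutorial states.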

I would suggest we follow that and use the INDIVIDUAL data structure (mostly because that is Gaia's recommendation, and they know their dataset best).

The columns we want are listed in their 'plot_e_phot' function: ['g_transit_time', 'g_transit_mag', 'bp_obs_time', 'bp_mag', 'rp_obs_time', 'rp_mag']. I do not know why they call the G-band times "transit times" and the other bands' times "obs times", but there must be a reason.
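To get back something like the old COMBINED layout (flux and band as separate columns), the per-band time/mag columns listed above can be stacked into long format. This sketch assumes each per-source table has already been converted to a pandas DataFrame (for example with astropy's `Table.to_pandas()`); the helper name and sample values are hypothetical.

```python
import pandas as pd

# Band -> (time column, mag column), using the names from plot_e_phot.
BAND_COLUMNS = {
    "G": ("g_transit_time", "g_transit_mag"),
    "BP": ("bp_obs_time", "bp_mag"),
    "RP": ("rp_obs_time", "rp_mag"),
}


def to_long_format(epochs, source_id):
    """Stack per-band time/mag columns into (source_id, time, mag, band) rows."""
    frames = []
    for band, (time_col, mag_col) in BAND_COLUMNS.items():
        part = pd.DataFrame({
            "time": pd.to_numeric(epochs[time_col], errors="coerce"),
            "mag": pd.to_numeric(epochs[mag_col], errors="coerce"),
        })
        part["band"] = band
        frames.append(part)
    stacked = pd.concat(frames, ignore_index=True)
    # Drop epochs where a band has no measurement (e.g. missing BP/RP values).
    stacked = stacked.dropna(subset=["time", "mag"]).reset_index(drop=True)
    stacked.insert(0, "source_id", source_id)
    return stacked


epochs = pd.DataFrame({
    "g_transit_time": [1.0, 2.0], "g_transit_mag": [15.0, 15.1],
    "bp_obs_time": [1.01, float("nan")], "bp_mag": [15.5, float("nan")],
    "rp_obs_time": [1.02, 2.02], "rp_mag": [14.5, 14.6],
})
stacked = to_long_format(epochs, source_id=42)
```

Concatenating the results for all sources then gives a single table with one row per (source, band, epoch).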

@bsipocz (Member, Author) commented Mar 7, 2025

I suppose the naming is a focal-plane thing: the G magnitudes are obtained while the sources transit through multiple astrometric detectors.
Anyway, I think it's mostly sorted out now.

@bsipocz (Member, Author) commented Mar 7, 2025

But otherwise I agree that COMBINED was the most intuitive and useful data structure.

@bsipocz (Member, Author) commented Mar 8, 2025

This should be good to go now.

@bsipocz marked this pull request as ready for review March 8, 2025 01:10
@bsipocz requested a review from troyraen March 8, 2025 01:10

Successfully merging this pull request may close these issues.

BUG: Gaia module in light curve generator bad request