-
-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add NEOCC python interface to astroquery library #2254
base: main
Are you sure you want to change the base?
Conversation
Hello @D-arioSpace! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-02-07 15:54:47 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! This looks like a great new addition to astroquery!
Before I do a more thorough review, there are a couple of important changes that need to be implemented. You are using two packages: pandas
and parse
, that are not part of the astroquery dependency list (https://astroquery.readthedocs.io/en/latest/#requirements). We ask that new modules not depend on packages not already required, except in limited circumstances. Looking through your code I don't think you are doing anything that would be impossible without pandas
or parse
. I would be happy to help you rework your code to remove those dependencies, particularly figuring how to replace the pandas functionality with astropy.tables.
Dear @ceb8, it's a pleasure to meet you. I perfectly understand that this can introduce dependencies and extra-feature that you would like to avoid. Please take also into account that, apart from the lack of resources, there are also other considerations that push us to this choice:
Taking all this into consideration, please let us know if we can focus on the integration into astroquery without considering the pandas replacement. |
Hi @D-arioSpace, Unfortunately we really cannot allow a pandas dependency into our codebase, returning Astropy Tables to users is a very basic requirement in our API specs. However, we can provide the labor to make that change. If this course of action is acceptable, I will implement those changes within this PR. There is one other issue that needs to be resolved before we proceed, which is the matter of the copyright notice you have on your code:
Astroquery is released under a 3-clause BSD style license (see https://astroquery.readthedocs.io/en/latest/license.html), and individual modules cannot be released under different licenses. Would it be OK to remove this notice? We have other modules contributed by ESA developers, so I believe the 3-clause BSD license should be acceptable from that standpoint. Let me know what you think. If I get your OK I will, implement the necessary changes and then ping you so you can review my work. Thanks for your patience. |
Hello @ceb8! First of all, sorry for the delay in replying, unfortunately we had to resolve some legal issues due to the change of copyright requested. From now on, you can change the license and insert a 3-clause BSD style license anywhere in the code. Also, I would like to point out that we carried out (within this fork) another push containing the implementation of the new API provided by NEOCC web portal (https://neo.ssa.esa.int/PSDB-portlet/download?file=past_impactors_list), this time using directly astropy.table (see parse_impacted function). Thanks again for your help. We are looking forward to see this library integrated into astroquery! |
Hi @D-arioSpace! Thanks for your response, and no worries about the delay. That's very good news about the copyright issue. I am a bit confused about the new implementation you mention however, since I still see pandas imported all over. Do you mean this to be an example implementation for me? If so, thank you very much and I will get started on the work shortly. Just want to make sure not to step on your toes. |
Hello @ceb8 you're right, it is just an example, which we hope to have implemented well (we did this to become familiar with the astropy.table data structure, taking advantage of the fact that this specific function had to be implemented almost from scratch). Feel free to improve it. Dario |
@D-arioSpace Excellent, I am in this middle of one coding project, so I will start this one as soon as that one wraps up. |
@D-arioSpace I'm pushing the de-pandafied |
Hi @ceb8! I had a look to the code and it looks fine. I also tested the main functionality within the list.py module and all of them are working! I leave below the steps to reproduce the behavior I mentioned:
(2)
|
Hi @ceb8 just to inform you I added a very small fix to tabs.py and to the 433.clolin file as the latter changed and the former has been adapted to that change |
@D-arioSpace I’ve fully removed pandas from this module. In doing so I also tried to regularise the output formats, so now instead of query-specific objects the outputs are all either a single astropy table or a list of them. I haven’t updated the documentation or the tests, so this is still very much a work in progress. But I wanted to make sure we are all on the same page in terms of the functionality before working on that. I have a couple of big picture questions that @bsipocz and @keflavich may want to also weigh in on:
impacts_table = neocc.query_object(name='1979XB', tab='impacts')
pd_tble.meta
|
Hi @ceb8 sorry for the big delay in answering and many thanks for the update of this interface! Regarding your big picture questions, I personally don't have any issues with the current behavior of query_object returning either a single table or a list of tables, but it would be good to get feedback from @bsipocz and @keflavich as well, since they are the experts in this area. Regarding the two table types that require additional arguments, I'm okay with leaving them as is for now, but if anyone else has strong opinions on this, we can revisit it later. In terms of functionality and output formats, I don't have any major issues with them. However, I did notice an issue with error handling when calling query_object with an invalid tab name, for example: The error message that is returned is not very clean and includes multiple repeated messages. Additionally, I encountered an error when trying to retrieve observations for object "433" (i.e. neocc.query_object(name='433', tab='observations') ). It looks like there may be a bug in the code related to radar observations and the "AsteroidObservation" class, in the “_get_rad_into” method. It would be helpful to investigate this further and see if we can resolve the issue (please let me know if there's anything I can do to help. I anticipate having more time available in the near future to dedicate to this project, so please don't hesitate to reach out to me if there's anything I can assist with.). Overall, thanks for the updates! Let me know if there's anything I can do to help. |
Hi @ceb8 I was wondering if the work is progressing smoothly or if I could assist you in clarifying any doubts or concerns you might have. |
@D-arioSpace Thanks for checking in, I'm just getting back to this after a busy period. Seems pretty straightforward. I'll leave things as they are for now, fix the bugs, and move forward updating the testing to match the new functionality. |
Thank you very much @ceb8, I confirm my availability to clarify any doubts you may have about files and functions. I hope to hear from you soon then! Thanks again. |
Hi @ceb8 sorry for bothering you again. I'm working with NEOCC data for some projects and I was wondering if you believe it would be possible to have the code integrated into astroquery by September. This integration would greatly benefit these other projects, as they would be able to utilise the astroquery package directly instead of relying on the "standalone" version for NEOCC data. Thank you again for your time and effort! |
@D-arioSpace That seems like a reasonable timeframe. |
@bsipocz @keflavich Could I get your opinions on the big picture questions I mentioned above? |
@D-arioSpace Looking into the |
Yes, that's a good idea if there are two distinct routes that both make sense. For some services, like Vizier, queries are supposed to return lists of catalogs, so it just makes sense that that's the return.
No strong opinion here.
I'm fine with the approach that is less work now.
At a glance, that seems fine to me, but I'm not a NEOCC user - it would be better to find people using the data and get their opinions. From a structural perspective, this seems fine.
Maybe we should silence those warnings in this context, since this is a context where far past/future dates are expected. |
Thanks @keflavich! To your first point, I can't tell which paradigm you are advocating. Always returning a list? |
I'm ambivalent, actually - always returning a list is right if we're sticking with one function, but I'm also OK with splitting into two functions if there is an understandable difference between two approaches. I would avoid having one function return lists sometimes and tables other times. |
Hello @ceb8 ,
returns me an error. I investigated it a bit, and it is due to an extra line NEOCC added at the beginning of the close approach file of each asteroid. To solve it, you can tell the program to skip the first line in the pd.read_csv, in class CloseApproaches in the static method clo_appr_parser(data_obj):
Additionally, we have a similar issue with the call:
which returns me the following error:
This time I'm not sure where the error come from, I'm still investigating it. Furthermore, we have recently incorporated an additional column into the close approach lists (both the recent and upcoming lists). This new column, a float value, indicates the frequency of a specific event based on the miss distance from Earth, the size of the asteroid, and its relative velocity. It is important to note that both calls still function properly, so if you believe it is worthwhile to include this information, feel free to do so. |
@ceb8 - There is a lot of commits here, I would even guess duplicates, certainly duplicated commit messages, so I wonder whether you could cleanup the PR, rebase and squash things out a bit? Also, I wonder whether the total diff of ~150k is necessary, I assume it's due to bundled data files, but should they be this many/big? |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2254 +/- ##
==========================================
+ Coverage 66.77% 67.77% +0.99%
==========================================
Files 237 246 +9
Lines 18303 19537 +1234
==========================================
+ Hits 12222 13241 +1019
- Misses 6081 6296 +215 ☔ View full report in Codecov by Sentry. |
I don't see why you say it's an alfalfa issue, to me it looks more to be a generic rtd problem and therefore I restarted the build. If it keeps failing I would suggest trying to sort out the problem with local docs builds, it's just easier to debug than waiting on RTD in CI. The windows failure is due to windows having int32 as default rather than int64. So either the code needs to be explicit to use int64 for int columns (if you have long int IDs it may be necessary, as I recall e.g. gaia or SDSS IDs doesn't fit into the 32bit), or the test should be more generic and check for int rather than int64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall things look to be in a very good shape, thank you both of you for the hard work.
I've left some generic comments, and will do one more careful check when the docs builds and windows issues are sorted out.
Thank you @bsipocz @ceb8 for your effort! I briefly tried all the features of the library, and they all seem well implemented. In these days, I will invite some of my colleagues to play with this fork and see if we can identify any errors before integration. Regarding the warnings astropy Time, I tend for keeping them because in my opinion it's better to make the user aware of the inherited warnings from the library rather than silencing them. |
0de16ec
to
2faac54
Compare
@D-arioSpace I've got all the tests passing and addressed the ones of @bsipocz's comments that I can. All that needs to happen now is that you address the outstanding comments, and anything else that comes up when @bsipocz does a more careful check. |
Hi everyone, my name is Alvaro and I was involved in the development of this functionality. I will try to answer the remaining questions. |
Hello, thank you @Galaksio for clarifying the points. From a functional point of view, the library seems okay. I wonder, @ceb8 and @bsipocz if any input is needed from our side and what is the plan (basically to warn people who now use the standalone version to switch to the one integrated into astroquery). |
Rebased and addressed @bsipocz's comments. |
5fc2b0b
to
d3a087a
Compare
Hi @ceb8 I just had a look at the changes you introduced. Thank you! I also see that in the last push, the tests all passed. At this point I just want to ask if there is anything else to do or if you know approximately when the package will be integrated, just to inform some users and collogues who are using the stand_alone version to switch to the astroquery integrated version. Thank you again for your effort :) |
@D-arioSpace thank you for the work! |
This is a python interface to data that the Near Earth Objects Coordination Centre (NEOCC) makes available through its API service (https://neo.ssa.esa.int/computer-access).