-
-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update download URLs for SDSS DR18 #3017
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3017 +/- ##
==========================================
- Coverage 67.60% 67.30% -0.31%
==========================================
Files 233 231 -2
Lines 18271 18280 +9
==========================================
- Hits 12352 12303 -49
- Misses 5919 5977 +58 ☔ View full report in Codecov by Sentry. |
Hello @weaverba137! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-06-28 00:05:38 UTC |
This is ready for review. Before merging there is an important question to consider: The I've been informed that in DR19 there will be a unified way to access all SDSS/BOSS/eBOSS/eFEDS spectra in a single table, so I propose possibly documenting the Summary of changes:
I can squash commits if necessary. There was a bit of churn getting tests to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all look great, than you so much for the fix.
Two things, neither of which I would hold this PR up for:
- Can/should we change the default DR to 18?
- Running the remote tests I got a different exception for
test_query_timeout
================================================================================= FAILURES =================================================================================
____________________________________________________________________ TestSDSSRemote.test_query_timeout _____________________________________________________________________
self = <urllib3.connectionpool.HTTPSConnectionPool object at 0x11bd9d430>, conn = <urllib3.connection.HTTPSConnection object at 0x11b2b0b00>, method = 'GET'
url = '/dr17/en/tools/search/x_results.aspx?cmd=+SELECT%0D+p.ra%2C+p.dec%2C+p.objid%2C+p.run%2C+p.rerun%2C+p.camcol%2C+p.fie...8%28%28p.ra+BETWEEN+2.02319+AND+2.02374%29+AND+%28p.dec+BETWEEN+14.8395+AND+14.8401%29%29%29&format=csv&searchtool=SQL'
timeout = Timeout(connect=0.01, read=0.01, total=None), chunked = False
httplib_request_kw = {'body': None, 'headers': {'User-Agent': 'astroquery/0.4.8.dev9345_testrun_testrun Python/3.12.1 (Darwin) python-reque...=HCNLOJOANGIBHJGAJMLLPOKI; ASPSESSIONIDSQSCCQRQ=ICNLOJOAOBLOGAIACNDFMJJD; ASP.NET_SessionId=nlotaiz2ednb1acriy14pjms'}}
timeout_obj = Timeout(connect=0.01, read=0.01, total=None), read_timeout = 0.01
def _make_request(
self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
"""
Perform a request on a given urllib connection object taken from our
pool.
:param conn:
a connection from one of our connection pools
:param timeout:
Socket timeout in seconds for the request. This can be a
float or integer, which will set the same timeout value for
the socket connect and the socket read, or an instance of
:class:`urllib3.util.Timeout`, which gives you more fine-grained
control over your timeouts.
"""
self.num_requests += 1
timeout_obj = self._get_timeout(timeout)
timeout_obj.start_connect()
conn.timeout = Timeout.resolve_default_timeout(timeout_obj.connect_timeout)
# Trigger any extra validation we need to do.
try:
self._validate_conn(conn)
except (SocketTimeout, BaseSSLError) as e:
# Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
raise
# conn.request() calls http.client.*.request, not the method in
# urllib3.request. It also calls makefile (recv) on the socket.
try:
if chunked:
conn.request_chunked(method, url, **httplib_request_kw)
else:
conn.request(method, url, **httplib_request_kw)
# We are swallowing BrokenPipeError (errno.EPIPE) since the server is
# legitimately able to close the connection after sending a valid response.
# With this behaviour, the received response is still readable.
except BrokenPipeError:
# Python 3
pass
except IOError as e:
# Python 2 and macOS/Linux
# EPIPE and ESHUTDOWN are BrokenPipeError on Python 2, and EPROTOTYPE is needed on macOS
# https://erickt.github.io/blog/2014/11/19/adventures-in-debugging-a-potential-osx-kernel-bug/
if e.errno not in {
errno.EPIPE,
errno.ESHUTDOWN,
errno.EPROTOTYPE,
}:
raise
# Reset the timeout for the recv() on the socket
read_timeout = timeout_obj.read_timeout
# App Engine doesn't have a sock attr
if getattr(conn, "sock", None):
# In Python 3 socket.py will catch EAGAIN and return None when you
# try and read into the file pointer created by http.client, which
# instead raises a BadStatusLine exception. Instead of catching
# the exception and assuming all BadStatusLine exceptions are read
# timeouts, check for a zero timeout before making the request.
if read_timeout == 0:
raise ReadTimeoutError(
self, url, "Read timed out. (read timeout=%s)" % read_timeout
)
if read_timeout is Timeout.DEFAULT_TIMEOUT:
conn.sock.settimeout(socket.getdefaulttimeout())
else: # None or a value
conn.sock.settimeout(read_timeout)
# Receive the response from the server
try:
try:
# Python 2.7, use buffering of HTTP responses
httplib_response = conn.getresponse(buffering=True)
except TypeError:
# Python 3
try:
httplib_response = conn.getresponse()
except BaseException as e:
# Remove the TypeError from the exception chain in
# Python 3 (including for exceptions like SystemExit).
# Otherwise it looks like a bug in the code.
> six.raise_from(e, None)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:467:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:462: in _make_request
httplib_response = conn.getresponse()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:1419: in getresponse
response.begin()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:331: in begin
version, status, reason = self._read_status()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:292: in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
../../../.pyenv/versions/3.12.1/lib/python3.12/socket.py:707: in readinto
return self._sock.recv_into(b)
../../../.pyenv/versions/3.12.1/lib/python3.12/ssl.py:1253: in recv_into
return self.read(nbytes, buffer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <ssl.SSLSocket [closed] fd=-1, family=2, type=1, proto=0>, len = 8192, buffer = <memory at 0x11ba80100>
def read(self, len=1024, buffer=None):
"""Read up to LEN bytes and return them.
Return zero-length string on EOF."""
self._checkClosed()
if self._sslobj is None:
raise ValueError("Read on closed or unwrapped SSL socket.")
try:
if buffer is not None:
> return self._sslobj.read(len, buffer)
E TimeoutError: The read operation timed out
../../../.pyenv/versions/3.12.1/lib/python3.12/ssl.py:1105: TimeoutError
During handling of the above exception, another exception occurred:
self = <requests.adapters.HTTPAdapter object at 0x11bb29e20>, request = <PreparedRequest [GET]>, stream = False, timeout = Timeout(connect=0.01, read=0.01, total=None)
verify = True, cert = None, proxies = OrderedDict()
def send(
self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None
):
"""Sends PreparedRequest object. Returns Response object.
:param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
:param stream: (optional) Whether to stream the request content.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple or urllib3 Timeout object
:param verify: (optional) Either a boolean, in which case it controls whether
we verify the server's TLS certificate, or a string, in which case it
must be a path to a CA bundle to use
:param cert: (optional) Any user-provided SSL certificate to be trusted.
:param proxies: (optional) The proxies dictionary to apply to the request.
:rtype: requests.Response
"""
try:
conn = self.get_connection(request.url, proxies)
except LocationValueError as e:
raise InvalidURL(e, request=request)
self.cert_verify(conn, request.url, verify, cert)
url = self.request_url(request, proxies)
self.add_headers(
request,
stream=stream,
timeout=timeout,
verify=verify,
cert=cert,
proxies=proxies,
)
chunked = not (request.body is None or "Content-Length" in request.headers)
if isinstance(timeout, tuple):
try:
connect, read = timeout
timeout = TimeoutSauce(connect=connect, read=read)
except ValueError:
raise ValueError(
f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, "
f"or a single float to set both timeouts to the same value."
)
elif isinstance(timeout, TimeoutSauce):
pass
else:
timeout = TimeoutSauce(connect=timeout, read=timeout)
try:
> resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/adapters.py:486:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:799: in urlopen
retries = retries.increment(
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/util/retry.py:550: in increment
raise six.reraise(type(error), error, _stacktrace)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/packages/six.py:770: in reraise
raise value
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:715: in urlopen
httplib_response = self._make_request(
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:469: in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <urllib3.connectionpool.HTTPSConnectionPool object at 0x11bd9d430>, err = TimeoutError('The read operation timed out')
url = '/dr17/en/tools/search/x_results.aspx?cmd=+SELECT%0D+p.ra%2C+p.dec%2C+p.objid%2C+p.run%2C+p.rerun%2C+p.camcol%2C+p.fie...8%28%28p.ra+BETWEEN+2.02319+AND+2.02374%29+AND+%28p.dec+BETWEEN+14.8395+AND+14.8401%29%29%29&format=csv&searchtool=SQL'
timeout_value = 0.01
def _raise_timeout(self, err, url, timeout_value):
"""Is the error actually a timeout? Will raise a ReadTimeout or pass"""
if isinstance(err, SocketTimeout):
> raise ReadTimeoutError(
self, url, "Read timed out. (read timeout=%s)" % timeout_value
)
E urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:358: ReadTimeoutError
During handling of the above exception, another exception occurred:
self = <astroquery.sdss.tests.test_sdss_remote.TestSDSSRemote object at 0x11bdcc980>
def test_query_timeout(self):
with pytest.raises(ConnectTimeout):
> sdss.SDSS.query_region(self.coords, width=2.0 * u.arcsec, cache=False, timeout=self.mintimeout)
astroquery/sdss/tests/test_sdss_remote.py:187:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
astroquery/utils/class_or_instance.py:25: in f
return self.fn(obj, *args, **kwds)
astroquery/utils/process_asyncs.py:26: in newmethod
response = getattr(self, async_method_name)(*args, **kwargs)
astroquery/sdss/core.py:387: in query_region_async
return self.query_sql_async(sql_query, timeout=timeout,
astroquery/sdss/core.py:622: in query_sql_async
response = self._request("GET", url, params=request_payload,
astroquery/query.py:368: in _request
response = query.request(self._session, stream=stream,
astroquery/query.py:77: in request
return session.request(self.method, self.url, params=self.params,
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/sessions.py:589: in request
resp = self.send(prep, **send_kwargs)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/sessions.py:703: in send
r = adapter.send(request, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <requests.adapters.HTTPAdapter object at 0x11bb29e20>, request = <PreparedRequest [GET]>, stream = False, timeout = Timeout(connect=0.01, read=0.01, total=None)
verify = True, cert = None, proxies = OrderedDict()
def send(
self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None
):
"""Sends PreparedRequest object. Returns Response object.
:param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
:param stream: (optional) Whether to stream the request content.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple or urllib3 Timeout object
:param verify: (optional) Either a boolean, in which case it controls whether
we verify the server's TLS certificate, or a string, in which case it
must be a path to a CA bundle to use
:param cert: (optional) Any user-provided SSL certificate to be trusted.
:param proxies: (optional) The proxies dictionary to apply to the request.
:rtype: requests.Response
"""
try:
conn = self.get_connection(request.url, proxies)
except LocationValueError as e:
raise InvalidURL(e, request=request)
self.cert_verify(conn, request.url, verify, cert)
url = self.request_url(request, proxies)
self.add_headers(
request,
stream=stream,
timeout=timeout,
verify=verify,
cert=cert,
proxies=proxies,
)
chunked = not (request.body is None or "Content-Length" in request.headers)
if isinstance(timeout, tuple):
try:
connect, read = timeout
timeout = TimeoutSauce(connect=connect, read=read)
except ValueError:
raise ValueError(
f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, "
f"or a single float to set both timeouts to the same value."
)
elif isinstance(timeout, TimeoutSauce):
pass
else:
timeout = TimeoutSauce(connect=timeout, read=timeout)
try:
resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
except (ProtocolError, OSError) as err:
raise ConnectionError(err, request=request)
except MaxRetryError as e:
if isinstance(e.reason, ConnectTimeoutError):
# TODO: Remove this in 3.0.0: see #2811
if not isinstance(e.reason, NewConnectionError):
raise ConnectTimeout(e, request=request)
if isinstance(e.reason, ResponseError):
raise RetryError(e, request=request)
if isinstance(e.reason, _ProxyError):
raise ProxyError(e, request=request)
if isinstance(e.reason, _SSLError):
# This branch is for urllib3 v1.22 and later.
raise SSLError(e, request=request)
raise ConnectionError(e, request=request)
except ClosedPoolError as e:
raise ConnectionError(e, request=request)
except _ProxyError as e:
raise ProxyError(e)
except (_SSLError, _HTTPError) as e:
if isinstance(e, _SSLError):
# This branch is for urllib3 versions earlier than v1.22
raise SSLError(e, request=request)
elif isinstance(e, ReadTimeoutError):
> raise ReadTimeout(e, request=request)
E requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/adapters.py:532: ReadTimeout
========================================================================= short test summary info ==========================================================================
FAILED astroquery/sdss/tests/test_sdss_remote.py::TestSDSSRemote::test_query_timeout - requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
====================================================================== 1 failed, 681 passed in 45.06s ======================================================================
Other things:
Sounds totally reasonable to wait. Could you open a reminder issue for this though?
Yes, it would be great to squash the iterations of tests/style a little bit. |
@bsipocz when you got the different exception above, could you describe what environment you were using? It's probably sufficient to just list the Python and requests version. |
I see the ReadTimeout on py3.12 and request 2.31 but also when upgraded to the latest 2.32, but also can reproduce it with the tox environment |
I don't think we've quite converged on a consistent set of exceptions to use, but everything else should be done at this point. |
I used |
I still see the failure, even on the latest requests version
|
@bsipocz I wonder if your network is so fast that you are seeing a timeout after a connection is established, while I am seeing a timeout before a connection is established. In any case, I think the correct exception is actually:
|
OK, please test with |
Now it all works. I don't think I have a particularly fast internet, but it may indeed routed differently than if you were testing from a university network. Anyway, the suggested solution is correct, we shouldn't really care what type of timeout one runs into. |
Maybe a rebase to get rid of the merge and squash some of the back-and-forths? |
Great! You should be able to squash when you merge, according to what I've read recently about merging on GitHub. |
We don't do squash merges here as it doesn't fit into the workflow, but I'm happy to do the rebase/squash prior merging. |
That's OK, I'll go ahead with the squash now. It's good practice for me. |
Bikeshed, but if possible don't squash down to a single commit, it was more work that that, as well as there were more logical chunks :) |
c3a0546
to
10885b2
Compare
I fixed conflicts in CHANGES.rst 3 or 4 times. If you want to squash further, I'll leave that to you. |
I'm not sure what went wrong, but the first commit here, that is the squashed everything, is clearly contains unrelated changes that have been picked up from |
Thank you @weaverba137! |
This PR closes #3011.
Background: SDSS-V has changed the paths used to access data files. Both imaging and spectroscopic data are affected. As of SDSS-V/DR18, the SQL access does not appear to have changed. Updating these URLs may continue to work throughout SDSS-V, but that is not guaranteed. The SDSS-V data team recommends using sdss-access.
plate
numbers do work.astroquery.sdss
ever return a SDSS-V spectrum?fieldID-MJD-fiberID
instead ofplate-MJD-fiberID
.