File.get_contents() now broken #687

Closed
byubean opened this issue Apr 10, 2025 · 8 comments
byubean commented Apr 10, 2025

Describe the bug

File.get_contents() now gets a 401 Unauthorized error.

To Reproduce

from canvasapi import Canvas
canvas = Canvas('url', 'token')
course = canvas.get_course(12345)
file = course.get_file(987654321)
print(file.get_contents())

Expected behavior

Should print the file contents.

Environment information

  • Python version: 3.11.5
  • CanvasAPI version: 3.3.0

Additional context

The existing code worked without issue last week. First noticed the problem today (April 10, 2025).

My guess is that Canvas API changed how it processes the file download urls in such a way that canvasapi is now doing it wrong.

The token is valid: all the other commands described work fine.

Workaround

Instead of using .get_contents() I now use:

import requests
...
requests.get(file.url).text

Which works just fine.

The file url field has an access token as a query parameter, so you don't need an Auth header to access the file using the credentialed URL (hence a plain requests.get works fine).

My hypothesis is that the canvasapi requester is always including the auth headers (reasonable), but perhaps there is now an issue when trying to access the credentialed file URL while also passing the auth header?

byubean added the bug label Apr 10, 2025

bainco commented Apr 10, 2025

Note this also affects the download method of Files as of 2pm CST on April 10th, 2025.

Edit: following up on @byubean's hypothesis, if you manually use the _requester to grab the file with the auth headers removed, you still get the same error, which comes from a 401 from the server:

submission_instance._requester.request("GET", _url=<full_file_url_here>, use_auth=False)

Edit 2:

But if you clear the session cookies PRIOR to downloading, everything works, so it does seem like a cookie problem.

submission_instance._requester._session.cookies.clear()

Specifically the following two cookies are the culprits:

  • "canvas_session"
  • "_legacy_normandy_session"

Which are the Canvas session cookies. If you include these in the GET request, you get a 401. If you don't...you get a 200 and the response as expected.
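
A minimal sketch of removing just those two cookies, rather than clearing the whole jar. The helper name `drop_canvas_session_cookies` is hypothetical and not part of canvasapi; the cookie names are the two listed above. It relies only on the standard `http.cookiejar.CookieJar` interface, which the cookie jar of a `requests.Session` also implements.

```python
from http.cookiejar import CookieJar

# Cookie names reported in this thread as triggering the 401s.
CANVAS_SESSION_COOKIES = ("canvas_session", "_legacy_normandy_session")


def drop_canvas_session_cookies(jar: CookieJar) -> list[str]:
    """Remove the Canvas session cookies from `jar` and return the names removed.

    Hypothetical helper for illustration; not part of canvasapi.
    """
    removed = []
    # Copy to a list first so the jar is not mutated while it is being iterated.
    for cookie in list(jar):
        if cookie.name in CANVAS_SESSION_COOKIES:
            # clear(domain, path, name) deletes exactly one cookie.
            jar.clear(cookie.domain, cookie.path, cookie.name)
            removed.append(cookie.name)
    return removed
```

Assuming `_session` is a `requests.Session` as the thread implies, this could be called as `drop_canvas_session_cookies(submission_instance._requester._session.cookies)` before retrying the download. Unrelated cookies in the jar are left alone.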

superle3 commented Apr 10, 2025

@bainco thank you for giving a solution,
but I am not able to replicate the 401 with cookies.
Do you have a MWE? I am just curious.

bainco commented Apr 10, 2025

> Do you have a MWE? I am just curious.

After a little more investigation, I think the problem is that if you accidentally add a (maybe) malformed cookie to the session at any point, that session will now get 401s from this endpoint until you delete those cookies:

Say you have some sub (canvasapi.submission.Submission) which has at least one attachment:

Step 0

You can 100% use the _requester to successfully grab the attachment at this point. It will work like normal: you'll get back the expected data, and it'll even set your cookies correctly in the session, as long as you set use_auth to False like OP hypothesized.

response = sub._requester.request("GET", _url=sub.attachments[-1].url, use_auth=False)
print(response.text)

Step 1

But, if you make a seemingly valid request for a sub attachment using the _requester but don't turn off auth...

response = sub._requester.request("GET", _url=sub.attachments[-1].url)

This results in the JSONDecodeError in the original bug report. It also seems to place the _session cookies in an unrecoverable state because if you immediately issue the same GET as in Step 0 you get the JSONDecodeError again. If you try directly from the _session you get the aforementioned 401 (since this is coming directly from requests.Session now instead of from canvasapi).

response = sub._requester._session.get(sub.attachments[-1].url)

Step 2

To recover, you have to go and delete the aforementioned cookies inside the session:

del sub._requester._session.cookies['canvas_session']
del sub._requester._session.cookies['_legacy_normandy_session']
response = sub._requester._session.get(sub.attachments[-1].url)
print(response)

This now results in a 200 status code and the correct response. If you want to reset the _requester, you have to reinitialize the session (or clean out its cookies) with something along the lines of:

import requests

# Replace the session with a fresh one so no stale cookies are sent.
sub._requester._session = requests.Session()
response = sub._requester.request("GET", _url=sub.attachments[-1].url, use_auth=False)

bainco commented Apr 10, 2025

Underlying reason aside, seems like you can patch in place by specifying use_auth as False inside of download and get_contents:

from canvasapi.file import File
def patched_download(self, location):
    """
    Download the file to specified location.

    :param location: The path to download to.
    :type location: str
    """
    response = self._requester.request("GET", _url=self.url, use_auth=False)

    with open(location, "wb") as file_out:
        file_out.write(response.content)


def patched_get_contents(self, binary=False):
    """
    Download the contents of this file.
    Pass binary=True to return a bytes object instead of a str.

    :rtype: str or bytes
    """
    response = self._requester.request("GET", _url=self.url, use_auth=False)
    if binary:
        return response.content
    else:
        return response.text

File.get_contents = patched_get_contents
File.download = patched_download

bainco added a commit to bainco/canvasapi that referenced this issue Apr 11, 2025
Roughly April 10th, 2025, something changed about the way auth was handled for files. By including the auth headers in GET requests for files, a session cookie was being set that seemed to cause 401s for any file requests. In essence, one bad request meant that you couldn't recover unless you specifically cleared those cookies out of the session.

Here we avoid that problem entirely by simply not sending auth headers when requesting attachment downloads.

More discussion here: ucfopen#687
aureliony referenced this issue in aureliony/CanvasSync Apr 14, 2025
ururk added a commit to umich-its-ai/langchain-doc-canvas that referenced this issue Apr 14, 2025
ururk commented Apr 15, 2025

This might be fixed on Canvas's end. Can someone confirm (besides my own testing)?

Contributor

jonespm commented Apr 15, 2025

Hi @ururk, most likely this was caused in Canvas by commit instructure/canvas-lms@8ed721d, which was reverted in instructure/canvas-lms@fe37f85 yesterday.

@jakobblomquist

Yes, I have been in contact with the Canvas team since last Friday, and they started to work on it. They got back to me yesterday or today to say they had dealt with it. I'm not sure whether they completely reverted the change or did a revert plus a fix. Either way, I could confirm that my script works again.

Contributor

jonespm commented Apr 22, 2025

Closing this as it seems like this case was a Canvas issue.

I feel like a more generic fix would be to make it easier to pass use_auth into these File methods for the request. This also seemed to be an issue for #631, and the fix proposed in #639 might have addressed this issue too, but it hardcodes the value rather than making it configurable.
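
As a rough sketch of what a configurable version could look like, here is a standalone helper that forwards a `use_auth` flag to the requester. The function name `get_file_contents` and its default of `use_auth=False` are assumptions for illustration; this is not canvasapi's API or the change proposed in #639. It only relies on the `_requester.request("GET", _url=..., use_auth=...)` call already used earlier in this thread.

```python
def get_file_contents(file, binary=False, use_auth=False):
    """Fetch the contents of a canvasapi File-like object.

    Hypothetical helper, not part of canvasapi. `use_auth` is passed
    straight through to the requester, so the caller decides whether
    the auth header is sent along with the credentialed file URL.
    """
    response = file._requester.request("GET", _url=file.url, use_auth=use_auth)
    # Return raw bytes when binary=True, decoded text otherwise.
    return response.content if binary else response.text
```

Callers who want the old behaviour would pass `use_auth=True` explicitly, for example `get_file_contents(file, use_auth=True)`.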

jonespm closed this as completed Apr 22, 2025
6 participants