OpenAI representation fails to produce output when response content is None #2176

Open
1 task done
jeaninejuliettes opened this issue Oct 11, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@jeaninejuliettes

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run:
label = response.choices[0].message.content.strip().replace("topic: ", "")

That makes sense, since the content is None rather than a string, and None has no strip method.
I'm unable to provide a minimal example, since this depends on the output of the OpenAI GPT model.

I see two ways to work around this, but both have their own downsides and impact on the results. Maybe someone else sees a better option:

  1. Cast the content to a string before processing it any further. The major downside is that the label will then be set to the string 'None'.
  2. Use a try/except to extract the content, strip it, and replace the 'topic: ' part of the string. If this fails, the label is set to a fixed value such as an empty string, and a warning is raised that this has happened.

For now I fixed it by creating an inherited customOpenAI representation class within my script where I used the second option as a solution.
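
A minimal sketch of what the second option could look like, written as a standalone helper rather than the actual subclass. The names `extract_label` and `fake_response` are hypothetical, and the fake response built with `SimpleNamespace` only imitates the `response.choices[0].message.content` shape so the example runs without calling the API:

```python
import warnings
from types import SimpleNamespace


def extract_label(response, fallback=""):
    """Return the cleaned topic label, or `fallback` if no text came back."""
    try:
        return response.choices[0].message.content.strip().replace("topic: ", "")
    except AttributeError:
        # Raised when `content` is None (None has no .strip()) or is missing.
        warnings.warn("OpenAI returned no content; using fallback label.")
        return fallback


def fake_response(content):
    """Hypothetical stand-in for an OpenAI chat completion response."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


ok_label = extract_label(fake_response("topic: climate policy\n"))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_label = extract_label(fake_response(None))

# Option 1 for comparison: casting None to a string yields the label "None".
option_one_label = str(None).strip().replace("topic: ", "")
```

With the first option the topic ends up labelled with the literal text "None", while the second keeps the fallback explicit and leaves a warning behind.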

Reproduction

from bertopic import BERTopic

BERTopic Version

0.16.4

jeaninejuliettes added the bug label Oct 11, 2024
@MaartenGr
Owner

Thank you for sharing this. I see that you opened a similar issue (#2177). Are you alright with closing that one? To me, they seem like duplicates.

With respect to your issue, the idea of content violation was mentioned in earlier issues and addressed with the following:

# Check whether content was actually generated
# Addresses #1570 for potential issues with OpenAI's content filter
if hasattr(response.choices[0].message, "content"):
    label = response.choices[0].message.content.strip().replace("topic: ", "")
else:
    label = "No label returned"

Which makes it rather surprising that you get this issue. It may be that the API of OpenAI was updated and now always returns "content" but I'm not sure. Either way, simply doing an additional check here makes sense to me.
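
As a rough sketch of what such an additional check might look like (not the actual fix; `label_from_response` is a hypothetical helper name, and the response is assumed to have the `choices[0].message.content` shape used above):

```python
def label_from_response(response):
    """Hypothetical helper: the existing check plus a guard against None."""
    message = response.choices[0].message
    # getattr with a default covers a missing attribute and an explicit None alike.
    content = getattr(message, "content", None)
    if content is not None:
        return content.strip().replace("topic: ", "")
    return "No label returned"
```

This keeps the existing "No label returned" fallback and only widens the condition under which it is used.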

@jeaninejuliettes
Author

No, I'm sorry this was unclear. For this specific issue I don't get any errors regarding content violation. It simply seems that the result of response.choices[0].message returns None, which then produces an error, since you can't use strip on a NoneType object. I don't know when or why this happens, but it doesn't seem to be the result of an error produced by the API, since the response object exists.

That is also why I created a separate "issue" (discussion/question) for the content violation: I gathered from the code that it was supposed to have been fixed, but unfortunately I'm still running into it. That is a discussion for #2177 as far as I'm concerned. As far as I can tell, the two don't seem to be related.

@MaartenGr
Owner

I think that this:

I ran into issues when using the OpenAI representation as it sometimes produces a content of None, which then produced an error when trying to run:
label = response.choices[0].message.content.strip().replace("topic: ", "")

and this:

response.choices[0].message returns None

contradict one another. The reason I think so is that you shouldn't be able to reach label = ... at all, because there is this check (which is used for content violation):

# Check whether content was actually generated
# Addresses #1570 for potential issues with OpenAI's content filter
if hasattr(response.choices[0].message, "content"):
    label = response.choices[0].message.content.strip().replace("topic: ", "")
else:
    label = "No label returned"

Thus, response.choices[0].message returning None cannot be the case, because there is a check to see whether it contains the attribute "content", right? Or did you mean that "content" returns None? If so, then the OpenAI API might have changed, since it didn't show that behavior before.
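
To illustrate the second possibility: if "content" is present but set to None, the hasattr check passes and the error only appears at .strip(). A small sketch, using a hypothetical stand-in object instead of a real API response:

```python
from types import SimpleNamespace

# Hypothetical stand-in for response.choices[0].message with empty content.
message = SimpleNamespace(content=None)

# The attribute exists, so this check passes even though the value is None.
has_content_attr = hasattr(message, "content")

try:
    message.content.strip()
    error_name = None
except AttributeError as exc:
    # None has no .strip(), which matches the error described in this issue.
    error_name = type(exc).__name__
```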

Looking through the issues, it seems that this was mentioned before, and there is a PR for it that hasn't been updated in a couple of months. API changes might play a role here, but so might the reason why you get a None in the first place. Based on what I see, I'm convinced the two issues relate to one another, since the None you get is typically some sort of content violation issue.

@jeaninejuliettes
Author

jeaninejuliettes commented Oct 11, 2024

Yes, I mean that the content returns None. The response exists and the content element does exist in the response object, but its value is empty.
Ah, I didn't see that issue (apologies), but it is the exact error message I'm seeing. And reading through the issue, it looks quite similar. But the PR is inactive?

Funny thing is, I'm still also getting content violation errors, but let's keep that out of this discussion for now ;)

@MaartenGr
Owner

It does seem to be inactive and unfortunately, I currently do not have the time to look it over. I would also be alright with a small PR just making sure it gives no error. Any additional work can be done later.

@jeaninejuliettes
Author

Ok, I can look into that!


2 participants