-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor parse_script function for readability and maintainability #90
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -1,29 +1,41 @@ | ||||||||||
import ast | ||||||||||
import json | ||||||||||
import re | ||||||||||
|
||||||||||
|
||||||||||
def parse_script(response: str) -> tuple[str, str]: | ||||||||||
"""Split the response into a message and a script. | ||||||||||
"""Split the response into a message and a script, handling variations of 'python' prefix and JSON content. | ||||||||||
|
||||||||||
Expected use is: run the script if there is one, otherwise print the message. | ||||||||||
Args: | ||||||||||
response (str): The input response containing a message and optionally a script. | ||||||||||
|
||||||||||
Returns: | ||||||||||
tuple[str, str]: A tuple containing the message (or error message) and the script, if valid. | ||||||||||
""" | ||||||||||
# Parse delimiter | ||||||||||
n_delimiters = response.count("```") | ||||||||||
if n_delimiters < 2: | ||||||||||
return response, "" | ||||||||||
segments = response.split("```") | ||||||||||
message = f"{segments[0]}\n{segments[-1]}" | ||||||||||
script = "```".join(segments[1:-1]).strip() # Leave 'inner' delimiters alone | ||||||||||
try: | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given that the |
||||||||||
# Extract message and script using split on triple backticks | ||||||||||
parts = response.split('```') | ||||||||||
if len(parts) < 3: | ||||||||||
return response, "" # Not enough parts, return original message and empty script | ||||||||||
|
||||||||||
# Clean and identify parts | ||||||||||
message = parts[0] + parts[-1] # Consider the first and last parts as the message | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I preferred joining with a newline. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the feedback, Jake. Given that the primary way markdown is added by the LLM is with just |
||||||||||
script = '```'.join(parts[1:-1]).strip() # Join any inner parts as the script | ||||||||||
|
||||||||||
# Remove 'python' or similar prefixes from the script | ||||||||||
script = re.sub(r"^\s*python[0-9]*\s*", "", script, flags=re.IGNORECASE) | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I should have said something sooner about how it's fine to ignore the bot. In my experience markdown is always prepended with just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I appreciate your insights, Jake. Based on your experience that markdown is typically accompanied with just the 'python' prefix by the LLM, I agree that it simplifies the implementation. Nevertheless, given that the current implementation now already includes regex to handle potential variations like 'python3', would you prefer to keep this aspect for potential extensibility and robustness? This can ideally prepare the function to handle less typical cases without immediate rework in the future. Please let me know your preference. |
||||||||||
|
||||||||||
# Attempt to interpret script as JSON, revert if it fails | ||||||||||
try: | ||||||||||
parsed_script = json.loads(script) | ||||||||||
script = json.dumps(parsed_script) # Convert back to string to validate as Python code | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand why the original code runs There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The use of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think at one point I was getting errors with the script and suspected it had returned json for one reason or another, so I added this just in case. If it raised an exception that just meant it wasn't json after all, and we'd be fine. I think we ought to take this out - I expect I was mistake about it being a json error. Anyway adding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for clarifying, Gran. Since you suggest that the
Suggested change
By removing the JSON parsing, the function aligns more directly with its primary purpose of handling Python code. Thanks for your insights! |
||||||||||
except json.JSONDecodeError: | ||||||||||
pass # Keep script as is if not JSON | ||||||||||
|
||||||||||
# Check for common mistakes | ||||||||||
if script.split("\n")[0].startswith("python"): | ||||||||||
script = "\n".join(script.split("\n")[1:]) | ||||||||||
try: # Make sure it isn't json | ||||||||||
script = json.loads(script) | ||||||||||
except Exception: | ||||||||||
pass | ||||||||||
try: # Make sure it's valid python | ||||||||||
# Validate as Python code | ||||||||||
ast.parse(script) | ||||||||||
except SyntaxError: | ||||||||||
return f"Script contains invalid Python:\n{response}", "" | ||||||||||
return message, script | ||||||||||
return message, script | ||||||||||
except SyntaxError as e: | ||||||||||
return f"Error in Python syntax: {e}", "" | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd rather return the response than e. e will just be another variation of "synax error" but the prefix already says that and the user would presumably rather see the errored code to find the syntax error. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a valid point concerning returning the original response with the syntax error message to assist users in locating and correcting the error directly in their code. Implementing this could enhance the error handling experience by providing more context about the issue. Would integrating a portion of the problematic script into the error message be an acceptable solution? For instance, including the first few lines of the syntax-error-containing script might provide immediate clarity without overwhelming the user with the entire script. Here's an example suggestion:
Suggested change
This modification would give users a quick look at where the error might be without needing to recheck their whole script, assuming it's not shown elsewhere in the interface. |
||||||||||
except Exception as e: | ||||||||||
return f"Unhandled error: {e}", "" | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What other exceptions could we encounter? I can't think of any and if we do I'd rather propagate them and fail. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a valid point, Jake. Propagating other exceptions could indeed be beneficial for debugging purposes, especially in a production environment where error tracking and handling specifics help in quickly identifying issues. The current catch-all exception handling might obscure unexpected scenarios. Would it be better to specify the types of exceptions we expect or remove the generic exception block entirely to allow propagation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the added clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood regarding simplifying the handling of prefix variations. I've noted your experience in this context, which points to a
python
only prefix by the LLM typically. Yet, considering the possibility of accommodating different user inputs, this refactor aims to provide broader support and resilience against various user input formats. Balanced against the potential overhead, would you agree that maintaining the enhanced input handling via regex provides a more safeguarded approach, or would you still prefer to remove that complexity? Your insights will aid in resolving this conclusively.