-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: locale folder reading + disambiguation #86
Conversation
WalkthroughThe pull request introduces significant enhancements to the Wikipedia-related functionality in the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
__init__.py (3)
107-124
: Efficient summarization logic
Flattening the tokenized lines at [line 107] fits well with returning user-friendly summaries. Both the single-result flow ([lines 111–116]) and the multi-result flow ([lines 117–123]) seem logically consistent. Keep in mind potential edge cases where an article has a very short extract with few sentences. Otherwise, this is a clean approach.
Line range hint
261-282
:match_common_query
effectively delegates toask_the_wiki
The revised method name (match_common_query
) and signature properly signals the method’s purpose while returning consistent data. Associating a 0.6 confidence score at [line 282] is a strategic choice for the Common Query framework. Ensure that the approach for partial matches remains balanced if future expansions of the skill yield multiple potential answers.
Line range hint
350-373
: Retain or remove debug prints as appropriate
These lines provide helpful testing and debug output. If they remain for demonstration purposes, consider toggling them with a debug flag or docstring references for clarity once deployed. Otherwise, removing or commenting them out keeps production logs clean.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
__init__.py
(11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
__init__.py
183-183: Ambiguous variable name: l
(E741)
🔇 Additional comments (8)
__init__.py (8)
14-27
: Imports look good overall, but consider consolidating or removing any unused ones
The newly introduced imports (Optional
, Tuple
, List
from typing
, closest_supported_match
, classproperty
, flatten_list
, etc.) appear purposeful and relevant to the refactor. If any imports end up being unused after the final iteration, consider removing them to keep the module clean.
32-32
: Explicit return type for rm_parentheses
Defining the return type as str
clarifies the function’s intent and ensures that the caller knows exactly what to expect. Good change!
38-38
: Type alias WikiMatches
improves code readability
Providing a descriptive alias for a list of tuples is a neat improvement, making it clearer that each tuple refers to (title, list_of_sentences, img_url)
.
87-92
: Clear debugging and disambiguation logic
Logging the number of matched Wikipedia pages at [line 87] helps trace results. Skipping "disambiguation"
pages at [line 91] is correct to avoid returning them as valid items. Good job ensuring the solver avoids hitting these placeholders.
98-98
: Safe optional image handling
Declaring img: Optional[str] = None
clarifies that an image may be absent. This approach is consistent with the puzzle's partial returns.
160-160
: Updated class inheritance
Refactoring WikipediaSkill
to inherit from OVOSSkill
is consistent with the direction indicated in the PR summary. This ensures alignment with the new skill structure in OVOS.
250-260
: cq_callback
gracefully updates GUI context
The callback sets the WikiKnows
context, displays search results, and logs any missing session in [lines 257–258]. This centralized approach effectively handles the UI flow and error logging.
439-440
: Multilingual examples confirm robust language handling
Testing with Portuguese queries demonstrates the solver’s adaptability. Thorough language checks like these are pivotal when shipping cross-lingual features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
__init__.py (2)
66-85
: LGTM! Well-structured error handling for Wikipedia API calls.The method properly handles exceptions and includes informative error logging.
Consider making the timeout value configurable through a class attribute or parameter:
- response = requests.get(url, timeout=5).json() + response = requests.get(url, timeout=self.timeout).json()
Line range hint
370-460
: Consider moving test cases to a dedicated test file.The main execution block contains valuable test cases that should be moved to a proper test file for better organization and maintainability.
Consider creating a new file
test_wikipedia.py
and moving these test cases there using a proper testing framework likepytest
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
__init__.py
(11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
__init__.py
202-202: Ambiguous variable name: l
(E741)
🔇 Additional comments (5)
__init__.py (5)
Line range hint 33-37
: LGTM! Type hints improve code clarity.
The addition of type hints to rm_parentheses
function makes the code more maintainable and helps with static type checking.
87-142
: Excellent implementation of concurrent Wikipedia page fetching!
The concurrent implementation using ThreadPoolExecutor is a significant performance improvement. The code:
- Maintains result order using index mapping
- Properly handles failed requests
- Includes fallback logic for keyword extraction
269-279
: LGTM! Well-structured callback with proper error handling.
The callback method properly handles GUI display and context setting with appropriate error logging.
Line range hint 280-301
: LGTM! Type hints and return value structure are clear.
The method signature and implementation are well-documented with type hints, making the code more maintainable.
201-205
: 🛠️ Refactor suggestion
Rename variable 'l' to improve readability.
The variable name 'l' is ambiguous and could be confused with the number '1'.
Apply this diff to improve readability:
- for l in f.read().split("\n"):
- if not l.strip() or l.startswith("#"):
- continue
- samples += expand_template(l)
+ for line in f.read().split("\n"):
+ if not line.strip() or line.startswith("#"):
+ continue
+ samples += expand_template(line)
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
202-202: Ambiguous variable name: l
(E741)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
__init__.py (3)
87-142
: Consider enhancing concurrent execution with timeout and specific error handling.The concurrent implementation is good, but could be improved for robustness.
Consider these enhancements:
def get_data(self, query: str, lang: Optional[str] = None, units: Optional[str] = None): """Fetch Wikipedia search results and detailed data concurrently.""" LOG.debug(f"WikiSolver query: {query}") lang = (lang or self.default_lang).split("-")[0] search_url = ( f"https://{lang}.wikipedia.org/w/api.php?action=query&list=search&" f"srsearch={query}&format=json" ) try: - search_results = requests.get(search_url, timeout=5).json().get("query", {}).get("search", []) + response = requests.get(search_url, timeout=5) + response.raise_for_status() # Raise exception for bad status codes + search_results = response.json().get("query", {}).get("search", []) - except Exception as e: + except requests.RequestException as e: LOG.error(f"Error fetching search results: {e}") search_results = [] if not search_results: fallback_query = self.extract_keyword(query, lang) if fallback_query and fallback_query != query: LOG.debug(f"WikiSolver Fallback, new query: {fallback_query}") return self.get_data(fallback_query, lang=lang, units=units) return {} LOG.debug(f"Matched {len(search_results)} Wikipedia pages") # Prepare for parallel fetch and maintain original order summaries = [None] * len(search_results) # List to hold results in original order - with concurrent.futures.ThreadPoolExecutor() as executor: + with concurrent.futures.ThreadPoolExecutor() as executor: + # Add timeout to prevent hanging on slow requests + future_to_idx = { + executor.submit(self.get_page_data, str(r["pageid"]), lang): idx + for idx, r in enumerate(search_results) + if "(disambiguation)" not in r["title"] + } + + try: + for future in concurrent.futures.as_completed(future_to_idx, timeout=30): idx = future_to_idx[future] title, ans, img = future.result() if title and ans: summaries[idx] = (title, ans, img) + except concurrent.futures.TimeoutError: + LOG.error("Timeout while fetching Wikipedia pages") + # Continue with partial results
Line range hint
179-205
: Improve variable naming in file reading loop.The code uses 'l' as a loop variable which is flagged by static analysis as ambiguous.
Apply this change:
with open(filename) as f: - for l in f.read().split("\n"): - if not l.strip() or l.startswith("#"): + for line in f.read().split("\n"): + if not line.strip() or line.startswith("#"): continue - samples += expand_template(l) + samples += expand_template(line)🧰 Tools
🪛 Ruff (0.8.2)
202-202: Ambiguous variable name:
l
(E741)
Line range hint
369-459
: Consider moving test cases to a dedicated test file.The main execution block contains valuable test cases, but they would be better organized in a dedicated test file. This would:
- Make the main module cleaner
- Allow for proper test discovery and execution
- Enable CI/CD integration
Would you like me to help create a separate test file with these test cases?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
__init__.py
(11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
__init__.py
202-202: Ambiguous variable name: l
(E741)
🔇 Additional comments (3)
__init__.py (3)
12-28
: LGTM! Good improvements to imports and type annotations.
The changes enhance type safety and add concurrent execution capabilities. The rm_parentheses
function's type annotation is a good practice.
Also applies to: 33-37
66-85
: LGTM! Well-structured page data retrieval with proper error handling.
The implementation correctly handles API responses and includes error handling.
188-198
: LGTM! Good improvement in language support.
The use of closest_supported_match
is a good addition for handling language directories robustly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
__init__.py (1)
Line range hint
374-465
: Move test cases to a dedicated test file.The main execution block contains test cases that should be moved to a proper test file for better organization and maintainability.
Consider creating a new file
test_wikipedia.py
and moving all test cases there using a proper testing framework likepytest
.
🧹 Nitpick comments (1)
__init__.py (1)
90-145
: Consider adding timeout configuration for concurrent requests.While the concurrent implementation is good, consider making the timeout value configurable and adding a limit to the ThreadPoolExecutor to prevent resource exhaustion.
def get_data(self, query: str, lang: Optional[str] = None, units: Optional[str] = None): + TIMEOUT = self.config.get('request_timeout', 5) + MAX_WORKERS = self.config.get('max_workers', 5) LOG.debug(f"WikiSolver query: {query}") lang = (lang or self.default_lang).split("-")[0] search_url = ( f"https://{lang}.wikipedia.org/w/api.php?action=query&list=search&" f"srsearch={query}&format=json" ) try: - search_results = requests.get(search_url, timeout=5).json().get("query", {}).get("search", []) + search_results = requests.get(search_url, timeout=TIMEOUT).json().get("query", {}).get("search", []) except Exception as e: LOG.error(f"Error fetching search results: {e}") search_results = [] if not search_results: fallback_query = self.extract_keyword(query, lang) if fallback_query and fallback_query != query: LOG.debug(f"WikiSolver Fallback, new query: {fallback_query}") return self.get_data(fallback_query, lang=lang, units=units) return {} LOG.debug(f"Matched {len(search_results)} Wikipedia pages") # Prepare for parallel fetch and maintain original order summaries = [None] * len(search_results) # List to hold results in original order - with concurrent.futures.ThreadPoolExecutor() as executor: + with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
__init__.py
(10 hunks)requirements.txt
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
__init__.py
205-205: Ambiguous variable name: l
(E741)
🔇 Additional comments (4)
requirements.txt (1)
2-2
: Verify compatibility with the updated ovos_workshop version.
The significant version bump from >=0.0.15
to >=3.3.0
might introduce breaking changes. Please ensure compatibility with other dependencies.
✅ Verification successful
Let me gather more information about the dependencies to verify compatibility.
Let me check the specific usage of ovos_workshop components to ensure compatibility with the new version.
Based on the analysis of the code and the version changes, I can now provide a final response:
The version upgrade to ovos_workshop>=3.3.0 is compatible with the codebase
The code review shows that the skill uses stable APIs from ovos_workshop:
intent_handler
andcommon_query
decoratorsIntentBuilder
for intent constructionOVOSSkill
andCommonQuerySkill
base classes
The latest release (3.3.0) only adds new features (common_query_decorator
) and fixes (voc_match ignore case
) without breaking changes. The skill's usage of ovos_workshop APIs remains compatible with the new version.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential version conflicts with other dependencies
gh api graphql -f query='
{
repository(owner: "OpenVoiceOS", name: "OVOS-workshop") {
releases(first: 1, orderBy: {field: CREATED_AT, direction: DESC}) {
nodes {
tagName
description
createdAt
}
}
}
}'
Length of output: 871
Script:
#!/bin/bash
# Check for dependencies that might use ovos_workshop
rg -l "ovos_workshop" --type-add 'req:*requirements.txt' -treq
# Check imports and usage patterns
rg "from ovos_workshop|import ovos_workshop" -A 2
Length of output: 882
Script:
#!/bin/bash
# Check the content of __init__.py to see how the imported components are used
cat __init__.py
# Check the test files to understand the testing dependencies
cat test/unittests/test_skill.py
cat test/unittests/test_skill_loading.py
Length of output: 33496
__init__.py (3)
Line range hint 33-37
: LGTM! Type hints improve code clarity.
The addition of type hints to rm_parentheses
enhances code maintainability and IDE support.
66-88
: LGTM! Robust error handling in page data fetching.
The implementation includes proper error handling and disambiguation page detection.
204-208
: 🛠️ Refactor suggestion
Replace ambiguous variable name.
The variable name l
is ambiguous and flagged by static analysis. Consider using a more descriptive name.
- for l in f.read().split("\n"):
- if not l.strip() or l.startswith("#"):
+ for line in f.read().split("\n"):
+ if not line.strip() or line.startswith("#"):
continue
- samples += expand_template(l)
+ samples += expand_template(line)
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
205-205: Ambiguous variable name: l
(E741)
companion to OpenVoiceOS/OVOS-workshop#315
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores
ovos_workshop
to a more recent version.