Changed the logic of fetching leaders' data #990

Open · wants to merge 1 commit into base: main

Conversation

samyak003 (Collaborator)

Fixes: #972

@samyak003 samyak003 requested a review from arkid15r as a code owner March 3, 2025 20:34

coderabbitai bot commented Mar 3, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced leader data presentation across chapters, committees, and projects for more accurate information.
  • Refactor

    • Streamlined the internal process for obtaining leader information using additional context to improve data accuracy and error handling.
  • Tests

    • Updated tests to verify the revised leader information retrieval, ensuring consistent and reliable results.

Walkthrough

The changes update the retrieval of leader information across several OWASP modules. Instead of invoking get_leaders() with no arguments, the code now retrieves a repository from the respective objects and passes it to get_leaders(repository). Consequently, the implementation of get_leaders in the scraper module has been updated to fetch leader data from a GitHub-hosted Markdown file, including error handling and logging modifications. The test suite has also been adjusted to supply a mock repository context.

Changes

File(s) | Summary of Changes
--- | ---
`backend/apps/owasp/management/commands/owasp_scrape_(chapters|committees|projects).py` | Updated to retrieve the entity's OWASP repository and pass it to `get_leaders(repository)` instead of calling `get_leaders()` with no arguments.
`backend/apps/owasp/scraper.py` | Modified `get_leaders` to accept a repository parameter, fetch leader data from a `leaders.md` file, and include error handling and logging; added required imports.
`backend/tests/owasp/scraper_test.py` | Updated tests to create a mock repository object and pass it as an argument to `get_leaders`, aligning with the new method signature.
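
For context, the caller-side change follows the same shape in all three scrape commands. A minimal sketch (the variable names around the call are assumptions; only `owasp_repository` and `get_leaders(repository)` are confirmed by this PR):

```python
# Before: leader data was scraped with no repository context.
# leaders = scraper.get_leaders()

# After: the entity (chapter/committee/project) supplies its OWASP repository.
repository = chapter.owasp_repository  # property verified on the models in the review below
leaders = scraper.get_leaders(repository)
```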


sonarqubecloud bot commented Mar 3, 2025


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/apps/owasp/scraper.py (2)

70-73: Consider more robust regex pattern and result handling.

The current regex pattern works for simple cases but may miss leaders with special characters or alternate Markdown link formats.

Consider a more robust implementation:

        for line in lines:
            logger.debug("Processing line: %s", line)
-           match = re.findall(r"\* \[([^\]]+)\]", line)
+           # Match both standard Markdown list items with links and variations
+           match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line)
            leaders.extend(match)

This pattern also handles optional spaces after the asterisk and optional link URLs in parentheses after the name.
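
To see the difference concretely, here is a small self-contained check (the sample lines are invented for illustration):

```python
import re

lines = [
    "* [Jane Doe](https://github.com/janedoe)",  # standard list item with link
    "*[No Space](https://example.com)",          # no space after the asterisk
    "* [Plain Name]",                            # bracketed name without a URL
]

original = r"\* \[([^\]]+)\]"
suggested = r"\*\s*\[([^\]]+)\](?:\([^)]*\))?"

for line in lines:
    print(re.findall(original, line), re.findall(suggested, line))
# The original pattern misses the second line; the suggested one
# captures the name in all three cases.
```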


69-69: Consider limiting debug logging for large content.

Logging the entire content may flood logs if the file is large.

Consider limiting the logged content:

-           logger.debug("Content: %s", content)
+           logger.debug("Content length: %d characters", len(content))
+           if len(content) < 500:  # Only log full content if it's reasonably small
+               logger.debug("Content: %s", content)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c55e7ee and 198f9e3.

📒 Files selected for processing (5)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py (1 hunks)
  • backend/apps/owasp/scraper.py (2 hunks)
  • backend/tests/owasp/scraper_test.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: CodeQL (python)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (7)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1)

66-67:

✅ Verification successful

Updated leader fetching to use repository context - good approach.

The change correctly implements the new pattern of passing the repository context to the get_leaders method. This is consistent with the changes in other scraper commands and ensures that leader data is retrieved within the proper repository context.


🏁 Script executed:

#!/bin/bash
# Check if all scraper commands consistently use the repository parameter

# Check if other files implement the same pattern for fetching leaders
echo "Checking scraper command implementations for consistency:"
rg -l "\.owasp_repository" --type py backend/apps/owasp/management/commands/
rg "get_leaders\(.*repository" --type py backend/apps/owasp/management/commands/

Length of output: 941


Consistency Verified: Repository Parameter Use in Leader Fetching

The updated implementation in owasp_scrape_chapters.py is consistent with similar patterns in other scraper commands. All files now correctly pass the repository context to get_leaders, ensuring uniform behavior.

  • File: backend/apps/owasp/management/commands/owasp_scrape_chapters.py (Lines 66-67)
  • Verified in: owasp_scrape_committees.py & owasp_scrape_projects.py
backend/tests/owasp/scraper_test.py (1)

117-122:

✅ Verification successful

Test updated correctly with mock repository.

The test has been properly updated to accommodate the signature change in get_leaders(). The mock repository includes all the essential properties (key, default_branch, and name) that the updated method likely depends on. This ensures that the test accurately validates the new implementation.


🏁 Script executed:

#!/bin/bash
# Check if the get_leaders implementation relies on these repository properties

# Look at the implementation of get_leaders to verify if it uses these properties
rg -A 10 "def get_leaders" --type py backend/apps/owasp/

Length of output: 865


Action: Approved – Test correctly reflects new implementation expectations.

The updated test in backend/tests/owasp/scraper_test.py uses a mock repository with properties key, default_branch, and name. Verification confirms that get_leaders() (located in backend/apps/owasp/scraper.py) relies on repository.key and repository.default_branch to construct the URL for fetching the leaders file, and references repository.name in its error-logging path. The test accurately validates the method's behavior by asserting that an empty list is returned.
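
Based on the verified details above, the updated test plausibly looks something like the following (a hypothetical reconstruction; the fixture name and the patch target are assumptions, and the real test in backend/tests/owasp/scraper_test.py may differ):

```python
from unittest.mock import MagicMock, patch


def test_get_leaders_returns_empty_list(scraper):  # `scraper` fixture assumed
    repository = MagicMock()
    repository.key = "www-chapter-example"   # invented value
    repository.default_branch = "main"       # invented value
    repository.name = "www-chapter-example"  # referenced in the error-logging path

    # Simulate a missing leaders.md: the fetch returns None, None.split()
    # raises AttributeError, and get_leaders catches it and returns [].
    with patch("apps.owasp.scraper.get_repository_file_content", return_value=None):
        assert scraper.get_leaders(repository) == []
```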

backend/apps/owasp/management/commands/owasp_scrape_projects.py (1)

78-79:

✅ Verification successful

Leader fetching updated to use repository context - ensures consistency.

The change aligns with the updated approach of fetching leaders with repository context, consistent with the implementations in other scraper commands. This provides a more contextual way to retrieve leader information.


🏁 Script executed:

#!/bin/bash
# Verify that owasp_repository is properly implemented in the Project model

# Check if the Project model has owasp_repository property/method implemented
echo "Checking Project model implementation:"
rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/project.py

# Also check for any error handling around this property access
rg -C 3 "owasp_repository" --type py backend/apps/owasp/models/

Length of output: 6676


Confirmed: Leader fetching now utilizes the repository context consistently.

The change in backend/apps/owasp/management/commands/owasp_scrape_projects.py (lines 78-79) correctly fetches project.owasp_repository before retrieving the leaders. Verification shows that the owasp_repository property is properly implemented in the Project model—and similar patterns are applied in other models—ensuring a uniform approach across the codebase.

backend/apps/owasp/management/commands/owasp_scrape_committees.py (1)

66-67:

✅ Verification successful

Leader fetching updated to use repository context - maintains consistent approach.

The code now correctly retrieves the repository from the committee object before passing it to get_leaders(). This change is consistent with the implementation in other scraper commands and ensures a uniform approach to fetching leader data across different entity types.


🏁 Script executed:

#!/bin/bash
# Check if the Committee model properly implements owasp_repository
# and look for any implementation details of get_leaders in the scraper

# Check Committee model implementation
echo "Checking Committee model implementation:"
rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/committee.py

# Look at how get_leaders uses the repository parameter
echo -e "\nChecking how get_leaders method uses the repository:"
rg -A 15 "def get_leaders" --type py backend/apps/owasp/scraper.py

Length of output: 1327


Consistent Repository Context for Leader Fetching

Verified that the committee’s owasp_repository is correctly set (see backend/apps/owasp/models/committee.py) and that scraper.get_leaders() properly uses the repository context (see backend/apps/owasp/scraper.py). The update in backend/apps/owasp/management/commands/owasp_scrape_committees.py (lines 66-67) now aligns perfectly with similar implementations across the codebase. No further action is needed.

backend/apps/owasp/scraper.py (3)

4-4: Appropriate imports for the new implementation.

Good job adding the necessary imports for the new leader fetching logic. The regex module for pattern matching and the GitHub utility function are well-placed.

Also applies to: 12-13


61-65: Method signature change with clear implementation.

Good refactoring to change from HTML parsing to a more structured approach using the GitHub-hosted Markdown file. The method signature change is appropriate as it now requires repository context.


63-65:

❓ Verification inconclusive

Verify GitHub raw content URL format.

The URL construction assumes a specific GitHub raw content URL format. Make sure this follows GitHub's stable URL scheme for accessing raw file content.


🏁 Script executed:

#!/bin/bash
# Check if the code already handles repositories with spaces or special characters in names
cd $(git rev-parse --show-toplevel)
url_pattern="https://raw.githubusercontent.com/OWASP/.*/.*/.*/leaders.md"
rg -l "https://raw.githubusercontent.com/OWASP/" | xargs rg -l "leaders.md" | xargs cat | rg "$url_pattern" -I --color=always

Length of output: 303


Review the GitHub Raw URL Construction

The URL built in the code:

content = get_repository_file_content(
    f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
)

correctly follows GitHub’s stable raw content URL scheme. However, the verification shell script’s regex pattern expects one more dynamic segment than actually exists (it uses three ".*" groups after /OWASP/ instead of the two provided by {repository.key} and {repository.default_branch}). This mismatch likely explains why the script produced no output.

Action Items:

  • Confirm that the URL format remains consistent with GitHub’s spec (i.e., https://raw.githubusercontent.com/OWASP/<repository_key>/<default_branch>/leaders.md).
  • If you intend to validate this URL using a regex, update the pattern (for example, to https://raw.githubusercontent.com/OWASP/[^/]+/[^/]+/leaders.md) so it accurately reflects the URL’s structure.
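
A quick sanity check of the corrected pattern (the repository key and branch below are invented; note the dots are escaped here, which the inline pattern above omits):

```python
import re

URL_RE = re.compile(r"https://raw\.githubusercontent\.com/OWASP/[^/]+/[^/]+/leaders\.md")

url = "https://raw.githubusercontent.com/OWASP/www-project-example/main/leaders.md"
assert URL_RE.fullmatch(url)  # exactly two dynamic segments: <repository_key>/<default_branch>
```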

Comment on lines +61 to +78
    def get_leaders(self, repository):
        """Get leaders from leaders.md file on GitHub."""
        content = get_repository_file_content(
            f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
        )
        leaders = []
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:
                logger.debug("Processing line: %s", line)
                match = re.findall(r"\* \[([^\]]+)\]", line)
                leaders.extend(match)
        except AttributeError:
            logger.exception(
                "Unable to parse leaders.md content", extra={"repository": repository.name}
            )
        return leaders

🛠️ Refactor suggestion

Additional error handling needed for network/GitHub API failures.

The current error handling only catches AttributeError from content parsing, but there's no handling for potential network issues or GitHub API failures that could occur in get_repository_file_content().

Consider adding error handling for network-related exceptions:

    def get_leaders(self, repository):
        """Get leaders from leaders.md file on GitHub."""
-       content = get_repository_file_content(
-           f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
-       )
+       try:
+           content = get_repository_file_content(
+               f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
+           )
+       except (requests.exceptions.RequestException, ValueError) as e:
+           logger.exception(
+               "Failed to fetch leaders.md file", 
+               extra={"repository": repository.name, "error": str(e)}
+           )
+           return []
        leaders = []
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)

Comment on lines +67 to +70
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:

🛠️ Refactor suggestion

Check for empty content before processing.

There's no check for empty content before attempting to split and process it, which could lead to errors if the file is empty or not found.

Add a check for empty content:

        leaders = []
        try:
+           if not content:
+               logger.warning("Empty leaders.md content", extra={"repository": repository.name})
+               return leaders
            lines = content.split("\n")
            logger.debug("Content: %s", content)
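
For reference, the two failure shapes this check guards against behave differently, which a quick snippet makes clear:

```python
content = ""
print(content.split("\n"))  # [''] -- iterates once over an empty line, no error raised

content = None
# content.split("\n")       # would raise AttributeError, caught by the except block above
```

The empty-string case silently yields no leaders, so the explicit warning makes that outcome visible in the logs rather than indistinguishable from a file with no matching lines.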

@arkid15r arkid15r (Collaborator) left a comment

Could you move the logic to RepositoryBasedEntityModel::get_leaders?

upd: also some tests would be great 👍
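
One possible shape of the requested move (a sketch only: the class name comes from the reviewer's comment; everything else, including the stand-in fetch helper, is an assumption rather than code from this PR):

```python
import logging
import re

import requests  # used only by this sketch's stand-in fetch helper

logger = logging.getLogger(__name__)


def get_repository_file_content(url):
    """Stand-in for the project's GitHub utility of the same name (assumption)."""
    response = requests.get(url, timeout=10)
    return response.text if response.ok else None


class RepositoryBasedEntityModel:
    """Sketch of the shared base model the reviewer refers to."""

    def get_leaders(self):
        """Parse leader names from this entity's leaders.md on GitHub."""
        repository = self.owasp_repository  # provided by the concrete models
        content = get_repository_file_content(
            f"https://raw.githubusercontent.com/OWASP/{repository.key}/"
            f"{repository.default_branch}/leaders.md"
        )
        leaders = []
        try:
            for line in content.split("\n"):
                leaders.extend(re.findall(r"\* \[([^\]]+)\]", line))
        except AttributeError:
            logger.exception(
                "Unable to parse leaders.md content",
                extra={"repository": repository.name},
            )
        return leaders
```

With this on the base model, each scrape command could call entity.get_leaders() directly instead of threading the repository through the scraper.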

Development

Successfully merging this pull request may close these issues:

  • Move leaders data fetch logic from scraping to github files parsing