-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: various changes to make building of catalog working again #30
Conversation
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis pull request introduces updates to the Snakemake Workflow Catalog repository, focusing on workflow configuration, dependency management, and script improvements. The changes span multiple files, including GitHub workflow configurations, README documentation, environment specifications, and core scripts. Key modifications include updating deployment actions, refining repository search logic, enhancing rate limit handling, and specifying exact dependency versions. The updates aim to improve the catalog's reliability, maintainability, and precision in processing and documenting Snakemake workflows. Changes
Sequence DiagramsequenceDiagram
participant Workflow as GitHub Workflow
participant Script as generate-catalog.py
participant GitHub as GitHub API
participant Repo as Repository
Workflow->>Script: Trigger catalog generation
Script->>GitHub: Search repositories
GitHub-->>Script: Return repository list
Script->>Script: Apply date threshold
Script->>Repo: Validate repository
alt Repository meets criteria
Script->>Repo: Process repository
Script->>Workflow: Generate catalog entry
else Repository skipped
Script->>Workflow: Log skip reason
end
This sequence diagram illustrates the high-level workflow of the catalog generation process, highlighting the key steps of repository search, validation, and processing. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
README.md (4)
26-31
: Fix minor grammatical issues in the workflow criteria section.Apply these corrections:
-The workflow is contained in a public Github repository. +The workflow is contained in a public GitHub repository. -very large files should be handled via [Git LFS] +large files should be handled via [Git LFS] -, so that they can be stripped out during cloning). + so that they can be stripped out during cloning).🧰 Tools
🪛 LanguageTool
[misspelling] ~27-~27: This word is normally spelled with a hyphen.
Context: ...g the words "snakemake" and "workflow" (case insensitive). - The repository contains a workflow ...(EN_COMPOUNDS_CASE_INSENSITIVE)
[style] ~30-~30: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ions/understanding-github-actions) job (very large files should be handled via [Git LFS](h...(EN_WEAK_ADJECTIVE)
[typographical] ~30-~30: The conjunction “so that” does not have a comma in front.
Context: ...working-with-files/managing-large-files), so that they can be stripped out during cloning...(SO_THAT_UNNECESSARY_COMMA)
35-39
: Improve clarity in the standardized usage section.Apply these corrections:
-In order to additionally appear in the "standardized usage" area, repositories additionally have to: +To appear in the "standardized usage" area, repositories must:
55-55
: Fix grammatical issues in the usage instructions.Apply these corrections:
-Once included in the standardized usage area you can link +Once included in the standardized usage area, you can link -Hence, in order to update your workflow's records here, +Hence, to update your workflow's records here,Also applies to: 59-59
🧰 Tools
🪛 LanguageTool
[uncategorized] ~55-~55: Possible missing comma found.
Context: ...Once included in the standardized usage area you can link directly to the usage inst...(AI_HYDRA_LEO_MISSING_COMMA)
80-80
: Format the URL properly.Convert the bare URL to a proper markdown link:
-For more information and to explore the available workflows, visit https://snakemake.github.io/snakemake-workflow-catalog/. +For more information and to explore the available workflows, visit [the Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/).🧰 Tools
🪛 Markdownlint (0.37.0)
80-80: null
Bare URL used(MD034, no-bare-urls)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
.github/workflows/generate.yml
(4 hunks).github/workflows/test-repo.yml
(1 hunks)README.md
(1 hunks)blacklist.txt
(1 hunks)environment.yml
(1 hunks)scripts/common.py
(1 hunks)scripts/generate-catalog.py
(7 hunks)
✅ Files skipped from review due to trivial changes (1)
- blacklist.txt
🧰 Additional context used
📓 Path-based instructions (2)
scripts/common.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
scripts/generate-catalog.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🪛 Ruff (0.8.2)
scripts/generate-catalog.py
14-14: Redefinition of unused Environment
from line 12
Remove definition: Environment
(F811)
19-19: common.check_repo_exists
imported but unused
Remove unused import: common.check_repo_exists
(F401)
275-275: Function definition does not bind loop variable config_readme
(B023)
🪛 LanguageTool
README.md
[misspelling] ~27-~27: This word is normally spelled with a hyphen.
Context: ...g the words "snakemake" and "workflow" (case insensitive). - The repository contains a workflow ...
(EN_COMPOUNDS_CASE_INSENSITIVE)
[style] ~30-~30: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ions/understanding-github-actions) job (very large files should be handled via [Git LFS](h...
(EN_WEAK_ADJECTIVE)
[typographical] ~30-~30: The conjunction “so that” does not have a comma in front.
Context: ...working-with-files/managing-large-files), so that they can be stripped out during cloning...
(SO_THAT_UNNECESSARY_COMMA)
[style] ~34-~34: Consider a shorter alternative to avoid wordiness.
Context: ...ere. ### Standardized usage workflows In order to additionally appear in the "standardize...
(IN_ORDER_TO_PREMIUM)
[uncategorized] ~55-~55: Possible missing comma found.
Context: ...Once included in the standardized usage area you can link directly to the usage inst...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~59-~59: Consider a shorter alternative to avoid wordiness.
Context: ... the latest non-preview release. Hence, in order to update your workflow's records here, yo...
(IN_ORDER_TO_PREMIUM)
🪛 Markdownlint (0.37.0)
README.md
80-80: null
Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (15)
scripts/generate-catalog.py (6)
9-10
: Imports for regex and datetime calculations look good.
These imports are necessary for filtering tokens and implementing the date threshold.
111-112
: Date threshold-based repository search is a solid improvement.
Using “pushed:>=” in the search query helps filter stale repositories.
Also applies to: 114-115
252-252
: Redacting GitHub tokens is a good security measure.
Ensure the regex is comprehensive enough to catch all possible token formats.
272-272
: Topic retrieval logic is appropriate.
This step ensures relevant repository metadata is captured.
278-287
: Repo object instantiation is consistent with the updated fields.
The logic for setting the usage flags, release information, readme content, etc., is clear and cohesive.
32-33
: Verify the logic behind 'latest_commit' and scaled 'offset'.
• "latest_commit" is set from an environment variable to determine how far back you look for updates. Ensure that all related calls or references are aware of this new usage.
• Changing "offset" from its old calculation to "offset * 10" can drastically expand your repository search. Double-check that this multiplication aligns with your desired iteration logic.
✅ Verification successful
Let me search for references to LATEST_COMMIT
to complete the verification.
Based on the analysis of both offset
and LATEST_COMMIT
usage in the codebase, I can now provide the final response:
The offset multiplication and latest_commit usage are correctly scoped
• The offset * 10
scaling is used consistently within generate-catalog.py for pagination, with proper bounds checking via min(offset + 100, total_count)
.
• LATEST_COMMIT
is only used once to determine the search boundary and doesn't have any dependent calls that need updating.
• The offset value is properly initialized from environment variables in common.py and used appropriately across the cleanup-catalog.py script.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Searching all references to 'offset' in the codebase to ensure the usage is correct
rg 'offset'
Length of output: 66324
Script:
#!/bin/bash
# Search for LATEST_COMMIT references
rg "LATEST_COMMIT"
Length of output: 99
environment.yml (1)
6-12
: Pinning specific package versions increases reproducibility.
This helps avoid unexpected breakages from upstream updates.
.github/workflows/test-repo.yml (1)
57-57
: Updated micromamba action version helps maintain consistency.
Switching to “mamba-org/setup-micromamba@v2” aligns with the other workflow changes.
.github/workflows/generate.yml (5)
5-5
: Weekly schedule and additional branch trigger are logical.
Reducing the cron frequency to once a week can help conserve computing resources, and including the “dev” branch broadens testing.
Also applies to: 9-9
20-25
: Migration to 'mamba-org/setup-micromamba@v2' and pulling latest changes.
This ensures deployment uses the new environment file and rebase prevents push conflicts.
Also applies to: 26-28
35-35
: 'LATEST_COMMIT' environment variable.
Exposing it here matches the usage in 'generate-catalog.py' for filtering repositories by date.
43-43
: Graceful handling of 'No changes to commit'.
Nicely avoids exit failure when there are no new modifications.
60-64
: Consistent cleanup job improvements.
Applying the same micromamba upgrade and pull logic to the cleanup step keeps the workflows in sync.
Also applies to: 66-68, 82-82, 88-88
scripts/common.py (1)
Line range hint 45-54
: LGTM! Well-implemented rate limit handling.
The new rate_limit_wait
function is a solid implementation with:
- Correct timestamp handling
- Buffer time to ensure reset completion
- Proper logging for debugging
- Protection against negative wait times
README.md (1)
3-6
: LGTM! Informative status badges added.
The addition of status badges provides valuable at-a-glance information about the project's health and activity.
Dear admins, can somebody review this PR and merge if everything is OK? |
Any comments or reviews on this PR? It would be nice to fix this timely so that the catalog can build again. |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/generate-catalog.py
(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
scripts/generate-catalog.py (1)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
🪛 Ruff (0.8.2)
scripts/generate-catalog.py
14-14: Redefinition of unused Environment
from line 12
Remove definition: Environment
(F811)
274-274: Function definition does not bind loop variable config_readme
(B023)
🔇 Additional comments (8)
scripts/generate-catalog.py (8)
9-10
: LGTM: New imports support enhanced functionality.The added imports for
re
anddatetime
modules are necessary for the new date-based filtering and output sanitization features.
17-26
: LGTM: Cleaned up imports from common module.Unused imports have been removed, improving code maintainability.
110-114
: LGTM: Improved repository filtering with date threshold.The addition of date-based filtering (
pushed:>={date_threshold}
) effectively implements the PR objective of processing only recently updated repositories, which should reduce build times and resource usage.
212-213
: LGTM: Added proper rule file extension validation.The check ensures that rule modules use the
.smk
extension, promoting consistency across workflows.
251-251
: LGTM: Added security measure to prevent token exposure.The regex pattern effectively sanitizes GitHub tokens from the linting output, preventing accidental exposure of sensitive information.
277-286
: LGTM: Improved code readability.The Repo instantiation has been reformatted for better readability while maintaining functionality.
296-298
: LGTM: Improved efficiency of old repository handling.The use of list comprehension provides a more efficient and readable way to extend the repository list.
31-33
: Verify the impact of offset calculation change.The offset calculation has been changed from division to multiplication (
offset * 10
). This significant change could affect repository pagination and processing.Run this script to analyze the impact:
✅ Verification successful
Offset calculation change is correct ✅
The offset multiplication by 10 in generate-catalog.py is intentional for pagination purposes and works correctly with the offset value imported from common.py. The different offset calculations in cleanup-catalog.py serve a separate batch processing purpose.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check the usage of offset in the codebase # and its historical values # Check offset usage in code rg -B 5 -A 5 'offset.*=.*' # Check historical offset values from git history git log -p | rg -B 5 -A 5 'offset.*=.*'Length of output: 2479
Main changes:
environment.yml
filesnakemake --lint
output which sometimes get exposeduntar
comment to fix warnings from pythonREADME.md
Summary by CodeRabbit
New Features
README.md
to include new sections on workflow generation, deployment, and contributing guidelines.generate-catalog
script to filter repositories based on last push date and improved error handling.Bug Fixes
Documentation
README.md
for better visibility of repository status.Chores
blacklist.txt
file.environment.yml
.