-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ensembl reference collections #2977
base: master
Are you sure you want to change the base?
feat: ensembl reference collections #2977
Conversation
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
Co-authored-by: Filipe G. Vieira <[email protected]>
This PR was marked as stale because it has been open for 6 months with no activity. |
📝 WalkthroughWalkthroughThe pull request introduces enhanced configurability for Ensembl annotation and sequence retrieval processes. The changes add two new optional parameters, Changes
Sequence DiagramsequenceDiagram
participant User
participant Snakefile
participant Wrapper
participant ENSEMBL FTP
User->>Snakefile: Specify branch and collection
Snakefile->>Wrapper: Pass branch and collection parameters
Wrapper->>ENSEMBL FTP: Construct URL with branch/collection
ENSEMBL FTP-->>Wrapper: Return requested data
Wrapper->>User: Download and save data
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
bio/reference/ensembl-sequence/meta.yaml (2)
9-10
: LGTM! Consider enhancing parameter documentation.The new parameters align well with the PR objectives for supporting bacterial references. The examples are clear and helpful.
Consider adding more context about when these parameters should be used, particularly for bacterial references. For example:
- - branch: branch of ftp server to download cache data if required (optional; e.g. "plants") - - collection: collection of ftp server to download cache data if required (optional; e.g. "bacteria_0_collection") + - branch: branch of ftp server to download cache data if required (optional; e.g. "plants"). Required for non-default Ensembl divisions. + - collection: collection of ftp server to download cache data if required (optional; e.g. "bacteria_0_collection"). Required for bacterial references due to additional folder structure.🧰 Tools
🪛 yamllint (1.35.1)
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
10-10
: Add newline at end of file.The YAML file is missing a newline character at the end, which is a YAML best practice.
- collection: collection of ftp server to download cache data if required (optional; e.g. "bacteria_0_collection") +
🧰 Tools
🪛 yamllint (1.35.1)
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
bio/reference/ensembl-annotation/meta.yaml (2)
9-10
: LGTM! Parameters are consistent with ensembl-sequence.The parameters match those in the ensembl-sequence configuration, maintaining consistency across the codebase.
Consider applying the same documentation improvements suggested for the ensembl-sequence configuration to maintain consistency.
🧰 Tools
🪛 yamllint (1.35.1)
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
10-10
: Add newline at end of file.The YAML file is missing a newline character at the end, which is a YAML best practice.
- collection: collection of ftp server to download cache data if required (optional; e.g. "bacteria_0_collection") +
🧰 Tools
🪛 yamllint (1.35.1)
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
bio/reference/ensembl-annotation/meta.yaml
(1 hunks)bio/reference/ensembl-annotation/test/Snakefile
(1 hunks)bio/reference/ensembl-annotation/wrapper.py
(2 hunks)bio/reference/ensembl-sequence/meta.yaml
(1 hunks)bio/reference/ensembl-sequence/test/Snakefile
(2 hunks)bio/reference/ensembl-sequence/wrapper.py
(2 hunks)test.py
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- test.py
🧰 Additional context used
📓 Path-based instructions (2)
bio/reference/ensembl-sequence/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
bio/reference/ensembl-annotation/wrapper.py (2)
Pattern **/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self
argument of methods.
Do not suggest type annotation of the cls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return
statement.
Pattern **/wrapper.py
: Do not complain about use of undefined variable called snakemake
.
🪛 Ruff (0.8.2)
bio/reference/ensembl-sequence/wrapper.py
22-22: SyntaxError: Expected an identifier
bio/reference/ensembl-annotation/wrapper.py
34-34: SyntaxError: Expected an identifier
🪛 yamllint (1.35.1)
bio/reference/ensembl-annotation/meta.yaml
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
bio/reference/ensembl-sequence/meta.yaml
[error] 10-10: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (9)
bio/reference/ensembl-annotation/test/Snakefile (2)
25-28
: Confirmed log path update and wrapper usage.
The new log file name "logs/get_annotation_gz.log"
and other changes align with the updated rule. Good clarity and separation of logs.
31-42
: Validate new rule integration and parameters.
The new get_off_branch_annotation
rule nicely showcases how to configure an alternate branch and collection. Please confirm that your tests cover all possible parameter combinations (e.g., missing branch
, missing collection
, and so on).
bio/reference/ensembl-annotation/wrapper.py (2)
55-55
: URL construction appears correct.
Including collection
neatly. No further issues are apparent.
34-36
:
Fix syntax error in snakemake.params.get("collection". "")
.
Replace the period .
with a comma ,
to correctly specify the default value.
-collection = snakemake.params.get("collection". "")
+collection = snakemake.params.get("collection", "")
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
34-34: SyntaxError: Expected an identifier
bio/reference/ensembl-sequence/test/Snakefile (3)
26-26
: Separate logs for single chromosome retrieval.
Using "logs/get_single_genome.log"
provides clarity in debugging.
41-44
: Improved log naming for multiple chromosome retrieval.
Renaming to "logs/get_multiple_chromosome.log"
clarifies scope.
47-59
: New off-branch genome rule is well-structured.
The parameters appear consistent with the annotation rules, promoting symmetry across usage. Consider testing additional branches, if feasible.
bio/reference/ensembl-sequence/wrapper.py (2)
60-60
: URL prefix includes collection correctly.
The logic for url_prefix
accounts for the new collection parameter. Looks good.
22-24
:
Syntax error in params.get("collection". "")
.
Change the period to a comma to avoid the parse error.
-collection = snakemake.params.get("collection". "")
+collection = snakemake.params.get("collection", "")
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
22-22: SyntaxError: Expected an identifier
Ping @mauripops |
feat: collections optional reference parameter(i.e. bacterial references)
This PR builds on the recently pushed (#2928) where branches and specific url can be defined as parameters. The previous method was unable to find data from bacteria ensembl due to data in it containing one more folder level. There is also a fix for the spec variable in in ensembl-sequence where the release version would not match for branches other than the default.
QC
For all wrappers added by this PR,
input:
andoutput:
file paths in the resulting rule can be changed arbitrarily,threads: x
statement withx
being a reasonable default,map_reads
for a step that maps reads),environment.yaml
specifications follow the respective best practices,environment.yaml
pinning has been updated by runningsnakedeploy pin-conda-envs environment.yaml
on a linux machine,input:
oroutput:
),Snakefile
s and their entries are explained via comments (input:
/output:
/params:
etc.),stderr
and/orstdout
are logged correctly (log:
), depending on the wrapped tool,tempfile.gettempdir()
points to (see here; this also means that using any Pythontempfile
default behavior works),meta.yaml
contains a link to the documentation of the respective tool or command,Snakefile
s pass the linting (snakemake --lint
),Snakefile
s are formatted with snakefmt,Summary by CodeRabbit
Release Notes
New Features
branch
andcollection
parameters for Ensembl annotation and sequence downloadsImprovements
Testing