Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Primer3 takes a list of VCFs instead of a VariantLookup object #62

Closed
wants to merge 2 commits into from

Conversation

emmcauley
Copy link
Collaborator

@emmcauley emmcauley commented Oct 3, 2024

Closes #53.

Summary:

  • We want users to be able to provide Primer3 with either a list of VCF paths OR a pre-constructed VariantLookup object if they would like to cutomize the settings (e.g., MAF filtering).
  • The choice of VariantLookup does not need to be exposed to users -- so this PR changes the VariantLookup API to such that, under the hood, the correct VariantLookup object is created. Depending on the value of the boolean flag cached, either a _InMemoryLookup or _DiskBasedLookup object is constructed.
  • The Primer3 invocation is changed to optionally accept a list of VCF Paths or VariantLookup object (with default cached=True).

Tests are updated accordingly. Both the public and private objects are tested directly (so I intentionally did not address some of the comments made by coderabbit).

Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 86.95652% with 6 lines in your changes missing coverage. Please review.

Project coverage is 96.11%. Comparing base (385e1b1) to head (86db19f).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
prymer/api/variant_lookup.py 90.00% 4 Missing ⚠️
prymer/primer3/primer3.py 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
- Coverage   96.66%   96.11%   -0.56%     
==========================================
  Files          26       26              
  Lines        1708     1723      +15     
  Branches      333      334       +1     
==========================================
+ Hits         1651     1656       +5     
- Misses         31       38       +7     
- Partials       26       29       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@emmcauley emmcauley force-pushed the em_feat_issue53_take_vcf branch from 4f54f9a to 0a83439 Compare October 3, 2024 19:47
@emmcauley emmcauley marked this pull request as ready for review October 3, 2024 19:53
@emmcauley emmcauley requested review from nh13 and tfenne as code owners October 3, 2024 19:53
@tfenne
Copy link
Member

tfenne commented Oct 4, 2024

I'd appreciate @msto's feedback on this PR. I think original the thought was it would just be a list of VCFs vs. a VariatnLookup, but the reality of it being 4 parameters that control how variants are read from VCFs, makes it less desirable IMHO.

I think I would propose either:

  • Creation of a small dataclass (e.g. VariantParams) to hold a list of VCFs and all the other parameters that effect VariantLookup creation, and then optionally take that in primer3.py? (Or VariantParams | VariantLookup | None).
  • Go back to just having it take a Variant Lookup

But I'd wait for @msto's input before doing anything further.

Copy link
Collaborator

@msto msto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfenne if it were up to me, I would drop min_maf and include_missing_mafs entirely, and defer responsibility for any allele frequency filtering to upstream preprocessing.

You could also change Primer3 to accept a VCF path (default/no filtering applied) or a VariantLookup, instead of removing the latter outright, if you wanted to expose the AF parameters

prymer/primer3/primer3.py Outdated Show resolved Hide resolved
@msto msto assigned tfenne and unassigned msto Oct 7, 2024
@emmcauley emmcauley assigned emmcauley and msto and unassigned tfenne and emmcauley Oct 10, 2024
@emmcauley emmcauley requested a review from msto October 14, 2024 15:02
prymer/api/variant_lookup.py Outdated Show resolved Hide resolved
prymer/api/variant_lookup.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
@msto msto assigned emmcauley and unassigned msto Oct 15, 2024
prymer/api/variant_lookup.py Outdated Show resolved Hide resolved
prymer/api/variant_lookup.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
prymer/primer3/primer3.py Outdated Show resolved Hide resolved
Copy link

coderabbitai bot commented Nov 27, 2024

Walkthrough

The pull request introduces significant changes to the prymer library, specifically in the variant_lookup and primer3 modules. In prymer/api/__init__.py, several imports related to variant lookup have been commented out or removed, altering the public API. The VariantLookup class in prymer/api/variant_lookup.py has been refactored into an abstract base class _VariantLookup and a new context manager class VariantLookup, which now includes a cached parameter for choosing between in-memory and disk-based lookups. The previous concrete implementations have been renamed and adjusted to fit this new structure. The Primer3 class constructor has been modified to accept a more flexible variants parameter, allowing for either a list of VCF paths or a VariantLookup object. Tests across various modules have been updated to reflect these changes, ensuring that the new structure is thoroughly validated.

Assessment against linked issues

Objective Addressed Explanation
Primer3 should accept a variant lookup Path (#53)

Possibly related PRs

Suggested reviewers

  • tfenne
  • msto
  • nh13

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (18)
prymer/api/__init__.py (2)

13-15: Remove commented imports.

Delete commented code instead of keeping it as comments.

-# from prymer.api.variant_lookup import _DiskBasedLookup
-# from prymer.api.variant_lookup import _InMemoryLookup
-# from prymer.api.variant_lookup import _VariantLookup

API changes need attention - internal classes still in use

The commented-out classes (_DiskBasedLookup, _InMemoryLookup, _VariantLookup) are actively used in:

  • prymer/primer3/primer3.py
  • tests/primer3/test_primer3.py
  • tests/api/test_variant_lookup.py
  • prymer/api/variant_lookup.py

These classes are implementation details still required by the codebase.

🔗 Analysis chain

Line range hint 1-35: Verify API changes don't break existing code.

API changes align with PR objectives but need verification.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find usage of removed exports
rg "FileBasedVariantLookup|VariantOverlapDetector|cached|disk_based" --type python

Length of output: 115


Script:

#!/bin/bash
# Search for usage of removed exports in Python files
rg -t py "FileBasedVariantLookup|VariantOverlapDetector|cached|disk_based"

# Find all Python files that import from variant_lookup
rg -t py "from prymer\.api\.variant_lookup import"

# Check for any references to the commented-out classes
rg -t py "_DiskBasedLookup|_InMemoryLookup|_VariantLookup"

Length of output: 6911

prymer/primer3/primer3.py (2)

232-239: Consider using more readable type hint syntax.

The Union type hint could be more readable.

-        variants: list[Path] | VariantLookup | None = None,
+        variants: None | list[Path] | VariantLookup = None,

250-270: Consider these improvements to variant lookup initialization:

  1. Make default parameters configurable via constructor.
  2. Type checking could be more robust.
  3. Error message could be more specific.
-        elif isinstance(variants, list) and all(
-            isinstance(path, Path) for path in variants
-        ):
+        elif isinstance(variants, list) and all(isinstance(path, (Path, str)) for path in variants):
             self.variant_lookup = VariantLookup(
                 vcf_paths=variants,
-                cached=True,
-                min_maf=0.00,
-                include_missing_mafs=False,
+                cached=cache_variants,
+                min_maf=min_maf,
+                include_missing_mafs=include_missing_mafs,
             )
         else:
             raise ValueError(
-                "Variant lookup is required to be a list of `Path` objects or a"
-                f" pre-constructed `VariantLookup` object: received {variants}"
+                "Variant lookup must be a list of Path objects or a"
+                f" VariantLookup object, got {type(variants).__name__}"
             )
prymer/api/variant_lookup.py (6)

11-11: Typo in documentation: Duplicate word.

Correct "a minimum Minimum Allele Frequency" to "a minimum Allele Frequency".


12-13: Formatting issue in parameter descriptions.

Avoid starting a new sentence within the parameter list. Combine sentences for clarity.


371-379: Use public class in documentation example.

The example uses the private class _DiskBasedLookup. Update it to use the public VariantLookup class for better clarity.


250-253: Clarify the necessity of the close method in abstract base class.

Since not all subclasses of _VariantLookup require resource cleanup, consider whether close needs to be an abstract method.


310-313: Provide default values for constructor parameters.

In VariantLookup.__init__, consider setting default values for min_maf and include_missing_mafs to enhance usability.


343-345: Optional parameters may be unnecessary.

The maf and include_missing_mafs parameters in VariantLookup.query default to None but are immediately assigned values from the instance if None. Consider simplifying by removing these parameters or setting defaults.

tests/api/test_variant_lookup.py (6)

426-426: Remove unused parameter sample_vcf

The parameter sample_vcf is not used in test_simple_variant_conversion.

Clean up the function signature:

-def test_simple_variant_conversion(vcf_path: Path, sample_vcf: list[VariantRecord]) -> None:
+def test_simple_variant_conversion(vcf_path: Path) -> None:

441-445: Consistent use of context managers

In lines 441-445, VariantLookup is used within a with statement, but elsewhere it's instantiated without one.

For consistency and proper resource management, always use VariantLookup within a context manager.


466-469: Duplicate exception handling

Both context-managed and direct instantiation tests check for the same exception.

Combine them to avoid redundancy.


475-478: Redundant tests for missing VCF paths

Lines 475-478 repeat the same test in different contexts.

Simplify the tests by combining them.


572-577: Clarify cached parameter usage

In line 577, cached=True is set, but the test focus is on MAF filtering.

Ensure that caching is intended here, or adjust if unnecessary.


604-607: Consistent resource management

Lines 604-607 use a context manager but have commented-out code checking the lookup type.

Remove the commented code or update it if type checking is required.

tests/primer3/test_primer3.py (2)

57-62: Address the FIXME comment about variant parsing

The FIXME indicates that variant parsing could be improved to handle alternates more accurately. Enhancing this logic would improve the test coverage and accuracy.

Would you like assistance in implementing this improvement or opening a GitHub issue to track it?


411-472: Refactor tests to eliminate code duplication

The tests test_cached_variant_lookup and test_diskbased_variant_lookup share similar logic. Consider refactoring to reduce duplication, possibly by parameterizing over the cached flag.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 385e1b1 and 3c40cec.

📒 Files selected for processing (5)
  • prymer/api/__init__.py (1 hunks)
  • prymer/api/variant_lookup.py (10 hunks)
  • prymer/primer3/primer3.py (3 hunks)
  • tests/api/test_variant_lookup.py (9 hunks)
  • tests/primer3/test_primer3.py (6 hunks)
🔇 Additional comments (6)
prymer/primer3/primer3.py (1)

20-28: LGTM: Documentation is clear and accurate.

prymer/api/variant_lookup.py (3)

Line range hint 229-237: Parameters should not be optional if they have default values.

In _VariantLookup.__init__, the parameters min_maf and include_missing_mafs are annotated as Optional but are required for initialization. Remove Optional to reflect that they are mandatory.


317-325: Simplify lookup class instantiation.

Reduce redundancy by refactoring the conditional instantiation of _lookup.

Apply this change:

lookup_cls = _InMemoryLookup if cached else _DiskBasedLookup
self._lookup = lookup_cls(
    vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)

359-364: Verify MAF filtering logic.

The condition maf is None or maf <= 0.0 may unintentionally bypass MAF filtering. Ensure that variants are filtered according to the intended threshold.

tests/primer3/test_primer3.py (2)

40-64: LGTM!

The variant_masking_test_data fixture is well-defined and provides comprehensive test cases.


476-478: LGTM!

The test test_invalid_variant_lookup correctly validates error handling for invalid variant inputs.

Comment on lines 18 to 22
from prymer.api.variant_lookup import VariantLookup
from prymer.api.variant_lookup import VariantType
from prymer.api.variant_lookup import cached
from prymer.api.variant_lookup import _DiskBasedLookup
from prymer.api.variant_lookup import _InMemoryLookup
from prymer.api.variant_lookup import _VariantLookup
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid importing protected classes

Lines 18-22 import classes with leading underscores (_DiskBasedLookup, _InMemoryLookup, _VariantLookup), which are meant to be internal. Importing protected classes in tests can lead to maintenance issues.

Consider accessing functionality through the public VariantLookup interface instead.

@@ -491,7 +494,7 @@
caplog.set_level(logging.DEBUG)
vcf_paths = [vcf_path, mini_chr1_vcf, mini_chr3_vcf]
random.Random(random_seed).shuffle(vcf_paths)
with FileBasedVariantLookup(
with _DiskBasedLookup(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid using internal classes directly

Line 497 uses _DiskBasedLookup directly.

Use the public VariantLookup interface to maintain encapsulation.

@@ -380,7 +380,7 @@


def variant_overlap_detector_query(
detector: VariantOverlapDetector,
detector: VariantLookup | _VariantLookup,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Type hinting with internal classes

Line 383 includes _VariantLookup in the type hint. Exposing internal classes in type hints can leak implementation details.

Limit the type hint to the public class:

-detector: VariantLookup | _VariantLookup,
+detector: VariantLookup,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
detector: VariantLookup | _VariantLookup,
detector: VariantLookup,

@@ -534,7 +537,7 @@

def test_variant_overlap_detector_query(vcf_path: Path) -> None:
"""Test `VariantOverlapDetector.query()` positional filtering."""
variant_overlap_detector = VariantOverlapDetector(
variant_overlap_detector = _InMemoryLookup(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Direct use of _InMemoryLookup

Line 540 accesses an internal class.

Instantiate VariantLookup instead:

-variant_overlap_detector = _InMemoryLookup(
+variant_overlap_detector = VariantLookup(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
variant_overlap_detector = _InMemoryLookup(
variant_overlap_detector = VariantLookup(

Comment on lines +456 to +459
variant_lookup = VariantLookup(
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
)
variant_lookup.query(refname="foo", start=1, end=2)
variant_lookup.query(refname="chr2", start=1, end=2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure resource cleanup with context manager

Lines 456-459 instantiate VariantLookup without a context manager. This may lead to resource leaks.

Wrap the instantiation in a with statement:

-variant_lookup = VariantLookup(
+with VariantLookup(
     vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
-)   
+) as variant_lookup:
     variant_lookup.query(refname="chr2", start=1, end=2)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
variant_lookup = VariantLookup(
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
)
variant_lookup.query(refname="foo", start=1, end=2)
variant_lookup.query(refname="chr2", start=1, end=2)
with VariantLookup(
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
) as variant_lookup:
variant_lookup.query(refname="chr2", start=1, end=2)

Comment on lines +14 to +16
from prymer.api.variant_lookup import VariantLookup
from prymer.api.variant_lookup import _DiskBasedLookup
from prymer.api.variant_lookup import _InMemoryLookup
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid importing private classes

Importing classes prefixed with an underscore indicates they are intended for internal use. Consider refactoring to avoid importing _DiskBasedLookup and _InMemoryLookup.

@emmcauley emmcauley force-pushed the em_feat_issue53_take_vcf branch from 3c40cec to e2d71aa Compare November 27, 2024 14:09
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (12)
prymer/api/variant_lookup.py (4)

10-15: Enhance documentation about memory usage implications.

Add explicit memory usage implications for the cached parameter.

 boolean `cached` flag. It is recommended to set `cached` to `False` for large VCFs. When `cached`
-is True, variants from smaller VCFs are loaded into memory.
+is True, variants from smaller VCFs are loaded into memory, which may significantly increase memory usage.
+When `cached` is False, variants are read from disk on-demand, reducing memory footprint but potentially
+impacting query performance.

Line range hint 256-275: Add return type documentation.

The query method's docstring is missing return type documentation.

         Args:
             refname: the reference name
             start: the 1-based start position
             end: the 1-based end position
             maf: the MAF of the variant
             include_missing_mafs: whether to include variants with a missing MAF
                 (overrides self.include_missing_mafs)
+        Returns:
+            A list of SimpleVariant objects that overlap the specified region.

359-366: Move MAF filtering to base class.

MAF filtering logic should be in _VariantLookup to avoid duplication in concrete implementations.

-        if len(variants) == 0:
-            _logger.debug(f"No variants extracted from region of interest: {refname}:{start}-{end}")
-        if maf is None or maf <= 0.0:
-            return variants
-        elif include_missing_mafs:  # return variants with a MAF above threshold or missing
-            return [v for v in variants if (v.maf is None or v.maf >= maf)]
-        else:
-            return [v for v in variants if v.maf is not None and v.maf >= maf]
+        return self._filter_by_maf(variants, maf, include_missing_mafs)

Add to _VariantLookup:

def _filter_by_maf(
    self,
    variants: list[SimpleVariant],
    maf: Optional[float],
    include_missing_mafs: bool,
) -> list[SimpleVariant]:
    if len(variants) == 0:
        _logger.debug("No variants extracted from region of interest")
    if maf is None or maf <= 0.0:
        return variants
    elif include_missing_mafs:
        return [v for v in variants if (v.maf is None or v.maf >= maf)]
    else:
        return [v for v in variants if v.maf is not None and v.maf >= maf]

Line range hint 392-403: Extract common validation logic.

Index file validation is duplicated. Move to base class.

+    @staticmethod
+    def _validate_vcf_paths(vcf_paths: list[Path]) -> None:
+        if len(vcf_paths) == 0:
+            raise ValueError("No VCF paths given to query.")
+        for path in vcf_paths:
+            if (not path.with_suffix(path.suffix + ".csi").is_file()
+                and not path.with_suffix(path.suffix + ".tbi").is_file()):
+                raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}")

Update both classes to use this method in their __init__.

Also applies to: 452-463

tests/api/test_variant_lookup.py (3)

Line range hint 497-502: Replace _DiskBasedLookup with VariantLookup

Direct usage of internal class. Use public interface.

-with _DiskBasedLookup(
+with VariantLookup(
     vcf_paths=vcf_paths,
     min_maf=0.00,
-    include_missing_mafs=True
+    include_missing_mafs=True,
+    cached=False
) as variant_lookup:

572-577: Update docstring to reflect VariantLookup usage

Docstring references internal class.

-"""Test that `_InMemoryLookup.query()` MAF filtering is as expected.
+"""Test that `VariantLookup.query()` MAF filtering is as expected.

607-607: Remove commented assertion

Dead code should be removed.

-        # assert isinstance(file_based_vcf_query._lookup, _DiskBasedLookup)
tests/primer3/test_primer3.py (3)

40-64: Add type hints to fixture docstring

Add return type hint to the docstring for better code documentation.

 @pytest.fixture()
 def variant_masking_test_data() -> list[tuple[Span, str, str]]:
+    """Return test data for variant masking.
+    
+    Returns:
+        list[tuple[Span, str, str]]: List of tuples containing:
+            - Span: Region to test
+            - str: Expected hard-masked sequence
+            - str: Expected soft-masked sequence
+    """
     return [

411-478: Add edge cases to variant lookup tests

Add tests for:

  1. Empty VCF list
  2. Non-existent VCF file
  3. Malformed VCF file
def test_empty_vcf_list(genome_ref: Path) -> None:
    """Test behavior with empty VCF list."""
    with pytest.raises(ValueError, match="At least one VCF file required"):
        Primer3(genome_fasta=genome_ref, variants=[])

def test_nonexistent_vcf(genome_ref: Path) -> None:
    """Test behavior with non-existent VCF."""
    with pytest.raises(FileNotFoundError):
        Primer3(genome_fasta=genome_ref, variants=[Path("nonexistent.vcf")])

485-531: Add boundary test cases for design region

Add test cases for:

  1. Minimum valid amplicon length
  2. Maximum supported amplicon length
  3. Equal target and amplicon lengths
@pytest.mark.parametrize("amplicon_length", [
    20,  # minimum (2 * min_primer_length)
    1000000,  # maximum supported
    50,  # equal to target length
])
def test_create_design_region_boundaries(amplicon_length: int, genome_ref: Path) -> None:
    """Test boundary conditions for design region creation."""
    target_region = Span(refname="chr1", start=201, end=250)
    with Primer3(genome_fasta=genome_ref) as designer:
        if amplicon_length < target_region.length:
            with pytest.raises(ValueError, match="exceeds the maximum size"):
                designer._create_design_region(
                    target_region=target_region,
                    max_amplicon_length=amplicon_length,
                    min_primer_length=10,
                )
        else:
            design_region = designer._create_design_region(
                target_region=target_region,
                max_amplicon_length=amplicon_length,
                min_primer_length=10,
            )
            assert design_region.length >= target_region.length
prymer/primer3/primer3.py (2)

250-265: Consider using a dataclass for variant parameters.

The default parameters (cached, min_maf, include_missing_mafs) could be encapsulated in a dataclass for better maintainability.

+@dataclass
+class VariantParams:
+    cached: bool = True
+    min_maf: float = 0.00
+    include_missing_mafs: bool = False

 def __init__(
     self,
     genome_fasta: Path,
     executable: Optional[str] = None,
     variants: list[Path] | VariantLookup | None = None,
+    variant_params: Optional[VariantParams] = None,
 ) -> None:

266-270: Improve error message specificity.

The error message could be more specific about the actual type received.

-            raise ValueError(
-                "Variant lookup is required to be a list of `Path` objects or a"
-                f" pre-constructed `VariantLookup` object: received {variants}"
-            )
+            raise ValueError(
+                f"Variant lookup must be a list of Path objects or a VariantLookup object, "
+                f"got {type(variants).__name__}"
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3c40cec and e2d71aa.

📒 Files selected for processing (5)
  • prymer/api/__init__.py (1 hunks)
  • prymer/api/variant_lookup.py (10 hunks)
  • prymer/primer3/primer3.py (3 hunks)
  • tests/api/test_variant_lookup.py (9 hunks)
  • tests/primer3/test_primer3.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • prymer/api/init.py
🔇 Additional comments (7)
tests/api/test_variant_lookup.py (5)

18-22: Avoid importing protected classes

Importing _DiskBasedLookup, _InMemoryLookup, and _VariantLookup breaks encapsulation. Use public VariantLookup interface instead.


383-383: Remove protected class from type hint

Type hint exposes internal implementation details.

-detector: VariantLookup | _VariantLookup,
+detector: VariantLookup,

441-447: LGTM: Proper context manager usage

Correctly uses context manager with VariantLookup.


456-459: Use context manager for resource cleanup

VariantLookup should be used with context manager to ensure proper cleanup.

-variant_lookup = VariantLookup(
+with VariantLookup(
     vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
-)
+) as variant_lookup:
     variant_lookup.query(refname="chr2", start=1, end=2)

540-542: 🛠️ Refactor suggestion

Replace _InMemoryLookup with VariantLookup

Direct usage of internal class. Use public interface.

-variant_overlap_detector = _InMemoryLookup(
+variant_overlap_detector = VariantLookup(
     vcf_paths=[vcf_path],
     min_maf=0.0,
-    include_missing_mafs=True
+    include_missing_mafs=True,
+    cached=True
)

Likely invalid or redundant comment.

tests/primer3/test_primer3.py (1)

14-16: Avoid importing private classes

prymer/primer3/primer3.py (1)

232-239: LGTM: Constructor signature and documentation are clear.

The parameter rename and type hint accurately reflect the new flexibility in accepting either VCF paths or a VariantLookup object.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (6)
prymer/api/variant_lookup.py (3)

4-14: Add type hints to examples.

Add type hints to improve code readability in documentation.

->>> lookup = VariantLookup(vcf_paths=[Path("./tests/api/data/miniref.variants.vcf.gz")], min_maf=0.00, include_missing_mafs=True)
+>>> lookup: VariantLookup = VariantLookup(vcf_paths=[Path("./tests/api/data/miniref.variants.vcf.gz")], min_maf=0.00, include_missing_mafs=True)

251-254: Add docstring to abstract close method.

Document the purpose of the close method in the abstract class.

 @abstractmethod
 def close(self) -> None:
+    """Close any resources held by this lookup implementation."""
     pass

Line range hint 391-401: Extract duplicate index validation logic.

Move VCF index validation to a common method.

+def _validate_vcf_index(path: Path) -> None:
+    """Validate that the VCF file has an index."""
+    if (
+        not path.with_suffix(path.suffix + ".csi").is_file()
+        and not path.with_suffix(path.suffix + ".tbi").is_file()
+    ):
+        raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}")

 class _DiskBasedLookup(_VariantLookup):
     def __init__(self, vcf_paths: list[Path], min_maf: Optional[float], include_missing_mafs: bool):
         self._readers: list[VariantFile] = []
         super().__init__(
             vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
         )
         if len(vcf_paths) == 0:
             raise ValueError("No VCF paths given to query.")
         for path in vcf_paths:
-            if (
-                not path.with_suffix(path.suffix + ".csi").is_file()
-                and not path.with_suffix(path.suffix + ".tbi").is_file()
-            ):
-                raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}")
+            _validate_vcf_index(path)

Apply the same change in _InMemoryLookup.__init__.

Also applies to: 456-466

tests/api/test_variant_lookup.py (3)

Line range hint 496-501: Replace _DiskBasedLookup with VariantLookup.

Direct usage of internal implementation class.

-    with _DiskBasedLookup(
+    with VariantLookup(
         vcf_paths=vcf_paths, min_maf=0.00, include_missing_mafs=True
     ) as variant_lookup:

571-574: Update docstring to reflect new class name.

Docstring still references old internal class name.

-    """Test that `_InMemoryLookup.query()` MAF filtering is as expected.
+    """Test that `VariantLookup.query()` MAF filtering is as expected.

606-606: Remove commented assertion.

Dead code should be removed.

-        # assert isinstance(file_based_vcf_query._lookup, _DiskBasedLookup)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e2d71aa and 86db19f.

📒 Files selected for processing (2)
  • prymer/api/variant_lookup.py (10 hunks)
  • tests/api/test_variant_lookup.py (9 hunks)
🔇 Additional comments (3)
tests/api/test_variant_lookup.py (3)

455-458: ⚠️ Potential issue

Use context manager to ensure proper resource cleanup.

VariantLookup should be used with a context manager.

-    variant_lookup = VariantLookup(
-        vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
-    )
-    variant_lookup.query(refname="chr2", start=1, end=2)
+    with VariantLookup(
+        vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
+    ) as variant_lookup:
+        variant_lookup.query(refname="chr2", start=1, end=2)

Likely invalid or redundant comment.


425-427: ⚠️ Potential issue

Use public VariantLookup API instead of internal class.

Replace direct usage of _InMemoryLookup with VariantLookup.

-    variant_overlap_detector = _InMemoryLookup(
+    variant_overlap_detector = VariantLookup(
         vcf_paths=[vcf_path], min_maf=0.0, include_missing_mafs=True, cached=True
     )

Likely invalid or redundant comment.


18-21: ⚠️ Potential issue

Remove imports of internal implementation classes.

Direct imports of _DiskBasedLookup, _InMemoryLookup expose implementation details.

-from prymer.api.variant_lookup import _DiskBasedLookup
-from prymer.api.variant_lookup import _InMemoryLookup

Likely invalid or redundant comment.

Comment on lines +469 to +476
def query(
self,
refname: str,
start: int,
end: int,
maf: Optional[float] = None,
include_missing_mafs: bool = None,
) -> list[SimpleVariant]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add chromosome validation in _InMemoryLookup.query.

Match chromosome validation behavior with _DiskBasedLookup.

 def query(
     self,
     refname: str,
     start: int,
     end: int,
     maf: Optional[float] = None,
     include_missing_mafs: bool = None,
 ) -> list[SimpleVariant]:
+    if not any(refname in self._overlap_detector.intervals_by_refname):
+        _logger.debug(f"No variants found for chromosome {refname}.")
+        return []
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def query(
self,
refname: str,
start: int,
end: int,
maf: Optional[float] = None,
include_missing_mafs: bool = None,
) -> list[SimpleVariant]:
def query(
self,
refname: str,
start: int,
end: int,
maf: Optional[float] = None,
include_missing_mafs: bool = None,
) -> list[SimpleVariant]:
if not any(refname in self._overlap_detector.intervals_by_refname):
_logger.debug(f"No variants found for chromosome {refname}.")
return []

Comment on lines +304 to +327
class VariantLookup(ContextManager):
def __init__(
self,
vcf_paths: list[Path],
min_maf: Optional[float],
include_missing_mafs: bool,
cached: bool = True,
):
self.vcf_paths: list[Path] = vcf_paths
self.min_maf: Optional[float] = min_maf
self.include_missing_mafs: bool = include_missing_mafs
self.cached: bool = cached

self._lookup: _VariantLookup

if cached:
self._lookup = _InMemoryLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)
else:
self._lookup = _DiskBasedLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add input validation for vcf_paths.

Validate vcf_paths before creating lookup implementation.

 def __init__(
     self,
     vcf_paths: list[Path],
     min_maf: Optional[float],
     include_missing_mafs: bool,
     cached: bool = True,
 ):
+    if not vcf_paths:
+        raise ValueError("No VCF paths provided")
+    if any(not path.exists() for path in vcf_paths):
+        raise ValueError("One or more VCF paths do not exist")
     self.vcf_paths: list[Path] = vcf_paths
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class VariantLookup(ContextManager):
def __init__(
self,
vcf_paths: list[Path],
min_maf: Optional[float],
include_missing_mafs: bool,
cached: bool = True,
):
self.vcf_paths: list[Path] = vcf_paths
self.min_maf: Optional[float] = min_maf
self.include_missing_mafs: bool = include_missing_mafs
self.cached: bool = cached
self._lookup: _VariantLookup
if cached:
self._lookup = _InMemoryLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)
else:
self._lookup = _DiskBasedLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)
class VariantLookup(ContextManager):
def __init__(
self,
vcf_paths: list[Path],
min_maf: Optional[float],
include_missing_mafs: bool,
cached: bool = True,
):
if not vcf_paths:
raise ValueError("No VCF paths provided")
if any(not path.exists() for path in vcf_paths):
raise ValueError("One or more VCF paths do not exist")
self.vcf_paths: list[Path] = vcf_paths
self.min_maf: Optional[float] = min_maf
self.include_missing_mafs: bool = include_missing_mafs
self.cached: bool = cached
self._lookup: _VariantLookup
if cached:
self._lookup = _InMemoryLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)
else:
self._lookup = _DiskBasedLookup(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)

@emmcauley emmcauley requested a review from clintval November 27, 2024 21:20
@emmcauley emmcauley assigned tfenne and unassigned emmcauley Dec 4, 2024
@emmcauley
Copy link
Collaborator Author

closing in favor of #115

@emmcauley emmcauley closed this Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Primer3 should accept a variant lookup Path
3 participants