-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Primer3 takes a list of VCFs instead of a VariantLookup object #62
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #62 +/- ##
==========================================
- Coverage 96.66% 96.11% -0.56%
==========================================
Files 26 26
Lines 1708 1723 +15
Branches 333 334 +1
==========================================
+ Hits 1651 1656 +5
- Misses 31 38 +7
- Partials 26 29 +3 ☔ View full report in Codecov by Sentry. |
4f54f9a
to
0a83439
Compare
I'd appreciate @msto's feedback on this PR. I think original the thought was it would just be a list of VCFs vs. a VariatnLookup, but the reality of it being 4 parameters that control how variants are read from VCFs, makes it less desirable IMHO. I think I would propose either:
But I'd wait for @msto's input before doing anything further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tfenne if it were up to me, I would drop min_maf
and include_missing_mafs
entirely, and defer responsibility for any allele frequency filtering to upstream preprocessing.
You could also change Primer3
to accept a VCF path (default/no filtering applied) or a VariantLookup
, instead of removing the latter outright, if you wanted to expose the AF parameters
8d049f5
to
3c40cec
Compare
WalkthroughThe pull request introduces significant changes to the Assessment against linked issues
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (18)
prymer/api/__init__.py (2)
13-15
: Remove commented imports.Delete commented code instead of keeping it as comments.
-# from prymer.api.variant_lookup import _DiskBasedLookup -# from prymer.api.variant_lookup import _InMemoryLookup -# from prymer.api.variant_lookup import _VariantLookup
API changes need attention - internal classes still in use
The commented-out classes (
_DiskBasedLookup
,_InMemoryLookup
,_VariantLookup
) are actively used in:
prymer/primer3/primer3.py
tests/primer3/test_primer3.py
tests/api/test_variant_lookup.py
prymer/api/variant_lookup.py
These classes are implementation details still required by the codebase.
🔗 Analysis chain
Line range hint
1-35
: Verify API changes don't break existing code.API changes align with PR objectives but need verification.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Find usage of removed exports rg "FileBasedVariantLookup|VariantOverlapDetector|cached|disk_based" --type pythonLength of output: 115
Script:
#!/bin/bash # Search for usage of removed exports in Python files rg -t py "FileBasedVariantLookup|VariantOverlapDetector|cached|disk_based" # Find all Python files that import from variant_lookup rg -t py "from prymer\.api\.variant_lookup import" # Check for any references to the commented-out classes rg -t py "_DiskBasedLookup|_InMemoryLookup|_VariantLookup"Length of output: 6911
prymer/primer3/primer3.py (2)
232-239
: Consider using more readable type hint syntax.The Union type hint could be more readable.
- variants: list[Path] | VariantLookup | None = None, + variants: None | list[Path] | VariantLookup = None,
250-270
: Consider these improvements to variant lookup initialization:
- Make default parameters configurable via constructor.
- Type checking could be more robust.
- Error message could be more specific.
- elif isinstance(variants, list) and all( - isinstance(path, Path) for path in variants - ): + elif isinstance(variants, list) and all(isinstance(path, (Path, str)) for path in variants): self.variant_lookup = VariantLookup( vcf_paths=variants, - cached=True, - min_maf=0.00, - include_missing_mafs=False, + cached=cache_variants, + min_maf=min_maf, + include_missing_mafs=include_missing_mafs, ) else: raise ValueError( - "Variant lookup is required to be a list of `Path` objects or a" - f" pre-constructed `VariantLookup` object: received {variants}" + "Variant lookup must be a list of Path objects or a" + f" VariantLookup object, got {type(variants).__name__}" )prymer/api/variant_lookup.py (6)
11-11
: Typo in documentation: Duplicate word.Correct "a minimum Minimum Allele Frequency" to "a minimum Allele Frequency".
12-13
: Formatting issue in parameter descriptions.Avoid starting a new sentence within the parameter list. Combine sentences for clarity.
371-379
: Use public class in documentation example.The example uses the private class
_DiskBasedLookup
. Update it to use the publicVariantLookup
class for better clarity.
250-253
: Clarify the necessity of theclose
method in abstract base class.Since not all subclasses of
_VariantLookup
require resource cleanup, consider whetherclose
needs to be an abstract method.
310-313
: Provide default values for constructor parameters.In
VariantLookup.__init__
, consider setting default values formin_maf
andinclude_missing_mafs
to enhance usability.
343-345
: Optional parameters may be unnecessary.The
maf
andinclude_missing_mafs
parameters inVariantLookup.query
default toNone
but are immediately assigned values from the instance ifNone
. Consider simplifying by removing these parameters or setting defaults.tests/api/test_variant_lookup.py (6)
426-426
: Remove unused parametersample_vcf
The parameter
sample_vcf
is not used intest_simple_variant_conversion
.Clean up the function signature:
-def test_simple_variant_conversion(vcf_path: Path, sample_vcf: list[VariantRecord]) -> None: +def test_simple_variant_conversion(vcf_path: Path) -> None:
441-445
: Consistent use of context managersIn lines 441-445,
VariantLookup
is used within awith
statement, but elsewhere it's instantiated without one.For consistency and proper resource management, always use
VariantLookup
within a context manager.
466-469
: Duplicate exception handlingBoth context-managed and direct instantiation tests check for the same exception.
Combine them to avoid redundancy.
475-478
: Redundant tests for missing VCF pathsLines 475-478 repeat the same test in different contexts.
Simplify the tests by combining them.
572-577
: Clarifycached
parameter usageIn line 577,
cached=True
is set, but the test focus is on MAF filtering.Ensure that caching is intended here, or adjust if unnecessary.
604-607
: Consistent resource managementLines 604-607 use a context manager but have commented-out code checking the lookup type.
Remove the commented code or update it if type checking is required.
tests/primer3/test_primer3.py (2)
57-62
: Address the FIXME comment about variant parsingThe
FIXME
indicates that variant parsing could be improved to handle alternates more accurately. Enhancing this logic would improve the test coverage and accuracy.Would you like assistance in implementing this improvement or opening a GitHub issue to track it?
411-472
: Refactor tests to eliminate code duplicationThe tests
test_cached_variant_lookup
andtest_diskbased_variant_lookup
share similar logic. Consider refactoring to reduce duplication, possibly by parameterizing over thecached
flag.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
prymer/api/__init__.py
(1 hunks)prymer/api/variant_lookup.py
(10 hunks)prymer/primer3/primer3.py
(3 hunks)tests/api/test_variant_lookup.py
(9 hunks)tests/primer3/test_primer3.py
(6 hunks)
🔇 Additional comments (6)
prymer/primer3/primer3.py (1)
20-28
: LGTM: Documentation is clear and accurate.
prymer/api/variant_lookup.py (3)
Line range hint 229-237
: Parameters should not be optional if they have default values.
In _VariantLookup.__init__
, the parameters min_maf
and include_missing_mafs
are annotated as Optional
but are required for initialization. Remove Optional
to reflect that they are mandatory.
317-325
: Simplify lookup class instantiation.
Reduce redundancy by refactoring the conditional instantiation of _lookup
.
Apply this change:
lookup_cls = _InMemoryLookup if cached else _DiskBasedLookup
self._lookup = lookup_cls(
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs
)
359-364
: Verify MAF filtering logic.
The condition maf is None or maf <= 0.0
may unintentionally bypass MAF filtering. Ensure that variants are filtered according to the intended threshold.
tests/primer3/test_primer3.py (2)
40-64
: LGTM!
The variant_masking_test_data
fixture is well-defined and provides comprehensive test cases.
476-478
: LGTM!
The test test_invalid_variant_lookup
correctly validates error handling for invalid variant inputs.
tests/api/test_variant_lookup.py
Outdated
from prymer.api.variant_lookup import VariantLookup | ||
from prymer.api.variant_lookup import VariantType | ||
from prymer.api.variant_lookup import cached | ||
from prymer.api.variant_lookup import _DiskBasedLookup | ||
from prymer.api.variant_lookup import _InMemoryLookup | ||
from prymer.api.variant_lookup import _VariantLookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid importing protected classes
Lines 18-22 import classes with leading underscores (_DiskBasedLookup
, _InMemoryLookup
, _VariantLookup
), which are meant to be internal. Importing protected classes in tests can lead to maintenance issues.
Consider accessing functionality through the public VariantLookup
interface instead.
@@ -491,7 +494,7 @@ | |||
caplog.set_level(logging.DEBUG) | |||
vcf_paths = [vcf_path, mini_chr1_vcf, mini_chr3_vcf] | |||
random.Random(random_seed).shuffle(vcf_paths) | |||
with FileBasedVariantLookup( | |||
with _DiskBasedLookup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid using internal classes directly
Line 497 uses _DiskBasedLookup
directly.
Use the public VariantLookup
interface to maintain encapsulation.
tests/api/test_variant_lookup.py
Outdated
@@ -380,7 +380,7 @@ | |||
|
|||
|
|||
def variant_overlap_detector_query( | |||
detector: VariantOverlapDetector, | |||
detector: VariantLookup | _VariantLookup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Type hinting with internal classes
Line 383 includes _VariantLookup
in the type hint. Exposing internal classes in type hints can leak implementation details.
Limit the type hint to the public class:
-detector: VariantLookup | _VariantLookup,
+detector: VariantLookup,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
detector: VariantLookup | _VariantLookup, | |
detector: VariantLookup, |
tests/api/test_variant_lookup.py
Outdated
@@ -534,7 +537,7 @@ | |||
|
|||
def test_variant_overlap_detector_query(vcf_path: Path) -> None: | |||
"""Test `VariantOverlapDetector.query()` positional filtering.""" | |||
variant_overlap_detector = VariantOverlapDetector( | |||
variant_overlap_detector = _InMemoryLookup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Direct use of _InMemoryLookup
Line 540 accesses an internal class.
Instantiate VariantLookup
instead:
-variant_overlap_detector = _InMemoryLookup(
+variant_overlap_detector = VariantLookup(
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
variant_overlap_detector = _InMemoryLookup( | |
variant_overlap_detector = VariantLookup( |
variant_lookup = VariantLookup( | ||
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False | ||
) | ||
variant_lookup.query(refname="foo", start=1, end=2) | ||
variant_lookup.query(refname="chr2", start=1, end=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure resource cleanup with context manager
Lines 456-459 instantiate VariantLookup
without a context manager. This may lead to resource leaks.
Wrap the instantiation in a with
statement:
-variant_lookup = VariantLookup(
+with VariantLookup(
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
-)
+) as variant_lookup:
variant_lookup.query(refname="chr2", start=1, end=2)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
variant_lookup = VariantLookup( | |
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False | |
) | |
variant_lookup.query(refname="foo", start=1, end=2) | |
variant_lookup.query(refname="chr2", start=1, end=2) | |
with VariantLookup( | |
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False | |
) as variant_lookup: | |
variant_lookup.query(refname="chr2", start=1, end=2) |
from prymer.api.variant_lookup import VariantLookup | ||
from prymer.api.variant_lookup import _DiskBasedLookup | ||
from prymer.api.variant_lookup import _InMemoryLookup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid importing private classes
Importing classes prefixed with an underscore indicates they are intended for internal use. Consider refactoring to avoid importing _DiskBasedLookup
and _InMemoryLookup
.
3c40cec
to
e2d71aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (12)
prymer/api/variant_lookup.py (4)
10-15
: Enhance documentation about memory usage implications.Add explicit memory usage implications for the
cached
parameter.boolean `cached` flag. It is recommended to set `cached` to `False` for large VCFs. When `cached` -is True, variants from smaller VCFs are loaded into memory. +is True, variants from smaller VCFs are loaded into memory, which may significantly increase memory usage. +When `cached` is False, variants are read from disk on-demand, reducing memory footprint but potentially +impacting query performance.
Line range hint
256-275
: Add return type documentation.The
query
method's docstring is missing return type documentation.Args: refname: the reference name start: the 1-based start position end: the 1-based end position maf: the MAF of the variant include_missing_mafs: whether to include variants with a missing MAF (overrides self.include_missing_mafs) + Returns: + A list of SimpleVariant objects that overlap the specified region.
359-366
: Move MAF filtering to base class.MAF filtering logic should be in
_VariantLookup
to avoid duplication in concrete implementations.- if len(variants) == 0: - _logger.debug(f"No variants extracted from region of interest: {refname}:{start}-{end}") - if maf is None or maf <= 0.0: - return variants - elif include_missing_mafs: # return variants with a MAF above threshold or missing - return [v for v in variants if (v.maf is None or v.maf >= maf)] - else: - return [v for v in variants if v.maf is not None and v.maf >= maf] + return self._filter_by_maf(variants, maf, include_missing_mafs)Add to
_VariantLookup
:def _filter_by_maf( self, variants: list[SimpleVariant], maf: Optional[float], include_missing_mafs: bool, ) -> list[SimpleVariant]: if len(variants) == 0: _logger.debug("No variants extracted from region of interest") if maf is None or maf <= 0.0: return variants elif include_missing_mafs: return [v for v in variants if (v.maf is None or v.maf >= maf)] else: return [v for v in variants if v.maf is not None and v.maf >= maf]
Line range hint
392-403
: Extract common validation logic.Index file validation is duplicated. Move to base class.
+ @staticmethod + def _validate_vcf_paths(vcf_paths: list[Path]) -> None: + if len(vcf_paths) == 0: + raise ValueError("No VCF paths given to query.") + for path in vcf_paths: + if (not path.with_suffix(path.suffix + ".csi").is_file() + and not path.with_suffix(path.suffix + ".tbi").is_file()): + raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}")Update both classes to use this method in their
__init__
.Also applies to: 452-463
tests/api/test_variant_lookup.py (3)
Line range hint
497-502
: Replace _DiskBasedLookup with VariantLookupDirect usage of internal class. Use public interface.
-with _DiskBasedLookup( +with VariantLookup( vcf_paths=vcf_paths, min_maf=0.00, - include_missing_mafs=True + include_missing_mafs=True, + cached=False ) as variant_lookup:
572-577
: Update docstring to reflect VariantLookup usageDocstring references internal class.
-"""Test that `_InMemoryLookup.query()` MAF filtering is as expected. +"""Test that `VariantLookup.query()` MAF filtering is as expected.
607-607
: Remove commented assertionDead code should be removed.
- # assert isinstance(file_based_vcf_query._lookup, _DiskBasedLookup)
tests/primer3/test_primer3.py (3)
40-64
: Add type hints to fixture docstringAdd return type hint to the docstring for better code documentation.
@pytest.fixture() def variant_masking_test_data() -> list[tuple[Span, str, str]]: + """Return test data for variant masking. + + Returns: + list[tuple[Span, str, str]]: List of tuples containing: + - Span: Region to test + - str: Expected hard-masked sequence + - str: Expected soft-masked sequence + """ return [
411-478
: Add edge cases to variant lookup testsAdd tests for:
- Empty VCF list
- Non-existent VCF file
- Malformed VCF file
def test_empty_vcf_list(genome_ref: Path) -> None: """Test behavior with empty VCF list.""" with pytest.raises(ValueError, match="At least one VCF file required"): Primer3(genome_fasta=genome_ref, variants=[]) def test_nonexistent_vcf(genome_ref: Path) -> None: """Test behavior with non-existent VCF.""" with pytest.raises(FileNotFoundError): Primer3(genome_fasta=genome_ref, variants=[Path("nonexistent.vcf")])
485-531
: Add boundary test cases for design regionAdd test cases for:
- Minimum valid amplicon length
- Maximum supported amplicon length
- Equal target and amplicon lengths
@pytest.mark.parametrize("amplicon_length", [ 20, # minimum (2 * min_primer_length) 1000000, # maximum supported 50, # equal to target length ]) def test_create_design_region_boundaries(amplicon_length: int, genome_ref: Path) -> None: """Test boundary conditions for design region creation.""" target_region = Span(refname="chr1", start=201, end=250) with Primer3(genome_fasta=genome_ref) as designer: if amplicon_length < target_region.length: with pytest.raises(ValueError, match="exceeds the maximum size"): designer._create_design_region( target_region=target_region, max_amplicon_length=amplicon_length, min_primer_length=10, ) else: design_region = designer._create_design_region( target_region=target_region, max_amplicon_length=amplicon_length, min_primer_length=10, ) assert design_region.length >= target_region.lengthprymer/primer3/primer3.py (2)
250-265
: Consider using a dataclass for variant parameters.The default parameters (cached, min_maf, include_missing_mafs) could be encapsulated in a dataclass for better maintainability.
+@dataclass +class VariantParams: + cached: bool = True + min_maf: float = 0.00 + include_missing_mafs: bool = False def __init__( self, genome_fasta: Path, executable: Optional[str] = None, variants: list[Path] | VariantLookup | None = None, + variant_params: Optional[VariantParams] = None, ) -> None:
266-270
: Improve error message specificity.The error message could be more specific about the actual type received.
- raise ValueError( - "Variant lookup is required to be a list of `Path` objects or a" - f" pre-constructed `VariantLookup` object: received {variants}" - ) + raise ValueError( + f"Variant lookup must be a list of Path objects or a VariantLookup object, " + f"got {type(variants).__name__}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
prymer/api/__init__.py
(1 hunks)prymer/api/variant_lookup.py
(10 hunks)prymer/primer3/primer3.py
(3 hunks)tests/api/test_variant_lookup.py
(9 hunks)tests/primer3/test_primer3.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- prymer/api/init.py
🔇 Additional comments (7)
tests/api/test_variant_lookup.py (5)
18-22
: Avoid importing protected classes
Importing _DiskBasedLookup
, _InMemoryLookup
, and _VariantLookup
breaks encapsulation. Use public VariantLookup
interface instead.
383-383
: Remove protected class from type hint
Type hint exposes internal implementation details.
-detector: VariantLookup | _VariantLookup,
+detector: VariantLookup,
441-447
: LGTM: Proper context manager usage
Correctly uses context manager with VariantLookup
.
456-459
: Use context manager for resource cleanup
VariantLookup
should be used with context manager to ensure proper cleanup.
-variant_lookup = VariantLookup(
+with VariantLookup(
vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
-)
+) as variant_lookup:
variant_lookup.query(refname="chr2", start=1, end=2)
540-542
: 🛠️ Refactor suggestion
Replace _InMemoryLookup with VariantLookup
Direct usage of internal class. Use public interface.
-variant_overlap_detector = _InMemoryLookup(
+variant_overlap_detector = VariantLookup(
vcf_paths=[vcf_path],
min_maf=0.0,
- include_missing_mafs=True
+ include_missing_mafs=True,
+ cached=True
)
Likely invalid or redundant comment.
tests/primer3/test_primer3.py (1)
14-16
: Avoid importing private classes
prymer/primer3/primer3.py (1)
232-239
: LGTM: Constructor signature and documentation are clear.
The parameter rename and type hint accurately reflect the new flexibility in accepting either VCF paths or a VariantLookup object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
prymer/api/variant_lookup.py (3)
4-14
: Add type hints to examples.Add type hints to improve code readability in documentation.
->>> lookup = VariantLookup(vcf_paths=[Path("./tests/api/data/miniref.variants.vcf.gz")], min_maf=0.00, include_missing_mafs=True) +>>> lookup: VariantLookup = VariantLookup(vcf_paths=[Path("./tests/api/data/miniref.variants.vcf.gz")], min_maf=0.00, include_missing_mafs=True)
251-254
: Add docstring to abstract close method.Document the purpose of the close method in the abstract class.
@abstractmethod def close(self) -> None: + """Close any resources held by this lookup implementation.""" pass
Line range hint
391-401
: Extract duplicate index validation logic.Move VCF index validation to a common method.
+def _validate_vcf_index(path: Path) -> None: + """Validate that the VCF file has an index.""" + if ( + not path.with_suffix(path.suffix + ".csi").is_file() + and not path.with_suffix(path.suffix + ".tbi").is_file() + ): + raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}") class _DiskBasedLookup(_VariantLookup): def __init__(self, vcf_paths: list[Path], min_maf: Optional[float], include_missing_mafs: bool): self._readers: list[VariantFile] = [] super().__init__( vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs ) if len(vcf_paths) == 0: raise ValueError("No VCF paths given to query.") for path in vcf_paths: - if ( - not path.with_suffix(path.suffix + ".csi").is_file() - and not path.with_suffix(path.suffix + ".tbi").is_file() - ): - raise ValueError(f"Cannot perform fetch with missing index file for VCF: {path}") + _validate_vcf_index(path)Apply the same change in
_InMemoryLookup.__init__
.Also applies to: 456-466
tests/api/test_variant_lookup.py (3)
Line range hint
496-501
: Replace _DiskBasedLookup with VariantLookup.Direct usage of internal implementation class.
- with _DiskBasedLookup( + with VariantLookup( vcf_paths=vcf_paths, min_maf=0.00, include_missing_mafs=True ) as variant_lookup:
571-574
: Update docstring to reflect new class name.Docstring still references old internal class name.
- """Test that `_InMemoryLookup.query()` MAF filtering is as expected. + """Test that `VariantLookup.query()` MAF filtering is as expected.
606-606
: Remove commented assertion.Dead code should be removed.
- # assert isinstance(file_based_vcf_query._lookup, _DiskBasedLookup)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
prymer/api/variant_lookup.py
(10 hunks)tests/api/test_variant_lookup.py
(9 hunks)
🔇 Additional comments (3)
tests/api/test_variant_lookup.py (3)
455-458
:
Use context manager to ensure proper resource cleanup.
VariantLookup
should be used with a context manager.
- variant_lookup = VariantLookup(
- vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
- )
- variant_lookup.query(refname="chr2", start=1, end=2)
+ with VariantLookup(
+ vcf_paths=[vcf_path], min_maf=0.01, include_missing_mafs=False, cached=False
+ ) as variant_lookup:
+ variant_lookup.query(refname="chr2", start=1, end=2)
Likely invalid or redundant comment.
425-427
:
Use public VariantLookup API instead of internal class.
Replace direct usage of _InMemoryLookup
with VariantLookup
.
- variant_overlap_detector = _InMemoryLookup(
+ variant_overlap_detector = VariantLookup(
vcf_paths=[vcf_path], min_maf=0.0, include_missing_mafs=True, cached=True
)
Likely invalid or redundant comment.
18-21
:
Remove imports of internal implementation classes.
Direct imports of _DiskBasedLookup
, _InMemoryLookup
expose implementation details.
-from prymer.api.variant_lookup import _DiskBasedLookup
-from prymer.api.variant_lookup import _InMemoryLookup
Likely invalid or redundant comment.
def query( | ||
self, | ||
refname: str, | ||
start: int, | ||
end: int, | ||
maf: Optional[float] = None, | ||
include_missing_mafs: bool = None, | ||
) -> list[SimpleVariant]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add chromosome validation in _InMemoryLookup.query.
Match chromosome validation behavior with _DiskBasedLookup.
def query(
self,
refname: str,
start: int,
end: int,
maf: Optional[float] = None,
include_missing_mafs: bool = None,
) -> list[SimpleVariant]:
+ if not any(refname in self._overlap_detector.intervals_by_refname):
+ _logger.debug(f"No variants found for chromosome {refname}.")
+ return []
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def query( | |
self, | |
refname: str, | |
start: int, | |
end: int, | |
maf: Optional[float] = None, | |
include_missing_mafs: bool = None, | |
) -> list[SimpleVariant]: | |
def query( | |
self, | |
refname: str, | |
start: int, | |
end: int, | |
maf: Optional[float] = None, | |
include_missing_mafs: bool = None, | |
) -> list[SimpleVariant]: | |
if not any(refname in self._overlap_detector.intervals_by_refname): | |
_logger.debug(f"No variants found for chromosome {refname}.") | |
return [] |
class VariantLookup(ContextManager): | ||
def __init__( | ||
self, | ||
vcf_paths: list[Path], | ||
min_maf: Optional[float], | ||
include_missing_mafs: bool, | ||
cached: bool = True, | ||
): | ||
self.vcf_paths: list[Path] = vcf_paths | ||
self.min_maf: Optional[float] = min_maf | ||
self.include_missing_mafs: bool = include_missing_mafs | ||
self.cached: bool = cached | ||
|
||
self._lookup: _VariantLookup | ||
|
||
if cached: | ||
self._lookup = _InMemoryLookup( | ||
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | ||
) | ||
else: | ||
self._lookup = _DiskBasedLookup( | ||
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add input validation for vcf_paths.
Validate vcf_paths before creating lookup implementation.
def __init__(
self,
vcf_paths: list[Path],
min_maf: Optional[float],
include_missing_mafs: bool,
cached: bool = True,
):
+ if not vcf_paths:
+ raise ValueError("No VCF paths provided")
+ if any(not path.exists() for path in vcf_paths):
+ raise ValueError("One or more VCF paths do not exist")
self.vcf_paths: list[Path] = vcf_paths
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class VariantLookup(ContextManager): | |
def __init__( | |
self, | |
vcf_paths: list[Path], | |
min_maf: Optional[float], | |
include_missing_mafs: bool, | |
cached: bool = True, | |
): | |
self.vcf_paths: list[Path] = vcf_paths | |
self.min_maf: Optional[float] = min_maf | |
self.include_missing_mafs: bool = include_missing_mafs | |
self.cached: bool = cached | |
self._lookup: _VariantLookup | |
if cached: | |
self._lookup = _InMemoryLookup( | |
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | |
) | |
else: | |
self._lookup = _DiskBasedLookup( | |
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | |
) | |
class VariantLookup(ContextManager): | |
def __init__( | |
self, | |
vcf_paths: list[Path], | |
min_maf: Optional[float], | |
include_missing_mafs: bool, | |
cached: bool = True, | |
): | |
if not vcf_paths: | |
raise ValueError("No VCF paths provided") | |
if any(not path.exists() for path in vcf_paths): | |
raise ValueError("One or more VCF paths do not exist") | |
self.vcf_paths: list[Path] = vcf_paths | |
self.min_maf: Optional[float] = min_maf | |
self.include_missing_mafs: bool = include_missing_mafs | |
self.cached: bool = cached | |
self._lookup: _VariantLookup | |
if cached: | |
self._lookup = _InMemoryLookup( | |
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | |
) | |
else: | |
self._lookup = _DiskBasedLookup( | |
vcf_paths=vcf_paths, min_maf=min_maf, include_missing_mafs=include_missing_mafs | |
) |
closing in favor of #115 |
Closes #53.
Summary:
Primer3
with either a list of VCF paths OR a pre-constructedVariantLookup
object if they would like to cutomize the settings (e.g., MAF filtering).VariantLookup
does not need to be exposed to users -- so this PR changes theVariantLookup
API to such that, under the hood, the correctVariantLookup
object is created. Depending on the value of the boolean flagcached
, either a_InMemoryLookup
or_DiskBasedLookup
object is constructed.Primer3
invocation is changed to optionally accept a list of VCFPath
s orVariantLookup
object (with defaultcached=True
).Tests are updated accordingly. Both the public and private objects are tested directly (so I intentionally did not address some of the comments made by coderabbit).