New --file flag to enable search using a file of rsIDs (#7)
standage authored Apr 4, 2022
1 parent 385e828 commit f058034
Showing 6 changed files with 72 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/cibuild.yml
```diff
@@ -9,7 +9,7 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        python-version: [3.5, 3.6, 3.7, 3.8]
+        python-version: [3.6, 3.7, 3.8, 3.9]
 
     steps:
     - uses: actions/checkout@v1
```
35 changes: 35 additions & 0 deletions CHANGELOG.md
```diff
@@ -0,0 +1,35 @@
+# Change Log
+All notable changes to this project will be documented in this file.
+This project adheres to [Semantic Versioning](http://semver.org/).
+
+
+## Unreleased
+
+### Added
+- New `--file` flag to enable search using a file of rsIDs (#7)
+
+
+## [0.2] 2020-04-21
+
+### Added
+- Added support for multiple rsIDs in the VCF ID column
+- Added support for rsIDs appearing in multiple records (forbidden by VCF spec but used in some popular population survey data)
+- Configured continuous integration (CI) with GitHub actions
+
+### Changed
+- Changed `rsidx index` so that it now fails if index already exists; provided a `--force` flag to override
+
+
+## [0.1.1] 2019-05-21
+
+Bugfix release with updated file manifest.
+
+
+## [0.1] 2019-05-21
+
+Initial release!
+
+- Command-line entry point: `rsidx`
+- Command-line operations:
+  - `rsidx index`: index a VCF file
+  - `rsidx search`: query a VCF file by rsID
```
4 changes: 4 additions & 0 deletions rsidx/cli.py
```diff
@@ -37,6 +37,10 @@ def search_subparser(subparsers):
         '-o', '--out', metavar='FILE', help='write output to specified FILE; '
         'default is terminal (stdout)'
     )
+    cli.add_argument(
+        '--file', action='store_true', help='rsIDs are provided in a text file, one per line, '
+        'rather than as command line arguments'
+    )
     cli.add_argument('vcf', help='sorted and indexed VCF file')
     cli.add_argument('idx', help='rsidx index file')
     cli.add_argument('rsid', nargs='+', help='rsID(s) to search')
```
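The new flag takes no value of its own: `--file` is a `store_true` switch, so the existing `rsid` positional holds either rsIDs or paths to rsID files, depending on whether the flag is present. The sketch below illustrates that parsing behavior with a hypothetical stand-alone parser (`build_search_parser` is not part of rsidx) that mirrors only the arguments visible in this hunk; the real `search` subparser defines further options (for example, whatever populates `args.header`), which are omitted here.

```python
import argparse


def build_search_parser():
    """Hypothetical stand-in for the rsidx `search` subparser, limited to
    the arguments shown in this diff (the real parser defines more)."""
    cli = argparse.ArgumentParser(prog='rsidx search')
    cli.add_argument('-o', '--out', metavar='FILE')
    # Boolean switch: it consumes no value, it only changes interpretation
    cli.add_argument('--file', action='store_true')
    cli.add_argument('vcf')
    cli.add_argument('idx')
    # Holds rsIDs, or rsID file names when --file is given
    cli.add_argument('rsid', nargs='+')
    return cli


parser = build_search_parser()

# Without --file, the trailing positionals are the rsIDs themselves
plain = parser.parse_args(['in.vcf.gz', 'in.rsidx', 'rs123', 'rs456'])
print(plain.file, plain.rsid)  # False ['rs123', 'rs456']

# With --file, the same slot holds file name(s); the flag may sit between
# positionals, in the same argument order the commit's new test uses
fromfile = parser.parse_args(
    ['in.vcf.gz', 'in.rsidx', '--file', 'ids.txt', '--out', 'out.vcf']
)
print(fromfile.file, fromfile.rsid, fromfile.out)  # True ['ids.txt'] out.vcf
```

Because `rsid` keeps `nargs='+'`, `--file` still requires at least one path, and several rsID files can be passed in a single invocation.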
14 changes: 13 additions & 1 deletion rsidx/search.py
```diff
@@ -55,9 +55,21 @@ def fmt(row):
         yield line
 
 
+def parse_rsids(rsidlist, fromfile):
+    if fromfile is False:
+        return rsidlist
+    rsids = list()
+    for filename in rsidlist:
+        with open(filename, 'r') as fh:
+            for line in fh:
+                rsids.extend(line.strip().split())
+    return rsids
+
+
 def main(args):
+    rsidlist = parse_rsids(args.rsid, args.file)
     conn = sqlite3.connect(args.idx)
     with rsidx.open(args.out, 'w') as out:
-        for line in search(args.rsid, conn, args.vcf, header=args.header):
+        for line in search(rsidlist, conn, args.vcf, header=args.header):
             print(line, end='', file=out)
     conn.close()
```
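Because `parse_rsids` splits every line on whitespace, an rsID file does not strictly need one ID per line, even though the `--file` help text says so: the `five-rsids.txt` fixture in this commit puts three IDs on its second line. The sketch below copies `parse_rsids` verbatim from this hunk and runs it on a temporary file with the same layout as that fixture; the `tempfile` setup is illustrative only and not part of rsidx.

```python
import os
import tempfile


def parse_rsids(rsidlist, fromfile):
    """Copied from the hunk above: return the rsIDs unchanged, or read them
    from each named file, splitting every line on whitespace."""
    if fromfile is False:
        return rsidlist
    rsids = list()
    for filename in rsidlist:
        with open(filename, 'r') as fh:
            for line in fh:
                rsids.extend(line.strip().split())
    return rsids


# Same layout as five-rsids.txt: one ID, then three IDs, then one ID
content = (
    'rs1472751972\n'
    'rs1287502205 rs897983471 rs1172219431\n'
    'rs189123651\n'
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

try:
    from_file = parse_rsids([path], True)              # read IDs from the file
    passthrough = parse_rsids(['rs123', 'rs456'], False)  # command-line IDs
finally:
    os.remove(path)

print(len(from_file))  # 5
print(from_file[1:4])  # ['rs1287502205', 'rs897983471', 'rs1172219431']
print(passthrough)     # ['rs123', 'rs456']
```

File order and within-line order are preserved, and a blank line contributes nothing, since splitting an empty string yields an empty list.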
3 changes: 3 additions & 0 deletions rsidx/tests/data/five-rsids.txt
```diff
@@ -0,0 +1,3 @@
+rs1472751972
+rs1287502205 rs897983471 rs1172219431
+rs189123651
```
16 changes: 16 additions & 0 deletions rsidx/tests/test_search.py
```diff
@@ -145,6 +145,22 @@ def test_search_stdout(capsys):
     assert len(outlines) == 5
 
 
+def test_search_with_file(tmp_path):
+    outfile = str(tmp_path / "out.vcf")
+    arglist = [
+        'search', data_file('chr17-sample.vcf.gz'), data_file('chr17-sample.rsidx'),
+        '--file', data_file('five-rsids.txt'), '--out', outfile
+    ]
+    args = rsidx.cli.get_parser().parse_args(arglist)
+    rsidx.search.main(args)
+    with open(outfile, 'r') as fh:
+        positions = list()
+        for line in fh:
+            pos = line.split()[1]
+            positions.append(pos)
+    assert positions == ['132359', '1313935', '1458046', '1521873', '1895904']
+
+
 @pytest.mark.parametrize('doheader,numlines', [
     (False, 1),
     (True, 57 + 1),
```