Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support stdin/stdout special cases in sam.reader/writer #69

Merged
merged 2 commits into from
Jun 6, 2024
Merged

Conversation

msto
Copy link
Contributor

@msto msto commented Oct 25, 2023

Addressing #67

@nh13 @tfenne would you entertain something like this?

Two open questions

  • whether writing to stdout should default to SAM or BAM
  • I'm not sure how best to test these. In the past when unit testing against sys.stdin I've used pytest.MonkeyPatch and a StringIO, but pysam doesn't support reading from StringIO

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we should remove it default to SAM when writing to stdout and keep the requirement that file_type must be given when writing (the Args needed to be updated otherwise).
  2. See here for how pysam tests stdin/stdout:
    https://github.com/pysam-developers/pysam/blob/cdc0ed12fbe2d7633b8fa47534ab2c2547f66b84/tests/StreamFiledescriptors_test.py#L47-L65

fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
@msto
Copy link
Contributor Author

msto commented Oct 25, 2023

I would argue that it is helpful to set a default file_type for stdout.

Essentially, I want to be able to do the following, and have it work sensibly whether in_bam and out_bam are file paths or references to stdin/stdout:

def my_tool(
    *,
    in_bam: Path,
    out_bam: Path,
):
    """
    Run my tool.
    """

    with (
        reader(in_bam) as bam_reader,
        writer(out_bam, header=bam_reader.header) as bam_writer
    ):
        pass

If writer doesn't have a default file_type when writing to stdout, the user has to manually check whether the output path is stdout. Avoiding that was the motivation for this PR

@nh13
Copy link
Member

nh13 commented Oct 25, 2023

I think it's a footgun to have it default to a SAM file type if writing to standard output in library code. Let's go without it defaulting to SAM please.

@tfenne
Copy link
Member

tfenne commented Oct 25, 2023

I agree with Nils about not defaulting to SAM. If anything I would say it should default to uncompressed BAM as that's what I'd want when piping between tools ... and if a user wrote directly to stdout you'd quickly see it. But this points out why it should probably not have a default - because I'd argue for a different one.

I think the PR also needs to handle what if a user passes e.g. Path("/dev/stdin")

@msto
Copy link
Contributor Author

msto commented Oct 25, 2023

Will circle back to add tests, but made the requested changes.

Follow-up question: When opening from a filepath, should we require that the file type be inferred from the file extension, rather than giving the user the opportunity to specify a file_type that conflicts with the file extension?

file_type = file_type or SamFileType.from_path(path)

@nh13
Copy link
Member

nh13 commented Oct 25, 2023

I think the follow-up question deserves a separate issue. See comment in the code pysam-developers/pysam#655 and a similar issue our other rust repo: fulcrumgenomics/fgoxide#9.

@msto msto marked this pull request as ready for review June 6, 2024 13:59
@msto msto requested a review from tfenne as a code owner June 6, 2024 13:59
@msto msto marked this pull request as draft June 6, 2024 14:00
@msto msto marked this pull request as ready for review June 6, 2024 14:01
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use standard output versus stdout in the docstrings etc.

fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
fgpyo/sam/__init__.py Outdated Show resolved Hide resolved
chore: type hints

fix: check for paths and remove default file_type for stdout

doc: add docs

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

Update fgpyo/sam/__init__.py

Co-authored-by: Nils Homer <[email protected]>

chore: line wrap docstrings
@msto msto merged commit 520e508 into main Jun 6, 2024
6 checks passed
@msto msto deleted the MS_stdin branch June 6, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants