Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vcf support #308

Merged
merged 34 commits into from
Oct 25, 2024
Merged

Vcf support #308

merged 34 commits into from
Oct 25, 2024

Conversation

stschiff
Copy link
Member

@stschiff stschiff commented Sep 6, 2024

Draft PR for VCF reading support

Copy link

codecov bot commented Sep 6, 2024

Codecov Report

Attention: Patch coverage is 67.56757% with 108 lines in your changes missing coverage. Please review.

Project coverage is 59.88%. Comparing base (683d3ce) to head (d10a59f).
Report is 35 commits behind head on master.

Files with missing lines Patch % Lines
src/Poseidon/CLI/OptparseApplicativeParsers.hs 15.21% 36 Missing and 3 partials ⚠️
src/Poseidon/GenotypeData.hs 79.66% 18 Missing and 6 partials ⚠️
src/Poseidon/CLI/Genoconvert.hs 56.00% 9 Missing and 2 partials ⚠️
src/Poseidon/CLI/Rectify.hs 52.94% 7 Missing and 1 partial ⚠️
src/Poseidon/CLI/Serve.hs 61.11% 6 Missing and 1 partial ⚠️
src/Poseidon/Package.hs 86.79% 0 Missing and 7 partials ⚠️
src/Poseidon/CLI/Forge.hs 78.57% 5 Missing and 1 partial ⚠️
src/Poseidon/CLI/Validate.hs 0.00% 4 Missing and 1 partial ⚠️
src/Poseidon/CLI/Survey.hs 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #308      +/-   ##
==========================================
+ Coverage   59.56%   59.88%   +0.31%     
==========================================
  Files          29       29              
  Lines        4078     4170      +92     
  Branches      484      481       -3     
==========================================
+ Hits         2429     2497      +68     
- Misses       1165     1192      +27     
+ Partials      484      481       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Base automatically changed from gzip-support-reading to master September 9, 2024 10:21
@stschiff stschiff marked this pull request as ready for review October 23, 2024 15:59
@stschiff
Copy link
Member Author

OK, this was begun a while ago, and only tests were missing. I've added one now.

Note that as with GZIP, this PR only implements reading of VCF files, not yet of writing. But I think this would already have a number of use cases that are important, so I would vote for getting this into a release even before we support writing.

There are also a number of decisions I have taken that need to stand the test of time. For example, since VCF does not contain information on group names or Sex, I had to tweak the Janno-Ind consistency check function. Also, VCF reading is somewhat difficult to implement in its full glory of the VCF spec. The Spec is very flexible and complex, and of course my sequence-formats library is far from fully implementing a parser for the entire format. It's really somewhat of a minimal parsing that I hope is robust enough to support most use-cases.

I would like to advertise this feature as "experimental" for now and first have it tested for a while before we move on and review some of the decisions I've made.

I hope it won't be too much work to review this PR. Happy to give you a run-through in person, @nevrome, if that helps.

Copy link
Member

@nevrome nevrome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks solid. But there are a lot changes in the CLI subcommands that require good testing. I hope the golden tests cover everything important...

Maybe it's worth to write some more golden tests explicitly for the .vcf input.

I also think the POSEIDON_* test data hack is really reaching its limits here. We should have proper test packages for these increasingly different setups!

src/Poseidon/CLI/Genoconvert.hs Outdated Show resolved Hide resolved
parseFileWithEndings :: String -> [String] -> OP.Parser FilePath
parseFileWithEndings long endings = OP.option (OP.maybeReader fileEndingReader) (
OP.long long <>
OP.help ("Accepted file endings are " ++ intercalate ", " endings) <>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "universal" parser function is a good idea. Scales really well. Maybe the "Accepted file endings are ..." help text is a bit too minimalist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I've now added a bit more specific help texts. Take a look.

@stschiff
Copy link
Member Author

Thanks for the review. Golden tests are a good idea, and I will also refactor those test packages.

@stschiff
Copy link
Member Author

OK, I've restructured the test packages and added Golden tests. I've also updated the version number, Changelog and Releasechangelog. Have a look.

@nevrome
Copy link
Member

nevrome commented Oct 25, 2024

Perfect! I also ran some little experiments on the command line and could find no issues. So I'll merge this now and trigger the release.

@nevrome nevrome merged commit 7f744ad into master Oct 25, 2024
4 checks passed
@nevrome nevrome deleted the vcf_support branch October 25, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants