-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add a rule to ensure path directives are S3 URIs #44
base: main
Are you sure you want to change the base?
Conversation
- Add new rule to healthomics package - Add new rule to resources xml: `healthomics-nextflow-rules/src/main/resources/rulesets/healthomics.xml` - Add test cases
...-rules/src/test/groovy/software/amazon/nextflow/rules/healthomics/InputS3FileRuleTest.groovy
Outdated
Show resolved
Hide resolved
...-rules/src/test/groovy/software/amazon/nextflow/rules/healthomics/InputS3FileRuleTest.groovy
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a great start but you might want to start with the use cases that are simplest to check and build up to the more complicated cases. Unfortunately Nextflow has many valid ways to declare a path
and only some are simple to check if they are S3 or local paths.
path par.infile | ||
} | ||
''' | ||
assertSingleViolation(SOURCE, 4, "path par.infile", "The file path must be either a string or params.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be a violation. A path can be a variable and not just one from the params map. It could also be an expression https://www.nextflow.io/docs/latest/process.html#input-files-path
final SOURCE = ''' | ||
process bar { | ||
input: | ||
path 's3://my-input-bucket/data/input1.txt', 's3://my-input-bucket/data/input2.txt' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct but there are ways in which Path can be a list of files. https://www.nextflow.io/docs/latest/process.html#multiple-input-files
Yeah @markjschreiber , I agree that building up from some simple cases will be a good approach. In thinking about this, I believe that the simplest cases are
And then from there I am a little unclear how best to proceed. Because as you pointed out, we could also have substitutions like:
But my understanding is that codenarc is a static analysis tool, thus we cannot evaluate the parameter values or variables during runtime of linter. Given this, what do you suggest for handling these cases? And regarding you description in #44:
|
That is correct. In some cases you can infer simple non-static things like string concatenation but this means you have to keep lookup tables and have some processing so not simple. I think it's fine to keep your rule focussed on the simple cases, you just don't want it to trigger on something like
No, those are things that we also want to exclude and would also be good test cases for triggering the rule. |
Fixes #43
Description of Changes
Added a validation rule to the HealthOmics package to enforce
s3://
paths for all input and output file paths specified via thepath
directive.Description of how you validated changes
Validation was performed by adding two test cases: one with valid
s3://
paths, which passed, and another with a local file path, which contains one violation as expected, demonstrating the rule's enforcement.Checklist
healthomics-nextflow-rules/src/main/resources/rulesets/healthomics.xml
./gradlew clean build
)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license