Add comment about setting a default prefix that isn't just meta.id #2608

Open
wants to merge 6 commits into
base: main

SPPearce
Contributor

@SPPearce SPPearce commented Jul 2, 2024

@netlify /docs/guidelines/components/modules


netlify bot commented Jul 2, 2024

Deploy Preview for nf-core-main-site ready!

Name Link
🔨 Latest commit db059e4
🔍 Latest deploy log https://app.netlify.com/sites/nf-core-main-site/deploys/67a074cc8226c8000851e454
😎 Deploy Preview https://deploy-preview-2608--nf-core-main-site.netlify.app/docs/guidelines/components/modules

@mashehu
Contributor

mashehu commented Jul 3, 2024

failing playwright tests can be ignored for now

@SPPearce SPPearce requested a review from jfy133 July 4, 2024 15:25

```nextflow
script:
if ("$bam" == "${prefix}.bam") error "Input and output names are the same, set prefix in module configuration to disambiguate!"
```

- If the input and output files are likely to have the same name, then an appropriate default prefix may be set, for example:
Member

Suggested change (capitalise "may" as "MAY"):
- If the input and output files are likely to have the same name, then an appropriate default prefix MAY be set, for example:

I feel this should be left for the developer to decide how the resulting file should be called. The error should make them aware of this.

Member

Setting the default prefix does give the control to the developer, as they can overwrite it in the modules.config. The problem is when it's hard-coded into the output path.
I also added the -C bash flag to the shell directive in the template so that should also prevent accidental clobbering.
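
For context, `-C` is bash's noclobber option: with it set, a `>` redirection refuses to overwrite an existing file. A rough sketch of what a shell directive with that flag might look like in a `nextflow.config` follows; the exact list of flags is an assumption here, so check the template itself for the real setting.

```nextflow
// Sketch only: the flag list is illustrative, not copied from the template.
process.shell = [
    '/bin/bash',
    '-C',               // noclobber: `>` will not overwrite an existing file
    '-e',               // exit as soon as a command fails
    '-u',               // treat unset variables as an error
    '-o', 'pipefail'    // a pipe fails if any command in it fails
]
```

Note that noclobber only guards shell redirection. A tool that writes its output through its own option (for example `-o ${prefix}.bam`) can still overwrite the input, which is why the explicit name check in the module remains useful.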

Member

> The problem is when it's hard-coded into the output path.

I don't think I follow... isn't the suggestion here to technically embed a hardcoded string?

Member

@mahesh-panchal mahesh-panchal Jul 8, 2024

No, the example clearly shows prefix having a default of "${meta.id}_sorted". If I wanted something different, I would just update it in the config:

    ext.prefix = { "${meta.id}_mysorted" }

The command should still look like:

    mycommand --input $file > ${prefix}.out

but the file goes from <meta.id>_sorted.out to <meta.id>_mysorted.out
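
To make the pieces of this thread concrete, here is a minimal sketch of a module that sets a non-`meta.id` default and still keeps the collision check. The process name `MYTOOL_SORT`, the tool `mycommand`, the input name `input_file` and the `.out` extension are placeholders for illustration, not a real nf-core module.

```nextflow
// Sketch only: MYTOOL_SORT, mycommand and input_file are hypothetical names.
process MYTOOL_SORT {
    input:
    tuple val(meta), path(input_file)

    output:
    tuple val(meta), path("${prefix}.out"), emit: out

    script:
    // Default differs from meta.id so plug-and-play use avoids a collision;
    // task.ext.prefix from the pipeline config still takes precedence.
    prefix = task.ext.prefix ?: "${meta.id}_sorted"
    // Fail loudly if a configured prefix reintroduces the clash.
    if ("$input_file" == "${prefix}.out") error "Input and output names are the same, set prefix in module configuration to disambiguate!"
    """
    mycommand --input $input_file > ${prefix}.out
    """
}
```

With no configuration the output is `<meta.id>_sorted.out`. Setting `ext.prefix = { "${meta.id}_mysorted" }` under `withName: 'MYTOOL_SORT'` in the pipeline's modules.config changes it to `<meta.id>_mysorted.out` without touching the module.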

Contributor Author

Yes, it is setting a default, but it can be entirely overwritten in the usual way.

Contributor Author

> Disagree on both points (1. it should, pipeline devs should be very aware of the output files, 2. I don't think we should sacrifice flexibility just to avoid a small config file).
>
> But like I said - 'small mound' 😆 Maybe ask for one more opinion and you can merge

I mean, we could say that pipeline devs should make each module from scratch because that would be purer. I don't understand your objection here. This sacrifices no flexibility whatsoever, it defines a default prefix that can then be overwritten in exactly the usual way.

Member

@mahesh-panchal mahesh-panchal Jul 9, 2024

I also don't understand the objections. Could you clarify with a toy example what the objection is?

Contributor

I follow James here. IMO it's better to have a common way of doing things in modules so you don't have to check how it's done in every module. If I know every module has the meta.id as default, then I don't have to check the default every time. Adding different defaults could become confusing for some developers. But also not a hill I'm willing to die on :)

Member

My experience with using nf-core modules written by someone else is usually that I'll have to check the code anyway, at the least to know what the output channels are named and what inputs are expected. If I then see

    if ( file.name == "${prefix}.ext" ){

I'll know there's a possibility of a filename collision. However, this is where I would expect the module to have a sensible default that avoids a filename collision if I just plug and play. More often than not though, the current state is that once I use it, I'll discover the default is not actually sensible and have to modify my modules.config to deal with it. This to me is a time waster. If we did use "sensible defaults" rather than "meta.id", I'd still see the default when I inspect the module and could change the prefix if I wanted it differently, but I'd rather not have to assume that ${prefix}.ext will result in a filename collision if I don't set my own prefix. That's at least my experience, which is why I'm for this update.

Contributor

Maybe we should move this discussion to the next maintainers meeting? This seems like a topic that could fit very well in the meetings

@mashehu
Contributor

mashehu commented Jul 31, 2024

What is the state of this docs update, @SPPearce ?

@SPPearce
Contributor Author

> What is the state of this docs update, @SPPearce ?

@jfy133 and @nvnieuwk are still against it as far as I know, but I still don't understand why.
The suggestion was to discuss at a maintainers meeting but I was on leave for the one last week.

@mashehu
Contributor

mashehu commented Jul 31, 2024

> > What is the state of this docs update, @SPPearce ?
>
> @jfy133 and @nvnieuwk are still against it as far as I know, but I still don't understand why. The suggestion was to discuss at a maintainers meeting but I was on leave for the one last week.

ah, yes, remember now. let's try to bring it up in the next meeting then.

@nvnieuwk
Contributor

Sorry for being difficult 😁 let's talk about it in the next meeting indeed :)


netlify bot commented Jan 31, 2025

Deploy Preview for nf-core-docs ready!

Name Link
🔨 Latest commit db059e4
🔍 Latest deploy log https://app.netlify.com/sites/nf-core-docs/deploys/67a074ccf106910008a7bcd3
😎 Deploy Preview https://deploy-preview-2608--nf-core-docs.netlify.app/docs/guidelines/components/modules

@jfy133
Member

jfy133 commented Feb 3, 2025

#small-mound-i-feel-strongly-about-but-not-enough-to-fight-everyone

@mahesh-panchal
Member

The issue is more we still don't understand the objection. Since you're experienced, at least I feel like, there is something we're not understanding about your viewpoint.

We've established though that the developer still has control.

@jfy133
Member

jfy133 commented Feb 3, 2025

> The issue is more we still don't understand the objection. Since you're experienced, at least I feel like, there is something we're not understanding about your viewpoint.
>
> We've established though that the developer still has control.

I can't remember the depths of the discussion anymore, but mostly it's a development style.

Firstly, it irks me because I hate pipelines that tack loads of extra suffixes onto the end (_sorted_indexed_sorted_indexed); however, that shouldn't happen here because of course it's just meta.id + prefix/suffix every time.

But the main reason is: I strongly feel that developers should think very carefully about how output is presented to users. Automating output names with defaults to make the development experience 'easier' at the expense of user experience is not good practice, in my personal opinion. Sure, it can be overridden by an organised pipeline developer, however most people are too busy and will take any opportunity to skip steps if they can. Allowing default prefixes will make people not think about this, and thus not think carefully about what goes in the results directory etc., and will just result in a big mess and output that is harder and less attractive to use. This of course would be mitigated with good documentation, but that still remains poor across all of bioinformatics (/rant).

I personally prefer having a hard error when there is a name conflict, as it forces the developer to think carefully about the name, then whether it should even be presented to the user, and then logically where it should go in the output directory, etc., rather than 'hoping' they'll do some TLC.

But I do recognise that this is a personal opinion/development style about a relatively minor point so I won't block it for that reason when I've been outvoted ;).

@mahesh-panchal
Member

> Firstly, it irks me because I hate pipelines that tack loads of extra suffixes onto the end (_sorted_indexed_sorted_indexed); however, that shouldn't happen here because of course it's just meta.id + prefix/suffix every time.

Agreed. Perhaps this should be a pipeline linting check (warning) or an nf-test check. Split filenames on underscores/periods/non-alpha-numeric characters and check the number of unique parts against total (duplicate words) and that the total is not more than say 7 (ultra-long names).
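
As a rough sketch of that check: the helper below is hypothetical (nothing like it ships with nf-core/tools or nf-test as far as this thread establishes), and the function name and warning wording are made up for illustration. It is plain Groovy, so it could live in a Nextflow script or be adapted for an nf-test assertion.

```nextflow
// Hypothetical helper: checkOutputName is an illustrative name, not an existing API.
// maxParts is the "say 7" threshold suggested above.
def checkOutputName(String filename, int maxParts) {
    // Split on any run of non-alphanumeric characters (underscores, periods, dashes, ...)
    def parts = filename.split('[^A-Za-z0-9]+').findAll { it }
    // Parts that occur more than once, i.e. unique parts < total parts
    def repeated = parts.countBy { it }.findAll { part, n -> n > 1 }.keySet().toList()
    def warnings = []
    if (repeated) {
        warnings << ('repeated parts: ' + repeated.join(', '))
    }
    if (parts.size() > maxParts) {
        warnings << ('too many parts: ' + parts.size() + ' > ' + maxParts)
    }
    return warnings
}
```

Called as `checkOutputName('sample1_sorted_indexed_sorted_indexed.bam', 7)`, this flags `sorted` and `indexed` as repeated, while the six parts stay under the length limit. An empty list means the name passes both checks.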

> But the main reason is: I strongly feel that developers should think very carefully about how output is presented to users. Automating output names with defaults to make the development experience 'easier' at the expense of user experience is not good practice, in my personal opinion. Sure, it can be overridden by an organised pipeline developer, however most people are too busy and will take any opportunity to skip steps if they can. Allowing default prefixes will make people not think about this, and thus not think carefully about what goes in the results directory etc., and will just result in a big mess and output that is harder and less attractive to use. This of course would be mitigated with good documentation, but that still remains poor across all of bioinformatics (/rant).

I think we all agree on the first sentence. I disagree that this change is at the expense of user experience though. While it makes the developer experience better by not automatically resulting in a filename collision, the user should see at most one _something tacked on (maybe we need to be explicit about how the default prefix can differ from meta.id, for example the default prefix cannot be based on file.baseName). So even in a chain of these modules, with defaults changed from meta.id because they might commonly result in a collision, there should be at most one _something.

Conversely though, are we currently making the user experience better by how we're setting the meta.id? For many developers it's just a key to make something unique for joining. A busy developer is just as likely to skip proper value setting of meta.id too.

> I personally prefer having a hard error when there is a name conflict, as it forces the developer to think carefully about the name, then whether it should even be presented to the user, and then logically where it should go in the output directory, etc., rather than 'hoping' they'll do some TLC.

I think this is the key part though. Putting in developer roadblocks means the developer has to act directly. I don't think it'll force them to put TLC into it though. Workflow design (including naming outputs, etc.) is still a skill imo.

Sorry, I guess this wall of text wasn't necessary, but maybe there is one thing we still need to do, and that's define how much the prefix can differ from meta.id.


5 participants