Consider application of filename_patterns to readers that accept directories #155

Open
DragaDoncila opened this issue Apr 15, 2022 · 0 comments


Description

Currently, readers with accepts_directories: true get passed all directories, regardless of whether or not they also declare any filename_patterns.
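Here is a simplified Python model of the current behaviour. `ReaderSpec` and `is_compatible` are hypothetical names, and this is not npe2's actual matching code. It only illustrates that `filename_patterns` is never consulted when the path is a directory.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import Path
from typing import List


@dataclass
class ReaderSpec:
    # Hypothetical stand-in for a reader contribution in a plugin manifest.
    command: str
    filename_patterns: List[str] = field(default_factory=list)
    accepts_directories: bool = False


def is_compatible(reader: ReaderSpec, path: str) -> bool:
    """Hypothetical, simplified model of the behaviour described in this issue."""
    if Path(path).is_dir():
        # Directories: only accepts_directories is checked,
        # so filename_patterns are ignored entirely.
        return reader.accepts_directories
    # Files: the file name must match at least one declared pattern.
    return any(fnmatch(Path(path).name, pat) for pat in reader.filename_patterns)
```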

Based on this discussion on Zulip, users would like these filename_patterns to be applied to directories as well.

I recall that in community discussions we semi-decided that if you want filename patterns to be applied and preferences to be available, you should give your directory a custom extension. Currently, we don't apply filename_patterns to directories at all, so this behavior is not supported (though we can separately support preference assignment in napari).

The remaining question is how these filename_patterns should be applied. They could reasonably be interpreted as applying either to the directory name itself or to the directory's contents. We may well wish to support both use cases, so we need to make sure the interpretation is unambiguous.
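The two readings can be contrasted in a small sketch using the standard library's `fnmatch`. Both functions are hypothetical. Interpretation A tests the directory's own name, and interpretation B tests the names of the files directly inside it. The same pattern can give opposite answers for the same directory.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List


def matches_directory_name(directory: str, patterns: List[str]) -> bool:
    # Hypothetical interpretation A: patterns apply to the directory's own name.
    name = Path(directory).name
    return any(fnmatch(name, pat) for pat in patterns)


def matches_directory_contents(directory: str, patterns: List[str]) -> bool:
    # Hypothetical interpretation B: patterns apply to the files directly inside it.
    return any(
        fnmatch(child.name, pat)
        for child in Path(directory).iterdir()
        for pat in patterns
    )
```

Consider a folder `my-special-folder/` containing `a.tiff`. The pattern `'*.tiff'` matches under B but not under A, and `'my-special-folder'` matches under A but not under B.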

Proposed Spec Implementations

  1. Add an explicit directory_patterns field to reader_contributions. Readers can provide both directory_patterns and filename_patterns. For a directory to be compatible with this reader, the directory must match directory_patterns and at least one file inside it must match filename_patterns. For example:
# Only individual tiff files are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Single tiff reader
    filename_patterns:
    - '*.tiff'
    accepts_directories: false

or

# All directories with one or more tiffs inside them are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Directory of tiff reader
    filename_patterns:
    - '*.tiff'
    accepts_directories: true

or

# Directories matching directory_patterns with one or more tiffs inside them are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Super specific tiff reader
    filename_patterns:
    - '*.tiff'
    accepts_directories: true
    directory_patterns: ['*/my-special-folder']
  2. Use "glob-style" filename_patterns to match each case explicitly, and document examples, e.g.:
# Only individual tiff files are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Single tiff reader
    filename_patterns:
    - '*.tiff'
    accepts_directories: false

or

# All directories with one or more tiffs inside them are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Directory of tiff reader
    filename_patterns:
    - '*/*.tiff' # not too sure about this one
    accepts_directories: true

or

# Directories named my-special-folder with one or more tiffs inside them are passed to this reader
  readers:
  - command: tiff_reader.read_tiffs
    title: Super specific tiff reader
    filename_patterns:
    - '*/my-special-folder/*.tiff'
    accepts_directories: true
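Both proposals can be prototyped in a few lines of Python to compare their semantics. This sketch rests on several assumptions:

- `ReaderSpec`, `compatible_option_1` and `compatible_option_2` are hypothetical.
- `directory_patterns` is the field proposed in option 1.
- Patterns are matched with `fnmatch` against full POSIX-style paths.
- Only the direct children of a directory are inspected.
- Plain files keep today's behaviour under both options.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterable, List


@dataclass
class ReaderSpec:
    # Hypothetical model of a reader contribution. `directory_patterns`
    # is the new field proposed in option 1; it does not exist today.
    command: str
    filename_patterns: List[str] = field(default_factory=list)
    accepts_directories: bool = False
    directory_patterns: List[str] = field(default_factory=list)


def _any_match(path: Path, patterns: Iterable[str]) -> bool:
    # Match the full POSIX-style path; note that fnmatch's '*' also matches '/'.
    return any(fnmatch(path.as_posix(), pat) for pat in patterns)


def _file_compatible(reader: ReaderSpec, p: Path) -> bool:
    # Assumption: plain files keep today's behaviour under both options.
    return _any_match(p, reader.filename_patterns)


def compatible_option_1(reader: ReaderSpec, path: str) -> bool:
    p = Path(path)
    if not p.is_dir():
        return _file_compatible(reader, p)
    if not reader.accepts_directories:
        return False
    # The directory itself must match directory_patterns, if any are declared...
    if reader.directory_patterns and not _any_match(p, reader.directory_patterns):
        return False
    # ...and at least one direct child must match filename_patterns, if any are declared.
    if reader.filename_patterns:
        return any(_any_match(child, reader.filename_patterns) for child in p.iterdir())
    return True


def compatible_option_2(reader: ReaderSpec, path: str) -> bool:
    p = Path(path)
    if not p.is_dir():
        return _file_compatible(reader, p)
    if not reader.accepts_directories:
        return False
    # One list of glob-style patterns, tested against each direct child's full path.
    return any(_any_match(child, reader.filename_patterns) for child in p.iterdir())
```

The sketch surfaces one problem. Because `fnmatch`'s `*` also matches `/`, the option 2 pattern `'*/*.tiff'` matches a plain file such as `/data/a.tiff`, not just a tiff inside a directory. Option 2 would therefore need path-segment-aware matching (or an explicit file/directory distinction) to keep these cases apart. Option 1 sidesteps this by keeping directory and file patterns in separate fields.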

Other Considerations

  • We should document recommended approaches for common use cases, e.g. (I don't know if this is what we recommend, but we should come to a consensus):
    • if you have files scattered over a number of directories, we recommend using a metadata file that points to the required associated information, and building a reader for this metadata file
    • if you have a directory of associated files that should be read in a certain way, you should consider adding a custom extension to your directory
    • others...?
  • Are directories with extensions a special case? In my opinion, they should be treated not as directories but as file-like paths. We should definitely support filtering compatible directory readers by extension.
  • Do we want to support any further way of specifying the files or directories that must be present for a reader to claim a path? For example, OME-zarr requires well-formatted .zarray and .zgroup files inside a directory with a .zarr extension. Should all of this be specifiable in the manifest? My personal opinion is no: this is the job of the get_reader function.
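The last two points fit together: the manifest would filter only on the `.zarr` extension of the directory name, treating it as a file-like path. The reader's own `get_reader` function would then do the structural validation before claiming the path. The sketch below follows the usual napari convention of returning `None` when a path is not claimed. `MANIFEST_PATTERNS`, `passes_manifest_filter` and `read_zarr_dir` are hypothetical, and checking only for `.zgroup` is a simplification for illustration. This is not any real OME-Zarr plugin's code.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical manifest-level filter: the directory name carries the extension,
# so 'plate.zarr/' is matched like a file-like path.
MANIFEST_PATTERNS = ["*.zarr"]


def passes_manifest_filter(path: str) -> bool:
    return any(fnmatch(Path(path).name, pat) for pat in MANIFEST_PATTERNS)


def read_zarr_dir(path: str) -> List[tuple]:
    # Hypothetical placeholder; a real plugin would load layer data here.
    return [(Path(path).name, {}, "image")]


def get_reader(path: str) -> Optional[Callable[[str], List[tuple]]]:
    p = Path(path)
    # Cheap extension check, which the manifest could express.
    if not (p.is_dir() and passes_manifest_filter(path)):
        return None
    # Structural check, which stays in get_reader (simplified to .zgroup only).
    if not (p / ".zgroup").is_file():
        return None
    return read_zarr_dir
```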