Formalize "life-cycle"/movement and placement of metadata #85

Open
yarikoptic opened this issue Oct 25, 2024 · 0 comments
Labels
consistency Aspect requiring special treatment/logic outside of generic common principles metadata Changes to metadata fields/files.

A brief discussion came up on this topic in the recent BIDS 2.0 WG meeting.

At the moment we have two principal formats and locations for metadata:

  • sidecar .json files -- metadata applicable to a specific data file
    • due to the inheritance principle, a single .json file (e.g. at a higher level in the hierarchy) can already provide metadata for many data files, grouping on the entities or suffix used in the filename (e.g. task-rest_bold.json)
  • .tsv file(s), of which I see a few major types
    • {entity_plural}.tsv (e.g. participants.tsv, sessions.tsv, etc) - metadata which is typically not placed into individual sidecar files and which is grouped by a single entity, one row per {entity_id} value (so similar to the aforementioned {entity}-{value}*.json, where it is not grouped)
    • scans.tsv - summarization of metadata about individual data files at a higher level in the hierarchy (TODO: link issues on the need to rename, since the name was initially MRI specific and is no longer a good fit)
    • {nonentity_plural}.tsv (e.g. channels.tsv, etc) -- metadata on some grouping level present within data files, and thus without (yet) an explicit {entity} defined
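
To make the inheritance principle mentioned above concrete, here is a minimal Python sketch of how the effective metadata of one data file could be resolved from sidecar .json files. It is a simplification, not the rule set of the specification or the validator: the helper names `sidecar_applies` and `resolve_metadata` are hypothetical, a sidecar is assumed to apply when all of its underscore-separated name parts also occur in the data file name, and files closer to the data file are assumed to override those higher up.

```python
import json
import tempfile
from pathlib import Path


def sidecar_applies(sidecar: Path, data_file: Path) -> bool:
    """Assumed rule: every entity/suffix token of the sidecar name occurs in the data file name."""
    side_tokens = sidecar.name.split(".")[0].split("_")
    data_tokens = data_file.name.split(".")[0].split("_")
    return all(tok in data_tokens for tok in side_tokens)


def resolve_metadata(data_file: Path, dataset_root: Path) -> dict:
    """Merge applicable sidecars, walking from the dataset root down to the data file."""
    dirs = [dataset_root]
    for part in data_file.parent.relative_to(dataset_root).parts:
        dirs.append(dirs[-1] / part)
    merged = {}
    for directory in dirs:
        # Sorting only makes the sketch deterministic; it is not a precedence rule.
        for sidecar in sorted(directory.glob("*.json")):
            if sidecar_applies(sidecar, data_file):
                merged.update(json.loads(sidecar.read_text()))
    return merged


def demo() -> dict:
    """Build a tiny throwaway dataset and resolve metadata for one file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        func = root / "sub-01" / "func"
        func.mkdir(parents=True)
        # Top-level sidecar shared by all task-rest bold files.
        (root / "task-rest_bold.json").write_text(
            json.dumps({"RepetitionTime": 2.0, "TaskName": "rest"})
        )
        # File-level sidecar overriding one field.
        (func / "sub-01_task-rest_bold.json").write_text(
            json.dumps({"RepetitionTime": 1.5})
        )
        data_file = func / "sub-01_task-rest_bold.nii.gz"
        data_file.touch()
        return resolve_metadata(data_file, root)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the effective value of a field is only known after walking the hierarchy, which is exactly the kind of tooling dependence discussed further down.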

There is of course also the notion of an entity itself, which sometimes (e.g. sub, ses etc) contains the actual metadata "value" which could also be present in .tsv or .json file(s). But for those we are in agreement that "use of the entity values for metadata storage is discouraged and they are used more for indexing and identification" (TODO: replace with quote and ref)

In particular, both "sidecar .json" and {entity_plural}.tsv (and scans.tsv) are places for metadata, in grouped or ungrouped fashion.

.json and .tsv formalizations have some similarities

  • inheritance principle
  • for BIDS-prescribed metadata fields/columns we define names and types in the schema, allowing for validation
  • TODO: more?

but also different "features" (TODO: make into a table?), e.g.

  • .tsv files have a formalization to describe their columns, and the validator complains whenever an undescribed column is included
  • We use CamelCase for fields in .json but snake_case for columns in .tsv
  • TODO: more?
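
As an illustration of the first difference above, the following Python sketch lists .tsv columns that are neither known to the schema nor described in an accompanying sidecar. It only mimics the idea and is not the validator's actual logic: `undescribed_columns` and the `known_columns` set are hypothetical stand-ins, and the sidecar is assumed to be a plain dict keyed by column name.

```python
import csv
import io


def undescribed_columns(tsv_text: str, sidecar: dict, known_columns: set) -> list:
    """Return header columns that are neither schema-known nor described in the sidecar."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return [col for col in header if col not in known_columns and col not in sidecar]


# Made-up example content, for illustration only.
example_tsv = "participant_id\tage\thandedness_score\nsub-01\t34\t0.8\n"
example_sidecar = {"age": {"Description": "Age of the participant", "Units": "year"}}
example_known = {"participant_id"}  # assumed stand-in for columns defined in the schema

if __name__ == "__main__":
    print(undescribed_columns(example_tsv, example_sidecar, example_known))
```

There is no comparable check sketched here for .json sidecars, which is the asymmetry this item points at.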

Ideally, for consistency, for various needs (e.g. in BEP036) where metadata clearly could be defined in two forms ("summarized" in .tsv; see also the "inheritance->summarization" issue), and thus for the overall "standard forming common principles" (#66), it would be great if

  • naming of metadata fields in .tsv and .json was harmonized (e.g. all snake_case, with some metadata field describing all or non-standard fields etc)
  • "semantics" were unified - metadata field blah in .json would have the same meaning/type/etc as column blah in .tsv.
  • specification was unified - treat/describe non-spec fields uniformly across .tsv and .json (e.g. x- prefix for all non-standard fields, not only in sidecars but in columns as well)
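
A rough Python sketch of what such harmonization could look like for existing sidecar content: CamelCase field names are converted to snake_case, and fields outside an assumed set of standard names get an x- prefix. Both `camel_to_snake` and `harmonize` are hypothetical helpers, and the exact naming and prefix rules are assumptions for illustration, not a decided convention.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert e.g. 'RepetitionTime' to 'repetition_time', keeping acronyms together."""
    # Break between a lowercase letter or digit and a following uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Break between an acronym and a following capitalized word (EEGReference).
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


def harmonize(metadata: dict, standard_fields: set) -> dict:
    """Rename keys to snake_case and prefix non-standard ones with 'x-' (assumed rule)."""
    result = {}
    for key, value in metadata.items():
        new_key = camel_to_snake(key)
        if key not in standard_fields:
            new_key = "x-" + new_key
        result[new_key] = value
    return result


if __name__ == "__main__":
    print(harmonize({"RepetitionTime": 2.0, "MyLabNote": "ok"}, {"RepetitionTime"}))
```

With names harmonized this way, the same key could be used as a .json field and as a .tsv column header without a translation step.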

and if we provided recommendations on where/when to place metadata

  • Replace "inheritance" with "summarization" principle #65 is highly relevant, since it would help to reduce cognitive load - seeing the value of a metadata field at any level immediately tells you the value, without needing to resort to tools to establish it by traversing the full hierarchy