
implement zarr-based caching for major classes #28


Open
aalok-sathe opened this issue Apr 5, 2022 · 2 comments
Labels

  • enhancement (New feature or request)
  • lbs:data (related to the part of the library handling datasets)
  • lbs:encoders (related to the encoder part of the library)
  • lbs:mapping (related to the mapping part of the library)

Comments

@aalok-sathe
Contributor

We need reliable state-caching for most classes to persist results to disk for later analysis and reuse in pipelines.
If cached results exist, they may be reused based on a flag (e.g. overwrite_cache=False).
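A minimal sketch of what the flag-gated reuse could look like, assuming results live in xarray objects; the function name and the cache_dir/key parameters are hypothetical, not existing library API:

```python
from pathlib import Path

import xarray as xr


def cached_to_zarr(data: xr.Dataset, cache_dir: str, key: str,
                   overwrite_cache: bool = False) -> xr.Dataset:
    """Persist `data` to a zarr store named by `key`, or reuse an existing store."""
    store = Path(cache_dir) / f"{key}.zarr"
    if store.exists() and not overwrite_cache:
        # a cached result exists and reuse is allowed: load it instead of recomputing
        return xr.open_zarr(store)
    data.to_zarr(store, mode="w")
    return data
```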

@aalok-sathe aalok-sathe added the enhancement, lbs:encoders, lbs:mapping, and lbs:data labels Apr 5, 2022
@aalok-sathe
Contributor Author

aalok-sathe commented Apr 5, 2022

Proposal: make the __repr__ method of each Cacheable class uniquely identify that instance.
E.g., repr(BrainScore()) should contain information about the Mapping, the Metric, and the encoders (all of this can come from calls to those objects' own repr methods); see the sketch after the list below.

The list below is in the form:

  • Object to repr()

    • entity it depends on
  • BrainScore

    • Mapping
    • Metric
    • Encoder1 outputs
    • Encoder2 outputs (should we create a class EncoderOutput, for more logical dependency in cache handling?) @lipkinb @gretatuckute
  • Mapping

    • str algorithm
    • hparams? tbd
  • Metric

    • str algorithm
  • EncoderOutput (?)

    • Encoder
    • Dataset
  • HFEncoder

    • str algorithm (pretrained_model_name_or_path)
    • str aggregation choices
    • Dataset
  • BrainEncoder

    • Dataset
  • Dataset

    • str path to the data
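A rough sketch of the repr-based identification, with hypothetical class signatures mirroring the list above (deriving a store name by hashing the repr is an assumption, not something decided in this issue):

```python
import hashlib


class Cacheable:
    """Mixin sketch: __repr__ composes the reprs of an instance's dependencies,
    and a cache key is derived from it."""

    def __repr__(self) -> str:
        # nested Cacheable attributes contribute their own reprs recursively
        params = ", ".join(f"{k}={v!r}" for k, v in sorted(vars(self).items()))
        return f"{type(self).__name__}({params})"

    @property
    def cache_key(self) -> str:
        # hash the repr so it can double as a filesystem-safe zarr store name
        return hashlib.sha1(repr(self).encode()).hexdigest()


class Mapping(Cacheable):
    def __init__(self, algorithm: str, **hparams):
        self.algorithm = algorithm
        self.hparams = hparams


class Metric(Cacheable):
    def __init__(self, algorithm: str):
        self.algorithm = algorithm


class BrainScore(Cacheable):
    def __init__(self, mapping: Mapping, metric: Metric):
        # repr(BrainScore(...)) then includes repr(mapping) and repr(metric)
        self.mapping = mapping
        self.metric = metric
```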

@aalok-sathe
Contributor Author

zarr is unable to cache xarrays that contain dtype object. Somehow dtype object is bleeding in from somewhere; once that is corrected to a string dtype, this issue disappears.
This issue is referenced here: pydata/xarray#3476
It is partially solved by commits in #34
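A possible workaround until the root cause is fixed (a hypothetical helper, not part of the library): cast any object-dtype variables and coordinates to strings before writing to zarr.

```python
import xarray as xr


def stringify_object_dtypes(ds: xr.Dataset) -> xr.Dataset:
    """Cast object-dtype data variables and coordinates to unicode strings
    so zarr can serialize them (see pydata/xarray#3476)."""
    ds = ds.assign(
        {name: var.astype(str) for name, var in ds.data_vars.items()
         if var.dtype == object}
    )
    ds = ds.assign_coords(
        {name: var.astype(str) for name, var in ds.coords.items()
         if var.dtype == object}
    )
    return ds
```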
