This repository was archived by the owner on Jun 2, 2025. It is now read-only.
Add Visualization Options #231
Closed as not planned
Detailed Description
We want to be able to easily see what our batches look like and have utilities that plot them to help with debugging and ensuring that our pipelines are doing what we expect.
We have had multiple one-off visualization scripts before. The goal here is to build them into datapipes, ideally keep them up to date, and possibly run them on PRs to give a quick, automatic view whenever any of the datapipes are changed or updated.
I think the steps would be
- Make a `visualization` module in datapipes
- Add visualizing a whole example of all modalities as an image
- Add visualizing examples as little videos (to see the timeseries in the videos)
- Add an option to save out batches in a more interpretable format (e.g. NetCDF or something that keeps coordinates and the like, rather than PyTorch tensors)
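As a rough illustration of the "whole example as an image" step, here is a minimal sketch using NumPy and Matplotlib. The batch layout (a dict mapping modality name to an array shaped `(example, time, y, x)` or `(example, time)`), the modality names, and the function name `plot_example` are all assumptions made for this sketch, not the actual datapipes batch format or API.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so the same code can run in CI
import matplotlib.pyplot as plt
import numpy as np


def plot_example(batch, example_idx=0):
    """Plot one example from a batch, one row per modality.

    `batch` is a hypothetical dict: modality name -> array of shape
    (example, time, y, x) for gridded data, or (example, time) for
    1-D time series.
    """
    n_rows = len(batch)
    fig, axes = plt.subplots(n_rows, 1, figsize=(6, 3 * n_rows), squeeze=False)
    for ax, (name, data) in zip(axes[:, 0], batch.items()):
        example = np.asarray(data)[example_idx]
        if example.ndim == 1:
            # 1-D time series (e.g. GSP): draw as a line plot
            ax.plot(example)
            ax.set_xlabel("time step")
        else:
            # gridded data: show only the first time step
            ax.imshow(example[0], origin="lower")
        ax.set_title(name)
    fig.tight_layout()
    return fig


# Random stand-in data; shapes and names are made up for illustration
rng = np.random.default_rng(0)
batch = {
    "satellite": rng.random((2, 4, 16, 16)),
    "nwp": rng.random((2, 4, 8, 8)),
    "gsp": rng.random((2, 4)),
}
fig = plot_example(batch)
```

Calling `fig.savefig("example_0.png")` afterwards would write the image to disk, which is the kind of artifact a PR workflow could attach. The video and NetCDF steps are not covered by this sketch.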
Possible Implementation
Satip used to have a step in its workflows that ran visualization code on the outputs of some processing steps on PRs. It was quite helpful for knowing whether changes broke the end-to-end processing pipelines, and whether the images coming out still looked correct.
Notes
Goal:
- to show what is in the batches right before the model runs
- To show in training what is going in at any timestep
- User can step through periods
- Time and space are aligned
Users:
- ML team only
- Prototype examples
- NWP data wasn’t aligned with GSP data - James found this when plotting these out
- Early on, Jacob found the satellite data was 500 km off
Effort to build:
- Make it so people don’t need to rebuild anything from scratch
- Build something a bit less ad-hoc than before
Effort to run:
- Hopefully takes someone <1 min to run this from Datapipes
- It would be useful for training & production use cases
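
For the alignment goal in the notes above, a simple programmatic check could sit alongside the plots. The sketch below uses only the standard library. The function name `misaligned_times`, the 30-minute cadence, and the example timestamps are hypothetical and only meant to mimic the kind of NWP/GSP mismatch described.

```python
from datetime import datetime, timedelta


def misaligned_times(reference, other, tolerance=timedelta(0)):
    """Return timestamps in `other` that have no match in `reference`
    within `tolerance`."""
    return [
        t for t in other
        if all(abs(t - r) > tolerance for r in reference)
    ]


start = datetime(2023, 1, 1, 12, 0)
# Hypothetical GSP timestamps at a 30-minute cadence
gsp_times = [start + timedelta(minutes=30 * i) for i in range(4)]
# NWP timestamps deliberately shifted by 15 minutes to mimic an alignment bug
nwp_times = [t + timedelta(minutes=15) for t in gsp_times]

bad = misaligned_times(gsp_times, nwp_times)
print(f"{len(bad)} NWP timestamps have no matching GSP timestamp")
```

A check like this only covers time. Spatial offsets such as the satellite case would still need a plot or a comparison of coordinates.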