Dataset.__repr__ upgrade #431

Merged
merged 23 commits into DKISTDC:main from SolarDrew:repr_upgrade
Sep 4, 2024

Conversation

SolarDrew
Contributor

Some updates to make the dataset representation more user-friendly and informative. This includes the changes to TiledDataset.__repr__ from #402, so that PR should probably be merged first.

Output for the sample VISP and VBI datasets:

This VISP Dataset has 4 pixel and 5 world dimensions and consists of 1700 frames
Files are stored in {ds.files.basepath}

The data are represented by a <class 'dask.array.core.Array'> object:
dask.array<reshape, shape=(4, 425, 980, 2554), dtype=float64, chunksize=(1, 1, 980, 2554), chunktype=numpy.ndarray>

Array Dim  Axis Name                Data size  Bounds
        0  polarization state               4  None
        1  raster scan step number        425  None
        2  dispersion axis                980  None
        3  spatial along slit            2554  None

World Dim  Axis Name                  Physical Type                   Units
        4  stokes                     phys.polarization.stokes        unknown
        3  time                       time                            s
        2  helioprojective latitude   custom:pos.helioprojective.lat  arcsec
        1  wavelength                 em.wl                           nm
        0  helioprojective longitude  custom:pos.helioprojective.lon  arcsec

Correlation between array and world axes:

                          |                      PIXEL DIMENSIONS
                          |   spatial    |  dispersion  | raster scan  | polarization
         WORLD DIMENSIONS |  along slit  |     axis     | step number  |    state
------------------------- | ------------ | ------------ | ------------ | ------------
helioprojective longitude |      x       |              |      x       |
               wavelength |              |      x       |              |
 helioprojective latitude |      x       |              |      x       |
                     time |              |              |      x       |
                   stokes |              |              |              |      x

-----

This TiledDataset consists of an array of (3, 3) Dataset objects

Each VBI Dataset has 3 pixel and 3 world dimensions and consists of 27 frames
Files are stored in {ds.files.basepath}

The data are represented by a <class 'dask.array.core.Array'> object:
dask.array<reshape, shape=(3, 4096, 4096), dtype=float32, chunksize=(1, 4096, 4096), chunktype=numpy.ndarray>

Array Dim  Axis Name                  Data size  Bounds
        0  time                               3  None
        1  helioprojective latitude        4096  None
        2  helioprojective longitude       4096  None

World Dim  Axis Name                  Physical Type                   Units
        2  time                       time                            s
        1  helioprojective latitude   custom:pos.helioprojective.lat  arcsec
        0  helioprojective longitude  custom:pos.helioprojective.lon  arcsec

Correlation between array and world axes:

                          |                   PIXEL DIMENSIONS
                          | helioprojective | helioprojective |       time
         WORLD DIMENSIONS |    longitude    |     latitude    |
------------------------- | --------------- | --------------- | ---------------
helioprojective longitude |        x        |        x        |        x
 helioprojective latitude |        x        |        x        |        x
                     time |                 |                 |        x


codspeed-hq bot commented Aug 23, 2024

CodSpeed Performance Report

Merging #431 will not alter performance

Comparing SolarDrew:repr_upgrade (cd0efaa) with main (78b960c)

Summary

✅ 9 untouched benchmarks

@SolarDrew
Contributor Author

SolarDrew commented Aug 23, 2024

Update:

This VISP Dataset BKPLX has 4 pixel and 5 world dimensions and consists of 1700 frames
Files are stored in /home/drew/.local/share/dkist/VISP_BKPLX

The data are represented by a <class 'dask.array.core.Array'> object:
dask.array<reshape, shape=(4, 425, 980, 2554), dtype=float64, chunksize=(1, 1, 980, 2554), chunktype=numpy.ndarray>

Array Dim  Axis Name                Data size  Bounds
        0  polarization state               4  None
        1  raster scan step number        425  None
        2  dispersion axis                980  None
        3  spatial along slit            2554  None

World Dim  Axis Name                  Physical Type                   Units
        4  stokes                     phys.polarization.stokes        unknown
        3  time                       time                            s
        2  helioprojective latitude   custom:pos.helioprojective.lat  arcsec
        1  wavelength                 em.wl                           nm
        0  helioprojective longitude  custom:pos.helioprojective.lon  arcsec

Correlation between pixel and world axes:

                          |                      PIXEL DIMENSIONS
                          |   spatial    |  dispersion  | raster scan  | polarization
         WORLD DIMENSIONS |  along slit  |     axis     | step number  |    state
------------------------- | ------------ | ------------ | ------------ | ------------
helioprojective longitude |      x       |              |      x       |
               wavelength |              |      x       |              |
 helioprojective latitude |      x       |              |      x       |
                     time |              |              |      x       |
                   stokes |              |              |              |      x

-----

This VBI TiledDataset AJQWW consists of an array of (3, 3) Dataset objects

Each Dataset has 3 pixel and 3 world dimensions and consists of 3 frames
Files are stored in /home/drew/.local/share/dkist/VBI_AJQWW

The data are represented by a <class 'dask.array.core.Array'> object:
dask.array<reshape, shape=(3, 4096, 4096), dtype=float32, chunksize=(1, 4096, 4096), chunktype=numpy.ndarray>

Array Dim  Axis Name                  Data size  Bounds
        0  time                               3  None
        1  helioprojective latitude        4096  None
        2  helioprojective longitude       4096  None

World Dim  Axis Name                  Physical Type                   Units
        2  time                       time                            s
        1  helioprojective latitude   custom:pos.helioprojective.lat  arcsec
        0  helioprojective longitude  custom:pos.helioprojective.lon  arcsec

Correlation between pixel and world axes:

                          |                   PIXEL DIMENSIONS
                          | helioprojective | helioprojective |       time
         WORLD DIMENSIONS |    longitude    |     latitude    |
------------------------- | --------------- | --------------- | ---------------
helioprojective longitude |        x        |        x        |        x
 helioprojective latitude |        x        |        x        |        x
                     time |                 |                 |        x
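For readers curious how a grid like the one above can be laid out, here is a minimal sketch. It is not the code from this PR: the function name `correlation_table` and the hard-coded boolean matrix are assumptions made for illustration, with the matrix transcribed by hand from the VBI table above (rows are world axes, columns are pixel axes).

```python
def correlation_table(world_names, pixel_names, matrix):
    """Render a world-vs-pixel correlation grid as plain text.

    matrix[i][j] is True when world axis i depends on pixel axis j.
    (Hypothetical helper, not part of the dkist package.)
    """
    row_width = max(len(name) for name in world_names)
    col_width = max(len(name) for name in pixel_names)

    # Header row: blank corner cell, then centred pixel axis names.
    header = " | ".join(
        [" " * row_width] + [name.center(col_width) for name in pixel_names]
    )
    rule = " | ".join(["-" * row_width] + ["-" * col_width for _ in pixel_names])

    lines = [header, rule]
    for name, row in zip(world_names, matrix):
        # Mark each correlated world/pixel pair with an "x".
        cells = [("x" if flag else "").center(col_width) for flag in row]
        lines.append(" | ".join([name.rjust(row_width)] + cells))
    return "\n".join(line.rstrip() for line in lines)


axes = ["helioprojective longitude", "helioprojective latitude", "time"]
# Values copied by hand from the VBI table above.
vbi_matrix = [
    [True, True, True],
    [True, True, True],
    [False, False, True],
]
table = correlation_table(axes, axes, vbi_matrix)
print(table)
```

In a real dataset the matrix would presumably come from the WCS rather than being typed in; that part is left out here.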

Member

@Cadair Cadair left a comment


Looks good. I think we should add some tests. I don't want to get too prescriptive with the tests, but checking that things like the instrument name, dataset ID, etc. are in the generated repr would be good.
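A loose check of that kind might look like the sketch below. It only illustrates the idea and is not the test that was added in this PR: `missing_from_repr` is a hypothetical helper, and the `sample` string is copied from the VISP output quoted earlier in this thread, standing in for `repr(dataset)` from a real fixture.

```python
def missing_from_repr(text, expected):
    """Return the expected substrings that do not appear in a repr string.

    (Hypothetical helper for illustration only.)
    """
    return [item for item in expected if item not in text]


# Stand-in for repr(dataset); in a real test this would come from a fixture.
sample = (
    "This VISP Dataset BKPLX has 4 pixel and 5 world dimensions "
    "and consists of 1700 frames"
)


def test_repr_mentions_key_facts():
    # Check for the presence of key facts without pinning the exact layout.
    assert missing_from_repr(sample, ["VISP", "BKPLX", "1700 frames"]) == []
```

Checking for substrings rather than comparing against the full expected text keeps the test from breaking whenever column widths or wording change.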

Review threads (outdated, resolved):
dkist/dataset/utils.py
dkist/dataset/utils.py
dkist/dataset/loader.py
@Cadair
Member

Cadair commented Sep 2, 2024

I would just skip the test in 1.0rst; we don't need to bring old release notes up to the current standard. https://github.com/DKISTDC/dkist/actions/runs/10665166513/job/29557982384?pr=431#step:10:982

Review thread (outdated, resolved): dkist/dataset/loader.py
@SolarDrew
Contributor Author

Any last objections before we merge this?

Member

@Cadair Cadair left a comment


:shipit:

@Cadair Cadair merged commit 1f94849 into DKISTDC:main Sep 4, 2024
21 checks passed
2 participants