
ceda-icompress python implementations #232

Open · milankl opened this issue Sep 13, 2023 · 6 comments

Comments

@milankl (Collaborator) commented Sep 13, 2023

@nmassey001 just reached out to point towards ceda-icompress, another Python implementation of the bitinformation algorithm and bit rounding.

The package is xarray-free and I'm curious to know how its performance compares, as we've been suffering from allocations.

1. Bitrounding

A 400MB Float32 array:
julia> using BitInformation, BenchmarkTools
julia> A = rand(Float32,100,100,100,100);
julia> sizeof(A)/1000^2
400.0

julia> @btime round!(A,7)
  25.276 ms (0 allocations: 0 bytes)

reaches 16GB/s on my MacBook Air and is (afaik) memory bound at that point (limited by the bandwidth of reading the array from RAM).
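(For reference, a minimal NumPy sketch of this kind of mantissa bit rounding, using the usual round-to-nearest, ties-to-even integer trick; bitround is an illustrative name, not the API of BitInformation.jl or ceda-icompress:)

import numpy as np

def bitround(A, keepbits):
    # Round the float32 mantissa to `keepbits` bits in place, via a
    # uint32 view of the same buffer (no copy of A is made).
    bits = 23 - keepbits                        # mantissa bits to discard
    v = A.view(np.uint32)
    half = np.uint32((1 << (bits - 1)) - 1)     # just under half an ulp
    v += ((v >> bits) & 1) + half               # round to nearest, ties to even
    v &= np.uint32(0xFFFFFFFF >> bits << bits)  # zero the discarded bits
    return A

bitround(A, 7) would then correspond to the round!(A,7) call above.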

2. Bitinformation
julia> @btime bitinformation($A);
  5.596 s (285 allocations: 13.53 KiB)

The bitinformation algorithm here is essentially allocation-free, as only the counter array has to be allocated while counting all 00, 01, 10, 11 bit-pair combinations in the data set. It reaches about 70MB/s, which puts it on a similar order of magnitude as lossless codecs at higher compression levels.
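(To make the counting step concrete: a rough NumPy sketch of the idea, a 2x2 joint histogram per bit position over adjacent elements; the function name and layout are illustrative, not the API of either package:)

import numpy as np

def bitpaircount(A):
    # counts[i] is the 2x2 joint histogram of (bit i of u[k], bit i of
    # u[k+1]) over all adjacent pairs k, i.e. the 00/01/10/11 counts.
    u = np.ravel(A).view(np.uint32)
    a, b = u[:-1], u[1:]
    counts = np.zeros((32, 2, 2), dtype=np.int64)
    for i in range(32):
        idx = (((a >> i) & 1) * 2 + ((b >> i) & 1)).astype(np.intp)
        counts[i] = np.bincount(idx, minlength=4).reshape(2, 2)
    return counts

Note that each iteration of this naive version creates several array-sized temporaries, which is exactly the kind of hidden allocation discussed later in this thread.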

Neil, would you mind throwing in a similar quick benchmark of ceda-icompress?

@nmassey001 commented Sep 14, 2023

Hi @milankl

Here are some timing results. Not quite as fast as the Julia, but also not dreadful.

import numpy as np
import time
from ceda_icompress.BitManipulation.bitshave import BitShave
from ceda_icompress.InfoMeasures.bitinformation import bitinformation

def millis_now():
    return int(round(time.time() * 1000))

def bitshave_test():
    g = np.random.default_rng()
    A = g.random([100,100,100,100], dtype=np.float32)

    # time constructing the BitShave operator and applying it
    start = millis_now()
    b = BitShave(A, 7)
    B = b.process(A)
    end = millis_now()
    print(f"BitShave time: {end - start}ms")

def bitinfo_test():
    g = np.random.default_rng()
    A = np.ma.array(g.random([100,100,100,100],dtype=np.float32))
    start = millis_now()
    bi = bitinformation(A)
    end = millis_now()
    print(f"BitInfo time: {(end - start)/1000}s")

if __name__ == "__main__":
    bitshave_test()
    bitinfo_test()

Results:

BitShave time: 35ms
BitInfo time: 20.732s

@nmassey001 commented

If I don't convert the exponent:

BitShave time: 38ms
BitInfo time: 11.423s

@milankl (Collaborator, Author) commented Sep 19, 2023

Thanks @nmassey001!!! @observingClouds, could you, at some point, compare this to xbitinfo performance?
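(For reference, a comparable xbitinfo call might look like the sketch below; the dim and implementation keywords are assumed from xbitinfo's get_bitinformation interface, and timing it would give the number comparable to the benchmarks above:)

import numpy as np
import xarray as xr
import xbitinfo as xb

g = np.random.default_rng()
A = g.random([100, 100, 100, 100], dtype=np.float32)
ds = xr.Dataset({"A": (("w", "x", "y", "z"), A)})

# bitwise information content along one dimension
bi = xb.get_bitinformation(ds, dim="w", implementation="python")
print(bi)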

@observingClouds (Owner) commented

Thanks @nmassey001 for posting these numbers and reaching out to us. I hope I find time soon to provide these numbers as well.

@nmassey001 commented

A belated update to this: I've finally got around to implementing the bitpaircount function using Dask, so it can run in parallel.
Using 3 threads, the above timings are approximately halved.
Using more than 3 threads increases the memory footprint beyond the physical RAM available on my MacBook Pro (16GB). This machine (M1 Pro) has 6 "proper" cores, so 6 threads should be possible, but the memory requirements blow up.
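(A generic Dask pattern for keeping the footprint bounded, not ceda-icompress's actual integration: choose chunks small enough that num_workers × chunk size fits in RAM, so that peak memory scales with the chunk size rather than with the full array per thread.)

import dask
import dask.array as da
import numpy as np

g = np.random.default_rng()
A = g.random((100, 100, 100, 100), dtype=np.float32)

# 400MB array split into 4 chunks of 100MB each
dA = da.from_array(A, chunks=(25, 100, 100, 100))

with dask.config.set(scheduler="threads", num_workers=3):
    # placeholder per-chunk reduction standing in for the pair counting
    total = dA.sum().compute()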

@milankl (Collaborator, Author) commented Nov 28, 2023

I believe this memory issue is something general we have to work on. Technically the algorithm should be almost allocation-free, as mentioned above (only a small counter array has to be allocated), but Python seems to do something else. I don't know enough about Python to easily identify where it allocates and why, but this problem seems to become prohibitive for larger datasets. @nmassey001, could you measure your memory allocations too? Maybe this helps @observingClouds understand why we also seem to copy the array, which we really shouldn't.
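(One way to get a first number, assuming the same setup as the benchmark above: Python's standard-library tracemalloc traces allocations made while it is running, and recent NumPy versions report their buffer allocations to it.)

import tracemalloc
import numpy as np
from ceda_icompress.InfoMeasures.bitinformation import bitinformation

g = np.random.default_rng()
A = np.ma.array(g.random([100, 100, 100, 100], dtype=np.float32))

tracemalloc.start()
bi = bitinformation(A)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# peak covers allocations inside bitinformation only; a value close to
# zero would mean the algorithm is effectively allocation-free
print(f"peak traced allocations: {peak / 1e6:.1f} MB")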
