This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

free memory for data load #30

Open
nilshempelmann opened this issue Feb 15, 2018 · 14 comments

Comments

@nilshempelmann
Member

There are cases where the data load is bigger than the available memory.
In those cases the data can't be processed in one block. Ocgis provides calculation in chunks (the chunk size needs to be determined as well), which can be switched on in case of potential memory overload.
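
One way to pick a chunk size is to work backwards from a memory budget. The sketch below is only an illustration: the function name `tile_edge` and its parameters are hypothetical and not part of ocgis, and it assumes square spatial tiles that each hold the full time axis.

```python
from math import isqrt


def tile_edge(nt, itemsize, budget_bytes, ny, nx):
    """Largest square tile edge (in grid cells) whose time series fits the budget.

    Hypothetical helper, not an ocgis function. Assumes each tile holds
    nt time steps of edge * edge cells at itemsize bytes per value.
    """
    cells = budget_bytes // (nt * itemsize)  # cells per time step that fit in the budget
    edge = isqrt(cells)                      # side length of a square tile
    # Never return less than one cell or more than the grid itself.
    return max(1, min(edge, max(ny, nx)))
```

For example, `tile_edge(nt=100, itemsize=8, budget_bytes=80000, ny=180, nx=360)` gives a tile edge of 10 cells. Intermediate arrays created by a calculation are not counted, so a real budget would need some headroom.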

Two parts of the problem:

  1. Check available memory:
    existing function to be optimized: utils.FreeMemory()
  2. Check the data load for the process:
    There is a get_base_request_size and "large_array.compute" in ocgis on operations, which should be revisited. Metadata on the calculations themselves can be used for defining the decompositions.

Checking the data load is tricky when the process includes:

  • polygon subset
  • icclim computation
  • remapping
  • ... etc ?
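
The two parts of the problem can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of utils.FreeMemory() or get_base_request_size: all function names are hypothetical, free memory is read from `/proc/meminfo` (Linux only), and the load estimate covers a plain read of an uncompressed array. It does not account for polygon subsets, icclim computations, or remapping.

```python
from math import prod


def free_memory_bytes():
    """Return available memory in bytes on Linux, or None if it cannot be read."""
    try:
        with open("/proc/meminfo") as fh:
            for line in fh:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # value is reported in kB
    except OSError:
        pass
    return None


def estimated_load_bytes(shape, itemsize=8):
    """In-memory size of an array with the given shape and bytes per value."""
    return prod(shape) * itemsize


def needs_chunking(load_bytes, free_bytes, safety=0.5):
    """True if the load exceeds the given fraction of the free memory."""
    if free_bytes is None:
        return True  # be conservative when free memory is unknown
    return load_bytes > safety * free_bytes
```

A daily field of shape (365, 180, 360) stored as 4-byte floats would come to about 95 MB under this estimate. The `safety` factor is an arbitrary assumption that leaves room for intermediate copies.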

A standalone test script is here:
https://github.com/bird-house/flyingpigeon/blob/next/scripts/ocgis_freememory.py

@huard
Collaborator

huard commented Jun 19, 2018

Please provide description of issue.

@nilshempelmann
Member Author

@huard description provided

@nkadygrov
Member

I would also like to use the Ocgis calculation in chunks.

Meanwhile I just use a workaround, also with ocgis, but not in a sophisticated way...

For example, here:
https://github.com/bird-house/blackswan/blob/master/blackswan/processes/wps_analogs_reanalyse.py#L397

the size of the dataset is determined (with the simple function get_files_size: https://github.com/bird-house/blackswan/blob/master/blackswan/utils.py#L32 , which should be in eggshell).
And here the calculations are done:
https://github.com/bird-house/blackswan/blob/master/blackswan/processes/wps_analogs_reanalyse.py#L442
step by step or all at once...
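
A workaround of this kind could look roughly like the sketch below. It is not the blackswan code: `total_size_bytes` and `plan_batches` are hypothetical names, and the size limit is an assumption the caller has to supply. Files are processed in one batch if they fit under the limit, otherwise in several smaller batches.

```python
import os


def total_size_bytes(paths, size_of=os.path.getsize):
    """Sum of the on-disk sizes of all files."""
    return sum(size_of(p) for p in paths)


def plan_batches(paths, limit_bytes, size_of=os.path.getsize):
    """Group files into batches that each stay under limit_bytes.

    Returns a single batch if everything fits at once. A file that is
    larger than the limit on its own still gets a batch to itself.
    """
    if total_size_bytes(paths, size_of) <= limit_bytes:
        return [list(paths)]
    batches, current, current_size = [], [], 0
    for path in paths:
        size = size_of(path)
        if current and current_size + size > limit_bytes:
            batches.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Note that on-disk size is only a rough proxy: compressed netCDF files can take considerably more memory once read.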

@bekozi

bekozi commented Jun 20, 2018

It is definitely time to address this. Let's be sure and work out a plan on the next call!

Related to bird-house/bird-house.github.io#17.

@nilshempelmann
Member Author

Listed as a "hot topic".
Shouldn't it be moved to eggshell? Eggshell is going to be the function library to be used by all the other birds.

@nilshempelmann
Member Author

The standalone script is updated:
https://github.com/bird-house/flyingpigeon/blob/master/scripts/ocgis_freememory.py

@bekozi there is still a bug in ocgis: when calc=None, it's not possible to compute in chunks.

Once the basic comparison of free memory and data load is running, we can integrate it properly in eggshell and add more complex cases (polygon subset, etc.).

@bekozi

bekozi commented Jun 25, 2018

@bekozi still a bug in ocgis, when calc=None it's not possible to compute in chunks

Thanks for the reminder. I'll look into it this morning (NCPP/ocgis#402).

@bekozi

bekozi commented Jun 25, 2018

@nilshempelmann I pushed changes to this ocgis branch allowing for calc=None. Do you mind testing from there? TBH, I'm not sure of the application so I expect some iterations!

@nilshempelmann
Member Author

@bekozi running into this error:

....
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._Variable.__getattr__()

AttributeError: datatype

conda ocgis-next was able to handle the datatype.

@bekozi

bekozi commented Jul 9, 2018

Unless this is a blocker, let's hold off on debugging. An improved version of compute is almost ready, and it would be best to test against that.

@nilshempelmann
Member Author

@bekozi is the improved version for calculating in chunks ready? Could you provide a code snippet?
I hope to have some time next week to work on this issue ....

@bekozi

bekozi commented Sep 13, 2018

@nilshempelmann Could you provide more context for the AttributeError? I cannot reproduce locally.

@nilshempelmann nilshempelmann transferred this issue from bird-house/flyingpigeon Dec 5, 2018
@nilshempelmann
Member Author

The standalone script has been moved to eggshell and adapted.

@nilshempelmann
Member Author

@bekozi
calc=None is not allowed for compute() execution.

And there is a BIG difference in performance between processing the data directly and in chunks:

operation performed with execute in 0.636506 sec.

/tiles progress: [########################################]
complete.
operation performed with compute in 13.801171 sec.

So switching to compute() should really be done only if the data load does not fit into memory.
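
That rule can be sketched as a small dispatcher. This is an illustration only: `run_timed` and `run_operation` are hypothetical names, and `direct` and `chunked` are stand-in callables that a caller would wrap around the actual execute and compute calls. No ocgis API is used here.

```python
import time


def run_timed(label, func):
    """Call func(), report the elapsed wall-clock time, and return both."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"operation performed with {label} in {elapsed:.6f} sec.")
    return result, elapsed


def run_operation(direct, chunked, fits_in_memory):
    """Use the fast direct path when the data fits, otherwise the chunked path."""
    if fits_in_memory:
        return run_timed("execute", direct)
    return run_timed("compute", chunked)
```

Timing both paths on the same small dataset, as in the numbers above, is a cheap way to see how much the chunked path costs before making it the fallback.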
