waterbalance calculation is slow...? #875
-
Hi,
How should I speedup? thx, PS: I know this is not really a bug, but perhaps a lack of skill from my part. On the other hand it might be nice if there was a "how do i" do efficient waterbalance calculations. |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 2 replies
-
Hi Jacco, This is because of the lazy evaluation. We wrote something about this in the documentation. Lazy evaluation is especially useful for out of memory computations. iMOD Python loads its IDF data always as dask arrays, chunked per layer per time. Loading data into memory at the right time can speed things up a lot. You can load data into memory manually by calling the If your selection fits in memory, you can try: bdg = imod.idf.open(f'{bdg_naam}_*.idf')
zone =imod.idf.open(zone.idf).compute()
# select part of needed bdg
bdg = bdg.sel(time=slice('2012-01-01', '2021-01-01')).compute()
bdg = bdg.where(zone>0)
bdg = bdg.sum(dim='layer') # i have 4 layers
bdg = bdg.where(bdg>0) # only get the infiltration flux
result =[]
for datum in bdg.time:
dagwaarde = bdg.sel(time=datum.values).sum().values.item()
result.append([pd.Timestamp(datum.values),dagwaarde]) I think the code can be further optimized, by avoiding doing the bdg = imod.idf.open(f'{bdg_naam}_*.idf')
zone =imod.idf.open(zone.idf).compute()
# select part of needed bdg
bdg = bdg.sel(time=slice('2012-01-01', '2021-01-01'))
# Load into memory after selection
bdg = bdg.compute()
bdg = bdg.sum(dim='layer') # i have 4 layers
bdg = bdg.where((bdg>0) & (zone>0)) # only get the infiltration flux
dagwaardes = bdg.sum(dim=["y", "x"])
timestamps = [pd.Timestamp(t) for t in dagwaardes.coords["time"]] |
Beta Was this translation helpful? Give feedback.
-
I'm converting this issue to a discussion on the discussion board, which is the best place to ask user questions! If you are not sure if you are running into a bug or doing something wrong, feel free to post an issue. |
Beta Was this translation helpful? Give feedback.
-
(I see I'm cross-posting with Joeri...) Hi @JaccoHoogewoud, That is indeed ridiculously slow. To be fair, I don't see things in your example that immediately suspicion, but there are some things which could lead to some inefficiencies and the loop isn't necessary either. Here's how I would write it, making use of the fact that you use bitwise operators ( budget = imod.idf.open(f'{bdg_naam}_*.idf').sel(time=slice('2012-01-01', '2021-01-01'))
zone = imod.idf.open("zone.idf")
zone_infiltration = budget.where((zone > 0) & (budget > 0))
timeseries_infiltration = zone_infiltration.sum(["layer", "y", "x"]).compute() Everything in the snippet above should be fast except the last line since it's "lazy evaluation". If you want a DataFrame timeseries: timeseries_infiltration = zone_infiltration.sum(["layer", "y", "x"]).to_dataframe() This will also force a compute, since pandas DataFrames are always eager (they don't support lazy evaluation). The reason your implementation is slow is maybe due to manually In your case, if all timesteps fit in memory, I expect you can also do this to fix performance. ...
bdg = bdg.where(zone>0)
bdg = bdg.sum(dim='layer').compute()
... But I'd try the snippet I posted above first. I wrote this part of the documentation to give a little idea of the delayed evaluation: https://deltares.github.io/imod-python/user-guide/06-lazy-evaluation.html |
Beta Was this translation helpful? Give feedback.
(I see I'm cross-posting with Joeri...)
Hi @JaccoHoogewoud,
That is indeed ridiculously slow. To be fair, I don't see things in your example that immediately suspicion, but there are some things which could lead to some inefficiencies and the loop isn't necessary either.
Here's how I would write it, making use of the fact that you use bitwise operators (
&
,|
,~
) for elementwise boolean logic, and you can sum over multiple dimensions at once with xarray: