waterbalance calculation is slow...? #875

JaccoHoogewoud · 2024-02-26T15:26:12Z

Hi,
I calculate the river budgets per day for a given zone and store the values in a list.
It works, but it is unexpectedly slow: 8 hours for 11 years of daily data on a grid of approximately 320*320 cells. Running the groundwater model itself takes only about twice as long (16 hours).

    import imod
    import pandas as pd

    bdg = imod.idf.open(f'{bdg_naam}_*.idf')
    zone = imod.idf.open('zone.idf')
    # select the needed part of bdg
    bdg = bdg.sel(time=slice('2012-01-01', '2021-01-01'))
    bdg = bdg.where(zone > 0)
    bdg = bdg.sum(dim='layer')  # I have 4 layers
    bdg = bdg.where(bdg > 0)    # only keep the infiltration flux
    result = []
    for datum in bdg.time:
        dagwaarde = bdg.sel(time=datum.values).sum().values.item()
        result.append([pd.Timestamp(datum.values), dagwaarde])

How can I speed this up?

thx,
Jacco.

PS: I know this is not really a bug, but perhaps a lack of skill on my part. On the other hand, it might be nice to have a "how do I" guide for efficient water balance calculations.

JoerivanEngelen · 2024-02-27T08:46:48Z

Maintainer

Hi Jacco,

This is because of lazy evaluation, which we describe in the documentation. Lazy evaluation is especially useful for computations on data that does not fit in memory. iMOD Python always loads IDF data as dask arrays, chunked per layer and per time step. Loading data into memory at the right moment can speed things up a lot. You can do this manually by calling the .compute() method, usually best right after selecting data, provided the selection fits into memory.
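To make the difference between lazy and in-memory data concrete, here is a minimal sketch on synthetic data. It does not use imod at all: the array sizes, the chunking and the variable names are assumptions chosen only to mimic the per-layer, per-time chunking described above, and it needs dask installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a budget array (hypothetical sizes, all ones).
time = pd.date_range("2012-01-01", periods=10, freq="D")
budget = xr.DataArray(
    np.ones((10, 4, 5, 5)),
    coords={"time": time, "layer": [1, 2, 3, 4]},
    dims=("time", "layer", "y", "x"),
    name="budget",
)

# Chunk per time step and per layer (assumption: similar to lazily opened IDFs).
lazy = budget.chunk({"time": 1, "layer": 1})
print(lazy.chunks is not None)  # dask-backed, so nothing has been evaluated yet

# Selecting and summing a lazy array only extends the task graph.
lazy_total = lazy.sel(time=slice("2012-01-03", "2012-01-07")).sum(["layer", "y", "x"])

# .compute() runs the graph and returns a NumPy-backed result.
eager_total = lazy_total.compute()
print(eager_total.chunks is None)
```

On a NumPy-backed array, `.chunks` is `None`, which is a quick way to check whether a variable is still lazy.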

If your selection fits in memory, you can try:

    bdg = imod.idf.open(f'{bdg_naam}_*.idf')
    zone = imod.idf.open("zone.idf").compute()
    # select part of needed bdg
    bdg = bdg.sel(time=slice('2012-01-01', '2021-01-01')).compute()
    bdg = bdg.where(zone>0)
    bdg = bdg.sum(dim='layer') # i have 4 layers
    bdg = bdg.where(bdg>0)     # only get the infiltration flux
    result =[]
    for datum in bdg.time:
        dagwaarde = bdg.sel(time=datum.values).sum().values.item()
        result.append([pd.Timestamp(datum.values),dagwaarde])

I think the code can be optimized further by calling where() only once instead of twice, and by making better use of the sum() method:

    bdg = imod.idf.open(f'{bdg_naam}_*.idf')
    zone = imod.idf.open("zone.idf").compute()
    # select part of needed bdg
    bdg = bdg.sel(time=slice('2012-01-01', '2021-01-01'))
    # Load into memory after selection
    bdg = bdg.compute()
    bdg = bdg.sum(dim='layer') # i have 4 layers
    bdg = bdg.where((bdg>0) & (zone>0))     # only keep the infiltration flux inside the zone
    dagwaardes = bdg.sum(dim=["y", "x"])
    timestamps = [pd.Timestamp(t) for t in dagwaardes.coords["time"].values]
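
As a sanity check that a single sum over "y" and "x" gives the same daily values as the per-day loop, here is a small sketch on made-up, in-memory data. The 3x3 grid, the random values and the zone layout are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
time = pd.date_range("2012-01-01", periods=5, freq="D")

# Hypothetical layer-summed budget on a tiny 3x3 grid, with mixed signs.
bdg = xr.DataArray(
    rng.normal(size=(5, 3, 3)),
    coords={"time": time},
    dims=("time", "y", "x"),
)
zone = xr.DataArray(np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]]), dims=("y", "x"))

# Keep positive values inside the zone; everything else becomes NaN.
masked = bdg.where((bdg > 0) & (zone > 0))

# Loop version: one reduction per time step.
looped = [masked.sel(time=t).sum().item() for t in masked["time"].values]

# Vectorized version: one reduction over both spatial dimensions.
vectorized = masked.sum(dim=["y", "x"])

print(np.allclose(looped, vectorized.values))
```

Both versions rely on sum() skipping NaN values, which is xarray's default for floating point data.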

JoerivanEngelen · 2024-02-27T08:48:59Z

Maintainer

I'm converting this issue to a discussion on the discussion board, which is the best place to ask user questions! If you are not sure if you are running into a bug or doing something wrong, feel free to post an issue.

1 reply

JaccoHoogewoud Feb 28, 2024
Author

thx, i will next time

Huite · 2024-02-27T08:49:16Z

Maintainer

(I see I'm cross-posting with Joeri...)

Hi @JaccoHoogewoud,

That is indeed ridiculously slow. To be fair, I don't see anything in your example that immediately raises suspicion, but there are some things which could lead to inefficiencies, and the loop isn't necessary either.

Here's how I would write it, making use of the fact that you can use bitwise operators (&, |, ~) for elementwise boolean logic, and that you can sum over multiple dimensions at once with xarray:

    budget = imod.idf.open(f'{bdg_naam}_*.idf').sel(time=slice('2012-01-01', '2021-01-01'))
    zone = imod.idf.open("zone.idf")
    zone_infiltration = budget.where((zone > 0) & (budget > 0))
    timeseries_infiltration = zone_infiltration.sum(["layer", "y", "x"]).compute()

Every line in the snippet above should be fast except the last: because of lazy evaluation, the actual work only happens in the .compute() call.
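
One thing that may be worth checking: the snippet above applies budget > 0 per cell and per layer before summing, whereas the original loop first summed over the layers and only then kept positive totals. The two can give different numbers where layers in the same cell have opposite signs. A tiny sketch with made-up values (one cell, two layers) shows the difference:

```python
import numpy as np
import xarray as xr

# Made-up budget: a single cell with two layers of opposite sign.
budget = xr.DataArray(
    np.array([[[3.0]], [[-1.0]]]),
    dims=("layer", "y", "x"),
)

# Mask per layer first, then sum: only the positive layer contributes.
mask_then_sum = budget.where(budget > 0).sum(["layer", "y", "x"]).item()

# Sum over layers first, then mask: the net flux of the cell is used.
layer_sum = budget.sum("layer")
sum_then_mask = layer_sum.where(layer_sum > 0).sum(["y", "x"]).item()

print(mask_then_sum, sum_then_mask)  # 3.0 2.0
```

Which of the two is the right definition of infiltration depends on the water balance you want, so this is a choice to make deliberately rather than a bug in either version.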

If you want a DataFrame timeseries:

    timeseries_infiltration = zone_infiltration.sum(["layer", "y", "x"]).to_dataframe()

This will also force a compute, since pandas DataFrames are always eager (they don't support lazy evaluation).
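
A small sketch of the conversion on synthetic daily totals follows. The values and the column name "infiltration" are made up. Note that xarray refuses to convert an unnamed DataArray to a DataFrame, so the sketch passes a name explicitly; if your array already has a name, that argument can be dropped.

```python
import pandas as pd
import xarray as xr

time = pd.date_range("2012-01-01", periods=3, freq="D")

# Hypothetical daily totals, as would come out of a sum over layer, y and x.
totals = xr.DataArray([1.5, 0.0, 2.5], coords={"time": time}, dims=("time",))

# An unnamed DataArray needs a name to become a DataFrame column.
df = totals.to_dataframe(name="infiltration")

# to_series() gives a pandas Series indexed by time instead.
series = totals.to_series()

print(df)
```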

Your implementation may be slow because it manually selects every time step with sel, builds a sum (still all delayed evaluation), and then forces the computation with .values.item() on each iteration. Most of the time, when something like this is very slow, it is because the IDF files are read repeatedly during execution. Lazy evaluation is useful if you have large data, since it lets you work on data bigger than your RAM.

In your case, if all timesteps fit in memory, I expect you can also do this to fix performance.

    ...
    bdg = bdg.where(zone>0)
    bdg = bdg.sum(dim='layer').compute()
    ...

But I'd try the snippet I posted above first.

I wrote this part of the documentation to give a little idea of the delayed evaluation: https://deltares.github.io/imod-python/user-guide/06-lazy-evaluation.html

1 reply

JaccoHoogewoud Feb 28, 2024
Author

Thx @JoerivanEngelen @Huite !
I tried it and the runtime went from hours to seconds....
