Speed up the loading of large tables (#2026)
This fixes a performance issue that @rbruijnshkv encountered while trying to
initialize a model with a `Basin / time` table of 6 million rows, spread over
1000 Basin nodes. Initialization spent around 1-2 seconds per Basin node on
the line changed in this commit. `time` is a StructVector, which stores its
columns as vectors. By broadcasting `getfield` we iterated over the rows,
constructing a `BasinTime` struct for each one just to take a single field
from it. That works, but is much slower than extracting the field that is
already stored as a vector.
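
As a minimal sketch of why this matters (`Row` below is a hypothetical stand-in
for Ribasim's `BasinTime`, and the numbers are illustrative), the two access
patterns on a StructVector behave very differently:

```julia
using StructArrays

# Hypothetical row type standing in for BasinTime.
struct Row
    node_id::Int
    value::Float64
end

n = 6_000_000
# A 1-d StructArray (a StructVector) stores each field as its own column vector.
sv = StructArray{Row}((rand(1:1000, n), rand(n)))

# Before: broadcasting getfield constructs a Row struct for every element and
# then takes one field from each, allocating a fresh output vector.
slow = getfield.(sv, :value)

# After: getproperty returns the stored column vector directly,
# without iterating over the rows at all.
fast = getproperty(sv, :value)  # equivalent to sv.value

@assert slow == fast
```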

The general recommendation for such large tables is to not store them in the
model database but in a separate Arrow file, as done here:
https://github.com/Deltares/Ribasim/blob/v2025.1.0/python/ribasim_testmodels/ribasim_testmodels/basic.py#L210.
Doing this shrank the database from 400 MB to 100 MB and also sped up
initialization. The change in this commit speeds up loading for both formats, though.
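
As a hedged illustration of moving a large table into its own Arrow file
(generic Arrow.jl usage, not the Ribasim API; `basin_time` is a hypothetical
stand-in table):

```julia
using Arrow, DataFrames, Dates

# Hypothetical stand-in for a large `Basin / time` table.
basin_time = DataFrame(
    node_id = [1, 1, 2],
    time = [DateTime(2020, 1, 1), DateTime(2020, 1, 2), DateTime(2020, 1, 1)],
)

# Write the table to its own Arrow file instead of embedding it in the model database.
Arrow.write("basin_time.arrow", basin_time)
```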
visr authored Feb 3, 2025
1 parent 61ed745 commit c75888a
core/src/util.jl: 1 addition & 1 deletion
```diff
@@ -76,7 +76,7 @@ function get_scalar_interpolation(
     interpolation_type::Type{<:AbstractInterpolation},
 )::interpolation_type
     rows = searchsorted(time.node_id, node_id)
-    parameter = getfield.(time, param)[rows]
+    parameter = getproperty(time, param)[rows]
     parameter = coalesce.(parameter, default_value)
     times = seconds_since.(time.time[rows], starttime)
     # Add extra timestep at start for constant extrapolation
```
