Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to use threads when computing freqresp! #912

Closed
wants to merge 1 commit into from

Conversation

VaiTon
Copy link

@VaiTon VaiTon commented Dec 18, 2023

No description provided.

@baggepinnen baggepinnen marked this pull request as draft December 18, 2023 17:17
@baggepinnen
Copy link
Member

Hello and thank you for your PR!

Do you have any benchmark where running threaded is beneficial?

@VaiTon
Copy link
Author

VaiTon commented Dec 19, 2023

Yes :) Here they are.

Trial with $10^7$ freqs

Parallel

julia> trials.data["bode, parallel=true, unwrap=false, 10^7"]
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  2.476 s …   2.497 s  ┊ GC (min … max): 0.10% … 0.09%
 Time  (median):     2.484 s              ┊ GC (median):    0.10%
 Time  (mean ± σ):   2.485 s ± 10.572 ms  ┊ GC (mean ± σ):  0.17% ± 0.12%

  █                     █                                 █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.48 s         Histogram: frequency by time         2.5 s <

 Memory estimate: 2.68 GiB, allocs estimate: 107.

Serial

julia> trials.data["bode, parallel=false, unwrap=false, 10^7"]
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 6.128 s (0.03% GC) to evaluate,
 with a memory estimate of 2.68 GiB, over 6 allocations.

Time graph

bode_time

Memory graph

bode_memory

Code

begin
    using ControlSystemsBase
    using BenchmarkTools
    using Plots
    using BenchmarkPlots
    using StatsPlots
end

G = ssrand(3) |> tf


suite = BenchmarkGroup()

for i in 1:7
    w = LinRange(1e-5, 1e5, 10^i)

    for p in [true, false]
        b = @benchmarkable bode($G, $w; parallel=$p, unwrap=false)
        suite["bode, parallel=$p, unwrap=false, 10^$i"] = b
    end
end

tune!(suite; verbose=true)

trials = run(suite, verbose=true)

parallelseries = []
for i in 1:7
    push!(parallelseries, trials.data["bode, parallel=true, unwrap=false, 10^$i"])
end

serialseries = []
for i in 1:7
    push!(serialseries, trials.data["bode, parallel=false, unwrap=false, 10^$i"])
end

sizeseries = ["10^1", "10^2", "10^3", "10^4", "10^5", "10^6", "10^7"]

plot(sizeseries,
    [
        (t) -> mean(trials.data["bode, parallel=true, unwrap=false, $t"]).time,
        (t) -> mean(trials.data["bode, parallel=false, unwrap=false, $t"]).time,
    ],
    label=["parallel" "serial"], 
    xlabel="size of frequency vector",
    ylabel="time (μs)",
    legend=:topleft,
)
savefig("bode_time.png")

plot(
    sizeseries,
    [
        (t) -> mean(trials.data["bode, parallel=true, unwrap=false, $t"]).memory,
        (t) -> mean(trials.data["bode, parallel=false, unwrap=false, $t"]).memory,
    ],
    label=["parallel" "serial"],
    xlabel="size of frequency vector",
    ylabel="memory (bytes)",
    legend=:topleft,
)
savefig("bode_memory.png")

@VaiTon
Copy link
Author

VaiTon commented Dec 19, 2023

If you want to use a different interface please do. I'm not an expert in how APIs should be exposed in Julia :)

@VaiTon
Copy link
Author

VaiTon commented Dec 31, 2023

@baggepinnen any news on this?

@baggepinnen
Copy link
Member

Thanks for the benchmarks!

I am not convinced this is the best optimization to perform for freqresp on transfer functions. With 12 threads, I see no more than 2x speedup for frequency vectors up to length 10k. I am also not quite sure the benchmark is entirely realistic, representing 3x3 MIMO systems of order 6 as transfer functions is questionable from a numeric standpoint, a statespace representation would probably make better sense for such a system?

If this benchmark is indeed representative of a workload you are running in practice where the call to freqresp is the computational bottleneck, I'd rather look into avoiding the repeated computations that take place for all the similar denominators. The system G in the benchmark evaluates the same denominator polynomial 6 times for each frequency.

@VaiTon VaiTon closed this Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants