Use OnceLock instead of OnceCell #13410

Merged
merged 5 commits into from
Nov 11, 2024


mtreinish
Member

Summary

OnceLock is a thread-safe version of OnceCell, which enables us to use PackedInstruction from a threaded environment. This comes with some overhead, primarily in memory, since OnceLock is a larger type than OnceCell. But the tradeoff is worth it to start leveraging multithreading for circuits.

Details and comments

Fixes #13219
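A minimal sketch of the tradeoff described above, assuming a toy struct rather than Qiskit's real `PackedInstruction` (the `Instr` type and its `cached_upper` field are hypothetical). It shows that a lazily initialized `OnceLock` field can be read from several threads at once, and it prints the sizes of `OnceCell<Box<u8>>` and `OnceLock<Box<u8>>` to illustrate the memory overhead for a pointer-like payload.

```rust
use std::cell::OnceCell;
use std::mem::size_of;
use std::sync::{Arc, OnceLock};
use std::thread;

/// Hypothetical stand-in for an instruction that lazily caches a derived value.
/// The struct and field names are illustrative, not Qiskit's actual layout.
struct Instr {
    name: String,
    // OnceLock<T> is Sync when T: Send + Sync, so `&Instr` can cross threads.
    // With std::cell::OnceCell here, `Instr` would not be Sync and the
    // thread::spawn call below would fail to compile.
    cached_upper: OnceLock<String>,
}

impl Instr {
    fn new(name: &str) -> Self {
        Instr { name: name.to_string(), cached_upper: OnceLock::new() }
    }

    /// Initializes the cache at most once, even if several threads race here.
    fn upper(&self) -> &str {
        self.cached_upper.get_or_init(|| self.name.to_uppercase())
    }
}

fn main() {
    let instr = Arc::new(Instr::new("cx"));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let instr = Arc::clone(&instr);
            thread::spawn(move || instr.upper().to_string())
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "CX");
    }

    // Memory overhead: OnceCell<Box<_>> is an Option internally and reuses
    // Box's null niche, while OnceLock keeps a separate synchronization state
    // next to the (possibly uninitialized) value.
    println!("OnceCell<Box<u8>>: {} bytes", size_of::<OnceCell<Box<u8>>>());
    println!("OnceLock<Box<u8>>: {} bytes", size_of::<OnceLock<Box<u8>>>());
}
```

The exact byte counts are platform dependent; the point is only that the `OnceLock` version is strictly larger for pointer-sized payloads, which matches the memory cost mentioned in the summary.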

@mtreinish mtreinish added the performance, Changelog: None, and Rust labels Nov 8, 2024
@mtreinish mtreinish added this to the 2.0.0 milestone Nov 8, 2024
@mtreinish mtreinish requested a review from a team as a code owner November 8, 2024 10:19
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Nov 8, 2024
With Qiskit#13410 removing the non-threadsafe structure from our circuit
representation we're now able to read and iterate over a DAGCircuit from
multiple threads. This commit is the first small piece of doing this: it
moves the analysis portion of the BarrierBeforeFinalMeasurements pass to
execute in parallel. The pass checks every node to ensure all its
descendants are either a measure or a barrier before reaching the end of
the circuit. This commit iterates over all the nodes and does the check
in parallel.
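A minimal sketch of the parallel check described in that commit message, assuming a toy read-only DAG. `Dag`, `Kind`, `only_final_descendants`, and `final_op_nodes` are hypothetical names, not Qiskit's API, and the sketch uses `std::thread::scope` to split candidate nodes across threads; the actual commit may use a different parallelism library. Because the DAG is only read, sharing `&Dag` across threads requires nothing more than `Dag: Sync`.

```rust
use std::collections::HashSet;
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Gate,
    Measure,
    Barrier,
}

/// Hypothetical read-only DAG: `succ[i]` lists the successors of node `i`.
struct Dag {
    kinds: Vec<Kind>,
    succ: Vec<Vec<usize>>,
}

impl Dag {
    /// True if every descendant of `node` is a measure or a barrier.
    fn only_final_descendants(&self, node: usize) -> bool {
        let mut stack = self.succ[node].clone();
        let mut seen = HashSet::new();
        while let Some(n) = stack.pop() {
            if !seen.insert(n) {
                continue;
            }
            if !matches!(self.kinds[n], Kind::Measure | Kind::Barrier) {
                return false;
            }
            stack.extend(&self.succ[n]);
        }
        true
    }
}

/// Returns the measure/barrier nodes whose descendants are all measures or
/// barriers, checking candidates on `n_threads` scoped threads.
fn final_op_nodes(dag: &Dag, n_threads: usize) -> Vec<usize> {
    let nodes: Vec<usize> = (0..dag.kinds.len()).collect();
    let chunk = nodes.len().div_ceil(n_threads.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = nodes
            .chunks(chunk)
            .map(|c| {
                s.spawn(move || {
                    c.iter()
                        .copied()
                        .filter(|&n| matches!(dag.kinds[n], Kind::Measure | Kind::Barrier))
                        .filter(|&n| dag.only_final_descendants(n))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the result sorted by node index.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // gate(0) -> measure(1) -> barrier(2); measure(3) -> gate(4)
    let dag = Dag {
        kinds: vec![Kind::Gate, Kind::Measure, Kind::Barrier, Kind::Measure, Kind::Gate],
        succ: vec![vec![1], vec![2], vec![], vec![4], vec![]],
    };
    println!("{:?}", final_op_nodes(&dag, 2)); // prints [1, 2]
}
```

Each node's descendant walk is independent of the others, so the work splits cleanly across threads even though descendants are re-traversed per node.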
@coveralls

coveralls commented Nov 8, 2024

Pull Request Test Coverage Report for Build 11766429415

Details

  • 19 of 23 (82.61%) changed or added relevant lines in 7 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.008%) to 88.935%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/circuit/src/dag_node.rs | 1 | 2 | 50.0% |
| crates/circuit/src/dag_circuit.rs | 5 | 8 | 62.5% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/lex.rs | 5 | 92.48% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 11749692210: | 0.008% |
| Covered Lines: | 79065 |
| Relevant Lines: | 88902 |

💛 - Coveralls

Contributor

@kevinhartman kevinhartman left a comment


Seems like a simple enough move. Have you tried running any benchmarks?

@mtreinish
Member Author

After 8bc4844 merged (fixing the asv tests), I did a quick asv run, which showed basically no change:

Benchmarks that have stayed the same:

| Change   | Before [8bc48442] <once-lock^2>   | After [487e1bec] <once-lock>   | Ratio   | Benchmark (Parameter)                                                                                           |
|----------|-----------------------------------|--------------------------------|---------|-----------------------------------------------------------------------------------------------------------------|
|          | 113±5μs                           | 246±70μs                       | ~2.17   | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8192)                                        |
|          | 452±200μs                         | 827±300μs                      | ~1.83   | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 32768)                                       |
|          | 486±200μs                         | 799±200μs                      | ~1.64   | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 32768)                                       |
|          | 665±300μs                         | 962±200μs                      | ~1.45   | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 32768)                                       |
|          | 461±200μs                         | 614±200μs                      | ~1.33   | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 32768)                                       |
|          | 34.2±8μs                          | 45.2±20μs                      | ~1.32   | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 2048)                                        |
|          | 194±50μs                          | 248±50μs                       | ~1.28   | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8192)                                        |
|          | 5.89±0.2ms                        | 6.88±0.4ms                     | ~1.17   | manipulate.TestCircuitManipulate.time_DTC100_twirling                                                           |
|          | 3.42±0.03s                        | 3.97±0.2s                      | ~1.16   | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                                         |
|          | 222±2μs                           | 252±70μs                       | ~1.14   | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8192)                                        |
|          | 2.26±0.5ms                        | 2.50±0.6ms                     | ~1.11   | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)                                     |
|          | 172±60μs                          | 189±60μs                       | ~1.10   | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8192)                                        |
|          | 2.28±0.4ms                        | 2.51±0.5ms                     | 1.10    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 131072)                                     |
|          | 132±10μs                          | 144±7μs                        | 1.09    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8192)                                       |
|          | 2.34±0.1ms                        | 2.55±0.07ms                    | 1.09    | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)                                      |
|          | 2.34±0.2ms                        | 2.53±0.04ms                    | 1.08    | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 131072)                                      |
|          | 41.9±1μs                          | 44.9±1μs                       | 1.07    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 2048)                                       |
|          | 782±10μs                          | 833±40μs                       | 1.07    | circuit_construction.CliffordSynthesis.time_clifford_synthesis(10)                                              |
|          | 2.47±0.05ms                       | 2.62±0.05ms                    | 1.06    | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 131072)                                      |
|          | 2.37±0.1ms                        | 2.52±0.06ms                    | 1.06    | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)                                      |
|          | 1.13±0.03ms                       | 1.19±0.03ms                    | 1.06    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8192)                                      |
|          | 5.78±0.2ms                        | 6.12±0.2ms                     | 1.06    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 32768)                                    |
|          | 375±10μs                          | 393±2μs                        | 1.05    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 2048)                                     |
|          | 343±7μs                           | 361±2μs                        | 1.05    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 2048)                                      |
|          | 1.39±0.01ms                       | 1.46±0.02ms                    | 1.05    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8192)                                      |
|          | 108±6ms                           | 113±20ms                       | 1.05    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(100, 150)                              |
|          | 34.4±2μs                          | 35.6±1μs                       | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 2048)                                        |
|          | 462±30μs                          | 481±40μs                       | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 32768)                                      |
|          | 4.52±0.2ms                        | 4.68±0.3ms                     | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 32768)                                     |
|          | 380±2μs                           | 397±5μs                        | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 2048)                                     |
|          | 1.46±0.03ms                       | 1.51±0.04ms                    | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8192)                                      |
|          | 5.65±0.08ms                       | 5.86±0.1ms                     | 1.04    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 32768)                                     |
|          | 3.27±0.08ms                       | 3.39±0.05ms                    | 1.04    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(50, 10)                                |
|          | 29.4±0.2ms                        | 30.5±0.3ms                     | 1.04    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(50, 150)                                    |
|          | 296±3ms                           | 307±1ms                        | 1.04    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 128)                            |
|          | 39.2±3μs                          | 40.5±7μs                       | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 2048)                                        |
|          | 37.1±0.5μs                        | 38.1±0.3μs                     | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 128)                                      |
|          | 28.9±0.3μs                        | 29.8±0.09μs                    | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 128)                                       |
|          | 5.32±0.1ms                        | 5.50±0.07ms                    | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 32768)                                     |
|          | 44.2±0.3μs                        | 45.4±0.4μs                     | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 128)                                      |
|          | 22.3±1ms                          | 23.0±1ms                       | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 131072)                                   |
|          | 22.9±0.3μs                        | 23.7±0.4μs                     | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8)                                        |
|          | 364±7μs                           | 375±7μs                        | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 2048)                                      |
|          | 1.48±0.04ms                       | 1.53±0.05ms                    | 1.03    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8192)                                      |
|          | 4.78±0.04ms                       | 4.94±0.1ms                     | 1.03    | circuit_construction.CliffordSynthesis.time_clifford_synthesis(50)                                              |
|          | 8.91±0.3ms                        | 9.22±0.2ms                     | 1.03    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(100, 10)                               |
|          | 34.6±0.9ms                        | 35.5±0.7ms                     | 1.03    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(50, 150)                               |
|          | 604±4ms                           | 624±2ms                        | 1.03    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 8)      |
|          | 9.40±0.06ms                       | 9.65±0.06ms                    | 1.03    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                                |
|          | 484±4ms                           | 495±3ms                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 131072)                             |
|          | 481±1ms                           | 490±7ms                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 131072)                              |
|          | 123±0.3ms                         | 125±0.6ms                      | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 32768)                              |
|          | 30.9±0.1ms                        | 31.5±0.05ms                    | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 8192)                               |
|          | 30.8±0.2ms                        | 31.4±0.1ms                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 8192)                                |
|          | 572±3μs                           | 585±2μs                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 128)                                 |
|          | 123±0.5ms                         | 125±1ms                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 32768)                               |
|          | 11.9±0.05μs                       | 12.0±0.1μs                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 128)                                         |
|          | 12.8±0.4μs                        | 13.1±0.2μs                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 128)                                         |
|          | 25.3±0.05μs                       | 25.9±0.1μs                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 128)                                       |
|          | 293±3μs                           | 297±8μs                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 2048)                                      |
|          | 1.53±0.04ms                       | 1.56±0.04ms                    | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8192)                                     |
|          | 8.80±0.1μs                        | 8.93±0.06μs                    | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8)                                         |
|          | 5.94±0.1ms                        | 6.04±0.1ms                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 32768)                                    |
|          | 1.48±0.03ms                       | 1.51±0.06ms                    | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8192)                                     |
|          | 32.6±0.4μs                        | 33.4±0.2μs                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 128)                                       |
|          | 23.1±0.8ms                        | 23.6±0.6ms                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 131072)                                    |
|          | 5.65±0.09ms                       | 5.78±0.1ms                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 32768)                                     |
|          | 10.1±0.07μs                       | 10.3±0.07μs                    | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8)                                         |
|          | 22.3±0.8ms                        | 22.7±0.8ms                     | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 131072)                                    |
|          | 371±2μs                           | 380±9μs                        | 1.02    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 2048)                                      |
|          | 12.2±0.4ms                        | 12.5±0.05ms                    | 1.02    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(50, 50)                                |
|          | 599±3ms                           | 613±5ms                        | 1.02    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 131072)                         |
|          | 145±1ms                           | 147±2ms                        | 1.02    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 32768)                           |
|          | 31.4±0.5ms                        | 32.0±1ms                       | 1.02    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8192)                             |
|          | 660±4ms                           | 675±7ms                        | 1.02    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 8192)   |
|          | 309±2ms                           | 314±3ms                        | 1.02    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 32768)   |
|          | 38.5±0.2ms                        | 39.5±0.3ms                     | 1.02    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 128)      |
|          | 46.9±0.1ms                        | 47.7±0.2ms                     | 1.02    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 2048)     |
|          | 3.59±0.1s                         | 3.67±0.1s                      | 1.02    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                                         |
|          | 3.60±0.08s                        | 3.65±0.2s                      | 1.02    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                                        |
|          | 100±2ms                           | 102±2ms                        | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                                  |
|          | 32.8±0.2ms                        | 33.3±0.1ms                     | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                                    |
|          | 32.6±0.5ms                        | 33.2±0.3ms                     | 1.02    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                                   |
|          | 153±1ms                           | 156±0.9ms                      | 1.02    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                               |
|          | 7.29±0.01ms                       | 7.33±0.1ms                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 2048)                                |
|          | 580±5μs                           | 587±1μs                        | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 128)                                |
|          | 158±0.5μs                         | 160±1μs                        | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 8)                                  |
|          | 30.8±0.08ms                       | 31.2±0.3ms                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 8192)                               |
|          | 30.4±0.3ms                        | 30.8±0.2ms                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 8192)                                |
|          | 674±2μs                           | 682±2μs                        | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 128)                                |
|          | 486±3ms                           | 490±7ms                        | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 131072)                             |
|          | 7.83±0.04ms                       | 7.93±0.04ms                    | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 2048)                               |
|          | 218±0.6μs                         | 221±1μs                        | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 8)                                  |
|          | 7.80±0.04ms                       | 7.90±0.05ms                    | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 2048)                                |
|          | 31.2±0.3ms                        | 31.5±0.2ms                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 8192)                                |
|          | 10.4±0.05μs                       | 10.6±0.3μs                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 128)                                         |
|          | 8.04±0.03μs                       | 8.09±0.01μs                    | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8)                                         |
|          | 17.9±0.4μs                        | 18.1±0.6μs                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8)                                        |
|          | 22.1±0.1ms                        | 22.5±0.2ms                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 131072)                                    |
|          | 34.3±0.3μs                        | 34.8±0.2μs                     | 1.01    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 128)                                       |
|          | 25.5±0.2ms                        | 25.6±0.4ms                     | 1.01    | circuit_construction.CliffordSynthesis.time_clifford_synthesis(100)                                             |
|          | 2.25±0.01ms                       | 2.26±0.04ms                    | 1.01    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(10, 50)                                |
|          | 2.63±0.08ms                       | 2.65±0.03ms                    | 1.01    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(10, 50)                                     |
|          | 2.92±0.06ms                       | 2.94±0.06ms                    | 1.01    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(50, 10)                                     |
|          | 1.68±0.03ms                       | 1.69±0.02ms                    | 1.01    | circuit_construction.ParameterizedCirc.time_param_circSU2_100_build(10)                                         |
|          | 302±4ms                           | 306±1ms                        | 1.01    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8)                              |
|          | 316±4ms                           | 320±2ms                        | 1.01    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8192)                           |
|          | 128±2μs                           | 129±1μs                        | 1.01    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8, 8)                                   |
|          | 18.5±0.2ms                        | 18.6±0.08ms                    | 1.01    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 128)                              |
|          | 1.35±0.01ms                       | 1.36±0.01ms                    | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 128, 128)       |
|          | 857±4μs                           | 863±2μs                        | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 128, 8)         |
|          | 1.23±0.01s                        | 1.24±0.01s                     | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 131072) |
|          | 18.8±0.1ms                        | 18.9±0.2ms                     | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 2048)     |
|          | 9.75±0.05ms                       | 9.82±0.07ms                    | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 8)        |
|          | 156±0.5ms                         | 157±2ms                        | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 128)     |
|          | 161±2ms                           | 163±2ms                        | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 2048)    |
|          | 153±0.4ms                         | 154±1ms                        | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 8)       |
|          | 194±2ms                           | 195±1ms                        | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 8192)    |
|          | 38.4±0.2ms                        | 38.6±0.2ms                     | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 8)        |
|          | 75.4±0.8ms                        | 75.9±0.8ms                     | 1.01    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 8192)     |
|          | 938±6ms                           | 947±4ms                        | 1.01    | circuit_construction.QasmImport.time_QV100_qasm2_import                                                         |
|          | 9.41±0.05ms                       | 9.54±0.03ms                    | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                                 |
|          | 9.48±0.05ms                       | 9.55±0.06ms                    | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
|          | 32.9±0.2ms                        | 33.3±0.2ms                     | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
|          | 623±5ms                           | 629±3ms                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                             |
|          | 634±2ms                           | 642±2ms                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                            |
|          | 812±3ms                           | 817±6ms                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                                             |
|          | 135±0.7ms                         | 136±0.5ms                      | 1.01    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                               |
|          | 155±2ms                           | 157±1ms                        | 1.01    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                              |
|          | 464±2ms                           | 463±2ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 131072)                              |
|          | 29.1±0.2ms                        | 29.3±0.3ms                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 8192)                                |
|          | 7.81±0.05ms                       | 7.81±0.05ms                    | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 2048)                               |
|          | 124±0.6ms                         | 123±1ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 32768)                              |
|          | 529±3μs                           | 530±0.8μs                      | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 128)                                 |
|          | 7.80±0.04ms                       | 7.81±0.07ms                    | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 2048)                                |
|          | 122±0.5ms                         | 123±1ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 32768)                               |
|          | 490±2ms                           | 490±1ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 131072)                              |
|          | 123±1ms                           | 123±1ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 32768)                               |
|          | 488±2ms                           | 489±3ms                        | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 131072)                              |
|          | 7.93±0.06ms                       | 7.97±0.05ms                    | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 2048)                                |
|          | 15.5±0.5μs                        | 15.4±0.8μs                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 128)                                        |
|          | 13.8±0.3μs                        | 13.8±0.8μs                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8)                                          |
|          | 10.9±0.05μs                       | 10.8±0.6μs                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 128)                                         |
|          | 9.30±0.05μs                       | 9.29±0.4μs                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8)                                           |
|          | 10.2±0.1μs                        | 10.2±0.05μs                    | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8)                                           |
|          | 19.6±0.2ms                        | 19.6±0.3ms                     | 1.00    | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 131072)                                    |
|          | 237±1μs                           | 237±2μs                        | 1.00    | circuit_construction.MultiControl.time_multi_control_circuit(10)                                                |
|          | 679±6μs                           | 677±9μs                        | 1.00    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(10, 10)                                |
|          | 6.00±0.01ms                       | 6.00±0.06ms                    | 1.00    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(10, 150)                               |
|          | 726±30μs                          | 728±40μs                       | 1.00    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(10, 10)                                     |
|          | 5.12±0.1ms                        | 5.11±0.1ms                     | 1.00    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(100, 10)                                    |
|          | 60.9±0.5ms                        | 61.1±0.5ms                     | 1.00    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(100, 150)                                   |
|          | 2.15±0.03ms                       | 2.14±0.02ms                    | 1.00    | circuit_construction.ParameterizedCirc.time_param_circSU2_100_build(16)                                         |
|          | 1.22±0ms                          | 1.22±0ms                       | 1.00    | circuit_construction.ParameterizedCirc.time_param_circSU2_100_build(5)                                          |
|          | 542±5μs                           | 544±5μs                        | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 128)                               |
|          | 392±7μs                           | 392±1μs                        | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 8)                                 |
|          | 372±4ms                           | 371±3ms                        | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 32768)                          |
|          | 4.75±0.01ms                       | 4.77±0.01ms                    | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 128)                              |
|          | 4.73±0.02ms                       | 4.72±0.02ms                    | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 8)                                |
|          | 78.0±1ms                          | 78.1±0.6ms                     | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 2048)                            |
|          | 73.3±0.6ms                        | 73.4±0.6ms                     | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8)                               |
|          | 91.1±2ms                          | 91.1±0.7ms                     | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8192)                            |
|          | 18.4±0.3ms                        | 18.4±0.2ms                     | 1.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8)                                |
|          | 617±7ms                           | 616±10ms                       | 1.00    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 128)    |
|          | 628±8ms                           | 629±6ms                        | 1.00    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 2048)   |
|          | 797±6ms                           | 801±4ms                        | 1.00    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 32768)  |
|          | 10.3±0.05ms                       | 10.3±0.08ms                    | 1.00    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 128)      |
|          | 304±3μs                           | 303±0.7μs                      | 1.00    | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8, 8)           |
|          | 139±2ms                           | 139±0.5ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                                                          |
|          | 143±0.5ms                         | 143±0.8ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                                          |
|          | 9.45±0.04ms                       | 9.42±0.06ms                    | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                                                          |
|          | 9.50±0.04ms                       | 9.45±0.09ms                    | 1.00    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                                                          |
|          | 101±2ms                           | 102±0.9ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                                  |
|          | 101±1ms                           | 101±0.4ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                                 |
|          | 272±2ms                           | 272±0.5ms                      | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                            |
|          | 359±2ms                           | 360±1ms                        | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                                            |
|          | 495±2ms                           | 495±2ms                        | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                                                             |
|          | 835±8ms                           | 839±5ms                        | 1.00    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                                              |
|          | 390                               | 390                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                                                   |
|          | 407                               | 407                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                                                   |
|          | 407                               | 407                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                                                  |
|          | 300                               | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                                                  |
|          | 300                               | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                                                  |
|          | 300                               | 300                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')                                                 |
|          | 1607                              | 1607                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                                                     |
|          | 1622                              | 1622                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                                                     |
|          | 1622                              | 1622                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                                                    |
|          | 1954                              | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                                      |
|          | 1954                              | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                                      |
|          | 1954                              | 1954                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                                     |
|          | 2709                              | 2709                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                                                       |
|          | 2709                              | 2709                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                                                       |
|          | 2709                              | 2709                           | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                                                      |
|          | 462                               | 462                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')                                        |
|          | 462                               | 462                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')                                        |
|          | 462                               | 462                            | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')                                       |
|          | 489±4μs                           | 485±2μs                        | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 128)                                 |
|          | 117±0.4ms                         | 115±0.4ms                      | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 32768)                               |
|          | 48.9±0.1μs                        | 48.6±0.1μs                     | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 8)                                   |
|          | 571±6μs                           | 566±4μs                        | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 128)                                 |
|          | 65.8±1μs                          | 65.3±0.7μs                     | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 8)                                   |
|          | 98.2±2μs                          | 97.3±0.4μs                     | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 8)                                   |
|          | 9.09±0.1μs                        | 8.98±0.03μs                    | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8)                                           |
|          | 11.4±0.5μs                        | 11.3±0.4μs                     | 0.99    | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8)                                           |
|          | 394±2μs                           | 389±2μs                        | 0.99    | circuit_construction.MultiControl.time_multi_control_circuit(16)                                                |
|          | 508±4μs                           | 503±3μs                        | 0.99    | circuit_construction.MultiControl.time_multi_control_circuit(20)                                                |
|          | 37.0±1ms                          | 36.7±0.1ms                     | 0.99    | circuit_construction.ParamaterizedDifferentCircuit.time_DTC100_set_build(100, 50)                               |
|          | 20.7±0.1ms                        | 20.5±0.09ms                    | 0.99    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(100, 50)                                    |
|          | 74.4±0.5ms                        | 73.8±2ms                       | 0.99    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 128)                             |
|          | 21.6±0.3ms                        | 21.5±0.3ms                     | 0.99    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 2048)                             |
|          | 9.52±0.04ms                       | 9.45±0.06ms                    | 0.99    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                                                         |
|          | 342±3ms                           | 339±1ms                        | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                                           |
|          | 60.4±2μs                          | 59.1±0.5μs                     | 0.98    | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 8)                                   |
|          | 23.8±0.3ms                        | 23.2±1ms                       | 0.98    | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 131072)                                   |
|          | 6.60±0.1ms                        | 6.47±0.2ms                     | 0.98    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(10, 150)                                    |
|          | 11.0±0.4ms                        | 10.7±0.09ms                    | 0.98    | circuit_construction.ParamaterizedDifferentCircuit.time_QV100_build(50, 50)                                     |
|          | 306±2ms                           | 300±5ms                        | 0.98    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 2048)                           |
|          | 7.44±0.1ms                        | 7.29±0.04ms                    | 0.98    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 2048)                             |
|          | 145±2ms                           | 142±0.4ms                      | 0.98    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                                         |
|          | 701±2ms                           | 689±3ms                        | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                                              |
|          | 16.7±0.7μs                        | 16.1±0.8μs                     | 0.97    | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8)                                          |
|          | 12.9±0.3μs                        | 12.6±0.09μs                    | 0.97    | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8)                                         |
|          | 18.5±0.8μs                        | 17.7±0.9μs                     | 0.96    | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 128)                                        |
|          | 130±4μs                           | 121±10μs                       | 0.93    | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8192)                                       |
|          | 43.7±3μs                          | 40.4±1μs                       | 0.93    | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 2048)                                        |
|          | 50.2±4μs                          | 46.1±0.7μs                     | 0.92    | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 2048)                                       |

Benchmarks that have got worse:

| Change   | Before [8bc48442] <once-lock^2>   | After [487e1bec] <once-lock>   |   Ratio | Benchmark (Parameter)                                                      |
|----------|-----------------------------------|--------------------------------|---------|----------------------------------------------------------------------------|
| +        | 368±40μs                          | 485±20μs                       |    1.32 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 32768) |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

This is mostly expected: we only use the Python gate cache when we're interacting with Python, and in that case Python will always end up being the bottleneck. The exception is `copy()`, which ends up as a Rust clone; given the regression tracked on that one benchmark, the clone is probably more expensive now. But I think a ~30% slowdown (ratio 1.32) on a single benchmark that runs in the hundreds of microseconds even in the worst case is acceptable, given that this unlocks a whole class of multithreaded pass implementations, like #13419 and #13411, which will yield further speedups for the transpiler.
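A minimal, std-only sketch of what the swap buys. `Instr`, `py_op_cache`, `py_op` and `count_named_parallel` are hypothetical stand-ins, not Qiskit's actual `PackedInstruction` API, and a `String` plays the role of the cached Python gate object. Because `OnceLock<T>` is `Sync` (when `T: Send + Sync`), a slice of instructions can be shared by scoped threads for a pure-Rust analysis, and concurrent `get_or_init` calls run the builder at most once. With `std::cell::OnceCell` in place of `OnceLock`, `count_named_parallel` would not compile, since `OnceCell` is not `Sync`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the (hypothetical) cached Python object is built.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for PackedInstruction: plain Rust data plus a lazily
// filled cache that mimics the Python gate cache.
#[derive(Clone)]
struct Instr {
    name: String,
    py_op_cache: OnceLock<String>,
}

impl Instr {
    fn new(name: &str) -> Self {
        Instr { name: name.to_string(), py_op_cache: OnceLock::new() }
    }

    // Only code that talks to Python would call this; the closure runs at most
    // once even if several threads race on the same instruction.
    fn py_op(&self) -> &str {
        self.py_op_cache.get_or_init(|| {
            BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
            format!("<PyGate {}>", self.name)
        })
    }
}

// A pure-Rust "analysis pass" split across scoped threads. Sharing `&[Instr]`
// between threads requires `Instr: Sync`, which the OnceLock field provides.
fn count_named_parallel(instrs: &[Instr], target: &str, n_threads: usize) -> usize {
    let n = n_threads.max(1);
    let chunk_len = ((instrs.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        // Collect first so every thread is spawned before any join happens.
        let handles: Vec<_> = instrs
            .chunks(chunk_len)
            .map(|part| s.spawn(move || part.iter().filter(|i| i.name == target).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let instrs: Vec<Instr> = ["h", "cx", "cx", "rz"].iter().map(|n| Instr::new(n)).collect();
    println!("cx count: {}", count_named_parallel(&instrs, "cx", 2)); // prints "cx count: 2"

    // Four threads race to fill the same cache; the builder runs once.
    let before = BUILD_COUNT.load(Ordering::SeqCst);
    let shared = &instrs[0];
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                let _ = shared.py_op();
            });
        }
    });
    assert_eq!(BUILD_COUNT.load(Ordering::SeqCst) - before, 1);

    // Cloning copies an already-filled cache into the new OnceLock.
    let copy = shared.clone();
    assert_eq!(copy.py_op_cache.get().map(String::as_str), Some("<PyGate h>"));
}
```

On the `copy()` regression: std's `Clone` for `OnceLock` builds a fresh lock and, if the source is initialized, fills it through the synchronized `set` path rather than a plain field copy. That is one plausible contributor to the extra clone cost, but it is our reading of the std implementation, not something the benchmark above isolates.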


@kevinhartman left a comment


Nice, this looks good to me.

@kevinhartman added this pull request to the merge queue Nov 11, 2024
Merged via the queue into Qiskit:main with commit 3a9993a Nov 11, 2024
17 checks passed
Labels: Changelog: None, performance, Rust
Development: successfully merging this pull request may close the issue "Stop using OnceCell for PackedInstruction.py_op field".