Enabling L2+ Optimizations for EPs #23517
base: main
Conversation
…Capability, selection function and ComputeCapability
dq_node_index_set.insert(index);
}

ConstantFoldingDQ* transformer = static_cast<ConstantFoldingDQ*>(graph_transformer_mgr.GetTransformerByName(optimizer_name));
Just a thought to potentially get rid of GraphTransformerManager: would it make sense to instantiate the ConstantFoldingDQ transformer directly instead of querying the graph_transformer_manager? For example, we have code in InferenceSession::TransformGraph that directly instantiates an instance of EnsureUniqueDQForNodeUnit and just calls .Apply() on it:
onnxruntime/onnxruntime/core/session/inference_session.cc, lines 1231 to 1232 in a770a8d:

EnsureUniqueDQForNodeUnit ensure_unique_dq_for_node_unit{};
ORT_RETURN_IF_ERROR_SESSIONID_(apply_transformer_once(ensure_unique_dq_for_node_unit, *session_logger_, graph));
So in this case:

// Would need to pass the EP name to GetEPOptimizerByName()?
std::unique_ptr<ConstantFoldingDQ> const_fold_dq_transformer = std::make_unique<ConstantFoldingDQ>(/*...*/);
const_fold_dq_transformer->Apply(/*...*/);
This may also allow the EP to provide key/value strings to configure an optimizer. This way we can avoid having to create a completely new derived class every time an EP needs a slightly different variation of an optimizer.
Ah, this wouldn't work because we need to pass in the CPU EP.
I think it makes sense to instantiate/run directly. May need some structure to facilitate that. e.g. if we need things like the CPU allocator to be available to create an optimizer.
In this case I think we need the derived class as we're augmenting the behavior of the constant folding optimizer, but I expect this to be an exception rather than typical.
For other optimizers I expect we'll want something like key/value strings to configure instead of creating derived classes. Need to figure out what sort of configuration is required. Existing examples: an EP may support only a subset of data types for a fusion, or only a subset of operators (e.g. a subset of activation ops can be fused into Conv to create a FusedConv node).
GraphPartitioner(KernelRegistryManager& kernel_registry_mgr, const GraphTransformerManager& graph_transformer_mgr, const ExecutionProviders& providers)
    : kernel_registry_mgr_(kernel_registry_mgr),
      graph_transformer_mgr_(graph_transformer_mgr),
Instead of wiring in the GraphTransformerManager would it be better to add a new standalone class that provides lookup-based access to a set of L2 optimizers that are directly usable by an EP? That would decouple the GraphPartitioner from the optimizer registration/lookup a bit more. I don't think GraphTransformerManager is providing value in this case as I don't think we need to loop.
The registration/lookup class for re-usable optimizers could be a singleton with a static 'create' method that calls GenerateTransformersForEP and saves the returned list. We could have InferenceSession call the 'create' method to provide any other required things like the CPU allocator, and potentially apply the optimization level when doing so if we want to do that on the ORT side. GraphPartitioner could call a static 'get' method to get the instance so we don't need to wire it through from the inference session.
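A minimal sketch of that registry idea, under assumptions: `EPOptimizerRegistry` and the simplified `GraphTransformer` below are hypothetical stand-ins, and the real wiring (GenerateTransformersForEP, the CPU allocator, optimization levels) is omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for onnxruntime's GraphTransformer; only the name is modeled.
class GraphTransformer {
 public:
  explicit GraphTransformer(std::string name) : name_(std::move(name)) {}
  virtual ~GraphTransformer() = default;
  const std::string& Name() const { return name_; }
 private:
  std::string name_;
};

// Lookup-based registry of L2 optimizers usable by EPs. InferenceSession would
// call Create() once (e.g. with the output of GenerateTransformersForEP);
// GraphPartitioner calls Get() via the static accessor, so the registry does
// not need to be wired through the partitioner's constructor.
class EPOptimizerRegistry {
 public:
  static void Create(std::vector<std::unique_ptr<GraphTransformer>> transformers) {
    auto& inst = Instance();
    for (auto& t : transformers) {
      inst.lookup_.emplace(t->Name(), std::move(t));
    }
  }

  // Returns nullptr if no optimizer with that name was registered.
  static const GraphTransformer* Get(const std::string& name) {
    const auto& lookup = Instance().lookup_;
    auto it = lookup.find(name);
    return it == lookup.end() ? nullptr : it->second.get();
  }

 private:
  static EPOptimizerRegistry& Instance() {
    static EPOptimizerRegistry instance;  // Meyers singleton
    return instance;
  }
  std::unordered_map<std::string, std::unique_ptr<GraphTransformer>> lookup_;
};
```

The static-local singleton keeps construction lazy and thread-safe without requiring the session to thread the registry through GraphPartitioner.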
@@ -28,6 +28,17 @@ class ConstantFolding : public GraphTransformer {
                  const InlinedHashSet<std::string_view>& compatible_execution_providers = {},
                  const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

  /* Same as above but with a name provided by derived class.
Suggested change (add a protected access specifier before this constructor):

- /* Same as above but with a name provided by derived class.
+ protected:
+ /* Same as above but with a name provided by derived class.
                  const InlinedHashSet<std::string_view>& compatible_execution_providers = {},
                  const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

  virtual bool AllowConstantFolding(const Node& node) const { return true; }
nit: Add a comment with a quick explanation of how this is used.
                  const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

  bool AllowConstantFolding(const Node& node) const;
  Status UpdateNodeIndexSet(InlinedHashSet<onnxruntime::NodeIndex>& node_index_set);
Can we create a new instance of the optimizer for each node_index_set instead of having an update method in order to keep it a little cleaner?
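A per-partition instance could look roughly like this. A minimal sketch with stand-in types: `ConstantFoldingDQSketch` and the `NodeIndex` alias are illustrative only, not the PR's actual class.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>

using NodeIndex = std::size_t;  // stand-in for onnxruntime::NodeIndex

// Hypothetical sketch: the node index set is fixed at construction, so each
// partition gets its own immutable optimizer instance instead of a shared
// instance mutated through an UpdateNodeIndexSet() method.
class ConstantFoldingDQSketch {
 public:
  explicit ConstantFoldingDQSketch(std::unordered_set<NodeIndex> node_index_set)
      : node_index_set_(std::move(node_index_set)) {}

  // Mirrors AllowConstantFolding: only fold nodes selected for this partition.
  bool AllowConstantFolding(NodeIndex index) const {
    return node_index_set_.count(index) > 0;
  }

 private:
  const std::unordered_set<NodeIndex> node_index_set_;
};
```

Making the set const removes the need to reason about the optimizer's state between partitions.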
@@ -142,6 +142,10 @@ struct Node__EdgeIterator {
struct ProviderHost {
  virtual const OrtApiBase* OrtGetApiBase() = 0;

  virtual Status GetEPOptimizerByName(const std::string& name,
nit: Sounds like this optimizes an EP. Could we call it GetOptimizerByName?
static const std::string kEP_GRAPH_TRANSFORMER_CONSTANT_FOLDING_DQ = "ConstantFoldingDQ";

// ConstantFoldingDQ optimization function
auto constant_folding_dq_optimization = [&](Graph& graph, const ComputeCapability& optimization_cc, ComputeCapability& cc_to_update) -> Status {
Can we put this in a separate class/file, maybe in the optimizer library?
}
cc_to_update.sub_graph->nodes = updated_nodes;

auto original_meta_def = cc_to_update.sub_graph->GetMetaDef();
Instead of creating a new MetaDef, now that we have a use case where updating one makes sense, could we add a method to get a mutable MetaDef from sub_graph?
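A mutable accessor might look like this. A minimal sketch with stand-in types: `IndexedSubGraphSketch` and the one-field `MetaDef` are hypothetical simplifications, not the real onnxruntime classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for onnxruntime's MetaDef; only a name field is modeled.
struct MetaDef { std::string name; };

class IndexedSubGraphSketch {
 public:
  explicit IndexedSubGraphSketch(std::unique_ptr<MetaDef> meta_def)
      : meta_def_(std::move(meta_def)) {}

  // Existing read-only accessor.
  const MetaDef* GetMetaDef() const { return meta_def_.get(); }

  // Proposed addition: mutable access so the ComputeCapability update can edit
  // the existing MetaDef in place instead of constructing a replacement.
  MetaDef* GetMutableMetaDef() { return meta_def_.get(); }

 private:
  std::unique_ptr<MetaDef> meta_def_;
};
```

Callers updating the node list after ConstantFoldingDQ would then mutate through `GetMutableMetaDef()` rather than rebuilding the MetaDef from `original_meta_def`.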
*/

std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&)> selection_func;
auto status = g_host->GetEPOptimizerByName("ConstantFoldingDQ", graph_transformer_mgr, selection_func);
Should this check status?
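An ignored Status drops failures silently. A minimal self-contained sketch of propagating it: the `Status` struct and `RETURN_IF_ERROR` macro below are simplified stand-ins modeled on ORT's `ORT_RETURN_IF_ERROR`, and `LookupAndUse` is a hypothetical caller.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for onnxruntime's Status.
struct Status {
  Status() = default;
  Status(bool ok_in, std::string msg_in) : ok(ok_in), msg(std::move(msg_in)) {}
  bool IsOK() const { return ok; }
  static Status OK() { return Status(); }
  bool ok = true;
  std::string msg;
};

// Modeled on ORT_RETURN_IF_ERROR: bail out of the caller on failure.
#define RETURN_IF_ERROR(expr)     \
  do {                            \
    Status _s = (expr);           \
    if (!_s.IsOK()) return _s;    \
  } while (0)

// Hypothetical lookup that fails when the named optimizer is not registered.
Status GetEPOptimizerByName(const std::string& name) {
  if (name != "ConstantFoldingDQ") return Status(false, "optimizer not found: " + name);
  return Status::OK();
}

// Caller propagates the failure instead of ignoring the returned status.
Status LookupAndUse(const std::string& name) {
  RETURN_IF_ERROR(GetEPOptimizerByName(name));
  return Status::OK();
}
```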
There are some requirements to modify the graph that are specific to the hardware (GPU/NPU). ORT has a hardcoded EP list for optimizations, but that can't scale and is hard to extend to enable EP-specific custom optimizations.
Here is the prototype for enabling L2+ optimizations for EPs (the original overview was provided by @skottmckay), along with the TRT EP implementation of the ConstantFoldingDQ optimization.
Signatures for selection/optimization functions:
GetCapability
- Call the (new) provider bridge API to look up a pre-defined optimizer by name and get its selection function.
- The EP has to update the returned ComputeCapability to include the optimization ComputeCapability in nodes_to_optimize, so that ORT can later perform the optimization/transformation accordingly.
GraphPartitioner