
Enabling L2+ Optimizations for EPs #23517

Draft
wants to merge 7 commits into main

Conversation

chilo-ms
Contributor

Some graph modifications are required that are specific to the target hardware (GPU/NPU).
ORT currently hardcodes the list of EPs for each optimization, but that doesn't scale and is hard to extend to support EP-specific custom optimizations.

This is a prototype that enables L2+ optimizations for EPs (the original overview was provided by @skottmckay), along with a TRT EP implementation of the ConstantFoldingDQ optimization.

Signatures for selection/optimization functions:

  - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&)>
  - Optimization: std::function<Status(Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>

GetCapability

  • call the (new) provider bridge API to look up a pre-defined optimizer by name and get its selection function

    • ComputeCapability.optimize_func is set by the optimizer to the function that performs the optimization
  • the EP has to update the ComputeCapability it returns to include the optimization ComputeCapability in nodes_to_optimize, so that ORT can later perform the optimization/transformation accordingly (see the sketch below)
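
A minimal sketch of how this could look inside an EP's GetCapability. The provider bridge call and the optimize_func/nodes_to_optimize members follow the prototype in this PR; the surrounding variable names (graph_viewer, graph_transformer_mgr, ep_compute_capability) and the fallback handling are illustrative:

// Sketch only -- inside the EP's GetCapability().
std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&)> selection_func;

// Look up the pre-defined optimizer by name via the new provider bridge API.
auto status = g_host->GetEPOptimizerByName("ConstantFoldingDQ", graph_transformer_mgr, selection_func);
if (!status.IsOK() || !selection_func) {
  // Fall back: return the EP's capabilities without the optimization attached.
}

// The selection function returns the ComputeCapability entries the optimization applies to,
// with ComputeCapability.optimize_func already set by the optimizer.
auto optimization_capabilities = selection_func(graph_viewer);

// Attach them to the ComputeCapability the EP returns so GraphPartitioner can run them later.
for (auto& optimization_cc : optimization_capabilities) {
  ep_compute_capability->nodes_to_optimize.push_back(std::move(optimization_cc));
}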

GraphPartitioner

  • After assigning the ComputeCapability to the EP and prior to Compile, if the ComputeCapability has nodes_to_optimize, iterate that list (a rough sketch follows the list)
    • the optimization function needs to be called with
      • a mutable Graph instance
      • the ComputeCapability for the individual optimization
      • the overall ComputeCapability so it can be updated
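
A rough sketch of that partitioner-side loop; `capability` is the ComputeCapability assigned to the EP, and names are illustrative where the prototype does not spell them out:

// Sketch only -- in GraphPartitioner, after the ComputeCapability is assigned to the EP
// and before Compile() is called.
for (auto& optimization_cc : capability->nodes_to_optimize) {
  if (optimization_cc->optimize_func) {
    // Called with a mutable Graph, the ComputeCapability for this optimization,
    // and the overall ComputeCapability so it can be updated.
    ORT_RETURN_IF_ERROR(optimization_cc->optimize_func(graph, *optimization_cc, *capability));
  }
}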

dq_node_index_set.insert(index);
}

ConstantFoldingDQ* transformer = static_cast<ConstantFoldingDQ*>(graph_transformer_mgr.GetTransformerByName(optimizer_name));
Contributor


Just a thought to potentially get rid of GraphTransformerManager:

Would it make sense to instantiate the ConstantFoldingDQ transformer directly instead of querying the graph_transformer_manager? For example, we have code in InferenceSession::TransformGraph that directly instantiates EnsureUniqueDQForNodeUnit and just applies it:

EnsureUniqueDQForNodeUnit ensure_unique_dq_for_node_unit{};
ORT_RETURN_IF_ERROR_SESSIONID_(apply_transformer_once(ensure_unique_dq_for_node_unit, *session_logger_, graph));

So in this case:

// Would need to pass the EP name to GetEPOptimizerByName()?
std::unique_ptr<ConstantFoldingDQ> const_fold_dq_transformer = std::make_unique<ConstantFoldingDQ>(/*...*/);

const_fold_dq_transformer->Apply(/*...*/);

Contributor


This may also allow the EP to provide key/value strings to configure an optimizer. This way we can avoid having to create a completely new derived class every time an EP needs a slightly different variation of an optimizer.

Contributor


Ah, this wouldn't work because we need to pass in the CPU EP.

Contributor


I think it makes sense to instantiate/run directly. We may need some structure to facilitate that, e.g. if we need things like the CPU allocator to be available to create an optimizer.

In this case I think we need the derived class as we're augmenting the behavior of the constant folding optimizer, but I expect this to be an exception rather than typical.

For other optimizers I expect we'll want something like key/value strings for configuration instead of creating derived classes. We need to figure out what sort of configuration is required. Some existing examples: an EP supports only a subset of data types for a fusion, or only a subset of operators for a fusion (e.g. a subset of activation ops can be fused into Conv to create a FusedConv node).
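
For example, something along these lines. The constructor overload and the key names here are hypothetical, just to illustrate the shape of a key/value configuration:

// Hypothetical: configure a shared fusion optimizer per EP via key/value strings
// instead of deriving a new class for each variation.
std::unordered_map<std::string, std::string> config = {
    {"supported_data_types", "float,float16"},     // subset of data types the EP supports for the fusion
    {"fusable_activations", "Relu,Clip,Sigmoid"},  // subset of activation ops that can fuse into Conv
};
auto fused_conv_transformer = std::make_unique<ConvActivationFusion>(config);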

Comment on lines +28 to +30
GraphPartitioner(KernelRegistryManager& kernel_registry_mgr, const GraphTransformerManager& graph_transformer_mgr, const ExecutionProviders& providers)
: kernel_registry_mgr_(kernel_registry_mgr),
graph_transformer_mgr_(graph_transformer_mgr),
Contributor


Instead of wiring in the GraphTransformerManager, would it be better to add a new standalone class that provides lookup-based access to a set of L2 optimizers that are directly usable by an EP? That would decouple the GraphPartitioner from the optimizer registration/lookup a bit more. I don't think GraphTransformerManager is providing value in this case, as I don't think we need to loop.

The registration/lookup class for re-usable optimizers could be a singleton with a static 'create' method that calls GenerateTransformersForEP and saves the returned list. We could have InferenceSession call the 'create' method to provide any other required things like the CPU allocator, and potentially apply the optimization level when doing so if we want to do that on the ORT side. GraphPartitioner could call a static 'get' method to get the instance so we don't need to wire it through from the inference session.
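
A rough sketch of that shape, assuming GenerateTransformersForEP returns a list of GraphTransformer instances; the class, method, and parameter names are illustrative, not from this PR:

// Illustrative only: standalone registration/lookup for L2 optimizers that EPs can reuse.
class EPOptimizerRegistry {
 public:
  // InferenceSession calls this once, passing anything the optimizers need
  // (e.g. the CPU allocator) and optionally applying the optimization level.
  static void Create(/* CPU allocator, optimization level, ... */) {
    Instance().transformers_ = GenerateTransformersForEP(/*...*/);
  }

  // GraphPartitioner (or the provider bridge) calls this to look up an optimizer by name.
  static const GraphTransformer* Get(const std::string& name) {
    for (const auto& transformer : Instance().transformers_) {
      if (transformer->Name() == name) return transformer.get();
    }
    return nullptr;
  }

 private:
  static EPOptimizerRegistry& Instance() {
    static EPOptimizerRegistry instance;
    return instance;
  }

  std::vector<std::unique_ptr<GraphTransformer>> transformers_;
};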

@@ -28,6 +28,17 @@ class ConstantFolding : public GraphTransformer {
const InlinedHashSet<std::string_view>& compatible_execution_providers = {},
const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

/* Same as above but with a name provided by derived class.
Contributor


Suggested change:
-  /* Same as above but with a name provided by derived class.
+ protected:
+  /* Same as above but with a name provided by derived class.

const InlinedHashSet<std::string_view>& compatible_execution_providers = {},
const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

virtual bool AllowConstantFolding(const Node& node) const { return true; }
Contributor


nit: Add a comment with a quick explanation of how this is used.

const InlinedHashSet<std::string>& excluded_initializers = {}) noexcept;

bool AllowConstantFolding(const Node& node) const;
Status UpdateNodeIndexSet(InlinedHashSet<onnxruntime::NodeIndex>& node_index_set);
Contributor


Can we create a new instance of the optimizer for each node_index_set instead of having an update method in order to keep it a little cleaner?
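
For instance, something like this (the constructor parameter is hypothetical, just to illustrate the suggestion):

// Hypothetical: pass the node index set at construction time so each selection gets
// its own optimizer instance instead of mutating a shared one via UpdateNodeIndexSet.
auto dq_transformer = std::make_unique<ConstantFoldingDQ>(/*..., */ dq_node_index_set);
ORT_RETURN_IF_ERROR(dq_transformer->Apply(/*...*/));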

@@ -142,6 +142,10 @@ struct Node__EdgeIterator {
struct ProviderHost {
virtual const OrtApiBase* OrtGetApiBase() = 0;

virtual Status GetEPOptimizerByName(const std::string& name,
Contributor


nit: Sounds like this optimizes an EP. Could we call it GetOptimizerByName?

static const std::string kEP_GRAPH_TRANSFORMER_CONSTANT_FOLDING_DQ = "ConstantFoldingDQ";

// ConstantFoldingDQ optimization function
auto constant_folding_dq_optimization = [&](Graph& graph, const ComputeCapability& optimization_cc, ComputeCapability& cc_to_update) -> Status {
Contributor


Can we put this in a separate class/file, maybe in the optimizer library?

}
cc_to_update.sub_graph->nodes = updated_nodes;

auto original_meta_def = cc_to_update.sub_graph->GetMetaDef();
Contributor


Instead of creating a new MetaDef, now that we have a use case where it makes sense to update it, could we instead add a method to get a mutable MetaDef from sub_graph?
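
e.g. a hypothetical accessor on the sub_graph type, assuming the MetaDef is stored as a unique_ptr member as GetMetaDef suggests:

// Hypothetical: expose the existing MetaDef for in-place updates instead of rebuilding it.
MetaDef* GetMutableMetaDef() { return meta_def_.get(); }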

*/

std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&)> selection_func;
auto status = g_host->GetEPOptimizerByName("ConstantFoldingDQ", graph_transformer_mgr, selection_func);
Contributor


Should this check status?
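
e.g. a minimal guard, assuming the EP can simply skip attaching the optimization when the lookup fails:

if (!status.IsOK() || !selection_func) {
  // fall back: return the capabilities without the ConstantFoldingDQ optimization attached
}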
