Support async collective op execution #1287

Merged
merged 5 commits into alibaba:main from support_async_collective on May 23, 2024

Conversation

eedalong
Collaborator

No description provided.

@eedalong eedalong force-pushed the support_async_collective branch 9 times, most recently from 9e99f8f to 745c0a1 on March 15, 2024 06:09
@Yancey1989 Yancey1989 mentioned this pull request Mar 15, 2024
@eedalong eedalong force-pushed the support_async_collective branch 6 times, most recently from 724a836 to 9235735 on March 18, 2024 08:54
@eedalong eedalong changed the title from "support async collective op execution" to "Support async collective op execution" Mar 18, 2024
@eedalong eedalong force-pushed the support_async_collective branch 2 times, most recently from eeba6d6 to 30248fa on March 18, 2024 09:46
@eedalong eedalong force-pushed the support_async_collective branch 8 times, most recently from cf81fd5 to 7fb1499 on March 22, 2024 08:10
@eedalong eedalong force-pushed the support_async_collective branch 2 times, most recently from c4fdcae to 05c6189 on April 2, 2024 07:24
@eedalong eedalong force-pushed the support_async_collective branch 2 times, most recently from df89148 to 53d7b68 on April 22, 2024 08:59
@eedalong eedalong force-pushed the support_async_collective branch 2 times, most recently from c48ecb3 to 9b8cd86 on May 6, 2024 06:12
@eedalong eedalong force-pushed the support_async_collective branch 8 times, most recently from 48c9638 to 7c33e32 on May 20, 2024 07:07
@eedalong eedalong requested a review from Yancey1989 May 20, 2024 07:19
@@ -529,6 +530,14 @@ LogicalResult LowerHLOToLLVM(ModuleOp m, const DISCLoweringOptions& options) {
pm.addNestedPass<FuncOp>(createCanonicalizerPass());
pm.addNestedPass<FuncOp>(createCSEPass());
pm.addNestedPass<FuncOp>(createCanonicalizerPass());

bool enable_op_schedule = false;
tensorflow::ReadBoolFromEnvVar("DISC_ENABLE_OP_SCHEDULE", enable_op_schedule,
Collaborator

Does this pass cause any negative optimization? If not, let's enable it by default; we have too many flags...

Collaborator Author

OK, we haven't found a case where the schedule pass causes a negative effect.

Collaborator Author

Fixed.
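
For context, the env-var-gated pass pattern in this hunk looks roughly like the sketch below. createOpScheduleGraphPass is a placeholder name and the enabled-by-default value follows the review outcome; treat this as illustrative rather than the PR's exact code.

// Hedged sketch of gating an optional scheduling pass behind an env var.
// createOpScheduleGraphPass is a placeholder declaration, not a real DISC API.
#include <memory>

#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "tensorflow/core/platform/status.h"
#include "tensorflow/core/util/env_var.h"

std::unique_ptr<mlir::Pass> createOpScheduleGraphPass();  // placeholder

void MaybeAddOpSchedulePass(mlir::OpPassManager& pm) {
  // Default to enabled (per the review outcome); DISC_ENABLE_OP_SCHEDULE=false
  // turns the pass off. ReadBoolFromEnvVar keeps the default when unset.
  bool enable_op_schedule = true;
  TF_CHECK_OK(tensorflow::ReadBoolFromEnvVar(
      "DISC_ENABLE_OP_SCHEDULE", enable_op_schedule, &enable_op_schedule));
  if (enable_op_schedule) {
    pm.addNestedPass<mlir::func::FuncOp>(createOpScheduleGraphPass());
  }
}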

@@ -55,6 +55,24 @@ std::optional<std::string> ReductionKindToString(ReductionKind kind) {
return std::nullopt;
}

bool EnableAsyncCollective(Operation* op) {
if (llvm::isa<mhlo::AllReduceOp>(op)) {
if (const char* env_p = std::getenv("ENABLE_ASYNC_ALL_REDUCE")) {
Collaborator

Make all collective ops async? Do we need to separate sync and async collective ops?

Collaborator Author

Fixed.
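
To illustrate the per-op gating the reviewer asked for, here is a hedged sketch of what EnableAsyncCollective might look like with a separate opt-in variable per collective type. ENABLE_ASYNC_ALL_REDUCE appears in the hunk; ENABLE_ASYNC_ALL_GATHER, the helper EnvFlagIsTrue, the accepted values ("1"/"true"), and the include paths are assumptions, not the PR's exact code.

// Hedged sketch: each collective op type has its own opt-in env var,
// so collectives stay synchronous unless explicitly made async.
#include <cstdlib>
#include <cstring>

#include "llvm/Support/Casting.h"
#include "mlir/IR/Operation.h"
// mhlo op definitions; the include path depends on the mlir-hlo checkout.
#include "mlir-hlo/Dialect/mhlo/IR/hlo_ops.h"

// Assumed helper: treat "1" or "true" as enabled (exact parsing is a guess).
static bool EnvFlagIsTrue(const char* name) {
  const char* v = std::getenv(name);
  return v != nullptr &&
         (std::strcmp(v, "1") == 0 || std::strcmp(v, "true") == 0);
}

bool EnableAsyncCollective(mlir::Operation* op) {
  if (llvm::isa<mlir::mhlo::AllReduceOp>(op))
    return EnvFlagIsTrue("ENABLE_ASYNC_ALL_REDUCE");  // name from the hunk
  if (llvm::isa<mlir::mhlo::AllGatherOp>(op))
    return EnvFlagIsTrue("ENABLE_ASYNC_ALL_GATHER");  // assumed name
  return false;
}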


class Edge;

struct GraphNode {
Collaborator

There is a latency hiding scheduler in the TensorFlow source code; can we reuse the base class implementation?
ref: https://github.com/pai-disc/tensorflow/blob/features/bladedisc_rebase_20230202/tensorflow/compiler/xla/service/latency_hiding_scheduler.h
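
The linked latency hiding scheduler builds a similar dependency graph; for readers of this thread, a hedged sketch of the kind of node/edge structures such a scheduling graph uses is shown below. Field names are illustrative, not the PR's definitions.

// Hedged sketch of scheduling-graph data structures; illustrative only.
#include <cstdint>
#include <vector>

#include "mlir/IR/Operation.h"

class Edge;

struct GraphNode {
  mlir::Operation* op = nullptr;     // the MLIR op this node wraps
  std::vector<Edge*> in_edges;       // dependencies this node waits on
  std::vector<Edge*> out_edges;      // nodes that depend on this one
  bool is_async_collective = false;  // candidate for comm-stream execution
  int64_t estimated_cost = 0;        // used when ordering ready nodes
};

// An Edge records a producer -> consumer dependency between two nodes.
class Edge {
 public:
  Edge(GraphNode* from, GraphNode* to) : from_(from), to_(to) {}
  GraphNode* from() const { return from_; }
  GraphNode* to() const { return to_; }

 private:
  GraphNode* from_ = nullptr;
  GraphNode* to_ = nullptr;
};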

@eedalong eedalong force-pushed the support_async_collective branch 4 times, most recently from a43e5a4 to ef6bdaf on May 21, 2024 08:18
@Yancey1989 Yancey1989 left a comment (Collaborator)

LGTM with a tiny comment; it does not block the merge.

#endif

struct BaseCudaContextOption {
ncclComm_t nccl_comm = nullptr;
GpuStreamHandle stream = nullptr;
GpuStreamHandle comm_stream = nullptr;
Collaborator

comm_stream or compute_stream?

Collaborator Author

The original stream is the compute stream; we add this new comm_stream for the communication (collective) ops.
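
To make the distinction concrete: with a dedicated comm_stream, a collective can be launched off the compute stream and only synchronized where its result is consumed. The CUDA/NCCL sketch below illustrates that pattern; the function and event names are illustrative, and this is not the PR's runtime code.

// Hedged sketch: overlap an all-reduce with compute by running it on a
// separate communication stream and synchronizing via events.
#include <cuda_runtime.h>
#include <nccl.h>

void LaunchAsyncAllReduce(const float* send, float* recv, size_t count,
                          ncclComm_t nccl_comm, cudaStream_t compute_stream,
                          cudaStream_t comm_stream, cudaEvent_t done_event) {
  // Make comm_stream wait until producers already enqueued on the compute
  // stream have finished writing the send buffer.
  cudaEvent_t ready_event;
  cudaEventCreateWithFlags(&ready_event, cudaEventDisableTiming);
  cudaEventRecord(ready_event, compute_stream);
  cudaStreamWaitEvent(comm_stream, ready_event, 0);

  // The collective runs on comm_stream, so later compute kernels enqueued on
  // compute_stream can overlap with it.
  ncclAllReduce(send, recv, count, ncclFloat, ncclSum, nccl_comm, comm_stream);

  // Record completion; the op that consumes the result later calls
  // cudaStreamWaitEvent(compute_stream, done_event, 0) before using it.
  cudaEventRecord(done_event, comm_stream);
  cudaEventDestroy(ready_event);  // release is deferred until work completes
}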

}
}

void InitilizeGrpahTopology(std::vector<Operation*>& post_order_instructions,
Collaborator

typo: Graph

Collaborator Author

OK, will fix it in the next PR.
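
Aside from the typo, a hedged sketch of how such a topology-initialization routine can build def-use edges over the post-order instruction list is shown below, reusing the GraphNode/Edge sketch above; it is illustrative, not the PR's implementation.

// Hedged sketch: add a producer -> consumer edge for every operand whose
// defining op is also part of the scheduling graph.
#include <memory>
#include <unordered_map>
#include <vector>

#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"

void InitializeGraphTopology(
    std::vector<mlir::Operation*>& post_order_instructions,
    std::unordered_map<mlir::Operation*, GraphNode*>& op_to_node,
    std::vector<std::unique_ptr<Edge>>& edges) {
  for (mlir::Operation* op : post_order_instructions) {
    GraphNode* node = op_to_node.at(op);
    for (mlir::Value operand : op->getOperands()) {
      mlir::Operation* def = operand.getDefiningOp();
      if (def == nullptr) continue;  // block argument, no producer node
      auto it = op_to_node.find(def);
      if (it == op_to_node.end()) continue;  // producer outside the schedule
      edges.push_back(std::make_unique<Edge>(it->second, node));
      it->second->out_edges.push_back(edges.back().get());
      node->in_edges.push_back(edges.back().get());
    }
  }
}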


for (auto& block : main_func.getBody()) {
for (int op_idx = 0; op_idx < scheduled_op_sequence.size(); op_idx++) {
scheduled_op_sequence[op_idx]->moveBefore(&block.front());
Collaborator

Does scheduled_op_sequence contain all the ops before scheduling? Does the loop at L983 mean the scheduled op sequence replaces the old op sequence?

Collaborator Author

Yes, it contains all ops.
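
For readers following this exchange, a common MLIR idiom for rewriting a block so its ops follow a scheduled sequence is sketched below; it illustrates the intent and is not the PR's exact loop (which moves each op before block.front()).

// Hedged sketch: place the first scheduled op at the top of the block, then
// keep the scheduled relative order by moving each op after its predecessor.
#include <vector>

#include "mlir/IR/Block.h"
#include "mlir/IR/Operation.h"

void ApplySchedule(mlir::Block& block,
                   const std::vector<mlir::Operation*>& scheduled_op_sequence) {
  mlir::Operation* prev = nullptr;
  for (mlir::Operation* op : scheduled_op_sequence) {
    if (prev == nullptr) {
      op->moveBefore(&block, block.begin());  // first op goes to the front
    } else {
      op->moveAfter(prev);
    }
    prev = op;
  }
}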

@eedalong eedalong merged commit 1716b1c into alibaba:main May 23, 2024
11 checks passed