Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Definition of Transmission Delay #481

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion chapter_distributed_training/collective.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
设备之间的点对点(Point-to-Point, P2P)通信由全双工传输(Full-Duplex Transmission)实现。该通信模型的基本行为可以定义如下:

* 每次通信有且仅有一个发送者(Sender)和一个接收者(Receiver)。在某个特定时刻,每个设备仅能至多发送或接收一个消息(Message)。每个设备可以同时发送一个消息和接收一个消息。一个网络中可以同时传输多个来自于不同设备的消息。
* 传输一个长度为$l$个字节(Byte)的消息会花费$a+b \times l$的时间,其中$a$代表延迟(Latency),即一个字节通过网络从一个设备出发到达另一个设备所需的时间;$b$代表传输延迟(Transmission Delay),即传输一个具有$l$个字节的消息所需的全部时间。前者取决于两个设备间的物理距离(如跨设备、跨机器、跨集群等),后者取决于通信网络的带宽。需要注意的是,这里简化了传输延迟的定义,其并不考虑在真实网络传输中会出现的丢失的消息(Dropped Message)和损坏的消息(Corrupted Message)的情况。
* 传输一个长度为$l$个字节(Byte)的消息会花费$a+b \times l$的时间,其中$a$代表延迟(Latency),即一个字节通过网络从一个设备出发到达另一个设备所需的时间;$b \times l$ 代表传输延迟(Transmission Delay),即传输一个具有$l$个字节的消息所需的全部时间。前者取决于两个设备间的物理距离(如跨设备、跨机器、跨集群等),后者取决于通信网络的带宽。需要注意的是,这里简化了传输延迟的定义,其并不考虑在真实网络传输中会出现的丢失的消息(Dropped Message)和损坏的消息(Corrupted Message)的情况。

根据上述通信模型,我们可以定义集合通信算子,并且分析算子的通信性能。下面介绍一些常见的集合通信算子。

Expand Down
Loading