Hot Interconnects 2010
Quick outline:
- Sockets: most widely used, by far. Simple API. Poor scalability.
  - Scott: Is this true of UDP or simply TCP?
  - Patrick: It's true for both; it comes from the separate buffers for each socket.
  - Scott: But a UDP socket (and its buffers) is akin to a CCI endpoint and its buffers.
  - Patrick: No, socket buffers are per socket; CCI buffers are per endpoint, not per connection. 1000 sockets means 1000 buffers; 1000 CCI connections share 1 CCI endpoint.

  Very robust. No support for zero-copy (assumes buffering).
  - Scott: Do we need to acknowledge TOE and the complications it brings?
  - Patrick: TOE is about TCP; it's orthogonal to Sockets.

  Built-in async operations (everything is buffered).
  - Scott: As long as they set O_NONBLOCK.
  - Patrick: No, O_NONBLOCK only matters when your socket buffer is full. If it's not full, sends are buffered by default (see the sketch below).
  - Scott: I was conflating async and non-blocking.

  No support for one-sided operations.
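To make Patrick's buffering point concrete, here is a minimal POSIX sketch (`buffered_send` is an illustrative name, not a standard call): `send()` copies into the per-socket kernel buffer and returns immediately while space remains; `O_NONBLOCK` only changes what happens once that buffer is full.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: send() is buffered by default. It copies into the
 * per-socket kernel buffer and returns; it only blocks (or, with
 * O_NONBLOCK set, fails with EAGAIN) once that buffer is full. */
ssize_t buffered_send(int sock, const void *buf, size_t len)
{
    /* Opt in to non-blocking behavior for the full-buffer case. */
    int flags = fcntl(sock, F_GETFL, 0);
    fcntl(sock, F_SETFL, flags | O_NONBLOCK);

    ssize_t n = send(sock, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* The per-socket buffer is full: retry later, e.g. after
         * poll()/select() reports the socket writable. */
        return 0;
    }
    return n;
}
```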
- MPI: most widely used in the HPC ghetto. Complicated API. More scalable. Very fragile. Limited support for zero-copy (no memory registration). Undefined buffering semantics (eager or rendezvous, and at what size?); see the sketch below. Shitty one-sided operations in MPI-2, shittier in MPI-3.
  - Scott: What do you really think? ;-)
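The undefined buffering semantics are easy to demonstrate: whether a plain `MPI_Send` returns before a matching receive is posted depends on an implementation-chosen eager threshold that the standard does not define. A minimal sketch (`ambiguous_send` and the tag values are illustrative):

```c
#include <mpi.h>

/* Sketch: a small MPI_Send usually returns immediately (eager
 * protocol, buffered by the library); past an unspecified size
 * threshold the implementation switches to rendezvous and MPI_Send
 * blocks until the receiver posts a matching receive. */
void ambiguous_send(void *buf, int count, int dest)
{
    /* May or may not block, depending on count versus the
     * implementation's eager threshold. */
    MPI_Send(buf, count, MPI_BYTE, dest, /* tag */ 0, MPI_COMM_WORLD);

    /* Code that must not block has to use the nonblocking form
     * and complete it explicitly later. */
    MPI_Request req;
    MPI_Isend(buf, count, MPI_BYTE, dest, /* tag */ 1, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```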
- Vendor-specific interfaces: Mellanox Verbs, Cray/Sandia Portals, QLogic PSM, Myricom MX, LBNL GASNet, etc. A lot of choice, but none is perfect, and they share the same drawbacks: no critical mass (vendor fragmentation) and more complicated APIs (driven by hardware design).
Design goals:
- Portable: needs critical mass; semantics common to most vendor interfaces.
- Simple: Sockets is the reference.
- Performance: async operations (buffered or not), support for zero-copy, support for one-sided operations.
- Scalability: demultiplexing, shared resources, NUMA/multicore support (see the sketch after this list).
- Robustness: connection-oriented semantics, error recovery, support for unreliable communications.
  - Scott: Including multicast.
- Built-in connection broker.
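As an analogy for the demultiplexing goal, here is a sketch using Linux epoll: one event queue serves any number of connections, which is the model a CCI endpoint generalizes (the `event_loop` name and the batch size of 64 are arbitrary illustrative choices):

```c
#include <sys/epoll.h>

/* Sketch: demultiplex many connections through ONE event queue,
 * using Linux epoll as an analogy for a CCI-style endpoint. The
 * application polls a single object regardless of the connection
 * count, instead of scanning each socket individually. */
void event_loop(const int socks[], int n)
{
    int epfd = epoll_create1(0);            /* one shared demux point */

    for (int i = 0; i < n; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = socks[i] };
        epoll_ctl(epfd, EPOLL_CTL_ADD, socks[i], &ev);
    }

    struct epoll_event events[64];
    for (;;) {
        /* One call surfaces activity from any of the n connections. */
        int ready = epoll_wait(epfd, events, 64, /* timeout */ -1);
        for (int i = 0; i < ready; i++) {
            /* read from events[i].data.fd, handle the message, ... */
        }
    }
}
```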
- Active Messages:
  - Buffered messages provide asynchronism but consume time and space. Asynchronism is fundamental for scalability, but space is bounded and time is precious.
  - Matching interfaces (MPI) are powerful but very complex. The matching semantics (wildcards) may require a coherent matching implementation, preventing effective offload (it has to be done on the host); see the wildcard sketch below. Furthermore, matching generally requires support for unexpected messages, which consumes time and space. Finally, matching interfaces are stateful, which is bad for fault tolerance and offload.
  - The Socket semantic is akin to ordered matching: all messages demultiplexed to the same destination socket are treated as unexpected messages until consumed.
  - Active Messages are a well-known solution, but with two problems:
    - Async handlers are a bitch => use event-driven delivery instead (as in the event-loop sketch above).
    - Messages larger than the MTU require reassembly, which means stateful (bad) and means unexpected buffers (bad) => solution: limit message size to the MTU, with segmentation/reassembly done in the application; see the fragmentation sketch below.
      - Scott: I think this will cause the most complaints, since IB uses 1-2 KB and SeaStar is 256 bytes.
      - Patrick: But they have ordering on the wire and they do segmentation/reassembly, so the CCI MTU can be larger than the physical MTU.
      - Patrick: Ultimately, the CCI MTU is not about the MTU on the wire; it's the max size of a send, i.e. the amount of contiguous memory a single message can take in an endpoint receive queue, because when you deliver a message to the app, it's contiguous. The bigger this max send size, the more space you waste in the endpoint receive queue.
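To show why wildcard matching resists offload, a minimal MPI sketch (`wildcard_recv` is an illustrative name):

```c
#include <mpi.h>

/* Sketch: a wildcard receive. To honor MPI matching order, the
 * implementation must compare every arriving message against this
 * posted receive and against the unexpected-message queue, in one
 * coherent order. That serialization is why wildcard matching is
 * typically done on the host rather than offloaded to the NIC. */
void wildcard_recv(void *buf, int maxlen)
{
    MPI_Status st;
    MPI_Recv(buf, maxlen, MPI_BYTE,
             MPI_ANY_SOURCE, MPI_ANY_TAG,   /* wildcards */
             MPI_COMM_WORLD, &st);

    /* The actual sender and tag are only known after the fact. */
    int src = st.MPI_SOURCE;
    int tag = st.MPI_TAG;
    (void)src; (void)tag;
}
```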
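And for the MTU-limited design, application-level segmentation can be as simple as prefixing each fragment with (message id, fragment index, fragment count). A sender-side sketch over a connected UDP socket; the fragment size and header layout are arbitrary illustrative choices, not part of any spec:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define FRAG_PAYLOAD 1024   /* arbitrary example fragment size */

/* Per-fragment header so the receiver can reassemble the message. */
struct frag_hdr {
    uint32_t msg_id;   /* which message this fragment belongs to */
    uint16_t index;    /* fragment number within the message */
    uint16_t total;    /* total fragments in the message */
};

/* Sketch: split one large message into MTU-sized sends. The
 * transport stays stateless; reassembly state lives in the app. */
void send_fragmented(int sock, uint32_t msg_id,
                     const char *data, size_t len)
{
    uint16_t total = (uint16_t)((len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD);
    char pkt[sizeof(struct frag_hdr) + FRAG_PAYLOAD];

    for (uint16_t i = 0; i < total; i++) {
        size_t off = (size_t)i * FRAG_PAYLOAD;
        size_t n = len - off < FRAG_PAYLOAD ? len - off : FRAG_PAYLOAD;

        struct frag_hdr h = { msg_id, i, total };
        memcpy(pkt, &h, sizeof h);
        memcpy(pkt + sizeof h, data + off, n);
        send(sock, pkt, sizeof h + n, 0);   /* connected UDP socket */
    }
}
```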
- Remote Memory Access:
- [...]
Evaluation metrics:
- Simplicity: lines of code? For native implementations or over current interfaces?
  - Patrick: For middleware. See Open MPI's comparison of back-ends; Verbs is 10x larger.
- Portability: number of backends?
- Performance: CPU overhead, latency.
- Scalability: memory footprint, polling time.