The Imperative for Speed
The contemporary computational landscape is characterized by an unrelenting escalation in data volumes and processing requirements. This surge is predominantly driven by advancements in Artificial Intelligence (AI) 🧠, Machine Learning (ML), and High-Performance Computing (HPC) 💻.
In these environments, interconnect technologies form the critical data pathways. Their performance, specifically bandwidth and latency, directly dictates overall system efficiency and scalability. This infographic explores four prominent high-speed interconnects: PCIe, NVLink, InfiniBand, and RoCEv2.
Interconnect Landscape at a Glance
A high-level comparison of the latest generation capabilities for each technology, highlighting their maximum bidirectional bandwidth and typical lowest end-to-end latency ranges.
| Technology (Latest Gen.) | Max. Bidirectional Bandwidth | Typical Low End-to-End Latency |
|---|---|---|
| PCIe 7.0 (x16) | ~512 GB/s | <100 ns (projected) |
| NVLink 5.0 (Blackwell) | 1.8 TB/s | ~50-100 ns (estimated) |
| InfiniBand XDR (4x) | ~200 GB/s (800 Gbps port) | ~0.5-2 µs (estimated) |
| RoCEv2 (800GbE) | ~200 GB/s (800 Gbps port) | ~1-5 µs |
Note: Bandwidth for IB/RoCE is port-based aggregate; PCIe/NVLink are device/GPU aggregate. Latencies are estimates/targets and can vary significantly based on system configuration and workload.
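As a rough cross-check, the headline bandwidth figures follow directly from per-lane transfer rates and lane or port counts. The Python sketch below (the helper name is ours) ignores encoding, FLIT framing, and protocol overhead, so real delivered throughput is somewhat lower.

```python
# Rough cross-check of the table's headline figures. Encoding, FLIT framing,
# and protocol overhead are ignored, so delivered throughput is somewhat lower.

def bidir_bandwidth_gbs(gt_per_s: float, lanes: int) -> float:
    """Approximate bidirectional bandwidth in GB/s for a multi-lane link."""
    per_direction = gt_per_s * lanes / 8   # 8 bits per byte
    return 2 * per_direction

print(bidir_bandwidth_gbs(128, 16))   # PCIe 7.0 x16 -> ~512 GB/s
print(2 * 800 / 8)                    # 800 Gb/s IB XDR / 800GbE port -> ~200 GB/s
```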
Bandwidth Evolution & Projections
PCIe Bidirectional Bandwidth Growth (x16 Slot)
Illustrating the doubling of bandwidth with each PCIe generation, crucial for general peripheral and accelerator connectivity.
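A minimal sketch of that doubling trend, using the published per-lane transfer rates for a x16 slot. Encoding overhead (8b/10b in Gen 1/2, 128b/130b and FLIT modes later) is ignored, which overstates Gen 1/2 by about 20% and later generations by a few percent.

```python
# Per-generation x16 bidirectional bandwidth from published per-lane rates.
# Encoding overhead (8b/10b for Gen 1/2, 128b/130b and FLIT later) is ignored.

PCIE_GT_PER_S = {"1.0": 2.5, "2.0": 5.0, "3.0": 8.0, "4.0": 16.0,
                 "5.0": 32.0, "6.0": 64.0, "7.0": 128.0}

for gen, rate in PCIE_GT_PER_S.items():
    bidir_gbs = 2 * rate * 16 / 8   # both directions, 16 lanes, 8 bits per byte
    print(f"PCIe {gen} x16: ~{bidir_gbs:.0f} GB/s bidirectional")
```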
Peak Bidirectional Bandwidth Comparison (Latest Generations)
Comparing the maximum achievable bidirectional bandwidth across the leading edge of these interconnect technologies.
The Latency Imperative
Typical Low-End Latency Comparison
Lower latency is critical for applications with frequent, small data exchanges. This chart shows typical *lowest reported* end-to-end latencies; actual values vary.
~50 ns: target for NVLink 5.0 GPU-to-GPU communication
<100 ns: per-hop latency through an InfiniBand switch
Note: PCIe latencies are component-to-component. NVLink is GPU-to-GPU. InfiniBand and RoCEv2 are end-to-end network latencies, highly dependent on fabric configuration and scale.
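To make "highly dependent on fabric configuration" concrete, here is a toy one-way latency budget for a small RDMA message. Every per-component value (NIC processing, per-hop switch latency, cable propagation) is an illustrative assumption, not a vendor specification.

```python
# Toy one-way latency budget for a small RDMA message across a switched fabric.
# All component values are illustrative assumptions, not vendor specifications.

NIC_NS = 600           # assumed send + receive adapter processing
SWITCH_HOP_NS = 100    # roughly the per-hop switch figure cited above
CABLE_NS_PER_M = 5     # ~5 ns/m signal propagation

def fabric_latency_us(switch_hops: int, cable_meters: float) -> float:
    total_ns = NIC_NS + switch_hops * SWITCH_HOP_NS + cable_meters * CABLE_NS_PER_M
    return total_ns / 1000.0

print(fabric_latency_us(3, 30))   # ~1.05 µs: three switch hops, 30 m of cabling
```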
Technology Deep Dive & Use Cases
PCI Express (PCIe)
~512 GB/s
(PCIe 7.0 x16 Bidirectional)
Primary Use Cases:
- GPUs (Graphics & Compute) 🎮
- NVMe SSDs (High-Speed Storage) 💾
- Network Interface Cards (NICs) 🌐
- General Peripherals & Accelerators
Defining Trait: Universal peripheral interconnect, CPU-mediated.
NVIDIA NVLink
1.8 TB/s
(NVLink 5.0 per GPU Bidirectional)
Primary Use Cases:
- GPU-to-GPU Communication (NVIDIA) 🔗
- Large-Scale AI Model Training 🧠
- High-Performance Computing (HPC) 💻
- Real-Time AI Inference
Defining Trait: Proprietary, direct, ultra-low latency GPU link.
InfiniBand (IB)
~200 GB/s
(XDR 4x Port Bidirectional)
Primary Use Cases:
- HPC System Fabric ⚙️
- Large AI Training Clusters
- Low-Latency RDMA Applications
- Scalable Supercomputing
Defining Trait: Native RDMA, very low latency, HPC-focused fabric.
RoCEv2
~200 GB/s
(800GbE Port Bidirectional)
Primary Use Cases:
- Data Center Networking 🏢
- AI/ML over Ethernet
- Storage Area Networks (SANs)
- Hyperscale Deployments
Defining Trait: RDMA over Converged Ethernet, requires lossless fabric.
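A first-order way to relate these profiles to workloads is the classic transfer-time model t = latency + size / bandwidth: latency dominates small, frequent exchanges, while bandwidth dominates bulk transfers. The sketch below uses approximate per-direction bandwidths (roughly half the bidirectional headline figures) and the low-end latency estimates; it is illustrative only.

```python
# First-order transfer-time model: t = latency + size / bandwidth.
# Per-direction bandwidths are roughly half the bidirectional headline figures;
# latencies are the low-end estimates cited above. Illustrative only.

INTERCONNECTS = {
    # name: (latency in seconds, per-direction bandwidth in bytes/s)
    "PCIe 7.0 x16":   (100e-9, 256e9),
    "NVLink 5.0":     (100e-9, 900e9),
    "InfiniBand XDR": (1e-6,   100e9),
    "RoCEv2 800GbE":  (2e-6,   100e9),
}

def transfer_time_s(size_bytes: float, latency_s: float, bw_bytes_per_s: float) -> float:
    return latency_s + size_bytes / bw_bytes_per_s

for size in (4 * 1024, 1024**3):   # a 4 KiB message vs a 1 GiB bulk transfer
    for name, (lat, bw) in INTERCONNECTS.items():
        t_us = transfer_time_s(size, lat, bw) * 1e6
        print(f"{size:>10} B over {name:<15}: {t_us:9.2f} µs")
```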
The Signaling Shift: PAM4
To achieve higher data rates, many modern interconnects are adopting Pulse Amplitude Modulation with 4 levels (PAM4) signaling. PAM4 doubles the data rate compared to traditional Non-Return-to-Zero (NRZ) signaling at the same symbol rate by encoding two bits per symbol instead of one.
NRZ (Non-Return-to-Zero)
1 bit per symbol
PAM4 (Pulse Amplitude Modulation 4-level)
2 bits per symbol = Double Data Rate
While PAM4 enables significant bandwidth increases, it introduces complexities like higher signal-to-noise ratio (SNR) requirements and the need for Forward Error Correction (FEC).
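The doubling falls out of bits per symbol = log2(levels). A small sketch, using an illustrative symbol rate that is not taken from any specific standard:

```python
import math

# Bits per symbol = log2(levels); line rate = symbol rate x bits per symbol.
# The 56.25 Gbaud symbol rate is illustrative, not from any one standard.

def line_rate_gbps(baud_gbaud: float, levels: int) -> float:
    return baud_gbaud * math.log2(levels)

print(line_rate_gbps(56.25, 2))   # NRZ : ~56 Gb/s  (1 bit per symbol)
print(line_rate_gbps(56.25, 4))   # PAM4: ~112 Gb/s (2 bits per symbol)
```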
Key Adopters of PAM4:
- PCIe 6.0 and newer generations
- InfiniBand HDR, NDR, and XDR
- High-Speed Ethernet (200GbE and above, the transport underlying high-speed RoCEv2 deployments)
Choosing Your Interconnect
The optimal interconnect depends heavily on specific application needs, scale, budget, and existing infrastructure. Here's a simplified guide:
- In-node peripheral connectivity (SSDs, NICs, single GPU): PCIe
- Scale-up, multi-GPU nodes (Large AI Models): NVLink
- Scale-out, low-latency fabrics (HPC, Specialized AI): InfiniBand
- Ethernet-based scale-out (Data Centers, Hyperscale): RoCEv2
This is a high-level guide. Cost, complexity of lossless Ethernet (for RoCEv2), ecosystem, and specific workload profiles are also critical factors.
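As a toy illustration only, the guide can be read as a simple decision function. The function name and inputs below are invented for this sketch; real selection weighs many more factors.

```python
# Toy encoding of the selection guide above; names and inputs are invented here.
# Real selection also weighs cost, lossless-Ethernet tuning (RoCEv2), and ecosystem.

def suggest_interconnect(multi_gpu_node: bool, multi_node: bool,
                         ethernet_preferred: bool) -> str:
    if multi_node:
        return "RoCEv2" if ethernet_preferred else "InfiniBand"
    if multi_gpu_node:
        return "NVLink"
    return "PCIe"

print(suggest_interconnect(False, False, False))  # SSDs, NICs, single GPU  -> PCIe
print(suggest_interconnect(True,  False, False))  # large AI model, one node -> NVLink
print(suggest_interconnect(True,  True,  False))  # HPC / specialized AI     -> InfiniBand
print(suggest_interconnect(True,  True,  True))   # hyperscale Ethernet DC   -> RoCEv2
```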
Future Outlook
The drive for higher performance is relentless, fueled by AI and HPC. Key trends shaping the future of interconnects include:
- Optical Interconnects: As electrical signaling approaches its practical limits, on-board optics, co-packaged optics (CPO), and silicon photonics are becoming crucial for pushing past copper's constraints on speed, reach, and power.
- Convergence of Technologies: Shared adoption of PAM4 signaling and FEC techniques across different interconnects indicates common solutions to physical layer challenges.
- Resource Disaggregation: Technologies like Compute Express Link (CXL), built on the PCIe physical layer, are enabling more flexible and composable system architectures with shared memory and resources.
The interconnect landscape will continue to evolve rapidly, with software and management ecosystems playing an increasingly vital role in harnessing the full potential of these advanced hardware technologies.
512 GB/s: projected bidirectional bandwidth for a single PCIe 7.0 x16 slot, showcasing the ongoing scaling.