Design Gateway — Special Reference Design

Scalable NVMe over TCP Initiator on Alveo Card

Special Reference Design for FPGA-Accelerated Distributed NVMe Storage Access
NVMeTCP25G-IP · 4 × 25G Ethernet · Zero CPU Protocol Load · PCIe XRT DMA · AMD Alveo U50
4× Read Throughput: ~9,500 MB/s
4× Write Throughput: ~9,000 MB/s

Built on an AMD Alveo accelerator card and powered by multiple NVMeTCP25G-IP cores, this reference design enables a host system to access four remote NVMe SSDs simultaneously through four independent 25G Ethernet links — with a scalable FPGA architecture ready for more NVMe/TCP sessions and zero CPU involvement in protocol processing.

The Challenge
Remove the CPU Bottleneck from NVMe over TCP Host Access

A standard NIC-based NVMe/TCP host relies on CPU software to handle all protocol layers — a bottleneck that grows with each additional remote SSD. This Alveo-based design offloads TCP/IP and NVMe/TCP processing into FPGA logic, keeping the CPU free for the application while the card handles data movement to host memory over PCIe/XRT.

Aspect | ⚠ Standard NIC Approach | ✓ NVMeTCP25G-IP on Alveo
Protocol Processing | CPU processes TCP/IP stack and NVMe/TCP protocol | Hardware offload for TCP/IP and NVMe/TCP host operations
Data Path | Multiple intermediate data copies in host memory | Independent DMA paths for multiple 25G sessions
Scalability | Scaling to multiple remote SSDs increases CPU and memory load | FPGA scalability for additional NVMe/TCP channels
Design Principles
Four Keys to Scalable Remote NVMe SSD Access

The design combines FPGA protocol offload, independent 25G Ethernet channels, direct host-memory transfer, and a scalable IP-core architecture to create a practical platform for high-performance remote storage access.

Full NVMe/TCP Host Offload

NVMeTCP25G-IP integrates TCP/IP stack and NVMe/TCP host functions in hardware for write/read access to remote NVMe SSDs — with zero CPU involvement in protocol processing.

Four Independent 25G Links

Four 25G Ethernet connections operate simultaneously, allowing the host to access four remote NVMe SSDs in parallel with full per-session bandwidth isolation.

PCIe Accelerator Integration

The Alveo card plugs into the host system as a PCIe accelerator, using XRT for register access and high-speed DMA transfers directly into host memory.
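
As a rough sanity check on the host link, the sketch below compares the measured 4× throughput with the theoretical ceiling of the PCIe Gen3 x16 interface listed in the system requirements. It assumes the standard Gen3 figures of 8 GT/s per lane with 128b/130b encoding and ignores TLP and DMA descriptor overhead, so the real usable bandwidth is somewhat lower. The function names are illustrative only and are not part of the reference design.

```python
# Rough PCIe Gen3 bandwidth budget (theoretical, before TLP/DMA overhead).
GEN3_GT_PER_S = 8.0      # giga-transfers per second per lane (PCIe Gen3)
ENCODING = 128 / 130     # 128b/130b line encoding used by Gen3
LANES = 16               # x16 slot, per the system requirements


def pcie_gen3_mb_per_s(lanes: int = LANES) -> float:
    """Theoretical one-direction ceiling in MB/s (1 MB = 10**6 bytes)."""
    bits_per_s = GEN3_GT_PER_S * 1e9 * ENCODING * lanes
    return bits_per_s / 8 / 1e6


def utilisation(measured_mb_s: float, lanes: int = LANES) -> float:
    """Fraction of the theoretical ceiling used by a measured rate."""
    return measured_mb_s / pcie_gen3_mb_per_s(lanes)


if __name__ == "__main__":
    print(f"PCIe Gen3 x16 ceiling: {pcie_gen3_mb_per_s():,.0f} MB/s")
    print(f"4x read  (~9,500 MB/s): {utilisation(9500):.0%} of ceiling")
    print(f"4x write (~9,000 MB/s): {utilisation(9000):.0%} of ceiling")
```

Under these assumptions the four-session figures use well under the full x16 ceiling, which is consistent with the claim that further NVMeTCP25G-IP instances can be added.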

Scalable FPGA Architecture

Additional NVMeTCP25G-IP instances can be integrated to expand the number of remote NVMe SSD sessions, scaling storage bandwidth for larger deployments.

Target Markets
Ideal for Applications Requiring Distributed High-Speed Storage

The design is especially attractive where data is generated, stored, or processed across multiple remote locations but must be accessed by a central host with minimal CPU overhead.

AI / HPC

Offload NVMe/TCP transport entirely to the Alveo FPGA, freeing host CPUs for training and inference. A GPU server fitted with the Alveo card connects through a 25G Ethernet switch to remote target servers and accesses model data, intermediate results, and inference artifacts in parallel, with no software protocol-stack overhead and deterministic low latency across all sessions.

Remote Target Servers
01 Datasets / Model Checkpoints: Training data ingestion and checkpoint persistence
02 Shared Scratch / Results: Intermediate outputs and distributed job results
03 Inference Models: Deployed model weights and serving artifacts
Video & Media Processing

Sustain high-bandwidth, multi-stream media workflows without burdening the host CPU. A GPU server with the Alveo card routes 25GbE sessions through a switch to dedicated media servers — enabling simultaneous codec processing, frame-accurate playback, and adaptive HTTP delivery from independent NVMe storage targets.

Remote Target Servers
01 Transcode Server: Codec processing for format and bitrate conversion
02 Player Server: Frame scheduler and timecode management
03 Edge Player: Segment packaging and HTTP delivery
Measured Results
Performance Matrix

Benchmarked on a single Alveo U50 card. All throughput figures are sustained transfer rates over 25G Ethernet to remote NVMe SSDs, with TCP/IP and NVMe/TCP fully offloaded to FPGA logic — the host CPU contributes 0% to protocol processing.

NVMe/TCP Performance on Alveo U50 Card
Configuration | Read Speed | Write Speed | CPU Usage
NVMeTCP25G ×1 | 2,679 MB/s | 2,531 MB/s | 0%
NVMeTCP25G ×4 | ~9,500 MB/s | ~9,000 MB/s | 0%

A single IP core sustains 2,679 MB/s read and 2,531 MB/s write, roughly 86% and 81% of the 3,125 MB/s raw rate of a 25G Ethernet link. Four independent NVMeTCP25G-IP instances deliver ~9,500 MB/s read and ~9,000 MB/s write, about 3.5× the single-core figures, so throughput scales close to linearly with the number of cores. Because the entire protocol stack runs in FPGA logic, host CPU utilisation for protocol processing remains at 0% throughout.
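
The ratios behind these figures can be reproduced with a few lines of arithmetic. The sketch below takes the values from the performance table and derives link utilisation, 4× speedup, and scaling efficiency. It assumes 1 MB = 10^6 bytes and compares against the raw 25 Gb/s line rate, ignoring Ethernet, IP, TCP, and NVMe/TCP header overhead, so the achievable payload ceiling is lower than the 3,125 MB/s used here. The ×4 values are the approximate figures quoted above.

```python
# Throughput figures from the performance table (MB/s); x4 values are approximate.
LINE_RATE_MB_S = 25e9 / 8 / 1e6   # raw 25 Gb/s Ethernet rate = 3125 MB/s

MEASURED = {
    "read":  {"x1": 2679, "x4": 9500},
    "write": {"x1": 2531, "x4": 9000},
}


def summarise(x1: float, x4: float, cores: int = 4) -> dict:
    """Derive utilisation and scaling ratios from single- and multi-core rates."""
    return {
        "link_utilisation": x1 / LINE_RATE_MB_S,   # single core vs raw line rate
        "speedup": x4 / x1,                        # multi-core vs single core
        "scaling_efficiency": x4 / (cores * x1),   # 1.0 would be perfectly linear
    }


if __name__ == "__main__":
    for direction, rates in MEASURED.items():
        s = summarise(rates["x1"], rates["x4"])
        print(
            f"{direction:5s}: {s['link_utilisation']:.1%} of line rate, "
            f"{s['speedup']:.2f}x speedup, "
            f"{s['scaling_efficiency']:.1%} scaling efficiency"
        )
```

A scaling efficiency below 100% is expected here, since the four sessions share the PCIe link and host memory while each Ethernet link stays independent.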

Technical Specs
System Requirements
FPGA Accelerator Card: Xilinx Alveo U50 (16nm UltraScale+ FPGA, PCIe Gen3 x16)
Network Interfaces: 100G to 4× 25G breakout cable
IP Core: NVMeTCP25G-IP, 4 instances, each managing one 25GbE session
Protocol Support: NVMe/TCP (NVMe over TCP/IP)
Host Interface: PCIe Gen3 x16, standard add-in card slot
Target System: Any Linux PC or server running the NVMe/TCP target driver (nvmet) with an NVMe SSD
Host OS: Ubuntu 20.04.1
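
For readers preparing a target system, the sketch below lists the configfs steps that the Linux nvmet driver typically needs to export one NVMe SSD over TCP. It only builds and prints the plan; it does not touch the system. It assumes the nvmet and nvmet-tcp kernel modules are loaded and configfs is mounted at /sys/kernel/config. The NQN, device path, and IP address are hypothetical placeholders, and the demo instruction manual may prescribe a different setup.

```python
from pathlib import PurePosixPath

# Standard configfs root for the Linux NVMe target (nvmet) driver.
CONFIGFS = PurePosixPath("/sys/kernel/config/nvmet")


def nvmet_tcp_plan(nqn, device, ip, port_id=1, nsid=1, svc_port=4420):
    """Return ordered (action, path, value) steps to export `device` over NVMe/TCP.

    All arguments are illustrative; nothing is executed or written.
    """
    subsys = CONFIGFS / "subsystems" / nqn
    ns = subsys / "namespaces" / str(nsid)
    port = CONFIGFS / "ports" / str(port_id)
    return [
        ("mkdir", str(subsys), None),
        ("write", str(subsys / "attr_allow_any_host"), "1"),
        ("mkdir", str(ns), None),
        ("write", str(ns / "device_path"), device),
        ("write", str(ns / "enable"), "1"),
        ("mkdir", str(port), None),
        ("write", str(port / "addr_trtype"), "tcp"),
        ("write", str(port / "addr_adrfam"), "ipv4"),
        ("write", str(port / "addr_traddr"), ip),
        ("write", str(port / "addr_trsvcid"), str(svc_port)),
        # Linking the subsystem into the port is what starts the export.
        ("symlink", str(port / "subsystems" / nqn), str(subsys)),
    ]


if __name__ == "__main__":
    # Hypothetical values for one of the four targets.
    for action, path, value in nvmet_tcp_plan(
        "nqn.2024-01.example:ssd0", "/dev/nvme0n1", "192.168.25.10"
    ):
        print(action, path, "" if value is None else value)
```

In a four-SSD setup, each target server would apply an equivalent plan with its own address, one per 25G link.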
Live Demonstration
Free Evaluation Demo

Watch the full demo of four NVMeTCP25G-IP sessions running simultaneously on the Alveo U50 card — real hardware, real NVMe SSDs, real 25G Ethernet links.

For more details, please refer to the demo videos and documentation published on our website.

Available resources: Reference Design Document, Demo Instruction Manual, Free Evaluation Demo Bitfile (Alveo U50).
For a time-limited free evaluation, please contact us.