Configuring NVLink on a VPS with Multi-GPU Setup

NVLink, NVIDIA’s high-bandwidth GPU interconnect, revolutionizes multi-GPU workloads by enabling rapid memory sharing and data transfer, far surpassing PCIe limitations. For data scientists, ML engineers, or HPC users, NVLink can accelerate tasks like deep learning model training or large-scale simulations. For example, you could use NVLink with dual A100 GPUs to train a large language model faster by pooling GPU memory. However, configuring NVLink in a virtualized environment is complex and requires specific hardware and setup. This guide explains NVLink, its limitations in a VPS, and how to configure it for optimal performance.

Limitations and Caveats

  • Not all VPS providers support NVLink setups.

  • NVLink only works on bare-metal VPS instances or dedicated GPU virtual machines with direct PCIe passthrough.

  • Containerized environments such as Docker provide no NVLink support of their own; the container must run directly on a host where both GPUs and their NVLink connection are already exposed.

What Is NVLink?

NVLink allows two or more compatible NVIDIA GPUs to:

  • Share memory across GPUs for large datasets

  • Exchange data at up to 600 GB/s of total bidirectional bandwidth (third-generation NVLink on the A100; consumer cards such as the RTX 3090 offer considerably less)

  • Perform faster multi-GPU training without CPU involvement

Supported on GPUs like:

  • NVIDIA A100, V100, RTX 3090, RTX A6000, and similar GPUs (note that the consumer RTX 40-series, including the 4090, dropped NVLink support)

  • Usually requires a physical NVLink bridge
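
If the NVIDIA driver is already installed on a machine, you can quickly check whether a GPU exposes NVLink links and what they support. This is a minimal sketch; the exact output depends on the GPU model and driver version.

# List the GPUs visible to the driver
nvidia-smi -L

# Show per-link NVLink capabilities (peer-to-peer, system memory access, etc.) for GPU 0
nvidia-smi nvlink --capabilities -i 0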

NVLink in VPS: Prerequisites

Before attempting to configure NVLink on a VPS, ensure the following:

Host Hardware

  • The physical server must have:

    • At least two NVLink-compatible GPUs

    • NVLink bridge(s) installed

    • BIOS and firmware that supports NVLink

  • Common compatible setups include dual A100s or dual RTX 3090s joined by an NVLink bridge.

VPS Configuration

  • The VPS must be provisioned on a GPU passthrough-enabled hypervisor, like:

    • KVM/QEMU with VFIO (PCI passthrough)

    • VMware ESXi with DirectPath I/O

    • Proxmox VE with GPU passthrough

⚠️ Note: NVLink does not work across virtualized devices unless both GPUs are passed through as full PCIe devices to the same VM.
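
Before provisioning, it is worth confirming on the host that the IOMMU (Intel VT-d or AMD-Vi) is active and noting the PCI addresses and vendor/device IDs of both GPUs. A minimal sketch of that check:

# Confirm the IOMMU is enabled on the host
dmesg | grep -iE 'dmar|iommu|amd-vi'

# Find the PCI addresses and vendor:device IDs of the NVIDIA GPUs
lspci -nn | grep -i nvidia

# List IOMMU groups; both GPUs must be passed through to the same VM
find /sys/kernel/iommu_groups/ -type l | sort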

Step-by-Step: How to Configure NVLink on a VPS

Step 1: Ensure Passthrough of GPUs

The host needs to pass both physical GPUs directly to your VPS.

For KVM/QEMU with VFIO:

# Example for assigning two GPUs to vfio-pci.
# Unbind each GPU from its current driver by writing its PCI address:
echo "0000:65:00.0" > /sys/bus/pci/devices/0000:65:00.0/driver/unbind
echo "0000:66:00.0" > /sys/bus/pci/devices/0000:66:00.0/driver/unbind
# Then register the GPU's vendor/device ID pair (from lspci -nn) with vfio-pci:
echo "vendor_id device_id" > /sys/bus/pci/drivers/vfio-pci/new_id

Then update the libvirt domain XML (or the QEMU command line) so that both GPUs are passed through to the same VM, as in the sketch below.
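
A minimal sketch of the libvirt side, assuming a guest named gpu-vps (a placeholder) and the example PCI address 0000:65:00.0 used above; repeat it with bus 0x66 for the second GPU:

# Define a <hostdev> entry for the first GPU and attach it persistently
cat > gpu0.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device gpu-vps gpu0.xml --config

Alternatively, run virsh edit gpu-vps and add both <hostdev> blocks under <devices>.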

Step 2: Install NVIDIA Drivers

Inside the VPS (guest OS), install the latest NVIDIA driver:

sudo apt update
sudo apt install -y nvidia-driver-535

Reboot after installation.
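
After the reboot, confirm that the driver loaded and that both passed-through GPUs are visible inside the guest:

# Both GPUs should appear on the guest's PCI bus and in nvidia-smi
lspci | grep -i nvidia
nvidia-smi -L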

Step 3: Verify NVLink Topology

Once inside the guest OS:

nvidia-smi topo -m

You should see:

        GPU0    GPU1    CPU Affinity
GPU0     X      NV1     0-15
GPU1    NV1      X      0-15

Where NV1 means the two GPUs are connected by a single NVLink link (NV2, NV4, and so on indicate multiple bonded links). If you see PHB or SYS instead, traffic between the GPUs is going over PCIe rather than NVLink.
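
You can also inspect each individual link, which quickly shows whether all expected NVLink lanes are up and at what speed:

# Show the state and per-link speed of every NVLink link
nvidia-smi nvlink --status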

Step 4: Enable Peer-to-Peer Access (Optional but Recommended)

nvidia-smi topo -p2p r

The matrix should show OK between GPU0 and GPU1, meaning peer-to-peer reads are supported (repeat with w to check writes). Applications then enable peer access at runtime, for example via cudaDeviceEnablePeerAccess in CUDA.
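
For an end-to-end check, NVIDIA's cuda-samples repository includes a p2pBandwidthLatencyTest sample that measures actual GPU-to-GPU bandwidth. This is a sketch that assumes the CUDA toolkit and build tools are installed in the guest; the sample's location and build system vary between repository releases:

git clone https://github.com/NVIDIA/cuda-samples.git
cd "$(find cuda-samples -type d -name p2pBandwidthLatencyTest | head -n 1)"
make            # older releases build per-sample with make; newer ones use CMake
./p2pBandwidthLatencyTest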

Security Considerations

  • Isolated access: Ensure your VPS is not oversubscribed or co-hosted with others when using full GPU passthrough.

  • No shared memory leakage: NVLink creates a shared memory space—limit access to trusted environments.

  • Audit access to the /dev/nvidia* device nodes (see the sketch below).
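
A minimal sketch of such an audit follows; the gpu group name is an assumption, and the driver or udev typically recreates these nodes with default permissions, so persistent changes belong in a udev rule:

# Review who can currently open the GPU device nodes
ls -l /dev/nvidia*

# Example: restrict access to a dedicated group (the group name "gpu" is hypothetical)
sudo groupadd -f gpu
sudo chown root:gpu /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl
sudo chmod 0660 /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl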

Troubleshooting NVLink Issues

 

Symptom | Possible Cause | Fix
NVLink not shown in nvidia-smi | GPUs not bridged properly | Power off the host and reseat the physical NVLink bridge
Only one GPU visible | Passthrough misconfiguration | Check the VM XML/device passthrough settings
Peer-to-peer disabled | Driver mismatch or BIOS settings | Upgrade the driver and check the BIOS for NVLink support
Low bandwidth | NVLink lanes underutilized | Run nvidia-smi nvlink --status to verify link state
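
A few commands that help narrow these symptoms down (the guest name gpu-vps is a placeholder):

# Inside the guest: confirm both GPUs are present on the virtual PCI bus
lspci -nn | grep -i nvidia

# On the host: confirm both GPUs are bound to vfio-pci and attached to the VM
lspci -nnk -d 10de:
virsh dumpxml gpu-vps | grep -A 5 hostdev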

 

NVLink is a game-changer for GPU-intensive workloads, offering immense performance advantages when properly configured—even in virtual environments. With direct GPU passthrough and careful setup, you can harness the power of multi-GPU interconnects on a VPS, turning it into a high-performance computing node for demanding applications.