NVIDIA H100 Tensor Core GPU

Extraordinary performance, scalability, and security for every data center.

An Order-of-Magnitude Leap for Accelerated Computing

Tap into exceptional performance, scalability, and security for every workload with the NVIDIA H100 Tensor Core GPU. With the NVIDIA NVLink Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The GPU also includes a dedicated Transformer Engine for trillion-parameter language models. The H100’s combined technology innovations can speed up large language models (LLMs) by an incredible 30X over the previous generation, delivering industry-leading conversational AI.

Securely Accelerate Workloads From Enterprise to Exascale

Figure note: Projected performance, subject to change. GPT-3 175B training, A100 cluster: HDR InfiniBand network; H100 cluster: NDR InfiniBand network. Mixture of Experts (MoE) training, Transformer Switch-XXL variant with 395B parameters on a 1T-token dataset, A100 cluster: HDR InfiniBand network; H100 cluster: NDR InfiniBand network with NVLink Switch System where indicated.

Transformational AI Training

H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that provides up to 4X faster training over the prior generation for GPT-3 (175B) models. The combination of fourth-generation NVLink, which offers 900 gigabytes per second (GB/s) of GPU-to-GPU interconnect; NDR Quantum-2 InfiniBand networking, which accelerates communication across nodes for every GPU; PCIe Gen5; and NVIDIA Magnum IO™ software delivers efficient scalability from small enterprise systems to massive, unified GPU clusters.
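
In practice, the Transformer Engine's FP8 path is used through NVIDIA's Transformer Engine library. Below is a minimal training-step sketch with that library's PyTorch API (transformer_engine.pytorch); the layer sizes, recipe arguments, and training loop are illustrative assumptions, not a prescribed configuration.

```python
# Minimal sketch: FP8 training with NVIDIA's Transformer Engine library on H100.
# Layer sizes, recipe settings, and the training loop are illustrative only.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A toy model built from Transformer Engine layers (FP8-capable GEMMs).
model = torch.nn.Sequential(
    te.Linear(1024, 4096, bias=True),
    te.Linear(4096, 1024, bias=True),
).cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Delayed scaling keeps a history of absolute maxima to pick FP8 scale factors.
fp8_recipe = recipe.DelayedScaling(margin=0, amax_history_len=16)

x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    # GEMMs inside this context run in FP8 on H100 Tensor Cores; weights and
    # optimizer state remain in higher precision.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = model(x)
        loss = torch.nn.functional.mse_loss(out, target)
    loss.backward()
    optimizer.step()
```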

Deploying H100 GPUs at data center scale delivers outstanding performance and brings the next generation of exascale high-performance computing (HPC) and trillion-parameter AI within the reach of all researchers.

Real-Time Deep Learning Inference

AI solves a wide array of business challenges, using an equally wide array of neural networks. A great AI inference accelerator has to deliver not only the highest performance but also the versatility to accelerate these networks.

H100 extends NVIDIA’s market-leading inference platform with several advancements that accelerate inference by up to 30X and deliver the lowest latency. Fourth-generation Tensor Cores speed up all precisions, including FP64, TF32, FP32, FP16, INT8, and now FP8, to reduce memory usage and increase performance while still maintaining accuracy for LLMs.

Figure note: Projected performance, subject to change. Inference on a Megatron 530B-parameter chatbot model with input sequence length = 128 and output sequence length = 20. A100 cluster: HDR InfiniBand network; H100 cluster: NVLink Switch System, NDR InfiniBand.

Figure note: Projected performance, subject to change. 3D FFT (4K^3) throughput, A100 cluster: HDR InfiniBand network; H100 cluster: NVLink Switch System, NDR InfiniBand. Genome sequencing (Smith-Waterman), 1x A100 vs. 1x H100.

Exascale High-Performance Computing

The NVIDIA data center platform consistently delivers performance gains beyond Moore’s law. And H100’s new breakthrough AI capabilities further amplify the power of HPC+AI to accelerate time to discovery for scientists and researchers working on solving the world’s most important challenges.

H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering 60 teraflops of FP64 computing for HPC. AI-fused HPC applications can also leverage H100’s TF32 precision to achieve one petaflop of throughput for single-precision matrix-multiply operations, with zero code changes.
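
The “zero code changes” point is that CUDA libraries and frameworks route FP32 matrix math to TF32 Tensor Cores automatically. As a hedged illustration, in PyTorch this is governed by two backend flags while the tensors stay ordinary FP32:

```python
# Sketch: TF32 acceleration of FP32 matrix multiplies in PyTorch on H100.
# The call site is unchanged; only these backend flags decide whether the
# FP32 GEMM is executed on TF32 Tensor Cores.
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # FP32 matmuls may use TF32 Tensor Cores
torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions likewise

a = torch.randn(8192, 8192, device="cuda", dtype=torch.float32)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.float32)

c = a @ b  # same FP32 code as before, now backed by TF32 Tensor Core throughput
```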

H100 also features new DPX instructions that deliver 7X higher performance over A100 and 40X speedups over CPUs on dynamic programming algorithms such as Smith-Waterman for DNA sequence alignment and protein alignment for protein structure prediction.
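
For context, DPX instructions accelerate dynamic-programming recurrences built from additions and min/max selections. The pure-Python reference below shows the Smith-Waterman recurrence itself (with a simple linear gap penalty); it is only meant to illustrate the algorithm family, not how DPX or any GPU kernel implements it.

```python
# Reference Smith-Waterman local alignment (linear gap penalty), pure Python.
# This is the recurrence pattern that DPX-style fused add/max operations speed up.
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)  # clamp at zero: local alignment
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # prints the best local-alignment score
```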

Supercharge Large Language Model Inference with H100 NVL

For LLMs up to 175 billion parameters, the PCIe-based NVIDIA H100 NVL with NVLink bridge utilizes Transformer Engine, NVLink, and 188GB HBM3 memory to provide optimum performance and easy scaling across any data center, bringing LLMs to the mainstream. Servers equipped with H100 NVL GPUs increase GPT-175B model performance up to 12X over NVIDIA DGX A100 systems while maintaining low latency in power-constrained data center environments.
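
A rough back-of-the-envelope check (assumed figures, not from the datasheet) shows why 188GB matters at this model size: FP8 weights for a 175-billion-parameter model occupy about 175GB, which fits in the H100 NVL’s 188GB of HBM3, whereas FP16 weights would not.

```python
# Back-of-the-envelope memory check for a 175B-parameter model on H100 NVL (188GB HBM3).
# Weight storage only; KV cache and activations add more and are ignored here.
params = 175e9
print(f"FP8  weights: {params * 1 / 1e9:5.0f} GB")  # ~175 GB -> fits in 188 GB
print(f"FP16 weights: {params * 2 / 1e9:5.0f} GB")  # ~350 GB -> does not fit
```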

Specifications

Specification | H100 SXM | H100 NVL
FP64 | 34 teraFLOPS | 30 teraFLOPS
FP64 Tensor Core | 67 teraFLOPS | 60 teraFLOPS
FP32 | 67 teraFLOPS | 60 teraFLOPS
TF32 Tensor Core | 989 teraFLOPS | 835 teraFLOPS
BFLOAT16 Tensor Core | 1,979 teraFLOPS | 1,671 teraFLOPS
FP16 Tensor Core | 1,979 teraFLOPS | 1,671 teraFLOPS
FP8 Tensor Core | 3,958 teraFLOPS | 3,341 teraFLOPS
INT8 Tensor Core | 3,958 TOPS | 3,341 TOPS
GPU Memory | 80GB | 94GB
GPU Memory Bandwidth | 3.35TB/s | 3.9TB/s
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG
Max Thermal Design Power (TDP) | Up to 700W (configurable) | 350-400W (configurable)
Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 12GB each
Form Factor | SXM | PCIe, dual-slot, air-cooled
Interconnect | NVIDIA NVLink: 900GB/s; PCIe Gen5: 128GB/s | 2- or 4-way NVIDIA NVLink bridge: 900GB/s; PCIe Gen5: 128GB/s
Server Options | NVIDIA HGX H100 partner and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs
NVIDIA AI Enterprise | Add-on | Included
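
A few of these figures can be read back from an installed card with NVIDIA’s NVML Python bindings (the nvidia-ml-py / pynvml package). This sketch assumes a single H100 visible at device index 0:

```python
# Sketch: query name, memory size, and power limit of GPU 0 via NVML bindings.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

name = pynvml.nvmlDeviceGetName(handle)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)                      # bytes
power_limit = pynvml.nvmlDeviceGetPowerManagementLimit(handle)    # milliwatts

print(f"GPU        : {name}")
print(f"Memory     : {mem.total / 1e9:.0f} GB")    # ~80 GB (SXM) or ~94 GB (NVL)
print(f"Power limit: {power_limit / 1000:.0f} W")  # up to 700 W (SXM), 350-400 W (NVL)

pynvml.nvmlShutdown()
```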
