FP16, or 16-bit half-precision floating-point, represents each value using 1 sign bit, 5 exponent bits, and 10 mantissa bits — half the storage of FP32 (32-bit single precision) at the cost of reduced numerical range and precision.
In GPU computing, FP16 is the workhorse precision for deep learning training and inference. Modern datacenter GPUs deliver substantially higher FP16 throughput than FP32 because tensor cores are optimized for lower-precision matrix operations. An NVIDIA H100, for example, reaches roughly 1,979 TFLOPS at FP16 (with sparsity) versus 67 TFLOPS at FP32.
AIMC's fit-score algorithm uses each GPU's FP16 TFLOPS rating as one of three inputs alongside VRAM capacity and GPU class. Workloads like LLM training require both adequate VRAM and meaningful FP16 throughput; the score weights both.