CUDA Cores are the general-purpose parallel processing units in NVIDIA GPUs, responsible for executing programs written in CUDA (NVIDIA's parallel computing platform). They handle FP32 and FP64 work, as well as integer operations and any code that isn't a tensor matrix operation.
A modern datacenter GPU has thousands of CUDA cores — the H100 has 14,592; the RTX 5090 has 21,760. Each CUDA core is much simpler than a CPU core but operates in massive parallel groups.
For deep learning, the relative importance of CUDA cores has decreased as workloads shifted to Tensor Core-eligible operations. They remain critical for scientific computing, rendering, simulation, and any workload using full FP32/FP64 precision.