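Image generation with diffusion models is one of the most accessible high-end AI workloads. The VRAM floor depends heavily on the model: Stable Diffusion 1.5 fits in 6 GB, SDXL needs 8-12 GB for typical workflows, and FLUX.1 [dev] at full 16-bit precision (FP16/BF16) wants 24 GB. Quantized variants shrink the weights further: FP8 roughly halves weight memory relative to 16-bit, and 4-bit NF4 cuts it to roughly a quarter, although activations and auxiliary components (text encoders, VAE) still add overhead on top.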
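The arithmetic behind these figures can be sketched as parameter count times bytes per parameter. This is a rough, weight-only estimate under stated assumptions: the helper name `weight_memory_gb` is hypothetical, the 12-billion parameter count for FLUX.1 [dev] is the publicly stated approximate size of its transformer, and the result ignores activations, text encoders, the VAE, and quantization metadata.

```python
# Approximate storage cost per parameter for common precisions.
# NF4 is a 4-bit format, so about half a byte per parameter
# (its small per-block scaling overhead is ignored here).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "nf4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Estimate memory for model weights only, in GB (10^9 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    flux_dev_params = 12e9  # assumption: approximate transformer size
    for precision in ("fp16", "fp8", "nf4"):
        gb = weight_memory_gb(flux_dev_params, precision)
        print(f"{precision}: ~{gb:.0f} GB for weights alone")
```

Real usage sits above these numbers, which is why a card whose VRAM matches the weight size alone usually still needs offloading or a smaller precision.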
ComfyUI and Automatic1111 are the dominant inference frontends, with ComfyUI increasingly preferred for programmatic and node-graph workflows. For batch generation, throughput scales nearly linearly with GPU count, because each GPU can run its own copy of the pipeline on a separate share of the prompts; this holds only as long as the full model fits in each card's memory. Training and fine-tuning (DreamBooth, LoRA, full fine-tuning) demand 24 GB or more for serious work.
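A minimal sketch of how batch work might be divided for this kind of data-parallel generation follows. It only shows the sharding step: `shard_prompts` is a hypothetical helper, not part of ComfyUI or Automatic1111, and it assumes one worker process per GPU that loads its own model copy and handles its own shard.

```python
from typing import List


def shard_prompts(prompts: List[str], num_gpus: int) -> List[List[str]]:
    """Split prompts round-robin so each GPU gets a near-equal share."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_gpus)]
    for index, prompt in enumerate(prompts):
        shards[index % num_gpus].append(prompt)
    return shards


if __name__ == "__main__":
    jobs = [f"prompt {i}" for i in range(10)]
    for gpu_id, shard in enumerate(shard_prompts(jobs, 4)):
        # In a real setup, each shard would go to a worker pinned to gpu_id.
        print(f"GPU {gpu_id}: {len(shard)} prompts")
```

Because the shards are independent, no inter-GPU communication is needed, which is what makes near-linear scaling plausible for this workload.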
Consumer GPUs with strong FP16/FP32 performance and 16-24 GB VRAM (RTX 4090, RTX 3090, RTX 4080) hit the sweet spot for individual creators. For production-scale image services, datacenter cards offer better concurrency at higher cost per image.
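Cost per image is simply hourly hardware cost divided by images produced per hour, which can be sketched as below. The function name `cost_per_image` is hypothetical, and the figures in the example are placeholders chosen only to illustrate the arithmetic; they are not measurements or prices for any real card.

```python
def cost_per_image(hourly_cost_usd: float, images_per_hour: float) -> float:
    """Return the hardware cost attributable to one generated image."""
    if images_per_hour <= 0:
        raise ValueError("images_per_hour must be positive")
    return hourly_cost_usd / images_per_hour


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not benchmark data.
    scenarios = {
        "hypothetical consumer card": (0.50, 500.0),
        "hypothetical datacenter card": (3.00, 2000.0),
    }
    for name, (hourly, throughput) in scenarios.items():
        print(f"{name}: ${cost_per_image(hourly, throughput):.4f} per image")
```

Plugging in measured throughput and actual rental or amortized purchase cost shows whether a faster, more expensive card pays for itself at a given volume.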