It has been initially supported by GPUs, where it is useful especially for storing the color components of pixels. For geometry data, FP32 is preferred.
In CPUs, some support has been first added in 2012, in Intel Ivy Bridge. Better support is provided in some server CPUs, and since next year also in the desktop AMD Zen 6 and Intel Nova Lake.
BF16 is a format introduced by Google, intended only for AI/ML applications, not for graphics, so initially it was implemented in some of the Intel server CPUs and only later in GPUs. Unlike FP16, which is balanced, BF16 has great dynamic range, but very low precision. This is fine for ML but inappropriate for any other applications.
Nowadays, most LLMs are trained preponderantly using BF16, with a small number of parameters using FP32, for higher precision.
Then from the biggest model that uses BF16, smaller quantized models are derived, which use 8 bits or less per parameter, trading off accuracy for speed.
Yes, however it’s a different format from standard fp16, it trades precision for greater dynamic range.