
Add bf16-f32-vcvt kernels for avx512/avx512bf16 #10023

Merged: copybara-service[bot] merged 2 commits into google:master from GregoryComer:bf16-f32-vcvt-avx512 on Apr 28, 2026


@GregoryComer GregoryComer commented Apr 21, 2026


Add AVX512 and AVX512_BF16 kernels for f32<->bf16 vcvt. For the non-_BF16 f32->bf16 kernels, I match the scalar kernel's rounding logic, so the results should be strictly correct (including for NaNs and Infs).
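
For readers unfamiliar with the conversion, here is a minimal scalar sketch of f32<->bf16 with round-to-nearest-even. It is not the code from this PR or from XNNPACK's scalar kernel: the function names (`f32_to_bf16_rne`, `bf16_to_f32`, `bits_to_f32`) are hypothetical, and the NaN handling (setting the quiet bit) is one common convention that is assumed here rather than taken from the source.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
static float bits_to_f32(uint32_t bits) {
  float x;
  memcpy(&x, &bits, sizeof x);
  return x;
}

// Hypothetical sketch of f32 -> bf16 with round-to-nearest-even.
// Assumption: NaNs are quieted; the real kernels may treat payloads differently.
static uint16_t f32_to_bf16_rne(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN: truncate and set the quiet bit so the result cannot collapse to Inf.
    return (uint16_t)((bits >> 16) | 0x0040u);
  }
  // Add 0x7FFF plus the lowest kept bit, so exact ties round to the even value.
  uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return (uint16_t)((bits + rounding_bias) >> 16);
}

// bf16 -> f32 is exact: the 16 bits become the upper half of the float.
static float bf16_to_f32(uint16_t h) {
  return bits_to_f32((uint32_t)h << 16);
}
```

In this sketch infinities pass through the rounding path unchanged, and finite values just below the overflow threshold round up to infinity, as round-to-nearest-even requires. A vectorized kernel without AVX512_BF16 has to reproduce the same steps with integer operations, whereas AVX512_BF16 provides the conversion as a single instruction.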

Performance appears to hit the memory wall at larger sizes, but at smaller sizes the native BF16 convert pulls ahead of the non-_BF16 AVX512 kernel significantly (~2x). It generally tracks FP16 across the board on Genoa.

I verified that tests pass with CMake on x86. I also verified that the Bazel build of //:XNNPACK succeeds on x86 and that the CMake build succeeds on ARM Mac.

Benchmarks

AMD Genoa

bf16 -> f32 (GB/s)

| Kernel | N:4800 | N:43200 |
| --- | --- | --- |
| bf16 avx512skx_u32 | 264.6 | 138.4 |
| bf16 avx512skx_u16 | 255.5 | 138.4 |
| f16 avx512skx_u32 | 262.5 | 138.4 |
| f16 avx512skx_u16 | 254.3 | 138.1 |
| bf16 scalar_u4 | 38.6 | 39.0 |
| f16 scalar_u4 | 6.4 | 6.2 |

f32 -> bf16 (GB/s)

| Kernel | N:4800 | N:43200 |
| --- | --- | --- |
| bf16 avx512bf16_u32 | 269.7 | 124.2 |
| bf16 avx512bf16_u16 | 259.5 | 141.3 |
| f16 avx512skx_u32 | 267.8 | 138.2 |
| f16 avx512skx_u16 | 256.2 | 140.2 |
| bf16 avx512skx_u32 | 116.2 | 117.1 |
| bf16 avx512skx_u16 | 112.7 | 114.7 |
| bf16 scalar_u4 | 9.9 | 9.9 |
| f16 scalar_u4 | 6.4 | 6.2 |

@GregoryComer GregoryComer marked this pull request as ready for review April 21, 2026 23:03
@GregoryComer GregoryComer changed the title Bf16 f32 vcvt avx512 Add bf16-f32-vcvt kernels for avx512/avx512bf16 Apr 21, 2026
@GregoryComer GregoryComer marked this pull request as draft April 21, 2026 23:17
@GregoryComer GregoryComer marked this pull request as ready for review April 21, 2026 23:42
@copybara-service copybara-service Bot merged commit a9cb6cb into google:master Apr 28, 2026
21 checks passed