Add bf16-f32-vcvt kernels for avx512/avx512bf16 by GregoryComer · Pull Request #10023 · google/XNNPACK

GregoryComer · 2026-04-21T23:03:35Z

Add AVX512 and AVX512_BF16 kernels for f32<->bf16 vcvt. For the non-AVX512_BF16 f32->bf16 kernels, I match the scalar kernel's rounding logic, so they should be strictly correct (including for NaNs and Infs).
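
The rounding behaviour described above can be illustrated with a minimal portable C sketch. It assumes the common round-to-nearest-even scheme with NaN quieting. The helper names `bf16_to_f32` and `f32_to_bf16` are hypothetical, and this is not the XNNPACK scalar kernel itself. A vectorized kernel that matches such logic would apply the same integer operations to each lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helpers: a sketch of the technique, not XNNPACK's scalar kernel.

// bf16 -> f32: a bf16 value is the upper 16 bits of an IEEE binary32, so
// widening is a 16-bit left shift. It is exact and leaves NaN/Inf as they are.
static float bf16_to_f32(uint16_t h) {
  const uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

// f32 -> bf16 with round-to-nearest, ties-to-even on the 16 dropped bits.
static uint16_t f32_to_bf16(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  if ((bits & UINT32_C(0x7FFFFFFF)) > UINT32_C(0x7F800000)) {
    // NaN: truncate, then set the quiet bit so a payload that lives only in
    // the low 16 bits cannot collapse into the Inf encoding.
    return (uint16_t) ((bits >> 16) | UINT32_C(0x0040));
  }
  // Adding 0x7FFF plus the lowest kept bit rounds ties to even. Finite values
  // that round past the largest bf16 carry into the exponent and become Inf;
  // Inf itself is unchanged because its low 16 bits are zero.
  const uint32_t lsb = (bits >> 16) & 1u;
  bits += UINT32_C(0x7FFF) + lsb;
  return (uint16_t) (bits >> 16);
}
```

For example, 1.00390625f (bits 0x3F808000) lies exactly halfway between two bf16 values and rounds down to the even encoding 0x3F80. Plain truncation of a NaN such as 0x7F800001 would produce 0x7F80, the encoding of +Inf, which is why the NaN branch sets the quiet bit.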

Performance appears to hit the memory wall at larger sizes, but the native BF16 convert pulls ahead of the AVX512 (non-BF16) kernel significantly (~2x) at smaller sizes. On Genoa, the native convert and the bf16 -> f32 kernels generally track the corresponding f16 kernels across the board.

I verified that tests pass with CMake on x86. I also verified that the Bazel build of //:XNNPACK succeeds on x86 and that the CMake build succeeds on an ARM Mac.

Benchmarks

AMD Genoa

bf16 -> f32 (GB/s)

Kernel	N:4800	N:43200
bf16 avx512skx_u32	264.6	138.4
bf16 avx512skx_u16	255.5	138.4
f16 avx512skx_u32	262.5	138.4
f16 avx512skx_u16	254.3	138.1
bf16 scalar_u4	38.6	39.0
f16 scalar_u4	6.4	6.2

f32 -> bf16 (GB/s)

Kernel	N:4800	N:43200
bf16 avx512bf16_u32	269.7	124.2
bf16 avx512bf16_u16	259.5	141.3
f16 avx512skx_u32	267.8	138.2
f16 avx512skx_u16	256.2	140.2
bf16 avx512skx_u32	116.2	117.1
bf16 avx512skx_u16	112.7	114.7
bf16 scalar_u4	9.9	9.9
f16 scalar_u4	6.4	6.2
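
The "~2x" figure for the native convert can be checked against the f32 -> bf16 table by dividing each avx512bf16 row by the matching avx512skx row:

```math
\begin{aligned}
\text{N:4800}:&\quad \tfrac{269.7}{116.2} \approx 2.32\ (\text{u32}), \qquad \tfrac{259.5}{112.7} \approx 2.30\ (\text{u16})\\
\text{N:43200}:&\quad \tfrac{124.2}{117.1} \approx 1.06\ (\text{u32}), \qquad \tfrac{141.3}{114.7} \approx 1.23\ (\text{u16})
\end{aligned}
```

So the native kernel's advantage is about 2.3x at the smaller size and shrinks to roughly 1.1 to 1.2x at the larger size.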

GregoryComer added 2 commits April 21, 2026 15:47

Add AVX512 f32<->bf16 vcvt kernels (803079c)

Add native AVX512_BF16 f32->bf16 vcvt kernel (80303ee)

GregoryComer marked this pull request as ready for review April 21, 2026 23:03

GregoryComer mentioned this pull request Apr 21, 2026

[Tracker] Expanded BF16 Support #9728 (Open, 27 tasks)

GregoryComer changed the title ~~Bf16 f32 vcvt avx512~~ Add bf16-f32-vcvt kernels for avx512/avx512bf16 Apr 21, 2026

GregoryComer marked this pull request as draft April 21, 2026 23:17

GregoryComer marked this pull request as ready for review April 21, 2026 23:42

dsharlet approved these changes Apr 22, 2026

copybara-service[bot] mentioned this pull request Apr 28, 2026

Copybara import of the project: #10114 (Merged)

copybara-service[bot] merged commit a9cb6cb into google:master Apr 28, 2026
21 checks passed

copybara-service[bot] merged 2 commits into google:master from GregoryComer:bf16-f32-vcvt-avx512
