Releases: Zaneham/BarraCUDA
BarraCUDA 0.5
The first tagged release. The headline is that you can write a Triton kernel, matmul and all, and run it on a CPU with
no GPU. The --cpu backend lowers BIR straight to x86-64 with the SIMT model collapsed into a thread loop, and the
rank-2 tile path materialises and unrolls so tl.dot plus a K-loop sweeps an arbitrary contraction.
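As a rough illustration of what "collapsed into a thread loop" and "tl.dot plus a K-loop" can mean on a CPU, here is a minimal hand-written C sketch. It is not BarraCUDA's generated code: the names `matmul_program` and `matmul_launch`, the tile sizes `BM`/`BN`/`BK`, the row-major layout, and the requirement that every dimension is a multiple of its tile size are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { BM = 2, BN = 2, BK = 2 };   /* hypothetical tile sizes */

/* One "program instance": computes the BM x BN tile of C at (pid_m, pid_n).
 * Assumes row-major A (M x K), B (K x N), C (M x N), with every dimension a
 * multiple of its tile size, so no masking is needed. */
static void matmul_program(size_t pid_m, size_t pid_n,
                           size_t K, size_t N,
                           const float *A, const float *B, float *C)
{
    float acc[BM][BN] = {{0}};

    /* K-loop: each step multiplies a BM x BK tile of A by a BK x BN tile of B
     * (the role tl.dot plays in the Triton source) and accumulates into acc. */
    for (size_t k0 = 0; k0 < K; k0 += BK)
        for (size_t i = 0; i < BM; i++)
            for (size_t j = 0; j < BN; j++)
                for (size_t k = 0; k < BK; k++)
                    acc[i][j] += A[(pid_m * BM + i) * K + (k0 + k)]
                               * B[(k0 + k) * N + (pid_n * BN + j)];

    /* Write the finished tile back to C. */
    for (size_t i = 0; i < BM; i++)
        for (size_t j = 0; j < BN; j++)
            C[(pid_m * BM + i) * N + (pid_n * BN + j)] = acc[i][j];
}

/* "Launch": the grid of program instances becomes an ordinary nested loop. */
static void matmul_launch(size_t M, size_t N, size_t K,
                          const float *A, const float *B, float *C)
{
    assert(M % BM == 0 && N % BN == 0 && K % BK == 0);
    for (size_t pm = 0; pm < M / BM; pm++)
        for (size_t pn = 0; pn < N / BN; pn++)
            matmul_program(pm, pn, K, N, A, B, C);
}
```

Because the K-loop advances in `BK`-sized steps, the same tile-sized inner product handles any contraction length that is a multiple of `BK`. A real lowering would also have to deal with ragged edges and barriers between threads, which this sketch leaves out.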
New in this cycle
- CPU backend (--cpu). CUDA and Triton kernels compile to a host object and run natively. Headline demo: examples/cpu_launch_matmul.c.
- RISC-V backend (--rv64). Same idea, RV64IMFD objects that run under qemu.
- Cross-backend differential testing (tests/diff/). Same BIR through two backends, diff the output buffers, CPU is the oracle. Every case runs --inject so a green result actually means something.
- Triton scalar math intrinsics. exp, log, sin, cos, tan, tanh, sqrt, rsqrt, abs, floor, ceil, maximum, minimum, fdiv. Thanks to @shivam2931120 for the PR, radians-to-turns convention done right.
- Triton constexpr ABI compaction. tl.constexpr params with defaults fold to literals and drop out of the runtime signature.
- CUDA fixes. --cpu/--rv64 on their own now run sema and don't trip the parse-dump fallback. Typedef-struct kernels compile through --cpu, and --parse no longer segfaults on synthetic anon names.
- --version flag.
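The buffer-diffing step of cross-backend differential testing can be sketched in a few lines of C. This is only an illustration of the idea, not the code in tests/diff/: the function name `first_mismatch`, the absolute-tolerance comparison, and the return convention are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Compare a backend's output buffer against the oracle's (here, the CPU's).
 * Returns the index of the first element that differs by more than `tol`,
 * or -1 if the buffers agree. A NaN difference counts as a mismatch. */
static long first_mismatch(const float *oracle, const float *got,
                           size_t n, float tol)
{
    assert(tol >= 0.0f);
    for (size_t i = 0; i < n; i++) {
        float d = oracle[i] - got[i];
        if (d < 0.0f)
            d = -d;                 /* absolute difference without libm */
        if (!(d <= tol))            /* also true when d is NaN */
            return (long)i;
    }
    return -1;
}
```

A harness built this way is only trustworthy if it can be shown to fail: feeding it a deliberately corrupted buffer should yield a non-negative index. That is one plausible reading of why an injection step makes a green result meaningful; how --inject works in the actual harness is not described in these notes.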
Targets
- AMD GPU. --amdgpu-bin for ELF code objects (.hsaco), --amdgpu for assembly. CDNA 2 (gfx90a, MI250), CDNA 3 (gfx942, MI300X), RDNA 2 (gfx1030), RDNA 3 (gfx1100), RDNA 4 (gfx1200).
- NVIDIA PTX. --nvidia-ptx, defaults to sm_89. JIT-loaded via the CUDA Driver API.
- Tenstorrent Metalium C++. --tensix, SFPU compute.
- Tenstorrent baby cores. --rv-elf, native RV32IM ELF via the TDF layer. Integer kernels for now; soft-float runtime exists but not yet wired in.
- x86-64 host object. --cpu, links and runs on Linux.
- RV64IMFD. --rv64, runs under qemu-riscv64.
- Apple Metal MSL. --metal, stub backend, hardware validation pending.
- Intel SPIR-V for Xe. --intel-spirv, stub backend.
Frontends
- CUDA C. The same .cu files you'd feed nvcc. Real preprocessor (#include, function-like macros, #ifdef/#if/#elif).
- HIP. --hip or .hip files, CUDA's AMD sibling with the same kernel-language shape.
- Triton. @triton.jit Python source through a from-scratch lex/parse/sema/lower. Rank-1 and rank-2 tiles, tl.dot, K-loop tiling, the math intrinsics above.
Mainframe curios
- ABEND dumps (src/runtime/bc_abend.*). GPU faults become IBM-style G0Cx completion codes, correlated against tracked allocations with a dispatch snapshot. Fires automatically on the HSA path.
- SNAP (--snap). Per-kernel parameter dump on entry. AMD only for now.
- SYSPRINT. Class-tagged structured kernel output, pattern-routed sinks on the host. Demo: examples/sysprint_kernel.cu + examples/launch_sysprint.c.
- TDF (Tile DataFlow). The layer above BIR for dataflow targets: regions, channels, NoC arcs, L1 placement, multi-core fission. Dump with --tdf.
Runtime + tooling
- HSA runtime launcher (src/runtime/bc_runtime.h). Loads libhsa-runtime64.so at runtime via dlopen, so the launcher itself has no compile-time ROCm dependency.
- Bilingual errors (--lang <file>). Te reo Māori translation included; the format is data, so any language with a translation file works.
- Optimisation passes. mem2reg, constant folding, dead code elimination. Each one is skippable (--no-mem2reg/--no-cfold/--no-dce) for bisection.
- Differential testing harness (tests/diff/), as above.
Validated on real silicon
- AMD MI300X (CDNA 3, GFX942). 8/8 test kernels passing. Moa Monte Carlo neutron transport produces correct physics (k_eff = 0.995 vs reference 1.000).
- AMD RDNA3 (GFX1100). Full test suite via the tinygrad mockgpu emulator in CI.
- NVIDIA RTX 4060 Ti. Moa transport benchmark produces correct results with a 3.8x speedup over single-thread CPU. No NVCC anywhere in the pipeline.
- Tenstorrent Blackhole. Compiles to valid Metalium C++.
This is not the full changelog, as this is the first "release". See CHANGELOG.txt for the full prose.
Thanks to the people who've contributed, especially @nataliakokoromyti and @shivam2931120, and to everyone else who has sent in tips and tricks and raised issues.