systolic-array

AhmedSobhy01 / convolution-accelerator

Hardware accelerator for 2D convolution using an 8×8 weight-stationary systolic array with split-kernel support, dual-port SRAM architecture, and DMA-based streaming

fpga verilog sram convolution vlsi dma digital-design systolic-arrays rtl-design hardware-accelerator systolic-array

Updated Feb 8, 2026
Verilog

Kev1nLevin / matmul-engine

Parameterized N×N output-stationary systolic array accelerator for INT8 neural network inference. Full RTL-to-GDS flow on ASAP7 7nm using Cadence Genus + Innovus. 667 MHz, 42.7 GOPS peak throughput, 0.33 mW/GOP. SystemVerilog RTL, synthesis, place-and-route, and self-checking testbench included.

neural-network accelerator systemverilog vlsi cadence digital-design genus tpu physical-design innovus 7nm rtl-to-gds systolic-array asap7

Updated Feb 18, 2026
Verilog

akira2963753 / Neural-Network-Implementation-on-FPGA

fpga verilog hdl tpu ai-accelerator systolic-array

Updated Sep 3, 2025
SystemVerilog

VuDaiDuong-325 / int8-matmul-accelerator

A high-performance INT8 Matrix Multiplication Accelerator implemented in pure Verilog, optimized for Edge AI inference on Xilinx Kria KV260 (Zynq UltraScale+ MPSoC).

accelerator gemm rtl-design kv260 matrix-multiply systolic-array

Updated May 28, 2026
VHDL

llamasearchai / OpenAccelerator

High-performance systolic array computing framework with AI agents and medical compliance.

python docker machine-learning accelerator openai hipaa fastapi medical-ai systolic-array fda-validation

Updated Jul 22, 2025
Python

yimeng-blake / hpc-final

Deadline-constrained SCALE-Sim and Accelergy experiment for image-processing systolic arrays

hpc image-processing accelergy systolic-array scalesim

Updated Jun 7, 2026
Python

alicekim07 / fpga-systolic-array-accelerator

INT8 Systolic-Array AI Accelerator on Zynq SoC with HW-SW Co-Design and Roofline Performance Analysis

fpga zynq verilog quantization hw-sw-codesign edge-ai ai-accelerator systolic-array

Updated Mar 20, 2026
Jupyter Notebook

kartar-singh-cs / matrix-multiplication-logisim

Three Logisim implementations of matrix multiplication — Standard, Systolic TPU, and Tiling. Built from scratch in digital logic.

tiling matrix-multiplication digital-logic computer-architecture logisim systolic-array

Updated Jun 11, 2026

DhruvDes / FPGA-ACC-MAC

4×4 7-bit matrix multiplication hardware accelerator using a systolic array, with a Python driver for the Basys 3 FPGA and a systolic array UVC using UVM.

fpga rtl matrix-multiplication verilog python-driver systemverilog hdl uvm uvc 8bit digital-design basys3 hardware-verification hardware-accelerator systolic-array

Updated Feb 4, 2026
SystemVerilog

VRM21-Studios / 8x8-Systolic-Array-Module-FPGA

An AXI-native 8x8 systolic array accelerator in Verilog. Features pure dataflow pipelining, Q-format fixed-point arithmetic, and hardware validation on the Kria KV260 FPGA.

fpga verilog xilinx fixed-point axi-stream axi-lite systolic-array kria-kv260

Updated Mar 19, 2026
Verilog

ansh07verma / fpga-mini-npu

Small-scale FPGA-based Neural Processing Unit (CNN Accelerator) with INT8 systolic array matrix multiplication in Verilog.

machine-learning fpga verilog npu cnn-accelerator hardware-design systolic-array

Updated May 25, 2026
Verilog

Lord1Egypt / PtahCore

𓁰 PtahCore — open-source FP8 tensor accelerator, RTL→GDSII on open 7nm ASAP7, 100% open toolchain

asic gpu accelerator verilog systemverilog gdsii openroad chip-design 7nm fp8 systolic-array asap7

Updated Jun 12, 2026
Tcl

anupamsarashwat1-cloud / smvdu-titan-x

A high-performance, production-grade 64-bit RISC-V Multicore SoC ecosystem and industry-standard Cadence ASIC CAD flow (Genus/Innovus). Fully integrated 5-hart coherent core complex, TileLink interconnect, custom RoCC ML Systolic Array, PCIe, USB, HDMI, and silicon-proven IP blocks.

asic fpga chisel verilog systemverilog soc multicore risc-v semiconductor cadence open-source-hardware genus chipyard innovus systolic-array

Updated May 31, 2026
Verilog

AnilS454 / systolic-array-fpga-zcu104

An 8×8 systolic array AI accelerator implemented in SystemVerilog on Zynq UltraScale+ ZCU104, achieving 1.7 GOPS at 6 mW of programmable-logic (PL) power (~283 GOPS/W efficiency) with full AXI-Stream PS-PL integration. Targets INT8 matrix multiplication for transformer inference acceleration, verified across behavioral, post-synthesis, and implementation simulation.

fpga zynq vivado systemverilog rtl-design ai-accelerator systolic-array

Updated Apr 21, 2026
SystemVerilog

baul-iisc / nova-spacecraft-artifact

Reproducibility artifact for IEEE TC paper TC-2025-09-0830 'Characterizing and Accelerating Spacecraft Onboard Workloads on RISC-V Platform': 28 synthetic spacecraft workloads, gem5 with NOVA RVTrig+RVMatrix ISA extensions, FPGA RTL (VU9P), McPAT models, and RISC-V toolchain patches.

spacecraft hardware-acceleration risc-v gem5 cordic systolic-array

Updated Jun 9, 2026
C++

Here are 33 public repositories matching this topic...

trevorpogue / algebraic-nnhw

RightNow-AI / tiny-tpu

Purdue-SoCET / atalla

nikhiledm97 / TheGEMMCoreProject

avikde / tiny-xpu

AhmedSobhy01 / convolution-accelerator

Kev1nLevin / matmul-engine
