Fearless hardware design
Updated Aug 20, 2025 - Verilog
A Flexible and Energy Efficient Accelerator For Sparse Convolution Neural Network
Energy-efficient Event-driven Spiking Neural Network accelerator for FPGA with PyTorch integration
SneakySnake 🐍 is the first and only pre-alignment filtering algorithm that works efficiently and fast on modern CPU, FPGA, and GPU architectures. It expedites sequence alignment calculation by more than two orders of magnitude for both short and long reads. Described in Bioinformatics (2020) by Alser et al. https://arxiv.o…
NPUsim: Full-Model, Cycle-Level, and Value-Aware Simulator for DNN Accelerators
Audio/video toolkit based on FFmpeg (6.x and 7.x supported) for multimedia with hardware acceleration.
Open source RTL simulation acceleration on commodity hardware
Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data
NeuroSpector: Dataflow and Mapping Optimizer for Deep Neural Network Accelerators
GenStore is the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. Described in the ASPLOS 2022 paper by Mansouri Ghiasi et al. at https://people.inf.ethz.ch/omutlu/pub/GenS…
Bare-metal FPGA implementation of the pccx NPU for LLM inference on Kria KV260: SystemVerilog RTL, W4A8 quantization, GEMM/GEMV datapaths, KV-cache scheduling, and driver code.
Garuda: CVXIF coprocessor optimizing batch-1 attention microkernels with 7.5-9× lower p99 latency. RISC-V INT8 MAC accelerator for transformer inference.
NPUWattch: ML-based Power, Area, and Timing Modeling for Neural Accelerators
Hardware accelerator for 2D convolution using an 8×8 weight-stationary systolic array with split-kernel support, dual-port SRAM architecture, and DMA-based streaming
PCCX is an open NPU architecture for memory-bound Transformer inference on edge FPGAs, focused on GEMM/GEMV, KV-cache, W4A8 quantization, and custom ISA scheduling.
Hardware accelerator for solving an ordinary differential equation with Runge-Kutta numerical methods, implemented in VHDL
Systolic-Tensor-Core: references the "Systolic Array" architecture used in TPUs.
Hi everyone! Here I have slightly modified the vanilla LeNet-5 model and trained it on the German Traffic Sign benchmark dataset. After analysing the computation-heavy layers, I designed an IP using Vitis HLS 2024.1 and implemented it on the PYNQ-Z2 platform.
This project implements an AXI-based matrix multiply accelerator.