Designing Programmable Accelerators for Sparse Tensor Algebra
Abstract
Recent research has focused on leveraging sparsity in hardware accelerators to improve the efficiency of applications ranging from scientific computing to machine learning. Most prior sparse accelerators are fixed-function, which is insufficient for two reasons: first, applications typically include both dense and sparse components, and second, the algorithms that comprise these applications are constantly evolving. To address these challenges, we designed a programmable accelerator called Onyx for both sparse tensor algebra and dense workloads. Onyx extends a coarse-grained reconfigurable array (CGRA) optimized for dense applications with composable hardware primitives to support arbitrary sparse tensor algebra kernels. In this paper, we show that we can further optimize Onyx by adding a small set of hardware features for parallelization that significantly increase both temporal and spatial utilization of the CGRA, reducing runtime by up to 6.2×.
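For context, a representative sparse tensor algebra kernel of the kind such accelerators target is sparse matrix-vector multiplication (SpMV). Below is a minimal Python sketch using the compressed sparse row (CSR) format; it is illustrative only and does not reflect Onyx's hardware implementation or mapping.

```python
# SpMV (y = A @ x) over a CSR-format sparse matrix.
# CSR stores only the nonzeros: indptr gives each row's extent into
# indices (column ids) and data (values), so work scales with nnz.

def spmv_csr(indptr, indices, data, x):
    """Multiply a CSR sparse matrix by a dense vector."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

# 3x3 matrix [[2, 0, 0], [0, 0, 3], [0, 1, 0]] in CSR form
indptr = [0, 1, 2, 3]
indices = [0, 2, 1]
data = [2.0, 3.0, 1.0]
print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0]))  # [2.0, 9.0, 2.0]
```

Kernels like this are what a fixed-function sparse accelerator hard-wires; a programmable design such as Onyx instead composes hardware primitives so that this and other tensor algebra expressions can be mapped to the same fabric.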
BibTeX
@article{koul2025,
  title={Designing Programmable Accelerators for Sparse Tensor Algebra},
  author={Kalhan Koul and Zhouhua Xie and Maxwell Strange and Sai Gautham Ravipati and Bo Wun Cheng and Olivia Hsu and Po-Han Chen and Mark Horowitz and Fredrik Kjolstad and Priyanka Raina},
  journal={IEEE Micro},
  year={2025},
  month={April}
}