Abstract

We introduce Stardust, a compiler from a sparse tensor algebra language to a reconfigurable dataflow architecture, by way of the Spatial parallel-patterns programming model. The key insight is to let performance engineers specify the placement of data into memories separately from the placement of computation onto compute units. Data is placed using an abstract memory model, and Stardust binds that data to complex, on-chip physical memories. Stardust then binds the computation that uses those on-chip data structures to the appropriate parallel patterns. Using cycle-accurate simulation, we show that Stardust can generate nine more tensor algebra kernels than the original Capstan work. The generated kernels perform, on average, 138x better than generated CPU kernels and 41x better than generated GPU kernels.
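
As a concrete illustration of the separation the abstract describes, below is a minimal sketch in TACO-style C++ index notation, the style of sparse tensor algebra language Stardust compiles from. The kernel and format declarations follow TACO's public API; the data-placement directive shown in comments is a hypothetical placeholder for Stardust's scheduling commands, not its actual API.

    // Sparse matrix-vector multiply y(i) = A(i,j) * x(j) in TACO-style
    // index notation. The placement step is sketched as comments only.
    #include "taco.h"
    using namespace taco;

    int main() {
      Format csr({Dense, Sparse});  // CSR: dense rows, sparse columns
      Format dv({Dense});           // dense vector

      Tensor<double> A("A", {1024, 1024}, csr);
      Tensor<double> x("x", {1024}, dv);
      Tensor<double> y("y", {1024}, dv);

      A.insert({0, 0}, 2.0);        // a few sample nonzeros
      A.insert({0, 3}, 5.0);
      A.pack();
      x.insert({0}, 1.0);
      x.insert({3}, 4.0);
      x.pack();

      IndexVar i("i"), j("j");
      y(i) = A(i, j) * x(j);        // the algorithm, independent of placement

      // Hypothetical placement step (names are placeholders): the
      // performance engineer places data into abstract memories, and the
      // compiler binds them to physical on-chip memories and binds the
      // computation to parallel patterns.
      //   place(A, Memory::OnChip);
      //   place(x, Memory::OnChip);

      y.compile();
      y.assemble();
      y.compute();
      return 0;
    }

The point of the sketch is that the index-notation statement never changes when placement decisions change; only the (hypothetical) placement directives do.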

BibTeX

@inproceedings{hsu2025,
  title={Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture},
  author={Olivia Hsu and Alexander Rucker and Tian Zhao and Varun Desai and Kunle Olukotun and Fredrik Kjolstad},
  booktitle={International Symposium on Code Generation and Optimization},
  year={2025},
  month=mar
}