A Compiler for Fused Relational Operations on Multisets
Abstract
We describe a comprehensive compilation approach for relational algebra, centered on an abstract loop-based intermediate representation (IR) that can express fused implementations of relational algebra operators on both sets and multisets. The loops are abstracted away from physical data structures thus making it easier to optimize. We show how to lower relational algebra query plan trees to the IR, including the complex operators that are used in production database systems, such as outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code that co-iterates over the physical data structures present in relational algebra expressions. Finally, we show that our approach is portable across data structures. Since our approach can fuse across disparate operators, it achieves a 3.8x speedup (0.8--12.1x) compared to Hyper on selected LSQB benchmarks and worst-case optimal triangle queries. Our compiler also generates code of high quality: it has similar sequential performance to Hyper on TPC-H with a 1.0x speedup (0.4--4.3x) and is approaching their parallel performance with a 0.6x speedup (0.2--1.8x).
BibTeX
@article{jamesdong2026,
title={A Compiler for Fused Relational Operations on Multisets},
author={James Dong and Fredrik Kjolstad},
journal={Proceedings of the ACM on Programming Languages},
volume={10},
issue={PLDI},
year={2026},
month={June}
}