摘要:
The continuous increase in compute speed and main-memory capacity of modern servers triggered the development of a new generation of in-memory database sys- tems. These systems completely rewrote the traditional database architecture to use main memory as primary storage. Discarding several now obsolete abstrac- tions of disk-based database systems enabled unprecedented query performance on a single server. However, network communication slows down queries as soon as multiple servers are involved. The result is a significant performance gap between local and distributed query processing. Still, a scale out to a cluster becomes inevitable when the workload exceeds the capacity of a single server. This thesis seeks to further the state-of-the-art of distributed query processing in parallel main-memory database systems by addressing the performance bar- rier introduced by network communication. Thus, instead of concentrating on an isolated problem, we design a novel distributed query engine that adapts to the available network bandwidth as well as unexpected workload characteristics that hinder scalability. It exploits locality to speed up query processing over commod- ity networks and implements a novel parallelism model to fully leverage modern high-speed interconnects. We prove the feasibility of our design with a prototypical implementation for the high-performance in-memory database system HyPer. Us- ing redo log multicasting and global transaction-consistent snapshots, the engine further enables query processing on fresh transactional data. An extensive evalu- ation with the renowned TPC-H analytical benchmark demonstrates that HyPer with our novel distributed query engine not only outperforms competing parallel database systems but also scales its query performance with the cluster size.