Machine learning based optimization for database workloads

Participants
  • IBM: Shaikh Quader, ML Architect, DB2; David Kalmuk, Senior Technical Staff Member, DB2; Calisto Zuzarte, Director, DB2 Development
  • York University: Marin Litoiu, Professor; Manos Papagelis, Associate Professor; Andrew Jaramillo, MSc student; Sumona Mukhopadhyay, Postdoctoral Fellow; Yonis Abokar, undergraduate student; Ahnaf Baig, undergraduate student.
Project Description

This project investigates machine learning and self-optimization algorithms for database workloads. The focus is on techniques that enable runtime adaptive reconfiguration through workload classification, performance prediction, and resource planning and execution. Current state-of-the-art approaches in industry rely on a set of static memory models, one for each memory-consuming query operator in the access plan (e.g., sort and hash join). Each model takes inputs that depend on the operator type (e.g., cardinality, number of columns, column widths, and various other statistics) and computes an estimate of the memory the operator will use. Although these models were built with an understanding of runtime behavior, they are static: they are not sufficiently accurate, and they cannot tune or optimize themselves. As a result, the models become outdated whenever an operator's runtime implementation or the operating environment changes.
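To make the contrast between a static model and a self-tuning one concrete, the following Python sketch compares a hypothetical hand-written memory formula for a sort operator with a simple learned model that is refit from observed memory usage. The formula, the feature choice (estimated input bytes and row count), the use of linear least squares, and the simulated runtime change are all illustrative assumptions, not DB2's actual memory models or this project's methods.

```python
import numpy as np


def static_sort_memory_kb(cardinality, num_columns, avg_column_width):
    """Hypothetical static model: raw input bytes plus a fixed 10% overhead."""
    return cardinality * num_columns * avg_column_width * 1.10 / 1024


def operator_features(cardinality, num_columns, avg_column_width):
    """Hypothetical features for the learned model: input bytes and row count."""
    return [cardinality * num_columns * avg_column_width, cardinality]


class LearnedMemoryModel:
    """Linear regression (least squares) refit from observed operator memory."""

    def __init__(self):
        self.weights = None

    def fit(self, feature_rows, observed_kb):
        X = np.asarray(feature_rows, dtype=float)
        X = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
        y = np.asarray(observed_kb, dtype=float)
        self.weights, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, feature_row):
        x = np.append(np.asarray(feature_row, dtype=float), 1.0)  # intercept term
        return float(x @ self.weights)


# Simulated observations after a hypothetical runtime change: the operator
# now uses about 60% of the raw input bytes plus a fixed 512 KB.
rng = np.random.default_rng(0)
samples = [
    (int(rng.integers(1_000, 100_000)), int(rng.integers(1, 20)), int(rng.integers(4, 64)))
    for _ in range(200)
]
observed = [0.6 * c * n * w / 1024 + 512 for c, n, w in samples]

model = LearnedMemoryModel().fit([operator_features(*s) for s in samples], observed)

query = (50_000, 8, 16)  # cardinality, columns, average column width (bytes)
print("static estimate (KB): ", static_sort_memory_kb(*query))
print("learned estimate (KB):", model.predict(operator_features(*query)))
```

Because the learned model is refit from observations, it follows the simulated change in operator behavior, while the static formula keeps producing its original estimate. The project itself considers a broad range of models, including deep learning; a linear model is used here only to keep the sketch short.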

In contrast to existing approaches, this project explores a broad spectrum of deep learning models for prediction, together with adaptive look-ahead optimization, to support accurate, cost-effective runtime decisions. The first goal of the project is to develop and test new machine learning models for estimating database workload and resource consumption. The second, complementary goal is to develop new model-based self-optimization algorithms for memory, compute, and storage allocation. The research aims to generate knowledge and innovations that yield (i) better system utilization at lower cost and (ii) better system performance and stability. At the same time, the project will prepare a new generation of researchers and developers trained in machine learning, database systems, and cloud computing.

The project's partners are the IBM Canada Lab and the IBM Centre for Advanced Studies.

Project Results

[1] Learning-based workload resource optimization for database management systems (US patent)

[2] CASCON 2022 poster presentation (video presentation)

[3] LearnedWMP: Database Workload Memory Prediction Using Distribution of Query Templates (paper, submitted and under review)
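The title of [3] describes predicting a workload's memory demand from the distribution of its query templates. As a rough illustration of that input representation only, the Python sketch below maps each SQL statement to a template by replacing literals with `?` (a hypothetical, deliberately simple templating rule, not necessarily the one used in the paper) and summarizes a workload as the fraction of its queries that fall into each template. The function names and the example queries are illustrative assumptions.

```python
import re
from collections import Counter

import numpy as np


def template_of(sql):
    """Hypothetical templating rule: replace literals with '?' and normalize."""
    sql = re.sub(r"'[^']*'", "?", sql)            # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return " ".join(sql.lower().split())          # case and whitespace


def workload_histogram(queries, templates):
    """Fraction of the workload's queries in each known template.

    Queries whose template is not in `templates` are counted in the
    denominator but not in any bin, so the result may sum to less than 1.
    """
    counts = Counter(template_of(q) for q in queries)
    hist = np.array([counts[t] for t in templates], dtype=float)
    return hist / len(queries)


workload = [
    "SELECT * FROM orders WHERE id = 42",
    "SELECT * FROM orders WHERE id = 7",
    "SELECT name FROM customers WHERE city = 'Toronto'",
    "SELECT * FROM orders WHERE id = 1001",
]
templates = sorted({template_of(q) for q in workload})
hist = workload_histogram(workload, templates)
print(templates)
print(hist)
```

A fixed-length vector such as `hist` could then serve as the input features of a regression model that predicts the memory demand of the whole workload, rather than estimating memory operator by operator.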