About CliffGuard

A fundamental problem in database systems is choosing the best physical design, i.e., a small set of auxiliary structures that enable the fastest execution of future queries. Modern databases come with designer tools that create a number of indices or materialized views, but they find designs that are sub-optimal and remarkably brittle. This is because future workload is often not known a priori and these tools optimize for past workloads in hopes that future queries and data will be similar. In practice, these input parameters are often noisy or missing.

CliffGuard is a practical framework that create robust designs that are immune to parameter uncertainties as much as desired.

CliffGuard is the first attempt at applying robust optimization theory in Operations Research to building a practical framework for solving one of the most fundamental problems in databases, namely finding the best physical design.


The Architecture of CliffGuard
How Does CliffGuard Work?

A major goal in the design of CliffGuard is compatibility with almost any existing database in order to facilitate its adoption in the commercial world.
Thus, we have made two key decisions in our design to make sure that CliffGuard can work with any DBMS:

  1. CliffGuard should operate alongside an existing (nominal) designer rather than replacing it.
  2. CliffGuard is designed to treat existing designers as a black-box (i.e., without modifying their internal implementations).

The database administrator states her desired degree of robustness Γ to CliffGuard, which is located outside the DBMS. CliffGuard in turn invokes the existing physical designer via its public API. After evaluating the output (nominal) design sent back from the existing designer, CliffGuard may decide to manipulate the existing designer’s output by merely modifying some of its input parameters (in a principled manner) and invoking its API again. CliffGuard repeats this process, until it is satisfied with the robustness of the design produced by the nominal designer. The final (robust) design is then sent back to the administrator, who may decide to deploy it in the DBMS.


We tested our algorithm against Vertica’s database designer (called DBD) and DBMS-X’s designer as two of the most heavily-used state-of-the-art commercial designers, as well as two other baseline algorithms (introduced later in this section). For Vertica experiments, we used its community edition and invoked its DBD and query optimizer via a JDBC driver. Similarly, we used DBMS-X’s latest API. We ran each experiment on two machines: a server and a client. The server ran a copy of the database and was used for testing different designs. The client was used for invoking the designer and sending queries to the server. We ran the Vertica experiments on two Dell machines running Red Hat Enterprise Linux 6.5, each with two quad-core Intel Xeon 2.10GHz processors. One of the machines had 128GB memory and 8×4TB 7.2K RPM disks (used as server) and the other had 64GB memory and 4 × 4TB 7.2K RPM disks. For DBMS-X experiments, we used two Azure Standard Tier A3 instances, each with a quad-core AMD Opteron 4171 HE 2.10GHz processor, 7GB memory, and 126GB virtual disks.

We conducted our experiments on a real-world (R1) workload and two synthetic ones (S1 and S2). R1 belongs to one of the largest customers of the Vertica database, composed of 310 tables and 430+K time-stamped queries issued between March 2011 and April 2012 out of which 15.5K queries conform to their latest schema (i.e., can be parsed). Since we did not have access to any real workloads from DBMS-X’s customers, we used the same query log but on a smaller dataset (20GB) given the smaller memory capacity of our Azure instances (compared to our Dell servers). We also created two synthetic workloads, called S1 and S2, as follows. We used the same schema and dataset as R1, but chose different subsets and relative ordering of R1 queries to artificially cause different degrees of workload change.
Performance Boost for HP Vertica
Performance Boost for Another Commercial DBMS
We compared six different designers, including CliffGuard. For details of these designers, please refer to our paper. In summary, compared to Vertica’s state-of-the-art designer, our robust designer reduces the average and maximum latency of queries by up to 5× and 11×, respectively. Similarly, CliffGuard improves upon DBMS-X’s designer by 3–5×.

(Note: DBMS-X is a major database system, which we cannot reveal due to the vendor’s restrictions on publishing performance results.)

Supported DBMS

CliffGuard currently supports the following DBMSs:

HP Vertica
Microsoft SQL Server


CliffGuard is available on GitHub.
(CliffGuard is available under the Apache License, Version 2.0)


CliffGuard: A Principled Framework for Finding Robust Database Designs
Barzan Mozafari, Eugene Zhen Ye Goh, and Dong Young Yoon
In Proceedings of the ACM SIGMOD 2015 Conference, May 31-June 4, 2015
(Download an extended report on CliffGuard)