过去一年间,对优化器相关论文做了个系统性的学习,把过程中阅读的论文笔记记录在这里,和大家分享,欢迎大家和我一起讨论,纠错补差,共同进步 ~
阅读路线基本遵照了pingcap github上的一个Awesome Database Learning的资料,这个资料非常棒,包含了一些基本的课程 + 书籍,还按照内核中不同模块的不同方面做了分类,非常系统化,尤其是SQL层面非常详尽,正好符合需求,因此阅读基本也是按其中的paper来,并扩展到一些没有涉及的内容,总体目录如下(优化器部分),由于内容较多,主要挑选其中影响力较大的或者最有参考意义的。
Planner framework
- 1979, Access Path Selection in a Relational Database Management System , SIGMOD
- Query Processing in Main Memory Database Management Systems , VLDB
- 1988, Grammar-like Functional Rules for Representing Query Optimization Alternatives , SIGMOD
- 1993, The Volcano Optimizer Generator- Extensibility and Efficient Search , ICDE
- 1995, The Cascades Framework for Query Optimization , IEEE Data engineering Bulltin
- 1998, An Overview of Query Optimization in Relational Systems , PODS
- 2014, Orca: A Modular Query Optimizer Architecture for Big Data
- 2016, The MemSQL Query Optimizer: A modern optimizer for real-time analytics in a distributed database
- 2012, Testing the Accuracy of Query Optimizers , DBTest
Transformation
- Eager Aggregation and Lazy Aggregation , VLDB
- 2000, Parameterized Queries and Nesting Equivalences , Microsoft Research
- 2001, Orthogonal Optimization of Subqueries and Aggregation
- 2003, WinMagic : Subquery Elimination Using Window Aggregation
- 2006, Cost-based query transformation in Oracle
- 2009, Enhanced subquery optimizations in Oracle
- 2015, Unnesting Arbitrary Queries , BTW
- 2017, The Complete Story of Joins
Join Order Optimization
- Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products
- 2008, Dynamic Programming Strikes Back
- 2013, On the Correct and Complete Enumeration of the Core Search Space
- 2018, Adaptive Optimization of Very Large Join Queries
- Improving Join Reorderability with Compensation Operators
Functional Dependency & Physical Properties
- 1996, Fundamental Techniques for Order Optimization , SIGMOD
- 2004, An Efficient Framework for Order Optimization , ICDE
- 2010, Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer , ICDE
- 2011, Accelerating Queries with GroupBy and Join by Group join , VLDB
Cost Model
- Modelling Costs for a MM-DBMS , in Real-Time Databases
- SEEKing the truth about ad hoc join costs , IBM Almaden Research Center
- Approximation Schemes for Many-Objective Query Optimization
Statistics
Histogram
- 1984, Accurate Estimation of the Number of Tuples Satisfying a Condition
- Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results , ACM Trans. on Database Systems
- Universality of Serial Histograms
- Balancing Histogram Optimality and Practicality for Query Result Size Estimation
- Improved Histograms for Selectivity Estimation of Range Predicates
- 1997,
- The History of Histograms
- Preventing Bad Plans by Bounding the Impact of Cardinality Estimation Errors
- Histograms Reloaded: The Merits of Bucket Diversity
- Exploiting Ordered Dictionaries to Efficiently Construct Histograms with Q-Error Guarantees in SAP HANA
- How Good Are Query Optimizers, Really?
Probabilistic Counting
- Towards Estimation Error Guarantees for Distinct Values , SIGMOD/PODS
- Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
- 2005, An Improved Data Stream Summary:The Count-Min Sketch and its Applications Journal of Algorithms
- 2007, New Estimation Algorithms for Streaming Data: Count-min Can Do More
- HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm , ACM
Others
- 2002, Exploiting Statistics on Query Expressions for Optimization
- Adaptive Statistics in Oracle 12c
- Statisticum: Data Statistics Management in SAP HANA
- 2019, Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities
- Deep Unsupervised Cardinality Estimation
- 2020, NeuroCard: One Cardinality Estimator for All Tables
Query feedback loop
- LEO – DB2’s LEarning Optimizer
- Robust Query Processing through Progressive Optimization
- Automated Statistics Collection in DB2 UDB
- Optimizer plan change management: improved stability and performance in Oracle 11g
- Adaptive Query Processing in the Looking Glass , CIDR
MPP optimization
- DB2 Parallel Edition , IBM System Journal
- Parallel SQL execution in Oracle 10g
- Query Optimization in Microsoft SQL Server PDW , ACM
- Adaptive and big data scale parallel execution in Oracle , VLDB
- Optimizing Queries over Partitioned Tables in MPP Systems
Benchmark
TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark,
tpc.org Quantifying TPCH Choke Points and Their Optimizations