TY - GEN
T1 - Joins over UNION all queries in TeradataR
T2 - 44th ACM SIGMOD International Conference on Management of Data, SIGMOD 2018
AU - Al-Kateb, Mohammed
AU - Sinclair, Paul
AU - Au, Grace
AU - Nair, Sanjay
AU - Sirek, Mark
AU - Ma, Lu
AU - Eltabakh, Mohamed Y.
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/5/27
Y1 - 2018/5/27
N2 - The UNION ALL set operator is useful for combining data from multiple sources. With the emergence and prevalence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important in many analytical queries. In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables. Instead of the naive and traditional way of spooling each UNION ALL branch to a common spool prior to performing join operations, which can be prohibitively expensive, we demonstrate new techniques developed in Teradata Database including: 1) Cost-based pushing of joins into UNION ALL branches, 2) Branch grouping strategy prior to join pushing, 3) Geography adjustment of the pushed relations to avoid unnecessary redistribution or duplication, 4) Iterative join decomposition of a pushed join to multiple joins, and 5) Combining multiple join steps into a single multisource join step. In the demonstration, we use the Teradata Visual Explain tool, which offers a rich set of visual rendering capabilities of query plans, the display of various metadata information for each plan step, and several interactive UGI options for end-users.
AB - The UNION ALL set operator is useful for combining data from multiple sources. With the emergence and prevalence of big data ecosystems in which data is typically stored on multiple systems, UNION ALL has become even more important in many analytical queries. In this project, we demonstrate novel cost-based optimization techniques implemented in Teradata Database for join queries involving UNION ALL views and derived tables. Instead of the naive and traditional way of spooling each UNION ALL branch to a common spool prior to performing join operations, which can be prohibitively expensive, we demonstrate new techniques developed in Teradata Database including: 1) Cost-based pushing of joins into UNION ALL branches, 2) Branch grouping strategy prior to join pushing, 3) Geography adjustment of the pushed relations to avoid unnecessary redistribution or duplication, 4) Iterative join decomposition of a pushed join to multiple joins, and 5) Combining multiple join steps into a single multisource join step. In the demonstration, we use the Teradata Visual Explain tool, which offers a rich set of visual rendering capabilities of query plans, the display of various metadata information for each plan step, and several interactive UGI options for end-users.
KW - Cost-based optimization
KW - Joins over union all
KW - Query optimization
UR - http://www.scopus.com/inward/record.url?scp=85048805184&partnerID=8YFLogxK
U2 - 10.1145/3183713.3193565
DO - 10.1145/3183713.3193565
M3 - Conference contribution
AN - SCOPUS:85048805184
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1705
EP - 1708
BT - SIGMOD 2018 - Proceedings of the 2018 International Conference on Management of Data
A2 - Das, Gautam
A2 - Jermaine, Christopher
A2 - Eldawy, Ahmed
A2 - Bernstein, Philip
PB - Association for Computing Machinery
Y2 - 10 June 2018 through 15 June 2018
ER -