Guest User

Untitled

a guest
Jan 21st, 2018
76
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 7.82 KB | None | 0 0
  1. ## Query
  2.  
  3. DDL:
  4. ```sql
  5. CREATE TABLE foo (a int, b int, c int) DISTRIBUTED BY (a);
  6. ```
  7.  
  8. Query:
  9. ```sql
  10. SELECT aggfn(DISTINCT b) FROM foo;
  11. ```
  12.  
  13.  
  14. ## Splitting a DQA
  15.  
  16. ### A somehow noisy output:
  17.  
  18. ```
  19. Xform: CXformSplitDQA
  20. Input:
  21. +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ] origin: [Grp:5, GrpExpr:0]
  22. |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
  23. +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
  24. +--CScalarProjectElement "avg" (10) origin: [Grp:3, GrpExpr:0]
  25. +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Global) origin: [Grp:2, GrpExpr:0]
  26. +--CScalarIdent "b" (1) origin: [Grp:1, GrpExpr:0]
  27. Output:
  28. Alternatives:
  29. 0:
  30. +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
  31. |--CLogicalGbAgg( Intermediate ) Grp Cols: ["b" (1)][Intermediate], Minimal Grp Cols: [], Distinct Cols:["b" (1)], Generates Duplicates :[ 0 ]
  32. | |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 1 ]
  33. | | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
  34. | | +--CScalarProjectList
  35. | +--CScalarProjectList
  36. +--CScalarProjectList
  37. +--CScalarProjectElement "avg" (10)
  38. +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
  39. +--CScalarIdent "b" (1)
  40. 1:
  41. +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
  42. |--CLogicalGbAgg( Local ) Grp Cols: [][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
  43. | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
  44. | +--CScalarProjectList
  45. | +--CScalarProjectElement "ColRef_0011" (11)
  46. | +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Local)
  47. | +--CScalarIdent "b" (1)
  48. +--CScalarProjectList
  49. +--CScalarProjectElement "avg" (10)
  50. +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
  51. +--CScalarIdent "ColRef_0011" (11)
  52. 2:
  53. +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
  54. |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
  55. | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
  56. | +--CScalarProjectList
  57. +--CScalarProjectList
  58. +--CScalarProjectElement "avg" (10)
  59. +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
  60. +--CScalarIdent "b" (1)
  61. ```
  62.  
  63. ### Let's parse the above
  64. Given an input of the following (simplified) shape:
  65.  
  66. ```
  67. Agg
  68. Output: aggfn(DISTINCT b)
  69. -> RELATION
  70. ```
  71.  
  72. -- where `RELATION` is a wildcard, in this case, it is `LogicalGet "foo"` -- the transform `CXformSplitDQA` generates 3 logical alternatives:
  73.  
  74. 1. Pre-agg deduplicate with global and local GropuBy's (a.k.a. 3-phase agg: Agg-GroupBy-GroupBy: aggregate function is not split)
  75. 1. A split-agg alternative (a.k.a. 2-stage agg: Agg-Agg)
  76. 1. Pre-agg deduplicate with local GroupBy (Agg-GroupBy: aggregate function is not split)
  77.  
  78. #### Pre-agg deduplicate with global and local GropuBy's
  79. ```
  80. Agg 🐱
  81. Output: NORMAL aggfn(b)
  82. -> GroupBy (b) [global dedup] 🐶
  83. -> GroupBy (b) [local dedup] 🐷
  84. -> WHATEVER 🐴
  85. ```
  86.  
  87. (Possibly) final physical plan:
  88. ```
  89. Aggregate 🐱
  90. Output: NORMAL aggfn(b)
  91. -> Gather Motion 3:1 (slice2; segments: 3)
  92. Output: b
  93. -> GroupAggregate 🐶
  94. Output: b
  95. Group By: b
  96. -> Sort 🐶
  97. Output: b
  98. Sort Key: b
  99. -> Redistribute Motion 3:3 (slice1; segments: 3)
  100. Output: b
  101. Hash Key: b
  102. -> GroupAggregate 🐷
  103. Output: b
  104. Group By: b
  105. -> Sort 🐷
  106. Output: b
  107. Sort Key: b
  108. -> Table Scan on foo 🐴
  109. Output: b
  110. ```
  111.  
  112. ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
  113.  
  114. #### "Split-Agg" alternative:
  115.  
  116. ```
  117. Agg 🐱
  118. Output: FINAL aggfn(b)
  119. -> Agg 🐶
  120. Output: PARTIAL aggfn(DISTINCT b)
  121. -> WHATEVER 🐴
  122. ```
  123.  
  124. Possibly final physical plan:
  125. ```
  126. Aggregate 🐱
  127. Output: FINAL aggfn(b)
  128. -> Gather Motion 3:1 (slice2; segments: 3)
  129. Output: (PARTIAL aggfn(DISTINCT b))
  130. -> Aggregate 🐶
  131. Output: PARTIAL aggfn(DISTINCT b)
  132. -> Redistribute Motion 3:3 (slice1; segments: 3)
  133. Output: b
  134. Hash Key: b
  135. -> Table Scan on foo 🐴
  136. Output: b
  137. ```
  138.  
  139. ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
  140. Then you won't need the bottom motion:
  141.  
  142. ```
  143. Aggregate 🐱
  144. Output: FINAL aggfn(a)
  145. -> Gather Motion 3:1 (slice2; segments: 3)
  146. Output: (PARTIAL aggfn(DISTINCT a))
  147. -> Aggregate 🐶
  148. Output: PARTIAL aggfn(DISTINCT a)
  149. -> Table Scan on foo 🐴
  150. Output: a
  151. ```
  152.  
  153. #### Pre-agg deduplicate with local GroupBy
  154. ```
  155. Agg 🐱
  156. Output: NORMAL aggfn(b)
  157. -> GroupBy (b) [global dedup] 🐶
  158. -> WHATEVER 🐴
  159. ```
  160.  
  161. (Possibly) final physical plan
  162.  
  163. ```
  164. Aggregate 🐱
  165. Output: NORMAL aggfn(b)
  166. -> Gather Motion 3:1 (slice2; segments: 3)
  167. Output: b
  168. -> GroupAggregate 🐶
  169. Output: b
  170. Group By: b
  171. -> Sort 🐶
  172. Output: b
  173. Sort Key: b
  174. -> Redistribute Motion 3:3 (slice1; segments: 3)
  175. Output: b
  176. Hash Key: b
  177. -> Table Scan on foo 🐴
  178. Output: b
  179. ```
  180.  
  181. ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
  182. Then you won't need the bottom motion:
  183.  
  184. ```
  185. Aggregate 🐱
  186. Output: NORMAL aggfn(a)
  187. -> Gather Motion 3:1 (slice2; segments: 3)
  188. Output: a
  189. -> GroupAggregate 🐶
  190. Output: a
  191. Group By: a
  192. -> Sort
  193. Output: a
  194. Sort Key: a
  195. -> Table Scan on foo 🐴
  196. Output: a
  197. ```
  198.  
  199.  
  200. ## Exercise for readers:
  201. 1. What happens if the `aggfn` doesn't have a combine (`prefunc`) function?
  202. 1. What happens when we have a `GROUP BY` in the query? Say
  203.  
  204. ```sql
  205. SELECT aggfn(DISTINCT b) FROM foo GROUP BY a;
  206. ```
  207. OR
  208. ```sql
  209. SELECT aggfn(DISTINCT b) FROM foo GROUP BY c;
  210. ```
  211.  
  212. 1. What if there is more than one aggregate function in the projection? Say
  213. ```sql
  214. SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT c) FROM foo GROUP BY a;
  215. ```
  216.  
  217. OR
  218. ```sql
  219. SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT a) FROM foo GROUP BY c;
  220. ```
  221.  
  222. 1 What should happen when the project mixes `DISTINCT` aggregate with regular aggregates?
  223.  
  224.  
  225. ## Debugging tips:
  226.  
  227. 1. Error loading bytea type
  228. ```
  229. 2018-01-11 09:57:31:165576 PST,THD000,ERROR,"Lookup of object 0.17.1.0 in cache failed",
  230. ```
  231.  
  232. Solution: turn off force 3-phase scalar dqa, or steal the `<dxl:Type>` metadata from other minidumps...
Add Comment
Please, Sign In to add comment