Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- ## Query
- DDL:
- ```sql
- CREATE TABLE foo (a int, b int, c int) DISTRIBUTED BY (a);
- ```
- Query:
- ```sql
- SELECT aggfn(DISTINCT b) FROM foo;
- ```
- ## Splitting a DQA
- ### A somehow noisy output:
- ```
- Xform: CXformSplitDQA
- Input:
- +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ] origin: [Grp:5, GrpExpr:0]
- |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
- +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
- +--CScalarProjectElement "avg" (10) origin: [Grp:3, GrpExpr:0]
- +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Global) origin: [Grp:2, GrpExpr:0]
- +--CScalarIdent "b" (1) origin: [Grp:1, GrpExpr:0]
- Output:
- Alternatives:
- 0:
- +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
- |--CLogicalGbAgg( Intermediate ) Grp Cols: ["b" (1)][Intermediate], Minimal Grp Cols: [], Distinct Cols:["b" (1)], Generates Duplicates :[ 0 ]
- | |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 1 ]
- | | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
- | | +--CScalarProjectList
- | +--CScalarProjectList
- +--CScalarProjectList
- +--CScalarProjectElement "avg" (10)
- +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
- +--CScalarIdent "b" (1)
- 1:
- +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
- |--CLogicalGbAgg( Local ) Grp Cols: [][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
- | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
- | +--CScalarProjectList
- | +--CScalarProjectElement "ColRef_0011" (11)
- | +--CScalarAggFunc (avg , Distinct: true , Aggregate Stage: Local)
- | +--CScalarIdent "b" (1)
- +--CScalarProjectList
- +--CScalarProjectElement "avg" (10)
- +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
- +--CScalarIdent "ColRef_0011" (11)
- 2:
- +--CLogicalGbAgg( Global ) Grp Cols: [][Global], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
- |--CLogicalGbAgg( Local ) Grp Cols: ["b" (1)][Local], Minimal Grp Cols: [], Generates Duplicates :[ 0 ]
- | |--CLogicalGet "foo" ("foo"), Columns: ["a" (0), "b" (1), "c" (2), "ctid" (3), "xmin" (4), "cmin" (5), "xmax" (6), "cmax" (7), "tableoid" (8), "gp_segment_id" (9)] Key sets: {[3,9]} origin: [Grp:0, GrpExpr:0]
- | +--CScalarProjectList
- +--CScalarProjectList
- +--CScalarProjectElement "avg" (10)
- +--CScalarAggFunc (avg , Distinct: false , Aggregate Stage: Global)
- +--CScalarIdent "b" (1)
- ```
- ### Let's parse the above
- Given an input of the following (simplified) shape:
- ```
- Agg
- Output: aggfn(DISTINCT b)
- -> RELATION
- ```
- -- where `RELATION` is a wildcard, in this case, it is `LogicalGet "foo"` -- the transform `CXformSplitDQA` generates 3 logical alternatives:
- 1. Pre-agg deduplicate with global and local GropuBy's (a.k.a. 3-phase agg: Agg-GroupBy-GroupBy: aggregate function is not split)
- 1. A split-agg alternative (a.k.a. 2-stage agg: Agg-Agg)
- 1. Pre-agg deduplicate with local GroupBy (Agg-GroupBy: aggregate function is not split)
- #### Pre-agg deduplicate with global and local GropuBy's
- ```
- Agg 🐱
- Output: NORMAL aggfn(b)
- -> GroupBy (b) [global dedup] 🐶
- -> GroupBy (b) [local dedup] 🐷
- -> WHATEVER 🐴
- ```
- (Possibly) final physical plan:
- ```
- Aggregate 🐱
- Output: NORMAL aggfn(b)
- -> Gather Motion 3:1 (slice2; segments: 3)
- Output: b
- -> GroupAggregate 🐶
- Output: b
- Group By: b
- -> Sort 🐶
- Output: b
- Sort Key: b
- -> Redistribute Motion 3:3 (slice1; segments: 3)
- Output: b
- Hash Key: b
- -> GroupAggregate 🐷
- Output: b
- Group By: b
- -> Sort 🐷
- Output: b
- Sort Key: b
- -> Table Scan on foo 🐴
- Output: b
- ```
- ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
- #### "Split-Agg" alternative:
- ```
- Agg 🐱
- Output: FINAL aggfn(b)
- -> Agg 🐶
- Output: PARTIAL aggfn(DISTINCT b)
- -> WHATEVER 🐴
- ```
- Possibly final physical plan:
- ```
- Aggregate 🐱
- Output: FINAL aggfn(b)
- -> Gather Motion 3:1 (slice2; segments: 3)
- Output: (PARTIAL aggfn(DISTINCT b))
- -> Aggregate 🐶
- Output: PARTIAL aggfn(DISTINCT b)
- -> Redistribute Motion 3:3 (slice1; segments: 3)
- Output: b
- Hash Key: b
- -> Table Scan on foo 🐴
- Output: b
- ```
- ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
- Then you won't need the bottom motion:
- ```
- Aggregate 🐱
- Output: FINAL aggfn(a)
- -> Gather Motion 3:1 (slice2; segments: 3)
- Output: (PARTIAL aggfn(DISTINCT a))
- -> Aggregate 🐶
- Output: PARTIAL aggfn(DISTINCT a)
- -> Table Scan on foo 🐴
- Output: a
- ```
- #### Pre-agg deduplicate with local GroupBy
- ```
- Agg 🐱
- Output: NORMAL aggfn(b)
- -> GroupBy (b) [global dedup] 🐶
- -> WHATEVER 🐴
- ```
- (Possibly) final physical plan
- ```
- Aggregate 🐱
- Output: NORMAL aggfn(b)
- -> Gather Motion 3:1 (slice2; segments: 3)
- Output: b
- -> GroupAggregate 🐶
- Output: b
- Group By: b
- -> Sort 🐶
- Output: b
- Sort Key: b
- -> Redistribute Motion 3:3 (slice1; segments: 3)
- Output: b
- Hash Key: b
- -> Table Scan on foo 🐴
- Output: b
- ```
- ##### Exercise: what happens when "WHATEVER" is distributed on the `DISTINCT` column?
- Then you won't need the bottom motion:
- ```
- Aggregate 🐱
- Output: NORMAL aggfn(a)
- -> Gather Motion 3:1 (slice2; segments: 3)
- Output: a
- -> GroupAggregate 🐶
- Output: a
- Group By: a
- -> Sort
- Output: a
- Sort Key: a
- -> Table Scan on foo 🐴
- Output: a
- ```
- ## Exercise for readers:
- 1. What happens if the `aggfn` doesn't have a combine (`prefunc`) function?
- 1. What happens when we have a `GROUP BY` in the query? Say
- ```sql
- SELECT aggfn(DISTINCT b) FROM foo GROUP BY a;
- ```
- OR
- ```sql
- SELECT aggfn(DISTINCT b) FROM foo GROUP BY c;
- ```
- 1. What if there is more than one aggregate function in the projection? Say
- ```sql
- SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT c) FROM foo GROUP BY a;
- ```
- OR
- ```sql
- SELECT aggfn1(DISTINCT b), aggfn2(DISTINCT a) FROM foo GROUP BY c;
- ```
- 1 What should happen when the project mixes `DISTINCT` aggregate with regular aggregates?
- ## Debugging tips:
- 1. Error loading bytea type
- ```
- 2018-01-11 09:57:31:165576 PST,THD000,ERROR,"Lookup of object 0.17.1.0 in cache failed",
- ```
- Solution: turn off force 3-phase scalar dqa, or steal the `<dxl:Type>` metadata from other minidumps...
Add Comment
Please, Sign In to add comment