Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- DECLARE @ip1 TINYINT = ?;
- DECLARE @ip2 TINYINT = ?;
- DECLARE @ip3 TINYINT = ?;
- DECLARE @ip4 TINYINT = ?;
- DECLARE @top_rows BIGINT = 9223372036854775807;
- SELECT DISTINCT a.UserID, a.IP1, a.IP2, a.IP3, a.IP4
- FROM (
- SELECT TOP (@top_rows)
- COALESCE(a1.TableAID, a2.TableAID, a3.TableAID, a4.TableAID) TableAID
- FROM (SELECT 1 DUMMY) dummy
- LEFT OUTER JOIN [dbo].[TableA] a1 ON a1.IP1 = @ip1
- LEFT OUTER JOIN [dbo].[TableA] a2 ON a2.IP2 = @ip2
- LEFT OUTER JOIN [dbo].[TableA] a3 ON a3.IP3 = @ip3
- LEFT OUTER JOIN [dbo].[TableA] a4 ON a4.IP4 = @ip4
- WHERE 1=1
- AND (@ip1 IS NULL OR @ip2 IS NULL OR a1.TableAID = a2.TableAID)
- AND (@ip1 IS NULL OR @ip3 IS NULL OR a1.TableAID = a3.TableAID)
- AND (@ip1 IS NULL OR @ip4 IS NULL OR a1.TableAID = a4.TableAID)
- AND (@ip2 IS NULL OR @ip3 IS NULL OR a2.TableAID = a3.TableAID)
- AND (@ip2 IS NULL OR @ip4 IS NULL OR a2.TableAID = a4.TableAID)
- AND (@ip3 IS NULL OR @ip4 IS NULL OR a3.TableAID = a4.TableAID)
- ) t
- INNER JOIN [dbo].[TableA] a ON t.TableAID = a.TableAID
- INNER JOIN [dbo].[TableB] b ON b.UserID = a.UserID
- OPTION (RECOMPILE, MAXDOP 1, QUERYTRACEON 9481, OPTIMIZE FOR (@top_rows = 1000));
- There's a lot going on there so I'll break it down step by step working from the inside out.
- SELECT TOP (@top_rows)
- This part serves two purposes. The `TOP` prevents the query optimizer from rearranging parts of the plan and that was advantageous for this query. The row estimate from the `t` derived table can also be controlled with the query hint at the bottom. With the legacy CE the row estimate was too high which lead to a clustered index scan at the end.
- COALESCE(a1.TableAID, a2.TableAID, a3.TableAID, a4.TableAID) TableAID
- If some of the IP pieces are `NULL` then `TableAID` will be `NULL` from that join.
- FROM (SELECT 1 DUMMY) dummy
- This is an anchor used for the joins. Some of the joins may be eliminated because they aren't necessary depending on the parameter values.
- LEFT OUTER JOIN [dbo].[TableA] a1 ON a1.IP1 = @ip1
- LEFT OUTER JOIN [dbo].[TableA] a2 ON a2.IP2 = @ip2
- LEFT OUTER JOIN [dbo].[TableA] a3 ON a3.IP3 = @ip3
- LEFT OUTER JOIN [dbo].[TableA] a4 ON a4.IP4 = @ip4
- Here I am both taking advantage of the [parameter embedding optimization][2] and splitting out the joins so that each join can be satisfied with a covering index. This optimization is only available with a certain service pack (SP4?) so make sure that you're patched. One advantage here is that joins can be eliminated from the query depending on the parameter values. For example, if `@ip3` is `NULL` then the query optimizer won't join to `a3` at all. Splitting up the joins like this also encourages the index join strategy that I talked about earlier because should be much cheaper than a clustered index scan.
- WHERE 1=1
- AND (@ip1 IS NULL OR @ip2 IS NULL OR a1.TableAID = a2.TableAID)
- AND (@ip1 IS NULL OR @ip3 IS NULL OR a1.TableAID = a3.TableAID)
- AND (@ip1 IS NULL OR @ip4 IS NULL OR a1.TableAID = a4.TableAID)
- AND (@ip2 IS NULL OR @ip3 IS NULL OR a2.TableAID = a3.TableAID)
- AND (@ip2 IS NULL OR @ip4 IS NULL OR a2.TableAID = a4.TableAID)
- AND (@ip3 IS NULL OR @ip4 IS NULL OR a3.TableAID = a4.TableAID)
- Here I'm completing the index join with the join on `TableAID`. The filters that aren't necessary based on the parameter values will be optimized away via the `RECOMPILE` hint at the bottom. I tried to write this query in a way that would allow the query optimizer to reorder the joins as needed but I'm not sure if I was successful.
- INNER JOIN [dbo].[TableA] a ON t.TableAID = a.TableAID
- This join gets the columns that we're missing after join elimination and other columns needed for the joins to the other tables. We're looking for a loop join and index seeks on the clustered key here.
- INNER JOIN [dbo].[TableB] b ON b.UserID = a.UserID
- This is the join that you had in the original query. More joins can be added here as necessary.
- OPTION (RECOMPILE, MAXDOP 1, QUERYTRACEON 9481, OPTIMIZE FOR (@top_rows = 1000));
- Finally we get to the last line. `RECOMPILE` is there for the parameter embedding optimization. `MAXDOP 1` is there because I got better performance with it for this query (the reasons for this are outside the scope of this answer). `QUERYTRACEON 9481` is used to force the legacy CE and you should remove that from your version of the query. `OPTIMIZE FOR` directs the optimizer to create a plan based on the value that you give it as opposed to the literal value of the `@top_rows` parameter. You want the value to be low enough so that you don't get a clustered index scan on `a`. Depending on what you pick here you may get nested loop joins or merge joins in the derived table `t`. Try a value that reflects how many rows this query usually returns and go from there. There may be a benefit in bumping it up to avoid some of the loop joins.
Advertisement
Add Comment
Please, Sign In to add comment