Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- SELECT p.*
- FROM dbo.patextract8K('Pay me $50.17 now or $1000 later!','[^$0-9.]') AS p;
- itemNumber itemIndex itemLength item
- ----------- ---------- ----------- --------
- 1 8 6 $50.17
- 2 22 5 $1000
- -- Easily consumable sample data
- DECLARE @table TABLE (shiftId VARCHAR(2), empKey VARCHAR(5), workDuration VARCHAR(100));
- INSERT @table(shiftId,empKey,workDuration)
- VALUES
- ('K','A','12PM - 4PM'),
- ('K','B','12PM - 4PM'),
- ('K','A','9AM - 12PM'),
- ('K','A','4PM - 6PM');
- -- Solution
- SELECT
- shiftId = f.shiftId,
- KeyIn = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
- MIN(CAST(f.c1 AS TIME)) OVER (),100),':00',''),
- KeyOut = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
- MAX(CAST(f.c2 AS TIME)) OVER (),100),':00',''),
- empShift = f.empKey,
- othEmpIn = '1/1,'+f.c1,
- othEmpOut = '1/1,'+f.c2
- FROM
- (
- SELECT t.shiftId, t.empKey, t.workDuration,
- c1 = MAX(CASE p.itemNumber WHEN 1 THEN p.item END),
- c2 = MAX(CASE p.itemNumber WHEN 2 THEN p.item END)
- FROM @table AS t
- CROSS APPLY dbo.patExtract8k(t.workDuration, '[^0-9APM]') AS p
- CROSS APPLY (VALUES(CAST(p.item AS TIME))) AS tm(N)
- GROUP BY t.shiftId, t.empKey, t.workDuration
- ) AS f;
- shiftId KeyIn KeyOut empShift othEmpIn othEmpOut
- ------- ---------- ------------ -------- ------------ ------------
- K 1/1,9AM 1/1,6PM A 1/1,12PM 1/1,4PM
- K 1/1,9AM 1/1,6PM A 1/1,4PM 1/1,6PM
- K 1/1,9AM 1/1,6PM A 1/1,9AM 1/1,12PM
- K 1/1,9AM 1/1,6PM B 1/1,12PM 1/1,4PM
- CREATE FUNCTION dbo.rangeAB
- (
- @low bigint,
- @high bigint,
- @gap bigint,
- @row1 bit
- )
- /****************************************************************************************
- [Purpose]:
- Creates up to 531,441,000,000 sequentia1 integers numbers beginning with @low and ending
- with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs
- to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweeks and
- enhancements and added functionality. The logic for getting rn to begin at 0 or 1 is
- based comes from Jeff Moden's fnTally function.
- The name range because it's similar to clojure's range function. The name "rangeAB" as
- used because "range" is a reserved SQL keyword.
- [Author]: Alan Burstein
- [Compatibility]:
- SQL Server 2008+ and Azure SQL Database
- [Syntax]:
- SELECT r.RN, r.OP, r.N1, r.N2
- FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;
- [Parameters]:
- @low = a bigint that represents the lowest value for n1.
- @high = a bigint that represents the highest value for n1.
- @gap = a bigint that represents how much n1 and n2 will increase each row; @gap also
- represents the difference between n1 and n2.
- @row1 = a bit that represents the first value of rn. When @row = 0 then rn begins
- at 0, when @row = 1 then rn will begin at 1.
- [Returns]:
- Inline Table Valued Function returns:
- rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can
- start at 0 or 1 which is dictated by @row1.
- op = bigint; returns the "opposite number that relates to rn. When rn begins with 0 and
- ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins
- with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...
- n1 = bigint; a sequential number starting at the value of @low and incrimentingby the
- value of @gap until it is less than or equal to the value of @high.
- n2 = bigint; a sequential number starting at the value of @low+@gap and incrimenting
- by the value of @gap.
- [Dependencies]:
- N/A
- [Developer Notes]:
- 1. The lowest and highest possible numbers returned are whatever is allowable by a
- bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3).
- 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row
- unless its used in a query where a filter is applied to rn.
- 3. @gap must be greater than 0 or the function will not return any rows.
- 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
- rows returned minus 1
- 5. If you only need is a sequential set beginning at 0 or 1 then, for best performance
- use the RN column. Use N1 and/or N2 when you need to begin your sequence at any
- number other than 0 or 1 or if you need a gap between your sequence of numbers.
- 6. Although @gap is a bigint it must be a positive integer or the function will
- not return any rows.
- 7. The function will not return any rows when one of the following conditions are true:
- * any of the input parameters are NULL
- * @high is less than @low
- * @gap is not greater than 0
- To force the function to return all NULLs instead of not returning anything you can
- add the following code to the end of the query:
- UNION ALL
- SELECT NULL, NULL, NULL, NULL
- WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)
- This code was excluded as it adds a ~5% performance penalty.
- 8. There is no performance penalty for sorting by rn ASC; there is a large performance
- penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0
- If you need a descending sort the use op in place of rn then sort by rn ASC.
- Best Practices:
- --===== 1. Using RN (rownumber)
- -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
- SELECT RN FROM dbo.rangeAB(1,5,1,1);
- -- (1.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 0 to 5):
- SELECT RN FROM dbo.rangeAB(0,5,1,0);
- --===== 2. Using OP for descending sorts without a performance penalty
- -- (2.1) The best way to get the numbers 5,4,3...@high (e.g. 5 to 1):
- SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;
- -- (2.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 5 to 0):
- SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;
- --===== 3. Using N1
- -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
- SELECT N1 FROM dbo.rangeAB(-3,3,1,1);
- -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
- SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);
- -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
- SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);
- --===== 4. Using N2 and @gap
- -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
- SELECT N1 FROM dbo.rangeAB(0,100,10,1);
- -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
- -- For example, to get (0,10),(10,20),(20,30).... (90,100):
- SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);
- -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:
- SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);
- [Examples]:
- --===== 1. Generating Sample data (using rangeAB to create "dummy rows")
- -- The query below will generate 10,000 ids and random numbers between 50,000 and 500,000
- SELECT
- someId = r.rn,
- someNumer = ABS(CHECKSUM(NEWID())%450000)+50001
- FROM rangeAB(1,10000,1,1) r;
- --===== 2. Create a series of dates; rn is 0 to include the first date in the series
- DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';
- SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)
- FROM dbo.rangeAB(1, DATEDIFF(dd,@startdate,@enddate),1,0) r;
- GO
- --===== 3. Splitting (tokenizing) a string with fixed sized items
- -- given a delimited string of identifiers that are always 7 characters long
- DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';
- SELECT
- itemNumber = r.rn, -- item's ordinal position
- itemIndex = r.n1, -- item's position in the string (it's CHARINDEX value)
- item = SUBSTRING(@string, r.n1, 7) -- item (token)
- FROM dbo.rangeAB(1, LEN(@string), 8,1) r;
- GO
- --===== 4. Splitting (tokenizing) a string with random delimiters
- DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';
- SELECT
- itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position
- itemIndex = r.n1+1, -- item's position in the string (it's CHARINDEX value)
- item = SUBSTRING
- (
- @string,
- r.n1+1,
- ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)
- ) -- item (token)
- FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r
- WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;
- -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
- --===== 5. Grouping by a weekly intervals
- -- 5.1. how to create a series of start/end dates between @startDate & @endDate
- DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';
- SELECT
- WeekNbr = r.RN,
- WeekStart = DATEADD(DAY,r.N1,@StartDate),
- WeekEnd = DATEADD(DAY,r.N2-1,@StartDate)
- FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;
- GO
- -- 5.2. LEFT JOIN to the weekly interval table
- BEGIN
- DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';
- -- sample data
- DECLARE @loans TABLE (loID INT, lockDate DATE);
- INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)
- FROM dbo.rangeAB(1,50,1,1) r;
- -- solution
- SELECT
- WeekNbr = r.RN,
- WeekStart = dt.WeekStart,
- WeekEnd = dt.WeekEnd,
- total = COUNT(l.lockDate)
- FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r
- CROSS APPLY (VALUES (
- CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE),
- CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)
- LEFT JOIN @loans l ON l.lockDate BETWEEN dt.WeekStart AND dt.WeekEnd
- GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;
- END;
- --===== 6. Identify the first vowel and last vowel in a along with their positions
- DECLARE @string VARCHAR(200) = 'This string has vowels';
- SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)
- FROM dbo.rangeAB(1,LEN(@string),1,1) r
- WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
- ORDER BY r.rn;
- -- To avoid a sort in the execution plan we'll use op instead of rn
- SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)
- FROM dbo.rangeAB(1,LEN(@string),1,1) r
- WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
- ORDER BY r.rn;
- ---------------------------------------------------------------------------------------
- [Revision History]:
- Rev 00 - 20140518 - Initial Development - Alan Burstein
- Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
- - Alan Burstein
- Rev 02 - 20180613 - Complete re-design including opposite number column (op)
- Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein
- ****************************************************************************************/
- RETURNS TABLE WITH SCHEMABINDING AS RETURN
- WITH L1(N) AS
- (
- SELECT 1
- FROM (VALUES
- (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
- (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
- (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
- (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
- (0),(0)) T(N) -- 90 values
- ),
- L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
- iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
- SELECT
- r.RN,
- r.OP,
- r.N1,
- r.N2
- FROM
- (
- SELECT
- RN = 0,
- OP = (@high-@low)/@gap,
- N1 = @low,
- N2 = @gap+@low
- WHERE @row1 = 0
- UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
- SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))
- RN = i.rn,
- OP = (@high-@low)/@gap+(2*@row1)-i.rn,
- N1 = (i.rn-@row1)*@gap+@low,
- N2 = (i.rn-(@row1-1))*@gap+@low
- FROM iTally AS i
- ORDER BY rn
- ) AS r
- WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
- GO
- IF OBJECT_ID('dbo.NGrams8k', 'IF') IS NOT NULL DROP FUNCTION dbo.NGrams8k;
- GO
- CREATE FUNCTION dbo.NGrams8k
- (
- @string VARCHAR(8000), -- Input string
- @N INT -- requested token size
- )
- /*****************************************************************************************
- [Purpose]:
- A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
- based on an input string (@string). Accepts strings up to 8000 varchar characters long.
- For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.
- [Author]:
- Alan Burstein
- [Compatibility]:
- SQL Server 2008+, Azure SQL Database
- [Syntax]:
- --===== Autonomous
- SELECT ng.position, ng.token
- FROM dbo.NGrams8k(@string,@N) AS ng;
- --===== Against a table using APPLY
- SELECT s.SomeID, ng.position, ng.token
- FROM dbo.SomeTable AS s
- CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) AS ng;
- [Parameters]:
- @string = The input string to split into tokens.
- @N = The size of each token returned.
- [Returns]:
- Position = BIGINT; the position of the token in the input string
- token = VARCHAR(8000); a @N-sized character-level N-Gram token
- [Dependencies]:
- 1. dbo.rangeAB (iTVF)
- [Developer Notes]:
- 1. NGrams8k is not case sensitive;
- 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer
- creates a parallel execution plan. One way to get a parallel query plan (if the
- optimizer does not choose one) is to use make_parallel by Adam Machanic which can be
- found here:
- sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
- 3. When @N is less than 1 or greater than the datalength of the input string then no
- tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
- This is a debatable topic but the thinking behind this decision is that: because you
- can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you
- can't turn anything into NULL-grams, no rows should be returned.
- For people who would prefer that a NULL input forces the function to return a single
- NULL output you could add this code to the end of the function:
- UNION ALL
- SELECT 1, NULL
- WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)
- 4. NGrams8k is deterministic. For more about deterministic functions see:
- https://msdn.microsoft.com/en-us/library/ms178091.aspx
- [Examples]:
- --===== 1. Split the string, "abcd" into unigrams, bigrams and trigrams
- SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',1) AS ng; -- unigrams (@N=1)
- SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',2) AS ng; -- bigrams (@N=2)
- SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',3) AS ng; -- trigrams (@N=3)
- --===== How many times the substring "AB" appears in each record
- DECLARE @table TABLE(stringID int identity primary key, string varchar(100));
- INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');
- SELECT string, occurances = COUNT(*)
- FROM @table t
- CROSS APPLY dbo.NGrams8k(t.string,2) AS ng
- WHERE ng.token = 'AB'
- GROUP BY string;
- [Revision History]:
- ------------------------------------------------------------------------------------------
- Rev 00 - 20140310 - Initial Development - Alan Burstein
- Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added
- conversion to bigint in the TOP logic to remove implicit conversion
- to bigint - Alan Burstein
- Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less
- than the length of @string. Updated comment section. - Alan Burstein
- Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N
- parameters to prevent a NULL string or NULL @N from causing "an
- improper value" being passed to the TOP clause. - Alan Burstein
- Rev 05 - 20171228 - Small simplification; changed:
- (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))
- to:
- (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))
- Rev 06 - 20180612 - Using CHECKSUM(N) in the to convert N in the token output instead of
- using (CAST N as int). CHECKSUM removes the need to convert to int.
- Rev 07 - 20180612 - re-designed to: (1) use dbo.rangeAB - Alan Burstein
- ****************************************************************************************/
- RETURNS TABLE WITH SCHEMABINDING AS RETURN
- SELECT
- position = r.RN,
- token = SUBSTRING(@string, CHECKSUM(r.RN), @N)
- FROM dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r
- WHERE @N > 0 AND @N <= LEN(@string);
- GO
- CREATE FUNCTION dbo.patExtract8K
- (
- @string VARCHAR(8000),
- @pattern VARCHAR(50)
- )
- /*****************************************************************************************
- [Description]:
- This can be considered a T-SQL inline table valued function (iTVF) equivalent of
- Microsoft's mdq.RegexExtract except:
- 1. It includes each matching substring's position in the string
- 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
- instead of nvarchar(4000) for the pattern
- 3. The mask parameter is not required and therefore does not exist.
- 4. You have specify what text we're searching for as an exclusion; e.g. for numeric
- characters you should search for '[^0-9]' instead of '[0-9]'.
- 5. There is is no parameter for naming a "capture group". Using the variable below, both
- the following queries will return the same result:
- DECLARE @string nvarchar(4000) = N'123 Main Street';
- SELECT item FROM dbo.patExtract8K(@string, '[^0-9]');
- SELECT clr.RegexExtract(@string, N'(?<number>(d+))(?<street>(.*))', N'number', 1);
- Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
- http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the
- rows where [matched]=0. The key benefit of is that it performs substantially better
- because you are only returning the number of rows required instead of returning twice as
- many rows then filtering out half of them. Furthermore, because we're
- The following two sets of queries return the same result:
- DECLARE @string varchar(100) = 'xx123xx555xx999';
- BEGIN
- -- QUERY #1
- -- patExtract8K
- SELECT ps.itemNumber, ps.item
- FROM dbo.patExtract8K(@string, '[^0-9]') ps;
- -- patternSplitCM
- SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item
- FROM dbo.patternSplitCM(@string, '[^0-9]') ps
- WHERE [matched] = 0;
- -- QUERY #2
- SELECT ps.itemNumber, ps.item
- FROM dbo.patExtract8K(@string, '[0-9]') ps;
- SELECT itemNumber = row_number() over (order by itemNumber), item
- FROM dbo.patternSplitCM(@string, '[0-9]')
- WHERE [matched] = 0;
- END;
- [Compatibility]:
- SQL Server 2008+
- [Syntax]:
- --===== Autonomous
- SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
- FROM dbo.patExtract8K(@string,@pattern) pe;
- --===== Against a table using APPLY
- SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
- FROM dbo.SomeTable t
- CROSS APPLY dbo.patExtract8K(t.someString, @pattern) pe;
- [Parameters]:
- @string = varchar(8000); the input string
- @searchString = varchar(50); pattern to search for
- [Returns]:
- itemNumber = bigint; the instance or ordinal position of the matched substring
- itemIndex = bigint; the location of the matched substring inside the input string
- itemLength = int; the length of the matched substring
- item = varchar(8000); the returned text
- [Developer Notes]:
- 1. Requires NGrams8k
- 2. patExtract8K does not return any rows on NULL or empty strings. Consider using
- OUTER APPLY or append the function with the code below to force the function to return
- a row on emply or NULL inputs:
- UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;
- 3. patExtract8K is not case sensitive; use a case sensitive collation for
- case-sensitive comparisons
- 4. patExtract8K is deterministic. For more about deterministic functions see:
- https://msdn.microsoft.com/en-us/library/ms178091.aspx
- 5. patExtract8K performs substantially better with a parallel execution plan, often
- 2-3 times faster. For queries that leverage patextract8K that are not getting a
- parallel exeution plan you should consider performance testing using Traceflag 8649
- in Development environments and Adam Machanic's make_parallel in production.
- [Examples]:
- --===== (1) Basic extact all groups of numbers:
- WITH temp(id, txt) as
- (
- SELECT * FROM (values
- (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
- (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
- (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\tester{}))22222444 done'))t(x,xx)
- )
- SELECT
- [temp.id] = t.id,
- pe.itemNumber,
- pe.itemIndex,
- pe.itemLength,
- pe.item
- FROM temp AS t
- CROSS APPLY dbo.patExtract8K(t.txt, '[^0-9]') AS pe;
- -----------------------------------------------------------------------------------------
- Revision History:
- Rev 00 - 20170801 - Initial Development - Alan Burstein
- Rev 01 - 20180619 - Complete re-write - Alan Burstein
- *****************************************************************************************/
- RETURNS TABLE WITH SCHEMABINDING AS RETURN
- SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
- itemIndex = f.position,
- itemLength = itemLen.l,
- item = SUBSTRING(f.token, 1, itemLen.l)
- FROM
- (
- SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
- FROM dbo.NGrams8k(@string, 1) AS ng
- WHERE PATINDEX(@pattern, ng.token) < --<< this token does NOT match the pattern
- ABS(SIGN(ng.position-1)-1) + --<< are you the first row? OR
- PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
- ) AS f(position, token)
- CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
- DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);
- GO
Add Comment
Please, Sign In to add comment