Guest User

Untitled

a guest
Nov 14th, 2018
117
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 21.79 KB | None | 0 0
  1. SELECT p.*
  2. FROM dbo.patextract8K('Pay me $50.17 now or $1000 later!','[^$0-9.]') AS p;
  3.  
  4. itemNumber itemIndex itemLength item
  5. ----------- ---------- ----------- --------
  6. 1 8 6 $50.17
  7. 2 22 5 $1000
  8.  
  9. -- Easily consumable sample data
  10. DECLARE @table TABLE (shiftId VARCHAR(2), empKey VARCHAR(5), workDuration VARCHAR(100));
  11. INSERT @table(shiftId,empKey,workDuration)
  12. VALUES
  13. ('K','A','12PM - 4PM'),
  14. ('K','B','12PM - 4PM'),
  15. ('K','A','9AM - 12PM'),
  16. ('K','A','4PM - 6PM');
  17.  
  18. -- Solution
  19. SELECT
  20. shiftId = f.shiftId,
  21. KeyIn = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
  22. MIN(CAST(f.c1 AS TIME)) OVER (),100),':00',''),
  23. KeyOut = '1/1,'+REPLACE(CONVERT(VARCHAR(10),
  24. MAX(CAST(f.c2 AS TIME)) OVER (),100),':00',''),
  25. empShift = f.empKey,
  26. othEmpIn = '1/1,'+f.c1,
  27. othEmpOut = '1/1,'+f.c2
  28. FROM
  29. (
  30. SELECT t.shiftId, t.empKey, t.workDuration,
  31. c1 = MAX(CASE p.itemNumber WHEN 1 THEN p.item END),
  32. c2 = MAX(CASE p.itemNumber WHEN 2 THEN p.item END)
  33. FROM @table AS t
  34. CROSS APPLY dbo.patExtract8k(t.workDuration, '[^0-9APM]') AS p
  35. CROSS APPLY (VALUES(CAST(p.item AS TIME))) AS tm(N)
  36. GROUP BY t.shiftId, t.empKey, t.workDuration
  37. ) AS f;
  38.  
  39. shiftId KeyIn KeyOut empShift othEmpIn othEmpOut
  40. ------- ---------- ------------ -------- ------------ ------------
  41. K 1/1,9AM 1/1,6PM A 1/1,12PM 1/1,4PM
  42. K 1/1,9AM 1/1,6PM A 1/1,4PM 1/1,6PM
  43. K 1/1,9AM 1/1,6PM A 1/1,9AM 1/1,12PM
  44. K 1/1,9AM 1/1,6PM B 1/1,12PM 1/1,4PM
  45.  
  46. CREATE FUNCTION dbo.rangeAB
  47. (
  48. @low bigint,
  49. @high bigint,
  50. @gap bigint,
  51. @row1 bit
  52. )
  53. /****************************************************************************************
  54. [Purpose]:
  55. Creates up to 531,441,000,000 sequentia1 integers numbers beginning with @low and ending
  56. with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs
  57. to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweeks and
  58. enhancements and added functionality. The logic for getting rn to begin at 0 or 1 is
  59. based comes from Jeff Moden's fnTally function.
  60.  
  61. The name range because it's similar to clojure's range function. The name "rangeAB" as
  62. used because "range" is a reserved SQL keyword.
  63.  
  64. [Author]: Alan Burstein
  65.  
  66. [Compatibility]:
  67. SQL Server 2008+ and Azure SQL Database
  68.  
  69. [Syntax]:
  70. SELECT r.RN, r.OP, r.N1, r.N2
  71. FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;
  72.  
  73. [Parameters]:
  74. @low = a bigint that represents the lowest value for n1.
  75. @high = a bigint that represents the highest value for n1.
  76. @gap = a bigint that represents how much n1 and n2 will increase each row; @gap also
  77. represents the difference between n1 and n2.
  78. @row1 = a bit that represents the first value of rn. When @row = 0 then rn begins
  79. at 0, when @row = 1 then rn will begin at 1.
  80.  
  81. [Returns]:
  82. Inline Table Valued Function returns:
  83. rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can
  84. start at 0 or 1 which is dictated by @row1.
  85. op = bigint; returns the "opposite number that relates to rn. When rn begins with 0 and
  86. ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins
  87. with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...
  88. n1 = bigint; a sequential number starting at the value of @low and incrimentingby the
  89. value of @gap until it is less than or equal to the value of @high.
  90. n2 = bigint; a sequential number starting at the value of @low+@gap and incrimenting
  91. by the value of @gap.
  92.  
  93. [Dependencies]:
  94. N/A
  95.  
  96. [Developer Notes]:
  97.  
  98. 1. The lowest and highest possible numbers returned are whatever is allowable by a
  99. bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3).
  100. 2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row
  101. unless its used in a query where a filter is applied to rn.
  102. 3. @gap must be greater than 0 or the function will not return any rows.
  103. 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
  104. rows returned minus 1
  105. 5. If you only need is a sequential set beginning at 0 or 1 then, for best performance
  106. use the RN column. Use N1 and/or N2 when you need to begin your sequence at any
  107. number other than 0 or 1 or if you need a gap between your sequence of numbers.
  108. 6. Although @gap is a bigint it must be a positive integer or the function will
  109. not return any rows.
  110. 7. The function will not return any rows when one of the following conditions are true:
  111. * any of the input parameters are NULL
  112. * @high is less than @low
  113. * @gap is not greater than 0
  114. To force the function to return all NULLs instead of not returning anything you can
  115. add the following code to the end of the query:
  116.  
  117. UNION ALL
  118. SELECT NULL, NULL, NULL, NULL
  119. WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)
  120.  
  121. This code was excluded as it adds a ~5% performance penalty.
  122. 8. There is no performance penalty for sorting by rn ASC; there is a large performance
  123. penalty for sorting in descending order WHEN @row1 = 1; WHEN @row1 = 0
  124. If you need a descending sort the use op in place of rn then sort by rn ASC.
  125.  
  126. Best Practices:
  127. --===== 1. Using RN (rownumber)
  128. -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
  129. SELECT RN FROM dbo.rangeAB(1,5,1,1);
  130. -- (1.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 0 to 5):
  131. SELECT RN FROM dbo.rangeAB(0,5,1,0);
  132.  
  133. --===== 2. Using OP for descending sorts without a performance penalty
  134. -- (2.1) The best way to get the numbers 5,4,3...@high (e.g. 5 to 1):
  135. SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;
  136. -- (2.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 5 to 0):
  137. SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;
  138.  
  139. --===== 3. Using N1
  140. -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
  141. SELECT N1 FROM dbo.rangeAB(-3,3,1,1);
  142. -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
  143. SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);
  144. -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
  145. SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);
  146.  
  147. --===== 4. Using N2 and @gap
  148. -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
  149. SELECT N1 FROM dbo.rangeAB(0,100,10,1);
  150. -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
  151. -- For example, to get (0,10),(10,20),(20,30).... (90,100):
  152. SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);
  153. -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:
  154. SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);
  155.  
  156. [Examples]:
  157. --===== 1. Generating Sample data (using rangeAB to create "dummy rows")
  158. -- The query below will generate 10,000 ids and random numbers between 50,000 and 500,000
  159. SELECT
  160. someId = r.rn,
  161. someNumer = ABS(CHECKSUM(NEWID())%450000)+50001
  162. FROM rangeAB(1,10000,1,1) r;
  163.  
  164. --===== 2. Create a series of dates; rn is 0 to include the first date in the series
  165. DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';
  166.  
  167. SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)
  168. FROM dbo.rangeAB(1, DATEDIFF(dd,@startdate,@enddate),1,0) r;
  169. GO
  170.  
  171. --===== 3. Splitting (tokenizing) a string with fixed sized items
  172. -- given a delimited string of identifiers that are always 7 characters long
  173. DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';
  174.  
  175. SELECT
  176. itemNumber = r.rn, -- item's ordinal position
  177. itemIndex = r.n1, -- item's position in the string (it's CHARINDEX value)
  178. item = SUBSTRING(@string, r.n1, 7) -- item (token)
  179. FROM dbo.rangeAB(1, LEN(@string), 8,1) r;
  180. GO
  181.  
  182. --===== 4. Splitting (tokenizing) a string with random delimiters
  183. DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';
  184.  
  185. SELECT
  186. itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position
  187. itemIndex = r.n1+1, -- item's position in the string (it's CHARINDEX value)
  188. item = SUBSTRING
  189. (
  190. @string,
  191. r.n1+1,
  192. ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)
  193. ) -- item (token)
  194. FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r
  195. WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;
  196. -- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
  197.  
  198. --===== 5. Grouping by a weekly intervals
  199. -- 5.1. how to create a series of start/end dates between @startDate & @endDate
  200. DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';
  201. SELECT
  202. WeekNbr = r.RN,
  203. WeekStart = DATEADD(DAY,r.N1,@StartDate),
  204. WeekEnd = DATEADD(DAY,r.N2-1,@StartDate)
  205. FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r;
  206. GO
  207.  
  208. -- 5.2. LEFT JOIN to the weekly interval table
  209. BEGIN
  210. DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';
  211. -- sample data
  212. DECLARE @loans TABLE (loID INT, lockDate DATE);
  213. INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)
  214. FROM dbo.rangeAB(1,50,1,1) r;
  215.  
  216. -- solution
  217. SELECT
  218. WeekNbr = r.RN,
  219. WeekStart = dt.WeekStart,
  220. WeekEnd = dt.WeekEnd,
  221. total = COUNT(l.lockDate)
  222. FROM dbo.rangeAB(0,datediff(DAY,@StartDate,@EndDate),7,1) r
  223. CROSS APPLY (VALUES (
  224. CAST(DATEADD(DAY,r.N1,@StartDate) AS DATE),
  225. CAST(DATEADD(DAY,r.N2-1,@StartDate) AS DATE))) dt(WeekStart,WeekEnd)
  226. LEFT JOIN @loans l ON l.lockDate BETWEEN dt.WeekStart AND dt.WeekEnd
  227. GROUP BY r.RN, dt.WeekStart, dt.WeekEnd ;
  228. END;
  229.  
  230. --===== 6. Identify the first vowel and last vowel in a along with their positions
  231. DECLARE @string VARCHAR(200) = 'This string has vowels';
  232.  
  233. SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)
  234. FROM dbo.rangeAB(1,LEN(@string),1,1) r
  235. WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
  236. ORDER BY r.rn;
  237.  
  238. -- To avoid a sort in the execution plan we'll use op instead of rn
  239. SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)
  240. FROM dbo.rangeAB(1,LEN(@string),1,1) r
  241. WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
  242. ORDER BY r.rn;
  243.  
  244. ---------------------------------------------------------------------------------------
  245. [Revision History]:
  246. Rev 00 - 20140518 - Initial Development - Alan Burstein
  247. Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
  248. - Alan Burstein
  249. Rev 02 - 20180613 - Complete re-design including opposite number column (op)
  250. Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein
  251. ****************************************************************************************/
  252. RETURNS TABLE WITH SCHEMABINDING AS RETURN
  253. WITH L1(N) AS
  254. (
  255. SELECT 1
  256. FROM (VALUES
  257. (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
  258. (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
  259. (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
  260. (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
  261. (0),(0)) T(N) -- 90 values
  262. ),
  263. L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
  264. iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
  265. SELECT
  266. r.RN,
  267. r.OP,
  268. r.N1,
  269. r.N2
  270. FROM
  271. (
  272. SELECT
  273. RN = 0,
  274. OP = (@high-@low)/@gap,
  275. N1 = @low,
  276. N2 = @gap+@low
  277. WHERE @row1 = 0
  278. UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
  279. SELECT TOP (ABS((COALESCE(@high,0)-COALESCE(@low,0))/COALESCE(@gap,0)+COALESCE(@row1,1)))
  280. RN = i.rn,
  281. OP = (@high-@low)/@gap+(2*@row1)-i.rn,
  282. N1 = (i.rn-@row1)*@gap+@low,
  283. N2 = (i.rn-(@row1-1))*@gap+@low
  284. FROM iTally AS i
  285. ORDER BY rn
  286. ) AS r
  287. WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
  288. GO
  289.  
  290. IF OBJECT_ID('dbo.NGrams8k', 'IF') IS NOT NULL DROP FUNCTION dbo.NGrams8k;
  291. GO
  292. CREATE FUNCTION dbo.NGrams8k
  293. (
  294. @string VARCHAR(8000), -- Input string
  295. @N INT -- requested token size
  296. )
  297. /*****************************************************************************************
  298. [Purpose]:
  299. A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
  300. based on an input string (@string). Accepts strings up to 8000 varchar characters long.
  301. For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.
  302.  
  303. [Author]:
  304. Alan Burstein
  305.  
  306. [Compatibility]:
  307. SQL Server 2008+, Azure SQL Database
  308.  
  309. [Syntax]:
  310. --===== Autonomous
  311. SELECT ng.position, ng.token
  312. FROM dbo.NGrams8k(@string,@N) AS ng;
  313.  
  314. --===== Against a table using APPLY
  315. SELECT s.SomeID, ng.position, ng.token
  316. FROM dbo.SomeTable AS s
  317. CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) AS ng;
  318.  
  319. [Parameters]:
  320. @string = The input string to split into tokens.
  321. @N = The size of each token returned.
  322.  
  323. [Returns]:
  324. Position = BIGINT; the position of the token in the input string
  325. token = VARCHAR(8000); a @N-sized character-level N-Gram token
  326.  
  327. [Dependencies]:
  328. 1. dbo.rangeAB (iTVF)
  329.  
  330. [Developer Notes]:
  331. 1. NGrams8k is not case sensitive;
  332.  
  333. 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer
  334. creates a parallel execution plan. One way to get a parallel query plan (if the
  335. optimizer does not choose one) is to use make_parallel by Adam Machanic which can be
  336. found here:
  337. sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
  338.  
  339. 3. When @N is less than 1 or greater than the datalength of the input string then no
  340. tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
  341. This is a debatable topic but the thinking behind this decision is that: because you
  342. can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you
  343. can't turn anything into NULL-grams, no rows should be returned.
  344.  
  345. For people who would prefer that a NULL input forces the function to return a single
  346. NULL output you could add this code to the end of the function:
  347.  
  348. UNION ALL
  349. SELECT 1, NULL
  350. WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)
  351.  
  352. 4. NGrams8k is deterministic. For more about deterministic functions see:
  353. https://msdn.microsoft.com/en-us/library/ms178091.aspx
  354.  
  355. [Examples]:
  356. --===== 1. Split the string, "abcd" into unigrams, bigrams and trigrams
  357. SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',1) AS ng; -- unigrams (@N=1)
  358. SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',2) AS ng; -- bigrams (@N=2)
  359. SELECT ng.position, ng.token FROM dbo.NGrams8k('abcd',3) AS ng; -- trigrams (@N=3)
  360.  
  361. --===== How many times the substring "AB" appears in each record
  362. DECLARE @table TABLE(stringID int identity primary key, string varchar(100));
  363. INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');
  364.  
  365. SELECT string, occurances = COUNT(*)
  366. FROM @table t
  367. CROSS APPLY dbo.NGrams8k(t.string,2) AS ng
  368. WHERE ng.token = 'AB'
  369. GROUP BY string;
  370.  
  371. [Revision History]:
  372. ------------------------------------------------------------------------------------------
  373. Rev 00 - 20140310 - Initial Development - Alan Burstein
  374. Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added
  375. conversion to bigint in the TOP logic to remove implicit conversion
  376. to bigint - Alan Burstein
  377. Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less
  378. than the length of @string. Updated comment section. - Alan Burstein
  379. Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N
  380. parameters to prevent a NULL string or NULL @N from causing "an
  381. improper value" being passed to the TOP clause. - Alan Burstein
  382. Rev 05 - 20171228 - Small simplification; changed:
  383. (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))
  384. to:
  385. (ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))+1-ISNULL(@N,1)),0)))
  386. Rev 06 - 20180612 - Using CHECKSUM(N) in the to convert N in the token output instead of
  387. using (CAST N as int). CHECKSUM removes the need to convert to int.
  388. Rev 07 - 20180612 - re-designed to: (1) use dbo.rangeAB - Alan Burstein
  389. ****************************************************************************************/
  390. RETURNS TABLE WITH SCHEMABINDING AS RETURN
  391. SELECT
  392. position = r.RN,
  393. token = SUBSTRING(@string, CHECKSUM(r.RN), @N)
  394. FROM dbo.rangeAB(1, LEN(@string)+1-@N,1,1) AS r
  395. WHERE @N > 0 AND @N <= LEN(@string);
  396. GO
  397.  
  398. CREATE FUNCTION dbo.patExtract8K
  399. (
  400. @string VARCHAR(8000),
  401. @pattern VARCHAR(50)
  402. )
  403. /*****************************************************************************************
  404. [Description]:
  405. This can be considered a T-SQL inline table valued function (iTVF) equivalent of
  406. Microsoft's mdq.RegexExtract except:
  407.  
  408. 1. It includes each matching substring's position in the string
  409.  
  410. 2. It accepts varchar(8000) instead of nvarchar(4000) for the input string, varchar(50)
  411. instead of nvarchar(4000) for the pattern
  412.  
  413. 3. The mask parameter is not required and therefore does not exist.
  414.  
  415. 4. You have specify what text we're searching for as an exclusion; e.g. for numeric
  416. characters you should search for '[^0-9]' instead of '[0-9]'.
  417.  
  418. 5. There is is no parameter for naming a "capture group". Using the variable below, both
  419. the following queries will return the same result:
  420.  
  421. DECLARE @string nvarchar(4000) = N'123 Main Street';
  422.  
  423. SELECT item FROM dbo.patExtract8K(@string, '[^0-9]');
  424. SELECT clr.RegexExtract(@string, N'(?<number>(d+))(?<street>(.*))', N'number', 1);
  425.  
  426. Alternatively, you can think of patExtract8K as Chris Morris' PatternSplitCM (found here:
  427. http://www.sqlservercentral.com/articles/String+Manipulation/94365/) but only returns the
  428. rows where [matched]=0. The key benefit of is that it performs substantially better
  429. because you are only returning the number of rows required instead of returning twice as
  430. many rows then filtering out half of them. Furthermore, because we're
  431.  
  432. The following two sets of queries return the same result:
  433.  
  434. DECLARE @string varchar(100) = 'xx123xx555xx999';
  435. BEGIN
  436. -- QUERY #1
  437. -- patExtract8K
  438. SELECT ps.itemNumber, ps.item
  439. FROM dbo.patExtract8K(@string, '[^0-9]') ps;
  440.  
  441. -- patternSplitCM
  442. SELECT itemNumber = row_number() over (order by ps.itemNumber), ps.item
  443. FROM dbo.patternSplitCM(@string, '[^0-9]') ps
  444. WHERE [matched] = 0;
  445.  
  446. -- QUERY #2
  447. SELECT ps.itemNumber, ps.item
  448. FROM dbo.patExtract8K(@string, '[0-9]') ps;
  449.  
  450. SELECT itemNumber = row_number() over (order by itemNumber), item
  451. FROM dbo.patternSplitCM(@string, '[0-9]')
  452. WHERE [matched] = 0;
  453. END;
  454.  
  455. [Compatibility]:
  456. SQL Server 2008+
  457.  
  458. [Syntax]:
  459. --===== Autonomous
  460. SELECT pe.ItemNumber, pe.ItemIndex, pe.ItemLength, pe.Item
  461. FROM dbo.patExtract8K(@string,@pattern) pe;
  462.  
  463. --===== Against a table using APPLY
  464. SELECT t.someString, pe.ItemIndex, pe.ItemLength, pe.Item
  465. FROM dbo.SomeTable t
  466. CROSS APPLY dbo.patExtract8K(t.someString, @pattern) pe;
  467.  
  468. [Parameters]:
  469. @string = varchar(8000); the input string
  470. @searchString = varchar(50); pattern to search for
  471.  
  472. [Returns]:
  473. itemNumber = bigint; the instance or ordinal position of the matched substring
  474. itemIndex = bigint; the location of the matched substring inside the input string
  475. itemLength = int; the length of the matched substring
  476. item = varchar(8000); the returned text
  477.  
  478. [Developer Notes]:
  479. 1. Requires NGrams8k
  480.  
  481. 2. patExtract8K does not return any rows on NULL or empty strings. Consider using
  482. OUTER APPLY or append the function with the code below to force the function to return
  483. a row on emply or NULL inputs:
  484.  
  485. UNION ALL SELECT 1, 0, NULL, @string WHERE nullif(@string,'') IS NULL;
  486.  
  487. 3. patExtract8K is not case sensitive; use a case sensitive collation for
  488. case-sensitive comparisons
  489.  
  490. 4. patExtract8K is deterministic. For more about deterministic functions see:
  491. https://msdn.microsoft.com/en-us/library/ms178091.aspx
  492.  
  493. 5. patExtract8K performs substantially better with a parallel execution plan, often
  494. 2-3 times faster. For queries that leverage patextract8K that are not getting a
  495. parallel exeution plan you should consider performance testing using Traceflag 8649
  496. in Development environments and Adam Machanic's make_parallel in production.
  497.  
  498. [Examples]:
  499. --===== (1) Basic extact all groups of numbers:
  500. WITH temp(id, txt) as
  501. (
  502. SELECT * FROM (values
  503. (1, 'hello 123 fff 1234567 and today;""o999999999 tester 44444444444444 done'),
  504. (2, 'syat 123 ff tyui( 1234567 and today 999999999 tester 777777 done'),
  505. (3, '&**OOOOO=+ + + // ==?76543// and today !!222222\tester{}))22222444 done'))t(x,xx)
  506. )
  507. SELECT
  508. [temp.id] = t.id,
  509. pe.itemNumber,
  510. pe.itemIndex,
  511. pe.itemLength,
  512. pe.item
  513. FROM temp AS t
  514. CROSS APPLY dbo.patExtract8K(t.txt, '[^0-9]') AS pe;
  515. -----------------------------------------------------------------------------------------
  516. Revision History:
  517. Rev 00 - 20170801 - Initial Development - Alan Burstein
  518. Rev 01 - 20180619 - Complete re-write - Alan Burstein
  519. *****************************************************************************************/
  520. RETURNS TABLE WITH SCHEMABINDING AS RETURN
  521. SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY f.position),
  522. itemIndex = f.position,
  523. itemLength = itemLen.l,
  524. item = SUBSTRING(f.token, 1, itemLen.l)
  525. FROM
  526. (
  527. SELECT ng.position, SUBSTRING(@string,ng.position,DATALENGTH(@string))
  528. FROM dbo.NGrams8k(@string, 1) AS ng
  529. WHERE PATINDEX(@pattern, ng.token) < --<< this token does NOT match the pattern
  530. ABS(SIGN(ng.position-1)-1) + --<< are you the first row? OR
  531. PATINDEX(@pattern,SUBSTRING(@string,ng.position-1,1)) --<< always 0 for 1st row
  532. ) AS f(position, token)
  533. CROSS APPLY (VALUES(ISNULL(NULLIF(PATINDEX('%'+@pattern+'%',f.token),0),
  534. DATALENGTH(@string)+2-f.position)-1)) AS itemLen(l);
  535. GO
Add Comment
Please, Sign In to add comment