xp_regex: Regular Expressions in SQL Server 2000
04.19.03
 
Editor's Note:

Jackie Rosenberger
Staff Writer

04.19.03

If you run SQL 2000, then you've definitely heard about the Slammer worm. I hope you patched in time and your servers stayed safe! Much has been said about the worm itself, yet little written about how the Slammer problem got out of hand so fast (from nothing to a crisis in ten minutes, by some estimates). Read more on Slammer here: http://www.sysadminnews.com/articles/0204.html.

Today's article deals with a more mundane but useful topic - regular expressions. Dan Farino explains how to use this extremely useful technique to simplify your daily work. Enjoy!

Download Free Trial


xp_regex: Regular Expressions in SQL Server 2000
By Dan Farino

Download source files/compiled DLL - 58 Kb

Regular Expressions are probably the best tool for text parsing. Using Regular Expressions greatly simplifies anything for which you would normally use CHARINDEX, PATINDEX, SUBSTRING, STUFF, etc.

xp_regex is an Extended Stored Procedure for SQL Server 2000 written in Managed C++ that lets you use Regular Expressions from T-SQL. In case you’re skeptical about the performance when mixing non-.NET code (in this case SQL Server) and .NET code in the same process, I can tell you that I wrote a completely non-.NET version using the Boost Regex++ Regular Expression library and the speed was actually slightly slower. Plus, the .NET Regular Expression library is pretty much a complete implementation of Perl 5.6's (the de factostandard).

There are two Extended Stored Procedures in the DLL:

xp_regex_split
xp_regex_format

1. XP_REGEX_FORMAT

Syntax:
xp_regex_format @input, @regex, @format_string, @result OUTPUT

@input is the text to parse.
@regex is the regular expression to match.
@format_string is used to format the results
@result is an output parameter that will hold the formatted results

All parameters are either VARCHAR or CHAR of any length. (TEXT might work too. Haven't tried…)

xp_regex_format is used to parse an input string and format the results. Probably the best example to demonstrate is by doing some telephone number parsing. I’ll assume you know the Regular Expression syntax for this documentation.

The regex [^d]*(d{3})[^d]*(d{3}) [^d]*(d{4}) will parse just about any phone-number-like string you throw at it. For instance, this code:
DECLARE @out VARCHAR(50)


EXEC xp_regex_format '(310)555-1212',
'[^d]*(d{3})[^d]*(d{3})[^d]*(d{4})'
, '($1) $2-$3', @out OUTPUT PRINT @out

EXEC xp_regex_format '310.555.1212',
'[^d]*(d{3})[^d]*(d{3})[^d]*(d{4})' , '($1) $2-$3', @out OUTPUT PRINT @out

EXEC xp_regex_format ' 310!555 hey! 1212',
'[^d]*(d{3})[^d]*(d{3})[^d]*
(d{4})','($1) $2-$3', @out OUTPUT PRINT @out

EXEC xp_regex_format ' hello, ( 310 ) 555.1212
is my phone number.
Thank you.', '[^d]*(d{3})[^d]*(d{3})[^d]*(d{4})', '($1) $2-$3', @out OUTPUT PRINT @out

prints out:

(310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212

(Cool!)


FREE Trial! Integral Accounting Professional. - a complete, powerful and fully customizable accounting system implemented in Microsoft Visual Basic 6.0, ADO, and SQL Server for great performance, stability and reliability!

2. XP_REGEX_SPLIT

Syntax:
xp_regex_split @input, @regex, @column_number, @result OUTPUT

@input is the text to parse.
@regex is a regular expression that matches the delimiter
@column_number: We’re basically doing a "text-to-columns" here, so @column_number lets you specify which of the resulting columns should be passed back in the @result parameter
@result is an output parameter that will hold the formatted results

@input, @regex and @result are either VARCHAR or CHAR of any length.
@column_number is an INT. Columns are numbered starting at 1.

This function splits text data on some sort of delimiter (comma, pipe,

whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect.

For example, take this line as your source data:

one ,two|three : four

In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: s*[,|:]s*

For example:

DECLARE @out VARCHAR(8000)


EXEC xp_regex_split 'one ,two|three : four',
's*[,|:]s*', 1, @out
OUTPUT PRINT @out

EXEC xp_regex_split 'one ,two|three : four',
's*[,|:]s*', 2, @out
OUTPUT PRINT @out

EXEC xp_regex_split 'one ,two|three : four',
's*[,|:]s*', 3, @out
OUTPUT PRINT @out

EXEC xp_regex_split 'one ,two|three : four',
's*[,|:]s*', 4, @out
OUTPUT PRINT @out

prints out:

one 
two
three 
four



Please note that in the case above, there is no performance penalty for running the same split more than once. xp_regex_split caches both the input string and the regular expression. So calling it multiple times while changing only the @column_number parameter is perfectly fine. The actual split is only done the first time and the rest is pulled from cache.

3. FN_XP_REGEX_SPLIT and FN_XP_REGEX_FORMAT

These are user-defined functions that wrap the stored procedures. This way you can use the function as part of a select list or a where clause:

SELECT master.dbo.fn_regex_format(phone_number,
 '[^d]*(d{3})[^d]*(d{3})[^d]*(d{4})', '$1 $2 $3')
FROM
	customers

This would format every phone number in the "customers" table.

4. Installation

a. Copy xpRegex.dll to your Program FilesMicrosoft SQL ServerMSSQLinn folder.
b. Run the SQL script INSTALL.SQL. This will register the procedures and create the user-defined functions.

5. Misc

I've created a set of Performance Counters (the Performance Object is called "xp_regex"). You can use the Windows 2000 System Monitor to see various statistics about xp_regex while it is running.

Comments/corrections/additions are welcome. Please let me know if you find this useful! Thanks!


Originally appeared at http://www.codeproject.com/useritems/xpRegex.asp.
dan@stamps.com
Stamps.com, Inc.
Postage from Your Printer

Free Newsletters
CRMProductReview
SQLProNews
EnterpriseEcommerce
HiTechEdge
ITcertificationNews
ITmanagementNews
LinuxProNews
NetworkNewz
SecurityProNews
SysAdminNews



 

 

 

 

 

 



-- SQLProNews is an ">iEntry, Inc. ® publication --
© 2003 All Rights Reserved Privacy Policy and Legal

Read this article online at:
http://www.SQLProNews.com/2003/0419.html

advertising info | news headlines | free newsletters | comments/feedback | submit article

To unsubscribe from this mailing list reply to this message with "unsubscribe 192" in the subject or ">click here.