Regular expressions are one of those things you may not need very often, but when you do, they really solve the problem. The static methods of the Regex class may look difficult, but they're pretty easy to use. Being comfortable with regular expressions lets you quickly write code that would take many hours longer if you parsed the strings by hand. I've compiled some code that uses the basic features of the Regex class. This article covers only the coding side. For it to work, you will of course need to know how to write the expressions themselves; I plan to cover that in my next post on Regex. That, too, seems complex and difficult until you get a little experience with it.
To simply determine whether a string contains a match, use IsMatch. The Regex class lives in the System.Text.RegularExpressions namespace, so the code in this article needs a using directive for that namespace:
public bool IsValidUSPhone(string number)
{
return Regex.IsMatch(number, @"\(\d{3}\)\s\d{3}-\d{4}");
}
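One thing to watch for: IsMatch returns true if the pattern matches anywhere in the input, so the method above also accepts a whole sentence that merely contains a phone number. If the goal is to check that the entire string is a phone number, one option is to anchor the pattern with ^ and $. Here is a minimal sketch; the class name PhoneValidation and both method names are my own, chosen just for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneValidation
{
    // Unanchored: true when the pattern occurs anywhere in the input.
    public static bool ContainsUSPhone(string input)
    {
        return Regex.IsMatch(input, @"\(\d{3}\)\s\d{3}-\d{4}");
    }

    // Anchored with ^ and $: true only when the whole input is a phone number.
    public static bool IsExactlyUSPhone(string input)
    {
        return Regex.IsMatch(input, @"^\(\d{3}\)\s\d{3}-\d{4}$");
    }

    public static void Main()
    {
        string sentence = "Call (808) 867-5309 today";
        Console.WriteLine(ContainsUSPhone(sentence));           // True
        Console.WriteLine(IsExactlyUSPhone(sentence));          // False
        Console.WriteLine(IsExactlyUSPhone("(808) 867-5309"));  // True
    }
}
```

In .NET, $ also matches just before a trailing newline, so if that matters for your input, \z is the stricter end-of-string anchor.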
Regex.Match returns a Match object, which contains a lot of information about the matched sub-string. It has a Success property that returns true if a match was found, and a Value property that contains the matched sub-string, as in the following code:
string text = "Please call us at (808) 867-5309 to inquire about our offer.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
Match m = Regex.Match(text, expression);
if (m.Success)
{
Console.WriteLine(string.Format("Phone: {0}", m.Value));
}
If the text contains several sub-strings that match the expression, Regex.Match returns only the first one. Calling NextMatch() on a Match object returns the next match, so you can loop through all of them as in the following code. Notice that I'm also using the Index and Length properties.
string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
Match m = Regex.Match(text, expression);
while (m.Success)
{
Console.WriteLine(string.Format("The phone # {0} starts at position {1}, and is {2} characters long.", m.Value, m.Index, m.Length));
m = m.NextMatch();
}
By putting parentheses around sections of the match expression, you can pull out parts of the matched value. The Match object has a Groups property. The Groups collection always has at least one item, Groups[0], which is the entire matched value. Each parenthesized part of the expression is captured into an additional Groups item, numbered from 1 in the order of its opening parenthesis. I added parentheses around the area code and around the rest of the phone number.
string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\((\d{3})\)\s(\d{3}-\d{4})";
Match m = Regex.Match(text, expression);
while (m.Success)
{
Console.WriteLine("Match: {0} Area code: {1} Phone #: {2}", m.Groups[0], m.Groups[1], m.Groups[2]);
m = m.NextMatch();
}
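Numbered groups get harder to keep track of as an expression grows. .NET also supports named groups, written (?<name>...), which you read back with Groups["name"]. The sketch below is only an illustration; the group names area and local, the class name, and the "area/local" output format are all my own choices:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupsDemo
{
    // Returns "area/local" for the first phone number found,
    // or an empty string when the text contains no phone number.
    public static string FirstPhoneParts(string text)
    {
        string expression = @"\((?<area>\d{3})\)\s(?<local>\d{3}-\d{4})";
        Match m = Regex.Match(text, expression);
        if (!m.Success)
        {
            return string.Empty;
        }
        // Named groups are looked up by name instead of by index.
        return m.Groups["area"].Value + "/" + m.Groups["local"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstPhoneParts("Please call us at (808) 867-5309."));
        // prints 808/867-5309
    }
}
```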
Looping with NextMatch() is good for simple tasks, but for more complex code it's often easier to call Regex.Matches, which returns all of the matches at once in a MatchCollection:
string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
MatchCollection matches = Regex.Matches(text, expression);
Console.WriteLine("There are {0} phone numbers in the string.", matches.Count);
foreach (Match m in matches)
{
Console.WriteLine(string.Format("Phone: {0}", m.Value));
}
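If you want to run LINQ queries against a MatchCollection, you first need an IEnumerable<Match>. In the .NET Framework, MatchCollection implements only the non-generic IEnumerable, so it has to be converted (newer .NET versions implement IEnumerable<Match> directly). I'm sure there are many ways to do this, but here is one way: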
List<Match> mlist = matches.OfType<Match>().ToList();
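Once the matches are in a generic sequence, the usual LINQ operators apply. As a rough sketch, the method below collects the distinct area codes in a string. The class and method names are hypothetical, and I'm using Cast<Match>() here, which does the same job as OfType<Match>() for this purpose:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchLinqDemo
{
    // Returns the distinct area codes found in the text, sorted ascending.
    public static List<string> DistinctAreaCodes(string text)
    {
        string expression = @"\((\d{3})\)\s\d{3}-\d{4}";
        return Regex.Matches(text, expression)
            .Cast<Match>()                   // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Groups[1].Value)  // keep only the area code group
            .Distinct()
            .OrderBy(code => code)
            .ToList();
    }

    public static void Main()
    {
        string text = "Call (808) 867-5309, (808) 555-1212 or (800) 555-0100.";
        Console.WriteLine(string.Join(", ", DistinctAreaCodes(text)));
        // prints 800, 808
    }
}
```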
For anything beyond a fixed literal string, Regex.Replace is much more flexible than the standard String.Replace method. Here is a very simple replace, which simply replaces (808) with (800). It replaces all occurrences of (808). Notice that in the replace expression I don't need to escape the parentheses:
string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string matchExpression = @"\(808\)";
string replaceExpression = @"(800)";
string newText = Regex.Replace(text, matchExpression, replaceExpression);
Console.WriteLine(newText);
Here is an example of reformatting the phone numbers to a dotted format like 800.555.1212. I'm using parentheses to group sections of the match. In the replace expression, you can refer to groups by their index using the "$" character. In the match expression, $ matches the end of the string, but in the replace expression, $ followed by a number refers to the matched group with that index.
string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string matchExpression = @"\((\d{3})\)\s(\d{3})-(\d{4})";
string replaceExpression = @"$1.$2.$3";
string newText = Regex.Replace(text, matchExpression, replaceExpression);
Console.WriteLine(newText);
There are many things you cannot do, however, with the replace expression. One example is changing the case of the matched value. If you wanted to match HTML tags and make them all lowercase, for example, the following match and replace expressions would not work:
string text = "<P><H1>My Heading</H1></P>";
string matchExpression = @"((?:</?)\w+)(?=.*?>)";
string replaceExpression = "$1";
string newText = Regex.Replace(text, matchExpression, replaceExpression.ToLower());
Console.WriteLine(newText);
The ToLower() call operates on the replace expression itself ("$1"), not on the text it replaces, so the tags come out unchanged. To do this, and other complex replaces, you need to use a MatchEvaluator. This is really easy to do. Before I ever used it, I saw the word "delegate" and thought it would be difficult to figure out, but it's so easy. A MatchEvaluator is a delegate for a function that accepts a Match object as a parameter and returns a string; Regex.Replace calls it once per match and substitutes the string it returns. Below is the code to replace the HTML tags with all-lowercase tags:
public string HtmlTagsToLower(string html)
{
string matchExpression = @"((?:</?)\w+)(?=.*?>)";
MatchEvaluator tolower = MatchEvaluatorToLower;
return Regex.Replace(html, matchExpression, tolower);
}
private string MatchEvaluatorToLower(Match m)
{
return m.Value.ToLower();
}
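If a separate named method feels like too much ceremony, the same replace can be written with a lambda, which the compiler converts to a MatchEvaluator. This is just a sketch of that alternative; the class name is my own, and I've used ToLowerInvariant() instead of ToLower() so the result doesn't depend on the current culture, which is my choice rather than a requirement:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LambdaEvaluatorDemo
{
    public static string HtmlTagsToLower(string html)
    {
        string matchExpression = @"((?:</?)\w+)(?=.*?>)";
        // The lambda takes the place of a named MatchEvaluator method.
        return Regex.Replace(html, matchExpression, m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        Console.WriteLine(HtmlTagsToLower("<P><H1>My Heading</H1></P>"));
        // prints <p><h1>My Heading</h1></p>
    }
}
```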
I hope this helps you understand how to use the Regex methods.