Geeks With Blogs
Annie Bougie
Regular expressions are one of those things that you may not need very often, but when you do, they really solve the problem. The static methods of the Regex class may seem difficult at first, but they're pretty easy to use. Being able to use regular expressions readily will help you quickly write code that would take many hours longer if you parsed the strings by hand. I've compiled some code that uses the basic features of the Regex class, which lives in the System.Text.RegularExpressions namespace. This article only covers the coding side. For it to work, you will of course need to know how to write the expressions themselves. I plan to cover that in my next post on Regex. That, too, seems very complex and difficult until you get a little experience with it.

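To simply determine whether a string contains a match, use IsMatch. An unanchored pattern matches anywhere in the string, so to validate that the whole string is a phone number, anchor the pattern with ^ and $:

public bool IsValidUSPhone(string number)
{
    // ^ and $ anchor the pattern to the start and end of the string.
    return Regex.IsMatch(number, @"^\(\d{3}\)\s\d{3}-\d{4}$");
}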
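Whether IsMatch means "contains a phone number" or "is exactly a phone number" depends only on the anchors in the pattern. The following is a minimal sketch of that difference, not part of the original code: the class and method names are hypothetical, and it uses \A and \z, which match only at the very start and very end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneCheck
{
    // The phone pattern used throughout this article, without anchors.
    private const string Phone = @"\(\d{3}\)\s\d{3}-\d{4}";

    // True if a phone number occurs anywhere in the input.
    public static bool ContainsUSPhone(string input)
    {
        return Regex.IsMatch(input, Phone);
    }

    // True only if the entire input is a phone number.
    public static bool IsExactlyUSPhone(string input)
    {
        return Regex.IsMatch(input, @"\A" + Phone + @"\z");
    }

    public static void Main()
    {
        string sentence = "Call (808) 867-5309 today";
        Console.WriteLine(ContainsUSPhone(sentence));           // True
        Console.WriteLine(IsExactlyUSPhone(sentence));          // False
        Console.WriteLine(IsExactlyUSPhone("(808) 867-5309"));  // True
    }
}
```

Note that $ also matches just before a newline at the end of the input, so \z is the stricter choice when a trailing newline should be rejected.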

Regex.Match returns a Match object, which contains a lot of information about the matched sub-string. Its Success property returns true if a match was found, and its Value property contains the matched sub-string, as in the following code:

string text = "Please call us at (808) 867-5309 to inquire about our offer.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
Match m = Regex.Match(text, expression);
if (m.Success)
{
    Console.WriteLine("Phone: {0}", m.Value);
}

If the text contains several sub-strings that match the expression, Regex.Match returns only the first one. Calling NextMatch() on a Match returns the next match, so you can loop until Success is false, as in the following code. Notice that I'm also using the Index and Length properties.

string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
Match m = Regex.Match(text, expression);
while (m.Success)
{
    Console.WriteLine("The phone # {0} starts at position {1}, and is {2} characters long.", m.Value, m.Index, m.Length);
    m = m.NextMatch();
}

By using parentheses around sections of the match expression, you can pull out parts of the match value. The Match object has a Groups property. When there is a match, the Groups collection always has at least one item: Groups[0] is the entire matched value. Each parenthesized part of the expression is captured into an additional Groups item, numbered from 1 in the order of its opening parenthesis. In the following code, I added parentheses around the area code and around the phone number.

string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\((\d{3})\)\s(\d{3}-\d{4})";
Match m = Regex.Match(text, expression);
while (m.Success)
{
    Console.WriteLine("Match: {0}  Area code: {1}  Phone #: {2}", m.Groups[0], m.Groups[1], m.Groups[2]);
    m = m.NextMatch();
}
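Groups can also be given names with the (?<name>...) syntax, which avoids counting parentheses when the expression changes. The following is a small sketch of that variation rather than the code above: the group names "area" and "number" and the helper method are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupsDemo
{
    // Returns "area|number" for the first phone number found, or null if there is none.
    public static string FirstPhoneParts(string text)
    {
        // (?<name>...) captures into a group that can be read by name.
        string expression = @"\((?<area>\d{3})\)\s(?<number>\d{3}-\d{4})";
        Match m = Regex.Match(text, expression);
        if (!m.Success)
        {
            return null;
        }
        return m.Groups["area"].Value + "|" + m.Groups["number"].Value;
    }

    public static void Main()
    {
        // prints "808|867-5309"
        Console.WriteLine(FirstPhoneParts("Please call us at (808) 867-5309."));
    }
}
```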

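Looping with NextMatch() is good for simple tasks, but when you want all of the matches at once, Regex.Matches is more convenient. It returns a MatchCollection, which has a Count property and can be used in a foreach loop: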

string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string expression = @"\(\d{3}\)\s\d{3}-\d{4}";
MatchCollection matches = Regex.Matches(text, expression);
Console.WriteLine("There are {0} phone numbers in the string.", matches.Count);
foreach (Match m in matches)
{
    Console.WriteLine("Phone: {0}", m.Value);
}

If you want to run LINQ queries against a MatchCollection, you first have to convert it to an IEnumerable<Match>, because in the .NET Framework MatchCollection only implements the non-generic IEnumerable. I'm sure there are many ways to do this, but here is one way:

List<Match> mlist = matches.OfType<Match>().ToList();
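Once the matches are available as an IEnumerable<Match>, the usual LINQ operators apply. As a rough sketch (the class and method names are hypothetical, and Cast<Match>() is used here as an alternative to OfType<Match>()), this collects the distinct area codes found in a string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchLinqDemo
{
    // Returns each area code that appears in the text, without duplicates.
    public static List<string> AreaCodes(string text)
    {
        // Group 1 captures the three digits inside the parentheses.
        MatchCollection matches = Regex.Matches(text, @"\((\d{3})\)\s\d{3}-\d{4}");
        return matches.Cast<Match>()
                      .Select(m => m.Groups[1].Value)
                      .Distinct()
                      .ToList();
    }

    public static void Main()
    {
        string text = "Call (808) 867-5309, (800) 555-1212 or (808) 555-0000.";
        foreach (string code in AreaCodes(text))
        {
            Console.WriteLine(code);
        }
    }
}
```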

Regex.Replace is far more flexible than the standard String.Replace method, because it replaces whatever the pattern matches rather than one fixed string. Here is a very simple replace, which will simply replace (808) with (800). It replaces all occurrences of (808). Notice that in the replace expression I don't need to escape the parentheses characters:

string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string matchExpression = @"\(808\)";
string replaceExpression = @"(800)";
string newText = Regex.Replace(text, matchExpression, replaceExpression);
Console.WriteLine(newText);

Here is an example of reformatting the phone number to 800.555.1212 format. I'm using parentheses to group sections of the match. In the replace expression, you can refer to groups by their index using the "$" character: $1 is the first group, $2 the second, and so on. In the match expression, $ matches the end of the string, but in the replace expression, it refers to a matched group.

string text = "Please call us at (808) 867-5309 or (808) 555-1212 at your earliest convenience.";
string matchExpression = @"\((\d{3})\)\s(\d{3})-(\d{4})";
string replaceExpression = @"$1.$2.$3";
string newText = Regex.Replace(text, matchExpression, replaceExpression);
Console.WriteLine(newText);
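The replace expression can also refer to named groups with the ${name} syntax. This is a sketch of the same reformatting with names instead of indexes. The group names (area, prefix, line) and the helper method are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedReplaceDemo
{
    // Rewrites every "(808) 555-1212" style number as "808.555.1212".
    public static string ToDottedFormat(string text)
    {
        string matchExpression = @"\((?<area>\d{3})\)\s(?<prefix>\d{3})-(?<line>\d{4})";
        // ${name} inserts the text captured by the named group.
        string replaceExpression = "${area}.${prefix}.${line}";
        return Regex.Replace(text, matchExpression, replaceExpression);
    }

    public static void Main()
    {
        // prints "Please call us at 808.867.5309 or 808.555.1212."
        Console.WriteLine(ToDottedFormat("Please call us at (808) 867-5309 or (808) 555-1212."));
    }
}
```

The braces work with numbers too: ${1} is handy when a group reference is immediately followed by a digit that would otherwise be read as part of the group number.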


There are many things you cannot do with the replace expression alone, however. One example is changing the case of the matched value. If you wanted to match HTML tags and make them all lowercase, for example, the following match and replace expressions would not work:

string text = "<P><H1>My Heading</H1></P>";
string matchExpression = @"((?:</?)\w+)(?=.*?>)";
string replaceExpression = "$1";
// ToLower() is applied to the literal "$1", so each tag is replaced with itself, unchanged.
string newText = Regex.Replace(text, matchExpression, replaceExpression.ToLower());
Console.WriteLine(newText);

The ToLower() operates on the replace expression itself, not on the text it's replacing. To do this, and other complex replaces, you need to use a MatchEvaluator. This is really easy to do. Before I ever used it, I saw the word "delegate" and thought it would be difficult to figure out, but it's so easy. A MatchEvaluator is a delegate for a function that accepts a Match object as a parameter and returns the replacement string; Regex.Replace calls it once for each match. Below is the code to replace the HTML tags with all lowercase tags:

public string HtmlTagsToLower(string html)
{
    string matchExpression = @"((?:</?)\w+)(?=.*?>)";
    MatchEvaluator tolower = MatchEvaluatorToLower;
    return Regex.Replace(html, matchExpression, tolower);
}

private string MatchEvaluatorToLower(Match m)
{
    return m.Value.ToLower();
}
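Because a MatchEvaluator is just a delegate, a lambda expression can be passed in place of a separate named method. The following is a compact sketch of the same idea, assuming C# 3.0 or later. The class name is hypothetical, and ToLowerInvariant() is swapped in for ToLower() so the result does not depend on the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagCaseDemo
{
    // Lowercases the name of every opening and closing tag in the input.
    public static string HtmlTagsToLower(string html)
    {
        string matchExpression = @"((?:</?)\w+)(?=.*?>)";
        // The lambda is converted to a MatchEvaluator and runs once per match.
        return Regex.Replace(html, matchExpression, m => m.Value.ToLowerInvariant());
    }

    public static void Main()
    {
        // prints "<p><h1>My Heading</h1></p>"
        Console.WriteLine(HtmlTagsToLower("<P><H1>My Heading</H1></P>"));
    }
}
```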

I hope this helps you understand how to use the Regex methods.

Posted on Wednesday, September 17, 2008 4:09 PM | C# Code


Comments on this post: Matching and Grouping Regular Expressions using Regex in C#

# re: Matching and Grouping Regular Expressions using Regex in C#
Thanks for this article. Was very useful.
Left by AzHofi on Mar 05, 2009 7:44 AM

# re: Matching and Grouping Regular Expressions using Regex in C#
I'm trying to find a file name by using the first three digits via String.Format.

The file is 4.0.123.txt, but sometimes it is renamed
4.0.123.0.txt. The digits I want to use to find it are 4.0.123., but somehow String.Format doesn't like my query.
String.Format(\\b\\d{0}.{1}.{2}.txt\\b), 4, 0, 123)
Left by Virgilio on Mar 11, 2009 6:19 AM

# re: Matching and Grouping Regular Expressions using Regex in C#
The reason your search is failing is the periods. In a search expression, the period is a symbol that matches any single character (except a newline). You need to escape the periods like this:

4\.0\.123\.

To match either the string "4.0.123.txt" or "4.0.123.0.txt", you would use the following expression:

4\.0\.123\..*txt

In this expression, the ".*" stands for any character repeated 0 or more times.

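To use string.Format to build the expression, it would look like this. Note the @ prefix: without it, C# would read "\b" as a backspace character and reject "\." as an unrecognized escape sequence.

string exp = string.Format(@"\b{0}\.{1}\.{2}\..*txt", "4", "0", "123");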
Left by Anne Bougie on Mar 11, 2009 11:26 AM

# re: Matching and Grouping Regular Expressions using Regex in C#
I found quite a few issues when using this expression to search for a file in a directory using IO and Regex.

I tried a number of regexes and none worked; it kept complaining about not being a valid UNC.

I did find a not very elegant way to circumvent this by using a wildcard.
string[] files = Directory.GetFiles(SourcePath.ToString(), String.Format("{0}.{1}.{2}*.txt", "No1", "No2", "No3"));
foreach (string fileName in files)
{
    if (File.Exists(fileName))
        Console.WriteLine(fileName);
}

-- Not very elegant, but it works and needs less coding :)
Left by Virgilio on Mar 12, 2009 2:16 AM

# re: Matching and Grouping Regular Expressions using Regex in C#
Great Article. At last found a solution to convert all html tags in a string to lowercase.
Left by Bhawani on Dec 14, 2009 4:27 PM



Copyright © Annie Bougie | Powered by: GeeksWithBlogs.net