Geeks With Blogs




WTF Next? Dev ramblings from a master of nothing.

This post was mostly inspired by a question I came across in the MSDN forums. Say we had a set of data in a string that looked something like this, and we just HAAAAAD to use regular expressions to get the info out (I'm insinuating that String.Split might work better in this situation…):

Dim dataToSearch As String = "-Cat-Dog-Meese-Chardonnay-"
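For comparison, here is roughly what the String.Split route would look like. This is a minimal sketch that assumes the terms are always separated by single dashes; SplitTerms and SplitDemo are hypothetical names used only for illustration. StringSplitOptions.RemoveEmptyEntries drops the empty strings that the leading and trailing dashes would otherwise produce.

```vbnet
Imports System

Module SplitDemo
    ' Hypothetical helper (not from the original post): split on the dash and
    ' throw away the empty entries created by the leading and trailing dashes.
    Function SplitTerms(input As String) As String()
        Return input.Split(New Char() {"-"c}, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        Dim dataToSearch As String = "-Cat-Dog-Meese-Chardonnay-"

        ' Print each term on its own line
        For Each term As String In SplitTerms(dataToSearch)
            Console.WriteLine(term)
        Next
    End Sub
End Module
```

Run as a console app, this should print Cat, Dog, Meese and Chardonnay on separate lines, no lookaround required.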

Well, the pattern is simple, right? Just throw in the regex "-[A-Za-z]+-" and we're good to go.

Dim reg As New Regex("-[A-Za-z]+-") ' Regex and Match live in System.Text.RegularExpressions

' Print each match the regex finds
For Each m As Match In reg.Matches(dataToSearch)
    Console.WriteLine(m.Value)
Next

However, we get this.

-Cat-
-Meese-

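So what happened? Each match consumes both of its dashes, and regex matches can't overlap. Once "-Cat-" is matched, the search picks up again at "Dog-", and the dash that Dog needs in front of it has already been used, so Dog gets skipped (the same thing happens to Chardonnay). That leaves us with just our Cat and Meese (the obvious plural of mice). Is there something we can do in this case? Yep. This is where we start looking around.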
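To watch the consumption happen, this sketch prints where each match starts and how many characters it takes, using Match.Index and Match.Length. ConsumeDemo and FindDelimited are hypothetical names for illustration; the pattern is the one from above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ConsumeDemo
    ' Hypothetical helper (not from the original post): runs the original pattern.
    Function FindDelimited(input As String) As MatchCollection
        Return New Regex("-[A-Za-z]+-").Matches(input)
    End Function

    Sub Main()
        For Each m As Match In FindDelimited("-Cat-Dog-Meese-Chardonnay-")
            ' Index is where the match starts; Length is how many characters it consumed.
            ' The next search resumes at Index + Length.
            Console.WriteLine("{0} at index {1}, length {2}", m.Value, m.Index, m.Length)
        Next
    End Sub
End Module
```

It should report -Cat- at index 0 with length 5 and -Meese- at index 8 with length 7. The scan resumes at index 5 (the D in Dog) and later at index 15 (the C in Chardonnay), and in both spots the leading dash the pattern needs has already been consumed.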

In regular expressions, just like crossing the street, you can look both ways (when crossing the street you ought to; with RegEx you might not need to). Looking ahead is like trying to peek around traffic when the interstate comes to a stop. You know what you are doing, but you need to know what's happening further down the line that might influence you.

So, if we look ahead for the trailing dash after each term without actually consuming it, we can do something like this:

 

Dim dataToSearch As String = "-Cat-Dog-Meese-Chardonnay-"

' (?=-) requires a dash after the letters but leaves it unconsumed
Dim reg As New Regex("-[A-Za-z]+(?=-)")

For Each m As Match In reg.Matches(dataToSearch)
    Console.WriteLine(m.Value)
Next
Console.ReadLine() ' keep the console window open

And our result becomes:

-Cat
-Dog
-Meese
-Chardonnay

Hoorah! We've got all the terms now. So let's look at what we've changed. Instead of a plain dash after our [A-Za-z]+, we now have (?=-). This is a positive lookahead: it says our match must be followed by a dash, but we don't want to consume that dash, so it is still there to start the next match. Say we wanted to work this differently and end on anything that isn't in [A-Za-z]; we could use (?![A-Za-z]) instead and get the same result here. This is a negative lookahead, which simply looks for what we don't want to find instead of what we do.
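Here is a small side-by-side sketch of the two lookaheads, assuming a hypothetical MatchAll helper that just collects m.Value from Regex.Matches. On our data string both patterns agree, but they part ways when the last term has no trailing dash: (?![A-Za-z]) is also satisfied at the end of the string, while (?=-) demands an actual dash.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper (not from the original post): collects every match's value.
    Function MatchAll(pattern As String, input As String) As String()
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            results.Add(m.Value)
        Next
        Return results.ToArray()
    End Function

    Sub Main()
        Dim positive As String = "-[A-Za-z]+(?=-)"        ' a dash must follow
        Dim negative As String = "-[A-Za-z]+(?![A-Za-z])" ' anything but a letter may follow, even the end of the string

        ' Same data as the post: both patterns agree
        Console.WriteLine(String.Join(", ", MatchAll(positive, "-Cat-Dog-Meese-Chardonnay-")))
        Console.WriteLine(String.Join(", ", MatchAll(negative, "-Cat-Dog-Meese-Chardonnay-")))

        ' No trailing dash: only the negative lookahead still finds -Dog
        Console.WriteLine(String.Join(", ", MatchAll(positive, "-Cat-Dog")))
        Console.WriteLine(String.Join(", ", MatchAll(negative, "-Cat-Dog")))
    End Sub
End Module
```

The first two lines should both read -Cat, -Dog, -Meese, -Chardonnay; the last two should read -Cat and -Cat, -Dog respectively.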

Well, what about that first dash? I don't really care for it either, so let's add a lookbehind to get rid of it. Keeping with our driving analogy, you might compare this to checking the rearview mirror to make sure you aren't being followed.

Dim dataToSearch As String = "-Cat-Dog-Meese-Chardonnay-"

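' (?<=-) requires a dash before the letters and (?=-) one after; neither is consumed
Dim reg As New Regex("(?<=-)[A-Za-z]+(?=-)")

For Each m As Match In reg.Matches(dataToSearch)
    Console.WriteLine(m.Value)
Next
Console.ReadLine() ' keep the console window open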

Now our results are nice and clean (except maybe the dog and our chardonnay that is in mixed company…)

Cat
Dog
Meese
Chardonnay

So what is different about a lookbehind? Now we're checking what comes before our area of interest. And since we aren't consuming the dash, only requiring that it be there, it isn't part of the result!

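As with lookahead, we can also do a negative lookbehind by using (?<![A-Za-z]), which requires that the character just before the match is not a letter (or that the match starts at the beginning of the string).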
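Putting both negative lookarounds together gives a pattern that grabs whole runs of letters bounded by anything that isn't a letter. This sketch, again using a hypothetical MatchAll helper, compares it with the positive (?<=-)[A-Za-z]+(?=-) version on a string that has no outer dashes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LookbehindDemo
    ' Hypothetical helper (not from the original post): collects every match's value.
    Function MatchAll(pattern As String, input As String) As String()
        Dim results As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            results.Add(m.Value)
        Next
        Return results.ToArray()
    End Function

    Sub Main()
        ' Negative lookbehind + negative lookahead: no letter directly before or after the run
        Dim wordPattern As String = "(?<![A-Za-z])[A-Za-z]+(?![A-Za-z])"
        ' Positive version from the post: a dash must sit on both sides
        Dim dashPattern As String = "(?<=-)[A-Za-z]+(?=-)"

        Console.WriteLine(String.Join(", ", MatchAll(wordPattern, "-Cat-Dog-Meese-Chardonnay-")))
        Console.WriteLine(String.Join(", ", MatchAll(dashPattern, "Cat-Dog-Meese")))
        Console.WriteLine(String.Join(", ", MatchAll(wordPattern, "Cat-Dog-Meese")))
    End Sub
End Module
```

On the original data the negative version returns Cat, Dog, Meese and Chardonnay, just like the positive one. On "Cat-Dog-Meese" the positive version only finds Dog, since Cat has no dash before it and Meese has none after it, while the negative version still finds all three.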

 

So that concludes our quick look at regular expressions, from all angles. I hope you found these examples enjoyable, and for more information be sure to check out these helpful links:

http://www.regular-expressions.info/lookaround.html

http://social.msdn.microsoft.com/Forums/en-US/regexp/threads/

 

Posted on Saturday, January 17, 2009 12:37 PM | VB .NET


Comments on this post: Looking at regular expressions, behind and ahead.

No comments posted yet.


Copyright © Stacy Vicknair | Powered by: GeeksWithBlogs.net