Ivan Porto Carrero

Placeholder.Add("Really Cool Stuff");

I have been looking for a good first layer of validation to check whether a URL is valid.

For checking the format of a URL, regular expressions seem to me the most logical approach. Up until now I always discarded them as being too “geeky”, meaning I don't consider it my life's biggest goal to be typing (/?[]\w) all day long (so why did I become a programmer? Aaaah yes, to make life easier for other people).

Anyway, the task was to find a good regular expression that validates URLs, not just URL domains: one that doesn't allow spaces in the domain name and where the domain can be suffixed with a port number. I also need support for ~/ paths.

This is what I came up with. If somebody has a better idea or finds a mistake, please let me know. Always happy to learn something new.

^(((ht|f)tps?\:\/\/)|~/|/)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5})(:[\d]{1,5})?)/?(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?

I was a bit too quick in using this regex. Simeon Pilgrim pointed out that FTP URLs won't validate when you add a username and a password.

I don't really need to validate FTP, so I should have removed the ftp protocol from the list of choices. I need this just to validate URLs for web links and the link element in an RSS feed. When I need them for FTP I will post the FTP version, but for now I don't have time to spend on elaborating the regex.

Anyway, here is the right one: ^(http(s?)\:\/\/|~/|/)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?/?(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?

A full URL validation would include resolving names through DNS or making a web request to the provided URL to see if we get a 200 response. In my opinion, the only way to be sure is to test whether it is there.
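The second layer described above (DNS lookup, then a web request) could look roughly like this in Python. This is a minimal sketch rather than code from the post: the function names `host_resolves` and `responds_ok`, the injectable `resolver` parameter and the 5-second timeout are all assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def host_resolves(url, resolver=socket.getaddrinfo):
    """Return True if the host part of `url` resolves through DNS.

    `resolver` is injectable (an assumption of this sketch) so the
    check can be exercised without network access.
    """
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    try:
        resolver(host, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def responds_ok(url, timeout=5.0):
    """Return True if fetching `url` answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, ValueError, OSError):
        return False
```

Both calls need network access for real URLs, which is why a cheap regex check usually runs first and these run only on the candidates that pass it.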

Thanks Simeon.

And for those who really want the ftp validation : ^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?/?(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?

I am not sure about numbers in the username, but I believe you can have a username consisting of numbers alone.

Comments don't seem to work on this blog engine, so just send me a mail through the contact form. Thanks.

Two days later ...

I discovered there is still a problem with my regular expressions: folders don't get parsed.
I've solved the path issue, so now it should find all URLs.

Expression:
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?

It should match the URL below:
http://hh-1hallo.msn.blabla.com:80800/test/test/test.aspx?dd=dd&id=dki

But not this one (note the space inside the domain name):
http://hh-1hallo. msn.blablabla.com:80800/test/test.aspx?dd=dd&id=dki
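To try the expression against these two URLs, here is a small Python sketch. The pattern is copied from the post; the helper name `is_valid_url` and the use of `fullmatch` (so that the whole string has to match, since the expression has no `$` at the end) are my own assumptions about how it would be used.

```python
import re

# The expression from the post, split over several lines for readability.
URL_PATTERN = re.compile(
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
    r"((\?\w+=\w+)?(&\w+=\w+)*)?"
)


def is_valid_url(candidate):
    """Return True when the whole candidate string matches the pattern."""
    return URL_PATTERN.fullmatch(candidate) is not None


good = "http://hh-1hallo.msn.blabla.com:80800/test/test/test.aspx?dd=dd&id=dki"
bad = "http://hh-1hallo. msn.blablabla.com:80800/test/test.aspx?dd=dd&id=dki"
print(is_valid_url(good), is_valid_url(bad))
```

Note that the port 80800 is accepted because the expression only checks for one to five digits, not for the 0 to 65535 range.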

 

posted on Thursday, December 01, 2005 8:15 AM

Feedback

# re: A good url regular expression ? 12/30/2005 7:44 PM Ivan Porto Carrero
I discovered it lacked support for commas. The new regex would be:
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?)?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?

# re: A good url regular expression ? 1/25/2006 4:46 AM PRMan
Thanks for your help. I ended up with a different one.

# re: A good url regular expression ? 4/7/2006 9:32 AM Hamada
It does not validate this type of URL:
http://www.beresfordltd.nf.net/signs-banners.jpg

Is the problem the dash?

# re: A good url regular expression ? 12/9/2006 7:54 AM Resentless
Note that the expression:
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?)?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?
does NOT REQUIRE anything to be present, and therefore ColdFusion will fail to find the URL in the following example:
REFind("^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?)?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?","before google.com after",1,"true")

This is because ColdFusion, if it cannot find an optional regular expression at the beginning of a string, will ignore it altogether.

With the following modification (a couple of well-placed parentheses):
((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,6})))((:[\d]{1,5})?)?((/?\w+/)+|/?)([\w\-]+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?
The domain portion of the regular expression is forced to be present and ColdFusion will find the URL.
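The same effect can be reproduced outside ColdFusion. The sketch below uses Python's `re` module, which is an assumption on my part since the comment is about `REFind`; the names `ALL_OPTIONAL` and `DOMAIN_REQUIRED` are made up for the example. In Python the all-optional, anchored pattern succeeds with an empty match at position 0, while the modified pattern finds the domain.

```python
import re

TEXT = "before google.com after"

# Every group is optional, so the pattern can match the empty string.
ALL_OPTIONAL = re.compile(
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?)?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*"
    r"((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?"
)

# The modified pattern: no ^ anchor and a mandatory domain group.
DOMAIN_REQUIRED = re.compile(
    r"((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"(([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,6})))((:[\d]{1,5})?)?"
    r"((/?\w+/)+|/?)([\w\-]+\.[\w]{3,4})?([,]\w+)*"
    r"((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?"
)

empty_hit = ALL_OPTIONAL.search(TEXT)
domain_hit = DOMAIN_REQUIRED.search(TEXT)
print(repr(empty_hit.group(0)))   # zero-length match at the start
print(repr(domain_hit.group(0)))  # the domain found inside the text
```

So when searching inside free text, at least one part of the expression has to be mandatory, otherwise a zero-length match always "succeeds".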

# re: A good url regular expression ? 1/31/2007 1:17 AM Web Developer
If you're using Java, I find the best way to validate a URL is to use the URL class and try to get the content. The benefit of doing this is a) it validates ANY valid url that the Java community has deemed valid, and b) shows you if a url actually points somewhere!

# re: A good url regular expression ? 1/31/2007 1:18 AM Web Developer
Oh and here's the code for it:

package com.keteracel.urlchecker;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

public class URLChecker {
    public static void main(String[] args) {
        try {
            // The constructor checks the syntax; getContent() opens a connection and fetches the resource.
            URL url = new URL("http://java.sun.com/docs/books/tutorial/reallybigindex.html");
            url.getContent();
            System.out.println("URL OK");
        } catch (UnknownHostException e) {
            System.out.println("Unknown Host");
        } catch (MalformedURLException e) {
            // Keep only the part of the message before the last ':' when there is one.
            String message = String.valueOf(e.getMessage());
            int colon = message.lastIndexOf(':');
            System.out.println("Bad URL: " + (colon >= 0 ? message.substring(0, colon) : message));
        } catch (FileNotFoundException e) {
            System.out.println("404 error returned");
        } catch (IOException e) {
            System.out.println("Communication failure");
        }
    }
}

# re: A good url regular expression ? 3/26/2007 7:38 PM just3ala2
What about local addresses like
http://localhost/Project/webFroms/webForm.aspx
It doesn't pass.

# re: A good url regular expression ? 3/26/2007 8:06 PM Ivan Porto Carrero
Web developer:
If you have to parse hundreds or thousands of URLs, it's comforting to know that a regular expression can weed out most of the bad addresses.
And for highlighting links in pages, a regex is a nicer solution in my opinion.
just3ala2:
To check for hosts that don't have a domain associated with them, you can use this expression:
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.?)*(\.[\w]{2,5})?)(:[\d]{1,5})?)?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?
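A quick way to check this expression against the localhost case is the Python sketch below. The constant name `HOST_OPTIONAL_DOMAIN` and the use of `fullmatch` are assumptions for the example. Because `([\w\-]+\.?)*` nests one repetition inside another, long inputs that do not match may backtrack heavily, so the inputs here are kept short.

```python
import re

# Expression for hosts without a domain suffix, copied from the comment.
HOST_OPTIONAL_DOMAIN = re.compile(
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"(([a-zA-Z]{1}([\w\-]+\.?)*(\.[\w]{2,5})?)(:[\d]{1,5})?)?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*"
    r"((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?"
)

for candidate in (
    "http://localhost/Project/webFroms/webForm.aspx",
    "http://localhost:81",
    "http://local host",
):
    matched = HOST_OPTIONAL_DOMAIN.fullmatch(candidate) is not None
    print(candidate, "->", matched)
```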


# re: A good url regular expression ? 4/23/2007 10:21 AM Jonathan
What about mailto links?

# re: A good url regular expression ? 4/27/2007 8:46 AM re
Please help:

how do I hide .aspx pages behind a .htm extension?

# re: A good url regular expression ? 5/22/2007 4:04 PM Mike Cronin
FYI: java.net.URL will throw a MalformedURLException when given a secure (HTTPS) URL. So that approach is fine if you do not intend to validate secure URLs.

# re: A good url regular expression ? 5/25/2007 3:32 AM Wil
The mother lode of URL regexes:
read the article and click on the regex link at the top to see the regex.
http://foad.org/~abigail/Perl/url2.html


# re: A good url regular expression ? 10/18/2007 12:28 AM Veeresh
The above Java code takes extra time just to check whether a URL is valid: it needs to connect to the remote system and wait for a response. That is not an efficient way to validate it. We may have to find some other alternative to validate it quickly.

The URL expression should also contain capturing groups to extract the required fields, such as the domain name, subnet, web page, etc.

Veeresh D.
http://drveresh.googlepages.com
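Extracting fields with groups, as suggested above, could be sketched like this in Python. The pattern is a deliberately simplified, hypothetical one (not any of the expressions posted in this thread), and the group names `scheme`, `host`, `port`, `path` and `query` as well as the helper `split_url` are my own choices.

```python
import re

# Simplified, hypothetical pattern with one named group per URL part.
URL_PARTS = re.compile(
    r"^(?P<scheme>https?|ftp)://"
    r"(?P<host>[a-zA-Z][\w\-]*(?:\.[\w\-]+)*)"
    r"(?::(?P<port>\d{1,5}))?"
    r"(?P<path>/[^?#\s]*)?"
    r"(?:\?(?P<query>[^#\s]*))?$"
)


def split_url(url):
    """Return a dict of URL parts, or None when the URL does not match."""
    match = URL_PARTS.match(url)
    return match.groupdict() if match else None


parts = split_url(
    "http://hh-1hallo.msn.blabla.com:80800/test/test/test.aspx?dd=dd&id=dki"
)
print(parts["host"], parts["port"], parts["path"], parts["query"])
```

Named groups keep the calling code readable compared with counting parentheses in the longer expressions above.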

# re: A good url regular expression ? 11/16/2007 1:38 AM Nanda
I want a regular expression that validates a URL in JavaScript.

# re: A good url regular expression ? 12/26/2007 10:38 PM vivek kumar
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?(([a-zA-Z]{1}([\w\-]+\.?)*(\.[\w]{2,5})?)(:[\d]{1,5})?)?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?([,]\w+)*((\?\w+=\w+)?(&\w+=\w+)*([,]\w*)*)?


In the above URL validator, I don't want to force the user to input http/https/ftp/ftps. Please suggest a regex.

Thanks

# re: A good url regular expression ? 2/1/2008 5:58 AM Priya
Hi, I need URL validation for (http://www.domainname.com/filename). Can you suggest a regular expression for this?

# re: A good url regular expression ? 2/1/2008 6:07 AM Priya
Hi, the validation must accept both (www.domainname.com) and (http://www.domainname.com/filename). Please help me.

# re: A good url regular expression ? 2/3/2008 3:53 PM Force4
Hello,

There are some more problems with this pattern.

For instance, this URL should not match, but it does :
http://www.server.com/dir/doc.php&foo=bar
(The "search part" of a url should begin with `?' and not `&'.)

There is only one character `?' which may be removed to modify your original pattern :
^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+(=\w+)?)(&\w+(=\w+)?)*)?
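The difference between the two query-string parts can be checked with a short Python sketch. The names `PREFIX`, `ORIGINAL` and `REVISED` are assumptions for the example, and `fullmatch` is used so that the whole URL has to match.

```python
import re

# Everything up to the query string is identical in both expressions.
PREFIX = (
    r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
)

# Original: the "?" part is optional, so a query may start with "&".
ORIGINAL = re.compile(PREFIX + r"((\?\w+=\w+)?(&\w+=\w+)*)?")
# Revised: any "&" pair must follow a leading "?" part.
REVISED = re.compile(PREFIX + r"((\?\w+(=\w+)?)(&\w+(=\w+)?)*)?")

url = "http://www.server.com/dir/doc.php&foo=bar"
print(ORIGINAL.fullmatch(url) is not None)  # accepted by the original
print(REVISED.fullmatch(url) is not None)   # rejected by the revision
```

The revised form also makes the `=value` part optional, so a bare `?foo` is accepted.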


# re: A good url regular expression ? 2/6/2008 6:07 AM Avinash
Hi,
I have used this regular expression for URL validation. It works fine in a simple HTML file, but when we use it in a web application, JavaScript errors occur, such as "Syntax error in regular expression".

Please suggest a fix.

# re: A good url regular expression ? 2/25/2008 12:38 AM Ganga
Hi,

I need a regex validation for the www.academic.testing.com/marketing/lamb



# re: A good url regular expression ? 3/5/2008 12:47 PM Alan
There are other protocols besides http: and ftp:

http://www.w3.org/Addressing/URL/url-spec.txt specifies 13 of them, and there also exist non-w3 URLs, including non-alphabetic ones, for example a URL to a Subversion repository can start with "svn+ssh://"

Hence I would start the expression with

([a-zA-Z+]{3,7})\:\/\/
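A generic scheme prefix along these lines can be tried in Python as follows. This sketch uses a form of the fragment with balanced parentheses and a `^` anchor; the helper name `scheme_of` is an assumption.

```python
import re

# Three to seven letters or "+" signs, followed by "://".
SCHEME_PREFIX = re.compile(r"^([a-zA-Z+]{3,7})\:\/\/")


def scheme_of(url):
    """Return the scheme in front of "://", or None if there is none."""
    match = SCHEME_PREFIX.match(url)
    return match.group(1) if match else None


for candidate in ("svn+ssh://host/repo", "http://example.com", "www.example.com"):
    print(candidate, "->", scheme_of(candidate))
```

The 3 to 7 length bound is tight: "svn+ssh" just fits, while an eight-letter scheme such as "prospero" would be rejected.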

# re: A good url regular expression ? 3/17/2008 12:51 PM Cecilia
I am using this expression, because I only need the short path
ex. /Page/Name.aspx

(/)\w+([-+.'/]\w+)*.\w+([-.]\w+)*\.\w+([-.]\w+)*



# Mr 3/26/2008 8:00 AM ecards and online greeting cards
This blog post has become extremely useful and confusing at the same time because there have been so many corrections and variations.

For example, the corrections in the original post, did they come as a result of the comments below? Or was the original post truly in 2005 and all the comments came after?

It'd be great to clearly state what the final best known revision for each scenario is: with ftp, without ftp, etc, etc.

regards,
ltd

# re: A good url regular expression ? 5/20/2008 6:01 AM Sanjida
If I put in an address like this:
ww.mail.yahhoo.com
it will work. Here is the error: I haven't written three w's and I have misspelled the Yahoo web server, so why should your RE accept it?
Please let me know the details. I am waiting.

# re: A good url regular expression ? 5/20/2008 6:56 AM Ivan Porto Carrero
There is no law that says you have to start a url with www. I'm using that url expression as a means to highlight possible links in a text to make them clickable.

look here for more info:
http://www.w3.org/Addressing/URL/url-spec.txt


# re: A good url regular expression ? 5/20/2008 3:27 PM Joe
Here is a good URL regex that covers most of my problems. It has comments and needs case-insensitive matching turned on.


^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})+)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})+)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})+)?(?#What not to end in)[^.!,:;?]$

# re: A good url regular expression ? 5/20/2008 3:38 PM Joe
Update to the last really long regex to add support for empty parameters. This will validate a url like
http://www.google.com/products?q=some%20search&rls=com.microsoft:*&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1&um=1&sa=N&tab=wf



^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})+)?(?#What not to end in)[^.!,:;?]$

# re: A good url regular expression ? 5/21/2008 5:28 AM Sergey
The regex does not validate www.google.ru correctly.

# re: A good url regular expression ? 5/21/2008 9:21 AM Joe
Sorry, I would leave out the last part after (?#What not to end in). That's what messed it up.

I have tested these URLs in RegexBuddy:

http://www.google.com/search?q=good+url+regex&rls=com.microsoft:*&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1
ftp://joe:password@ftp.filetransferprotocal.com
google.ru
https://some-url.com?query=&name=joe?filter=*.*#some_anchor


\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$

# re: A good url regular expression ? 5/21/2008 9:22 AM Joe
Sorry for flooding; that last paste was bad.


^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
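For anyone who wants to try this expression outside RegexBuddy, here is a Python sketch. Python's `re` module accepts the inline `(?#...)` comments as they are, and `re.IGNORECASE` stands in for the "case insensitive" setting. The constant name `JOE_PATTERN` is an assumption. The pattern is kept exactly as posted, including `%[a-f\d{2}]` in the query keys, which looks like a typo for `%[a-f\d]{2}`.

```python
import re

# Joe's expression, split at its own (?#...) comments for readability.
JOE_PATTERN = re.compile(
    r"^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?"
    r"(?#Username:Password)(?:\w+:\w+@)?"
    r"(?#Subdomains)(?:(?:[-\w]+\.)+"
    r"(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))"
    r"(?#Port)(?::[\d]{1,5})?"
    r"(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?"
    r"(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)"
    r"(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*"
    r"(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$",
    re.IGNORECASE,
)

for candidate in (
    "www.google.ru",
    "google.ru",
    "ftp://joe:password@ftp.filetransferprotocal.com",
    "http://localhost",
):
    print(candidate, "->", JOE_PATTERN.match(candidate) is not None)
```

Engines that do not support `(?#...)` comments would need them stripped out first, which may explain differing results between tools.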

# re: A good url regular expression ? 5/21/2008 4:15 PM Ivan Porto Carrero
Thanks for that Joe :)

# re: A good url regular expression ? 5/21/2008 8:46 PM Mike
Could someone post a pure http version of the regex?

# re: A good url regular expression ? 5/23/2008 6:43 AM Sergey
Hi Joe,

www.google.ru still does not work.

Thanks!

# re: A good url regular expression ? 6/5/2008 6:57 PM Aaron
Sergey, give this one a try:

^((https?):\/\/(?:([a-zA-Z\d\-_]+)@?([a-zA-Z\d\-_]+)\:)?((?:(?:(?:(?:[a-zA-Z\d](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?)\.)*([a-zA-Z](?:(?:[a-zA-Z\d]|-)*[a-zA-Z\d])?))|(?:(?:\d+)(?:\.(?:\d+)){3}))(?::(\d+))?)(?:\/((?:(?:(?:[a-zA-Z\d$\-_.+!*'(),~]|(?:%[a-fA-F\d]{2}))|[;:@&=])*)(?:\/(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),~]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))*)(\?(?:(?:(?:[a-zA-Z\d$\-_.+!*'(),~]|(?:%[a-fA-F\d]{2}))|[;:@&=])*))?)?)$

# re: A good url regular expression ? 6/6/2008 4:59 PM Imbrod
Hi,
can you post a reg.exp. that would match ANY form of URL:

http://www.aaa.com/bbb/ccc.htm
https://www.aaa.com/bbb/ccc.htm
www.aaa.com/bbb/ccc.htm

with or without querystrings and hash?

I need to write a function in VB.NET that would replace all URLs in text with hyperlinks.

# re: A good url regular expression ? 6/6/2008 5:34 PM Imbrod
Me again.
I found an excellent URL seeker; it works with or without a protocol. Written by James Tikitiki:

(((ht|f)tp(s?):\/\/)|(www\.[^ \[\]\(\)\n\r\t]+)|(([012]?[0-9]{1,2}\.){3}[012]?[0-9]{1,2})\/)([^ \[\]\(\),;"'<>\n\r\t]+)([^\. \[\]\(\),;"'<>\n\r\t])|(([012]?[0-9]{1,2}\.){3}[012]?[0-9]{1,2})
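Replacing URLs in text with hyperlinks using this seeker could be sketched as below. The sketch is in Python rather than VB.NET, and the helper name `linkify`, the choice to prefix `http://` when no protocol is present, and the HTML escaping are all assumptions of mine.

```python
import html
import re

# The URL seeker from the comment, split over three lines.
URL_SEEKER = re.compile(
    r'''(((ht|f)tp(s?):\/\/)|(www\.[^ \[\]\(\)\n\r\t]+)|(([012]?[0-9]{1,2}\.){3}[012]?[0-9]{1,2})\/)'''
    r'''([^ \[\]\(\),;"'<>\n\r\t]+)([^\. \[\]\(\),;"'<>\n\r\t])'''
    r'''|(([012]?[0-9]{1,2}\.){3}[012]?[0-9]{1,2})'''
)


def linkify(text):
    """Wrap every URL the seeker finds in an HTML anchor."""
    def to_anchor(match):
        url = match.group(0)
        # Assumption: links without a protocol are treated as http.
        href = url if "://" in url else "http://" + url
        return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(url))

    return URL_SEEKER.sub(to_anchor, text)


print(linkify("see http://www.aaa.com/bbb/ccc.htm. ok"))
print(linkify("www.aaa.com/bbb/ccc.htm"))
```

The last character class of the seeker excludes a dot, which is why the full stop at the end of a sentence stays outside the link.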

# re: A good url regular expression ? 6/9/2008 1:03 AM Justin
I would add in support for "mailto:" since it's sometimes a necessary value when working with CMS's.

Also, support for IP domains or fragments (e.g.: asdf.com/#fragment) would round this out as a very complete expression.

# re: A good url regular expression ? 6/9/2008 5:15 AM Sergey
Hi Aaron,

Still does not work. I've noticed another issue: the regex does not validate http://localhost or http://localhost:81 correctly.

Thanks,
Sergey
