Geeks With Blogs
AzamSharp
Some day I will know everything. I hope that day never comes.

Sometimes you need to find out whether the URLs on a page actually exist. The following code reads the HTML of a page, extracts all the URLs from it, and then checks whether each URL exists.

Take a look at the following code:

// Namespaces used: System, System.Collections, System.IO, System.Net,
// System.Text.RegularExpressions

protected void Button1_Click(object sender, EventArgs e)
{
    WebRequest req = WebRequest.Create("http://localhost:1852/LookIntoDoPostBack/UrlList.aspx");
    ArrayList badUrls = new ArrayList();
    string html;

    // Read the page's HTML; the using blocks close the response and
    // release its connection even if reading fails
    using (WebResponse res = req.GetResponse())
    using (StreamReader reader = new StreamReader(res.GetResponseStream()))
    {
        html = reader.ReadToEnd();
    }

    // Get the links. The pattern must stay on one line: a line break
    // inside a verbatim string would become part of the pattern.
    string pattern = @"((http|ftp|https):\/\/w{3}[\d]*.|(http|ftp|https):\/\/|w{3}[\d]*.)([\w\d\._\-#\(\)\[\]\\,;:]+@[\w\d\._\-#\(\)\[\]\\,;:])?([a-z0-9]+.)*[a-z\-0-9]+.([a-z]{2,3})?[a-z]{2,6}(:[0-9]+)?(\/[\/a-z0-9\._\-,]+)*[a-z0-9\-_\.\s\%]+(\?[a-z0-9=%&\.\-,#]+)?";

    Regex r = new Regex(pattern);
    MatchCollection mC = r.Matches(html);

    // Iterate through the collection and check whether each URL exists
    foreach (Match m in mC)
    {
        if (!DoesUrlExists(m.Value))
        {
            // Add to the broken URLs
            badUrls.Add(m.Value);
        }
    }

    // Display the bad URLs in the GridView control
    gvBadUrls.DataSource = badUrls;
    gvBadUrls.DataBind();
}

    
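private bool DoesUrlExists(string url)
{
    try
    {
        // Create the request inside the try: the regex also matches
        // scheme-less "www..." strings, which WebRequest.Create rejects
        WebRequest req = WebRequest.Create(url);

        // Example value: give up after 10 seconds instead of the 100-second default
        req.Timeout = 10000;

        // WebResponse (rather than HttpWebResponse) also covers ftp:// links.
        // Disposing the response returns its connection to the pool; leaving
        // responses open can exhaust connections and cause timeouts.
        using (WebResponse response = req.GetResponse())
        {
            return true;
        }
    }
    catch (WebException)
    {
        // 404s, DNS failures, timeouts, etc.
        return false;
    }
    catch (UriFormatException)
    {
        // Not a valid absolute URI
        return false;
    }
}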
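The loop in Button1_Click calls DoesUrlExists once per regex match, so a link that appears several times on the page is requested several times. Since every request is slow, removing duplicates first can save a lot of time. This is a minimal sketch, assuming exact string comparison is good enough to spot duplicates; UrlExtraction and ExtractDistinctUrls are hypothetical names, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post
public static class UrlExtraction
{
    // Returns each regex match once, in first-seen order, so that a link
    // appearing several times on the page is only checked once.
    public static List<string> ExtractDistinctUrls(string html, Regex urlPattern)
    {
        // Ordinal (case-sensitive) comparison, because URL paths can be case-sensitive
        HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
        List<string> urls = new List<string>();

        foreach (Match m in urlPattern.Matches(html))
        {
            // HashSet.Add returns false when the value is already present
            if (seen.Add(m.Value))
            {
                urls.Add(m.Value);
            }
        }
        return urls;
    }
}
```

In Button1_Click, the loop could then iterate over `UrlExtraction.ExtractDistinctUrls(html, r)` and pass each string to DoesUrlExists instead of iterating over `mC`.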

When I find a bad URL, I simply add it to an ArrayList, and later I display the bad URLs in the GridView control. Note that the code will not report any bad URLs if your ISP redirects requests for missing pages to its own custom page, because the request then succeeds and no WebException is thrown. Also, checking each URL is very time-consuming, so if you use this code, I suggest running the check on a different thread.
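If the checks are moved off the page's thread, they can also run several at a time. This is a sketch under stated assumptions, not the original approach: it assumes .NET 4.0 or later for Parallel.For (newer than the .NET 2.0 era of this post), ParallelUrlChecker and FindBadUrls are hypothetical names, and the existence check is passed in as a delegate so a method such as DoesUrlExists can be plugged in:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper; assumes .NET 4.0+ for System.Threading.Tasks.Parallel
public static class ParallelUrlChecker
{
    // Runs `exists` for every URL on thread-pool threads, at most
    // `maxConcurrency` at a time, and returns the URLs for which it returned false.
    public static List<string> FindBadUrls(
        IEnumerable<string> urls, Func<string, bool> exists, int maxConcurrency)
    {
        List<string> urlList = new List<string>(urls);
        bool[] ok = new bool[urlList.Count];
        ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };

        // Each index is written by exactly one iteration, so no locking is needed
        Parallel.For(0, urlList.Count, options, i =>
        {
            ok[i] = exists(urlList[i]);
        });

        // Collect failures in the original page order
        List<string> bad = new List<string>();
        for (int i = 0; i < urlList.Count; i++)
        {
            if (!ok[i])
            {
                bad.Add(urlList[i]);
            }
        }
        return bad;
    }
}
```

For example, `ArrayList badUrls = new ArrayList(ParallelUrlChecker.FindBadUrls(urls, DoesUrlExists, 4));`, where `urls` holds the extracted links. Parallel.For still blocks the calling thread until every check has finished, so this shortens the wait rather than removing it; to keep the page responsive, the whole call would still need to run in the background. On .NET Framework, ServicePointManager.DefaultConnectionLimit can also cap how many requests to the same host actually run at once.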

Posted on Thursday, June 8, 2006 8:45 PM


Comments on this post: Checking if the URL Exists!

# re: Checking if the URL Exists!
I have to say, I really like this design, and the widget support is a nice bonus
Left by 虚拟主机 on Jun 06, 2007 1:35 AM

# re: Checking if the URL Exists!
I've tried this code and it made my site go down. Got a lot of timeouts when trying to use this method too often.

Any ideas?

thanks
Left by Water Bongs on Sep 19, 2007 5:12 PM

Copyright © Mohammad Azam | Powered by: GeeksWithBlogs.net