Shahed's blog Sharing my thoughts and work

Recently, I faced a problem where I needed to ensure that someone really clicked a link from a browser. One possible option would be to use JavaScript instead of a direct link, but many referrer sites have already embedded the link in their pages, and I can't ask all of them to update their sites. Besides, while that would stop bots and crawlers, it still can't block someone who hits the page programmatically.

The page link was like the following:

One option is a server-side check to detect crawlers, but this is not enough for all cases:

if( Request.Browser.Crawler )
{
      // serve the crawler something else
}

So, I did a trick. Instead of linking directly to the actual URL, first redirect to a dummy page, and the dummy page redirects to the actual URL using JavaScript. We know that JavaScript runs only in browsers, which means anyone hitting the page programmatically will get only the dummy page.
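As a rough sketch, the dummy page's script just forwards the original query string to the real page. (The helper name and the 'mypageX.aspx' page name below are placeholders of mine, not the actual names.)

```javascript
// Minimal sketch of the dummy page's redirect logic.
// Only a real browser executes this script, so programmatic hits stop here.
function buildRedirectUrl(queryString) {
    // Forward the original query string through to the actual page
    return 'mypageX.aspx' + (queryString ? '?' + queryString : '');
}

// On the dummy page this would run on load:
//   document.location.href = buildRedirectUrl(location.search.substring(1));
```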

You can also generate the actual page URL at runtime and manage it in Application_BeginRequest in Global.asax. Although this saves you from automated hit attacks, it also creates a problem: it causes a loop if you press the browser's Back button from your site in IE6, because the browser takes you back to the dummy page and the dummy page sends you forward again. This could be resolved easily if I had a little more access to the window.history object, but unfortunately recent browsers restrict access to it and allow only go(), back(), forward() and the length property.

Now, the problem is: how will the dummy page know whether it was reached from a referrer site or from your own site (when the user clicks the browser's Back button)?

Here is the trick: add a marker to the URL as a bookmark (hash fragment).

// Check whether this URL already carries the marker, i.e. whether the user
// came fresh from the referrer site (e.g. Pageflakes) or via the Back button
if( document.location.href.indexOf('#marker') > 0 )
{
       // User clicked the browser's Back button: return to the referrer site
       document.location.href = document.referrer;
}
else
{
       // First visit from the referrer site: tag this history entry with the
       // marker (setting the hash does not reload the page), then redirect
       document.location.href = '#marker';
       var redirectUrl = 'mypageX.aspx?<%= Request.RawUrl.Split('?')[1].Replace("'","\\'") %>';
       document.location.href = redirectUrl;
}
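To see the branching in isolation, the marker check can be written as a pure function. (The function and parameter names below are my own for illustration, not part of the original page.)

```javascript
// Decide where the dummy page should send the visitor.
//   href:     the dummy page's current URL
//   referrer: document.referrer
//   realUrl:  the actual page to show
function resolveTarget(href, referrer, realUrl) {
    if (href.indexOf('#marker') > 0) {
        // Marker present: the user returned via the Back button,
        // so bounce to the referrer instead of looping forward again
        return referrer;
    }
    // First visit: go to the real page (after tagging this entry with #marker)
    return realUrl;
}
```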

Feel free to comment if you have any other ideas.

Posted on Tuesday, January 9, 2007 1:44 PM

Comments on this post: Stop bot, crawler or automated programs from hitting a page


Copyright © Shahedul Huq Khandkar