Recently in Mod_rewrite Category

2007 Apr 19

It looks like my earlier post on preventing image hotlinking is not working: my website is now at 85% of its bandwidth allowance. When I checked the logs, I found numerous referrers from profiles.myspace.com. So I have decided to block all traffic referred from MySpace, and I will do it using .htaccess.

RewriteCond %{HTTP_USER_AGENT} QihooBot [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([-a-z0-9]+\.)*myspace\.com [NC]
RewriteRule ^.* - [F]

I also found a bot (QihooBot) in the logs, so I added it as well. :D If you wish to block more, add another RewriteCond line and put the OR flag next to NC on every condition except the last one, like this: [NC,OR].
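
As a rough sketch of how the list grows, here is the same kind of rule with one more entry. ExampleBot is a made-up name used only for illustration, not something from my logs, and the referrer pattern is my assumption of what covers every myspace.com subdomain:

```apache
# Every condition except the last one carries the OR flag.
RewriteCond %{HTTP_USER_AGENT} QihooBot [NC,OR]
# ExampleBot is a hypothetical user agent, replace it with one from your own logs.
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC,OR]
# The last condition has no OR, otherwise the chain is left dangling.
RewriteCond %{HTTP_REFERER} ^https?://([-a-z0-9]+\.)*myspace\.com [NC]
RewriteRule ^.* - [F]
```

Without OR, conditions are combined with AND, so a request would have to match all of them at once to be blocked.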

2006 Dec 7

We can send users who hit a 404 (page not found) error to any URL by editing our .htaccess file. This is useful because it gives us more control over how various HTTP error codes are handled. It is done with the ErrorDocument directive, specifying 404 as the error code. Other error codes can be handled the same way: 400 - Bad Request, 401 - Unauthorized, 402 - Payment Required, 403 - Forbidden, 404 - Not Found, 500 - Internal Server Error, 501 - Not Implemented, 502 - Bad Gateway, 503 - Service Unavailable.

Redirecting to an error page:

ErrorDocument 404 /404.html
- serves the 404.html page to the user whenever a page is not found
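
The same directive works for the other error codes listed above. A small sketch, assuming you have created pages named 403.html and 500.html in your document root (these file names are only examples):

```apache
# Local paths are served internally, so the client still receives
# the original error status along with your custom page.
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html

# A full URL makes Apache send a redirect instead, so the client
# no longer sees the original error status.
# ErrorDocument 404 http://www.tildemark.com/404.html
```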

Using mod_rewrite to do the redirect:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.+) http://www.tildemark.com/404page/$1

or

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)testing\.html?$ temporary/$1

Displaying an HTML message:

ErrorDocument 404 "<b>Page Not Found</b><p /><br /><a href='http://www.tildemark.com'>Tildemark blogs homepage</a>"

2006 Sep 13

Spam is a problem, and posting blog entries that contain email addresses should be avoided. There are lots of automated programs that collect email addresses. Spam aside, bandwidth may also be an issue, because these programs read your entire website. If your site has only a small bandwidth allocation, you will see that Bandwidth Limit Exceeded error in due time.

What did I do? I blocked all unwanted robots from my site using Apache's mod_rewrite. First, examine your access log file and google the robots that have visited your site to find out whether they are safe or just scrapers. Be careful not to block the major search engine spiders such as Googlebot, Inktomi Slurp, msnbot, or Ask Jeeves, unless you don't want them to crawl your website.

Modify your .htaccess file as follows to block unwanted robots from scraping your website:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link[Ww]alker
RewriteRule ^.* - [F]
</IfModule>

The above code tells the spiders Siphon and LinkWalker that they are not allowed on our website by returning a 403 Forbidden Error.

There are also good robots, most of them used for link checking, so redirecting them to the proper areas is a better solution.

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} reciprocalman [OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot
RewriteRule ^$ /resources/
</IfModule>

The code above sends reciprocalman and LinksManager.com_bot to the resources directory whenever they request the home page.

2006 Sep 7

Bandwidth is precious, and seeing a Bandwidth Limit Exceeded error on your website is frustrating. Blocking unwanted referrers from your site may be your best option. If you are using Apache as your web server, you can take advantage of its mod_rewrite module to block unwanted referrers.

Modify your .htaccess file as follows to block hotlinking of large files such as images and MPEG or AVI videos:

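<IfModule mod_rewrite.c>
RewriteEngine on
# allow empty referrers (direct requests, clients that send no referrer)
RewriteCond %{HTTP_REFERER} !^$
# allow our own domain and its subdomains
RewriteCond %{HTTP_REFERER} !^https?://([-a-z0-9]+\.)?domain\.com [NC]
# never redirect the replacement image itself, or the redirect loops
RewriteCond %{REQUEST_URI} !/nohotlink\.jpg$ [NC]
RewriteRule .*\.(jpg|gif|avi|wmv|mpg|mpeg)$ http://www.domain.com/nohotlink.jpg [R,NC,L]
</IfModule>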
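
If you would rather not serve a replacement image at all, a variation is to refuse the request outright. This is only a sketch under the same assumptions (domain.com stands in for your own domain); it returns 403 Forbidden, so the hotlinked file costs almost no bandwidth:

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# let requests with no referrer through
RewriteCond %{HTTP_REFERER} !^$
# let requests referred by our own domain through
RewriteCond %{HTTP_REFERER} !^https?://([-a-z0-9]+\.)?domain\.com [NC]
# everything else asking for these file types gets 403 Forbidden
RewriteRule \.(jpg|gif|avi|wmv|mpg|mpeg)$ - [F,NC]
</IfModule>
```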

2006 Aug 18

In Apache, we can use mod_rewrite to redirect the non-www version of a site to its www counterpart, so nobody has to add the www to the URL manually. For example, type tildemark.com into your browser's address bar and hit Enter, and you will automatically be redirected to http://www.tildemark.com. This is useful for avoiding duplicate caching of pages and the division of your pages' PR.

We need to edit our .htaccess file to add the 301 redirect.

# .htaccess file
# we need to check if mod_rewrite has been enabled,
# by default it's not
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_HOST}       ^tildemark\.com [NC]
  RewriteRule (.*)               http://www.tildemark.com/$1 [R=301,L]
</IfModule>

Replace tildemark.com with your own domain name. Here's a mod_rewrite cheatsheet from ilovejackdaniels.com.
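
If you do not want to hard-code the domain, a more generic form is possible. This is a sketch, assuming every hostname served from this directory should get a www prefix (which would be wrong for subdomains such as blog.example.com):

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On
  # match any host that does not already start with www.
  RewriteCond %{HTTP_HOST} !^www\. [NC]
  # reuse the requested host name and prepend www.
  RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
</IfModule>
```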

Notes:
301 is an HTTP status code meaning permanent redirect
.htaccess files should be placed in the site's root directory
PR is PageRank

About this Archive

This page is an archive of recent entries in the Mod_rewrite category.
