A guide to Mod_Rewrite


What is mod_rewrite?

Mod_rewrite is a rule-based rewriting engine (based on a regular-expression parser) that rewrites requested URLs on the fly. It supports an unlimited number of rules, and an unlimited number of attached conditions for each rule, to provide a flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests against server variables, environment variables, HTTP headers, or time stamps. Even external database lookups in various formats can be used to achieve very granular URL matching.

This module operates on full URLs, including the path information, both in a per-server context and in a per-directory context (.htaccess), and can generate query-string parts on result. The rewritten result can lead to internal sub-processing, external request redirection or even to an internal proxy (mod_proxy). However, keep in mind that, just like anything else you place inside an .htaccess file, the more rules and conditions mod_rewrite has to interpret, the greater the load on the system that parses them. Higher load typically translates into poorer site performance.

How Does it Work?

API Phases

Apache processes an HTTP request in several different phases. A hook for each of these phases is provided by the Apache API. Mod_rewrite uses two of these hooks in order to make modifications to the request.

  • The URL-to-filename translation hook (used after the HTTP request has been read, but before any authorization starts)
  • The Fixup hook (triggered after the authorization phases, and after the per-directory config files (.htaccess) have been read, but before the content handler is activated).

Once a request comes in, and Apache has determined the appropriate server (or virtual server), the rewrite engine starts the URL-to-filename translation, processing the mod_rewrite directives from the per-server configuration. A few steps later, when the final data directories are found, the per-directory configuration directives of mod_rewrite are triggered in the Fixup phase. Most of the work you will do in our environments will involve the Fixup phase as opposed to the per-server configurations.

Rule set Processing

When mod_rewrite is triggered during these two API phases, it reads the relevant rule sets from its configuration structure (which was either created on start up, for a per-server context, or during the directory traversal for a per-directory context). The URL rewriting engine is started with the appropriate rule set (one or more rules together with their conditions), and its operation is exactly the same for both configuration contexts. Only the final result processing is different.

The order of the rules in the rule set is important because the rewrite engine processes them in a particular (not always obvious) order.

  • The rewrite engine loops through the rule sets, rule by rule.
    (Each rule set is made up of RewriteRule directives with, or without, RewriteConditions)
  • When a particular rule is matched, mod_rewrite also checks the corresponding conditions.
  • For historical reasons the conditions are given first, making the control flow a little bit long-winded. See the image below.

[Figure mod_rewrite_fig1: The control flow of the rewrite engine through a rewrite rule set]

Here is a breakdown of the repeating process shown in the diagram.

  1. Current URL
  2. RewriteRule
  3. Pattern
    1. RewriteCond
    2. TestString
    3. CondPattern
  4. Substitution
  5. Rewritten URL

As above, first the Current URL is matched against the Pattern of a rule. If it does not match, mod_rewrite immediately stops processing that rule, and goes on to the next rule. If the Pattern matches, mod_rewrite checks for Rule Conditions. If none are present, the URL will be replaced with a new string, constructed from the Substitution string, and mod_rewrite goes on to the next rule.

If any RewriteCond directives exist, an inner loop is started, processing them in the order that they are listed. Conditions are not matched against the current URL directly. Instead, a TestString is constructed by expanding variables, back-references, map lookups, etc., and the CondPattern is matched against it. If the CondPattern of any condition fails to match, the complete set of conditions and the corresponding rule fail. If the pattern matches a given condition, then matching continues to the next condition, until no more conditions are available. If all conditions match, processing continues with the replacement of the URL by the Substitution string.
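
As a rough sketch of the flow described above, the following hypothetical .htaccess fragment annotates the order in which mod_rewrite evaluates each part. The file name maintenance.html and the IP address are placeholders chosen for illustration only.

```apache
RewriteEngine On

# Evaluation order for this rule (not the order in which it is written):
#   1. The RewriteRule Pattern (^.*$) is tested against the current URL.
#   2. Only if it matches are the RewriteCond lines tested, top to bottom.
#   3. Only if every condition matches is the Substitution applied.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule ^.*$ /maintenance.html [L]
```

The second condition keeps the rule from matching its own result when the rewritten request is processed again.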

Environment Variables

Mod_rewrite keeps track of two additional (non-standard) CGI/SSI environment variables named SCRIPT_URL and SCRIPT_URI. These contain the logical Web-view to the current resource, while the standard CGI/SSI variables SCRIPT_NAME and SCRIPT_FILENAME contain the physical System-view.

Note: These variables hold the URI/URL as they were initially requested, that is, before any rewriting has occurred. This is important to note because the rewriting process is primarily used to rewrite logical URLs to physical pathnames.

SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
SCRIPT_FILENAME=/u/rse/.www/index.html
SCRIPT_URL=/u/rse/
SCRIPT_URI=http://en1.engelschall.com/u/rse/

Rewrite Log

The RewriteLog directive sets the name of the file to which the server logs any rewriting actions it performs. If the name does not begin with a slash ('/') then it is assumed to be relative to the Server Root.

The directive should occur only once per server configuration.

  • That means that this is a Server Level Directive.
  • Keep in mind that if you choose to use this in a cPanel or Plesk environment you will need to either modify the main Apache configuration or create a custom include.
  • This can be especially useful while trying to troubleshoot rules and conditions.

The easiest way to get log information for a single page load is to do the following:

  • Load the page you wish to check in a browser
  • Add the configuration to the website's VirtualHost entry in the Apache configuration
  • Restart Apache
  • Refresh the page once
  • Remove (or comment out) the RewriteLog configuration
  • Restart Apache again.
RewriteLog "/usr/local/apache/logs/rewrite.log"
RewriteLogLevel 3

Apache 2.4+

In Apache 2.4 and later the RewriteLog and RewriteLogLevel directives no longer exist; rewrite logging is configured through the per-module form of the LogLevel directive instead. Trace levels range from trace1 to trace8 and work much like the levels of the older directive. With this change, all rewrite log data is written to the server's error_log.

LogLevel alert rewrite:trace3
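
To limit the extra logging to a single site, the same directive can be placed inside that site's VirtualHost entry. The following is only a sketch; the ServerName and DocumentRoot values are placeholders.

```apache
<VirtualHost *:80>
    # example.com and the path below are placeholders
    ServerName example.com
    DocumentRoot /home/example/public_html

    # Keep general logging at "warn", but trace mod_rewrite at level 3
    LogLevel warn rewrite:trace3
</VirtualHost>
```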

Disable logging

  • Remove or comment out the RewriteLog directive, or use RewriteLogLevel 0. Make sure you remove, comment out, or disable logging once you are done troubleshooting the rule sets.
  • Do not set the Filename argument to /dev/null.
    Although the rewrite engine will no longer output to a logfile, it still creates the logfile output internally.
    This will slow down the server and provides no real advantage.

Rewrite Log Level

The RewriteLogLevel directive sets the verbosity level of the rewrite log file. The default level 0 means no logging, while 9 or more means that practically all actions are logged.

RewriteLog "/usr/local/apache/logs/rewrite.log"
RewriteLogLevel 3

Using a high value for Level will slow down your Apache server dramatically! Use the rewriting log file at a Level greater than 2 only for debugging!

Reading Apache Rewrite Logs

The following is an example of the log generated from a single page on a default Rails application.

...
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#ceb4550/initial] (2) [perdir /home/example/public_html/] rewrite '' -> 'http://127.0.0.1:12001/'
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#ceb4550/initial] (2) [perdir /home/example/public_html/] escaped URI in per-dir context for proxy, http://127.0.0.1:12001/ -> http://127.0.0.1:12001/
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#ceb4550/initial] (2) [perdir /home/example/public_html/] forcing proxy-throughput with http://127.0.0.1:12001/
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#ceb4550/initial] (1) [perdir /home/example/public_html/] go-ahead with proxy request proxy:http://127.0.0.1:12001/ [OK]
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#cea24c0/initial] (2) [perdir /home/example/public_html/] rewrite 'javascripts/prototype.js' -> 'http://127.0.0.1:12001/javascripts/prototype.js'
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#cea24c0/initial] (2) [perdir /home/example/public_html/] escaped URI in per-dir context for proxy, http://127.0.0.1:12001/javascripts/prototype.js -> http://127.0.0.1:12001/javascripts/prototype.js
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#cea24c0/initial] (2) [perdir /home/example/public_html/] forcing proxy-throughput with http://127.0.0.1:12001/javascripts/prototype.js
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#cea24c0/initial] (1) [perdir /home/example/public_html/] go-ahead with proxy request proxy:http://127.0.0.1:12001/javascripts/prototype.js [OK]
192.168.1.2 - - [07/Oct/2012:23:50:33 --0500] [gem-installer.com/sid#ccfb768][rid#ced1630/initial] (2) [perdir /home/example/public_html/] rewrite 'javascripts/effects.js' -> 'http://127.0.0.1:12001/javascripts/effects.js'
...

In this particular case the rewrite log consists of the sections below but many of the log entries will be similar whether we are looking at a Rails application or a normal page redirection:

  • Remote host IP address
  • Remote login name (will usually be "-")
  • HTTP user auth name (the username, or "-" if no auth)
  • Date and time of request
  • Virtualhost and virtualhost ID
  • Request ID, and whether it's a subrequest
  • Rewrite log level of the entry (the number in parentheses)
  • Perdir (the directory whose per-directory rules are being processed)
  • Message text

The perdir entry is of particular value as it is the location of the .htaccess file with the RewriteRule being processed. You'll notice here that the request id (rid) is not quite as useful as it could be due to a single page load consisting of multiple requests to the server.

To provide a more common example, the following is the output from a single page load of a WordPress site with the basic WordPress rewrite rules:

...
192.168.1.2 - - [08/Oct/2012:02:35:31 --0500] [example.com/sid#4e4c398][rid#7fd8e4002970/initial] (2) [perdir /home/example/public_html/] rewrite 'about/' -> '/index.php'
192.168.1.2 - - [08/Oct/2012:02:35:31 --0500] [example.com/sid#4e4c398][rid#7fd8e4002970/initial] (2) [perdir /home/example/public_html/] trying to replace prefix /home/example/public_html/ with /
192.168.1.2 - - [08/Oct/2012:02:35:31 --0500] [example.com/sid#4e4c398][rid#7fd8e4002970/initial] (1) [perdir /home/example/public_html/] internal redirect with /index.php [INTERNAL REDIRECT]
192.168.1.2 - - [08/Oct/2012:02:35:31 --0500] [example.com/sid#4e4c398][rid#7fd8e4037fa8/initial/redir#1] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/index.php
192.168.1.2 - - [08/Oct/2012:02:35:33 --0500] [example.com/sid#4e4c398][rid#7fd8d8004980/initial] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/wp-content/plugins/nextgen-gallery/css/nggallery.css
192.168.1.2 - - [08/Oct/2012:02:35:33 --0500] [example.com/sid#4e4c398][rid#7fd89c002970/initial] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/wp-content/plugins/nextgen-gallery/shutter/shutter-reloaded.css
192.168.1.2 - - [08/Oct/2012:02:35:33 --0500] [example.com/sid#4e4c398][rid#7fd8a4002970/initial] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/wp-content/themes/Chameleon/epanel/page_templates/js/fancybox/jquery.fancybox-1.3.4.css
192.168.1.2 - - [08/Oct/2012:02:35:33 --0500] [example.com/sid#4e4c398][rid#7fd8cc002970/initial] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/wp-content/themes/Chameleon/epanel/shortcodes/shortcodes.css
192.168.1.2 - - [08/Oct/2012:02:35:33 --0500] [example.com/sid#4e4c398][rid#7fd884002970/initial] (1) [perdir /home/example/public_html/] pass through /home/example/public_html/wp-content/themes/Chameleon/css/responsive.css
...

You can see that the requested path about/ is being rewritten to index.php and the various images, stylesheets and javascript files are being passed through to the path of the file itself. The most useful information in these logs is seeing what is being rewritten or redirected and where, as well as what is being passed through unchanged. Additionally, the log lines will let you know which directory's .htaccess file contains the RewriteRule being acted on.

Portability

You may have noticed during your work in live environments that many stock Rule Sets, such as those provided by WordPress, include an "IfModule" wrapper around them.

<IfModule mod_rewrite.c> 
# Place your mod_rewrite code here #
</IfModule>

If you are distributing a script, or creating rules that may be used on other servers, you need to make sure that directives belonging to mod_rewrite (or any other module) are only applied when that module is actually available. If you attempt to turn on the RewriteEngine on a server where mod_rewrite is not enabled, every request to that directory or site (depending on where the rule set is located) will result in a 500 Internal Server Error. For obvious reasons this is not a desirable result. To ensure that we don't inadvertently take sites down when a module is unloaded, or was never loaded in the first place, we can surround the directives with an IfModule conditional tag. The lines inside the conditional tag will only be used if that particular module is enabled on the server.
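
For reference, the stock WordPress rule set typically looks like the following when the site is installed in the document root. Installations in a subdirectory use a different RewriteBase and target path, so treat this as an illustration rather than something to copy verbatim.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Requests for index.php itself are left alone
RewriteRule ^index\.php$ - [L]
# Anything that is not an existing file or directory goes to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
```

If mod_rewrite is not loaded, Apache skips everything between the IfModule tags instead of failing on the unknown RewriteEngine directive.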

Server Load Issues

You do not need to be a software engineer to realize the more work a computer has to do, the longer it will take. In most cases, the load impact of using mod_rewrite on your server will be insignificant. However, you should be sensible in determining where and when to make use of mod_rewrite. Bear in mind that your server will need to consider your rewrites every time a request is made; that is for every image, every file and every page loaded, every time. If you have one file you wish to send a 301 moved header from, create that file and use a script to send the header instead of a rewrite directive. For obvious reasons, using mod_rewrite in that circumstance would not be optimal, since you would force Apache to load the runtime and perform the substitution.

On high traffic servers, load becomes more of an issue. To reduce the impact of mod_rewrite there are two things you can do.

First, if it is possible, place your rewrite code in the Apache configuration.

(Don't do this on Shared Environments unless you know what you are doing!)

Simply edit the VirtualHost section for each domain. You may need to update the paths in your rule sets if your rules use them. The downside to this is that you will need to restart the server, and you will need to ensure that the changes are distilled or added to the proper includes to protect them in cPanel/Plesk environments. This may be preferable to using an .htaccess file, because the Apache configuration is parsed once when Apache starts, so there is no additional overhead, whereas an .htaccess file must be interpreted on every request.
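
As a sketch of what "updating the paths" can mean: in a VirtualHost the Pattern is matched against the full URL-path, which begins with a slash, whereas in an .htaccess file the directory prefix has already been stripped. The ServerName and DocumentRoot values below are placeholders.

```apache
# In .htaccess (per-directory) the rule would be written without a
# leading slash:
#   RewriteRule ^articles/([0-9]+)$ articles.php?id=$1 [L]

# The same rule moved into the VirtualHost (per-server):
<VirtualHost *:80>
    ServerName domain.com
    DocumentRoot /home/example/public_html

    RewriteEngine On
    RewriteRule ^/articles/([0-9]+)$ /articles.php?id=$1 [L]
</VirtualHost>
```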

Second, if you must use an .htaccess file, place it as deep as possible in the site's structure.

If you are using mod_rewrite to stop hotlinking in a specific directory such as /html/images/site/protected/, place the .htaccess file in the protected directory instead of the root of your domain. That way it only needs to be interpreted when a request is made to that specific directory as opposed to being interpreted for every request, for every file, across the parent directories.
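
A minimal sketch of such a hotlink rule set, placed in the protected directory itself, might look like this. The domain name and the list of file extensions are assumptions made for the example.

```apache
# /html/images/site/protected/.htaccess
RewriteEngine On

# Allow requests that carry no Referer header at all
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from our own site (domain.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
# Any other request for an image receives a 403 Forbidden response
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```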

Regular Expression Back-References

Using parentheses in a Pattern or in one of the CondPatterns causes back-references to be internally created. These can later be referenced using the strings $N and %N (see below), for creating the Substitution and TestString strings. The image below attempts to show how the back-references are transferred through the process for later expansion. $N refers to a group captured by the RewriteRule Pattern, while %N refers to a group captured by the last matched RewriteCond CondPattern.

[Figure mod_rewrite_fig2: The back-reference flow through a rule.]

The substitution of a rewrite rule is the string which is substituted for (or replaces) the original URL that the Pattern matched. In addition to plain text, it can include

  1. back-references ($N) to the RewriteRule pattern
  2. back-references (%N) to the last matched RewriteCond pattern

Back-references are identifiers of the form $N ( N = 0..9 ), which will be replaced by the contents of the Nth group of the matched Pattern. Server variables can also be used in the Substitution; they are the same as for the TestString of a RewriteCond directive.

Using Back-references

Using parentheses in a Pattern or in one of the CondPatterns causes back-references to be internally created. These can later be referenced using the strings $N and %N, based on where the reference was initially created. The back-references are carried through the entire process for later expansion when they are called. $N refers to a group captured by a RewriteRule Pattern, while %N refers to a group captured by a RewriteCond CondPattern. The substitution of a rewrite rule is the string which is substituted for (or replaces) the original URL that the Pattern matched.

  1. To reference data stored in a back-reference that was created as part of a RewriteRule pattern use $N where N is replaced by the number of the back-reference, such as $1.
  2. To reference data stored in a back-reference that was created as part of a RewriteCond pattern use %N where N is replaced by the number of the back-reference, such as %1.

It is important to remember that data will only be available in a back-reference if it matches a pattern. If it does not match a pattern then the reference will be empty.

Back-reference in a RewriteRule

In this example we are simply using a back-reference within a specific rule to carry forward information in a request. In many cases you will find sites that include information of some kind in the request that is relevant to what you are attempting to rewrite. For instance, if we look at an example in which we are rewriting a page request with an article number to a page processor, we can see how we might use a back-reference when only matching information in a RewriteRule pattern.

RewriteRule ^articles/([0-9]+)-(.*)\.html$ article.php?id=$1 [L]

Break down

  1. Match requests of the form articles/<some number>-<anything>.html
    • Here notice that there are 2 references.
      • The first back-reference contains the number of the article we want to send to the page processor. ($1)
      • The second back-reference contains any additional information after the number such as an article title. ($2)
  2. Now that we have information stored in our 2 references we can pass the article number to the processor and ignore the rest.
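
If the page processor did want the title as well, the second back-reference could be passed along in the same way. This is only a sketch: the title parameter is hypothetical and is not something the article.php script in this guide is known to accept.

```apache
# $1 = article number, $2 = the text between the hyphen and ".html"
RewriteRule ^articles/([0-9]+)-(.*)\.html$ article.php?id=$1&title=$2 [L]
```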

Back-reference in a RewriteCond

In this example we are simply using a back-reference within a specific rule to carry forward information in a requested condition. In some cases you will find sites that include some kind of information that may need to be used in another rewrite condition. For instance, if we look at an example in which we are rewriting a page request only if it contains a specific host request, we can see how we might use a back-reference when only matching information in a RewriteCond.

RewriteCond %{HTTP_HOST} ^www\.host(example)\.com [NC]
RewriteCond %{HTTP_HOST}::www.%1fail.com !^(.+)::\1$ [NC]

Break down

  1. Match requests that start with www.hostexample.com
    • Here there is only 1 reference that is part of this condition.
      • The reference here contains only a portion of the hostname, in this case example. (%1)
  2. Now that we have information stored in our references we can later call it in another condition. Back-references are only expanded in the TestString, not in the CondPattern, so %1 is placed in the TestString next to the host name, and the CondPattern uses the regular expression's own \1 to compare the two halves. In this case we are telling the server that the request must not be for www.examplefail.com.

Back-reference Across Rules

Now let's say we want to combine the last two examples so that we are rewriting the request in a specific way if the request is for www.examplefail.com. There are clearly better ways to do what is in this example, but it is here simply to demonstrate how back-references can work across an entire rule set.

RewriteCond %{HTTP_HOST} ^www\.(examplefail)\.com [NC]
RewriteRule ^articles/([0-9]+)-(.*)\.html$ http://%1.com/article.php?id=$1 [L]

Break down

  1. Match requests of the form articles/<some number>-<anything>.html
    • Here notice that there are 2 references.
      • The first back-reference contains the number of the article we want to send to the page processor. ($1)
      • The second back-reference contains any additional information after the number such as an article title. ($2)
  2. Now that we have matched a pattern, let's check the condition of the request.
    • If the host requested starts with www.examplefail.com then perform the substitution we have outlined in the RewriteRule
      • Here notice that there is only 1 reference in the RewriteCond
      • This reference contains examplefail. (%1)
  3. Now that we have information stored in our 3 references we can pass the article number to the processor. We can also write the request so that it excludes the www portion of the URL originally requested.

Basic Directives and Controls

Rewrite Engine Directive

The RewriteEngine directive enables or disables the runtime rewriting engine. If it is set to off, this module does no runtime processing at all. It does not even update the SCRIPT_URx environment variables. Use this directive to disable the module instead of commenting out all the RewriteRule directives!

RewriteEngine On

Note: By default, rewrite configurations are not inherited. This means that you need to have a RewriteEngine On directive for each virtual host in which you wish to use it.
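
The note above can be illustrated with a small sketch. The host name and file names are placeholders.

```apache
# Main server configuration
RewriteEngine On
RewriteRule ^/old\.html$ /new.html [L]

<VirtualHost *:80>
    ServerName example.com
    # The main server's engine setting and rules are not inherited here.
    # Without the two lines below, nothing is rewritten for this host.
    RewriteEngine On
    RewriteRule ^/old\.html$ /new.html [L]
</VirtualHost>
```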

Rewrite Base Directive

The RewriteBase directive explicitly sets the base URL for per-directory rewrites. As you will see below, RewriteRule can be used in per-directory configuration files (.htaccess). In such a case, it will act locally, stripping the local directory prefix before processing, and applying rewrite rules only to the remainder of the request. When processing is complete, the prefix is automatically added back to the path. The default setting is: RewriteBase physical-directory-path

When a substitution occurs for a new URL this module has to re-inject the URL into the server while processing the request.

  • To be able to do this it needs to know what the corresponding URL-prefix or URL-base is.
  • By default this prefix is the corresponding filepath itself.
  • For most websites URLs are NOT directly related to physical filename paths, so this assumption will often be wrong!
  • You can use the RewriteBase directive to specify the correct URL-prefix.

If your webserver's URLs are not directly related to physical file paths, you will need to use RewriteBase in every .htaccess file where you want to use RewriteRule directives.

For example, assume the following per-directory configuration file:

#
#  /abc/def/.htaccess -- per-dir config file for directory /abc/def
#  Remember: /abc/def is the physical path of /xyz, _i.e._, the server
#            has a 'Alias /xyz /abc/def' directive _e.g._
#

RewriteEngine On

#  let the server know that we were reached via /xyz and not
#  via the physical path prefix /abc/def
RewriteBase   /xyz

#  Now the rewriting rules
RewriteRule   ^oldstuff\.html$  newstuff.html

In the above example, a request to /xyz/oldstuff.html gets correctly rewritten to the physical file /abc/def/newstuff.html.

The following list gives detailed information about the internal processing steps:

Request:
  /xyz/oldstuff.html

Internal Processing:
  /xyz/oldstuff.html     -> /abc/def/oldstuff.html  (per-server Alias)
  /abc/def/oldstuff.html -> /abc/def/newstuff.html  (per-dir    RewriteRule)
  /abc/def/newstuff.html -> /xyz/newstuff.html      (per-dir    RewriteBase)
  /xyz/newstuff.html     -> /abc/def/newstuff.html  (per-server Alias)

Result:
  /abc/def/newstuff.html

This seems to be somewhat complicated, but is in fact the correct Apache internal processing flow.

  • The per-directory rewrites come late in the process.
  • The rewritten request has to be re-injected into the Apache kernel, as if it were a completely new request.

This is not as much overhead as it may appear to be since this re-injection is completely internal to the daemon and this same procedure is in fact used by many other operations within Apache.

Rewrite Options

The RewriteOptions directive sets special options for the current per-server or per-directory configuration. These options modify the way in which mod_rewrite functions, most notably how rule sets are inherited from a parent configuration.

Inherit

This option forces the current configuration to inherit the configuration of the parent. In a per-virtual-server context, this means that the maps, conditions and rules of the main server are inherited. In a per-directory context this means that conditions and rules of the parent directory's .htaccess configuration or Directory sections are inherited. The inherited rules are virtually copied to the section where this directive is being used. If this is used in combination with local rules on the system, the inherited rules are copied behind the local rules. The position of this directive, below or above the local rules, has no influence on this behaviour. If the local rules forced the rewriting to stop, the inherited rules won't be processed.

Rules inherited from the parent are applied after rules specified in the child.

RewriteOptions Inherit
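
As a sketch of the resulting order, assume two hypothetical .htaccess files, one in the site root and one in a blog/ subdirectory. The paths and rules are invented for the example.

```apache
# /home/example/public_html/.htaccess (parent)
RewriteEngine On
RewriteRule ^(.*)\.htm$ $1.html [L]

# /home/example/public_html/blog/.htaccess (child)
RewriteEngine On
RewriteOptions Inherit
RewriteRule ^feed$ feed.php [L]

# Effective order for requests under /blog/:
#   1. ^feed$          (local rule)
#   2. ^(.*)\.htm$     (inherited rule, copied behind the local rules)
```

Because the local rule carries the L flag, a request that matches ^feed$ never reaches the inherited rule.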

InheritBefore

Like Inherit above, the rules from the parent are applied to the child, but in reverse. By default the parent rules are applied after the local rule set. When this directive is set the parent rules will be applied before those of the children. This particular option is only available in Apache 2.3.10 and later.

RewriteOptions InheritBefore

AllowNoSlash

By default, mod_rewrite will ignore URLs that map to a directory on disk but lack a trailing slash, in the expectation that the mod_dir module will issue the client with a redirect to a canonical URL that contains a trailing slash. The AllowNoSlash option can be enabled to ensure that rewrite rules are not ignored when they are missing the trailing slash. This option makes it possible to apply rewrite rules within .htaccess files that match the directory without a trailing slash. This particular option is only available in Apache 2.4.0 and later.

RewriteOptions AllowNoSlash

AllowAnyURI

When RewriteRule is used in a VirtualHost or server context with Apache 2.2.22 or later, mod_rewrite will only process the rewrite rules if the request URI is a URL-path. This avoids some security issues where particular rules could allow "surprising" pattern expansions (see CVE-2011-3368 and CVE-2011-4317). To lift the restriction on matching a URL-path, the AllowAnyURI option can be enabled, and mod_rewrite will apply the rule set to any request URI string, regardless of whether that string matches the URL-path grammar required by the HTTP specification.

RewriteOptions AllowAnyURI

Rewrite Rules

Rewrites are done according to the rules you specify, and the rules are read in order from the top down. Bear in mind that rules are checked for pattern matches first; once a pattern match is found, the Rewrite Engine checks for conditions before attempting to apply the rule. Creating a rule is very simple, but can obviously get more complex with the use of Regular Expressions, Conditions, Replacement Variables and special flags. You create a rule using the following syntax:

RewriteRule PATTERN DESTINATION

Requests that match this PATTERN will be rewritten to the DESTINATION.

Note: The format of the request that is compared against the PATTERN will be the requested filename on the server without the host, query string, or starting forward slash.

For example, if you have a RewriteRule code in domain.com/.htaccess and you make a request for the URL:

http://domain.com/dir/file.php?somequery=string

The string that will be compared to the PATTERN is just

dir/file.php

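If you need to access the query string in your rewrite rules, match it with a RewriteCond against the %{QUERY_STRING} variable; the flags that control how the query string is passed along are covered further down in the Rewrite Flags section.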
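
A short sketch of both approaches, using the request from the example above. The target file other.php is hypothetical.

```apache
# Match on the query string: only rewrite dir/file.php when the
# request includes somequery=string
RewriteCond %{QUERY_STRING} (^|&)somequery=string(&|$)
RewriteRule ^dir/file\.php$ dir/other.php [L]

# QSA appends the original query string to the one in the Substitution
RewriteRule ^articles/([0-9]+)$ articles.php?id=$1 [QSA,L]
```

In the first rule the Substitution contains no query string of its own, so the original one is passed through unchanged.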

At the most basic level, our PATTERN can be a simple string such as:

RewriteRule old.html new.html

This would rewrite old.html to new.html, thus when a user requests the old.html page, they would be shown new.html.
Note: This alone is not a good way of moving pages but it illustrates the point. To move a page you would also want to include the [R] flag, which is better outlined in the section on Rewrite Control Flags.
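
A sketch of the same move written as an external redirect, assuming both files live in the directory that holds the .htaccess file at the site root:

```apache
# Send the browser to the new location with a permanent (301) redirect
RewriteRule ^old\.html$ /new.html [R=301,L]
```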

It would not be practical to create a RewriteRule for every page if you are attempting to move a large number of pages to a new location. This is where regular expressions come into play. You may be familiar with wildcard notation (*) to match anything; a regular expression is merely an extension of this that allows us to create more specific search strings that can be applied to a data set.
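
For instance, a single rule along these lines could move every page in one directory to another. The directory names old-section and new-section are hypothetical.

```apache
# Every request under old-section/ is redirected to the same path
# under new-section/; $1 holds whatever followed "old-section/"
RewriteRule ^old-section/(.*)$ /new-section/$1 [R=301,L]
```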

Rewrite an Article ID to articles.php

You have articles stored in a database and a PHP script that retrieves them (similar to WordPress). The script articles.php takes the article ID in a GET parameter, for example http://domain.com/articles.php?id=24.

  • We can write a rule to put this into a nicer format that will make a little more sense to crawlers and visitors.
  • We change our script to output the URLs in the format http://domain.com/articles/24
  • We can now come up with a RewriteRule that will rewrite such a request back to the script including the article ID, like the one below.
RewriteRule ^articles/24$ articles.php?id=24

Note: The original example did not use the ^ and $ characters to anchor the string, for the sake of simplicity; at this stage you should be somewhat familiar with their meaning. Without them, articles/24 alone would also match /a_different_dir/articles/24. Obviously that will give us undesired results, so we must anchor the string by starting with ^ and ending with $ to ensure we only match the requests that we want.

What if the Article ID isn't static?

Now our regular expression patterns are going to be useful. We can match any number in the request by using a range instead of a literal string.

RewriteRule ^articles/([0-9]+)$ articles.php?id=$1

The parentheses around the regular expression create a capture group, which can be used as a back reference in the rule. This means that we are capturing the information matched inside the parentheses and storing it so that we can call it again during the rewrite procedure. By itself, being able to create patterns that match an unknown is of limited use. We want to be able to use the unknown in our destination, and we can do that with a back reference. These take the form of $N in a RewriteRule, where the value matched by the first group is $1. Back references created in Conditions are referred to differently, as %N, so that the two kinds can be told apart when they are used together in the same rule.

Pretty URLs

Search Engine Optimization specialists and users alike prefer URLs that are easy to read, type, and use. mod_rewrite cannot place your keywords or article titles in the URL for you, but you can alter your script to output URLs in that format and use mod_rewrite to rewrite the request back to your script with the appropriate variable data.

For instance, what if we wanted to set the site up in such a way that we could refer to an article by title.

RewriteRule ^articles/([0-9]+)-(.*)\.html$ article.php?id=$1 [L]
  • This rule matches our previous example in that it grabs the article ID and passes it to the script. That information is stored in the first back reference ($1).
  • The article's title is additional information that we do not need to pass to our script, but it is still part of the anchored pattern, and it makes visitors a little more comfortable while navigating the site.
  • The second back reference ($2) stores the article's title; however, we do not need it in the destination that the server must process.
  • Here we are just using the parentheses to group the (.*) expression.

This rule accepts articles/24-[ANYTHING CAN GO HERE].html and rewrites the required information to our script.
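
If accepting anything at all in the title position is too loose, one possible variation is to restrict the title to the characters a typical URL slug contains. This is only a sketch; the parameter name slug is hypothetical and the script would have to be written to read it.

```apache
# Accept only lowercase letters, digits, and hyphens in the title part,
# and pass the title along in case the script wants to verify it.
RewriteRule ^articles/([0-9]+)-([a-z0-9-]+)\.html$ article.php?id=$1&slug=$2 [L]
```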

To create more complex or effective conditions and rules, review the RewriteRule Control Flags, RewriteCondition Operators, RewriteCondition Flags, and Rewrite Engine Variables sections in this course.

RewriteRule Control Flags

Rewrite rules are applied in the order in which they are defined in the configuration file, and each rule operates on the result of the previous substitution. When a rule matches, the URL is completely replaced by the Substitution, and the rewrite process continues until all of the rules have been applied or it is explicitly terminated by the L flag. Flags allow us to specify how substitutions are applied, as well as force the server to return specific types of responses such as 301, 302, or 403.

The [FLAGS] are the third argument to a RewriteRule and are surrounded by square brackets. Multiple flags are separated by commas. Below are some of the flags, with an example of each.

Flag | Description | Example
C | Chain the current rule with the next rule. If the rule matches, processing continues with the next rule; if it does not match, all following chained rules are skipped. | RewriteRule ^index.html$ substituted.html [C]
 |  | RewriteRule ^substituted.html$ further-substituted.html [C]
 |  | RewriteRule ^further-substituted.html$ final-substitution.html
CO=name:value:domain[:lifetime[:path]] | Sets a cookie in the client's browser. The lifetime is given in minutes. | RewriteRule index.html - [CO=foo:bar:domain.com:14400:/]
E=VAR:VAL | Sets an environment variable, where VAR is the name of the variable and VAL is the value. The value may contain back references ($N and %N). | RewriteRule .* - [E=foo:bar]
F | Returns a 403 response (forbidden). | RewriteRule ^private-area - [F]
G | Returns a 410 response (gone). | RewriteRule ^old-page.html - [G]
H=handler | Specifies which handler should process the request. | RewriteRule index.cgi - [H=cgi-script]
L | Stops the rewriting process; no more rules are processed. | RewriteRule ^.* index.py?request=$0 [L]
N | Starts the rewriting process again from the first rule, using the current substituted URL. | RewriteRule ^index.html$ substituted.html [N]
NC | Makes the Pattern case-insensitive. | RewriteRule ^INDEX.HTML$ index.html [NC]
NE | By default, special characters (%, $, etc.) are replaced with their hexadecimal equivalents. The NE flag turns this escaping off. | RewriteRule ^index.html$ hello%world.html [NE]
NS | The rule is skipped if the current request is an internal sub-request. | RewriteRule ^.*$ further-substituted.html [NS]
P | The request is routed through the Apache proxy module (mod_proxy). The substitution must be a full URL that begins with a valid scheme and host. | RewriteRule ^proxy.html http://proxyhost/proxy.html [P]
PT | Passes the rewritten URL through to the next phase of processing, so that directives such as Alias can act on the result. | RewriteRule ^hidden/ secret/ [PT]
 |  | Alias /secret /var/www/secret-files
QSA | Short for "query string append". The query string of the original request is appended to the query string in the substitution instead of being replaced by it. | RewriteRule ^page\.php$ /target.php?bar=baz [QSA,L]
R[=code] | Forces a redirect to the substitution URL. Code (the HTTP status code) may be in the range 300-399 and defaults to 302 (moved temporarily). Using 301 (moved permanently) will cause search engines to transfer the target of incoming links. | RewriteRule ^old-page.html$ new-page.html [R=301]
S=integer | When the current rule matches, the specified number of following rules is skipped. | RewriteRule ^index.html$ new.html [S=1]
T=MIME-type | Forces the response to be a specified MIME-type. | RewriteRule ^jpeg/(.+)$ jpeg.pl?src=$1 [T=image/jpeg]

The [L] Flag and Infinite Loops

The following only applies when mod_rewrite is used in a .htaccess (Per-Directory Context) file. The [L] flag behaves exactly as expected when used in httpd.conf (Server Level Context).

The [L] flag tells Apache to stop processing the rewrite rules for the current pass. What is often overlooked is that, in a per-directory context, Apache then makes a new internal request for the rewritten file name and begins processing the rewrite rules again from the top. Therefore, if the destination of a rewrite still matches the pattern of the rule, the rule will not behave as desired. In these cases, you should use a RewriteCond to exclude the destination file from the rule.

Rewrite everything to single script

Assume you are using a CMS that rewrites every request to a single index.php script.

RewriteRule ^(.*)$ index.php?PAGE=$1 [L,QSA]

Every time you go to the page, regardless of which file you request, the PAGE variable always contains index.php. This is because Apache will end up doing two rewrites instead of a single one as you might have intended because the request continues to match the pattern of the Rule.

  • First a request to test.php will get rewritten to index.php?PAGE=test.php
  • A second request is now made for index.php?PAGE=test.php as a result of the original substitution.

While this looks like the desired result, there are unintended side effects in this case. The new request still matches your original rewrite pattern, and because there are no conditions in place to tell it otherwise, mod_rewrite rewrites the request again. The new substitution results in the following URI: index.php?PAGE=index.php

One possible solution to this problem is to add a RewriteCond that checks whether the request is already for index.php. An even better solution, which also allows you to keep images and CSS files in the same directory, is to use a RewriteCond that checks whether the requested file exists. This can be done with the -f operator, negated with ! so that the rule only applies when the file does not exist.

See RewriteConditions and RewriteCondition Operators for additional details.
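
A minimal sketch of the file-existence approach, assuming the rule lives in a .htaccess file in the same directory as index.php:

```apache
# Skip the rule when the request maps to an existing file (index.php, images,
# CSS) or an existing directory, so the rewritten request is not rewritten again.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?PAGE=$1 [L,QSA]
```

On the second pass the request is for index.php, which exists on disk, so the first condition fails and the rule is not applied again.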

Rewrite by exclusion

In this example we want to rewrite a certain request with the first RewriteRule, then rewrite the rest somewhere else.

RewriteRule ^articles/([0-9]+)-(.*)\.html$ article.php?id=$1 [L] 
RewriteRule .* index.php

In this example, a request under articles/ is rewritten to article.php. Once the request has been rewritten to article.php, the next pass through the rules rewrites it to index.php. Again, this is not the desired effect; we need to use conditional statements to qualify when the rules should be applied.
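
One way to qualify the catch-all rule, sketched here under the assumption that article.php and index.php both exist in the directory that holds the .htaccess file, is to apply it only to requests that do not map to an existing file:

```apache
RewriteRule ^articles/([0-9]+)-(.*)\.html$ article.php?id=$1 [L]

# Only send requests to index.php when they do not map to an existing file,
# which leaves the already rewritten article.php request alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* index.php [L]
```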

Special Substitution ( - )

There is a special operator that may be used with a RewriteRule which will tell the system that no substitution will be performed. The operator is simply a ( - ). You might use something like this to set a special variable or perform a specific action such as returning a 403 Forbidden error for a specific request where rewriting the request is not required.

RewriteRule .* - [F,L]
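
The same ( - ) substitution can be combined with the E flag to set a variable without touching the URL. The following is only an illustration; the variable name IS_IMAGE is hypothetical.

```apache
# Leave the URL untouched, but flag image requests with an environment variable.
RewriteRule \.(gif|jpe?g|png)$ - [E=IS_IMAGE:1]
```

A later condition could then test the value through %{ENV:IS_IMAGE}.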

Modifying the Query String

By default, the query string is passed through unchanged. You can, however, create URLs in the substitution string containing a query string part. Simply use a question mark inside the substitution string to indicate that the following text should be re-injected into the query string. When you want to erase an existing query string, end the substitution string with just a question mark. To combine new and old query strings, use the [QSA] flag.

#Keep original query (default behavior)

RewriteRule ^page\.php$ /target.php [L]

# from http://example.com/page.php?foo=bar
# to   http://example.com/target.php?foo=bar

#Discard original query

RewriteRule ^page\.php$ /target.php? [L]

# from http://example.com/page.php?foo=bar
# to   http://example.com/target.php

#Replace original query

RewriteRule ^page\.php$ /target.php?bar=baz [L]

# from http://example.com/page.php?foo=bar
# to   http://example.com/target.php?bar=baz

#Append new query to original query

RewriteRule ^page\.php$ /target.php?bar=baz [QSA,L]

# from http://example.com/page.php?foo=bar
# to   http://example.com/target.php?foo=bar&bar=baz

RewriteConditions

The RewriteCond directive defines a rule condition. One or more RewriteCond can precede a RewriteRule directive. The following rule is then only used if both the current state of the URI matches its pattern, and if these conditions are met. What about rewriting depending on referrer, query string, IP address, whether or not the file exists? We can do this and more by extending our rules using the RewriteCond directive.

The STRING is usually a server variable or a back reference ($N from the RewriteRule pattern, or %N from a previous RewriteCond), and the CONDITION can be another regular expression, similar to the pattern of a RewriteRule. Additionally, the CONDITION can take special variants of the normal regular expression pattern, described under RewriteCondition Operators.

RewriteCond STRING CONDITION

Prevent an IP address from accessing your site

RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.89$
RewriteCond %{REQUEST_URI} !you-are-banned\.html$
RewriteRule .* /you-are-banned.html [R]

This is a simple example of a conditional directive to start with. The rule is applied only if the client's IP address is 123.45.67.89 (the dots are escaped and the pattern is anchored so that only this exact address matches). If the IP address matches, we redirect the user to a "you are banned" HTML page. The second condition excludes the banned page itself; without it, the redirected request would match the rule again and loop. The RewriteRule pattern in this case is designed to match every request; the conditions are what create specificity. As you can see in the Rewrite Engine Variables section, %{REMOTE_ADDR} contains the IP address of the client making the request; that variable is matched against the pattern provided and, in this case, returns a true condition.
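
If a dedicated page is not required, an alternative sketch avoids the redirect entirely by combining the special ( - ) substitution with the F flag. The IP address is the same placeholder used above.

```apache
# Anchor the pattern and escape the dots so only this exact address matches.
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.89$
# No substitution is needed; the F flag returns 403 Forbidden.
RewriteRule .* - [F]
```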

Rewrite the Homepage of a site according to the "User-Agent" header

RewriteCond  %{HTTP_USER_AGENT}  ^Mozilla
RewriteRule  ^/$                 /homepage.max.html  [L]

RewriteCond  %{HTTP_USER_AGENT}  ^Lynx
RewriteRule  ^/$                 /homepage.min.html  [L]

RewriteRule  ^/$                 /homepage.std.html  [L]

In this example, if you use a browser that identifies itself as 'Mozilla' (including Netscape Navigator, Mozilla, etc.), you get the max homepage (which could include frames or other special features). If you use the Lynx browser (which is terminal-based), you get the min homepage (which could be a version designed for easy, text-only browsing). If neither of these conditions applies (you use any other browser, or your browser identifies itself as something non-standard), you get the std (standard) homepage.
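
The patterns above begin with a slash (^/$), which is how the path looks in the per-server context. In a .htaccess file the directory prefix, including the leading slash, is stripped before matching, so a request for the home page is seen as an empty path. A per-directory version of the Lynx and default rules might look roughly like this, assuming the .htaccess file sits in the document root:

```apache
# In a .htaccess file in the document root, the home page request is an empty path.
RewriteCond %{HTTP_USER_AGENT} ^Lynx
RewriteRule ^$ homepage.min.html [L]

RewriteRule ^$ homepage.std.html [L]
```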

To create more complex or effective conditions and rules, review the RewriteRule Control Flags, RewriteCondition Operators, RewriteCondition Flags, and Rewrite Engine Variables sections in this course.

RewriteCondition Operators

Operator | Description | Example
< | Is lexically* lower. | RewriteCond directory-a <directory-b
> | Is lexically* greater. | RewriteCond directory-b >directory-a
= | Is lexically* equal. | RewriteCond directory-c =directory-c
! | Negates the operator or pattern that follows (is not equal, or inverse). | RewriteCond %{REQUEST_FILENAME} !-f
-d | Is a directory. | RewriteCond /directory/ -d
-f | Is a file. | RewriteCond /path/index.html -f
-s | Tests if the test-string exists and has a file size of more than 0 bytes. | RewriteCond /path/index.html -s
-F | Is an existing file, checked via subrequest. | RewriteCond /path/index.html -F
-U | Is an existing URL, checked via subrequest. | RewriteCond /path/index.html -U

* Lexically: compared as strings, character by character.
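
To show how these operators tend to appear in practice, here is a small sketch that tests server variables rather than literal strings. The file names form-handler.php and cache/ are hypothetical, and the rules assume a .htaccess file in the document root.

```apache
# = : lexical equality. Apply the rule only to POST requests.
RewriteCond %{REQUEST_METHOD} =POST
RewriteRule ^submit$ form-handler.php [L]

# -s : serve a cached copy only when it exists and is not empty.
# $1 in the condition refers to the group captured by the RewriteRule pattern.
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -s
RewriteRule ^pages/([a-z0-9-]+)$ cache/$1.html [L]
```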

RewriteCondition Flags

There is a short list of flags available to rewrite conditions that may be used to control how an individual condition is analyzed by the server.

Flag | Description | Example
NC | The comparison of test-string and pattern becomes case-insensitive. | RewriteCond localhost LOCALHOST [NC]
OR | Combines the condition with the next condition using OR instead of the implicit AND. | RewriteCond %{REMOTE_HOST} ^localhost [OR]
 |  | RewriteCond %{REMOTE_HOST} ^remotehost
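
Both flags can be combined on one condition, separated by a comma. The sketch below redirects two alternative host names to a single canonical one; the domain names are placeholders, and the rule assumes a .htaccess file in the document root.

```apache
# Either condition may match (OR); both comparisons ignore case (NC).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.example\.net$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```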

Rewrite Engine Variables

This is a list of some of the engine variables that can be accessed in both rules and conditions. These variables give you access to a number of server and request values, including the server port number, the state of SSL, the user agent, and more. You may see some of these variables in situations in which a user wants to force SSL, or in cases in which special applications such as GoMobi are in use.

Name | Description | Example Output
%{HTTP_USER_AGENT} | The user agent string of the client that sent the request. | Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.3) Gecko/2008101315 Linux Mint/6 (Felicia) Firefox/3.0.3
%{HTTP_REFERER} | The referring document. | http://localhost/path/
%{HTTP_COOKIE} | The contents of any cookies set for the request. | SID=58301702dfcb380ebe1e93f28bcbbc1a
%{HTTP_FORWARDED} | Typically the IP address of the client that is using a proxy server. | 123.123.123.123
%{HTTP_HOST} | The host name requested by the client (the Host header). | localhost
%{HTTP_PROXY_CONNECTION} | The value of the Proxy-Connection request header. | keep-alive
%{HTTP_ACCEPT} | Lists which media types are acceptable for the response. | text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
%{HTTPS} | Contains the text "on" if the connection is using SSL/TLS, or "off" otherwise. (This variable can be safely used regardless of whether or not mod_ssl is loaded.) | on
%{REMOTE_ADDR} | IP address of the client that sent the request. | 123.123.123.123
%{REMOTE_HOST} | Host name of the client that sent the request, if it has been resolved; otherwise its IP address. | 123.123.123.123
%{REMOTE_USER} | User name of the authenticated client. | bob
%{REMOTE_IDENT} | The user making the request as specified by identd or similar daemon. | ?UTF-8?b?0YHQsNGFNORDgA=?=
%{REQUEST_METHOD} | The method used by the client in the request. See rfc2616 for HTTP 1.1 request methods. | GET
%{SCRIPT_FILENAME} | The full file-path to the file/directory being requested. | /var/www/path/script.py
%{PATH_INFO} | The remainder of the request URL's path. | /
%{QUERY_STRING} | Content of the query string, i.e. the part of the URL after the question mark and before the anchor. | foo=bar&var=64
%{AUTH_TYPE} | The type of authentication used (if any). Typically basic or digest. | basic
%{DOCUMENT_ROOT} | The root directory as defined by the DocumentRoot directive. | /var/www/
%{SERVER_ADMIN} | The email address for the server administrator. | webmaster@localhost
%{SERVER_NAME} | The server's host name, DNS alias or IP address. | localhost
%{SERVER_ADDR} | The IP address of the server handling the request. | 127.0.0.1
%{SERVER_PORT} | The port number on the server that is handling the request. | 80
%{SERVER_PROTOCOL} | The name and the version of the protocol of the request. | HTTP/1.1
%{SERVER_SOFTWARE} | The name and version of the software handling the request. | Apache/2.2.9 (Ubuntu) PHP/5.2.6-2ubuntu4.2 with Suhosin-Patch
%{TIME_YEAR} | Four digit representation of the year. | 2009
%{TIME_MON} | Two digit representation of the month. | 07
%{TIME_DAY} | Two digit representation of the day of the month. | 04
%{TIME_HOUR} | Two digit representation of the hour. | 02
%{TIME_MIN} | Two digit representation of the minute. | 34
%{TIME_SEC} | Two digit representation of the second. | 54
%{TIME_WDAY} | One digit representation of the day of the week, 0-6, Sunday is 0. | 6
%{TIME} | Current timestamp. | 20090704023524
%{API_VERSION} | Apache API version. | 20051115:15
%{THE_REQUEST} | The full HTTP request line sent by the client. | GET /path/script.py HTTP/1.1
%{REQUEST_URI} | The path component of the requested URI. | /path/script.py
%{REQUEST_FILENAME} | The full local filesystem path of the file matching the request. | /var/www/path/script.py
%{IS_SUBREQ} | Contains "true" if the current request is a sub-request, "false" otherwise. | false
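
As an example of these variables in use, a common way to force SSL combines %{HTTPS}, %{HTTP_HOST}, and %{REQUEST_URI}. This is a sketch rather than a rule taken from any particular application:

```apache
# Redirect any request that did not arrive over SSL/TLS to the https:// URL.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

The original query string is passed through unchanged, as described under Modifying the Query String.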

Other Variables

Syntax | Description | Example
%{HTTP:name} | Access to HTTP request headers. | %{HTTP:Accept-Charset}
%{ENV:name} | Access to environment variables. | %{ENV:APACHE_PID_FILE}
%{SSL:name} | Access to mod_ssl variables. | %{SSL:SSL_SERVER_CERT}
%{LA-U:name} | Access to positive URL-based look-ahead variables. | %{LA-U:REMOTE_USER}
%{LA-F:name} | Access to positive file-path-based look-ahead variables. | %{LA-F:REMOTE_USER}
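
For instance, %{HTTP:name} makes it possible to test a request header that has no dedicated variable. The sketch below assumes a hypothetical index.fr.html page and a .htaccess file in the document root:

```apache
# Serve a French home page to clients whose Accept-Language header starts with "fr".
RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ index.fr.html [L]
```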

A Quick Way to See $_SERVER Vars

In some cases it may be difficult to determine the REQUEST_URI or other components you are attempting to match. While rewrite logging will certainly give you this information, it may not be practical to enable logging and restart the Apache service. A simple alternative is to create a PHP script that returns the server variables. This can only be done in environments in which PHP is available (another available language that can provide similar information would also work). You can use the code below to create a simple PHP script that shows you the PHP environment and, more importantly, all of the server variables that you may be attempting to match with a RewriteRule pattern or condition.

<?php

print "Server Variables : <br/><pre>" ; 
print_r($_SERVER);
print "</pre>";

print "PHP Info : <br/>";
phpinfo();

?>
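
For the cases where you do have access to the server configuration, rewrite logging can be enabled roughly as shown below. The directives differ between Apache versions, so only one of the two forms applies to a given server. They are not allowed in .htaccess files, and the log path is an assumption.

```apache
# Apache 2.2 and earlier (server or virtual host configuration only)
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 and later: rewrite messages are written to the error log
LogLevel alert rewrite:trace3
```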

Additional Examples Including Break-downs

WordPress Rewrite Rules and Conditions

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

WordPress is one of the most common systems we see in our environment, so it is a good idea to understand the way in which mod_rewrite interacts with it. Like most Content Management Systems (CMS), WordPress has a built-in engine designed to handle the information passed to it in different ways, and it is able to generate URLs in various formats. These formats make WordPress SEO friendly and user friendly right out of the box. mod_rewrite plays only a small role in the way WordPress handles content retrieval and URL generation, but it is still important. If you were to feed the title of an article to Apache as part of a URL, you would most likely receive a 404 error, since Apache would be unable to locate a file by that name. We use mod_rewrite to rewrite such requests to index.php, the script that calls the WordPress classes and functions. WordPress in turn processes the article title and loads the information that was requested. Below is a step-by-step breakdown of the rules, which should help you better understand how mod_rewrite works in conjunction with WordPress to present the information users intended, regardless of the URL format they choose to use.

Break down

  1. If the module mod_rewrite is loaded, proceed with the rule set.
  2. Turn the Rewrite Engine on.
  3. Set the base rewrite location to /.
  4. Scan for the pattern (.). In regular expressions the dot matches any single character, so this pattern matches any non-empty request path. Rewrite the request to index.php if and only if the following conditions are met:
    • The Request Filename is NOT a file
    • The Request Filename is NOT a directory
  5. Then stop processing all directives once the request is rewritten to index.php [L].

Joomla Rewrite Rules and Conditions

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On

## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [F,L]
#
## End - Rewrite rules to block out some common exploits.

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

RewriteBase /

## Begin - Joomla! core SEF Section.
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/component/) [OR]
RewriteCond %{REQUEST_URI} (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$  [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
#
## End - Joomla! core SEF Section.

Joomla, like WordPress, is fairly prolific on our network, and it has a similar built-in URL encoding and decoding engine. Again, mod_rewrite cannot place article titles in the URL on its own, and a URL containing a title would not match any location on disk. Joomla uses a system called SEF (search engine friendly URLs) to handle the query strings and pass the correct information to the index file. As in WordPress, the index file in Joomla loads all of the classes and functions that produce the content users expect to see. When a request is made, we want to pass the correct information to the application so that it can pull the appropriate data from its internal resources.

Break down

  1. Turn the Rewrite Engine on.
  2. Scan for the pattern ^(.*)$. The .* matches any sequence of characters, and the (^) and ($) characters anchor it to the start and end of the string. Answer the blocked request with a 403 Forbidden error, then stop processing all directives [L], if any of the following conditions is met:
    • The Query String includes a call to base64_encode(...), OR
    • The Query String includes a <script> tag (NO CASE), OR
    • The Query String includes an attempt to set a GLOBALS variable via the URL, OR
    • The Query String includes an attempt to modify the _REQUEST variable via the URL
  3. Scan for the pattern (.*), which matches any request and captures it in a group. Rewrite the request to index.php if and only if the following conditions are met:
    • The requested Filename is NOT a file
    • The requested Filename is NOT a directory
    • The Request URI does NOT start with /index.php
    • The Request URI includes /component/, OR
    • The Request URI ends with /, .php, .html, .htm, .feed, .pdf, .raw, or a final path segment that contains no dot (NO CASE)
  4. Scan for the pattern .*, which matches any request. Perform NO substitution. Set the Environment Variable HTTP_AUTHORIZATION to the value of the Apache request header called Authorization (see the Other Variables section). Then stop processing all directives [L].

Sources

http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
Apache.org. Apache Module mod_rewrite (2011). Retrieved July 08, 2012

http://semlabs.co.uk/journal/mod_rewrite-quick-reference-and-cheat-sheet/
David. SEM Labs. mod_rewrite Quick Reference and Cheat Sheet. Retrieved July 11, 2012

http://www.easymodrewrite.com/guide-syntax
Owen, Dave. Mediacollege.com. Apache : Mod Rewrite : Start Rewriting. Retrieved July 09, 2012.

http://www.sitepoint.com/guide-url-rewriting/
Turcsanyi, Tamas (2002, October 22.) Sitepoint. mod_rewrite: A Beginner’s Guide to URL Rewriting Article. Retrieved July 10, 2012.