Tuesday, September 24, 2019

apache - Reference: mod_rewrite, URL rewriting and "pretty links" explained






"Pretty links" is an often requested topic, but it is rarely fully explained. mod_rewrite is one way to make "pretty links", but it's complex and its syntax is very terse, hard to grok, and the documentation assumes a certain level of proficiency in HTTP. Can someone explain in simple terms how "pretty links" work and how mod_rewrite can be used to create them?



Other common names, aliases, terms for clean URLs: RESTful URLs, user-friendly URLs, SEO-friendly URLs, slugging, and MVC URLs (probably a misnomer)


Answer



To understand what mod_rewrite does you first need to understand how a web server works. A web server responds to HTTP requests. An HTTP request at its most basic level looks like this:



GET /foo/bar.html HTTP/1.1


This is a simple request from a browser to a web server, asking for the URL /foo/bar.html. It is important to stress that it does not request a file; it requests just some arbitrary URL. The request may also look like this:




GET /foo/bar?baz=42 HTTP/1.1


This is just as valid a request for a URL, and it more obviously has nothing to do with files.



The web server is an application listening on a port, accepting HTTP requests coming in on that port and returning a response. A web server is entirely free to respond to any request in any way it sees fit, or rather in any way you have configured it to respond. This response is not a file; it's an HTTP response which may or may not have anything to do with physical files on any disk. A web server doesn't have to be Apache; there are many other web servers, all of which are just programs that run persistently, are attached to a port and respond to HTTP requests. You can write one yourself. This paragraph was intended to divorce you from any notion that URLs directly equal files, which is really important to understand. :)



The default configuration of most web servers is to look for a file that matches the URL on the hard disk. If the document root of the server is set to, say, /var/www, it may check whether the file /var/www/foo/bar.html exists and serve it if so. If the file ends in ".php" it will invoke the PHP interpreter and return the result. All of this association is completely configurable; a file doesn't have to end in ".php" for the web server to run it through the PHP interpreter, and the URL doesn't have to match any particular file on disk for something to happen.
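To make that concrete, here is a minimal sketch of how such an association might look in an Apache configuration. It assumes mod_php is loaded (which provides the application/x-httpd-php handler); the /var/www path is the one from the paragraph above, and running .html files through PHP is purely an illustrative choice.

```apache
# Where Apache looks for files by default
DocumentRoot "/var/www"

# Assumption: mod_php is loaded. Hand .php files AND .html files
# to the PHP interpreter instead of serving them verbatim.
<FilesMatch "\.(php|html)$">
    SetHandler application/x-httpd-php
</FilesMatch>
```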




mod_rewrite is a way to rewrite the internal request handling. When the web server receives a request for the URL /foo/bar, you can rewrite that URL into something else before the web server looks for a file on disk to match it. Simple example:



RewriteEngine On
RewriteRule /foo/bar /foo/baz


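This rule says: whenever the requested URL matches the pattern "/foo/bar", rewrite it to "/foo/baz". The request will then be handled as if /foo/baz had been requested instead. (The pattern is a regular expression. In an .htaccess file it is matched against the path relative to that file's directory, without a leading slash, so there it would be written as foo/bar.) This can be used for various effects, for example: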



RewriteRule (.*) $1.html



This rule matches anything (.*) and captures it (the surrounding parentheses), then rewrites it to append ".html"; $1 stands for the captured text. In other words, if /foo/bar was the requested URL, it will be handled as if /foo/bar.html had been requested. See http://regular-expressions.info for more information about regular expression matching, capturing and replacements.
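One caveat with a catch-all pattern: in an .htaccess file the rules are run again on the rewritten URL, so an unconditional (.*) keeps matching its own output. A common guard is sketched below. It assumes the rules live in an .htaccess file in the document root and that the corresponding .html files exist on disk.

```apache
RewriteEngine On
# Leave requests for files that already exist untouched
RewriteCond %{REQUEST_FILENAME} !-f
# Only rewrite if a matching .html file actually exists
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*)$ $1.html [L]
```

On the second pass the request already points at an existing file, so the first condition fails and the rewriting stops.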



Another often encountered rule is this:



RewriteRule (.*) index.php?url=$1


This, again, matches anything and rewrites it to the file index.php with the originally requested URL appended in the url query parameter. I.e., for any and all requests coming in, the file index.php is executed and this file will have access to the original request in $_GET['url'], so it can do anything it wants with it.
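In practice this rule is usually combined with conditions so that real files (images, stylesheets, index.php itself) are still served directly. A minimal sketch, assuming an .htaccess file in the document root:

```apache
RewriteEngine On
# Skip requests that map to an existing file ...
RewriteCond %{REQUEST_FILENAME} !-f
# ... or to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes to index.php; QSA keeps the original query string
RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
```

With this in place, a request for /foo/bar?baz=42 would reach index.php with $_GET['url'] set to "foo/bar" and $_GET['baz'] set to "42".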




Primarily you put these rewrite rules into your web server configuration file. Apache also allows* you to put them into a file called .htaccess within your document root (i.e. next to your .php files).



* If allowed by the primary Apache configuration file; it's optional, but often enabled.
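For reference, the part of the primary configuration that permits this might look roughly like the following sketch. The /var/www path is assumed to be the document root, as in the earlier example.

```apache
<Directory "/var/www">
    # mod_rewrite directives in .htaccess belong to the FileInfo override class
    AllowOverride FileInfo
    # Per-directory rewriting also requires symlink following to be enabled
    Options +FollowSymLinks
</Directory>
```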





mod_rewrite does not magically make all your URLs "pretty". This is a common misunderstanding. If a page on your web site contains a link to an ugly URL (say, one that points at a .php file and carries a query string), there's nothing mod_rewrite can do to make that link pretty. In order to make this a pretty link, you have to:




  1. Change the link to a pretty link, i.e. make the href in your HTML point to /my/pretty/link.





  2. Use mod_rewrite on the server to handle the request to the URL /my/pretty/link using any one of the methods described above.
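Step 2 might then look like the sketch below, assuming the rule lives in an .htaccess file in the document root. The target script page.php and its id parameter are hypothetical stand-ins for whatever actually generates the page.

```apache
RewriteEngine On
# /my/pretty/link is served by a (hypothetical) script behind the scenes;
# the URL shown in the visitor's address bar does not change
RewriteRule ^my/pretty/link$ page.php?id=42 [L]
```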





(One could use mod_substitute in conjunction to transform outgoing HTML pages and the links they contain, though this is usually more effort than just updating your HTML resources.)



There's a lot mod_rewrite can do, and you can create very complex matching rules, including chaining several rewrites, proxying requests to a completely different service or machine, returning specific HTTP status codes as responses, redirecting requests, etc. It's very powerful and can be used to great effect if you understand the fundamental HTTP request-response mechanism. It does not automatically make your links pretty.
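A few of those capabilities, sketched as individual rules. These assume an .htaccess file in the document root; all paths and the backend address are made up for illustration, and the proxy rule additionally needs mod_proxy and mod_proxy_http to be loaded.

```apache
RewriteEngine On

# External redirect: the browser is told to request the new URL (301 Moved Permanently)
RewriteRule ^old/page$ /new/page [R=301,L]

# Return 403 Forbidden for anything under /private/
RewriteRule ^private/ - [F]

# Return 410 Gone for a page that has been removed for good
RewriteRule ^retired/page$ - [G]

# Proxy requests under /app/ to a different (hypothetical) backend service
RewriteRule ^app/(.*)$ http://127.0.0.1:8080/$1 [P]
```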



See the official documentation for all the possible flags and options.

