/home/josephspurrier

The Trailing Slash Solution

After much deliberation, I’ve come up with a logical way to handle the trailing slash in website URLs. I’ll step through the evolution of web applications and end with a proposal. The Apache and Nginx configurations are also available.

URL Rules

In order to fully understand how a web server retrieves web content, I’m going to set a few ground rules. They are designed to make it easy to understand what you are accessing and how you should expect to access resources online.

Static Applications

In a perfect world, the following is what should happen when a user interacts with a web server that delivers only static content.

* Request *                    * Response *    * Method *              * Comment *
http://example.com/            -> File:        GET /index.htm          # Points to a directory, so it must return a file
http://example.com             -> Redirect:    301 /                   # The line above returns a file, so redirect to it
http://example.com/index.htm   -> Redirect:    301 /                   # The file is already reachable at /, so redirect
http://example.com/archive.htm -> File:        GET /archive.htm        # Return the file
http://example.com/archive/    -> File:        GET /archive/index.htm  # Points to a directory, so it must return a file

Based on this example, the question is not whether trailing slashes should be used but where they should go. Trailing slashes represent directories. Directories do not hold content the way files do (they only list items), so we set a default document (index.htm) to return content instead.
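The static rules in the table can be sketched as a small Go function. This is a minimal sketch, assuming the input is the path portion of the URL as typed, so `http://example.com` becomes an empty string (in practice browsers normalize an empty path to `/` before sending the request). The name `resolveStatic` is hypothetical and not part of any server’s API.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// defaultDocument is the file returned when a URL points at a directory.
const defaultDocument = "index.htm"

// resolveStatic applies the static-site rules to a URL path. It returns a
// 301 with the canonical location, or a 200 with the file to read.
func resolveStatic(path string) (status int, target string) {
	// Bare host with no path: redirect to the canonical "/".
	if path == "" {
		return http.StatusMovedPermanently, "/"
	}
	// Default document spelled out: it is already reachable at the
	// directory URL, so redirect there instead of serving a duplicate.
	if strings.HasSuffix(path, "/"+defaultDocument) {
		return http.StatusMovedPermanently, strings.TrimSuffix(path, defaultDocument)
	}
	// Trailing slash means a directory, so serve its default document.
	if strings.HasSuffix(path, "/") {
		return http.StatusOK, path + defaultDocument
	}
	// Anything else names a file.
	return http.StatusOK, path
}

func main() {
	for _, p := range []string{"/", "", "/index.htm", "/archive.htm", "/archive/"} {
		status, target := resolveStatic(p)
		fmt.Printf("%-14q -> %d %s\n", p, status, target)
	}
}
```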

Now, here is where it gets complicated. We don’t use static applications like we used to. We’ve grown up and now need scalable systems that can withstand millions of tweets per second. Tweets that must get on the internet or the whole idea of freedom of speech is tossed out the window. I digress.

When we first started using dynamic applications, we sent all requests to a single file (if using the Front Controller Pattern) and used the query string to determine which resource to return. The query string did not really use slashes so it did not conflict with the file system.

Then, someone came up with a wonderful idea, “Let’s rewrite the URLs so they are pretty!” And now we have a problem. Pretty URLs use slashes. Slashes mean directories. Now what? When using dynamic applications, we have to tell the web server to check for the existence of files and directories BEFORE sending the request to /index.php. This means that if a file exists on the web server at /archive, then the dynamic application will NOT get to show its dynamic content. We are breaking the rules.
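The check-first rewrite described above (in Apache, typically `RewriteCond %{REQUEST_FILENAME} !-f` and `!-d` ahead of the rule that rewrites to index.php) can be modeled in Go to show the conflict. The `onDisk` listing and `conventionalRewrite` function are hypothetical; a real server would consult the file system.

```go
package main

import (
	"fmt"
	"strings"
)

// onDisk is a hypothetical listing of what exists in the web root.
// Directory entries keep their trailing slash.
var onDisk = map[string]bool{
	"/archive":   true, // a plain file
	"/style.css": true, // a plain file
	"/images/":   true, // a directory
}

// conventionalRewrite mimics the usual pretty-URL rule: an existing file
// or directory is served directly, and only what is left over reaches
// the front controller.
func conventionalRewrite(path string) string {
	if onDisk[path] {
		return "static " + path
	}
	return "dynamic /index.php?q=" + strings.Trim(path, "/")
}

func main() {
	for _, p := range []string{"/archive", "/blog", "/images/"} {
		fmt.Println(p, "->", conventionalRewrite(p))
	}
}
```

The file at `/archive` wins, so the dynamic archive page can never be reached at that URL.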

Now that we know how we got into this mess, how do we clean it up?

Proposed Solution for Trailing Slash Usage

We move index.php out of the root directory, remove the check for a directory, and add a trailing slash to every dynamic resource.

* Request *                   * Response *    * Method *                  * Comment *
http://example.com/           -> File:        GET /../index.php           # Points to a directory, so it must return a file
http://example.com            -> Redirect:    301 /                       # The line above returns a file, so redirect to it
http://example.com/index.php  -> File:        GET /index.php              # Return the static content
http://example.com/archive    -> File:        GET /archive                # Return the static content
http://example.com/archive/   -> File:        GET /../index.php?q=archive # Return the dynamic content

Now every file is accessible, and every URL ending in a slash returns the dynamic resource instead of a directory. If something does not exist, the request should be rewritten to index.php, which returns a 404 if the dynamic content does not exist either. The example follows all the rules and answers the question: “When should I use trailing slashes?”
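The proposed rules can be sketched the same way. This assumes, as in the table, that `/../index.php` denotes a front controller stored outside the web root, and that the hypothetical `onDisk` map lists the files inside the root; directories are never consulted. The `route` function is illustrative, not a drop-in server configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// onDisk is a hypothetical set of files inside the web root.
var onDisk = map[string]bool{
	"/index.php": true,
	"/archive":   true,
}

// controller stands for the index.php that lives outside the web root.
const controller = "/../index.php"

// route applies the proposed trailing-slash rules to a URL path.
func route(path string) string {
	switch {
	case path == "":
		// Bare host: redirect to the canonical "/".
		return "301 /"
	case strings.HasSuffix(path, "/"):
		// Trailing slash: always dynamic, never a directory lookup.
		return "dynamic " + controller + "?q=" + strings.Trim(path, "/")
	case onDisk[path]:
		// No trailing slash and the file exists: serve it as-is.
		return "static " + path
	default:
		// Unknown file: hand it to the front controller, which
		// returns a 404 if it has no dynamic resource either.
		return "dynamic " + controller + "?q=" + strings.TrimPrefix(path, "/")
	}
}

func main() {
	for _, p := range []string{"", "/", "/index.php", "/archive", "/archive/", "/missing"} {
		fmt.Printf("%q -> %s\n", p, route(p))
	}
}
```

Only the shape of the URL decides whether a request is dynamic: a trailing slash always reaches the front controller, so a file named `/archive` can no longer hide the `/archive/` page.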

Note: Of course, this won’t work with Apache because it cannot serve the initial index.php if it is outside the web root folder. It will also require changing the way people use the internet (good luck with that). If I tell you to visit http://example.com/coolstuff/ and you go to http://example.com/coolstuff (without the trailing slash), you will not get the correct content. It would be the equivalent of spelling cool with a ‘k’. The importance of the slash was downplayed because there was no Internet Instruction Manual: you just type in a URL and the website pops up. You want your website to be as accessible as possible, so that a few misplaced characters still bring up the content you want. In my opinion, we should instead provide better 404 pages when someone lands on the wrong page (Apache can already do something like this with mod_speling). That is great for SEO (only one way to get to every page) and great for users (“404 - Couldn’t find your page, but the page you’re looking for may be in this list…”).
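A sketch of the kind of helpful 404 page described above. It only catches a missing or extra slash and differences in letter case, which is narrower than mod_speling (that module ignores capitalization and tolerates up to one misspelled character). The `canonical` list and the `suggest` function are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical is a hypothetical list of every page's one true URL.
var canonical = []string{"/", "/archive/", "/coolstuff/", "/about/"}

// normalize drops surrounding slashes and case so near misses compare equal.
func normalize(p string) string {
	return strings.ToLower(strings.Trim(p, "/"))
}

// suggest returns canonical URLs that differ from the requested path only
// by slashes or letter case, excluding an exact match.
func suggest(requested string) []string {
	want := normalize(requested)
	var out []string
	for _, c := range canonical {
		if normalize(c) == want && c != requested {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// A visitor typed the URL without its trailing slash.
	fmt.Println("404 - Couldn't find your page. Did you mean:", suggest("/coolstuff"))
}
```

The visitor still gets a 404 rather than a silent redirect, so each page keeps exactly one URL while the list points them to the right one.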

In summary, my vote is to use the trailing slash for web applications that serve up dynamic content. The index.php file will have to stay where it is and be an exception. Here are the Apache and Nginx configurations and here are the important points:
