Custom error pages

This is an old post. It may contain broken links and outdated information.

We gain a lot of quick flexibility by running Nginx on our back-end. I’m sure that there are ways to make Ruby & Ruby Rack do lots of nifty things, including fancy rewrites (and indeed there’s a good amount of information out there to show you how to do just that), but it’s almost always going to be faster and more efficient to let the web server handle redirects and rewrites where possible, since it can do those kinds of things without having to pass information up and down the stack to another process.

Rewrites are a rich subject, but none are really necessary for hosting a simple blog. If you’re migrating to Octopress and Nginx from something else and you’ve got a significant number of posts and a lot of history to bring along with you, you might indeed want to spend some time with Nginx’s rewrite module, which isn’t as deep as Apache’s but which is more than capable of addressing nearly every need. Rewrites are such a rich subject, in fact, that I’m not going to go into them any more than I already have. Chances are if you need to set up a bunch of rewrite rules, you’ve already dug up a bunch of other tutorials. For a single-user site with no history or cruft to bring forward, rewrites aren’t necessary; Octopress’s simple static layout is already “SEO friendly,” or at least uses human-parsable URIs.

However, error pages are something that we can quickly and easily customize. Nginx (as with most web servers) lets you specify custom error pages; the problem at first glance is generating a custom error page that has the same look and feel as your Octopress blog and keeps that look as your blog changes. Fortunately, Octopress’s ability to generate generic pages as well as blog posts comes to the rescue!

$ rake new_page["404.html"]
(in /home/lee/octopress)
mkdir -p source
Creating new page: source/404.html

This gives us a page named 404.html in the source subdirectory, underneath the Octopress root. It starts life as a blank Markdown page, which you can customize to show whatever variation of “what you’re looking for isn’t here” makes you happy. You should strip out the extra header info (you don’t need comments on your 404 page, do you?) to keep the page simple. Here’s what this site’s 404 error page looks like, before it goes through the generator:

---
layout: page
title: "404 Not Found"
footer: False
---

Either you clicked on a bad link, or typed in something that doesn't correspond
with a page or image on the bigdinosaur.org server.

Then you can rake generate the site and publish it to get the page up onto the server. The next step is to tell Nginx that instead of serving its built-in 404 error page when an object isn’t found, it should use your new 404 page instead. If you’re using something similar to the HTTP-only site definition described in my post about Octopress on Nginx, then you’ll want to open up your blog’s site definition file under /etc/nginx/sites-available and make the following change:

server {
    server_name yourblog.com;
    root /var/www/yourblog.com;
    index index.html;
    autoindex off;

### Add the line below this to tell Nginx to serve your 404 page
    error_page 404 /404.html;

...

If you’ve got a more complex site definition with HTTP and HTTPS, you can either add the same line into both HTTP and HTTPS server blocks, or you can add the line to a common configuration file that gets included by both servers (like the common.conf file I described in the previous post).
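As a rough sketch of the shared-file approach, the layout might look something like this. The include path /etc/nginx/common.conf and the certificate paths are placeholders I’ve made up for illustration, so substitute whatever your own setup uses:

```nginx
# Contents of the shared file (hypothetical path: /etc/nginx/common.conf)
#     error_page 404 /404.html;

server {
    listen 80;
    server_name yourblog.com;
    root /var/www/yourblog.com;

    # Pull in the shared error_page directive
    include /etc/nginx/common.conf;
}

server {
    listen 443 ssl;
    server_name yourblog.com;
    root /var/www/yourblog.com;

    # Placeholder certificate paths, not real files
    ssl_certificate     /path/to/yourblog.com.crt;
    ssl_certificate_key /path/to/yourblog.com.key;

    # Same include, so both servers answer 404s identically
    include /etc/nginx/common.conf;
}
```

Keeping the directive in one included file means there’s only one place to edit when you add more error pages later.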

Either way, once the line is added, reload Nginx’s configuration and try accessing something that’s not on your web site, and you should get your new fancy error page! You can define custom pages for any HTTP error code, so you could conceivably put up custom 403 and 500 pages, too, or anything else you’d like.

That’s just the first thing we can do, though. Rather than having that 404 page sitting in your blog’s root directory, it would be a lot neater and more organized if we could put it and any other error pages you decide to make in their own directory—say, /error. This is easy to do because of the flexibility you get from Octopress—simply create ~/octopress/source/error and move the error page from source into the new directory:

$ mkdir ~/octopress/source/error
$ mv ~/octopress/source/404.html ~/octopress/source/error/

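Then, the next time you generate and publish the site, the 404 page will end up in the error directory on the web server. Since the page’s URI is now /error/404.html, the error_page directive in your Nginx configuration needs to point at that new location, too.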
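Assuming you added the error_page line exactly as shown earlier, the updated directive would look something like this:

```nginx
# The 404 page now lives under /error/, so the URI needs that prefix
error_page 404 /error/404.html;
```

Reload Nginx’s configuration after making the change, just as before.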

We can do more, too. You don’t really want anyone directly accessing yoursite.com/error. It doesn’t necessarily do any harm, but it doesn’t do any good either, does it? We can declare /error as an “internal” directory, meaning that Nginx can be told programmatically to serve files out of it, but any browser-sourced requests to that location are refused with an error (a “403 FORBIDDEN” response here, although the Nginx documentation for the internal directive specifies a 404 for external requests, so check what your own server returns).

To do this, add the following location block to your site, either to the individual sites-available file or to the common inclusion file:

location /error/ { internal; }

Reload Nginx’s configuration and try to access yoursite.com/error with your browser, and you’ll be turned away with an error instead of the page. If that error is a big fat FORBIDDEN message and you defined a custom 403 error page, this is one way to get it to display!

We also almost certainly want to exclude our custom error pages from our sitemap.xml file, since they don’t need to be indexed by search engines. Open up ~/octopress/plugins/sitemap_generator.rb and edit the following line:

  # Any files to exclude from being included in the sitemap.xml
  # Add any custom error message pages to the list
  EXCLUDED_FILES = ["atom.xml", "404.html"] 

There’s one final fancy trick we can do. On one hand, a 403 FORBIDDEN response is useful and good. It tells the requesting browser that the web server understood the browser’s request, but that it refuses to answer it; more to the point, IETF RFC 2616 says that a 403 response is the correct response to send in such circumstances. However, rather than waving a big fat “something here might be interesting but you can’t look at it!” flag at potential snoopers, it might be more to your liking to respond to such requests not with a standards-compliant “403 FORBIDDEN” message, but instead with a false “404 NOT FOUND” message.

Nginx makes this easy. Find wherever you added your custom error page definitions, and make the following addition:

### Error page definitions
    error_page 401 /error/401.html;
    error_page 404 /error/404.html;
    error_page 500 /error/500.html;

### Add this:
    error_page 403 =404 /error/404.html;

This causes Nginx to send a 404 response and serve the custom 404 error page whenever a request triggers a 403.
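Putting the pieces from this post together, the finished site definition might look something like the sketch below. It assumes the single HTTP-only server block from earlier, and it assumes you’ve generated 401.html and 500.html pages alongside 404.html in source/error; drop the lines for any pages you haven’t created.

```nginx
server {
    server_name yourblog.com;
    root /var/www/yourblog.com;
    index index.html;
    autoindex off;

    # Themed error pages generated by Octopress into /error/
    error_page 401 /error/401.html;
    error_page 404 /error/404.html;
    error_page 500 /error/500.html;

    # Anything that would be a 403 goes out with a 404 status
    # and the 404 page instead
    error_page 403 =404 /error/404.html;

    # Only internal redirects (such as error_page) may serve from /error/
    location /error/ {
        internal;
    }
}
```

If you use a shared inclusion file instead, the error_page lines and the location block can live there rather than in the server block itself.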

After doing all this, you’ve got a web server that’s sending out fully themed error pages from an inaccessible location, and which responds to any unauthorized request with a “not found” error rather than a “there’s something here but you can’t have it” message. The result might not be fully in line with RFC 2616, but that’s okay. I won’t tell.