The (Almost) Perfect PHP 404 Page
Introduction
Many sites deal with visitors who are not technically inclined at all. What sort of 404 page do you give them? You don’t want to confuse them, but how much power do you have to help them? A lot of what a 404 page can do involves the client side, and that obviously depends entirely on the client. There is also the constraint of reasonableness: you don’t want your server going through complex calculations every time a visitor gets a 404 error.
Giving Instructions
Giving the visitor instructions might seem like the easiest and most obvious step, but a lot of webmasters don’t follow through on it. Links to areas that might help the user find what they need (e.g., blog archive, site search) are often broken themselves. You need to keep your 404 page up to date. You also need to show the difference between a 404 page and a normal page on your site. I recommend using the same general design, but with the word “Error” in very big letters.
Using the HTTP Referer
The HTTP referer tells you which URL led the visitor to this page. That means that if a visitor goes to http://example.com and clicks a link that takes them to your site, when you read the HTTP referer, you will get “http://example.com”. The HTTP referer is supplied by the visitor. That means that software on the visitor’s computer (web browsers or old firewalls) can modify the HTTP referer or stop it from being sent.
This will require you to set up a few checks. These checks can’t really determine whether the HTTP referer has been modified, but they will pick up the most obvious forms of tampering. A simple test is to check whether the HTTP referer exists and whether it came from your own site, which would mean that your site caused the 404 error.
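As a rough sketch of that test, the snippet below sorts a referer into four cases: missing, unparsable, from your own site, or from somewhere else. The function name classify_referer and the host example.com are placeholders of mine, not part of any library, and the "www." handling is just one assumption about how your domain is set up.

```php
<?php
// Hypothetical helper: classify where a 404 request came from.
// $siteHost is an assumption; set it to your own domain.
function classify_referer(?string $referer, string $siteHost): string
{
    if ($referer === null || trim($referer) === '') {
        return 'none';     // typed URL, bookmark, or referer stripped by the client
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return 'invalid';  // not an absolute URL, so probably tampered with
    }
    // Treat "www.example.com" and "example.com" as the same site.
    $strip = function (string $h): string {
        return preg_replace('/^www\./', '', strtolower($h));
    };
    if ($strip($host) === $strip($siteHost)) {
        return 'internal'; // one of our own pages links to a missing URL
    }
    return 'external';     // another site links to a missing URL
}

// Usage on the 404 page (skipped when there is no web request).
if (isset($_SERVER['REQUEST_URI'])) {
    $referer = $_SERVER['HTTP_REFERER'] ?? null;
    if (classify_referer($referer, 'example.com') === 'internal') {
        // Our own broken link: worth logging so it can be fixed.
        error_log('Broken internal link: ' . $referer . ' -> ' . $_SERVER['REQUEST_URI']);
    }
}
```

An "internal" result is the one to act on first, since it points at a broken link you control. None of this proves the referer is genuine; it only filters out the obviously empty or malformed values.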
You can see an example here
Using Pspell
Plenty of URLs, especially on blogs that put the post title in the URL, contain real words. In the event that a user gets a 404 error, assuming that the URL contains valid dictionary words, the URL could be run through Pspell and the misspellings corrected. The script would then check whether the spell-checked URL actually exists. If it does, the script could look for the HTML <title> tag and/or meta tags that would give more information on the page. This is a bit resource-intensive for a 404 page, so it is best suited to low-traffic sites that need high usability. You could also try querying a database to see if words in the spell-checked URL are in the database.
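Here is a minimal sketch of the spell-correction step, assuming the Pspell extension and an English dictionary are installed. The helper correct_path is a name I made up; it takes the spell checker as two callbacks so the logic can be tried without Pspell. The result is only a candidate URL, so the existence check described above still has to happen afterwards.

```php
<?php
// Hypothetical helper: spell-correct each word of a slug-style URL path.
// $check(word): bool and $suggest(word): string[] are passed in so the
// logic does not depend on a particular dictionary.
function correct_path(string $path, callable $check, callable $suggest): string
{
    // Split into words and separators, keeping the separators.
    $parts = preg_split('/([^A-Za-z]+)/', $path, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $i => $part) {
        if (!preg_match('/^[A-Za-z]+$/', $part) || $check($part)) {
            continue; // separator, number, or correctly spelled word
        }
        // Take the first suggestion that is a single plain word.
        foreach ($suggest($part) as $candidate) {
            if (preg_match('/^[A-Za-z]+$/', $candidate)) {
                $parts[$i] = strtolower($candidate);
                break;
            }
        }
    }
    return implode('', $parts);
}

// Usage with Pspell (skipped if the extension or a web request is missing).
if (function_exists('pspell_new') && isset($_SERVER['REQUEST_URI'])) {
    $dict = pspell_new('en');
    if ($dict !== false) {
        $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
        $fixed = correct_path(
            $path,
            function ($w) use ($dict) { return pspell_check($dict, $w); },
            function ($w) use ($dict) { return pspell_suggest($dict, $w) ?: []; }
        );
        // $fixed is only a guess: confirm the page exists before offering it.
    }
}
```

Taking the first suggestion is the cheapest choice, not the smartest one. A fuller version could test several suggestions against the list of real pages.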
The Smarter Way
You could extract the words from the URL and enter them in a site-specific Google search. Google’s spellchecking goes beyond a normal dictionary and can correct names, technical terms, and other things that might not be found in Aspell, the spell checker that Pspell relies on. Even without correcting the names, the page that the visitor is looking for might appear in the results. This is pretty much a site search, but with much less work for your servers: all they have to do is redirect.
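A sketch of that idea: pull the words out of the requested path and build a Google search URL restricted to your domain with the site: operator. The function name google_site_search_url and the host example.com are my own placeholders, and stripping a trailing file extension is an assumption about how your URLs look.

```php
<?php
// Hypothetical helper: turn a missing URL's path into a
// site-restricted Google search URL.
function google_site_search_url(string $path, string $siteHost): string
{
    // Drop a trailing file extension such as ".html", then collect the words.
    $path = preg_replace('/\.[A-Za-z0-9]{1,5}$/', '', rawurldecode($path));
    preg_match_all('/[A-Za-z0-9]+/', $path, $matches);
    $query = trim('site:' . $siteHost . ' ' . implode(' ', $matches[0]));
    return 'https://www.google.com/search?' . http_build_query(['q' => $query]);
}

// Usage on the 404 page (skipped when there is no web request).
if (isset($_SERVER['REQUEST_URI'])) {
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
    header('Location: ' . google_site_search_url($path, 'example.com'), true, 302);
    exit;
}
```

If you would rather keep visitors on your own 404 page, the same URL can be printed as a link instead of being sent as a redirect.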
Using a Site Search
Site search is what a lot of 404 pages offer. It’s simple and designed to find information for the user. However, plenty of site searches are not very efficient at finding the data that you need. Try limiting the search to fields that give a solid description of what the page is about. On one of my sites, I have a 255-character description field that tells what the page is about so users can decide if they want to read the article. A field like that could be searched for the user’s search terms. You could also try having a field of keywords.
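To illustrate limiting the search to a description field, here is a small sketch that builds a parameterized SQL query. The table name pages and the columns url, title, and description are assumptions for the example, as is the helper name build_description_search. Adapt them to your own schema.

```php
<?php
// Hypothetical helper: build a query that searches only the description
// column. Table and column names here are assumptions.
function build_description_search(array $terms, int $limit = 10): array
{
    $clauses = [];
    $params = [];
    foreach ($terms as $term) {
        $term = trim($term);
        if ($term === '') {
            continue;
        }
        $clauses[] = "description LIKE ? ESCAPE '!'";
        // Escape LIKE wildcards so the user's text is matched literally.
        $params[] = '%' . str_replace(['!', '%', '_'], ['!!', '!%', '!_'], $term) . '%';
    }
    if ($clauses === []) {
        return [null, []]; // nothing to search for
    }
    $sql = 'SELECT url, title, description FROM pages WHERE '
         . implode(' AND ', $clauses)
         . ' LIMIT ' . max(1, $limit);
    return [$sql, $params];
}

// Usage, assuming $pdo is an existing PDO connection:
//   [$sql, $params] = build_description_search(explode(' ', $_GET['q'] ?? ''));
//   if ($sql !== null) {
//       $stmt = $pdo->prepare($sql);
//       $stmt->execute($params);
//       $results = $stmt->fetchAll(PDO::FETCH_ASSOC);
//   }
```

Joining the terms with AND keeps the result list short. Switching to OR would cast a wider net, at the cost of more noise.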
Better Yet… Preventing a 404 Error
The perfect 404 page is one that you never see. Of course, the only way for your visitors to never see a 404 page would be to not have one, which isn’t smart because then they don’t know what happened. You can, however, prevent many 404 errors by using redirects. If a URL is frequently misspelled, a redirect from the misspelling to the correct address would be very helpful. Shorter URLs are also easier to remember and type into a URL bar, which leaves less room for typos.
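One simple way to handle redirects inside PHP itself is a lookup table of known bad paths, checked before the 404 page is shown. The entries below are made-up examples, and find_redirect is a hypothetical helper, so treat this as a sketch rather than a finished solution.

```php
<?php
// Hypothetical redirect map: frequently mistyped or retired paths
// mapped to the correct ones. These entries are made-up examples.
const REDIRECTS = [
    '/abuot' => '/about',
    '/blog/perfct-php-404-page' => '/blog/perfect-php-404-page',
];

function find_redirect(string $path, array $map): ?string
{
    // Normalize case and trailing slashes so "/Abuot/" matches "/abuot".
    $key = rtrim(strtolower($path), '/');
    if ($key === '') {
        $key = '/';
    }
    return $map[$key] ?? null;
}

// Usage at the top of the 404 handler (skipped when there is no web request).
if (isset($_SERVER['REQUEST_URI'])) {
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
    $target = find_redirect($path, REDIRECTS);
    if ($target !== null) {
        header('Location: ' . $target, true, 301); // permanent redirect
        exit;
    }
    http_response_code(404); // no redirect known: show the 404 page
}
```

A permanent (301) redirect tells browsers and search engines that the old address has been replaced, which suits a misspelling that will never become a real page.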
For a short tutorial on URL rewriting using the Apache module mod_rewrite, you can head over to this mod_rewrite tutorial on HTMLSource.