Back in November 2000, I wrote an article for ASP 101 called
"Extending Your Page Names".
Based on the number of people who contact me with questions and praise,
it was one of the most successful articles I have ever written.
Since then we have seen the launch of Windows Server 2003, IIS 6.0, ASP.NET, and the
rise of Google, so I felt it was time to update the article.
Introduction
Dynamically generated pages are the only way to have a truly big site; however, dynamic page names are neither
user friendly nor search engine friendly. For example, a page called:
product.asp?Id=4
is not as memorable as:
/appliances/dishwashers/kenmore/Model3809.htm
This article shows how you can get a static-looking page name and dynamic pages at the same time.
Search Engines
There is another benefit to having well-named pages: search engines are able to crawl them.
Most search engines, like AltaVista and Yahoo, do not crawl pages with question marks in their URLs, to avoid
getting caught in a never-ending crawl. By giving your dynamic pages static-looking names, you let
search engines index them, driving more traffic to your site.
Another benefit is that some search engines examine the URL of the page for keywords, and weight
these keywords more heavily than the text in the HTML. For this reason it is good to have keywords
in your URL that match the page description.
Minimum Requirements
In order to get well-named pages, you will need Windows 2000/IIS 5.0 or Windows Server 2003/IIS 6.0
running either ASP.NET or ASP. If you are running ASP, please see the former article for the code sample
needed. Two enhancements introduced in IIS 5.0 allow you to have great page names: Custom Error
Pages that use Server.Transfer, and the ability to call Server.Transfer from your Active Server Pages/ASP.NET pages.
Even though Custom Error Pages were available in IIS 4.0, they used Response.Redirect, which will not work
for this purpose, because search engines do not follow redirects.
Overview
As the site programmer, you link to pages that don't exist, giving them well-named URLs. You then
tell IIS that you want an ASP.NET page (404.aspx) to handle all the 404s that come to the site. Inside
this ASP.NET page, you convert the well-named URL that was requested back into the dynamic URL, and do a
Server.Transfer to execute that page and return its output to the user's browser.
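The flow above can be sketched as one small function. This is a minimal illustration, not the article's code: an in-memory dictionary stands in for the product database, the page name is taken with Path.GetFileNameWithoutExtension, and the names ResolveTarget and /product.aspx (an assumed ASP.NET counterpart of product.asp) are hypothetical, as is pairing Model3809 with Id 4.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch of the 404 flow: a well-named path goes in, the dynamic page to
// transfer to comes out. The dictionary stands in for the product database.
public static class FriendlyUrlFlow
{
    public static string ResolveTarget(string requestedPath, IDictionary<string, int> modelToId)
    {
        // "/appliances/dishwashers/kenmore/Model3809.htm" -> "Model3809"
        string pageName = Path.GetFileNameWithoutExtension(requestedPath);

        // Known product: hand off to the dynamic page with its Id.
        int productId;
        if (modelToId.TryGetValue(pageName, out productId))
        {
            return "/product.aspx?Id=" + productId;
        }

        // Unknown name: treat it as a real 404 and fall back to the home page.
        return "/default.aspx";
    }
}
```

In 404.aspx, the returned string is what you would hand to Server.Transfer.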
If you request a well-named URL such as http://myserver.com/appliances/dishwashers/kenmore/Model3809.htm (filling in
your web site name for myserver.com), the server will return a 404, because the page does not exist. The first thing
to do is to have all your 404 errors handled by a single .aspx page. You can do this with the Custom Error
Page feature of IIS 5.0/6.0. To turn on custom error pages, follow these steps:
1. Open Internet Information Services (IIS) Manager in the MMC.
2. Right-click the web site node and choose Properties.
3. Click the Custom Errors tab.
4. Scroll down until you see HTTP Error 404.
5. Double-click 404 to open the "Error Mapping Properties" dialog.
6. Change the Message Type to URL.
7. For the URL, enter /404.aspx
8. Click OK, and then OK again.
Now all your 404 errors will be handled by 404.aspx. The nice thing about IIS is that when it calls 404.aspx,
it passes the URL of the page that caused the error in the query string.
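For a URL-type custom error, IIS 5.0/6.0 calls the page with a query string of the form 404;http://myserver.com:80/appliances/dishwashers/kenmore/Model3809.htm. A minimal sketch of recovering just the requested path from that string follows; GetRequestedPath is a hypothetical helper, and it leaves percent-escapes undecoded.

```csharp
using System;

public static class NotFoundQuery
{
    // Returns the path of the URL that produced the 404, for example
    // "/appliances/dishwashers/kenmore/Model3809.htm".
    public static string GetRequestedPath(string queryString)
    {
        if (string.IsNullOrEmpty(queryString))
        {
            return "";
        }

        // Drop the "404;" status prefix that IIS adds.
        const string prefix = "404;";
        string url = queryString.StartsWith(prefix, StringComparison.Ordinal)
            ? queryString.Substring(prefix.Length)
            : queryString;

        // Let System.Uri strip the scheme, host and port from an absolute URL.
        if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            Uri uri;
            if (Uri.TryCreate(url, UriKind.Absolute, out uri))
            {
                return uri.AbsolutePath;
            }
        }

        // Otherwise assume it is already a bare path.
        return url;
    }
}
```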
404.aspx
Now create a 404.aspx page to handle your errors. The first thing you need to do is get the name of the
page that caused the 404. This line of code gets it from the query string:
// C#: Get the requested URL from the query string
string strQ = Request.ServerVariables["QUERY_STRING"];
So what part of strQ is important? For the example above, strQ contains the requested URL, which IIS passes in the form
404;http://myserver.com:80/appliances/dishwashers/kenmore/Model3809.htm. All we really need from it is Model3809,
since this could be the unique key to the product database. The following code extracts the model number and looks up the product id.
// C#
// Namespaces used by the 404.aspx code-behind
using System;
using System.Configuration;
using System.Data.SqlClient;
using System.Text.RegularExpressions;

// The statements below run inside 404.aspx's Page_Load handler
string strPageName = "";
string strQ = "";
int nIndex = 0;
int nProductId = 0;

strQ = Request.ServerVariables["QUERY_STRING"];

// Decode the URL and strip out trash
// that the URL creators might have entered
strQ = Regex.Replace(strQ, "[_]", " ");
strQ = Regex.Replace(strQ, @"\+", " ");
strQ = Regex.Replace(strQ, "%20", " ");
strQ = Regex.Replace(strQ, "%C3%A9", "é");
strQ = Regex.Replace(strQ, "%C3%A1", "á");
strQ = Regex.Replace(strQ, "%27", "'");
strQ = Regex.Replace(strQ, "%5C", "\\");

// Strip the ending slash if there is one
if (strQ.Length > 0 && strQ[strQ.Length - 1] == '/')
{
    strQ = strQ.Substring(0, strQ.Length - 1);
}

// Strip the page name (everything after the last slash)
nIndex = strQ.LastIndexOf("/");
if (nIndex >= 0)
{
    strPageName = strQ.Substring(nIndex + 1);
    strQ = strQ.Substring(0, nIndex);
}

// Transfer to the home page if there appears
// to be a problem with the URL (maybe a real 404?)
if (strPageName.Length < 4)
{
    Server.Transfer("/default.aspx");
}

// Trim off .htm from the page name
if (strPageName.EndsWith(".htm", StringComparison.Ordinal))
{
    strPageName = strPageName.Substring(0, strPageName.Length - 4);
}

// Now that we have the name of the page, decode some more
strPageName = Regex.Replace(strPageName, "%2F", "/", RegexOptions.IgnoreCase);

// Look up the page name in the product database. The column matches the
// Product_Model value that CreateProductURL (below) puts in the URL, and a
// parameter replaces manual quote escaping to prevent SQL injection.
string strSQL = "SELECT Product_Id FROM Product WHERE Product_Model = @Model";
using (SqlConnection myConnection = new SqlConnection(
           ConfigurationSettings.AppSettings["ConnectionString"]))
using (SqlCommand myCommand = new SqlCommand(strSQL, myConnection))
{
    myCommand.Parameters.Add(new SqlParameter("@Model", strPageName));

    // Execute the command
    myConnection.Open();
    using (SqlDataReader result = myCommand.ExecuteReader())
    {
        // Not in the product database: transfer to the home page
        if (!result.Read())
        {
            Server.Transfer("/default.aspx");
        }
        nProductId = Convert.ToInt32(result["Product_Id"]);
    }
}
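Because the code above is tied to Request and Server, it is hard to test on its own. One option, sketched here under the assumption that you are free to refactor, is to pull the page-name extraction into a pure function (ExtractPageName is a hypothetical name) that applies a subset of the same steps. It also shows that URLs with different directories reduce to the same key.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageNameParser
{
    public static string ExtractPageName(string strQ)
    {
        // Same hand-rolled decoding as 404.aspx: underscores, plus signs
        // and %20 all become spaces.
        strQ = Regex.Replace(strQ, "[_+]", " ");
        strQ = strQ.Replace("%20", " ");

        // Strip the ending slash if there is one.
        if (strQ.Length > 0 && strQ[strQ.Length - 1] == '/')
        {
            strQ = strQ.Substring(0, strQ.Length - 1);
        }

        // Everything after the last slash is the page name. LastIndexOf
        // returns -1 when there is no slash, so the whole string is kept.
        string pageName = strQ.Substring(strQ.LastIndexOf('/') + 1);

        // Trim off ".htm".
        if (pageName.EndsWith(".htm", StringComparison.Ordinal))
        {
            pageName = pageName.Substring(0, pageName.Length - 4);
        }
        return pageName;
    }
}
```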
Now that we have the Product Id, we need to pass it along when we transfer to the correct .aspx page.
In ASP, we can't pass the Product Id in the query string with Server.Transfer (this is an IIS restriction),
so we pass it via the Session object. A major difference in ASP.NET is that you can pass the
information in the query string of the path you give to Server.Transfer.
Because it is a Server.Transfer, the URL in the browser's address bar doesn't change, and, unlike with
Response.Redirect, the browser doesn't have to make another round trip.
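On the receiving side, the product page reads the Id back out of its query string. A minimal sketch, assuming 404.aspx calls something like Server.Transfer("/product.aspx?Id=" + nProductId) and the product page then reads Request.QueryString["Id"]; the page name product.aspx and the helper ReadProductId are hypothetical, and the helper avoids ASP.NET types so it can be tested on its own.

```csharp
public static class ProductPageInput
{
    // Returns the product id, or -1 when the value is missing or not a
    // positive integer (for example if someone requests product.aspx directly).
    public static int ReadProductId(string idValue)
    {
        int id;
        if (int.TryParse(idValue, out id) && id > 0)
        {
            return id;
        }
        return -1;
    }
}
```

In the product page's Page_Load this might be called as ReadProductId(Request.QueryString["Id"]).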
Also notice that the directories do not exist at all. In fact, apart from the server name and the page name itself,
it doesn't matter what the URL says: any URL ending in Model3809.htm goes to the same page, whatever directories come before it.
So why put in the directories? Because they can give you higher search engine placement.
Some search engines treat the words in the URL as stronger keywords than words
in the title or body of the HTML, so directory names are very important.
Calling Pages that Don't Exist
We have covered the technology to convert URLs that don't exist into dynamic URLs. However, in order to get
search engines to crawl those pages, you need to link to the URLs that don't exist. In other words, the only
way a search engine is going to find your Model3809.htm page is if you link to it.
When you linked to this page before, all you had to do was use the Product Id, like this: "product.asp?Id=4".
So let's take that Product Id and create a function that returns the correct URL.
// C#: a method of the page class (Page1); needs the System,
// System.Configuration and System.Data.SqlClient namespaces
public string CreateProductURL(int lProductId)
{
    string strURL = "";
    string strSQL = "SELECT Product_Model FROM Product WHERE Product_Id = @Id";
    using (SqlConnection myConnection = new SqlConnection(
               ConfigurationSettings.AppSettings["ConnectionString"]))
    using (SqlCommand myCommand = new SqlCommand(strSQL, myConnection))
    {
        myCommand.Parameters.Add(new SqlParameter("@Id", lProductId));

        // Execute the command
        myConnection.Open();
        using (SqlDataReader result = myCommand.ExecuteReader())
        {
            if (!result.Read())
            {
                // No product with this ID: alert the page creator
                throw new ArgumentException("Invalid Product Id: " + lProductId);
            }
            strURL = "/" + result["Product_Model"].ToString() + ".htm";
        }
    }
    return strURL;
}
Now, when you want to add a link to a product, call CreateProductURL with its Product Id instead of writing out a product.asp?Id= URL.
Note: In this example we assume you don't know the model name and have to go back to the
database; in the real world this might not be the case. If you know the model name, you
can create the URL without making another call to the database.
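When the model name and category names are already at hand, the URL can be assembled directly. A minimal sketch, assuming the site uses category directories like those in /appliances/dishwashers/kenmore/Model3809.htm; BuildProductUrl is a hypothetical helper. Spaces become underscores because the 404.aspx code turns underscores back into spaces.

```csharp
using System.Text;

public static class ProductUrlBuilder
{
    // BuildProductUrl("Model3809", "appliances", "dishwashers", "kenmore")
    // gives "/appliances/dishwashers/kenmore/Model3809.htm".
    public static string BuildProductUrl(string model, params string[] directories)
    {
        StringBuilder url = new StringBuilder();
        foreach (string directory in directories)
        {
            url.Append('/').Append(ToSegment(directory));
        }
        url.Append('/').Append(ToSegment(model)).Append(".htm");
        return url.ToString();
    }

    // Spaces become underscores, which the 404 handler turns back into spaces.
    private static string ToSegment(string text)
    {
        return text.Trim().Replace(' ', '_');
    }
}
```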
Performance
There are some obvious performance issues associated with this technique. First, it might require an
extra database call to create the well-named URLs that don't exist. Second, it will always require
an extra database call to figure out the correct page from the 404 URL.
Finally, the Server.Transfer itself is expensive.
An Update
Since writing the original article, my brother (Glen Berry) has pointed out a very clever way to avoid the extra
database call in 404.aspx: simply put the product identifier into the URL that you create, as one of its directory names.
Your 404.aspx page would then parse out the topmost directory and get the identifier, for example 453422; in the
404.aspx code above, nProductId would equal 453422. This avoids the database call and makes the page
faster. It does make the URL less readable, but it still gives you good search engine placement.
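A minimal sketch of that parsing, assuming the identifier is the first (topmost) directory of the path, as in /453422/Model3809.htm, and that the 404; prefix and server name have already been removed; TryGetProductId is a hypothetical name.

```csharp
using System;

public static class ProductIdParser
{
    // Assumed layout: "/453422/Model3809.htm", with the id as the topmost directory.
    public static bool TryGetProductId(string path, out int productId)
    {
        productId = 0;

        // "/453422/Model3809.htm" -> ["453422", "Model3809.htm"]
        string[] segments = path.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // There must be at least one directory in front of the page name.
        if (segments.Length < 2)
        {
            return false;
        }

        // int.TryParse sets productId to 0 when the directory is not a number.
        return int.TryParse(segments[0], out productId);
    }
}
```

If this returns false, 404.aspx could fall back to the database lookup by model name.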
Example
An example of this can be seen at http://www.kulshan.com. Most of the leaf pages on Kulshan.com,
like the individual restaurant reviews, use this technique to get better search engine placement.
Summary
Creating URLs that you can market and that work well with search engines is fairly easy when you don't have
to create the directory structure and files that those URLs represent. Using the IIS 5.0/6.0 Custom Error
Page feature to handle 404s can yield great page names.
XCache Technologies
Founded in 1996 by Wayne and Dina Berry, XCache Technologies is a leading innovator of Web caching software
and Internet performance solutions. The company's flagship products, XCache, XCompress and XTune, provide
Web developers and site owners with robust performance solutions that significantly improve site
experiences for their end users. XCache Technologies is headquartered in Bellingham, Washington, and
can be contacted via its website: http://www.xcache.com.