ASP 101 - Active Server Pages 101
The Place ASP Developers Go!

ASP 101 is an internet.com site

ASP.NET URL Rewriting and Caching Engine

by Matteo G.P. Flora - http://www.lastknight.com

Playing with URL Rewrite

I came across URL rewriting in .NET a little while ago, while trying to get rid of the querystrings of an old ASP CMS I developed. The CMS worked great (see Pitagora / WOE Web Content Management System), but it had the problem that afflicts a lot of ASP scripts: all the pages look something like http://myhost/mypage.asp?pagetoget=MyVeryGoodPage

While working on this, I ended up with a fancy, all-purpose URL rewriting and caching engine for any programming language. Strange as it seems, this rather complex solution was the simplest one I found for a very simple task. These are the stories of the Starship Enterpri... hemm... of my implementation.

The Main Idea

...or how I got into this mess.

Basically, I have a very good CMS (Pitagora / WOE Web Content Management System, shameless self-promotion intended) that serves every page through a URL like http://myhost/template.asp?page=Title_Of_Page. While this approach is really useful and easy to remember (I only need to know the title of the page I want), it is a huge problem for search engines and web statistics, because ALL pages are actually served by template.asp.

What I really wanted was something that could:

  • translate http://myhost/template.asp?page=Title_Of_Page
  • into http://myhost/Title_Of_Page.aspx while using the CMS.
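The two URL shapes above map onto each other mechanically. Here is a minimal sketch of that mapping, assuming the page title is the last path segment and that template.asp takes it in a `page` parameter; `UrlMapper`, `ToCmsUrl` and `ToFriendlyUrl` are hypothetical names, not part of the original CMS.

```csharp
using System;

// Hypothetical helper: converts between the "friendly" URL the browser sees
// and the real CMS URL that renders the page.
public static class UrlMapper
{
    // "/Title_Of_Page.aspx" -> "/template.asp?page=Title_Of_Page"
    // friendlyPath is assumed to be already URL-decoded.
    public static string ToCmsUrl(string friendlyPath)
    {
        // Everything after the last "/" (the +1 skips the slash itself)
        string fileName = friendlyPath.Substring(friendlyPath.LastIndexOf('/') + 1);

        // Strip only a trailing ".aspx", not every "aspx" inside the title
        const string ext = ".aspx";
        string pageName = fileName.EndsWith(ext, StringComparison.OrdinalIgnoreCase)
            ? fileName.Substring(0, fileName.Length - ext.Length)
            : fileName;

        // Escape the title so spaces or "&" survive inside the query string
        return "/template.asp?page=" + Uri.EscapeDataString(pageName);
    }

    // "Title_Of_Page" -> "/Title_Of_Page.aspx"
    // Assumes titles are URL-safe, like Title_Of_Page.
    public static string ToFriendlyUrl(string pageName)
    {
        return "/" + pageName + ".aspx";
    }
}
```

Stripping only a trailing `.aspx`, rather than every occurrence of "aspx", keeps a title that happens to contain those letters intact.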

I didn't want to use a mere redirect, mainly because it returns a 302/303 HTTP status code, which signals a normal redirect that search engines do not follow. I also obviously wanted it to be automatic, without having to create an .aspx file by hand for each page. When I first looked into URL rewriting in .NET, I thought I was really OK: since I was using ASP I already had a Wintel machine, and I can program in ASP.NET, so in my imagination this had to be as simple as 1 + 1.

That's the stuff Hell is made of.

How to Crash a Wonderful Idea

It started simply enough (they all do).

The beautiful thing is that, if I implemented the trick correctly, MyFakePage.aspx didn't need to REALLY exist on the server; I only needed to implement the Application_BeginRequest method in Global.asax. Application_BeginRequest fires BEFORE the actual page is processed, so I can SIMULATE that the page exists and do my rewriting job fast and neat. So I'd have something like this:

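protected void Application_BeginRequest(Object sender, EventArgs e)
{
    // PATH_INFO is the requested virtual path, e.g. "/vox/Title_Of_Page.aspx"
    string mypath = Request.ServerVariables["PATH_INFO"];
    // Take the file name after the last "/" (the +1 skips the slash)
    string filename = mypath.Substring(mypath.LastIndexOf("/") + 1);
    // Drop the ".aspx" extension to get the page title the CMS expects
    string destinationUrl = "/myCMS/template.asp?page=" +
        Server.UrlEncode(filename.Replace(".aspx", ""));
    // Hand the request to the CMS page on the server side, with no client redirect
    Server.Transfer(destinationUrl);
}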

OK. That's all, folks. It's all well and done, it's OK... ACHK!!! It is not.

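It seems that the HttpHandler configuration in machine.config gets angry if you try to execute an ASP page this way. Server.Transfer can only hand the request to a page that ASP.NET itself processes, and classic .asp pages are run by a separate engine. You cannot. That's all.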

Well, I could modify machine.config on a server of my own, but what about my HOSTED site? This could have been the end of the dream. To make things even worse, the same problem afflicts the RewritePath method. I had to quickly devise a new method, maybe LESS neat, but working.

Fetching What I Need

Well, all in all, an ASP page is just something that is sent over HTTP. All I have to do is fetch it the way a web browser would and spit it back to the client. I was starting to hope again... First of all, let's get that damn page and store it in a string:

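// Requires: using System; using System.IO; using System.Net;
private string getPage(string URL)
{
    string res = "";
    try
    {
        WebRequest wrGETURL = WebRequest.Create(URL);
        // Dispose the response and the reader so the connection is released
        using (WebResponse response = wrGETURL.GetResponse())
        using (StreamReader objReader = new StreamReader(response.GetResponseStream()))
        {
            res = objReader.ReadToEnd();
        }
    }
    catch (Exception)
    {
        // Any failure (bad URL, network or HTTP error) yields an empty page
    }
    return res;
}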

And now we can kindly use it over and over and over...

protected void Application_BeginRequest(Object sender, EventArgs e)
{
    string mypath = Request.ServerVariables["PATH_INFO"];
    string filename = mypath.Substring(mypath.LastIndexOf("/") + 1);
    string destinationUrl = "/myCMS/template.asp?page=" +
        Server.UrlEncode(filename.Replace(".aspx", ""));
    string destinationHost = Request.ServerVariables["HTTP_HOST"];
    // Fetch the real ASP page over HTTP (WebRequest needs the "http://" scheme)
    // and send its HTML back as the response for the fake .aspx URL
    Response.Write(getPage("http://" + destinationHost + destinationUrl));
    // Stop here so ASP.NET does not look for the non-existent .aspx file
    Response.End();
}

This could have been the end of the article, with the machine.config problem above worked around, but once again I'm struck by the fact that I'm working in a hosted environment. A hosting environment with PAID BANDWIDTH.

I'm sure you get my point. Every time a page is requested, the ASP.NET process fetches it from the CMS all over again. That can really lead to disaster. I needed another trick...

Maybe Akamai Started From Here Too...

The more I thought about all this, the more I was convinced that a cache system of some kind would do a neat job. Besides making the technique in this article practical, it would give the CMS a content cache. Not only that... it would turn the system into a URL rewriting and caching engine for EVERY language. You could even put it in front of PHP or Python pages, for example, as long as they are on the same server.

And why limit it to the SAME server? You can change the destinationHost string and fetch the content from anywhere in the world, as far as I know.
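One detail matters when the origin is configurable: `WebRequest.Create` needs an absolute URL including the scheme, so a bare host name followed by a path is not enough. A minimal sketch that uses `System.Uri` to combine a configurable origin with the CMS page; `OriginUrl.Build` and the example base addresses are hypothetical.

```csharp
using System;

// Hypothetical helper: builds the absolute URL of the origin page to fetch.
public static class OriginUrl
{
    public static string Build(string originBase, string pageName)
    {
        // originBase must be absolute and end with "/" so the relative part is
        // appended to it instead of replacing its last path segment
        var baseUri = new Uri(originBase, UriKind.Absolute);
        string relative = "template.asp?page=" + Uri.EscapeDataString(pageName);
        return new Uri(baseUri, relative).AbsoluteUri;
    }
}
```

Pointing `originBase` at something like `http://localhost/myCMS/` gives a same-server setup; pointing it at another host fetches the content from there.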

Well... we were talking about the cache. I know that ASP.NET has a beautiful set of caching features, but they are exposed through the PAGE object and I didn't want to use it, so I went for the dear old Application object.

The trick is easy: do you have the page in cache? Yes? Ok, spit it to the browser. No? Ok, get it, store it and spit it to the browser. Like this:

protected void Application_BeginRequest(Object sender, EventArgs e)
{
    string mypath = Request.ServerVariables["PATH_INFO"];
    string filename = mypath.Substring(mypath.LastIndexOf("/") + 1);
    string destinationUrl = "/myCMS/template.asp?page=" +
        Server.UrlEncode(filename.Replace(".aspx", ""));
    string destinationHost = Request.ServerVariables["HTTP_HOST"];
    // WebRequest needs an absolute URL, so include the scheme
    string destinationAddress = "http://" + destinationHost + destinationUrl;
    // The full URL doubles as the cache key in the Application object
    if (Application[destinationAddress] == null)
    {
        // Cache miss: fetch the page from the CMS and store it
        Application[destinationAddress] = getPage(destinationAddress);
        Response.Write(Application[destinationAddress].ToString());
        Response.Write("<!-- Got it from LIVE -->");
    }
    else
    {
        // Cache hit: serve the stored copy without touching the CMS
        Response.Write(Application[destinationAddress].ToString());
        Response.Write("<!-- Got it from CACHE -->");
    }
    // Stop here so ASP.NET does not look for the non-existent .aspx file
    Response.End();
}

That's all. I can now navigate every fake page and get it from the CMS. The pages are stored, so I have a cache, and I can even add a comment at the bottom to remind me where each page came from (open the resulting HTML and you'll see).
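The cache logic above is the classic cache-aside pattern. Here is a framework-independent sketch of it, assuming a `Func<string, string>` stands in for getPage; `PageCache` and `GetOrFetch` are hypothetical names. Unlike the Application version it takes a lock around the lookup and the store (the Application object offers `Lock()`/`UnLock()` for the same purpose), and it does not cache empty results, so a failed fetch is retried on the next request.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache-aside wrapper: look in the cache first and only call
// the (expensive) fetch function on a miss.
public class PageCache
{
    private readonly Dictionary<string, string> _pages = new Dictionary<string, string>();
    private readonly object _sync = new object();
    private readonly Func<string, string> _fetch;

    public PageCache(Func<string, string> fetch)
    {
        _fetch = fetch;
    }

    // Returns the page content and reports whether it came from the cache.
    public string GetOrFetch(string url, out bool fromCache)
    {
        lock (_sync)
        {
            if (_pages.TryGetValue(url, out string cached))
            {
                fromCache = true;
                return cached;
            }
        }

        // Fetch outside the lock so one slow origin request does not block
        // every other page; two concurrent misses may both fetch (last one wins)
        string content = _fetch(url);
        if (!string.IsNullOrEmpty(content))
        {
            lock (_sync) { _pages[url] = content; }
        }
        fromCache = false;
        return content;
    }

    // Same effect as Application.RemoveAll() for this cache
    public void Clear()
    {
        lock (_sync) { _pages.Clear(); }
    }
}
```

In Application_BeginRequest you could keep one instance in a static field and choose the "LIVE" or "CACHE" comment from `fromCache`.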

I Need to Change a Page!

You may have noticed that if you want to change a page, you'll need to restart the application to clear the Application values. Once again, being hosted, I needed to find some sort of workaround for this (and I'm seriously thinking of moving to housing ;).

A very quick way is to check whether a particular .aspx page is requested (say, clearall.aspx) and, if so, do the job. Something like this:

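// Put this at the top of Application_BeginRequest, before the cache lookup
if (mypath.EndsWith("/clearall.aspx", StringComparison.OrdinalIgnoreCase))
{
    // Flush every cached page; anyone who knows this URL can do it
    Application.RemoveAll();
    Response.Write("Done it!");
    Response.End();
}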

That should handle all the stuff pretty easily.
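Flushing everything works, but another option, not used in my system, is to let each cached page expire after a fixed time so edits show up without a manual flush, and to drop a single page right after editing it. A minimal sketch under those assumptions; `ExpiringPageCache`, the `ttl` parameter and the injectable clock (there only so expiry can be tested without waiting) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical time-to-live cache: entries older than ttl are fetched again.
public class ExpiringPageCache
{
    private readonly Dictionary<string, (string Content, DateTime StoredAt)> _pages =
        new Dictionary<string, (string Content, DateTime StoredAt)>();
    private readonly object _sync = new object();
    private readonly Func<string, string> _fetch;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now;

    public ExpiringPageCache(Func<string, string> fetch, TimeSpan ttl, Func<DateTime> now = null)
    {
        _fetch = fetch;
        _ttl = ttl;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public string Get(string url)
    {
        DateTime now = _now();
        lock (_sync)
        {
            // Serve the cached copy only while it is younger than the TTL
            if (_pages.TryGetValue(url, out var entry) && now - entry.StoredAt < _ttl)
                return entry.Content;
        }

        string content = _fetch(url);
        if (!string.IsNullOrEmpty(content))
        {
            lock (_sync) { _pages[url] = (content, now); }
        }
        return content;
    }

    // Drop one page, e.g. right after editing it in the CMS
    public void Invalidate(string url)
    {
        lock (_sync) { _pages.Remove(url); }
    }
}
```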

Playing with Content

Please remember that we have the whole page as a string! That means we can alter it as simply as calling Replace("oldVar", "newVar"). We can, for example, modify all HREFs to reflect a new location with Replace("href=\"", "href=\"/mynewdir/"), or even substitute the title of the site we are stealing the content from with our own (OK, OK, that's too nasty ;) I promise not to do that...).
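A plain Replace on `href="` also rewrites absolute links (`href="http://..."` becomes `href="/mynewdir/http://..."`) and doubles the slash on root-relative ones. A slightly safer sketch uses a regular expression that skips links with a scheme, protocol-relative links and in-page anchors; `LinkRewriter.Prefix` and `/mynewdir/` are illustrative, and it only handles double-quoted `href` attributes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: prefixes relative links in an HTML string.
public static class LinkRewriter
{
    // Matches href=" when it is NOT followed by a scheme (http:, mailto:, ...),
    // "#" or "//", and swallows one leading "/" so root-relative links
    // don't end up with a double slash
    private static readonly Regex RelativeHref = new Regex(
        "href=\"(?![a-zA-Z][a-zA-Z0-9+.-]*:|#|//)/?",
        RegexOptions.IgnoreCase);

    // newBase is expected to end with "/", e.g. "/mynewdir/"
    public static string Prefix(string html, string newBase)
    {
        // A MatchEvaluator avoids "$" in newBase being read as a substitution
        return RelativeHref.Replace(html, m => "href=\"" + newBase);
    }
}
```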

Conclusion

While this article is FAR from a guru's vision or anything like that, I hope it gives you some ideas for experimenting with this approach. I do not know whether this "stripped-down" version of the system I actually use is safe (I am a little concerned about how much memory the Application object takes up after a while, since nothing is ever removed from it unless you flush it by hand), but it is a good start for implementing something better.
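One common remedy for that memory concern, not part of my system, is to cap the number of cached pages and evict the least recently used one when the cap is reached. A minimal sketch; `LruPageCache` and its `capacity` are hypothetical, and a real limit would more likely be based on total bytes than on a page count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical size-bounded cache: keeps at most `capacity` pages and evicts
// the least recently used page when a new one would exceed the limit.
public class LruPageCache
{
    private readonly int _capacity;
    private readonly Dictionary<string, LinkedListNode<KeyValuePair<string, string>>> _index =
        new Dictionary<string, LinkedListNode<KeyValuePair<string, string>>>();
    // Most recently used entries at the front, least recently used at the back
    private readonly LinkedList<KeyValuePair<string, string>> _order =
        new LinkedList<KeyValuePair<string, string>>();
    private readonly object _sync = new object();

    public LruPageCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public bool TryGet(string url, out string content)
    {
        lock (_sync)
        {
            if (_index.TryGetValue(url, out var node))
            {
                // Reading a page marks it as most recently used
                _order.Remove(node);
                _order.AddFirst(node);
                content = node.Value.Value;
                return true;
            }
            content = null;
            return false;
        }
    }

    public void Set(string url, string content)
    {
        lock (_sync)
        {
            if (_index.TryGetValue(url, out var existing))
            {
                // Replacing a page: drop the old entry first
                _order.Remove(existing);
                _index.Remove(url);
            }
            else if (_index.Count >= _capacity)
            {
                // Full: evict the least recently used page
                var last = _order.Last;
                _order.RemoveLast();
                _index.Remove(last.Value.Key);
            }
            _index[url] = _order.AddFirst(new KeyValuePair<string, string>(url, content));
        }
    }

    public int Count
    {
        get { lock (_sync) { return _index.Count; } }
    }
}
```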

Please avoid sending me flame mails about how my coding style sucks. I know it. I'm good at inventing things, not at the fine carpentry of stylish coding... If you create a polished version, I'd be more than glad to take a look at it!

It has helped me a lot, it was fun to write, and right now my site (Lastknight Dot Com) runs on it (well, on a more sophisticated, database-driven version of it, but this was the core idea). As far as I know, it works, and my statistics report the .aspx pages correctly and nicely. It's fun to have all the stats broken down by PAGE (top views, time on page, etc.) as if I weren't using a CMS at all... It takes the best of both worlds.

And Google treats the pages as static ones. That means it actually thinks I have more than ONE PAGE in the domain right now. Astonishing, isn't it?

Greetings, and I'll be most pleased to hear about your experiments with this, as well as to correct the scores of errors in this article that you'll kindly point out.

M.

