I came across URL rewriting in .NET a little while ago, while trying to get rid of the query strings of an old-ASP
CMS I developed. It worked great (see Pitagora / WOE Web Content Management System)
but had the problem that afflicts a lot of ASP scripts: all the pages look something like:
http://myhost/mypage.asp?pagetoget=MyVeryGoodPage
While working on this I ended up with a fancy all-purpose URL rewriting and caching engine for any programming
language. Strange as it seems, this very complex thing was the simplest solution I found to handle a very simple task.
These are the stories of the Starship Enterpri... ahem... of my implementation.
The Main Idea
...or how I got into this mess.
Basically, I have a very good CMS (Pitagora / WOE Web Content Management System,
shameless self-promotion intended) that serves every page from a URL like http://myhost/template.asp?page=Title_Of_Page.
While this approach is really useful and very easy to remember (I only need to know the title of the page I want to access),
it is a huge problem for search engines and web statistics, because ALL pages actually just call template.asp.
What I wanted was to turn that URL into http://myhost/Title_Of_Page.aspx while still using the CMS.
I didn't want to use a mere redirect, mainly because it answers with a 302/303 HTTP status code, that is, a
temporary redirect that search engines tend not to treat as the page's own address. And I obviously wanted to
make it automatic, without manually creating an aspx file for each page. When I first looked into URL rewriting
in .NET I thought I was really OK. Using ASP I already had a Wintel machine, I can program in ASP.NET, so in my
imagination this had to be as simple as 1 + 1.
That is the stuff Hell is made of.
How to Crash a Wonderful Idea
It started simply enough (they all do).
The beautiful thing is that, if I implement the trick correctly, I don't need MyFakePage.aspx to
REALLY exist on the server; I only need to implement the Application_BeginRequest method in Global.asax.
Application_BeginRequest is fired BEFORE the actual page is processed, so I can SIMULATE that the page exists
and do my redirect job fast and neat. So I'd have something like this:
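A minimal sketch of that idea, assuming Server.Transfer is the call used to hand the request over to the ASP page (UrlMapper and MapToCmsUrl are names made up for illustration, and the template.asp?page= pattern is the one described above):

```csharp
using System;
using System.IO;
using System.Web;

// Hypothetical helper: turns "/Title_Of_Page.aspx" into "/template.asp?page=Title_Of_Page".
public static class UrlMapper
{
    public static string MapToCmsUrl(string requestedPath)
    {
        // "Title_Of_Page.aspx" -> "Title_Of_Page"
        string filename = Path.GetFileNameWithoutExtension(requestedPath);
        return "/template.asp?page=" + Uri.EscapeDataString(filename);
    }
}

// Global.asax code-behind.
public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // Fired before ASP.NET goes looking for the requested .aspx file,
        // so the file does not have to exist on disk.
        string target = UrlMapper.MapToCmsUrl(Request.Path);

        // Try to hand the request over to the classic ASP page.
        Server.Transfer(target);
    }
}
```

With this in place a request for /Title_Of_Page.aspx would be mapped to /template.asp?page=Title_Of_Page before any page is looked up.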
OK. That's all, folks. It is all well and done, it's OK... ACHK!!! It is not.
It seems that the HttpHandler configured in Machine.Config gets angry if you try to execute an
ASP page this way. You cannot. That's all.
Well, I can modify Machine.Config on my own machine, but what about my HOSTED site?
This could be the end of the dream. To make things even worse, the same problem afflicts the
RewritePath method. I had to quickly devise a new method, maybe LESS neat but working.
Fetching What I Need
Well, all in all an ASP page is something that is sent via HTTP. All I have to do is fetch it
the way a Web browser would and have it spat back to the client. I was starting to hope again...
First of all let's get that damn page and store it in a string:
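A minimal sketch of the fetch, assuming the CMS answers at http://myhost/template.asp?page=... as described earlier (PageFetcher, BuildCmsUrl and Fetch are names made up for illustration, and the response is assumed to be UTF-8, which is what StreamReader expects by default):

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Assumption: the host that serves the classic ASP CMS.
    public static string destinationHost = "http://myhost";

    // Builds the real CMS address for a page title.
    public static string BuildCmsUrl(string pageTitle)
    {
        return destinationHost + "/template.asp?page=" + Uri.EscapeDataString(pageTitle);
    }

    // Requests the page over HTTP, as a browser would, and returns the HTML as a string.
    public static string Fetch(string pageTitle)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(BuildCmsUrl(pageTitle));
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

From Application_BeginRequest it is then enough to call PageFetcher.Fetch with the requested file name and write the result to the response.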
This could be the end of the article but, as with the Machine.Config problem above, once again I'm
struck by the fact that I'm working in a hosted environment. A hosted environment with PAID BANDWIDTH.
I'm sure you get my point. Every time a page is requested, it is fetched over and over again by the ASP.NET
process. That can really lead to disaster. I need another trick...
Maybe Akamai Started From Here Too...
The more I thought about all this, the more I was convinced that a cache system of some kind would do
a neat job. In addition to making the trick in this article work, it would provide the CMS with a cache for content.
Not only that... it would make the system a URL rewriting and caching engine for EVERY language. You could even
use it with PHP (for example) or Python etc., as long as they are on the same server.
And why limit it to the SAME server? You can change the destinationHost string and fetch the content from
anywhere in the world, as far as I know.
Well... we were talking about the cache... I know that ASP.NET has a beautiful set of caching features, but
they are in the PAGE object. I didn't want to use it. So I went for the dear old Application object.
The trick is easy: do you have the page in cache? Yes? OK, spit it to the browser. No? OK, get it, store it
and spit it to the browser. Like this:
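A minimal sketch of the cache-or-fetch logic, assuming the page name is used directly as the Application key and that the CMS answers at the template.asp?page= address described earlier (CacheHelper and its methods are names made up for illustration, as is the exact wording of the trailing comment):

```csharp
using System;
using System.IO;
using System.Net;
using System.Web;

public static class CacheHelper
{
    // Appends an HTML comment that records where the cached copy came from.
    public static string AddSourceComment(string html, string sourceUrl)
    {
        return html + "\n<!-- cached copy of " + sourceUrl + " -->";
    }

    // Fetches a page over HTTP and returns it as a string (UTF-8 assumed).
    public static string Fetch(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

// Global.asax code-behind.
public class Global : HttpApplication
{
    // Assumption: the host that serves the CMS.
    private const string destinationHost = "http://myhost";

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string filename = Path.GetFileNameWithoutExtension(Request.Path);

        // Do we have the page in cache?
        string html = Application[filename] as string;
        if (html == null)
        {
            // No: get it, tag it and store it.
            string url = destinationHost + "/template.asp?page=" + Uri.EscapeDataString(filename);
            html = CacheHelper.AddSourceComment(CacheHelper.Fetch(url), url);

            Application.Lock();
            Application[filename] = html;
            Application.UnLock();
        }

        // Either way, spit it to the browser and stop processing the fake page.
        Response.Write(html);
        Response.End();
    }
}
```

Nothing in this sketch ever expires, so every page fetched stays in memory until the Application state is cleared.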
That's all. I can now navigate to every fake page and get it from the CMS. I can have it stored so that
I have a cache, and I can even add a comment on the bottom line to remind me where it comes from
(open the resulting HTML and you'll see).
I Need to Change a Page!
You will have noticed that if you want to change a page you need to restart the application to
erase the Application values. Once again, being hosted, I needed to find some sort of workaround to
handle this (and I am seriously thinking of housing ;).
A very quick way is to check whether a particular aspx page is requested (suppose it's called clearall.aspx)
and do the job there. Something like this:
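// filename is assumed to hold the requested page name without its extension
if (filename == "clearall")
{
    // Throw away every cached page
    Application.RemoveAll();
    Response.Write("Done it!");
    // Stop here so that "clearall" is not fetched from the CMS as a page
    Response.End();
}
That should handle all the stuff pretty easily.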
Playing with Content
Please remember that we have the whole page as a string! That means we can alter it as simply as
Replace("oldVar", "newVar"). We can, for example, modify all HREFs to reflect a
new position with a Replace("href=\"", "href=\"/mynewdir/"), or even
substitute the title of the site we are stealing the content from with our own (ok, ok, that's too nasty ;)
I promise not to do that...).
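A small sketch of the HREF trick, with one caveat: the plain Replace also prefixes absolute links such as href="http://...", so a second variant is shown that, as an assumption about what is usually wanted, only touches relative links (ContentRewriter and both method names are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContentRewriter
{
    // The plain approach: prefixes EVERY href, absolute ones included.
    public static string PrefixAllLinks(string html, string prefix)
    {
        return html.Replace("href=\"", "href=\"" + prefix);
    }

    // A more careful variant: skips links that start with a scheme, "/", "#" or "mailto:".
    // Assumes the prefix contains no "$", which Regex.Replace would treat specially.
    public static string PrefixRelativeLinks(string html, string prefix)
    {
        return Regex.Replace(
            html,
            "href=\"(?!https?://|/|#|mailto:)",
            "href=\"" + prefix,
            RegexOptions.IgnoreCase);
    }
}
```

Either method can be applied to the fetched string right before it is stored in the cache, so the rewriting happens once per page rather than once per request.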
Conclusion
While this article is FAR from being a Guru vision or something like that, I hope it will give you
some ideas for experimenting with this approach. I do not know if this "stripped down" version
of the actual system I use is safe (I am a little concerned about how much memory the Application object
holds after a while), but it is a good start for implementing something better.
Please avoid sending me flame mails about how my code style sucks. I know it. I'm good at inventing
things, not at the fine carpentry of stylish coding... if you create a polished version
I'd be more than glad to take a look at it!
It has helped me a lot, it was fun to write and right now my site ( Lastknight Dot Com ) runs on it
(well, on a more sophisticated and database-driven version of it, but this was the core idea).
As far as I know, it works, and my statistics are reporting the ASPX pages correctly and nicely.
It's fun to have all the stats made on PAGES (top views, permanence, etc.) as if I wasn't using a CMS at all...
It takes the best of both worlds.
And Google is handling each page as a static one. That means they actually think I have more than ONE PAGE in the domain,
right now. Astonishing, isn't it?
Have my greetings. I'll be most pleased to hear about your experiments with this, as well as to correct
the scores of errors I made in this article that you'll kindly point out.