I'm not going to spend much time trying to convince you that your
site needs a search, because if your site needs one then you
probably already know it does. If you don't think you need a search,
then your site is probably very small, is just a presence
site, or deals only with very structured data, and you may very well not
need one. For everyone else, here's a quick rundown of one approach
to adding search functionality to your site.
Search Options
There are a couple of different ways to handle adding search
functionality to your site. Each has its benefits and
drawbacks, and the option you choose really depends on
your site and your searching needs.
External Search Engine
A number of the larger search engines offer site owners the
option of adding a search box to their site that
lets visitors search the engine's index for results that reside
on that site.
This is perhaps the easiest way to get started and is a great option
for users who host their site at an ISP or don't have access to
their server in order to implement a more advanced solution.
There are two main downsides to this approach.
The first is that many engines can run into trouble indexing
your site's content (especially if your site is database-driven)
and therefore may not be able to return appropriate results.
The second is that you often have very little control over
the format of the results returned.
There are also a fair number of companies offering this as a
paid solution. I'm not familiar with many of them, but if
anyone is using one they really like let me know and I'll list
it here for others to check out.
Server-Based Search Software
If you can't get what you want via an external search it might be
time to look into installing your own. The downsides here are that
the implementation is usually much more complex and most of the
products that fall into this category are not cheap. The main
benefits include flexibility and customizability of your search, the
ability to search private sites (a.k.a. intranets), and the ability to
search more than just HTML documents.
This category contains all sorts
of products from different vendors. Most of the major search engines
have some sort of commercial software offering that you can install,
but if you're looking to do it cheaply you can use Microsoft's Index
Server. Index Server is included with Windows and is the focus of
the rest of this article.
Index Server
Instead of taking the time to describe and explain Index Server and how it works,
I'll let Microsoft's Index Server: Frequently Asked Questions
do it for me so I can jump right into walking you through the setup.
Note: You'll probably notice that most of the Index Server
documentation and content tends to be either out of date, incomplete, or
both. I'm not sure why, but Index Server never gets explained very well
by the folks in Redmond. I just mention it as a heads up... for those times
like when it says you need to download and install IIS 3! Don't actually do
it... If you're running Win2K or better you've probably already got it installed.
Creating an Index
Index Server is able to provide you with acceptable performance by indexing
the content to be searched ahead of time and creating an index of keywords and
document attributes. If it didn't, every time a query was issued the server
would have to look through every document to find the ones that
matched the query criteria, which would result in very slow searches
and would probably cripple your server in the process.
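To make the idea concrete, here is a toy sketch of an index built ahead of time. It is not how Index Server works internally; it only illustrates the principle. It is plain VBScript meant to be run with cscript, and the file names and document text are made up for the example.

```vbscript
Option Explicit

' Hypothetical sample documents (file name -> text); not the article's real files.
Dim dictDocs
Set dictDocs = CreateObject("Scripting.Dictionary")
dictDocs.Add "default.htm", "samples for database and search pages"
dictDocs.Add "database.htm", "reading from a database"
dictDocs.Add "cookie.htm", "reading and writing a cookie"

' Build the index once, up front: keyword -> set of file names.
Dim dictIndex, dictFiles, strFile, strWord
Set dictIndex = CreateObject("Scripting.Dictionary")
For Each strFile In dictDocs.Keys
    For Each strWord In Split(dictDocs(strFile), " ")
        If Not dictIndex.Exists(strWord) Then
            dictIndex.Add strWord, CreateObject("Scripting.Dictionary")
        End If
        Set dictFiles = dictIndex(strWord)
        dictFiles(strFile) = True
    Next
Next

' A query is now a single keyed lookup instead of a scan of every document.
Function FindFiles(strTerm)
    If dictIndex.Exists(strTerm) Then
        FindFiles = Join(dictIndex(strTerm).Keys, ", ")
    Else
        FindFiles = ""
    End If
End Function

WScript.Echo "database: " & FindFiles("database")
WScript.Echo "cookie: " & FindFiles("cookie")
```

The expensive part (reading every document) happens once when the index is built. Each query afterwards only touches the entry for the word being searched, which is the trade-off Index Server makes on a much larger scale.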
Now since Index Server is not strictly a web-based search engine, it works by
looking at the file system and not the links in your documents. This is important
for a number of reasons, but primarily because it can have trouble giving you URLs for
your documents and it will find and index documents that do not have links to them.
In general I find it best to create a new catalog when setting up a search for
a given set of documents. For this article I've created some standard HTML
documents (available in a zip file at the bottom of this page) and placed them
in a directory named C:\Inetpub\wwwroot\indexserver\. This puts them right off
my web server's root at http://localhost/indexserver/.
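Because Index Server works from the file system, turning a physical path back into a URL may be something you end up doing yourself. Here is a minimal sketch of one way to do it, assuming every indexed file lives under a single known root directory that maps to a single web path. PathToUrl is a hypothetical helper written for this example, not part of Index Server.

```vbscript
Option Explicit

' Hypothetical helper: convert a physical path under a known root
' directory into a URL under the matching web path.
Function PathToUrl(strPath, strRootPath, strRootUrl)
    If LCase(Left(strPath, Len(strRootPath))) = LCase(strRootPath) Then
        ' Keep the part after the root and flip the slashes.
        PathToUrl = strRootUrl & Replace(Mid(strPath, Len(strRootPath) + 1), "\", "/")
    Else
        ' The file lives outside the root, so there is no URL for it.
        PathToUrl = ""
    End If
End Function

' Prints http://localhost/indexserver/database.htm
WScript.Echo PathToUrl("c:\inetpub\wwwroot\indexserver\database.htm", _
    "C:\Inetpub\wwwroot\indexserver\", "http://localhost/indexserver/")
```

The comparison ignores case because Windows paths are not case sensitive. The sketch does not URL-encode anything, so file names containing spaces or other special characters would need extra handling.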
I'll be doing this in Windows 2000, but the process should be similar in other
flavors/versions of NT-based OSs. To create a catalog, you need to go to the Administrative Tools folder
in your web server's Control Panel. Under Administrative Tools you should
find an item that says Computer Management. When you click on this it should
open up MMC with the root node of Computer Management. Under that should be an
item called Services and Applications which contains an entry for Indexing Service.
If you right-click on Indexing Service and select New >, you should get the
option to add a new catalog. It'll ask you for two things in order to create the catalog:
the catalog name and a location to put its files. For my example I'm using
IS-Sample and C:\Inetpub\index\ but you can use whatever works for you. Note that
this is not the location of the files to be indexed, but is where the files that Index
Server creates and uses to do the indexing will be stored. Don't select a directory that
is in use or one that is web-accessible. At this point, you'll
probably get a message saying that the catalog will remain offline until the
service is restarted. Don't worry... this is normal.
You should now right-click on your new catalog (in the right hand pane)
and select Properties.... You can't edit anything on the General tab so
click to the Tracking tab. I find it best to leave everything unchecked and
select (None) for WWW Server. Selecting a server is helpful because then Index
Server can generate URLs for you, but it also causes it to automatically add
and index all virtual directories under the selected web site. If you're on
a server flavor of Windows and the web in question doesn't have any virtual
directories then you can go ahead and select the appropriate web, but otherwise
I'd recommend against it or you'll probably end up indexing a whole bunch of
stuff you don't need or want indexed. Under the Generation tab I usually
uncheck everything except Generate Abstracts. This creates little blurbs
based on the documents and can be quite helpful. Pick something reasonable for
the max size (I'm using 100 chars for the demo) and click OK.
The next step is to tell the catalog what to index. Right-click on the Directories folder
underneath your catalog and select New > Directory. This is where you add the
directory you want this catalog to index. In my example it's C:\Inetpub\wwwroot\indexserver\.
Next you click on Indexing Service in the left hand pane and click the start arrow
in the toolbar to start the service if it's not already running. If it was running,
stop and restart it. This should start all the catalogs indexing. You can stop
any catalogs that are not in use by clicking them in the right hand pane and
clicking the stop square in the toolbar. Now you've created a catalog and
have told it what to index. Depending on the size of the content involved, the
initial indexing process can take a while so don't be worried if you don't
get results right away or only get partial results.
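If you would rather script these steps than click through MMC, the Indexing Service also exposes admin automation objects. The following is a rough sketch, not something tested as part of this article. It assumes Windows 2000 or later, an account with administrative rights, and that the catalog directory already exists. Stopping the service before adding the catalog is my assumption about the safe order. Run it with cscript.

```vbscript
Option Explicit

Dim objAdmin, objCatalog

' Admin object for the Indexing Service on the local machine.
Set objAdmin = CreateObject("Microsoft.ISAdm")

' Stop the service before adding the catalog (assumed safe order).
If objAdmin.IsRunning Then objAdmin.Stop

' Catalog name and catalog location from the walkthrough above.
' Assumes C:\Inetpub\index\ already exists.
objAdmin.AddCatalog "IS-Sample", "C:\Inetpub\index\"

' Restart the service so the new catalog comes online.
objAdmin.Start

' Tell the catalog which directory to index (False = include, not exclude).
Set objCatalog = objAdmin.GetCatalogByName("IS-Sample")
objCatalog.AddScope "C:\Inetpub\wwwroot\indexserver\", False
```

This only covers creating the catalog and adding the directory. The Tracking and Generation settings described above are not touched by the sketch, so you would still set those in the catalog's Properties dialog.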
You can test your catalog by clicking on Query the Catalog underneath your
catalog's name and running a test query. Using the word "database" with
our sample documents I get three results: default.htm, database.htm, and
a frontpage file (which we'll discuss more in part II).
So now our catalog is set up, but we still don't have any way to query it.
Now we need to build a web form to interface with the Indexing Service.
Note that I also added caching of DocKeywords and DocAuthor fields under
Properties under my catalog in order to get those fields working.
Just check caching and accept the default values.
Search Page
Part of the power of Index Server is the fact that you can customize it to your
heart's content. The flip side to that is that it can also be quite complex
and is quite daunting to the new user. I'm going to start with a relatively simple
search form and comment it heavily to get you started.
<%@ Language="VBScript" %>
<% Option Explicit %>
<html>
<head>
<title>ASP 101's Index Server Article - Search Page</title>
<meta name="description" content="Search Page">
<meta name="keywords" content="Search Page">
<meta name="author" content="John Peterson">
</head>
<body>
<p>
This is the search page of the sample web content for ASP 101's
Index Server article.
</p>
<form action="default.asp" method="get">
<input type="text" name="query" />
<input type="submit" value="Search" />
</form>
<p>
Queries that should return results include:
<a href="?query=component">component</a>,
<a href="?query=cookie">cookie</a>,
<a href="?query=database">database</a>,
<a href="?query=date">date</a>,
<a href="?query=time">time</a>,
<a href="?query=email">email</a>,
<a href="?query=form">form</a>,
<a href="?query=search">search</a>,
etc.
</p>
<%
Dim strQuery   ' The text of our query
Dim objQuery   ' The index server query object
Dim rstResults ' A recordset of results returned from I.S.
Dim objField   ' Field object for loop
' Retrieve the query from the querystring and trim
' stray spaces so a blank query doesn't reach I.S.
strQuery = Trim(Request.QueryString("query"))
' If the query isn't blank then proceed
If strQuery <> "" Then
    ' Create our index server object
    Set objQuery = Server.CreateObject("ixsso.Query")
    ' Set its properties
    With objQuery
        .Catalog = "IS-Sample" ' Catalog to query
        .Query = strQuery ' Query text
        .MaxRecords = 10 ' Max # of records to return
        ' What to sort records by. I'm sorting by rank [d]
        ' which is [d]escending by how pertinent Index Server
        ' thinks the result is. This way the most applicable
        ' result should be first.
        .SortBy = "rank [d]"
        ' Which columns to return. Column names must
        ' be the same as the catalog's properties. Some
        ' of them are: contents, filename, size, path,
        ' vpath, hitcount, rank, create, write, DocTitle,
        ' DocSubject, DocAuthor, DocKeywords...
        .Columns = "filename, path, vpath, size, write, " _
            & "characterization, DocTitle, DocAuthor, " _
            & "DocKeywords, rank, hitcount"
    End With
    ' Get a recordset of our results back from Index Server
    Set rstResults = objQuery.CreateRecordset("nonsequential")
    ' Get rid of our Query object
    Set objQuery = Nothing
    ' Check for no records
    If rstResults.EOF Then
        Response.Write "Sorry. No results found."
    Else
        ' Print out # of results
        Response.Write "<p><strong>"
        Response.Write rstResults.RecordCount
        Response.Write "</strong> results found:</p>"
        ' Loop through results
        Do While Not rstResults.EOF
            ' Loop through Fields
            ' Formatting leaves something to be desired,
            ' but it'll work for now. We'll pretty things
            ' up and link to the content in part II.
            For Each objField In rstResults.Fields
                Response.Write "<strong>"
                Response.Write objField.Name
                Response.Write ":</strong> "
                ' HTML-encode the value so markup in the indexed
                ' content can't break the page. Appending ""
                ' turns a Null value into an empty string.
                Response.Write Server.HTMLEncode(objField.Value & "")
                Response.Write "<br />"
            Next
            ' Spacing between results
            Response.Write "<br />"
            ' Move to next result
            rstResults.MoveNext
        Loop
    End If
    ' Kill our recordset object
    Set rstResults = Nothing
End If
%>
</body>
</html>
In the next part we'll cover how to execute more advanced
queries and get the results looking a little prettier and linking to the
content pages, but this should get you started for now.
Downloads
You can download the sample content files and the search page listed above
from here: indexserver.zip (6.8KB).
Part II
Did you notice that the name of this article included the phrase "Part I"? Well... at long last... here's Part II: