In the follow-up to our extremely popular Index Server
article, we cover linking to results, limiting the number of
results, filtering results based on path and filename, and
using GetRows and GetString. We also include an improved
search script.
In this article I'm going to continue to discuss using Microsoft's
Index Server from ASP. I'm assuming you've already read
Part I, which explains how to set up an
index and the basic details of a "plain vanilla"
search page. If you haven't, you might want to take a moment
and give it a quick read before proceeding.
Linking to the Result Documents
For the most part... just finding out that a document exists
is not enough. Once you perform a search and find a document
you're interested in reading, the next step is usually to open
that document and take a look at it. In a web-based setting, the
key to doing this is creating a link to the document in question.
We've all created links... a simple <a href=""></a>
should be all it takes, but filling in the URL part is what's important.
With Index Server there are two different properties you can use to
determine the path to the result document: path and vpath. Ideally
(for web uses) you want to use vpath (virtual path to the document
in question), but there's a catch. If you didn't associate the
catalog in question with a WWW Server instance in the catalog
configuration screen (which, as I already mentioned in Part I, has the
downside of automatically including all the web's virtual folders
and mucking up your catalog), the vpath property always comes back blank.
So in order to use vpath you'll need to associate your
index with the appropriate web server instance. If you choose this method,
I recommend that the very next thing you do is go to the Directories
folder underneath your catalog in the Indexing Service configuration
screen, set the "Include In Catalog" attribute of the
directories you don't need to "No", and then restart the Indexing
Service.
I personally prefer to have a "clean" index that only
includes the documents I want indexed, so I decided to use the
path property and create a function to translate the physical
path into the appropriate form.
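Function PathToVpath(strPath)
    ' Physical root of the web site; change this to match your server.
    Const strWebRoot = "c:\inetpub\wwwroot\"
    Dim strTemp
    strTemp = strPath
    ' Swap the physical root for a single backslash. vbTextCompare makes
    ' the match case-insensitive, since Windows paths can differ in case.
    strTemp = Replace(strTemp, strWebRoot, "\", 1, -1, vbTextCompare)
    ' Turn the remaining backslashes into forward slashes.
    strTemp = Replace(strTemp, "\", "/")
    PathToVpath = strTemp
End Function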
It's pretty basic... it just swaps out the physical root path
of the web site for a backslash and then swaps all backslashes
for forward slashes. You'll need to set the Const to reflect
the root of your site, but aside from that it's pretty simple
and saves me from having to
deal with Index Server's automatic inclusion of virtual directories.
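To show where the function fits, here's a minimal sketch of turning one result into a link. BuildResultLink is a hypothetical helper name (it isn't part of the Part I script), and it assumes it runs inside an ASP page, since it relies on Server.HTMLEncode to escape the URL and the title.

```vbscript
' Hypothetical helper (not from the original script): builds the HTML
' link for one search result. Assumes it runs inside an ASP page so
' that the Server object is available.
Function BuildResultLink(strVpath, strTitle)
    Dim strLabel
    ' Appending "" turns a Null title into an empty string.
    strLabel = Trim(strTitle & "")
    ' Fall back to the URL itself when the document has no title.
    If Len(strLabel) = 0 Then strLabel = strVpath
    BuildResultLink = "<a href=""" & Server.HTMLEncode(strVpath) & """>" _
        & Server.HTMLEncode(strLabel) & "</a>"
End Function
```

Inside the loop over your results, something like Response.Write BuildResultLink(PathToVpath(objRS("path")), objRS("doctitle")) should do it, assuming your recordset is named objRS and your query asked for the path and doctitle columns.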
Limiting the Number of Results Returned
Have you ever taken a look at the result of a query and
seen something like: "Results 1 - 10 of about 61,300,000"?
I don't care who you are or what you're looking for... no human
being wants to look through a million or more results looking
for something of interest. The fact that a search engine
would even present this many entries is simply insane.
Generally if you don't find what you're looking for in the first
10 or 20 results you're going to perform a new search to try
and narrow down the results. So what's the point of your search
page returning all the results?
It's fairly simple to tell Index Server to limit its results
to a reasonable number of hits. Simply set the MaxRecords
property of the Query object
to the maximum number of results you want your search to return.
For example:
objQuery.MaxRecords = 50
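In context, that line sits alongside the other Query properties and needs to be set before you create the recordset. The sketch below assumes the ixsso.Query object used in Part I; the catalog name, the column list, and the search text are placeholders for whatever your own script uses. Sorting by rank, descending, means the 50 hits you keep should be the most relevant ones.

```vbscript
Dim objQuery, objRS, strQuery
strQuery = "index server"                ' placeholder search text

Set objQuery = Server.CreateObject("ixsso.Query")
With objQuery
    .Catalog    = "MyCatalog"            ' assumption: your catalog's name
    .Query      = strQuery
    .Columns    = "doctitle, path, rank" ' assumption: the columns you display
    .SortBy     = "rank[d]"              ' best matches first
    .MaxRecords = 50                     ' stop after 50 hits
End With

Set objRS = objQuery.CreateRecordset("nonsequential")
```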
Filtering Results Based on Filename
There are a number of reasons you might want to block results
based on filename. Since Index Server is based on the file system,
it will pick up files in the web that aren't linked to from any
other pages. The two types of file that spring immediately to
mind as ones that you might not want returned as search results
are files generated by Frontpage Server Extensions and secure
administration pages. Both play their role, but neither is
something you want a random visitor stumbling across.
By adding a conditional to our query we can exclude
files that we don't want returned as results. I'm not going
to go into great depth as to the syntax since the documentation
has gotten much better than it used to be, but the command below
makes sure we only return files that
don't contain "_vti_" in their path (FPSE files).
objQuery.Query = strQuery & " AND NOT #path *\_vti_*"
As another example, if you wanted to exclude files whose name
contained the word "admin," you could do something like this:
objQuery.Query = strQuery & " AND NOT #filename *admin*"
Finally, there's no reason you can't do both:
objQuery.Query = strQuery _
    & " AND NOT #filename *admin*" _
    & " AND NOT #path *\_vti_*"
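One thing to watch when you append conditions like these: in the Index Server query language AND binds more tightly than OR, so if the visitor's own query contains an OR, the appended AND NOT may end up applying to only part of it. A hedged sketch of one way around that is to wrap the original query in parentheses first. AddExclusions is a hypothetical helper name, and the two patterns are just the examples from above.

```vbscript
' Hypothetical helper: appends the exclusion clauses to a user query.
Function AddExclusions(strUserQuery)
    ' Parentheses keep the user's query together as a single unit,
    ' so the exclusions apply to all of it.
    AddExclusions = "(" & strUserQuery & ")" _
        & " AND NOT #filename *admin*" _
        & " AND NOT #path *\_vti_*"
End Function
```

You'd then set the query with objQuery.Query = AddExclusions(strQuery).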
Using GetRows and GetString
I never really thought about it before, but after the
last article I got an email from a user who was trying
to use the GetString() method of the
recordset returned from his query in order to get the results quickly and
release the objects as quickly as possible (always a good idea). Unfortunately
he was having some problems. I'm not sure what they were, but
I recently tested these methods and they seem to work well.
You'll need to be sure you're on an active record (that is, the
recordset isn't at EOF), but aside from that GetString() and GetRows()
should work like they do for any other recordset. For more info you
might want to take a look at our Database GetRows and Database GetString
samples.
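As a rough sketch of the pattern that reader was after, the code below copies the results into an array with GetRows(), releases the objects, and only then writes the output. It assumes objQuery and objRS were already created as in the search script, and that path is the second column in your Columns list; adjust the index to match your own query.

```vbscript
Dim arrResults, i

' GetRows fails on an empty recordset, so check EOF first.
If Not objRS.EOF Then
    arrResults = objRS.GetRows()   ' 2-D array: (column, row)
End If

' The data now lives in the array, so the objects can go.
objRS.Close
Set objRS = Nothing
Set objQuery = Nothing

If IsArray(arrResults) Then
    For i = 0 To UBound(arrResults, 2)
        ' Index 1 = second column (assumed here to be path).
        Response.Write Server.HTMLEncode(arrResults(1, i) & "") & "<br />"
    Next
Else
    Response.Write "No results found."
End If
```

GetString() follows the same pattern, except that it hands back one delimited string instead of an array.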
Searching Database Driven Web Sites
This is one area where Index Server really falls short of the mark.
The only answer I've ever gotten for this is that you should build
static pages for every possible page and then have Index Server
index those. It's a half-@$$ solution and everyone knows it, but
I've yet to find any other way to get Index Server to actually index
data stored in a database. If you've had any luck in this area
let me know and I'll be happy to share, but AFAIK, in order
to get database query results into an Index Server query you'll
need to query the DB yourself and find some (hopefully meaningful)
way to integrate the results.
Have Patience
I didn't know where else to put this one, but I thought I should mention it.
Oftentimes, time alone can fix a problem. I know it's frustrating,
but there are times when Index Server just decides to take its
sweet time re-indexing. I've had experiences where I've spent
half an hour trying to figure out what I was doing wrong because
I kept getting no results back. I finally went to lunch to try
and clear my head and when I got back everything was working
fine. I'm not sure if the server was just too busy to re-index
at that moment or if it was just trying to annoy me, but
remember to give it some time... especially on large or complex
sets of documents. (I haven't had this happen in a while so maybe
faster computers are helping or maybe it was a bug in Index Server
2, but just FYI.)
Additional Information
While writing this article I ran across some relatively useful pages
over at Microsoft. I wish I had found them when I was trying to build
my first search page, but I guess it's better late than never:
Notice that they're all pretty old and none of them fare very well based on user feedback, but at least they're something.
Downloads
I've added an advanced query page
(integrating examples of the topics included above)
to the download from Part I.
If you just downloaded it, then you've already got everything.
If not you can get a copy from here: indexserver.zip (6.8KB).
The only things left to do (that I can think of) would be to make a
more complex front end form to allow more complex queries
(date and time, boolean, exact phrase, author, etc.) and to
see if I can pull off getting this thing to work from ASP.NET.
If I write a Part III, it'll be to cover one or both of those, but
don't hold your breath.
Update: Using Index Server from ASP.NET
While I haven't had a chance to play with it or whip up a sample script
to parallel our other samples, I did run across a how to article in
Microsoft's KnowledgeBase that covers querying an Index Server catalog from
an ASP.NET page:
Another thing to note is that if you're trying to use Index Server to index
ASP.NET pages (*.aspx, *.ascx, etc.), you may need to change some settings first
(depending on your OS, service pack level, and configuration):