ASP 101 - Active Server Pages 101

Using Index Server to Search Your Web Site - Noise Words

by John Peterson

Introduction

Since writing my article on using Microsoft's Index Server from ASP (Part 1, Part 2), I've gotten quite a few questions from people who aren't getting the search results they expect. There are a number of reasons why this might happen, but one of the most common is that the query includes one or more words that Index Server considers "noise words". This article explains what noise words are and shows you how to edit the list of words that Index Server treats as noise words.

The Email

You can thank Yu Zhang for finally getting me to write this article. I've answered quite a few questions about noise words, but his came at the issue from a slightly different angle. Here's his email:

Hi John,

Your article "Using Index Server to Search Your Web Site", helped me a lot, but, I'm having a problem with 'reserved words' such as 'i' and 'about'. To deal with this problem, I need to find the list of Index Server's reserved words so I can filter them.

Do you know where to find the list?

Thanks,
Yu

As it turns out, I didn't know where to find the list... so, I found out. Having taken the time to do so, I figured I should share the info with everyone.

What are Noise Words?

Noise words are words that are very common and yet carry very little meaning. Words like 'a', 'an', 'the', 'to', 'so', 'with', etc. are found in almost all documents but provide very little information about what a document is actually about. Therefore, there is very little value to be gained from knowing that a document contains any of them. Because of this, Index Server is designed to ignore these types of words when it builds an index from a set of documents.
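To make the idea concrete, here is a minimal Python sketch of what "ignoring noise words while indexing" means. It is not how Index Server is implemented. The function name index_terms and the six-word sample list (taken from the examples above) are purely illustrative.

```python
import re

# Illustrative subset only: the six examples named in the article,
# not Index Server's actual noise-word list.
SAMPLE_NOISE_WORDS = {"a", "an", "the", "to", "so", "with"}

def index_terms(text, noise_words=SAMPLE_NOISE_WORDS):
    """Return the words of `text` that are not noise words, in order."""
    # Lowercase the text and pull out runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in noise_words]

print(index_terms("The guide to searching a site with Index Server"))
# prints ['guide', 'searching', 'site', 'index', 'server']
```

Four of the nine words in that sentence are dropped, and none of them would have helped anyone find the document.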

So, to answer Yu's question from above, you can find the list of all the words that Index Server considers noise words in the System32 folder of your Windows directory. There you'll find a bunch of files named noise.xxx, where xxx represents the language in question. For US English, the file name is noise.enu. On my laptop, the complete path to this file is C:\Windows\System32\noise.enu. The file is a plain text file and you can open and edit it using the text editor of your choice (Windows' Notepad works fine).
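If, like Yu, you want to filter noise words out of a query before running it, something along these lines might work. This is a rough Python sketch under a few assumptions: that the words in the file are separated by whitespace, that the file can be read as Latin-1 text, and that matching should ignore case. The helper names (parse_noise_words, load_noise_words, split_query) are made up for this example.

```python
from pathlib import Path

def parse_noise_words(text):
    """Turn the contents of a noise-word file into a set of lowercase words."""
    # Assumption: words are separated by whitespace (spaces or line breaks).
    return {word.lower() for word in text.split()}

def load_noise_words(path):
    """Read a noise-word file such as noise.enu from disk."""
    # Assumption: Latin-1 can decode the file; change this if your copy differs.
    return parse_noise_words(Path(path).read_text(encoding="latin-1"))

def split_query(query, noise_words):
    """Separate a query into (words to keep, noise words that were dropped)."""
    kept, dropped = [], []
    for word in query.split():
        (dropped if word.lower() in noise_words else kept).append(word)
    return kept, dropped

if __name__ == "__main__":
    # The path below is the one from the article; yours may be different.
    noise_file = Path(r"C:\Windows\System32\noise.enu")
    if noise_file.exists():
        noise = load_noise_words(noise_file)
        kept, dropped = split_query("tell me about ASP", noise)
        print("Searching for:", " ".join(kept))
        print("Ignored as noise:", " ".join(dropped))
```

Returning the dropped words alongside the kept ones lets a search page tell the user which words were ignored instead of silently changing the query.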

Editing the List of Noise Words

So why would you want to add or remove a word? Let's say your site is named "ASP 101" and every page title includes the phrase "ASP 101". In that case, searching for "ASP" would be pretty pointless, since it would return every single document, and that really defeats the point of searching, doesn't it? To avoid this problem, we might add "ASP" and "101" to the list of noise words. Index Server would then ignore them while indexing, which would produce a smaller index and faster search results. It would also prevent users from searching for "ASP" and getting back an unmanageable set of results.

Editing the noise word list is basically as simple as editing the text file. As always, you should make a backup copy before you do so. There are a few other caveats, but they are all discussed in Microsoft's Knowledge Base Article #247561 - How to Edit Index Server Noise-Word Lists, so I won't go into them here.
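If you'd rather script the change than open Notepad, the backup-then-edit routine could look roughly like this in Python. Treat it as a sketch. The function name add_noise_words and the .bak suffix are my own choices, and it assumes whitespace-separated words, Latin-1 text, and that lowercase entries are what you want. It handles only the file edit and none of the other caveats from the Knowledge Base article.

```python
import shutil
from pathlib import Path

def add_noise_words(path, new_words, encoding="latin-1"):
    """Back up a noise-word file, then append any words it does not contain.

    Returns the list of words that were actually added.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    # Keep the first backup untouched so the original list is never lost.
    if not backup.exists():
        shutil.copy2(path, backup)

    text = path.read_text(encoding=encoding)
    existing = {word.lower() for word in text.split()}

    to_add = []
    for word in new_words:
        word = word.lower()
        if word not in existing and word not in to_add:
            to_add.append(word)

    if to_add:
        # Make sure the new words start on their own line.
        if text and not text.endswith("\n"):
            text += "\n"
        text += "\n".join(to_add) + "\n"
        path.write_text(text, encoding=encoding)
    return to_add
```

For the example above, you would call add_noise_words(r"C:\Windows\System32\noise.enu", ["ASP", "101"]), most likely from an account that has permission to write to that folder.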

That's All Folks

I hope this article has helped shed some light on the topic of noise words for all of you using Index Server. And keep sending in those questions... someone has to tell me what you guys want to read about.

As an aside... I just love how much support Microsoft gives Index Server. Check out all the information at the Index Server Support Center. I realize it's not their flagship product or anything, but come on guys... give us something!
