Optimization

Optimization is always a concern for every developer. After all, it hardly matters how cool your code is or what great features it uses if your users have time to make a cup of coffee before your code responds. Database access has always been an area where optimization is an issue: compared to many tasks, data access is relatively slow.
Because data access varies so much, it's almost impossible to come up with a fixed set of rules to optimize your database work. As is usual with these sorts of problems, the answer you'll most often get is 'it depends', because the sort of optimization you need depends on what you are trying to do.
General ADO Tips
Despite the fact that many points about optimization are dependent upon the task, there are some general areas you can look at:
Pick only the columns you need
When opening recordsets, don't automatically use a table name, or even SELECT *, unless you really need all of the columns. Naming individual columns reduces the amount of data sent to and from the server. Even if you do need all of the columns, you'll still get the best performance by naming them individually, since the server doesn't have to work out what the column names are.
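For example (using the Authors table from the pubs sample database; adjust the names to suit your own tables):

```vbscript
' Avoid this - it returns every column, and the server must
' work out the column list for you:
rsAuthors.Open "SELECT * FROM Authors", strConn

' Better - name just the columns you actually need:
rsAuthors.Open "SELECT au_id, au_lname, au_fname FROM Authors", strConn
```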
Use stored procedures as much as possible
A stored procedure is pre-compiled, and has an execution plan already worked out. It will therefore run more quickly than the equivalent SQL statement.
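For example, a sketch of calling a stored procedure through a Command object (usp_AuthorList is a hypothetical procedure name; adCmdStoredProc tells ADO not to parse the command text):

```vbscript
Dim cmdAuthors, rsAuthors
Set cmdAuthors = Server.CreateObject("ADODB.Command")

cmdAuthors.ActiveConnection = strConn
cmdAuthors.CommandText = "usp_AuthorList"   ' hypothetical stored procedure
cmdAuthors.CommandType = adCmdStoredProc    ' constant from adovbs.inc

Set rsAuthors = cmdAuthors.Execute
```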
Use stored procedures for data changes
This is invariably faster than using the ADO methods on the recordset.
Don't create a recordset unless it's required
When running action queries, make sure you add the adExecuteNoRecords option so that a recordset is not created. You can also use this technique in lookup situations where you return just a single row with one or two fields (ID values, for example); in that case a stored procedure with output parameters will be quicker.
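For example (usp_UpdatePrices is a hypothetical action stored procedure):

```vbscript
' No recordset is needed for an action query, so tell ADO
' not to construct one
objConn.Execute "usp_UpdatePrices", , _
                adCmdStoredProc + adExecuteNoRecords
```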
Use the appropriate cursor and lock modes
If all you are going to be doing is reading data from the recordset and displaying it on screen (creating a table, for example), then use the default forward-only, read-only recordset. The less work ADO has to do maintaining details about records and locking, the better.
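Since forward-only, read-only is the default, you can simply omit the cursor and lock type arguments, or state them explicitly, as in this sketch:

```vbscript
' adOpenForwardOnly and adLockReadOnly are the defaults anyway
rsAuthors.Open "SELECT au_lname FROM Authors", strConn, _
               adOpenForwardOnly, adLockReadOnly
```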
Use object variables

One guaranteed way to improve performance while looping through recordsets is to use object variables to point to members of collections. For example, consider looping through a recordset containing authors:
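A sketch of the straightforward approach (assuming the au_lname and au_fname columns of the pubs Authors table):

```vbscript
While Not rsAuthors.EOF
  Response.Write rsAuthors("au_lname") & ", " & _
                 rsAuthors("au_fname") & "<BR>"
  rsAuthors.MoveNext
Wend
```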
You can speed this code up, and make it more readable, by using the following:
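Something along these lines (the same loop, with the Field object references set up once, outside the loop):

```vbscript
Dim objLName, objFName

' Point the variables at the Field objects just once
Set objLName = rsAuthors("au_lname")
Set objFName = rsAuthors("au_fname")

While Not rsAuthors.EOF
  ' The Field objects always reflect the current record
  Response.Write objLName & ", " & objFName & "<BR>"
  rsAuthors.MoveNext
Wend
```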
Here we use two variables and set them to point to particular fields in the Fields collection of the recordset (remember that the Fields collection is the default collection). Because you're setting an object reference, you can then use the object variable instead of indexing into the collection each time round the loop, which means less work for the scripting engine.
Cache size

The cache size is the number of records that ADO reads at a time from the data store, and it defaults to 1. This means that when you're using server-based cursors, every time you move to another record, that record must be fetched from the data store. Increasing the size of the cache to 10, for example, means that records are read into the ADO buffer 10 at a time. If you access a record that is within the cache, ADO doesn't need to fetch it from the data store; accessing a record outside the cache causes the next set of records to be read into the cache.
You can set the size of the cache by using the CacheSize property of the recordset:
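For example, setting it before the recordset is opened:

```vbscript
' Read records into the ADO buffer ten at a time
rsAuthors.CacheSize = 10
rsAuthors.Open "SELECT au_lname FROM Authors", strConn
```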
You can change the cache size at any time during the life of a recordset, although the new figure only becomes effective after the next set of records are retrieved.
Like many performance tips, there's no single cache size that is best, as it varies with the task in hand and the data provider. But increasing the cache size from 1 almost invariably improves performance.
If you want to see this in action, then use the SQL Server Profiler and watch what happens when you open a recordset using the default cache size, and compare that to what happens with an increased cache size. Increasing the cache size means less work not only for ADO, but for SQL Server too.
Database design

Don't look only to your code for improvements to your data access; you should also consider the design of the database itself. This isn't going to be a big discussion of database design, but there are some things to think about when using databases for Web sites:
When showing data to users, how important is it for that data to be up-to-date? Take a product catalog, for example: how often does the catalog change? If the catalog doesn't change very often, then is there any reason to get the data from a database every time? Would a better way be to generate static HTML pages from the database, say once a week, or whenever the data changes?
If you are not doing many data additions, but lots of different look ups, then consider adding indexes to your tables.
If your site has two distinct purposes (data maintenance and data analysis) then consider de-normalizing some tables to help with the analysis side of the operation. You could even provide separate, completely de-normalized tables for analysis that are updated on a regular basis, and to improve performance even more you could move these analysis tables to another machine.
If using SQL Server 6.x then make sure you update your statistics regularly if data is being added or deleted. These statistics are used to generate query plans and can affect how queries are run. See UPDATE STATISTICS in the SQL Books Online for more details. For SQL Server 7, this task is automated for you.
These are fairly standard database design techniques, but ones you may not think about if you've got your head deep in ASP code.
Data caching

The first thing to note is that data caching has nothing to do with the recordset cache size, although both can be used to improve performance. Data caching means temporarily storing data so that the cached copy can be reused, rather than re-created, on each request. It is really only suitable for data that doesn't change very often but is accessed many times.
One of the simplest ways of caching data in ASP is to use Application and Session scoped variables. For example, imagine you have several pages on which you need to select the book type. Under normal circumstances you might consider creating an include file that contains this function:
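The function might look something like this (usp_BookTypes and the type column are illustrative names):

```vbscript
Function BookTypeList()
  Dim rsTypes, strList
  Set rsTypes = Server.CreateObject("ADODB.Recordset")

  ' Stored procedure returning the list of book types
  rsTypes.Open "usp_BookTypes", strConn, adOpenForwardOnly, _
               adLockReadOnly, adCmdStoredProc

  strList = "<SELECT NAME=""lstBookTypes"">"
  While Not rsTypes.EOF
    strList = strList & "<OPTION>" & rsTypes("type") & "</OPTION>"
    rsTypes.MoveNext
  Wend
  strList = strList & "</SELECT>"

  rsTypes.Close
  Set rsTypes = Nothing
  BookTypeList = strList
End Function
```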
This simply calls a stored procedure to get the book types and build up a SELECT list. The disadvantage with this is that every time you call this function, the database is accessed. So, how about changing this function:
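A cached version, along these lines:

```vbscript
Function BookTypeList()
  Dim rsTypes, strList

  ' Use the cached copy if a previous request has built it already
  strList = Application("BookTypes")

  If strList = "" Then
    Set rsTypes = Server.CreateObject("ADODB.Recordset")
    rsTypes.Open "usp_BookTypes", strConn, adOpenForwardOnly, _
                 adLockReadOnly, adCmdStoredProc

    strList = "<SELECT NAME=""lstBookTypes"">"
    While Not rsTypes.EOF
      strList = strList & "<OPTION>" & rsTypes("type") & "</OPTION>"
      rsTypes.MoveNext
    Wend
    strList = strList & "</SELECT>"

    rsTypes.Close
    Set rsTypes = Nothing

    ' Cache the formatted list for subsequent requests
    Application.Lock
    Application("BookTypes") = strList
    Application.Unlock
  End If

  BookTypeList = strList
End Function
```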
Instead of just opening the recordset, this version of the script checks to see if there is anything stored in the Application variable BookTypes. If there is, then the contents of this variable are used. If not, then the recordset is opened as before. Obviously once the first person has run this routine the data will be cached, so this is only useful for data that doesn't change very often.
You could use a Session variable if you wanted to cache data on a user basis, but here you have to watch for Sessions that might expire. If this happens, then Session-level variables will die along with the session, and your code may break.
Using the Web Application Stress tool, informally called homer (probably named after Bart and not Alex), I got the following results:
That's some kind of improvement, eh?
Now don't get too excited, and start caching everything like this. After all, this method is only suitable for data that is already formatted for display. And besides, my Web server is only serving me - hardly a typical Web server usage. Using WAS you can simulate multiple clients on one server, and thus give your applications a more realistic test.
The Web Application Stress tool allows you to stress test your Web pages by simulating numbers of users. It's got a simple graphical interface, and is extremely easy to use. You can find out more, and download it, from http://homer.rte.microsoft.com/.
Caching objects

So what if you want to cache data that isn't formatted, so you can use it in different ways in different places? Well, you can also do this using Application or Session variables. Consider the case of book titles. You might want to use the titles in several pages - perhaps in a table showing all titles, or in a list box where the user can select an individual title, and so on. You might think that, instead of caching the HTML containing the OPTION tags, you could cache the recordset itself.
The simple fact is that yes, you can cache objects in both Application and Session variables, but there are two main problems:
Objects stored in Application variables must support Free-threading, so they must be either Free- or Both-threaded objects. This means you cannot cache Visual Basic created components in Application variables.
Storing Apartment-threaded objects in Session state means that the thread that created the object is the only one allowed to access it. IIS cannot, therefore, perform good thread management, because any page that tries to access this object will have to wait for the original thread to service the page. This can kill any chance of scaling your application well.
For a discussion on threading issues, refer to Chapter 16.
By default, ADO ships as an Apartment-threaded object, primarily because some of the OLEDB Providers are not thread safe. In the ADO installation directory there is a registry file that will switch ADO over to the Both-threaded model, thus allowing ADO objects to be stored safely in Application and Session objects.
So, you might think that all is well, and you can gain some sudden speed increase by caching all sorts of objects, but this isn't necessarily so. Many people have thought that since connecting to a data store is a relatively expensive operation, caching the Connection object would save a lot of time when reconnecting. Yes it would, but caching a Connection object means that the connection is never closed, and therefore connection pooling works less effectively. One of the ideas behind connection pooling is to reduce resources in use on the server, and caching objects in ASP state clearly doesn't reduce resources. In fact it increases them, because each cached object uses server resources. For a heavily used site this could drastically reduce the effectiveness of the Web server.
OK, so you won't store Connection objects, but how about Recordset objects, especially disconnected recordsets? Well, assuming ADO has been changed from Apartment-threaded to Both-threaded, there's no reason why you can't do this, as long as you realize exactly what you are doing. Don't think it's automatically going to increase the performance of your ASP pages. Every recordset that you cache takes up server resources, both in terms of memory and ASP management, so don't even think of caching large recordsets.
Another technique is to use the GetRows method of a recordset, which converts the recordset into an array. Since an array doesn't suffer from the threading issues that the Recordset object does, this will be more acceptable to use in a Session level variable. However, it still uses server resources, plus you have to take into account the time taken to manipulate the array.
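A sketch of this approach (rsTitles is assumed to be an open recordset of book titles):

```vbscript
Dim avTitles, iRow

' GetRows returns a two-dimensional array: (column, row)
avTitles = rsTitles.GetRows()
rsTitles.Close
Set rsTitles = Nothing

' An array has no threading affinity, so it can be stored
' in Session state without tying a session to one thread
Session("Titles") = avTitles

' Later, perhaps in another page:
avTitles = Session("Titles")
For iRow = 0 To UBound(avTitles, 2)
  Response.Write avTitles(0, iRow) & "<BR>"
Next
```

Remember that the array is a snapshot: once the recordset is closed, the cached data won't reflect later changes in the database.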
You can nearly always architect your applications so this caching technique isn't necessary.