Data Shaping Data shaping, or hierarchical recordsets, allows you to represent a tree-like structure or related recordsets. This is achieved by having a field in a recordset contain a recordset of its own, allowing database relationships to be expressed, and multiple recordsets to be returned in a single call. There are a couple of reasons why this is useful:
Performance: When used correctly, data shaping can improve performance.
Convenience: It's extremely easy to map the parent/child relationship in data shaping.
The easiest way to see what data shaping involves is to look at a diagram:
This shows a hierarchy from the pubs databases, showing Publishers, Titles and Sales.
One important point to note is that each child recordset is not a separate recordset on its own. So, in the above diagram there are not six recordsets, but only three. How come? Well, there is a recordset for each level in the hierarchy - which is Publishers, Titles and Sales. When you reference the Titles for a Publisher, you are actually referencing the Titles recordset, but ADO filters the Titles, so only those applicable to the selected Publisher are shown. This gives the impression that you have a separate recordset for each child element.
Using Data Shaping
To use data shaping you have to use the following:
The MSDataShape OLEDB Provider
A special Shape language, a superset of SQL, that allows you to construct the hierarchies
The connection string doesn't actually change that much, even though you are using a new provider. This is because you still need to get the data from somewhere. So what you do is this:
You use the MSDataShape as the Provider, and your normal provider becomes the Data Provider, with the rest of the connection string remaining the same.
The easy way to construct a connection string for data shaping is to start with your normal connection string, and append it to the end of the data shaping bits. For example, consider the following, normal connection string:
You can create a connection string for the data shape provider like so:
This sets the provider to be MSDataShape, and the Data Provider becomes the real source of the data. The original connections string already has the Provider= bits in it, so we only need to put Data in front of this to get the correct connection details.
The Shape Language
The shape language has its own grammar, but we won't go into the formal construction of it here - it's included in the ADO documentation. In most situations you'll be using the following command:
The easiest way to understand this is to see an actual example, so we'll start with just Publishers and Titles:
The parent recordset is the first line, and the child recordset the second line. The third line indicates the two fields that provide the relationship between the parent and child. In this case both tables have a field called Pub_ID (the Publisher ID field). This command returns a recordset containing the publishers, and onto the end of that recordset it APPENDs a new column (of type adChapter), which contains the child recordset. The name of this column is given by the AS clause - in this case it will be rsTitles.
The type adChapter just indicates that the field contains a child recordset. Personally I think adChild or adRecordset would have been better.
You can easily see what the fields of the parent recordset look like, by just looping through the Fields collection. Using the SHAPE command above, we get:
We now have a recordset that is a field in another recordset, so how do we access that child recordset? Simple, we use the Value property of the field to set another recordset:
So you can loop through the parent records, and for each parent record obtain a recordset of the children. Here's some code that does this. We start with the usual include file and variables:
Now we create the connection string:
Next comes the actual shape command. This will create a parent recordset containing the Publishers, and a child recordset containing the book titles:
Then we open the recordset as normal:
And like normal recordsets, we can loop through the records:
To access the child recordset, we set a variable to point to the Value of the field containing the child. In this case it's rsTitles:
The variable rsTitles is now a recordset, and behaves just like an ordinary one. So, we can loop through the values for this recordset, which only contains titles for the matching parent Publisher:
This gives us a nice list of Publishers and Titles:
With a little bit of DHTML and some extra tags you could easily hide the titles and only show them when a publisher was selected.
Data shaping would be very limited if you could only have one child per recordset, but luckily it's extremely flexible. For example, to include both the Titles and Employees for a Publisher, you could use:
You just add on any siblings to the end of the APPEND clause. This could give a result like so:
The rules of accessing the children don't change. You still use the Value property of the column to access the child recordset; only this time you have two children, so you have two variables:
There is also the possibility of grandchildren, where the child recordset contains a child of its own. For example:
So instead of just APPENDing a SQL SELECT statement, we APPEND another SHAPE command.
Like the case with multiple children, accessing the child recordsets is the same:
There's no theoretical limit to the number of children or grandchildren, but it's unlikely you'll want to go more than three or four levels deep.
Data shaping doesn't automatically improve performance, but when used correctly it can. The important thing to remember is how it works:
For SELECT statements in a SHAPE command, the tables are fetched in their entirety. No optimization of the SQL is performed. Thus, if you add a WHERE clause to the parent to restrict the parent row, you still get all of the child rows. For example:
The APPEND statement returns all Titles, and not those titles for Publishers in CA. Remember, this isn't a SQL JOIN statement. The Publishers in CA, and all of the Titles are fetched, and then the shape is produced.
Although we used SQL statements, you can use stored procedures. This on its own will give a marginal improvement. However, if you use a parameterized stored procedure for the child, then the stored procedure is executed every time you access the child recordset. This means that the child recordset is not created in its entirety up front, but contains only those records from the stored procedure. The disadvantage is the extra trip to the server, but the advantage is that the data is guaranteed to be up-to-date, as it's fetched every time it's needed.
You'll see a little more about shaped recordsets in the next chapter, when we look at data on the client.