ASP 101 - Active Server Pages 101 - Web05

A Look Behind .NET's System.String Type

by Yuri Gorobets

This article will attempt to help you better understand the System.String type and how it works behind the scenes.

A string is an immutable sequence of characters. Yes, it's true: when we make any change to a string we don't get back an altered version of the string we started with, but rather a new string that includes our changes.
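A minimal sketch of what immutability looks like in practice. The class and method names here (ImmutabilityDemo, Shout) are made up for illustration; ToUpperInvariant and ReferenceEquals are standard framework calls.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: returns an upper case version of the text.
    // The input string itself is never modified.
    public static string Shout(string text)
    {
        return text.ToUpperInvariant();
    }

    static void Main()
    {
        string original = "asp";
        string changed = Shout(original);

        Console.WriteLine(original);                                  // asp
        Console.WriteLine(changed);                                   // ASP
        Console.WriteLine(object.ReferenceEquals(original, changed)); // False
    }
}
```

The original variable still refers to the unchanged string; the upper case text lives in a second object.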

A string is an object; that is, string is a reference type, not a value type. But when we compare strings with the operators "==" or "!=" we are actually comparing content, because those operators are overloaded for string and use the Equals function to do their work. It's important to remember that this only holds when both operands are typed as string: if they are typed as object, "==" compares references. Note also that C# does not define the ">" or ">=" operators for strings; to order two strings, use the Compare or CompareTo functions.
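The difference between comparing content and comparing references can be sketched as follows. The helper names SameContent and SameObject are hypothetical; the second string is built at run time so that it is a separate object with the same characters.

```csharp
using System;

static class EqualityDemo
{
    // Uses the overloaded == operator on string: compares characters.
    public static bool SameContent(string a, string b)
    {
        return a == b;
    }

    // Compares object identity, ignoring content.
    public static bool SameObject(string a, string b)
    {
        return object.ReferenceEquals(a, b);
    }

    static void Main()
    {
        string a = "hello";
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        Console.WriteLine(SameContent(a, b));                    // True
        Console.WriteLine(SameObject(a, b));                     // False
        Console.WriteLine((object)a == (object)b);               // False
        Console.WriteLine(string.CompareOrdinal("a", "b") < 0);  // True
    }
}
```

For ordering, the sketch uses String.CompareOrdinal, which compares character codes and ignores culture.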

The string type has 8 public constructors. Those that take pointers are marked as not CLS compliant, which is important in terms of future language interoperability. The remaining constructors, which take a character array or a repeated character, are less interesting on deeper analysis.
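For reference, here is a small sketch of the CLS compliant constructors in use. The class and method names are illustrative only, and the pointer based constructors are left out because they require unsafe code.

```csharp
using System;

static class ConstructorDemo
{
    // Hypothetical helper that exercises the three non-pointer constructors.
    public static string[] Examples()
    {
        char[] letters = { 'A', 'S', 'P', '1', '0', '1' };

        return new[]
        {
            new string('9', 3),        // a character repeated 3 times
            new string(letters),       // the whole character array
            new string(letters, 3, 3)  // 3 characters starting at index 3
        };
    }

    static void Main()
    {
        foreach (string s in Examples())
        {
            Console.WriteLine(s);
        }
        // Prints:
        // 999
        // ASP101
        // 101
    }
}
```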

Now, a little bit more about the String.Compare function. All overloaded Compare functions are based on the CultureInfo class. (Please don't confuse this with the AssemblyCulture attribute, which is used to distinguish between main and satellite assemblies.) Comparison results for the same pair of strings may differ depending on the selected culture.

For example, this is how CultureInfo is used for a case sensitive comparison:

return culture.CompareInfo.Compare(strA, strB, CompareOptions.None);

and for a case insensitive one:

return culture.CompareInfo.Compare(strA, strB, CompareOptions.IgnoreCase);
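The two calls above can be tried out with a sketch like the following. The wrapper class and its Compare method are hypothetical; CultureInfo.InvariantCulture is used only to keep the result independent of the machine's settings, and other cultures (Turkish is the usual example) may order or match some characters differently.

```csharp
using System;
using System.Globalization;

static class CultureCompareDemo
{
    // Hypothetical wrapper around CompareInfo.Compare with an explicit culture.
    public static int Compare(string strA, string strB, CultureInfo culture, bool ignoreCase)
    {
        CompareOptions options = ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None;
        return culture.CompareInfo.Compare(strA, strB, options);
    }

    static void Main()
    {
        CultureInfo culture = CultureInfo.InvariantCulture;

        // Case insensitive: the two strings are considered equal.
        Console.WriteLine(Compare("apple", "APPLE", culture, true) == 0);   // True

        // Case sensitive: the two strings are not equal.
        Console.WriteLine(Compare("apple", "APPLE", culture, false) == 0);  // False
    }
}
```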

Another interesting point is the implementation of string comparison that does not take culture or language into account. For a case insensitive comparison, it uses the CaseInsensitiveComHelper function (written in C++). If either string includes a character that is 0x80 or greater, the function always returns false.

It is interesting how the strings are brought to the same case: in ASCII, a lower case letter and its upper case counterpart differ only in the 0x20 bit. So when an xor operation shows that a character is not lower case, a "bitwise or" operation brings it to lower case, and only afterwards is the comparison performed. The comparison itself is done by simply incrementing pointers into the two character arrays. If any of the characters is greater than 0x7F then we'll get an ArgumentException.

A case sensitive comparison is a simple loop that compares the strings character by character. If the lengths of the compared strings are not equal, the number of iterations is determined by the length of the shorter string.
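The 0x20 trick can be sketched in C# as shown here. This is not the framework's C++ code: the names are hypothetical, the sketch signals "fast path not usable" by returning false rather than throwing, and it applies the bitwise or only to 'A' through 'Z' so that punctuation is not folded by accident.

```csharp
using System;

static class AsciiCaseFold
{
    // Sets the 0x20 bit on 'A'..'Z' only; other characters pass through unchanged.
    static char ToLowerAscii(char c)
    {
        return (c >= 'A' && c <= 'Z') ? (char)(c | 0x20) : c;
    }

    static bool IsAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > 0x7F) return false;
        }
        return true;
    }

    // Hypothetical helper: returns false when the fast path cannot be used
    // (a character above 0x7F was found); otherwise stores the answer in 'equal'.
    public static bool TryEqualsIgnoreCase(string a, string b, out bool equal)
    {
        equal = false;
        if (!IsAscii(a) || !IsAscii(b)) return false;
        if (a.Length != b.Length) return true;

        for (int i = 0; i < a.Length; i++)
        {
            if (ToLowerAscii(a[i]) != ToLowerAscii(b[i])) return true;
        }
        equal = true;
        return true;
    }

    static void Main()
    {
        bool equal;
        bool usable = TryEqualsIgnoreCase("ASP 101", "asp 101", out equal);
        Console.WriteLine(usable + " " + equal); // True True
    }
}
```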
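The case sensitive loop might look roughly like this in C#. It is a sketch, not the framework source: the method name is made up, and the tie-break on length after the shared prefix is an assumption about how strings of unequal length are ordered.

```csharp
using System;

static class OrdinalCompareSketch
{
    // Compares character codes over the length of the shorter string,
    // then falls back to comparing lengths if the shared prefix is identical.
    public static int Compare(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);

        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i])
            {
                return a[i] - b[i];
            }
        }
        return a.Length - b.Length;
    }

    static void Main()
    {
        Console.WriteLine(Compare("abc", "abd") < 0);  // True
        Console.WriteLine(Compare("abc", "abc") == 0); // True
        Console.WriteLine(Compare("abc", "ab") > 0);   // True
    }
}
```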

In C#, string concatenation is implemented in a more sophisticated way than Visual Basic 6 concatenation. The first step is to allocate memory for a character array whose length equals the sum of the lengths of the strings being concatenated. The result array is then filled with the contents of those strings.
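The allocate-then-fill idea can be illustrated with a managed sketch. The real implementation works at a lower level; the class and method names below are hypothetical, while String.CopyTo is a standard framework method.

```csharp
using System;

static class ConcatSketch
{
    // Allocates one buffer of the combined length, then copies each string into place.
    public static string Concat(string first, string second)
    {
        char[] buffer = new char[first.Length + second.Length];

        first.CopyTo(0, buffer, 0, first.Length);
        second.CopyTo(0, buffer, first.Length, second.Length);

        return new string(buffer);
    }

    static void Main()
    {
        Console.WriteLine(Concat("ASP ", "101")); // ASP 101
    }
}
```

Because the final length is known up front, only one buffer is needed and each character is copied once.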

The last thing I'd like to discuss is the implementation of the Replace function, more exactly Replace(string, string). The first step is some error handling: the function checks that the string to be replaced has a length greater than zero, since otherwise there is nothing to search for (an empty search string is rejected with an ArgumentException).

The next step is to build an index of all the needed replacements and store it in an integer array. Then we simply walk through the source string, copying characters into the result array until we reach an indexed location. There the new value is inserted, the counter is incremented, and the iteration continues. Of course, the whole thing is performed at a low level, with direct memory allocation.

A little more about replace: when your job needs a simple and frequently repeated replace operation, it is worth comparing the Replace function against Regular Expressions. Performance differences can be tremendous, in either direction, depending on how the expression is written. Here are examples of simple code which may help you to see it better:
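The two-pass approach (index the matches, then copy into a buffer sized in advance) can be sketched in managed code as follows. This is an illustration under assumptions, not the framework's implementation: the class name is made up, a List of int stands in for the integer array, and matching is ordinal.

```csharp
using System;
using System.Collections.Generic;

static class ReplaceSketch
{
    public static string Replace(string source, string oldValue, string newValue)
    {
        if (oldValue == null) throw new ArgumentNullException(nameof(oldValue));
        if (oldValue.Length == 0) throw new ArgumentException("oldValue must not be empty.", nameof(oldValue));
        if (newValue == null) newValue = "";

        // Pass 1: record the start index of every match.
        List<int> matches = new List<int>();
        int pos = 0;
        while ((pos = source.IndexOf(oldValue, pos, StringComparison.Ordinal)) >= 0)
        {
            matches.Add(pos);
            pos += oldValue.Length;
        }
        if (matches.Count == 0) return source;

        // Pass 2: allocate the result once, then copy runs and replacements into it.
        char[] result = new char[source.Length + matches.Count * (newValue.Length - oldValue.Length)];
        int read = 0;
        int write = 0;
        foreach (int match in matches)
        {
            int run = match - read;                       // unchanged characters before the match
            source.CopyTo(read, result, write, run);
            write += run;
            newValue.CopyTo(0, result, write, newValue.Length);
            write += newValue.Length;
            read = match + oldValue.Length;               // skip over the replaced text
        }
        source.CopyTo(read, result, write, source.Length - read); // tail after the last match
        return new string(result);
    }

    static void Main()
    {
        Console.WriteLine(Replace("a-b-c", "-", "+")); // a+b+c
    }
}
```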

1. Using the Replace function of String (C#):

DateTime t1 = System.DateTime.Now;
for (int i = 0; i < 100; i++)
{
    String digitregex = "9";
    String before = new String('9', 65000);
    String after = before.Replace(digitregex, "");
}
DateTime t2 = System.DateTime.Now;
MessageBox.Show(Convert.ToString(t2 - t1));

This code ran in 0.38 seconds on average.

2. Using a dumb regular expression (C#):

DateTime t1 = System.DateTime.Now;
for (int i = 0; i < 100; i++)
{
    Regex digitregex = new Regex("(?<digit>[9])");
    String before = new String('9', 65000);
    String after = digitregex.Replace(before, "");
}
DateTime t2 = System.DateTime.Now;
MessageBox.Show(Convert.ToString(t2 - t1));

This code ran in 17.5 seconds on average. Conclusion: don't use a regular expression like this one in this type of case.

Now a small improvement to the pattern reduces the time to 0.38 seconds:

Regex digitregex = new Regex("(?<digit>[9])*");

And one last improvement brings it down to 0.24 seconds:

Regex digitregex = new Regex("(?<digit>[9])+");
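If you want to repeat these measurements yourself, a console sketch along the following lines may be convenient. It departs from the snippets above in ways you should be aware of: it uses System.Diagnostics.Stopwatch instead of DateTime.Now, writes to the console instead of a message box, and builds the input string and the Regex once, outside the timed loop. The class and method names are hypothetical, and no particular timings are implied.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ReplaceTiming
{
    // Runs the given replace operation 'iterations' times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> replace, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            replace(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        string before = new string('9', 65000);
        Regex digits = new Regex("(?<digit>[9])+");

        Console.WriteLine("String.Replace: " + Time(s => s.Replace("9", ""), before, 100));
        Console.WriteLine("Regex.Replace:  " + Time(s => digits.Replace(s, ""), before, 100));
    }
}
```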
