This article will attempt to help you better understand
the System.String type and how it works behind the
scenes.
A string is an immutable sequence of characters. Yes, it's true:
whenever we make a change to a string, we don't get back an altered
version of the string we started with, but rather a new string that
includes our changes.
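A minimal C# sketch of this immutability, assuming a current .NET SDK with top-level statements: ToUpper and += both hand back new string objects and leave the original untouched.

```csharp
using System;

string original = "abc";

// ToUpper does not modify 'original'; it returns a brand-new string.
string upper = original.ToUpper();

// += also builds a new string and rebinds the variable to it.
string alias = original;
alias += "d";

Console.WriteLine(original); // abc
Console.WriteLine(upper);    // ABC
Console.WriteLine(alias);    // abcd
Console.WriteLine(ReferenceEquals(original, alias)); // False
```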
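A string is an object; in other words, string is a reference type,
not a value type. But when we compare strings with the "==" or "!="
operators, we are actually comparing content. The reason for this
behavior is that string overloads "==" and "!=", and these overloads
call String.Equals to do their work, which compares the strings
character by character (ordinally) rather than by reference. It's
important to remember that the overload only applies when both
operands are typed as string: if either operand is typed as object,
"==" compares references. The ordering operators ">", ">=", "<" and
"<=" are not defined for strings at all; use String.Compare or
CompareTo to order strings.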
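A small C# sketch, assuming a current .NET SDK: two strings with the same characters but different identities. The string "==" operator compares content, while object.ReferenceEquals, or "==" applied to object-typed operands, compares references.

```csharp
using System;

string a = "hello";
// new string(char[]) allocates a separate object with the same characters.
string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

bool sameContent = a == b;                  // string's == compares characters
bool sameObject = ReferenceEquals(a, b);    // compares identities
bool objectEquals = (object)a == (object)b; // object's == is reference equality

Console.WriteLine($"{sameContent} {sameObject} {objectEquals}"); // True False False
```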
In the version examined here, the string type has 8 public
constructors, and those that take pointers are marked as not
CLS-compliant. This matters for cross-language interoperability,
because languages that target the CLS are not required to support
pointers. The rest of the constructors turn out to be unremarkable
on closer inspection.
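Here is a sketch of the CLS-compliant constructors in use (C#, assuming a current .NET SDK). The pointer-based overloads are left out because calling them requires an unsafe context.

```csharp
using System;

char[] letters = { 'd', 'o', 't', 'n', 'e', 't' };

string repeated = new string('=', 5);     // String(char c, int count)
string whole = new string(letters);       // String(char[] value)
string slice = new string(letters, 3, 3); // String(char[] value, int startIndex, int length)

Console.WriteLine(repeated); // =====
Console.WriteLine(whole);    // dotnet
Console.WriteLine(slice);    // net
```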
Now, a little bit more about the String.Compare function.
All of the overloaded Compare functions are based on the
CultureInfo class. Please don't confuse this with the
AssemblyCulture attribute, which is used to distinguish
between main and satellite assemblies. The same pair of
strings may compare differently depending on the selected
culture.
For example, when we use a case-sensitive comparison, the
CultureInfo is passed along as an argument.
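Passing a CultureInfo to String.Compare explicitly can be sketched as follows (C#, assuming a current .NET SDK; the strings are arbitrary). The culture-sensitive, case-sensitive result typically puts lowercase first, so its sign can differ from the purely numeric CompareOrdinal result; the exact ordering depends on the platform's collation data, so only the sign is printed.

```csharp
using System;
using System.Globalization;

string x = "apple";
string y = "Apple";

// Case-sensitive comparison with an explicitly supplied culture.
int invariant = string.Compare(x, y, false, CultureInfo.InvariantCulture);

// Same overload, but ignoring case: the two strings now compare equal.
int ignoreCase = string.Compare(x, y, true, CultureInfo.InvariantCulture);

// Culture-agnostic comparison of the raw UTF-16 code units ('a' > 'A').
int ordinal = string.CompareOrdinal(x, y);

Console.WriteLine($"{Math.Sign(invariant)} {ignoreCase} {Math.Sign(ordinal)}");
```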
Another interesting point is how strings are compared without
regard to culture or language. For a case-insensitive comparison,
the runtime uses the CaseInsensitiveComHelper function (written in
C++). If either string contains a non-ASCII character (0x80 or
above), the helper always returns false.
It is interesting how the strings are brought to the same case.
In ASCII, lowercase and uppercase letters differ only in the 0x20
bit. So once an XOR test has shown that a character is not
lowercase, a bitwise OR with 0x20 converts it to lowercase, and
only afterwards is the comparison performed. The comparison itself
is done by simply incrementing pointers into the two character
arrays. If any of the characters is greater than 0x7F, we get an
ArgumentException.
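The ASCII case-folding idea can be sketched as a hypothetical C# helper, TryCompareAsciiIgnoreCase (an illustration, not the runtime's actual code). It uses a range check on 'A'..'Z' in place of the XOR test, because OR-ing 0x20 into a non-letter such as '@' would turn it into a different character, and it signals non-ASCII input by returning false so that a caller could fall back to a full culture-aware comparison.

```csharp
using System;

// Hypothetical helper illustrating ASCII case folding; not the runtime's code.
// Returns false when the scan meets a non-ASCII character.
static bool TryCompareAsciiIgnoreCase(string a, string b, out int result)
{
    int length = Math.Min(a.Length, b.Length);
    for (int i = 0; i < length; i++)
    {
        int ca = a[i];
        int cb = b[i];
        if (ca > 0x7F || cb > 0x7F)
        {
            result = 0;
            return false;
        }
        // Upper- and lowercase ASCII letters differ only in bit 0x20,
        // so OR-ing it in lowers 'A'..'Z'; other characters are left alone.
        if (ca >= 'A' && ca <= 'Z') ca |= 0x20;
        if (cb >= 'A' && cb <= 'Z') cb |= 0x20;
        if (ca != cb)
        {
            result = ca - cb;
            return true;
        }
    }
    // All shared characters matched: the shorter string sorts first.
    result = a.Length - b.Length;
    return true;
}

bool ok = TryCompareAsciiIgnoreCase("Hello", "hELLO", out int r1);
Console.WriteLine($"{ok} {r1}"); // True 0
```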
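A case-sensitive comparison is simply a loop that compares the
characters one at a time. If the lengths of the compared strings
are not equal, the number of iterations is bounded by the length
of the shorter string; if all of those characters match, the
shorter string sorts first.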
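The loop can be sketched with a hypothetical helper, CompareOrdinalSketch (C#, assuming a current .NET SDK), whose result signs are checked against the real String.CompareOrdinal.

```csharp
using System;

// Hypothetical sketch of a case-sensitive (ordinal) comparison loop.
static int CompareOrdinalSketch(string a, string b)
{
    // The loop runs only as far as the shorter string.
    int length = Math.Min(a.Length, b.Length);
    for (int i = 0; i < length; i++)
    {
        if (a[i] != b[i])
        {
            return a[i] - b[i]; // the first differing UTF-16 code unit decides
        }
    }
    // Equal prefixes: the length difference decides the order.
    return a.Length - b.Length;
}

foreach (var (x, y) in new[] { ("abc", "abd"), ("abc", "ab"), ("same", "same") })
{
    Console.WriteLine($"{x} vs {y}: {Math.Sign(CompareOrdinalSketch(x, y))} " +
                      $"(CompareOrdinal: {Math.Sign(string.CompareOrdinal(x, y))})");
}
```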
In C#, string concatenation is implemented in a more sophisticated
way than in Visual Basic 6. The first step is to allocate memory
for a character array whose length equals the sum of the lengths of
the strings being concatenated. Then the result array is filled
with the contents of the strings.
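The two-step strategy (size once, then copy) can be sketched with a hypothetical ConcatSketch helper in C#. A real implementation writes straight into the new string's memory; copying through a char[] and then calling new string(char[]) keeps this sketch in safe code at the cost of one extra copy.

```csharp
using System;

// Hypothetical sketch of the two-step strategy: size once, then copy.
static string ConcatSketch(params string[] parts)
{
    // Step 1: one allocation sized to the sum of all lengths.
    int total = 0;
    foreach (string p in parts)
    {
        total += p?.Length ?? 0; // null is treated as empty, as String.Concat does
    }
    char[] buffer = new char[total];

    // Step 2: copy each part into its slot.
    int offset = 0;
    foreach (string p in parts)
    {
        if (p == null) continue;
        p.CopyTo(0, buffer, offset, p.Length);
        offset += p.Length;
    }
    return new string(buffer);
}

Console.WriteLine(ConcatSketch("Sys", "tem.", null, "String")); // System.String
```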
The last thing I'd like to discuss is the implementation of the
Replace function, more precisely Replace(string, string).
The first step is error handling: the old value must not be null
or empty, otherwise an exception is thrown. If the old value does
not occur in the string at all, the function returns the original
string without any further action.
The next step is to build an index of all the needed replacements,
storing their positions in an integer array. Then we simply walk
through the original characters, copying them into the result array
until we reach an indexed location. There the new value is inserted,
the counter is incremented, and the iteration continues.
Of course, the whole thing is performed at a low level, working
directly with allocated memory.
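The index-then-copy approach can be sketched with a hypothetical ReplaceSketch helper in C#. Assumptions: matching is ordinal, a List<int> stands in for the integer array, and the argument rules mirror the documented public behavior of String.Replace (null oldValue throws ArgumentNullException, empty oldValue throws ArgumentException, null newValue removes matches, no match returns the original instance).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of index-then-copy replacement; ordinal matching only.
static string ReplaceSketch(string source, string oldValue, string newValue)
{
    if (oldValue == null) throw new ArgumentNullException(nameof(oldValue));
    if (oldValue.Length == 0) throw new ArgumentException("oldValue is empty", nameof(oldValue));
    newValue ??= string.Empty;

    // Pass 1: record the start index of every non-overlapping match.
    var matches = new List<int>();
    int pos = source.IndexOf(oldValue, StringComparison.Ordinal);
    while (pos >= 0)
    {
        matches.Add(pos);
        pos = source.IndexOf(oldValue, pos + oldValue.Length, StringComparison.Ordinal);
    }
    if (matches.Count == 0) return source; // nothing to replace

    // Pass 2: the result length is known exactly, so allocate once.
    var result = new char[source.Length + matches.Count * (newValue.Length - oldValue.Length)];
    int src = 0, dst = 0;
    foreach (int match in matches)
    {
        // Copy the untouched run up to the match, then the replacement.
        source.CopyTo(src, result, dst, match - src);
        dst += match - src;
        newValue.CopyTo(0, result, dst, newValue.Length);
        dst += newValue.Length;
        src = match + oldValue.Length; // skip over the old value
    }
    source.CopyTo(src, result, dst, source.Length - src); // copy the tail
    return new string(result);
}

Console.WriteLine(ReplaceSketch("a-b-c", "-", "::")); // a::b::c
```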
A little more about Replace: when your job needs a simple and
frequently repeated replace operation, try using regular
expressions instead of the Replace function, and measure both:
performance differences can be tremendous.
Here are some simple code examples that may help you
see this better: