strReverse c#中的应用

strReverse 在VB.NET中有此函数,但在C#中没有. 原文链接http://weblogs.asp.net/justin_rogers/archive/2004/06/10/153175.aspx

 

Performance: Fastest string reversing algorithms… (final results)

strReverse 在VB.NET中有此函数,但在C#中没有. 原文链接http://weblogs.asp.net/justin_rogers/archive/2004/06/10/153175.aspx

 

Performance: Fastest string reversing algorithms… (final results)

To start I got entries from Darren Neimke, Jerry Pisk, and Ryan Blake.  In most cases thes algorithms were the same as the ones I was going to use for demonstration, however, at the end of the competition, Darren managed to toss in one algorithm that I had overlooked because it is slow on small strings, but it actually started to pull ahead when I made some monstrous strings to test the final algorithms on.

The problem comes in three parts, allocation of a work buffer, reversing algorithm, and reintegration as a string.  That means to start we should examine the various ways that people used in order to allocate a buffer.  I’ll even show one, that turns out to be not performant at all, but you’d think it would be.

  1. string.ToCharArray – Note, this function is equivalent to net char[] + an internal memory copy.  Most users have this as the basis of their algorithm and I’ll show you why this in turn hurts perf.  In effect you are copying the data twice when you don’t need to.
  2. new char[] -This is probably the best buffer you could use.  It has no data in it yet, but strings are immutable and you’ll have the original string around the entire time as you do the reversal.
  3. StringBuilder – StringBuilder’s actually use a String behind the scenes and have the ability to convert your *work* into an instantly accessible string.  The char[] overloads for string actually have to copy your buffer into place.  This could be a solution for solving a buffer and reintegration?

Those are the three types of buffer used so far.  There could be more, but people didn’t send them in.  I’m interested in seeing a few more if anyone has them.  I experimented with memory streams and a few other ideas, but the third part of the problem, getting an actual string, always hurt other attempts.

Alrighty, so how do we reverse the string once we have it?  There are lots of answers to that depending on the buffers we used, so I’ll mention them all here.

  1. ToCharArray and Inline – This is an inline swap using the ToCharArray method.  It involves making a swap of two array locations.  This means you are going to have a single helper variable come along for the ride, since you can’t make a swap without storing a temporary local.  Because of this temporary local, these algorithms fall short a bit.  Note that as the front approaches the back, i < j, there is a point where i = j.  Most authors actually did a replacement when this occured.  However, if i = j, then you don’t need to do a replacement, and you perform in one less operation.  Any odd character in the middle of a string is already reversed 😉
  2. ToCharArray using Original – This is a modified version that uses the original string.  This actually comes out to be a bit faster, because we get rid of one copy for every 2 characters.  None of the author’s found this method, and I can’t blame them.  Why in the hell would you use the original string when you have a copy of it 😉
  3. ToCharArray using Array.Reverse – This is a surprising algorithm.  Because we get back into unmanaged code for the reversal, it actually performs better over very large strings than any of the other algorithms.  We are still making the extra copy, but in this case it doesn’t matter at all.  Darren found this one all on his own.  I wouldn’t have even tried it, because logically finding that ToCharArray was making an extra copy, I would have assumed it to still be slower (and it still is for strings less than ~100 characters).
  4. char[] using Original – This method uses a character array only.  This method gets by doing the original copy and instead saves all copying for the reversal layer.  In this scenario, you have to iterate until i <= j, because you need to restore that middle character.  This generally results in one extra unneeded replacement per run of the algorithm, but that isn’t bad.
  5. StringBuilder using Append – Slow, don’t use it.  StringBuilder is riddled with thread checks making it extremely slow.
  6. StringBuilder using initial String – Slow, don’t use it.  StringBuilder is riddled with thread checks making it extremely slow.

The surprising part is how poorly StringBuilder actually performs.  Let’s stop and think about what we are doing in the other algorithms.  We are making a copy, doing work on the buffer, then effectively making another copy to turn it back into a string.  The turning it back into a string part is actually a big operation and there isn’t anything we can do to limit it’s overhead.  Now with a StringBuilder we are making a copy that already is a now mutable string.  We can then update our buffer, and finally get a string back without incurring the extra copy.  To bad all of the bogus thread checking in there kills our performance.

Alrighty, I said three parts.  Turning the buffer back into a string is as easy as constructing a new string, or calling ToString in the case of a StringBuilder.  All there is to it, algorithms are complete.  Here they are, refactored into testable C# with the appropriate attributions to each submitter.

 

Contributor Code
Myself
Jerry Pisk
Ryan Blake
Darren Neimke
We all came up with this one. I’ve denoted both versions with the faster one being uncommented. Also note that most authors replaced the *odd* character in the middle of a string, which lost them one cycle 😉
private static string ReverseUsingCharArrayCopy(string input) {    char[] reversed = input.ToCharArray();            for(int i = 0, j = reversed.Length - 1; i < j; i++, j--) {        reversed[j] = input[i];        reversed[i] = input[j];//            char temp = reversed[i];//            reversed[i] = reversed[j];//            reversed[j] = temp;    }    return new String(reversed);}                
Darren Neimke
This method came as a surprise to me, but it actually performs faster over large strings by about 5% over the fastest algorithm. I’ll dig the rotor and find out why.
private static string ReverseUsingCharArrayCopyAndReverse(string input) {    char[] reversed = input.ToCharArray();    Array.Reverse(reversed);    return new String(reversed);}                
Me
Darren Neimke
This kicks the pants off of the ToCharArray version because we get rid of the extra copying of data. This was my original posting for fastest algorithm, and it remains so for strings less than ~100-150 characters.
private static string ReverseUsingCharArrayInline(string input) {    char[] reversed = new char[input.Length];    for(int i = 0, j = reversed.Length - 1; i <= j; i++, j--) {        reversed[i] = input[j];        reversed[j] = input[i];    }    return new String(reversed);}                
Me
I’m not a Unicode guy, so I’m not sure if my surrogate range checking code is good enough for business. Hell, I’m not really sure it works at all, I just wrote a basic algorithm that knows how to reverse items that might be composed of more than a single data block. If the algorithm I’m using here isn’t accurate, then replace the range check with an IsSurrogate check and go back to the slow days 😉
private static string ReverseUsingCharArrayInlineWithSurrogates(string input) {    char[] reversed = new char[input.Length];    for(int i = 0, j = input.Length - 1; i < input.Length; i++, j--) {        if ( input[j] >= 0xD800 && input[j] <= 0xDFFF ) {            reversed[i+1] = input[j--];            reversed[i++] = input[j];        } else {            reversed[i] = input[j];        }    }    return new String(reversed);}                
Me
One of the StringBuilder attempts. This is the fastest one and that isn’t saying much because it is still far too slow for practical usage. Every call to the indexer is invoking some threading gods, as is the call to ToString().
private static string ReverseUsingStringBuilderInline(string input) {    StringBuilder sb = new StringBuilder(input);    for(int i = 0, j = input.Length - 1; i <= j; i++, j--) {        sb[i] = input[j];        sb[j] = input[i];    }    return sb.ToString();}                

Alrighty, the above leads to some concepts for tuning our algorithm to be different based on the size of the string.  We don’t want to negatively weight our algorithm to only perform well under certain circumstances.  My recommendation would be to find a sutiable cut-off in character length for when the array reversal actually becomes faster.  In addition I propose to find out why the array reversal becomes so quick, after all, we are still copying the data one extra time in the ToCharArray call.  The slowness of the reversal is only apparent at small string lengths, and at these lengths both the original ToCharArray with swapping and the fast char[] allocation are much faster.

// Small strings
00:00:01.1816992 ToCharArray
00:00:01.4721168 ToCharArray /w Reverse
00:00:00.6609504 new char[]
00:00:00.9914256 new char[] with Surrogate Awareness
// Large strings
00:00:09.3334208 ToCharArray
00:00:07.5408432 ToCharArray /w Reverse
00:00:08.2218224 new char[]
00:00:15.1317584 new char[] with Surrogate Awareness

To note, the array reverse can never be surrogate aware, so if we wanted that feature, we couldn’t use our super fast over large strings algorithm.  Well, I think this algorithm is cooked guys.  Thanks for participating and enjoy the results.

Published Thursday, June 10, 2004 6:32 PM by Justin Rogers

Leave a Reply

Your email address will not be published. Required fields are marked *