
Improve String.Insert in .Net?

Tags: string, c#, .net

I need to mark up a string with identifiers indicating the start and end of a substring that has passed a test.

Assume I have the string "The quick brown fox jumps over the lazy dog" and I want to mark up the string with a tag around every word starting with the character 'b' or 'o'. The final string would look like "The quick <tag>brown</tag> fox jumps <tag>over</tag> the lazy dog".

Using a combination of regular expressions and LINQ, I have the correct logic to accomplish what I want, but performance is poor because I am using String.Insert to insert the tags. Our strings can be very long (>200k) and the number of substrings to tag can be close to a hundred. Below is the code I am using to insert the tags. Given that I know the start and length of each substring, how can I update the string 'input' faster?

.ForEach<Match>(m => {
  input = input.Insert(m.Index + m.Length, "</tag>");
  input = input.Insert(m.Index, "<tag>");
});
user481779 asked Jan 27 '26 22:01


2 Answers

You should use a StringBuilder.

For optimal performance, set the StringBuilder's capacity before doing anything, then append chunks of the original string between tags.
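A minimal sketch of that approach, assuming the matches come from a single Regex and never overlap (Regex.Matches returns them in ascending order). The class name Tagger, the method name TagMatches, and the pattern in the usage note are illustrative and not part of the original question.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Tagger
{
    // Hypothetical helper: wraps every match of `pattern` in <tag>...</tag>
    // by copying the input once into a pre-sized StringBuilder.
    public static string TagMatches(string input, Regex pattern)
    {
        const string open = "<tag>";
        const string close = "</tag>";

        MatchCollection matches = pattern.Matches(input);

        // Reserve room for the original text plus all tags up front,
        // so the builder does not have to grow while appending.
        var sb = new StringBuilder(
            input.Length + matches.Count * (open.Length + close.Length));

        int pos = 0; // index of the first character not yet copied
        foreach (Match m in matches)
        {
            sb.Append(input, pos, m.Index - pos); // untouched text before the match
            sb.Append(open);
            sb.Append(input, m.Index, m.Length);  // the matched word itself
            sb.Append(close);
            pos = m.Index + m.Length;
        }

        sb.Append(input, pos, input.Length - pos); // remaining tail
        return sb.ToString();
    }
}
```

For example, `Tagger.TagMatches(input, new Regex(@"\b[bo]\w*"))` tags each word beginning with 'b' or 'o'. Each character of the input is copied once, whereas every String.Insert call allocates and copies a whole new string.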

Alternatively, move your logic to a MatchEvaluator lambda expression and call Regex.Replace.
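A sketch of the MatchEvaluator variant, under the assumption that the test for each substring can be expressed inside the lambda. The pattern and the names EvaluatorTagger and Tag are made up for illustration; substitute whatever regex and logic you already have.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorTagger
{
    // Illustrative pattern: words beginning with 'b' or 'o'.
    private static readonly Regex Words = new Regex(@"\b[bo]\w*");

    public static string Tag(string input)
    {
        // The lambda is converted to a MatchEvaluator and is called once per match;
        // whatever it returns replaces that match in the result.
        return Words.Replace(input, m => "<tag>" + m.Value + "</tag>");
    }
}
```

Any per-match test can go inside the lambda: return m.Value unchanged for matches that should not be tagged.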

SLaks answered Jan 29 '26 11:01


Try this:

Regex

Regex.Replace("The quick brown fox jumps over the lazy dog", @"(^|\s)([bo]\w*)", "$1<tag>$2</tag>");

Results

The quick <tag>brown</tag> fox jumps <tag>over</tag> the lazy dog

Regular expressions should provide you with a fairly quick replacement. Whether or not this method is the best depends on the length of the string and how much work is involved in actually matching one of your "words."

Josh M. answered Jan 29 '26 11:01