Obviously, the primary use of Trim is to remove leading and trailing whitespace from a string, like:
"  hello  ".Trim(); // results in "hello"
But Trim also removes other characters, such as \n, \r and \t, so:
"  \nhello\r\t  ".Trim(); // it also produces "hello"
Is there a definitive list of all the characters (preferably in escaped string format, like \n) that Trim will remove?
EDIT: Thanks for the detailed answers; I now know the exact characters. The Wikipedia list that @RayKoopa left in the comments is probably the best format for me.
We can take a look at the source code of the String class.
The public Trim() method calls a private helper method named TrimHelper():
public String Trim() {
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.EndContractBlock();
    return TrimHelper(TrimBoth);
}
TrimHelper() looks like this:
[System.Security.SecuritySafeCritical]  // auto-generated
private String TrimHelper(int trimType) {
    // end will point to the first non-trimmed character on the right
    // start will point to the first non-trimmed character on the left
    int end = this.Length - 1;
    int start = 0;
    // Trim specified characters.
    if (trimType != TrimTail) {
        for (start = 0; start < this.Length; start++) {
            if (!Char.IsWhiteSpace(this[start]) && !IsBOMWhitespace(this[start])) break;
        }
    }
    if (trimType != TrimHead) {
        for (end = Length - 1; end >= start; end--) {
            // As originally quoted, this line passed this[start] to IsBOMWhitespace;
            // this[end] is the character this loop is actually examining.
            if (!Char.IsWhiteSpace(this[end]) && !IsBOMWhitespace(this[end])) break;
        }
    }
    return CreateTrimmedString(start, end);
}
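To make the scan concrete, here is a minimal sketch of the same two-pointer idea in modern C# (it assumes C# 9 top-level statements). `TrimBoth` is a hypothetical name, not a BCL method, and it deliberately leaves out the `IsBOMWhitespace` special case, so it only approximates `TrimHelper`. It compares its result with the real `String.Trim()` on a few inputs.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the BCL.
// Mirrors the two-pointer scan in TrimHelper using only char.IsWhiteSpace
// (the IsBOMWhitespace special case is intentionally left out).
static string TrimBoth(string s)
{
    // Advance start past leading white space.
    int start = 0;
    while (start < s.Length && char.IsWhiteSpace(s[start]))
        start++;

    // Move end back past trailing white space, never crossing start.
    int end = s.Length - 1;
    while (end >= start && char.IsWhiteSpace(s[end]))
        end--;

    // If the input was empty or all white space, end == start - 1
    // and the length below is 0, giving the empty string.
    return s.Substring(start, end - start + 1);
}

// Compare the sketch with the real String.Trim on a few inputs.
foreach (string input in new[] { "  hello  ", "  \nhello\r\t  ", "\u00a0\u2003x\u2029", "   ", "" })
{
    Console.WriteLine(TrimBoth(input) == input.Trim());
}
```

Each line printed is `True` when the sketch and `String.Trim()` agree for that input.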
So your question really comes down to the Char.IsWhiteSpace method, from char.cs:
[Pure]
public static bool IsWhiteSpace(char c) {
    if (IsLatin1(c)) {
        return (IsWhiteSpaceLatin1(c));
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}
If the character is in the Latin-1 range (U+0000 through U+00FF), this is what counts as white space:
private static bool IsWhiteSpaceLatin1(char c) {
    // There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
    // We use code point comparisons for these characters here as a temporary fix.
    // U+0009 = <control> HORIZONTAL TAB
    // U+000a = <control> LINE FEED
    // U+000b = <control> VERTICAL TAB
    // U+000c = <control> FORM FEED
    // U+000d = <control> CARRIAGE RETURN
    // U+0020 = SPACE
    // U+0085 = <control> NEXT LINE
    // U+00a0 = NO-BREAK SPACE
    if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
        return (true);
    }
    return (false);
}
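As a quick sanity check of that list, the following sketch (C# 9 top-level statements; `MatchesRuntime` is a hypothetical helper) walks U+0000 to U+00FF and compares `char.IsWhiteSpace` against the eight characters named in `IsWhiteSpaceLatin1`. It assumes the runtime you test on still applies the same Latin-1 rule as the source quoted above.

```csharp
using System;
using System.Collections.Generic;

// The Latin-1 white-space set as listed in IsWhiteSpaceLatin1:
// U+0009..U+000D, U+0020, U+0085 and U+00A0.
var expected = new HashSet<char> { '\t', '\n', '\v', '\f', '\r', ' ', '\u0085', '\u00a0' };

// Hypothetical helper: returns true if char.IsWhiteSpace agrees with
// `expected` for every code point from U+0000 to U+00FF, and prints
// the first mismatching code point otherwise.
bool MatchesRuntime()
{
    for (int i = 0; i <= 0xFF; i++)
    {
        char c = (char)i;
        if (char.IsWhiteSpace(c) != expected.Contains(c))
        {
            Console.WriteLine($"Mismatch at U+{i:X4}");
            return false;
        }
    }
    return true;
}

Console.WriteLine(MatchesRuntime());
```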
Otherwise, the check falls through to CharUnicodeInfo.IsWhiteSpace in CharUnicodeInfo.cs, which looks up the character's UnicodeCategory enum value and treats three categories as white space:
internal static bool IsWhiteSpace(char c)
{
    UnicodeCategory uc = GetUnicodeCategory(c);
    // In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
    // And U+2029 is the only character which is under the category "ParagraphSeparator".
    switch (uc) {
        case (UnicodeCategory.SpaceSeparator):
        case (UnicodeCategory.LineSeparator):
        case (UnicodeCategory.ParagraphSeparator):
            return (true);
    }
    return (false);
}
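Because the non-Latin-1 half of the check depends on category data, the most reliable way to get a definitive list for a particular runtime is to ask that runtime. A minimal sketch, assuming C# 9 top-level statements and the C# 8 `switch` expression; `Escape` is a hypothetical helper that prints each character in the escaped form the question asked for, next to its `UnicodeCategory`:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: uses the short C# escape for the five C0 controls
// that have one, and writes every other character as \uXXXX.
static string Escape(char c) => c switch
{
    '\t' => "\\t",
    '\n' => "\\n",
    '\v' => "\\v",
    '\f' => "\\f",
    '\r' => "\\r",
    _ => $"\\u{(int)c:X4}",
};

// Walk every UTF-16 code unit and print the ones char.IsWhiteSpace accepts,
// which is the test the quoted Trim() source applies, with their category.
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    char c = (char)i;
    if (char.IsWhiteSpace(c))
        Console.WriteLine($"{Escape(c)}  {CharUnicodeInfo.GetUnicodeCategory(c)}");
}
```

The output reflects the Unicode tables shipped with the runtime, so it can differ between .NET versions; for example, U+180E MONGOLIAN VOWEL SEPARATOR was moved out of the SpaceSeparator category in Unicode 6.3.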