Removing diacritics (accents) from strings

It's often useful to remove diacritical marks (often called accent marks) from characters. You know: tilde, cedilla, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u' and 'à' becomes 'a'. This can be used for indexing or for building simple URLs, for example.
Doing so is not easy if you don't know the trick. You could play with String.Replace or regular expressions... But did you know that .NET 2.0 has everything required to make this easier?

You can use code like this, for example (it requires .NET 2.0 or higher):

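// Required namespaces:
using System;
using System.Globalization;
using System.Text;

public static String RemoveDiacritics(String s)
{
  // Form D (canonical decomposition) splits 'é' into 'e' plus a combining accent.
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    // Keep everything except the combining (non-spacing) marks.
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}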
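
To see what the method does in practice, here is a minimal sketch of a small console program. The class name DiacriticsDemo and the sample strings are my own, chosen only for illustration. The method is repeated so the sketch compiles on its own. Note that letters without a canonical decomposition, such as 'ø', are not changed by this technique.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical demo class; the name is not from the original post.
public static class DiacriticsDemo
{
    // Same technique as above: decompose, then drop the non-spacing marks.
    public static string RemoveDiacritics(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        return sb.ToString();
    }

    public static void Main()
    {
        // Accented letters decompose into a base letter plus combining marks.
        Console.WriteLine(RemoveDiacritics("crème brûlée")); // "creme brulee"
        Console.WriteLine(RemoveDiacritics("Ångström"));     // "Angstrom"

        // 'ø' has no canonical decomposition, so it comes back unchanged.
        Console.WriteLine(RemoveDiacritics("smørrebrød"));   // "smørrebrød"
    }
}
```

If you need a plain ASCII result (for URLs, say), characters like 'ø' or 'ß' still need their own replacement step.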

Copied from Fabrice's blog.

Published Wed, May 30 2007 9:15 AM by Harmjan

Comments

# re: Removing diacritics (accents) from strings

Wednesday, May 30, 2007 10:23 AM by Niels

Handy, thanks! :)

# re: Removing diacritics (accents) from strings

Monday, June 18, 2007 3:04 PM by Bob

You copied this post exactly from another person's blog! Shame on you!

# re: Removing diacritics (accents) from strings

Thursday, June 21, 2007 1:57 PM by Harmjan

You are right. I copied it because I liked the post and wanted to show it to everybody.

Regards, Harmjan
