C# - How can I remove accents on a string?


Possible duplicate:
How do I remove diacritics (accents) from a string in .NET?

I have the following string

áéíóú 

which I need to convert to

aeiou 

How can I achieve it? (I don't need to compare; I need the new string to save.)


This is not a duplicate of "How do I remove diacritics (accents) from a string in .NET?". The accepted answer there doesn't explain anything, and that's why I've "reopened" it.

It depends on requirements. For most uses, normalising to NFD and then filtering out all combining chars will do. For some cases, normalising to NFKD is more appropriate (if you also want to remove some further distinctions between characters).
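As a rough illustration of the difference between the two forms, the sketch below strips combining marks after either normalisation. The names `NormalizationDemo` and `StripMarks` are made up for this example and are not part of any library.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: normalise to the requested form,
    // then drop every combining mark that the decomposition produced.
    public static string StripMarks(string src, NormalizationForm form)
    {
        var sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(form))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // combining mark: skip it
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}
```

With the input `"caf\u00E9 \uFB01"` (the word "café" followed by the ligature ﬁ), `NormalizationForm.FormD` gives "cafe ﬁ" because the ligature has only a compatibility decomposition, while `NormalizationForm.FormKD` gives "cafe fi".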

Some other distinctions will not be caught by this, notably stroked Latin characters. There's also no clear non-locale-specific way to handle some of them (should ł be considered equivalent to l or w?), so you may need to customise beyond this.
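To see why stroked letters need separate handling, a small check along these lines may help. It is only a sketch: `StrokeCheck` and `HasDecomposition` are names invented here. The letter ł (U+0142) has no canonical or compatibility decomposition, so normalising leaves it untouched, whereas á (U+00E1) splits into a plus a combining acute accent.

```csharp
using System;
using System.Text;

public static class StrokeCheck
{
    // Hypothetical helper: true when normalising to the given form changes
    // the string. For a single precomposed character this means it
    // decomposes into a base letter plus marks (or a compatibility form).
    public static bool HasDecomposition(string s, NormalizationForm form)
    {
        return !string.Equals(s, s.Normalize(form), StringComparison.Ordinal);
    }
}
```

`HasDecomposition("\u00E1", NormalizationForm.FormD)` is true, while `HasDecomposition("\u0142", NormalizationForm.FormKD)` is false, which is why the stroke has to be folded by hand.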

There are also some cases where NFD and NFKD don't work quite as expected, in order to allow for consistency between Unicode versions.

Hence:

    // Required namespaces
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;

    // Lazily yields the characters of src with combining marks removed.
    public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
    {
        foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    // do nothing
                    break;
                default:
                    yield return customFolding(c);
                    break;
            }
        }
    }

    // Overload with no custom folding (identity function).
    public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
    {
        return RemoveDiacriticsEnum(src, compatNorm, c => c);
    }

    // Builds a string from the enumeration above.
    public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
            sb.Append(c);
        return sb.ToString();
    }

    public static string RemoveDiacritics(string src, bool compatNorm)
    {
        return RemoveDiacritics(src, compatNorm, c => c);
    }

Here we've a default for the problem cases mentioned above, which just ignores them. We've also split building a string from generating the enumeration of characters, so we need not be wasteful in cases where there's no need for string manipulation on the result (say we were going to write the chars to output next, or do some further char-by-char manipulation).
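The point about skipping the intermediate string can be sketched as follows. This repeats a cut-down version of the enumerating method so that the example stands alone; `StreamingDemo`, `StripMarksEnum` and `WriteStripped` are illustrative names only. Note that `Normalize` still allocates the normalised string, so the saving is the second buffer, not all allocation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class StreamingDemo
{
    // Cut-down enumerating method: NFD only, no custom folding.
    public static IEnumerable<char> StripMarksEnum(string src)
    {
        foreach (char c in src.Normalize(NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // combining mark: skip it
                default:
                    yield return c;
                    break;
            }
        }
    }

    // Writes the stripped characters straight to the writer,
    // without building a result string first.
    public static void WriteStripped(string src, TextWriter output)
    {
        foreach (char c in StripMarksEnum(src))
            output.Write(c);
    }
}
```

For instance, `StreamingDemo.WriteStripped("áéíóú", Console.Out)` writes aeiou to the console.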

An example case, where we wanted to also convert ł and Ł to l and L but had no other specialised concerns, could use:

    // Folds the stroked L characters to their plain Latin counterparts.
    private static char NormaliseLWithStroke(char c)
    {
        switch (c)
        {
            case 'ł':
                return 'l';
            case 'Ł':
                return 'L';
            default:
                return c;
        }
    }

Using this with the above methods will combine to remove the stroke in this case, along with the decomposable diacritics.
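Putting the pieces together might look like the sketch below. It inlines a compact copy of the string-building method and the folding function so that it compiles alone; the class name `FoldingDemo` and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FoldingDemo
{
    // Compact copy of the approach: normalise, drop combining marks,
    // and pass every remaining char through the custom folding function.
    public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
    {
        var sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // combining mark: skip it
                default:
                    sb.Append(customFolding(c));
                    break;
            }
        }
        return sb.ToString();
    }

    // Folds the stroked L characters, which normalisation does not touch.
    public static char NormaliseLWithStroke(char c)
    {
        switch (c)
        {
            case '\u0142': return 'l'; // ł
            case '\u0141': return 'L'; // Ł
            default: return c;
        }
    }
}
```

Calling `FoldingDemo.RemoveDiacritics("Łódź", false, FoldingDemo.NormaliseLWithStroke)` returns "Lodz", whereas passing the identity function `c => c` returns "Łodz" with the stroke left in place.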

