String.normalize
Converts all characters in string to the Unicode normalization
form identified by form.
Invalid Unicode codepoints are skipped and the remainder of
the string is converted. If you want the algorithm to stop
and return on an invalid codepoint, use :unicode.characters_to_nfd_binary/1,
:unicode.characters_to_nfc_binary/1, :unicode.characters_to_nfkd_binary/1,
or :unicode.characters_to_nfkc_binary/1 instead.
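The difference between the two behaviours can be seen with a deliberately malformed binary. The following is a minimal sketch; the sample input and the variable names are illustrative assumptions, and the exact contents of the error tuple are inspected rather than assumed.

```elixir
# 0xFF can never appear in well-formed UTF-8, so this binary is invalid.
invalid = <<"a", 0xFF, "b">>

# The Erlang function stops at the invalid data and returns an error tuple
# instead of a plain binary.
result = :unicode.characters_to_nfc_binary(invalid)

case result do
  {:error, converted, rest} ->
    IO.inspect({converted, rest}, label: "stopped early")

  normalized when is_binary(normalized) ->
    IO.inspect(normalized, label: "normalized")
end

# String.normalize/2 does not stop: it returns a binary for the same input.
normalized = String.normalize(invalid, :nfc)
IO.inspect(is_binary(normalized), label: "String.normalize returned a binary")
```

Matching on the error tuple is useful when malformed input should be rejected or reported rather than passed through.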
Normalization forms :nfkc and :nfkd should not be blindly applied
to arbitrary text. Because they erase many formatting distinctions,
they will prevent round-trip conversion to and from many legacy
character sets.
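As a small illustration of the lost distinctions, the sketch below compares canonical and compatibility composition on two sample strings chosen for this example (a superscript two and a circled digit one). The match operators act as inline checks and raise if the result differs.

```elixir
# "x" followed by U+00B2 SUPERSCRIPT TWO
squared = "x\u00B2"

# Canonical composition keeps the superscript character.
^squared = String.normalize(squared, :nfc)

# Compatibility composition replaces it with a plain digit,
# so the exponent can no longer be told apart from "x2".
"x2" = String.normalize(squared, :nfkc)

# U+2460 CIRCLED DIGIT ONE loses its circle in the same way.
"1" = String.normalize("\u2460", :nfkc)
```

Once the compatibility form has been stored, the original superscript or circled character cannot be recovered from it.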
Forms
The supported forms are:
:nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
:nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
:nfkd - Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order.
:nfkc - Normalization Form Compatibility Composition. Characters are decomposed and then recomposed by compatibility equivalence.
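Because composed and decomposed strings usually look identical on screen, the effect of each form is easier to see at the codepoint level. The sketch below uses escape sequences and String.to_charlist/1 for that purpose; the sample characters and variable names are assumptions picked for illustration.

```elixir
# U+00E9 LATIN SMALL LETTER E WITH ACUTE, as a single codepoint
composed = "\u00E9"

# :nfd splits it into "e" plus U+0301 COMBINING ACUTE ACCENT.
decomposed = String.normalize(composed, :nfd)
[0x65, 0x0301] = String.to_charlist(decomposed)

# :nfc recomposes the pair into the single codepoint.
^composed = String.normalize(decomposed, :nfc)

# Combining marks are put into a fixed order: the dot below (U+0323)
# is sorted before the dot above (U+0307).
reordered = String.normalize("q\u0307\u0323", :nfd)
[0x71, 0x0323, 0x0307] = String.to_charlist(reordered)

# The compatibility forms also replace the ligature U+FB01 with "f" and "i".
ligature = "\uFB01"
"fi" = String.normalize(ligature, :nfkd)
"fi" = String.normalize(ligature, :nfkc)

# The canonical forms leave the ligature untouched.
^ligature = String.normalize(ligature, :nfc)
```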
Examples
iex> String.normalize("yêṩ", :nfd)
"yêṩ"
iex> String.normalize("leña", :nfc)
"leña"
iex> String.normalize("ﬁ", :nfkd)
"fi"
iex> String.normalize("ﬁ", :nfkc)
"fi"