C++ - In ICU UnicodeString, what is the difference between countChar32() and length()?
From the docs:
The length is the number of UChar code units in the UnicodeString. If you want the number of code points, please use countChar32().
and
Count Unicode code points in the length UChar code units of the string.
A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.
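To make the two documented counts concrete, here is a minimal sketch in standard C++ that works on a std::u16string instead of ICU, so it compiles without linking ICU. The helper names code_unit_length and count_code_points are hypothetical; they only approximate what UnicodeString::length() and UnicodeString::countChar32() report, and are not ICU's implementation. The code-point count walks every code unit, treating a lead surrogate followed by a trail surrogate as one code point and any other unit (including an unpaired surrogate) as one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical analogue of UnicodeString::length(): the number of
// 16-bit code units stored in the string.
std::size_t code_unit_length(const std::u16string& s) {
    return s.size();
}

// Hypothetical analogue of UnicodeString::countChar32(): reads every code
// unit; a lead surrogate (D800..DBFF) immediately followed by a trail
// surrogate (DC00..DFFF) counts as one code point, any other unit counts as one.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        bool is_lead = u >= 0xD800 && u <= 0xDBFF;
        if (is_lead && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the trail half of the surrogate pair
        }
        ++count;
    }
    return count;
}
```

For example, for u"foo\U0001F600" the code-unit length is 5 (three BMP letters plus a two-unit emoji) while the code-point count is 4.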
From this, I'm inclined to think that a code point is an actual character, and a code unit is one possible part of a character.
For example, say I have a Unicode string like:
'foobar'
Both length() and countChar32() would be 6. But if I had a string composed of 6 characters that each take the full 32 bits to encode, length() would be 12 and countChar32() would be 6.
Is that correct?
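The scenario in the question can be checked with a small sketch: given a sequence of code points, predict how many UTF-16 code units (what length() counts) it needs, while the number of code points is simply the sequence size (what countChar32() counts). The helper names utf16_units and predicted_length are hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is just one example of a character outside the BMP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: UTF-16 code units needed for one code point.
// BMP code points (U+0000..U+FFFF) take one unit; supplementary ones take two.
std::size_t utf16_units(char32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}

// Hypothetical helper: predicted code-unit length once the code points
// are stored as UTF-16.
std::size_t predicted_length(const std::u32string& cps) {
    std::size_t units = 0;
    for (char32_t cp : cps) {
        units += utf16_units(cp);
    }
    return units;
}
```

With U"foobar" both counts are 6; with six copies of U+1D11E the code-point count is still 6 but the predicted code-unit length is 12, matching the numbers in the question.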
The two values differ only if you use characters outside the Basic Multilingual Plane (BMP). In UTF-16 these characters are represented as surrogate pairs: two 16-bit code units make up one code point. For each such character, countChar32() counts the pair as one 32-bit character, while length() counts it as two elements.
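How a supplementary code point becomes a surrogate pair can be shown with the standard UTF-16 arithmetic: subtract 0x10000, put the high 10 bits on top of 0xD800 (lead surrogate) and the low 10 bits on top of 0xDC00 (trail surrogate). This is a minimal sketch; the function names to_surrogates and from_surrogates are hypothetical, and the input is assumed to be a valid code point in U+10000..U+10FFFF (no range checking).

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary code point (assumed to be in
// U+10000..U+10FFFF) into its UTF-16 lead and trail surrogates.
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;  // 20-bit value
    char16_t lead  = static_cast<char16_t>(0xD800 + (v >> 10));   // high 10 bits
    char16_t trail = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // low 10 bits
    return {lead, trail};
}

// Hypothetical helper: combine a lead/trail pair back into one code point.
char32_t from_surrogates(char16_t lead, char16_t trail) {
    return static_cast<char32_t>(0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00));
}
```

For example, U+1F600 splits into 0xD83D 0xDE00: two elements of length(), but a single code point for countChar32().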