c++ - In ICU UnicodeString what is the difference between countChar32() and length()?


From the docs:

The length is the number of UChar code units in the UnicodeString. If you want the number of code points, please use countChar32().

and

Count Unicode code points in the length UChar code units of the string.

A code point may occupy either one or two UChar code units. Counting code points involves reading all code units.

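From this I'm inclined to think that a code point is an actual character, and a code unit is just one possible part of a character.

For example:

Say you have a Unicode string like:

'foobar'

Both length() and countChar32() will be 6. Then say you have a string composed of 6 characters that each take the full 32 bits to encode: length() would be 12, but countChar32() would be 6.

Is this correct?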

The two values differ only if you use characters outside the Basic Multilingual Plane (BMP). These characters are represented in UTF-16 as surrogate pairs: two 16-bit code units make up one logical character. If you use any of these, each pair counts as one 32-bit character in countChar32() but as two elements in length().
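To make the surrogate-pair rule concrete, here is a minimal sketch that does not use ICU at all. It works on a plain std::u16string, where size() plays the role of length(), and countCodePoints is a hypothetical helper (not an ICU function) that should report the same number countChar32() would. It counts an unpaired surrogate as one code point, which I believe matches ICU's behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of ICU: counts code points in a UTF-16
// string by treating each lead/trail surrogate pair as a single code point.
// An unpaired surrogate is counted as one code point on its own.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t unit = s[i];
        // Lead (high) surrogates occupy U+D800..U+DBFF.
        const bool isLead = unit >= 0xD800 && unit <= 0xDBFF;
        // Trail (low) surrogates occupy U+DC00..U+DFFF.
        if (isLead && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the trail surrogate, it belongs to this code point
        }
        ++count;
    }
    return count;
}
```

With this helper, u"foobar" has size() 6 and 6 code points, while a string of six U+1F600 characters (written as \U0001F600 in a u"" literal) has size() 12 and 6 code points, which is the situation described in the question.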

