unicode - Calculating the length of a Japanese multibyte string with half-width kana in PHP
So I have a UTF-8 encoded string that can contain full-width kanji, full-width kana, half-width kana, romaji, numbers, or kawaii Japanese symbols like ★ or ♥.
If I want the length, I use mb_strlen(), which counts each of these as 1 in length. That is fine for most purposes.
But I've been asked (by a Japanese client) to count half-width kana as 0.5 (for the purpose of the maxlength of a text field), because apparently that's how Japanese websites do it. I do this using mb_strwidth(), which counts full-width as 2 and half-width as 1, and then I divide by 2.
However, this method also counts romaji characters as 1, so something like chocｱｲｽ will count as 7. Then I'd divide by 2 to account for the kanji, and I'd get 3.5. What I actually want is 5.5 (4 for the romaji + 1.5 for the 3 half-width kana).
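To make those numbers concrete, here is a minimal sketch that runs the three measurements on the example string. It assumes the mbstring extension is loaded and that the kana in the example are the half-width forms U+FF71, U+FF72 and U+FF7D; the variable names are only illustrative.

```php
<?php
// "choc" followed by the half-width katakana ｱｲｽ, written as escapes
// so the half-width forms cannot be silently normalized.
$s = "choc\u{FF71}\u{FF72}\u{FF7D}";

$chars = mb_strlen($s, 'UTF-8');    // 7: every character counts as 1
$width = mb_strwidth($s, 'UTF-8');  // 7: ASCII and half-width kana are each width 1
$half  = $width / 2;                // 3.5: not the 5.5 asked for

printf("chars=%d width=%d width/2=%.1f\n", $chars, $width, $half);
```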
// Edit: More info: any character (even non-kana) that has both a full-width and a half-width form should count as 1 for full-width and 0.5 for half-width. For example, the characters ￥、３＠（ should each be 1, but the characters ¥,3@( should each be 0.5.
// Edit: Symbols like ☆ and ♥ should be 1, but the mb_strwidth/2 method would return them as 0.5.
Is there a standard way that Japanese systems count string length? Or does everyone just loop through their strings and count the characters that don't match the standard width rules?
One way is to convert the half-width katakana to full-width and subtract the difference in width from the original length:

    $raw  = 'chocｱｲｽ';                   // "choc" + half-width katakana
    $full = mb_convert_kana($raw, 'K');  // 'K': half-width katakana to full-width
    $len  = mb_strlen($raw) - (mb_strwidth($full) - mb_strwidth($raw)) / 2;
    assert($len === 5.5);

However, are you sure you should be counting basic Latin characters as full-width? There exist full-width varieties of the basic Latin characters too; that is, should choc be considered the same as Ｃｈｏｃ?
Usually, the characters "a" and "ｱ" have a width of 1, while "Ａ" and "ア" have a width of 2 (which is what mb_strwidth does). I'd be cautious about having to hack around that.
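A quick way to check those widths yourself is sketched below. It assumes mbstring is available; the array keys are just labels made up for this example, and the characters are written as escapes so the half-width and full-width forms stay distinct.

```php
<?php
$samples = [
    'latin_half' => "a",          // U+0061, basic Latin
    'kana_half'  => "\u{FF71}",   // half-width katakana ｱ
    'latin_full' => "\u{FF21}",   // full-width Latin Ａ
    'kana_full'  => "\u{30A2}",   // full-width katakana ア
];

// array_map keeps the string keys when it is given a single array.
$widths = array_map(function (string $s): int {
    return mb_strwidth($s, 'UTF-8');
}, $samples);

print_r($widths); // half-width forms report 1, full-width forms report 2
```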
Given your edit, mb_strwidth (or mb_strwidth/2) does what you want.
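If symbols like ☆ and ♥ also have to count as 1, one option is to count the half-width characters explicitly instead of relying on mb_strwidth. The sketch below is only an illustration: display_length() is a hypothetical helper, not a standard function. Its set of "half-width" characters is an assumption (printable ASCII, the yen sign U+00A5, and the half-width katakana block U+FF61 to U+FF9F), and it assumes the rule from the edits applies to romaji as well.

```php
<?php
/**
 * Hypothetical helper: counts 0.5 for each assumed half-width character
 * and 1 for everything else (kanji, full-width kana, symbols, ...).
 */
function display_length(string $s): float
{
    // Assumed half-width set: printable ASCII, U+00A5 (yen sign),
    // and half-width katakana U+FF61-U+FF9F. Adjust to your own rules.
    $half = preg_match_all('/[\x{20}-\x{7E}\x{A5}\x{FF61}-\x{FF9F}]/u', $s);
    if ($half === false) {
        // With the /u modifier, PCRE fails on invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Every character starts at 1; each half-width one gives back 0.5.
    return mb_strlen($s, 'UTF-8') - $half * 0.5;
}

echo display_length("\u{2606}\u{2665}"), "\n";             // ☆♥
echo display_length("choc\u{FF71}\u{FF72}\u{FF7D}"), "\n"; // choc + half-width ｱｲｽ
```

With this rule the half-width example string comes out as 3.5, the same as mb_strwidth/2, while the two symbols come out as 2 rather than 1. If romaji should count as 1 each instead, drop the ASCII range from the character class.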