Perl: Template Toolkit's somevar.substr() and UTF-8
We use Template Toolkit in a Catalyst app. We configured TT to use UTF-8 and had no problems with it before.
Now I call the substr() method on a string variable. Unfortunately, it splits the string after n bytes instead of n characters. If the n-th and (n+1)-th bytes together form one Unicode character, that character gets split and only its first byte ends up in the substr() result.
How can I fix or work around this behaviour?
    [%
        string = "fööbär";
        string.length;        # prints 9
        string.substr(0, 5);  # prints "föö" (1 ASCII + 2x 2-byte Unicode)
        string.substr(0, 4);  # prints "fö?" (1 ASCII, 1x 2-byte Unicode, 1 unknown char)
    %]

Until now we had no problems with Unicode characters, neither with the ones that come from the database nor with text in the templates.
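The numbers above (length 9, and a cut in the middle of the second "ö") are what plain Perl gives when "fööbär" is held as undecoded UTF-8 bytes rather than as a character string. A minimal sketch outside of TT, using only the core Encode module; the variable names are made up for illustration, and this only reproduces the symptom, it does not show what TT does internally:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes of "fööbär", written as escapes so the example does not
# depend on how this source file itself is encoded.
my $bytes = "f\xC3\xB6\xC3\xB6b\xC3\xA4r";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

print "byte string length: ", length($bytes), "\n";   # 9
print "char string length: ", length($chars), "\n";   # 6

# substr() on the byte string: the second "ö" is cut in half.
my $byte_prefix = substr($bytes, 0, 4);

# substr() on the character string, encoded back to UTF-8 for output.
my $char_prefix = encode('UTF-8', substr($chars, 0, 4));

print "4 bytes as hex: ", unpack('H*', $byte_prefix), "\n";   # 66c3b6c3
print "4 chars as hex: ", unpack('H*', $char_prefix), "\n";   # 66c3b6c3b662
```

If the template behaves like the byte-string half of this sketch, the template source was most likely read without being decoded as UTF-8.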
Edit: this is how I configure the Catalyst::View::TT module in the Catalyst app:

    __PACKAGE__->config(
        # DEBUG => DEBUG_ALL,
        DEFAULT_ENCODING   => 'utf-8',
        INCLUDE_PATH       => My::App->path_to( 'root', 'templates' ),
        TEMPLATE_EXTENSION => '.tt',
        WRAPPER            => "wrapper/default.tt",
        render_die         => 1,
    );

I did some quick testing with Perl 1.12.2 (MSWin32) and the Template module. It handles these substr operations properly.
This is my test code:

    use Template;

    # some useful options (see below for the full list)
    my $config = {
        # DEFAULT_ENCODING => 'utf-8',
        INCLUDE_PATH => 'd:/devel/perl',  # or list ref
        INTERPOLATE  => 1,                # expand "$var" in plain text
        EVAL_PERL    => 1,                # evaluate Perl code blocks
    };

    # create Template object
    my $template = Template->new($config);

    # define template variables for replacement
    my $vars = { var1 => "abcdef" };

    # specify input filename, or file handle, text reference, etc.
    my $input = 'ttmyfile.txt';

    # process input template, substituting variables;
    # process() writes the rendered output itself, so this print only
    # adds its return value (the trailing "1" in the output below)
    print $template->process($input, $vars);

ttmyfile.txt:

    var = [% var1 %]
    [% string = "fööbär" -%]
    [% string.length %] # prints 6
    [% string.substr(0, 5) %] # prints "fööbä"
    [% string.substr(0, 4) %] # prints "fööb"

Output:

    var = abcdef
    6 # prints 6
    fööbä # prints "fööbä"
    fööb # prints "fööb"
    1

It all works fine, even without use utf8 or DEFAULT_ENCODING. The key things here:
- Make sure your template .tt files are encoded as UTF-8 with a BOM (byte order mark). This is a must, because Template Toolkit detects a file's Unicode encoding from the BOM.
  - You can use Windows Notepad to save the file with a BOM: File --> Save --> Encoding: "UTF-8".
  - You can use Vim: enter set fenc=utf8 and set bomb, then save the file. The file will then start with a BOM.
- Set the ENCODING parameter: Template->new({ENCODING => 'utf-8'}); 'utf-8' forces Template to load template files as UTF-8.
- I suggest having use utf8 in your script, to ensure inline strings are treated as UTF-8 properly.
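To check the BOM point without opening an editor, something like the following can be used. It is only a sketch in core Perl: has_utf8_bom and add_utf8_bom are hypothetical helper names, not part of Template Toolkit, and the demo runs on a temporary file instead of a real template.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The UTF-8 byte order mark is the three bytes EF BB BF.
my $UTF8_BOM = "\xEF\xBB\xBF";

# Hypothetical helper: true if the file starts with a UTF-8 BOM.
sub has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $count = read($fh, my $head, 3);
    close $fh;
    return (defined $count && $count == 3 && $head eq $UTF8_BOM) ? 1 : 0;
}

# Hypothetical helper: prepend a BOM if the file has none.
# Returns 1 if the file was changed, 0 if it already had a BOM.
sub add_utf8_bom {
    my ($path) = @_;
    return 0 if has_utf8_bom($path);
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$in> };
    close $in;
    $content = '' unless defined $content;
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $UTF8_BOM, $content;
    close $out or die "Cannot close $path: $!";
    return 1;
}

# Demo on a temporary file that stands in for a .tt template.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "[% string = \"f\xC3\xB6\xC3\xB6b\xC3\xA4r\" %]\n";
close $tmp_fh;

print "before: ", (has_utf8_bom($tmp_path) ? "BOM" : "no BOM"), "\n";
add_utf8_bom($tmp_path);
print "after:  ", (has_utf8_bom($tmp_path) ? "BOM" : "no BOM"), "\n";
```

Rewriting files in place like this is destructive, so try it on a copy of the template first.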
Because Catalyst::View::TT relies on Template, I believe it should work well there too. Good luck!