perl - Template Toolkit's somevar.substr() and UTF-8


We use Template Toolkit in a Catalyst app. We configured TT to use UTF-8 and had no problems with it before.

Now we call the substr() method of a string variable. Unfortunately, it splits the string after n bytes instead of n characters. If the n'th and (n+1)'th bytes together form one Unicode character, that character is split, and only its first byte ends up in the substr() result.

How can we fix or work around this behaviour?

    [%
        string = "fööbär";
        string.length;        # prints 9
        string.substr(0, 5);  # prints "föö" (1 ASCII + 2x 2-byte Unicode)
        string.substr(0, 4);  # prints "fö?" (1 ASCII, 1x 2-byte Unicode, 1 unknown char)
    %]

Until now we had no problems with Unicode characters, neither with those that come from the database nor with text in the templates.
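The byte-versus-character split described above can be reproduced in plain Perl, outside of Template Toolkit. The following is only a minimal sketch using the core Encode module; the variable names are made up for illustration, and it does not show what TT does internally. It only shows that substr() counts bytes when the string has not been decoded and counts characters once it has.

```perl
use strict;
use warnings;
use utf8;                        # string literals in this file are UTF-8 source
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

my $chars = "fööbär";                    # decoded text: 6 characters
my $bytes = encode('UTF-8', $chars);     # the same text as 9 raw bytes

printf "%d %d\n", length($chars), length($bytes);   # 6 9

# On the undecoded byte string, substr() cuts after 4 bytes:
# "f", a full "ö" (2 bytes), and only the first byte of the second "ö".
my $cut_bytes = substr($bytes, 0, 4);

# On the decoded string, substr() cuts after 4 characters.
my $cut_chars = substr($chars, 0, 4);

# Decoding the bytes first gives the character-based result again.
my $fixed = substr(decode('UTF-8', $bytes), 0, 4);

print "$cut_chars\n";    # fööb
print "$fixed\n";        # fööb
```

The numbers match the ones in the question (length 9, and a broken character at offset 4), which suggests the template text reaches substr() as undecoded bytes.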

Edit: this is how I configure the Catalyst::View::TT module in the Catalyst app:

    __PACKAGE__->config(
    #   DEBUG => DEBUG_ALL,
        DEFAULT_ENCODING   => 'utf-8',
        INCLUDE_PATH       => My::App->path_to( 'root', 'templates' ),
        TEMPLATE_EXTENSION => '.tt',
        WRAPPER            => "wrapper/default.tt",
        render_die         => 1,
    );

I did some quick testing with Perl 1.12.2 on MSWin32 and the Template module. It handles these substr operations properly.

This is my test code:

    use Template;

    # some useful options (see below for full list)
    $config = {
    #    DEFAULT_ENCODING => 'utf-8',
        INCLUDE_PATH => 'd:/devel/perl',  # or list ref
        INTERPOLATE  => 1,                # expand "$var" in plain text
        EVAL_PERL    => 1,                # evaluate Perl code blocks
    };

    # create Template object
    $template = Template->new($config);

    # define template variables for replacement
    $vars = {
        var1 => "abcdef"
    };

    # specify input filename, or file handle, text reference, etc.
    $input = 'ttmyfile.txt';

    # process input template, substituting variables
    # (printing the return value is what produces the trailing "1" in the output)
    print $template->process($input, $vars);

ttmyfile.txt

    var = [% var1 %]

    [% string = "fööbär" -%]
    [% string.length %]   # prints 6
    [% string.substr(0, 5) %]  # prints "fööbä"
    [% string.substr(0, 4) %]  # prints "fööb"

output:

    var = abcdef

    6     # prints 6
    fööbä  # prints "fööbä"
    fööb  # prints "fööb"
    1

It all works fine, even without use utf8 or DEFAULT_ENCODING. The key things here:

  1. Make sure your template .tt files are encoded as UTF-8 with a BOM (byte order mark). This is a must, because Template Toolkit detects the Unicode encoding of a file from its BOM.

    • You can use Windows Notepad to save the file with a BOM: File --> Save As --> Encoding: "UTF-8".
    • You can use Vim: run :set fenc=utf8 and :set bomb, then save the file. The file will start with a BOM.
  2. Set the ENCODING parameter, as in Template->new({ENCODING => 'utf-8'}); the value 'utf-8' forces Template to load the template file as UTF-8.

  3. I suggest you have use utf8 in your script, to ensure that inline strings are treated as UTF-8 properly.
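If no editor is at hand, the BOM from step 1 can also be written from Perl itself. This is a minimal sketch using only core modules; the template content and the temporary file are placeholders chosen for illustration, not something from the original setup. It writes U+FEFF as the first character through a UTF-8 output layer and then reads the first three raw bytes back to confirm they are EF BB BF.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is UTF-8 source text
use File::Temp qw(tempfile);

# Placeholder template content; any UTF-8 text would do.
my $content = "[% string = \"fööbär\" %][% string.substr(0, 4) %]\n";

# Write the file through a UTF-8 layer, starting with U+FEFF (the BOM).
my ($fh, $path) = tempfile(SUFFIX => '.tt', UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "\x{FEFF}", $content;
close $fh or die "close failed: $!";

# Read the first three raw bytes back to confirm the BOM is present.
open my $in, '<:raw', $path or die "open failed: $!";
read $in, my $head, 3;
close $in;

my $bom_hex = join ' ', map { sprintf '%02X', ord } split //, $head;
print "$bom_hex\n";    # EF BB BF
```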

Because Catalyst::View::TT relies on Template, I believe it should work there as well. Good luck!
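Applied to the Catalyst view from the question, step 2 might look like the fragment below. This is an untested sketch: the package name My::App::View::TT is hypothetical, and it assumes that Catalyst::View::TT hands the ENCODING option through to Template in the same way as the other uppercase options.

```perl
package My::App::View::TT;    # hypothetical view class name
use strict;
use warnings;
use base 'Catalyst::View::TT';

__PACKAGE__->config(
    ENCODING           => 'utf-8',   # assumption: passed through to Template
    INCLUDE_PATH       => My::App->path_to( 'root', 'templates' ),
    TEMPLATE_EXTENSION => '.tt',
    WRAPPER            => 'wrapper/default.tt',
    render_die         => 1,
);

1;
```

With this in place, string.length for "fööbär" should report 6 instead of 9 if the templates are being decoded as intended.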

