java - How to efficiently convert character encoding in a Channel?


I need to accept streams in different encodings and transcode them into a single predefined encoding (e.g. UTF-8). I know how to do this with an (InputStream)Reader / (OutputStream)Writer combo and an array buffer, but this time I'm dealing with ByteChannels. Naturally, I was looking at a CharsetDecoder / CharsetEncoder solution, but the best I came up with is:

    public static void copy(ReadableByteChannel rbc, Charset in,
            WritableByteChannel wbc, Charset out) throws IOException {
        ByteBuffer b1 = ByteBuffer.allocateDirect(BUFFER_SIZE);
        CharBuffer cb = CharBuffer.allocate(BUFFER_SIZE);
        ByteBuffer b2 = ByteBuffer.allocateDirect(BUFFER_SIZE);

        CharsetDecoder decoder = in.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPLACE);
        CharsetEncoder encoder = out.newEncoder();
        encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

        // Main loop: read, decode, encode, write
        while (rbc.read(b1) != -1) {
            b1.flip();
            decoder.decode(b1, cb, false);
            cb.flip();
            encoder.encode(cb, b2, false);
            b2.flip();
            wbc.write(b2);
            b2.compact();
            cb.compact();
            b1.compact();
        }
        // Drain the remaining input bytes
        b1.flip();
        while (b1.hasRemaining()) {
            decoder.decode(b1, cb, true);
            cb.flip();
            encoder.encode(cb, b2, false);
            b2.flip();
            wbc.write(b2);
            b2.compact();
            cb.compact();
        }
        // Drain the remaining decoded chars
        decoder.decode(b1, cb, true);
        decoder.flush(cb);
        cb.flip();
        while (cb.hasRemaining()) {
            encoder.encode(cb, b2, true);
            b2.flip();
            wbc.write(b2);
            b2.compact();
        }
        // Drain the remaining output bytes
        encoder.encode(cb, b2, true);
        encoder.flush(b2);
        b2.flip();
        while (b2.hasRemaining()) {
            wbc.write(b2);
        }
    }

Since this method is a "workhorse" in my project, I must be absolutely sure it will finish no matter what combination of BUFFER_SIZE, encodings, and blocking output device it is given.

My questions are:

  • Is there a better way of draining the buffers than this cascade of while loops?
  • Is it OK to ignore the encode() / decode() results (for overflows and underflows)?

Of course, any alternative idea is welcome. :)

To improve the performance of the above code:

  1. Cache the byte/char buffers in a thread local or in fields. Allocating chunks of memory is expensive.
  2. Direct byte buffers are good performers for IO but bad performers for encoding/decoding, which has an optimized implementation for heap buffers. You might get better performance by copying to/from heap byte buffers for the decode/encode operations.
  3. You can skip the encode/decode when the charsets are the same.
  4. Minimize the calls to compact.
  5. You seem to have redundant decode/encode operations after the buffers have nothing remaining.
  6. The byte buffer size should be 4 times the char buffer size, since an encoded character can take 1-4 bytes. Allocating byte buffers in multiples of the page size (usually 4K) can help IO performance.
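As a rough illustration of how some of these points could fit together, here is a minimal sketch, not a drop-in replacement. It loops on the CoderResult returned by decode()/encode() instead of ignoring it, uses heap buffers, sizes the byte buffers at 4 times the char buffer, and passes bytes straight through when both charsets are equal. The class name TranscodeSketch, the charBufferSize parameter, and the helpers transcode, encodeAndWrite, drainBytes, and check are all made up for this example. It assumes a char buffer of at least a few chars (so a surrogate pair fits) and has only been reasoned through for blocking channels. Note that the same-charset shortcut also skips validation and replacement of malformed input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class TranscodeSketch {

    static void copy(ReadableByteChannel rbc, Charset in, WritableByteChannel wbc, Charset out,
                     int charBufferSize) throws IOException {
        ByteBuffer src = ByteBuffer.allocate(4 * charBufferSize); // heap buffer (tips 2 and 6)
        if (in.equals(out)) { // tip 3: plain byte copy, no validation of the input
            while (rbc.read(src) != -1) drainBytes(src, wbc);
            return;
        }
        CharBuffer chars = CharBuffer.allocate(charBufferSize);
        ByteBuffer dst = ByteBuffer.allocate(4 * charBufferSize);
        CharsetDecoder decoder = in.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharsetEncoder encoder = out.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        boolean eof = false;
        while (!eof) {
            eof = rbc.read(src) == -1;
            src.flip();
            CoderResult r;
            do { // OVERFLOW means "chars is full": drain it and decode again
                r = check(decoder.decode(src, chars, eof));
                encodeAndWrite(chars, encoder, dst, wbc, false);
            } while (r.isOverflow());
            src.compact(); // keeps a trailing partial byte sequence, if any
        }
        while (check(decoder.flush(chars)).isOverflow()) {
            encodeAndWrite(chars, encoder, dst, wbc, false);
        }
        encodeAndWrite(chars, encoder, dst, wbc, true); // last chars, endOfInput = true
        while (check(encoder.flush(dst)).isOverflow()) drainBytes(dst, wbc);
        drainBytes(dst, wbc);
    }

    // Encodes whatever is in chars, writing dst out each time it fills up.
    private static void encodeAndWrite(CharBuffer chars, CharsetEncoder encoder, ByteBuffer dst,
                                       WritableByteChannel wbc, boolean endOfInput) throws IOException {
        chars.flip();
        CoderResult r;
        do {
            r = check(encoder.encode(chars, dst, endOfInput));
            drainBytes(dst, wbc);
        } while (r.isOverflow());
        chars.compact(); // keeps an unpaired high surrogate for the next round
    }

    // Writes the whole buffer; write() may accept only part of it per call.
    private static void drainBytes(ByteBuffer buf, WritableByteChannel wbc) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) wbc.write(buf);
        buf.clear();
    }

    // With REPLACE no error results are expected; this guards other error actions.
    private static CoderResult check(CoderResult r) throws IOException {
        if (r.isError()) r.throwException();
        return r;
    }

    // Convenience wrapper over in-memory channels, used for the demo below.
    static byte[] transcode(byte[] input, Charset in, Charset out, int charBufferSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(Channels.newChannel(new ByteArrayInputStream(input)), in,
             Channels.newChannel(sink), out, charBufferSize);
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, 4096);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Because every drain step is driven by the returned CoderResult, the cascade of tail loops collapses into one read loop plus a short flush sequence. Caching the buffers (tip 1) is left out to keep the sketch short.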

Most importantly, write a benchmark with realistic data and use it as a means to measure performance improvements. If you don't measure, you'll never know what worked.
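A very rough timing harness along these lines might look like the sketch below. It compares decoding from a heap buffer with decoding from a direct buffer. The class name DecodeBench, the helpers decodeAll and timeNanos, the synthetic sample text, and the round counts are all placeholders for illustration. Swap in data that resembles your real input, and treat the numbers as indicative only, since a hand-rolled System.nanoTime loop is easily skewed by JIT warm-up. A dedicated harness such as JMH would be more trustworthy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeBench {
    static volatile long sink; // keeps results "used" so the JIT cannot drop the work

    // Decodes everything remaining in src, reusing scratch; returns the number of chars produced.
    static long decodeAll(ByteBuffer src, Charset cs, CharBuffer scratch) {
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        long total = 0;
        CoderResult r;
        do {
            r = dec.decode(src, scratch, true);
            total += scratch.position();
            scratch.clear();
        } while (r.isOverflow());
        do {
            r = dec.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (r.isOverflow());
        return total;
    }

    // Times `rounds` full decodes of src and returns the elapsed nanoseconds.
    static long timeNanos(ByteBuffer src, Charset cs, int rounds) {
        CharBuffer scratch = CharBuffer.allocate(8192);
        long chars = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            src.rewind(); // back to position 0, limit unchanged
            chars += decodeAll(src, cs, scratch);
        }
        long elapsed = System.nanoTime() - start;
        sink = chars;
        return elapsed;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) sb.append("sample text, \u00e9\u00e8 \u4e2d\u6587 ");
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        ByteBuffer heap = ByteBuffer.wrap(data);
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();

        // Warm-up rounds so both paths are likely compiled before timing.
        timeNanos(heap, StandardCharsets.UTF_8, 20);
        timeNanos(direct, StandardCharsets.UTF_8, 20);

        long heapNs = timeNanos(heap, StandardCharsets.UTF_8, 50);
        long directNs = timeNanos(direct, StandardCharsets.UTF_8, 50);
        System.out.println("heap buffer:   " + heapNs / 1_000_000 + " ms");
        System.out.println("direct buffer: " + directNs / 1_000_000 + " ms");
    }
}
```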

