java - How to efficiently convert character encoding in a Channel?
I need to accept streams in different encodings and transcode them into a single predefined encoding (e.g. UTF-8). I know how to do this with an (InputStream)Reader / (OutputStream)Writer combo and an array buffer, but this time I'm dealing with ByteChannels. Naturally, I was looking for a CharsetDecoder / CharsetEncoder solution, and the best I came up with is:
    public static void copy(ReadableByteChannel rbc, Charset in,
                            WritableByteChannel wbc, Charset out) throws IOException {
        ByteBuffer b1 = ByteBuffer.allocateDirect(BUFFER_SIZE);
        CharBuffer cb = CharBuffer.allocate(BUFFER_SIZE);
        ByteBuffer b2 = ByteBuffer.allocateDirect(BUFFER_SIZE);
        CharsetDecoder decoder = in.newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPLACE);
        CharsetEncoder encoder = out.newEncoder();
        encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        while (rbc.read(b1) != -1) {
            b1.flip();
            decoder.decode(b1, cb, false);
            cb.flip();
            encoder.encode(cb, b2, false);
            b2.flip();
            wbc.write(b2);
            b2.compact();
            cb.compact();
            b1.compact();
        }
        b1.flip();
        while (b1.hasRemaining()) {
            decoder.decode(b1, cb, true);
            cb.flip();
            encoder.encode(cb, b2, false);
            b2.flip();
            wbc.write(b2);
            b2.compact();
            cb.compact();
        }
        decoder.decode(b1, cb, true);
        decoder.flush(cb);
        cb.flip();
        while (cb.hasRemaining()) {
            encoder.encode(cb, b2, true);
            b2.flip();
            wbc.write(b2);
            b2.compact();
        }
        encoder.encode(cb, b2, true);
        encoder.flush(b2);
        b2.flip();
        while (b2.hasRemaining()) {
            wbc.write(b2);
        }
    }

Since this method is a "workhorse" in my project, I must be absolutely sure that it finishes no matter what combination of BUFFER_SIZE, encodings, and blocking behavior of the output device it is given.
My questions are:
- Is there a better way of draining the buffers than this cascade of while loops?
- Is it OK to ignore the encode()/decode() results (for overflows and underflows)?
Of course, any alternative idea is welcome. :)
To improve the performance of the above code:
- Cache the byte/char buffers in a thread local or in fields. Allocating chunks of memory is expensive.
- Direct byte buffers perform well for I/O but poorly for encoding/decoding, which has an optimized implementation for heap buffers. You might get better performance by copying to/from heap byte buffers for the decode/encode operations.
- You can skip the encode/decode when the charsets are the same.
- Minimize calls to compact().
- You seem to have redundant decode/encode operations after the buffers have nothing remaining.
- The byte buffer size should be 4 times the char buffer size, since a character can take 1-4 bytes. Allocating byte buffers in multiples of the page size (usually 4k) can help I/O performance.
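Several of the points above can be sketched in one loop. What follows is a minimal illustration, not a drop-in replacement: it assumes blocking channels, uses heap buffers throughout for simplicity, and the names ChannelTranscoder, transcode, encodeAndWrite, writeFully and both buffer-size constants are hypothetical. It checks the CoderResult from decode()/encode() instead of ignoring it, and uses it to decide when a buffer must be drained, which replaces the cascade of while loops with two nested ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

final class ChannelTranscoder {
    private static final int CHAR_BUFFER_SIZE = 4096;                 // assumed value, tune by measuring
    private static final int BYTE_BUFFER_SIZE = 4 * CHAR_BUFFER_SIZE; // 4x rule from the list above

    /** Copies src to dst, converting from one charset to another. Assumes blocking channels. */
    static void transcode(ReadableByteChannel src, Charset from,
                          WritableByteChannel dst, Charset to) throws IOException {
        ByteBuffer in = ByteBuffer.allocate(BYTE_BUFFER_SIZE);
        if (from.equals(to)) {
            // Same charset: plain byte copy. This also skips replacement of malformed input.
            while (src.read(in) != -1) {
                in.flip();
                writeFully(dst, in);
                in.clear();
            }
            return;
        }
        CharsetDecoder decoder = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharsetEncoder encoder = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer chars = CharBuffer.allocate(CHAR_BUFFER_SIZE);
        ByteBuffer out = ByteBuffer.allocate(BYTE_BUFFER_SIZE);

        boolean eof = false;
        while (!eof) {
            eof = src.read(in) == -1;
            in.flip();
            CoderResult dr;
            do {
                // OVERFLOW means chars is full: drain it through the encoder and decode again.
                dr = decoder.decode(in, chars, eof);
                if (eof && dr.isUnderflow()) {
                    dr = decoder.flush(chars);
                }
                chars.flip();
                encodeAndWrite(encoder, chars, out, dst, eof && dr.isUnderflow());
                chars.compact(); // keeps e.g. a lone high surrogate for the next round
            } while (dr.isOverflow());
            in.compact();        // keeps an incomplete byte sequence for the next read
        }
    }

    private static void encodeAndWrite(CharsetEncoder encoder, CharBuffer chars, ByteBuffer out,
                                       WritableByteChannel dst, boolean last) throws IOException {
        CoderResult er;
        do {
            er = encoder.encode(chars, out, last);
            if (last && er.isUnderflow()) {
                er = encoder.flush(out);
            }
            out.flip();
            writeFully(dst, out);
            out.clear();
        } while (er.isOverflow());
    }

    private static void writeFully(WritableByteChannel dst, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            dst.write(buf); // would busy-spin on a non-blocking channel
        }
    }

    /** Convenience wrapper for in-memory data, mainly for trying the loop out. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(input));
             WritableByteChannel dst = Channels.newChannel(sink)) {
            transcode(src, from, dst, to);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Because both coders use CodingErrorAction.REPLACE, the only results the loops need to handle are underflow and overflow. With non-blocking channels a read() or write() can return 0, so the loops would need a Selector or similar rather than spinning.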
Most importantly, write a benchmark with realistic data and use it as a means to measure performance improvements. If you don't measure, you'll never know what worked.
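As a rough starting point for such measurements, here is a crude timing harness. It is only a sketch: the names TranscodeBench and medianNanos are hypothetical, the 1 MiB of 'a' bytes is placeholder data that should be replaced with realistic input, and the workload shown is a simple in-memory ISO-8859-1 to UTF-8 conversion rather than the channel code from the question. A dedicated tool such as JMH accounts for JIT warm-up and dead-code elimination far more carefully.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TranscodeBench {
    /** Runs the task warmup times untimed, then runs times timed; returns the median in nanoseconds. */
    static long medianNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs]; // runs must be at least 1
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];  // placeholder input, swap in realistic data
        Arrays.fill(data, (byte) 'a');
        long[] sink = new long[1];        // consume results so the work is not optimized away
        long median = medianNanos(() ->
                sink[0] += StandardCharsets.UTF_8
                        .encode(StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(data)))
                        .remaining(), 20, 50);
        System.out.println("median ns per 1 MiB: " + median + " (sink=" + sink[0] + ")");
    }
}
```

Taking the median rather than the mean makes the number less sensitive to occasional GC pauses. Compare variants (direct vs. heap buffers, different buffer sizes) on the same machine and data.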