performance - "Fastest" hash function implemented in Java, comparing part of a file
I need to compare two different `File` instances in Java, and I want a fast hash function.

My idea: - hash the first 20 lines of file 1 - hash the first 20 lines of file 2 - compare the two hashes and return true if they are equal.

I want to use the "fastest" hash function ever implemented in Java. Which one should I choose?
If you want speed, don't hash! In particular, don't use a cryptographic hash like MD5. These hashes are designed to be impossible to reverse, not to be fast to calculate. What you should use is a checksum: see java.util.zip.Checksum and its two concrete implementations. Adler32 is extremely fast to compute.

Any method based on checksums or hashes is vulnerable to collisions, but you can minimise the risk by using two different methods, in the way rsync does.
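As a minimal sketch of the questioner's idea using Adler32 (the class name `QuickCompare`, the helper names, and the charset choice are my own assumptions, not from the answer):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

public class QuickCompare {
    // Compute an Adler32 checksum over the first `lines` lines of a file.
    static long checksumHead(Path file, int lines) throws IOException {
        Adler32 sum = new Adler32();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int n = 0;
            while (n < lines && (line = r.readLine()) != null) {
                sum.update(line.getBytes(StandardCharsets.UTF_8));
                n++;
            }
        }
        return sum.getValue();
    }

    // Equal checksums suggest (but do not prove) that the first 20 lines match;
    // different checksums prove they differ.
    static boolean headsProbablyEqual(Path a, Path b) throws IOException {
        return checksumHead(a, 20) == checksumHead(b, 20);
    }
}
```

Note that a matching checksum is only probabilistic evidence of equality; a mismatch, however, is conclusive.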
The algorithm is basically:

- check that the file sizes are equal
- break the files into chunks of N bytes each
- compute a checksum on each pair of matching blocks and compare them. Any difference proves the files are not the same.

This allows early detection of a difference. You can improve it by computing two checksums at once with different algorithms, or with different block sizes.
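The steps above can be sketched as follows (the class name `BlockCompare` and the use of `readNBytes`, available since Java 9, are my own choices):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

public class BlockCompare {
    // Compare two files block by block using Adler32 checksums.
    // A checksum mismatch proves the files differ; matching checksums
    // on every block only make equality very likely (collisions are possible).
    static boolean probablyEqual(Path a, Path b, int blockSize) throws IOException {
        // Step 1: different sizes are a cheap, definite proof of difference.
        if (Files.size(a) != Files.size(b)) return false;
        try (InputStream ia = Files.newInputStream(a);
             InputStream ib = Files.newInputStream(b)) {
            byte[] bufA = new byte[blockSize];
            byte[] bufB = new byte[blockSize];
            int nA;
            // Steps 2-3: read matching blocks, checksum each pair, stop early
            // at the first mismatch.
            while ((nA = ia.readNBytes(bufA, 0, blockSize)) > 0) {
                int nB = ib.readNBytes(bufB, 0, blockSize);
                Adler32 sa = new Adler32();
                sa.update(bufA, 0, nA);
                Adler32 sb = new Adler32();
                sb.update(bufB, 0, nB);
                if (nA != nB || sa.getValue() != sb.getValue()) return false;
            }
        }
        return true;
    }
}
```

The early exit is the point of working in blocks: files that differ near the start are rejected after reading only one block each.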
More bits in the result mean less chance of a collision, but anything over 64 bits is beyond what Java (and the computer's CPU) can handle natively and hence slower; FNV-1024, say, is far less likely to give a false negative, but much slower to compute.

If you want speed, use Adler32 and accept that very occasionally a difference will go undetected. It is rare. Checksums like these are what the internet relies on to spot transmission errors, and how often does wrong data turn up?
If it is accuracy you really want, you have to compare every byte. Nothing else will work.

If you can compromise between speed and accuracy, there is a wealth of options out there.
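For the exact, every-byte comparison, the standard library already covers it: `java.nio.file.Files.mismatch` (available since Java 12) returns `-1` when two files are byte-for-byte identical. A small sketch (the wrapper class and method name are my own):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExactCompare {
    // Files.mismatch (Java 12+) returns -1L when the files are identical,
    // otherwise the offset of the first differing byte. This reads as many
    // bytes as it needs, so it is exact but slower than a checksum test.
    static boolean identical(Path a, Path b) throws IOException {
        return Files.mismatch(a, b) == -1L;
    }
}
```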