performance - "Fastest" hash function implemented in Java, comparing part of a file
I need to compare two different `File` instances in Java, and I want a fast hash function.

My idea: - hash the first 20 lines of file 1 - hash the first 20 lines of file 2 - compare the two hashes and return true if they are equal.

I want to use the "fastest" hash function ever implemented in Java. Which one should I choose?
If you want speed, don't hash! In particular, don't use a cryptographic hash like MD5. These hashes are designed to be impossible to reverse, not to be fast to calculate. What you should use is a checksum: see java.util.zip.Checksum and its two concrete implementations. Adler32 is extremely fast to compute.

Any method based on checksums or hashes is vulnerable to collisions, but you can minimise the risk by using two different methods, in the way rsync does.
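As a minimal sketch of the questioner's idea using Adler32 (the class name `QuickCompare`, the helper names, and the charset choice are my own assumptions, not from the answer):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

public class QuickCompare {
    // Compute an Adler32 checksum over the first `lines` lines of a file.
    static long checksumHead(Path file, int lines) throws IOException {
        Adler32 sum = new Adler32();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int n = 0;
            while (n < lines && (line = r.readLine()) != null) {
                sum.update(line.getBytes(StandardCharsets.UTF_8));
                n++;
            }
        }
        return sum.getValue();
    }

    // Equal checksums suggest (but do not prove) that the first 20 lines match;
    // different checksums prove they differ.
    static boolean headsProbablyEqual(Path a, Path b) throws IOException {
        return checksumHead(a, 20) == checksumHead(b, 20);
    }
}
```

Note that a matching checksum is only probabilistic evidence of equality; a mismatch, however, is conclusive.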
The algorithm is basically:

- check that the file sizes are equal
- break the files into chunks of N bytes each
- compute a checksum on each pair of matching blocks and compare them. Any difference proves the files are not the same.

This allows early detection of a difference. You can improve it by computing two checksums at once with different algorithms, or with different block sizes.
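The steps above can be sketched as follows (the class name `BlockCompare` and the use of `readNBytes`, available since Java 9, are my own choices):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

public class BlockCompare {
    // Compare two files block by block using Adler32 checksums.
    // A checksum mismatch proves the files differ; matching checksums
    // on every block only make equality very likely (collisions are possible).
    static boolean probablyEqual(Path a, Path b, int blockSize) throws IOException {
        // Step 1: different sizes are a cheap, definite proof of difference.
        if (Files.size(a) != Files.size(b)) return false;
        try (InputStream ia = Files.newInputStream(a);
             InputStream ib = Files.newInputStream(b)) {
            byte[] bufA = new byte[blockSize];
            byte[] bufB = new byte[blockSize];
            int nA;
            // Steps 2-3: read matching blocks, checksum each pair, stop early
            // at the first mismatch.
            while ((nA = ia.readNBytes(bufA, 0, blockSize)) > 0) {
                int nB = ib.readNBytes(bufB, 0, blockSize);
                Adler32 sa = new Adler32();
                sa.update(bufA, 0, nA);
                Adler32 sb = new Adler32();
                sb.update(bufB, 0, nB);
                if (nA != nB || sa.getValue() != sb.getValue()) return false;
            }
        }
        return true;
    }
}
```

The early exit is the point of working in blocks: files that differ near the start are rejected after reading only one block each.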
More bits in the result mean less chance of a collision, but anything over 64 bits is beyond what Java (and the computer's CPU) can handle natively and hence slower; FNV-1024, say, is far less likely to give a false negative, but much slower to compute.

If you want speed, use Adler32 and accept that very occasionally a difference will go undetected. It is rare. Checksums like these are what the internet relies on to spot transmission errors, and how often does wrong data turn up?
If it is accuracy you really want, you have to compare every byte. Nothing else will work.

If you can compromise between speed and accuracy, there is a wealth of options out there.
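For the exact, every-byte comparison, the standard library already covers it: `java.nio.file.Files.mismatch` (available since Java 12) returns `-1` when two files are byte-for-byte identical. A small sketch (the wrapper class and method name are my own):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExactCompare {
    // Files.mismatch (Java 12+) returns -1L when the files are identical,
    // otherwise the offset of the first differing byte. This reads as many
    // bytes as it needs, so it is exact but slower than a checksum test.
    static boolean identical(Path a, Path b) throws IOException {
        return Files.mismatch(a, b) == -1L;
    }
}
```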