Scala murmur hash vs Java native hash

Question

I am learning scala and kind of confused on the hash code part of case classes.

As far as I have seen case classes provide an automatic generation of toString,equals and hashCode.

In java the traditional wisdom is that java hashcode uses a native implementation.

But in scala it uses murmur hash

My questions.

1) Java has native hashcode since the hashcode is machine dependant,but if scala uses murmur hash then how it is machine independent?

2) Scala has regular classes as well as case classes, do normal classes also use murmur hash?

3) If murmur hash is really the fastest implementation after point 1 then why does java still using a native implementation?

Rex Kerr · Accepted Answer

MurmurHash is a fast high-quality hash. Scala provides automatic hashCode for its collections, tuples, case classes, and most other library-provided objects (along with equals), and since many of these things get used in hash maps, it's important to have a decent default hash. MurmurHash provides this. As far as I know, Java hashes are also NOT machine-dependent, even if there are cases where they are implemented with native code. What is important is that the algorithm is the same from machine to machine, which Scala's is because it's implemented entirely in bytecode, and Java's is because anything that is not in bytecode (I haven't checked everything!) was done carefully, presumably.

(At least for anything extending java.util.AbstractList, conventional wisdom is wrong. It's not a native implementation at all, just a loop on an iterator that calls the hashCode method of each thing inside. But the JVM is good at this kind of looping and math; why would you want it to be native?)

Normal classes in Scala do not override hashCode so they don't use MurmurHash. However, most library classes that are not case classes do use MurmurHash--all the ordered collections do, for instance. (It is not appropriate to use MurmurHash, which is order-dependent, on a set, where order does not matter.)

MurmurHash, despite being very fast, is not the fastest possible hash. Java typically uses a x(n)*31 + x(n+1)-type algorithm for its hashing, which is even faster. Unfortunately, it's also a pretty crummy hash. It's very easy to have collisions. Also, MurmurHash has a nice compromise between low overhead and fast speed overall, but other hashes (e.g. XxHash or CityHash) can be faster for large objects at the expense of a little more start-up overhead. It is not a given, therefore, that everyone should use MurmurHash for everything.

Nonetheless, MurmurHash was chosen for Scala specifically because of measured defects in the simpler typical Java-style hash, and it has generally worked out well. Why has Java not adopted it? Possibly only because Java, as a more mature language, tends to change slower than Scala, and nobody has gotten around to it yet, and/or anyone who cares is already using their own custom hashing solution.

Scala murmur hash vs Java native hash

Tags:

java

hash

scala

Greedy Coder

1 Answers

Rex Kerr

Recent Activity

Donate For Us

Scala murmur hash vs Java native hash

Tags:

java

hash

scala

Greedy Coder

1 Answers

Rex Kerr

Related questions

Recent Activity

Donate For Us