Performance Tuning: June 2014

Imagine there is some hashing factory (purely hypothetical!), where objects arrive, hashCode function is applied on them and then they are placed in correct buckets!

Assumptions:

1.For simplicity, assume hashCode() of String class computes length of string as hash.

public int hashCode(){
return s.length;
}

Insertion:

1.Object arrives (word ="monkey")
2. hashCode() is called and hash value is calculated ( hash =6)
3. Bucket for hash(6) is found and object is placed there!

Hashing Factory

Searching:

Suppose now we have to search for "Fate", then again hash will be calculated (4) and we will reach correct bucket.
After we find the correct bucket, equals function comes into play and String is found!

Two Important Rules:

1. For two objects to be equal their hashCode should match.

Now that is quite simple! Trace back the search..
If two objects are equal then they must have been in same bucket?
If they were in same bucket then their hashCode was same!

2. It is not necessary that if hashCode is same then objects are equal.
Hashcode if equal means that they are part of same bucket (example "Fate" ,"Kate" are part of Bucket:Length:4) but they are not same strings!

Trade off performance/memory for hash:

Suppose you create hash function which gives unique values to each object..
Then we will probably have all the buckets containing one object only!
Hence, O(1) complexity for fetching an Object after calculating hash
But the memory space taken will be huge!

Similarly, if hash function returns same value for all then searching in bucket would cost O(n)!

Hence, the hash function should be devised such that data is distributed across the buckets in optimized way.

More on creating optimized hash will be followed in next post!

Performance Tuning

Tuesday, 17 June 2014

Hashing it up - Hashcode and equals in Java