Friday, March 25, 2011

Load factor for HashSet with contains operation

Update: This post really contains zero valuable information and is left here merely as a memorial to my stupidity.

In a previous post I measured the performance of HashSet with various load factors and initial capacities on the add operation. But often the most common operation on a set is checking whether it contains an element. So, without much comment - another silly bench:
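A minimal sketch of such a contains benchmark (not the original code; the element count and load factor here are arbitrary placeholders) might look like this:

```java
import java.util.HashSet;
import java.util.Set;

public class ContainsBench {
  public static void main(String[] args) {
    int size = 1_000_000;        // hypothetical element count
    float loadFactor = 0.75f;    // the factor under test

    // Fill a set built with the load factor we want to measure.
    Set<Integer> set = new HashSet<>(16, loadFactor);
    for (int i = 0; i < size; i++) {
      set.add(i);
    }

    // Time repeated contains() lookups over all inserted elements.
    long start = System.nanoTime();
    int hits = 0;
    for (int i = 0; i < size; i++) {
      if (set.contains(i)) {
        hits++;
      }
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println(hits + " hits in " + elapsedMs + " ms");
  }
}
```

A caveat that applies to any such micro-benchmark: without JVM warm-up and multiple runs the numbers are noisy, so treat any single measurement as a rough indication only.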

Load factor for HashSet with add operation

Update: This post was written when I was young and stupid. It is an attempt to investigate the impact of the load factor on the speed of creating a HashSet, without taking memory usage or any other characteristics of the resulting set into account. If you really need the theory behind all this, refer to Wikipedia or any book on algorithms. And if you need it in practice, I'd suggest just using the excellent Guava library and its methods for creating Sets and Maps in particular.

Load factor and initial capacity... the only two parameters that control a HashSet, but it is not very intuitive how to use them. So here is a small benchmark on adding elements to a HashSet. (The same should apply to HashMap, I think, since the data structure used internally by HashSet is: private transient HashMap<E,Object> map;).
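To make the interplay of the two parameters concrete: a HashSet rehashes once its size exceeds capacity * loadFactor, so if you know roughly how many elements you will add, you can pick an initial capacity large enough that no rehash ever happens. A small illustration (the helper method name is mine, echoing what Guava's Sets.newHashSetWithExpectedSize does):

```java
import java.util.HashSet;
import java.util.Set;

public class PreSized {
  // Request a capacity of expected / loadFactor (rounded up), so that
  // adding `expected` elements never crosses the resize threshold.
  static <T> Set<T> newHashSetWithExpectedSize(int expected, float loadFactor) {
    int capacity = (int) (expected / loadFactor) + 1;
    return new HashSet<>(capacity, loadFactor);
  }

  public static void main(String[] args) {
    Set<Integer> set = newHashSetWithExpectedSize(1000, 0.75f);
    for (int i = 0; i < 1000; i++) {
      set.add(i);
    }
    System.out.println(set.size());
  }
}
```

This is exactly the kind of arithmetic the Guava factory methods hide from you, which is one more reason to just use them.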

Collisions of hashcodes - much more real than an md5 collision

So okay, a hashcode is an int, and so there are only 4294967296 (2^32) distinct hashcodes. That is not a small number, but a modern system can certainly hold many more objects.
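How quickly collisions become likely can be estimated with the usual birthday-paradox approximation. Assuming hashcodes were drawn uniformly at random from 2^32 values (which identity hashcodes are not exactly, so this is only a lower-bound intuition), the probability that n objects contain at least one colliding pair is roughly 1 - exp(-n*(n-1)/2^33):

```java
public class BirthdayBound {
  // Birthday-paradox approximation for a space of 2^32 hashcodes:
  // P(collision among n) ~ 1 - exp(-pairs / 2^32), pairs = n*(n-1)/2.
  static double collisionProbability(long n) {
    double pairs = n * (n - 1) / 2.0;
    return 1.0 - Math.exp(-pairs / 4294967296.0);
  }

  public static void main(String[] args) {
    // Already at ~77,000 objects the chance of some collision passes 50%,
    // and at a million objects a collision is practically certain.
    System.out.printf("n = 77163: p ~ %.2f%n", collisionProbability(77_163));
    System.out.printf("n = 1000000: p ~ %.4f%n", collisionProbability(1_000_000));
  }
}
```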

In a very good book by Joshua Bloch I read that there is a probability that two consecutively created objects will have equal system identity hashcodes. Since that book was published a major version of Java has been released and the situation has improved dramatically. But 2^32 is still a very small number for objects, and two consecutively created objects may still have equal identity hashcodes:
  public class HashcodeCollisions {

    private static final int SIZE = 500000000;

    public static void main(String[] args) {
      int count = 0;
      for (int i = 0; i < SIZE; i++) {
        // Compare the identity hashcodes of two freshly created objects.
        if (new Object().hashCode() == new Object().hashCode()) {
          count++;
        }
      }
      System.out.println(count + " of " + SIZE + " consecutively created pairs of objects had equal hashcodes");
    }
  }

On my machine that produced the following output:
16 of 500000000 consecutively created pairs of objects had equal hashcodes