Sunday, 19 July 2015

When should you prefer HashSet over ArrayList?

This question has been asked by numerous interviewers to zillions of interviewees.
And the answer we receive is HashSet, without any basic understanding of why it is supposed to be a HashSet.

Let me rephrase the question.

"Suppose you had millions of strings and you want to perform search operations on those. Which data structure would you prefer in Java?"

List? 

Well, it isn't an incorrect answer! I can use list.contains(searchString) and easily search for my string.

But what if the string I am searching for is at the last position?
In that case, too, I just call list.contains(searchString). Java provides the API, so why should we worry about how it fetches results?

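If you look at the internal implementation of contains in ArrayList, it simply loops through the list, comparing each element with equals, until it finds a match! Why? Because all an ArrayList knows about its elements is their index. So a search costs O(n): in the worst case, every one of those millions of strings is compared.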
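To make that cost concrete, here is a simplified sketch of the kind of loop ArrayList.contains performs. It is not the JDK source; the class and method names (LinearSearchDemo, linearIndexOf, linearContains) are made up for illustration. A miss, or a hit at the last position, means comparing the search string against every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class LinearSearchDemo {

    // Simplified sketch of the idea behind ArrayList.contains/indexOf
    // (not the actual JDK source): walk the elements by index and
    // compare each one with equals().
    static <T> int linearIndexOf(List<T> list, T target) {
        for (int i = 0; i < list.size(); i++) {
            if (Objects.equals(list.get(i), target)) {
                return i; // found after i + 1 comparisons
            }
        }
        return -1; // every element was compared: O(n) work
    }

    static <T> boolean linearContains(List<T> list, T target) {
        return linearIndexOf(list, target) >= 0;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            strings.add("str" + i);
        }
        // The last element forces a scan of the whole list.
        System.out.println(linearContains(strings, "str999999")); // true
        System.out.println(linearIndexOf(strings, "missing"));    // -1
    }
}
```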

HashSet?

If you are aware of the hashing concept in Java (see this link if you are not sure about it: http://www.app-performancetuning.com/2014/06/hashing-it-up-hashcode-and-equals-in.html ), here is what HashSet does instead:

1. The same contains method first computes the hash code of the search string.

2. Using that hash code (plus an extra internal function that mixes the hash bits, so that strings spread evenly across the buckets instead of clustering in a few of them), it finds the correct bucket.

3. Using the equals method, it compares the search string only with the few entries in that bucket to find whether the string exists.
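The three steps can be sketched with a tiny, fixed-size hash set of strings. This is a hypothetical teaching class (TinyHashSet), not how java.util.HashSet is written: the real one is backed by a HashMap, resizes itself and accepts null. The spread method mirrors the bit-mixing step java.util.HashMap applies in Java 8 (h ^ (h >>> 16)) before masking the hash down to a bucket index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical teaching class: fixed capacity, no resizing, no null support.
public class TinyHashSet {
    private final List<List<String>> buckets = new ArrayList<>();

    public TinyHashSet(int capacity) {
        // A power-of-two capacity lets (capacity - 1) act as a bit mask.
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Mix the high bits of the hash code into the low bits, as
    // java.util.HashMap does in Java 8, so the mask below sees them too.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    int bucketIndex(String s) {
        int h = spread(s.hashCode());    // step 1: hash code (plus spreading)
        return h & (buckets.size() - 1); // step 2: pick the bucket
    }

    public boolean add(String s) {
        List<String> bucket = buckets.get(bucketIndex(s));
        if (bucket.contains(s)) {
            return false; // already present
        }
        bucket.add(s);
        return true;
    }

    public boolean contains(String s) {
        // Step 3: only the strings in one bucket are compared with equals().
        for (String candidate : buckets.get(bucketIndex(s))) {
            if (candidate.equals(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TinyHashSet set = new TinyHashSet(1024);
        for (int i = 0; i < 10_000; i++) {
            set.add("str" + i);
        }
        System.out.println(set.contains("str9999")); // true
        System.out.println(set.contains("missing")); // false
    }
}
```

With 10,000 strings in 1,024 buckets, each lookup compares against roughly ten strings instead of up to 10,000.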

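With these 3 steps and a good hash function, it can search strings, or any custom object that correctly overrides hashCode and equals, much faster: in constant time on average, O(1), instead of the O(n) scan of an ArrayList.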
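For a custom object, HashSet can only do this if the class overrides both hashCode and equals consistently. A minimal sketch with a hypothetical Employee class:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class CustomKeyDemo {

    // Hypothetical value class used only for illustration.
    static final class Employee {
        private final String id;
        private final String name;

        Employee(String id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            Employee other = (Employee) o;
            return id.equals(other.id) && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            // Consistent with equals: equal objects get equal hash codes,
            // so they land in the same bucket.
            return Objects.hash(id, name);
        }
    }

    public static void main(String[] args) {
        Set<Employee> employees = new HashSet<>();
        employees.add(new Employee("E1", "Alice"));
        // A different instance with the same fields is found via hashCode + equals.
        System.out.println(employees.contains(new Employee("E1", "Alice"))); // true
        System.out.println(employees.contains(new Employee("E2", "Bob")));   // false
    }
}
```

If only equals were overridden and hashCode were not, the two equal instances would almost always get different hash codes, so contains would look in the wrong bucket and return false.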






Why don't JPA hints on a NamedQuery work?


Well, we faced this problem recently. We were using JPA 2.0 with Oracle 11g.
We work with huge volumes of data, with millions of records processed at a very fast rate. With that much data and an RDBMS like Oracle, some queries take a long time to execute.

We picked the query timeout feature of JPA to solve this problem.

1. We applied @QueryHint(name = QueryHints.TIMEOUT_HIBERNATE, value = "10") on the @NamedQuery
and then waited endlessly for the query to time out!

2. We then tried a standalone plain old JDBC class with the same driver. This seemed to work perfectly!

      Statement st = conn.createStatement();
      st.setQueryTimeout(2); // seconds; exceeding it throws SQLTimeoutException
      String sql = "select....from ... ";
      ResultSet rs = st.executeQuery(sql);

So it all boiled down to JPA 2.0, which was certainly not working as expected!

After drilling down through the JPA and Spring APIs, we found something pretty interesting.

1. @NamedQuery definitions are compiled and loaded only once, at Spring application startup. Therefore any hints on them are captured at app startup, and each query created from the definition simply copies them over:


    private void initQuery(Query query, NamedQueryDefinition nqd) {
        query.setCacheable( nqd.isCacheable() );
        query.setCacheRegion( nqd.getCacheRegion() );
        if ( nqd.getTimeout()!=null ) query.setTimeout( nqd.getTimeout().intValue() );
        if ( nqd.getFetchSize()!=null ) query.setFetchSize( nqd.getFetchSize().intValue() );
        if ( nqd.getCacheMode() != null ) query.setCacheMode( nqd.getCacheMode() );
        query.setReadOnly( nqd.isReadOnly() );
        if ( nqd.getComment() != null ) query.setComment( nqd.getComment() );
    }


2. However, if you run the query inside a @Transactional method that has a timeout, this value is overridden again by the transaction timeout!

    public static void applyTransactionTimeout(Criteria criteria, SessionFactory sessionFactory) {
        Assert.notNull(criteria, "No Criteria object specified");
        SessionHolder sessionHolder =
                (SessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);
        if (sessionHolder != null && sessionHolder.hasTimeout()) {
            criteria.setTimeout(sessionHolder.getTimeToLiveInSeconds());
        }
    }

So whatever your query hint value is, it will be overridden by the transaction timeout!


Now let's come to the solution.

Instead of using a JpaRepository, just create a DAO method that fetches the named query and then applies the timeout:
           Query query = entityManager.createNamedQuery("abc");
           // "org.hibernate.timeout" is expressed in seconds
           query.setHint("org.hibernate.timeout", "5");
           Object result = query.getSingleResult();

This way, if this method is called inside a @Transactional method, the transaction timeout is applied first and then the query hint overrides it.
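The whole problem comes down to which timeout is set last. The toy model below makes that ordering explicit. ToyQuery and both path methods are hypothetical stand-ins, not Hibernate, JPA or Spring classes; they only encode the order described in this post, where the last setTimeout call wins. The 300-second transaction timeout is an assumed value.

```java
public class TimeoutOrderModel {

    // Hypothetical stand-in for a query object (not a real Hibernate/JPA type).
    // As with a real query, the last timeout set is the one that applies.
    static final class ToyQuery {
        private int timeoutSeconds;

        void setTimeout(int seconds) {
            this.timeoutSeconds = seconds;
        }

        int getTimeout() {
            return timeoutSeconds;
        }
    }

    // Path 1: @NamedQuery hint copied when the query is created, then the
    // transaction timeout is applied afterwards and overwrites it.
    static int namedQueryHintPath(int hintSeconds, int txSeconds) {
        ToyQuery q = new ToyQuery();
        q.setTimeout(hintSeconds); // from the named query definition
        q.setTimeout(txSeconds);   // transaction timeout applied later
        return q.getTimeout();
    }

    // Path 2: DAO method. The transaction timeout is applied first, then the
    // hint set explicitly in code (setHint("org.hibernate.timeout", ...)) wins.
    static int daoPath(int hintSeconds, int txSeconds) {
        ToyQuery q = new ToyQuery();
        q.setTimeout(txSeconds);   // transaction timeout
        q.setTimeout(hintSeconds); // explicit query hint
        return q.getTimeout();
    }

    public static void main(String[] args) {
        System.out.println(namedQueryHintPath(10, 300)); // 300: transaction timeout wins
        System.out.println(daoPath(5, 300));             // 5: query hint wins
    }
}
```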

Cheers!