Monday, 17 October 2016

How I Found Toptal!!

Living in the age of information, the amount of data continuously generated
by the digitally connected populace is staggering. Whether voluntarily or involuntarily, we
leave digital breadcrumbs at every step of our lives, be it browsing the internet, downloading
an email, or simply walking around in a mall. Our imprints are unique data points that can be
stored, catalogued, assessed and analysed to determine our day-to-day patterns. To get a
brief glimpse of the scale: by some reports, a digitally connected American uses, on average, up
to 11.3 GB of Wi-Fi data every month! With more than 3.3 billion users logging on to the
internet, the data is indeed voluminous. The sheer volume of these datasets makes
understanding, extracting, categorizing and analyzing them to predict future trends a
formidable task.
While many would be boggled by these numbers, I find them exciting, and I wish to dedicate my career to combing through such datasets and finding patterns that can be turned into intelligent business insights by building high-performance, real-time streaming applications. With the aim of working on such applications, I went hunting for interesting collaborative projects.

With nearly 5 years of experience in the professional sphere, I have worked on a range of projects, from data processing applications to process-engine web applications, using multi-threading, microservices, Spring, Hibernate, REST, Camunda and more. It was this quest for knowledge, and a keen interest in complex applications that help businesses solve challenging problems, that led me to look into freelancing, which would expose me to more problem areas to solve.
I stumbled across many portals that offered easy on-boarding, but none of them fuelled the passion inside me.

Then I found the Toptal Software Development Group. I started looking at the prestigious clientele it holds and the meticulous screening process it runs, which generates trust among clients. It was offering exactly what I want to work on, matching my wavelength! I decided to pursue it wholeheartedly, as it would surely provide the opportunity to work on exciting projects alongside talented developers from across the world. This will certainly help my technical growth and serve as the next stepping stone towards my goal of becoming a software architect.

Sunday, 19 July 2015

When to prefer HashSet over ArrayList?

This question has been asked by numerous interviewers to zillions of interviewees.
And the answer we usually receive is "HashSet", without a basic understanding of why it is supposed to be a HashSet.

Let me re-phrase the question.

"Suppose you had millions of strings and you want to perform search operations on those. Which data structure would you prefer in Java?"

List? 

Well, it wouldn't be an incorrect answer! I can use list.contains(searchString) and easily search for my string.

But what if the string I am searching for lies at the last position?
In that case, it is still just list.contains(searchString). Java has provided the API, so why should we worry about how it fetches results?

Because, if you think about the internal implementation of contains in the case of ArrayList, it just loops over the list until it finds the right element! Why? Because all an ArrayList knows about its elements is their index.
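In spirit, contains boils down to a linear scan, something like this simplified sketch (not the actual JDK source, which delegates to indexOf and also handles null, but the complexity is the same):

    import java.util.List;

    class LinearScan {
        // Simplified sketch of ArrayList.contains: walk every index, compare with equals.
        static boolean contains(List<String> list, String target) {
            for (int i = 0; i < list.size(); i++) {
                if (target.equals(list.get(i))) {
                    return true;              // found after i + 1 comparisons
                }
            }
            return false;                     // worst case: n comparisons, O(n)
        }
    }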

HashSet?

So, if you are aware of Java's hashing concept (see this link if you are not sure of it :) http://www.app-performancetuning.com/2014/06/hashing-it-up-hashcode-and-equals-in.html ), contains on a HashSet works in three steps:

1. The same contains call will first compute the hash code of the search string.

2. With the help of that hash code (plus an extra internal spreading function that balances the buckets, so that not all strings land in the same bucket and not every string gets its own bucket), it will find the correct bucket.

3. With the help of the equals method, it will check whether the string exists in that bucket.

With these 3 steps and a performant hash function, it can search strings, or any custom objects, at a much faster rate: O(1) on average instead of O(n).
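To make this concrete, here is a small, self-contained comparison sketch (the class name is mine; the timings are indicative, not a rigorous benchmark):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ContainsComparison {
        public static void main(String[] args) {
            int n = 1_000_000;
            List<String> list = new ArrayList<>(n);
            Set<String> set = new HashSet<>(n);
            for (int i = 0; i < n; i++) {
                String s = "value-" + i;
                list.add(s);
                set.add(s);
            }
            String target = "value-" + (n - 1); // worst case for the list: last element

            long t0 = System.nanoTime();
            boolean inList = list.contains(target); // O(n): scans the whole list
            long t1 = System.nanoTime();
            boolean inSet = set.contains(target);   // O(1) average: hash -> bucket -> equals
            long t2 = System.nanoTime();

            System.out.println("list.contains -> " + inList + " in " + (t1 - t0) / 1000 + " us");
            System.out.println("set.contains  -> " + inSet + " in " + (t2 - t1) / 1000 + " us");
        }
    }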






Why won't JPA hints on @NamedQuery work?


Well, we faced this problem recently. We were using JPA 2.0 with Oracle 11g.
We work with huge amounts of data, millions of rows processed at a very fast rate. With that much data in play, and an RDBMS like Oracle, queries sometimes take a long time to execute.

We picked the query timeout feature of JPA to solve this problem.

1. We applied @QueryHint(name = QueryHints.TIMEOUT_HIBERNATE, value = "10") on the @NamedQuery,
then waited endlessly for the query to time out!
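For reference, a minimal sketch of the kind of mapping we had, with a hypothetical Customer entity (the hint constant is Hibernate's org.hibernate.annotations.QueryHints.TIMEOUT_HIBERNATE):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.NamedQuery;
    import javax.persistence.QueryHint;
    import org.hibernate.annotations.QueryHints;

    // Hypothetical entity; only the shape of the mapping matters here.
    @Entity
    @NamedQuery(
        name = "Customer.findByStatus",
        query = "select c from Customer c where c.status = :status",
        hints = { @QueryHint(name = QueryHints.TIMEOUT_HIBERNATE, value = "10") }
    )
    public class Customer {
        @Id
        private Long id;
        private String status;
    }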

2. We tried a standalone, plain old JDBC class with the same driver. This worked perfectly:

      Statement st = conn.createStatement();
      st.setQueryTimeout(2); // timeout in seconds, set before execution
      String sql = "select....from ... ";
      ResultSet rs = st.executeQuery(sql); // times out as expected

It all boiled down to JPA 2.0, which was certainly not working as expected!

After drilling down through the JPA and Spring APIs, we found something pretty interesting.

1. @NamedQuery definitions are compiled and loaded only once, at Spring application startup. Therefore any hints are applied at startup time, as this Hibernate internal shows:


    private void initQuery(Query query, NamedQueryDefinition nqd) {
        query.setCacheable( nqd.isCacheable() );
        query.setCacheRegion( nqd.getCacheRegion() );
        if ( nqd.getTimeout() != null )   query.setTimeout( nqd.getTimeout().intValue() );
        if ( nqd.getFetchSize() != null ) query.setFetchSize( nqd.getFetchSize().intValue() );
        if ( nqd.getCacheMode() != null ) query.setCacheMode( nqd.getCacheMode() );
        query.setReadOnly( nqd.isReadOnly() );
        if ( nqd.getComment() != null )   query.setComment( nqd.getComment() );
    }


2. Meanwhile, if you run the query inside a @Transactional method that has its own timeout, the timeout parameter is overridden again, this time by the transaction timeout:

    public static void applyTransactionTimeout(Criteria criteria, SessionFactory sessionFactory) {
        Assert.notNull(criteria, "No Criteria object specified");
        SessionHolder sessionHolder =
                (SessionHolder) TransactionSynchronizationManager.getResource(sessionFactory);
        if (sessionHolder != null && sessionHolder.hasTimeout()) {
            criteria.setTimeout(sessionHolder.getTimeToLiveInSeconds());
        }
    }

So whatever your query hint value is, it will be overridden by the transaction timeout!


Now let's come to the solution.

Instead of going through JpaRepository, just create a DAO method, fetch the named query, and then apply the timeout programmatically:

           Query query = entityManager.createNamedQuery("abc");
           query.setHint("org.hibernate.timeout", "5"); // seconds; applied at execution time
           query.getSingleResult();

This way, if the method is called inside a @Transactional function, the transaction timeout is applied first, and the query hint then overrides it.
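Putting it together, a minimal DAO sketch (the CustomerDao class reuses the hypothetical Customer mapping from above; the timeout values are illustrative):

    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;
    import javax.persistence.Query;
    import org.springframework.stereotype.Repository;
    import org.springframework.transaction.annotation.Transactional;

    @Repository
    public class CustomerDao {

        @PersistenceContext
        private EntityManager entityManager;

        // The transaction timeout (30 s) is applied to the session first...
        @Transactional(timeout = 30)
        public Object findByStatus(String status) {
            Query query = entityManager.createNamedQuery("Customer.findByStatus");
            query.setParameter("status", status);
            // ...and this per-query hint then overrides it (5 s) for this query only
            query.setHint("org.hibernate.timeout", "5");
            return query.getSingleResult();
        }
    }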

Cheers!






Tuesday, 17 June 2014

Hashing it up - Hashcode and equals in Java

Imagine a hashing factory (purely hypothetical!), where objects arrive, the hashCode function is applied to them, and they are placed in the correct buckets!

Assumptions:

1. For simplicity, assume the hashCode() of the String class computes the length of the string as its hash:

public int hashCode() {
    return s.length(); // simplified: the string's length is its hash
}

Insertion:

1. An object arrives (word = "monkey").
2. hashCode() is called and the hash value is calculated (hash = 6).
3. The bucket for hash 6 is found and the object is placed there!


[Figure: the Hashing Factory, objects hashed into their buckets]


Searching:

Suppose now we have to search for "Fate". Again the hash is calculated (4) and we reach the correct bucket.
Once we find the correct bucket, the equals function comes into play and the string is found!

Two Important Rules:

1. For two objects to be equal, their hashCodes must match.

Now that is quite simple! Trace the search back:
if two objects are equal, they must have been in the same bucket;
and if they were in the same bucket, their hashCode was the same!

2. It is not necessary that if the hashCode is the same, the objects are equal.
An equal hashCode only means they belong to the same bucket (for example, "Fate" and "Kate" are both in the bucket for length 4), but they are not the same strings!
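Both rules can be watched in action. Here is a minimal, hypothetical Word wrapper that hashes by length, as assumed above, together with a proper equals:

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical wrapper that hashes a word by its length,
    // mirroring the simplified hashCode() assumed earlier.
    final class Word {
        private final String s;

        Word(String s) { this.s = s; }

        @Override
        public int hashCode() {
            return s.length();                  // "Fate" and "Kate" -> same bucket (4)
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Word)) return false;
            return s.equals(((Word) o).s);      // equals decides within the bucket
        }

        public static void main(String[] args) {
            Set<Word> set = new HashSet<>();
            set.add(new Word("monkey"));        // bucket for hash 6
            set.add(new Word("Fate"));          // bucket for hash 4
            set.add(new Word("Kate"));          // same bucket as "Fate", but not equal

            // Rule 1: equal objects have equal hash codes, so the search finds the bucket
            System.out.println(set.contains(new Word("Fate")));  // true

            // Rule 2: equal hash codes do not imply equal objects
            System.out.println(new Word("Fate").hashCode() == new Word("Kate").hashCode()); // true
            System.out.println(new Word("Fate").equals(new Word("Kate")));                  // false
        }
    }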


Trading off performance and memory in the hash function:

Suppose you create a hash function that gives a unique value to each object.
Then we will probably have every bucket containing only one object!
Hence O(1) complexity for fetching an object after calculating its hash,
but the memory taken by all those buckets will be huge!

Similarly, if the hash function returns the same value for every object, searching within the single bucket will cost O(n)!

Hence, the hash function should be devised so that data is distributed across the buckets in an optimized way.
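As an extreme (but perfectly legal) counter-example, consider a hypothetical class that overrides hashCode like this:

    class ConstantHash {
        @Override
        public int hashCode() {
            // Legal but degenerate: every instance collides into one bucket,
            // so HashSet lookups degrade to a linear equals() scan, O(n).
            return 42;
        }
    }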

More on creating an optimized hash function will follow in the next post!