Object Count Impact on Garbage Collection Performance

When developers complain about poor Java application performance they most often cite garbage collection as too slow and taking too much time. Moreover, longer garbage collection pauses cause serious problems to any interactive application, especially where screen content changes at a high frequency (games). The Java defenders usually reply that garbage collection algorithms are very efficient, sometimes even faster than manual memory management in C/C++. Unfortunately, it is only half the truth. The other half is that we, Java programmers can be real pigs and allocate excessive numbers of objects that can overwhelm any garbage collection algorithm, regardless of how fast it is.

The particular problem I am concerned with is when a large number of small objects is used long term. Note that a smaller number of large objects is much better, since garbage collection performance is proportional to the number of objects and not their sizes (for more details on garbage collection algorithms see Sun‘s documentation). In particular, the garbage collection phase that must suspend all threads needs to scan objects and their references, so minimizing the number of objects should minimize this pause. It is especially critical to reduce the number of long lived objects as they are scanned over and over again.

I was performing analysis of a server software that was experiencing very long garbage collection pauses, several seconds at a time. This application had a several hundreds of megabytes heap, so clearly allocated a lot of objects. It didn‘t take much effort to identify the culprit:
http://localhost:7002/showInstanceCounts/includePlatform/
475988 instances of class java.util.HashMap$Entry
150358 instances of class java.lang.Integer
134060 instances of class java.lang.String
128203 instances of class java.util.HashMap$Entry[]
128173 instances of class java.util.HashMap
126484 instances of class char[]
51877 instances of class java.lang.Boolean
...

HashMap related class is at the very top. As an aside, note the huge number of integer wrapper objects. Worse yet, I found 51877 Boolean instances. While we are told not to bother pooling small objects such as primitive wrappers, that doesn‘t mean we should always allocate them on heap using new. In fact, the wrapper classes implement a simple pooling mechanism when appropriate that can significantly reduce number of instances allocated. As I said before, even though integer and boolean objects are tiny and don‘t use much memory, it is the high object count that degrades garbage collection performance. After replacing new Integer(... & new Boolean(... with Integer.valueOf(... & Boolean.valueOf(... I had:

...
50853 instances of class java.lang.Integer
...
24 instances of class java.lang.Boolean
...

This is a very simple technique, but easily forgotten. Note that allocating short term objects using new is fine as they don‘t live long enough to burden garbage collection. For long lived objects make sure to use the valueOf factory methods. Back to our list of top object counts:

http://localhost:7002/showInstanceCounts/includePlatform/
476039 instances of class java.util.HashMap$Entry
138654 instances of class java.lang.String
132946 instances of class char[]
128197 instances of class java.util.HashMap$Entry[]
128175 instances of class java.util.HashMap
...

So we have a lot of string and characters, which is typical of most server software. Also, we have a large number of hash map objects. Let‘s break it down:
About 128K hash maps and about the same number of hash map entry arrays. Makes sense, as each map has a single array. Then we have 476K entry objects, which is about 476 / 128 = 3.7 entries per map. At this point an alarm is blaring, "Why so many small maps?!?!?!"

Map is one of the most useful and most frequently used data structures. It is defined by the java.util.Map interface. java.util.HashMap is the most popular implementation. Map is used to associate things whenever you cannot index something. Moreover we often are too lazy to define a new class and end up using a map instead. For example:

class PersonString name =null;int age =null;int weight =0f;int height =0f;...
How many objects? 1 Person + 1 string (for name) = 2 objects total used by a single person object (inclusive). However, I just don‘t feel like creating another class,... Hey, I‘ll just whip up a nice little map:

Map<String,Object> person = new HashMap<String,Object>();person.put("name", "Joe Doe");person.put("age", Integer.valueOf(22));person.put("weight", Integer.valueOf(80));person.put("height", Integer.valueOf(170));
Object count is bit tricky. The keys are likely to be shared by all maps, so in effect they don‘t count. The integer objects are probably shared as well, so we‘ll estimate that for 3 integer objects due to sharing we only allocate 2 new instances. Therefore we have: 1 Map + 1 entry array + 4 entries (1 entry/mapping) + 1 string (for name) + 2 integers = 9 objects total used by a single person map (inclusive).

We should have used only 2 objects for a person, yet ended up using 9!!! To make things worse most programmers simply don‘t pay attention to such nasty little side effects of using a map. Most other map implementations use just as many objects if not more. As we use maps often in many places extra 7 objects quickly becomes a major problem when you reach extra half a million objects in the server software!!!

One way to approach this problem is simply to declare appropriate classes and try to use primitives as much as possible. At the same time, there is no denying that maps are handy tools and we don‘t want to punish programmers for using them. Therefore, I decided to create another map implementation that would use the minimum number of objects internally. That way we could keep using maps without creating a huge number of objects. In particular, if I could eliminate the entry objects, the person map above would use 4 less objects and the difference would not be as bad.

Next time I will start the series of articles about the MocMap - Minimum Object Count Map. I must warn you that they may not be fun (compared to Swing posts and other cool stuff). In fact, some of you may criticize it as much effort for a problem that doesn‘t exist. Please keep in mind that this class is relevant only in certain contexts. In other cases java.util.* maps are just fine and my implementation is not applicable. Therefore, any time you discuss merits of a piece of code make sure that the context is well defined and clear. In my defense, I‘ll note that small details such as number of objects used by a data structure, may be what distinguishes a slow Java pig from a Java application that fulfills the potential of Java platform and pushes the performance to the limits.