O
ne of the important advantages of Java, from a programmers prospective, is the use of garbage collection. One aspect of memory management in Java is that all objects are created on a garbage collected heap. Only primitive types, mostly numeric types and references to objects, are allocated on the runtime stack. We speculated that a significant number of objects behaved like traditional automatic variables, that are normally allocated on the runtime stack. We instrumented a Java virtual machine to test this hypothesis. The percentage of objects that could have been allocated on a stack instead of on the heap ranged from zero to possibly as high as 56%, but were generally in the 5-15% range.
One of the attractions of Java is that it automatically manages all memory usage[GJS96]. Java is certainly not the only such language, but it is probably the fastest growing language that uses garbage collection.
This is a big advantage when considering software development time. The programmer is freed from having to worry about deallocating memory when it is no longer needed. The problem is that garbage collection can significantly slow down the execution of a program. There has been a lot of work on improving garbage collection[JL96]. We propose to reduce the number of objects that must be managed by the garbage collector, thus improving the performance of any existing garbage collector.
In a programming language like C or C++, it is possible to either allocate a new
object (in C it would be a struct) on the heap or to create it on the
stack, in memory allocated to an automatic variable.
In Java, all automatic variables contain either primitive values such
as int and float, or pointers, called references in Java.
The result is that any objects that, in C++, would be stored on the
stack in the memory for an automatic local variable, must instead be
stored in the heap.
This unnecessarily contributes to the workload of the garbage
collector.
Our hypothesis is that a compiler can identify a significant number of object allocation operations that can be changed from heap allocation to stack allocation. To test this hypothesis, we instrumented a Java virtual machine to detect, at runtime, objects that could have been allocated on the stack instead of on the heap. Using our most conservative analysis, we believe a compiler can move, on the order of 10% of the collectible objects from the heap to a stack. If we make the non-conservative assumption that native methods do not create any permanent references to objects passed in as parameters, then the percentage jumps as high as 56% in one case.
In order to test our hypothesis, we instrumented a Java virtual machine to include reference counting. We weren't using the reference counting for the primary garbage collection, which was not modified. We need reference counting because we need to know the earliest point at which an object becomes collectible. We then check to see if the stack frame for the method that created the object is still on the stack. If the frame is still on the stack then the object could have been allocated on the stack instead of the heap and trivially reclaimed when the frame is popped from the stack. We call such an object a stackable object.
A compile time algorithm must make a decision for all objects created by a particular statement. It is possible that only some objects created from a particular statement might be stackable. We therefore are interested in allocation statements that create only stackable objects. We call such an allocation statement a stackable allocation.
The number we are most interested in is the percentage of collectible objects that were created by a stackable allocation. These are the percentages mentioned in the introduction, which range from 0% to 56% for the tests we have run. Our tests include a simple computer game with a graphical user interface, a bulletin board server, a client for the bulletin board that uses a graphical user interface, and the Java compiler from JavaSoft compiling a ten class program (the game we tested).
The Java virtual machine that we instrumented is the one in JDK1.0.2 from JavaSoft. This was done prior to the introduction of the Java Native Interface (JNI) specification. JNI requires native methods to register any permanent references to objects that are passed in as parameters. A native method is a method (function) written in some language other than Java that is callable from Java. Providing a means for a garbage collector to accurately track all references to objects is necessary for some garbage collection algorithms. Similarly, our analysis of stackable versus non-stackable objects requires us to accurately track all references. Without JNI, we could not accurately track references that were passed in to native methods. To address this problem, we computed two sets of numbers. One set assumes that native methods do not make permanent copies of references and thus do not impact the stackability of an object. The other set assumes that any object passed to a native method is not stackable. Reality obviously lies somewhere in between.
Table 1 summarizes our results. For each application we show four percentages. Two of the numbers correspond to our lower estimate where we assume any object passed to a native method is not stackable. The other two numbers correspond to our more optimistic estimate where native methods are assumed to never create copies of references that persist beyond the life of the native method. For each assumption, we show the percentage of collectible objects that were from stackable allocations. Note that this is not the percentage of collectible objects that were stackable which would be higher because some stackable objects are created by non-stackable allocations. We also show the percentage of allocation statements that were stackable. This is a static count. That is, the percentage of allocation statements in the program that might be changed by a compiler to allocate on the stack instead of the heap.
| Application | 2c|stackable objects | 2c|stackable allocations | ||
| native not stackable |
native not stackable |
native not stackable |
native not stackable |
|
| javac | 7% | 26% | 20% | 26% |
| game | 0% | 56% | 0% | 9% |
| server | 8% | 15% | 14% | 23% |
| client | 3% | 8% | 3% | 6% |
Although we were encouraged by these numbers, the actual affect of this approach on time spent performing garbage collection is unclear. It is clear that reducing the number of objects that a garbage collector must deal with will improve performance. For some applications, doing some allocations on the stack might result in never needing to do any garbage collection. On the other hand, if the average number of objects on the heap is only reduced by a few percentage points or less, then the performance impact would be negligible.
We decided we needed to get a better idea of what the overall performance impact of stack allocation might be. To do this we looked at some applications that were readily available to us. Somewhat to our surprise, the applications we use or have written are not heavily impacted by garbage collection performance. The two applications we report here are javac (the Java compiler that comes with JDK1.1.5) and DumpUses, a program that we developed to determine what classes might be loaded during the execution of a given class.
Using the default heap size that is part of the javac command, garbage collection accounted for about 2% of the time required to compile a 2000 line program (the same compilation reported in line one of table 1). The time for our DumpUses program was similar, about 2.5% of the total execution time. These numbers were computed using the ``-verbosegc'' option in the JDK1.1.5 JVM.
We assume there are applications for which these percentages are significantly higher. From our very small sample, it would appear that at least for small to medium sized programs, the performance of garbage collection is not currently a major issue. This can be expected to change as JIT and other performance enhancements are made to JVMs. Because the garbage collection code is already running at native speed, adding a JIT that increases performance by a factor of 10 will at the same time increase the relative amount of time spent on garbage collection. Our two sample applications will move from 2-3% to 20-30%.
There is very mixed data available about the actual amount of time spent doing garbage collection. We would like to try and identify a small set of applications that are currently heavily impacted by the amount of time spent performing garbage collection. Interested readers with garbage collection intensive applications are encouraged to contact the author. Assuming such applications can be identified, then we would like to measure the actual space consumed by the stackable objects.
In order to test the actual performance benefit from using stacking allocations, we will need to provide support for stack allocation in a Java virtual machine. The Java virtual machine specification does not support stack allocation of objects. It does, however, have two reserved byte codes. We plan to use one of the reserved byte codes to implement a stack allocation operator. At the same time, we plan to use standard compiler dataflow analysis to identify stackable allocation statements in the source. It is yet to be determined what percentage of the stackable allocations can be identified at compile time.
Java's late binding semantics will be a problem for compile time analysis. We expect that in many cases, interprocedural analysis will be necessary to identify some stackable allocations. In some cases, local analysis can identify stackable allocations.
Although garbage collection is not a major performance problem for the applications we have examined, we expect that this will change. We have explored the idea that some objects that are currently allocated on the Java heap could instead be allocated on a stack. This technique has been used in other programming languages, such as ML and Scheme. This optimization would reduce the workload for any garbage collection algorithm. We have presented the results of tests that indicate on the order of 10% of the collectible objects for a range of applications could have been allocated on the stack. Although the numbers we have for garbage collection overhead would suggest that moving some object allocations to the stack would not significantly affect performance today, we believe that could change with improvements in JIT and JVM technology.
Acknowledgment
Manosiz Bhattacharyye implemented the changes necessary to do the initial data collection.