Wednesday, December 10, 2008

On Local Variables and Inner Classes in Java

Local classes in Java, both named and anonymous, can access local variables of the method in which the class is declared, as long as those variables are declared with the modifier "final". This restriction appears arbitrary, at least initially. However, it is this restriction that keeps the semantics of local classes accessing local variables from becoming hopelessly confusing.

First, recall that local classes are the same as regular classes from the JVM's point of view (the dollar signs '$' that the compiler inserts into their names do not make these classes special). However, these classes are special from the compiler's point of view, because of their syntax and the scoping rules for the variables that they access.

The compiler performs a relatively straightforward transformation: it creates an instance variable for each local variable referenced from the methods of the class, prefixing the name with "val$". It also creates a constructor that takes one parameter for each referenced local variable, and initializes the additional instance variables with the values passed to the constructor. Finally, each access to a local variable is replaced with an access to its corresponding instance variable.
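To make the transformation concrete, here is a hand-written approximation of it. This is a sketch, not actual compiler output: the names CaptureSketch, InnerDesugared and capturedX are mine, chosen for illustration, and the compiler picks its own synthetic names.

```java
class CaptureSketch {
    // What the programmer writes: a local class reading a final local variable.
    static Object original(String input) {
        final String x = input.trim(); // computed at run time
        class Inner {
            @Override
            public String toString() {
                return x; // reads the local variable of the enclosing method
            }
        }
        return new Inner();
    }

    // Roughly what it turns into: the value of x is passed to a constructor.
    static Object desugared(String input) {
        final String x = input.trim();
        return new InnerDesugared(x);
    }

    public static void main(String[] args) {
        System.out.println(original(" x "));
        System.out.println(desugared(" x "));
    }
}

// Hand-written stand-in for the class the compiler would generate.
class InnerDesugared {
    private final String capturedX; // instance variable holding the copy

    InnerDesugared(String x) {
        this.capturedX = x; // initialized from the constructor parameter
    }

    @Override
    public String toString() {
        return capturedX; // access to x replaced with access to the field
    }
}
```

Both methods return objects that behave the same way; the only difference is who wrote the plumbing.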

The same mechanism could have worked without requiring local variables to be declared "final": we simply need to ensure that the referenced local variable is initialized at the time the instance of the local class is created. The compiler already performs this check, so the use of "final" seems redundant. Unfortunately, relaxing this restriction would result in substantially less clear semantics, without giving much in return. Here is the code illustrating my point (obviously, it does not compile):

public static Object test() {
    String x = "x";
    class Inner {
        public String toString() {
            String tmp = x;
            x = "y";
            return tmp;
        }
    }
    Object res = new Inner();
    System.out.print(x);
    System.out.print(res.toString());
    System.out.print(x);
    return res;
}

The problem is that nearly any programmer reading this code would expect it to print "xxy", not the "xxx" that it would, no doubt, print had the program above compiled. The reason is that x = "y" becomes an assignment to the compiler-generated instance variable holding the copy, leaving the local variable x intact. The code says otherwise, however, because in the source there simply isn't a second variable!
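The "xxx" outcome can be reproduced by writing the copy out by hand. The sketch below assumes the copying mechanism described earlier; CopySemanticsSketch, InnerCopy and copyOfX are hypothetical names of my own, and the output is collected in a StringBuilder only to make it easy to inspect.

```java
class CopySemanticsSketch {
    // Hand-written stand-in for the local class, with an explicit copy of x.
    static class InnerCopy {
        private String copyOfX; // the second variable the source code hides

        InnerCopy(String x) {
            this.copyOfX = x;
        }

        @Override
        public String toString() {
            String tmp = copyOfX;
            copyOfX = "y"; // changes the copy only, not the caller's x
            return tmp;
        }
    }

    static String run() {
        StringBuilder out = new StringBuilder();
        String x = "x";
        Object res = new InnerCopy(x);
        out.append(x);              // local x is "x"
        out.append(res.toString()); // returns the old copy, then sets it to "y"
        out.append(x);              // local x was never touched
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "xxx"
    }
}
```

Once the copy is visible in the source, "xxx" is unsurprising; the confusion arises only when the copy is hidden behind a single name.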

I believe that this is precisely the problem that resulted in the additional restriction, as there is no other way to hide the mechanism based on copying values. Moreover, there is no way around copying values, because instances of a local class must be able to outlive the invocation of the method that declares the class. The easiest way to visualize this is to think about the behavior of an Inner instance once the test() method has returned: x is no longer on the stack, so the instance must hold its own copy.
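A small example of an instance outliving the method call that created it, under the existing "final" rule. The names OutlivesMethodSketch, makeLabel and Label are made up for illustration.

```java
class OutlivesMethodSketch {
    // Returns an instance of a local class that captures a final local variable.
    static Object makeLabel(int id) {
        final String label = "item-" + id;
        class Label {
            @Override
            public String toString() {
                return label; // still readable after makeLabel() has returned
            }
        }
        return new Label();
    }

    public static void main(String[] args) {
        Object a = makeLabel(1);
        Object b = makeLabel(2);
        // Both calls have returned, so their stack frames are gone,
        // yet each instance still knows its own value.
        System.out.println(a + " " + b); // prints "item-1 item-2"
    }
}
```

Each call to makeLabel() has its own local variable, and each returned object carries its own copy of that value.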