Verifier Extensions

Next: Runtime Extensions Up: Virtual Machine Extensions Previous: Bytecode Extensions

Verifier Extensions

The JVM bytecodes are an instruction set that can be statically type checked. For example, bytecodes that invoke methods explicitly declare the expected argument and result types. The state in the execution model is stored in a set of local variables and on a stack. The types of each storage location can be determined by straightforward dataflow analysis that infers types for each stack entry and each local variable slot, at each point in the program.

It is important that extensions for parameterized types not remove the ability to statically type check bytecodes. The standard Java bytecode verifier works by verifying one class at a time [Yel95]. A call to a method of another class is checked using a declaration of the signature of that method, which is inserted in the .class file by the compiler. When the other class is loaded dynamically, these declarations are checked to ensure that they match the actual class signature.

Our extensions to the JVM preserve this efficient model of verification. The code of a parameterized or non-parameterized class is verified only once, in isolation from other classes, thus verifying it for all legal instantiations of the code. An instantiation of a class or interface can be checked for legality by examining only signature information in the .class file for the parameterized class or interface; examining the bytecodes in the .class file is unnecessary.

Both the compiler and the verifier perform similar type checking, treating formal parameter types as ordinary types with a limited set of allowed operations. The important difference between the compiler and the verifier is that compiler variables have declared types, but the verifier must infer their types. The verifier must assign types to stack locations and local variable slots which are specific enough that an instruction can be type checked (e.g., if it invokes a method, the object must have that method). The assigned types must also be general enough that they include all possible results of the preceding instructions. For each instruction, the verifier records a type with this property, if possible. It uses standard iterative dataflow analysis techniques either to assign types to all stack locations for all instructions, or to detect that the program does not type check.

Because the bytecodes include branch instructions, different instructions may precede the execution of a particular instruction X. For type safety, the possible types of values placed on the stack by the preceding instructions must all be subtypes of the types expected by X. The core of the verifier performs this operation; it is a procedure to merge a set of types, producing the most specific supertype, or least upper bound in the type hierarchy. The dataflow analysis propagates this common supertype through X and sends its results on to the succeeding instruction(s). The analysis terminates when the types of all stack locations and local variable slots are stably assigned.

The primary change to the verifier for parameterized types is a modification to this merge procedure. To find the lowest common class in the hierarchy, we walk up the hierarchy from all the classes to be merged. Each time that a link is traversed to a superclass, we apply the parameter substitution that is described by the extends clause of the class definition. When a common class is reached in the hierarchy, the actual parameters of the common class must be equal.

[Image of the hierarchy]


  A[T] extends Object
  B[U] extends A[U]
  C[K,V] extends A[K]
  D extends A[B[int}]]

Figure 7: A parameterized class hierarchy

Consider the class and interface hierarchy shown in Figure 7, with the corresponding extends clauses. The union of B[X] and C[X,Y] can be conservatively approximated by successively moving up the tree to find a common node while substituting parameters. The result is as follows:

  (B[X] union C[X,Y])  (A[X] union A[X]) = A[X]

So these two types are merged to produce A[X]. Similarly, for B[X] and C[Y,X] we have

  (B[X] union C[Y,X])  (A[X] union A[Y])  Object

In this case, the merge result is Object, since the parameters did not match for A. Note that unlike in the non-parameterized verifier, the lowest common superclass node is not always sufficient for the merge result, since it may be instantiated differently by the merged types. Finally, consider merging B[B[int]] and D, which demonstrates that parameterized and non-parameterized classes can be merged:

  (B[B[int]] union D)  A[B[int]]

Next: Runtime Extensions Up: Virtual Machine Extensions Previous: Bytecode Extensions