Friday, November 22, 2013

Java Virtual Machine - Loading, Linking and Initializing

Loading

  • Process of finding the binary representation of a class or interface type with a particular name and creating a class or interface from that binary representation.
  • (JVM Specification: ) Java Virtual Machine specification gives implementations flexibility in the timing of class and interface loading.
  • Internal data structures are populated including Method area.
  • An instance of class java.lang.Class for the class being loaded is created.
  • Information about a type that is stored in the internal data structures and accessed by invoking methods on the Class instance for that type.
  • If Class Loader encounters a problem, it is reported during linking phase (LinkageError) on first active use of the class.

Linking 

  • A class or interface is completely loaded before it is linked. 
  • Process of taking a class or interface and combining it into the run-time state of the Java Virtual Machine so that it can be executed.
  • Linking a class or interface involves verifying and preparing that class or interface, its direct superclass, its direct superinterfaces.
  • Dynamic linking involves locating classes, interfaces, fields, and methods referred to by symbolic references stored in the constant pool, and replacing the symbolic references with direct references.
  • Linking is divided into three sub-steps: verification, preparation, and resolution:
    • Verification ensures the type is properly formed and fit for use by the Java Virtual Machine.
      • Verification ensures that the binary representation of a class or interface is structurally correct.
      • Verification may cause additional classes and interfaces to be loaded but need not cause them to be verified or prepared.
      • Final classes/methods, abstract classes/methods validations are checked here.
    • Preparation involves allocating memory needed by the type, such as memory for any class variables.
      • Involves creating the static fields for a class or interface and initializing such fields to their default values (initializing to other values except default values is done in Initialization where explicit initializers are executed).
    • Resolution is the process of transforming symbolic references in the constant pool into direct references. 
      • It is an optional part of linking.
      • Implementations may delay the resolution step until each symbolic reference is actually used by the running program.
      • After verification, preparation, and (optionally) resolution are completed, the type is ready for initialization. Resolution, which may optionally take place after initialization.
  • (JVM Specification: ) Java Virtual Machine specification gives implementations flexibility in the timing of class and interface linking.

Initialization

  • Initialization of a class or interface consists of executing the class or interface initialization method <clinit>.
    • <clinit> is static and has no arguments and is for static initialization blocks.
    • If there is no initialization of class variables in the code, compiler will not generate <clinit> method
  • Class variables are initialized during Initialization phase.
  • Initialization of a class or interface requires careful synchronization, since some other thread may be trying to initialize the same class or interface at the same time.
  • (JVM Specification: ) All implementations must initialize each class and interface on its first active use.
  • Involves creating the static fields for a class or interface and initializing such fields to their default values (initializing to other values except default values is done in Initialization where explicit initializers are executed).

Class-In-Use (Working with Objects)

  • For each constructor in the source code of a class, the Java compiler generates one <init() method.
  • If the class declares no constructors explicitly, the compiler generates a default no-arg constructor that just invokes the superclassís no-arg constructor.
  • For every class except Object, an <init() method must begin with an invocation of another <init() method belonging either to the same class or to the direct superclass.
  • Instance initialization methods may be invoked only by the invokespecial instruction (§invokespecial), and only on uninitialized class instances.

Unloading Class

  • Java Virtual Machine will run a class's classFinalize() method (if the class declares one) before it unloads the class.
  • If the application has no references to the type, then the type can't affect the execution of program and it can be garbage collected.

Class Loader

  • At run time, a class or interface is determined by a its binary name and defining class loader both.
  • Class Loader deals with file system to load the files for the JVM, no other component in JVM needs to deal with file system.
  • All Java virtual machines include one class loader that is embedded in the virtual machine, called Primordial class loader.
  • Every Class object contains a reference to the ClassLoader that defined it.
    • Class.getClassLoader()
  • Class LoaderTestClass loaded by class loader A, is not the same class as the class LoaderTestClass loaded with class loader B.
  • There are following three types of class loaders:
    • Bootstrap Class Loader
    • System Class Loader
    • Extension Class Loader

Premordial (Bootstrap) Class Loader

  • It implements the default implementation of loadClass().
  • Written in native language.
  • JVM assumes that it has access to a repository of trusted classes which can be run by the VM without verification.
  • Java core API classes are loaded by the bootstrap (or primordial) Class Loader.
  • Bootstrap ClassLoader searches in the locations (directories and zip/jar files) specified in the sun.boot.class.path system property.
    • String bootClassPath = System.getProperty("sun.boot.class.path");
  • Types loaded through the primordial class loader will always be reachable and never be unloaded.

Application (System) Class Loader

  • Application-specific classes are loaded by the system (or application) ClassLoaders.
  • Our application's code is loaded using this.
  • The system ClassLoader searches for classes in the locations specified by the classpath (set as the java.class.path system property) command-line variable passed in when a JVM starts executing. 
    • String appClassPath = System.getProperty("java.class.path");
    • sun.misc.Launcher$AppClassLoader

Extensions Class Loader

  •  Loads the code in the extensions directories (<JAVA_HOME>/jre/lib/ext, or any other directory specified by the java.ext.dirs system property).
    • String extClassPath = System.getProperty("java.ext.dirs");
    • sun.misc.Launcher$ExtClassLoader

Custom (User-Defined) Class Loader

  • Subclass of the abstract class Java.lang.ClassLoader
  • Method loadClass is to be implemented.
    • loadClass should be synchronized.
    • Call findLoadedClass to check class is already loaded. 
    • If we have the raw bytes, call defineClass to turn them into a Class object.
    • If the resolve parameter is true, call resolveClass to resolve the Class object.
    • If class not found, throw a ClassNotFoundException.
  • Every loaded class needs to be linked. This is done using the ClassLoader.resolve() method. This method is final.
  • Following are the steps a class loader performs:
    • Check if the class was already loaded.
    • If not loaded, ask parent class loader to load the class. 
    • If parent class loader cannot load class, attempt to load it in this class loader.

Examples of Class Loaders:

  • java.net.URLClassLoader, 
  • java.security.SecureClassLoader

Related Terms and Concepts:

JVM Startup

  • The Java Virtual Machine starts up by creating an initial class, which is specified in an implementation-dependent manner, using the bootstrap class loader.
  • The Java Virtual Machine then links the initial class, initializes it, and invokes the public class method void main(String[]). 
  • In an implementation of the Java Virtual Machine, the initial class could be provided as a command line argument. Alternatively, the implementation could provide an initial class that sets up a class loader which in turn loads an application. 

JVM Dynamic Linking

  • The process of JVM loading your program's classes and interfaces and hooking them together.
  • Java Virtual Machine builds an internal web of interconnected classes and interfaces.

Binding

  • Binding is the process by which a function written in a language other than the Java programming language and implementing a native method is integrated into the Java Virtual Machine so that it can be executed.
  • Although this process is traditionally referred to as linking, the term binding is used in the specification to avoid confusion with linking of classes or interfaces by the Java Virtual Machine. 
Previous: Java Virtual Machine - Garbage Collection

Tuesday, November 12, 2013

Java Virtual Machine - Garbage Collection


  • Garbage Collection is not restricted to Java, its a generic concept that applies to most of the programming languages.
  • Garbage collection is the process of automatically freeing objects that are no longer referenced by the program. 
  • The garbage collector must determine which objects are no longer being referenced by the program.
  • Garbage collector must run any finalizers of objects being freed.
  • Garbage collection is automatic, thus it removes the possibility of developer errors due to incorrectly freeing of memory.
  • Runtime.gc() and System.gc() only suggests JVM to run Garbage Collection, it is not mandatory that Garbage Collection will definitely run after call to these methods.
  • Java virtual machine has an instruction that allocates memory on the heap for a new object (created in source coce), but has no instruction for freeing that memory.
  • Garbage Collection in Java is carried by JVM's  daemon threads for Garbage Collection.
  • Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.

Java provides many garbage collectors, following are some examples:

  • Parallel Scavenge
  • Serial Garbage Collector
  • Parallel New + Serial GC
  • CMS (Concurrent Mark and Sweep)
  • G1 (Garbage First, available in Java7)

Regarding JVM Specification:

  • JVM specification does not specify any particular garbage collection technique, the decision to choose the garbage collection algorithm is left to the discretion of the implementors as they better know the constraints of the target device.
  • JVM specification mentions the need of garbage collection on heap, it leaves garbage collection or compacting the method area on implementers.
  • A garbage collector is not strictly required but heap memory should be managed.
  • The Java language specification states that a finalizer code will be executed only once. Finalizer are allowed to "resurrect" the object again, that is it'll be referenced again, but finalizer will not run next time.

Additional Responsibilities:

  • Heap Defragmentation

Disadvantages:

  • Developer has no control on Garbage Collection, reducing the use the finalizers.
  • At times, it may lead to uneven performance of the application components, as we don't know when Garbage collection will run and how much resources and time it'll consume.

Steps:

  • Garbage Detection
    • Identification of live and garbage objects.
    • Objects on heap which don't have any active references are available for garbage collection.
    • This process is also called Marking.
  • Reclaim Memory
    • Removes unreferenced objects leaving referenced objects and pointers to free space.
  • Optional Step: Compacting
    • Defragments the memory space, compacts the remaining referenced objects. This makes new memory allocation much easier and faster.

Stop the World Pauses:

  • Stop-the-world is term for JVM stopping the application (all threads) from running to execute a GC.
  • The interrupted tasks will resume only after the GC task has completed. 
  • GC tuning often means reducing this stop-the-world time.

Disruptive Garbage Collection vs Deterministic Garbage Collection:

  • Garbage collection with noticable Stop the World pauses are called Disruptive Garbage collection.
  • Disadvantages - These are not suitable for Real-time systems. For end-users noticable pauses are irritating.
  • Solution Approach - Garbage collection should work incrementally, on small areas of heap at a time.
    • If each incremental collection can be completed in < Threashhold time, then it is suitable for Real-time system.
    • Example - Generational Collector used in Hotspot JVM, discussed below.

Tracing Garbage Collector:

  • Tracing garbage collectors are the most common type of garbage collector. They first determine which objects are reachable (or potentially reachable), and then discard all remaining objects.

Syntactic vs Semantic Garbage:

  •  The reachability definition of "garbage" is not optimal, as there are two cases.
    • CASE #1: An object variable is now reassigned thus original object doesn't have any references left.
    • CASE #2: An object is still available in the program, but it is not used anymore.
  • Syntactic Garbage (Objects that a program cannot possibly reach)
    • It'll handle CASE #1 but not CASE #2.
  • Semantic Garbage (Objects that a program will never use again)
    • It'll handle CASE #1 and CASE #2 both.

Weak vs Strong References:

  • An object is eligible for garbage collection if there are no strong (i.e. ordinary) references to it, even though there still might be some weak references to it.
  • Weak Reference is usually reserved for a properly managed category of special reference objects which are safe to use even after the object disappears because they lapse to a safe value.
  • Weak References will be discussed in another post soon.

Precise vs Conservative:

  • Collectors that correctly identify all references to an object are called precise (also exact or accurate) collectors.
  • Collectors that don't identify all references to an object correctly are called conservative or partly conservative collector.
  • Conservative garbage collectors have traditionally been used in conjunction with programming languages like C and C++, which were not originally designed with a garbage collector in mind.
  • Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object.

Moving (Compacting) vs Non-Moving (Non-compacting) Garbage Collectors

  • A moving GC strategy appears costly and low performance but it gives better performance. Following are details on this:
    • Since memory is defragmented, large contigous memory is available, new objects can be allocated very quickly.
    • No additional work is required to reclaim the space freed by dead objects; the entire region of memory from which reachable objects were moved can be considered free space.
    • While moving the objects special care is needed for objects passed to native methods, as there might be native pointers pointing to this.
  • Updating references after moving objects is made simpler by adding another layer of abstraction in between. Object references will point to a table of object handles and not directly refer to objects. Thus after de-fragmentation, only the object handle table will get updated.

Garbage Detection:

  • An object becomes eligible for garbage collection if its all references are null, i.e. If its not reachable from any live threads or any static references.
  • Achieved by defining a set of roots and determining reachability from the roots.
  • A distinguished set of objects are assumed to be reachable: these are known as the roots.
  • Typically, set of roots include all the objects referenced from anywhere in the call stack, and any global variables.
    • Object references in Local variables and 
    • Operand stack frame, 
    • Static members of classes in method area, 
    • Object references passed to currently executing native methods, etc.
  • An object is reachable if there is some path of references from the roots by which the executing program can access the object. Reachability is a transitive closure.
  • The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. 
  • Objects that are not reachable are considered garbage.
  • Cyclic dependencies are also taken care this way.

Approaches for Garbage Detection:

  • Tracing
    • Trace the references from roots.
    • Mark and Swipe is basic Tracing Collector.
  • Reference Counting
    • Keeps count of the references to an object.
    • Advantage - It can run in small chunks of time closely interwoven with the execution of the program. This characteristic makes it particularly suitable for real-time environments where the program can't be interrupted for very long.
    • Disadvantage - It does not detect cycles: two or more objects that refer to one another.
    • This technique is currently out of favor.

Approaches for Heap Fragmentation Problem:

  • Compacting Collector
    • Slide/Move live objects to defragment the heap area.
    • Examples:
      • Mark and Swipe
      • Tri-color Marking
  • Copying Collector
    • Copying garbage collectors move all live objects to a new area.
    • Examples:
      • Stop-and-Copy

Generational Garbage Collection

  • The generational collection technique can be applied to mark and sweep algorithms as well as copying algorithms.
  • Dividing the heap into generations of objects can help improve the efficiency of the basic underlying garbage collection algorithm.
  • The most recently created objects are most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
  • Heap is divided into two or more sub-heaps, each of which serves one "generation" of objects.
  • The younger generation is garbage collected more frequently than older generation. 
  • Once an object has survived a few garbage collections as a member of the youngest generation, the object is promoted to the next generation: it is moved to another sub-heap.
  • Every sub-heap except the mature sub-heap can be given a maximum size.
  • Any objects that don't fit in the sub-heaps of the younger generations must go into the mature object space, thus mature object space cannot be given a maximum size.
  • It ensures that most incremental GC will be complete in less than a threshhold time. Only exception is Mature Generation.
  • Following image shows generational garbage collection in Hotspot JVM:

Train Algorithm:

  • Specifies an organization for the mature object space of a generational collector for time bound incremental collections of the mature object space.
  • The train algorithm, which was first proposed by Richard Hudson and Eliot Moss and is currently used by Sun's Hotspot virtual machine.
  • Divides the mature object space into fixed-sized blocks of memory, each of which is collected individually during a separate invocation of the algorithm.
    • Each block belongs to one set
    • The blocks within a set are ordered, and the sets themselves are ordered. 
  • Small number indicate an older block within a set and Set with the heap.
  • For more details on this alogorithm, please google or refer to references section at the end of the Blog post.


Types of Garbage Collectors (Garbage Collection Approaches/Strategies/Algorithms)

  • Stop-and-Copy (Semi-space) Collector
    • Memory is partitioned into a "from space" and "to space". 
    • Initially, objects are allocated into "to space" until they become full and a collection is triggered. 
    • At the start of a collection, the "to space" becomes the "from space", and vice versa. 
    • The objects reachable from the root set are copied from the "from space" to the "to space". 
  • Mark and Sweep
    • This is an example of tracing collector.
    • Each object in memory has a flag (typically a single bit) reserved for garbage collection use only.
    • In Garbage detection, every object that is ultimately pointed to from the root set is marked.
    • In Memory Reclaim step, entire memory is scanned, all non-reachable objects are freed.
    • Disadvantage - Entire system must be suspended during garbage collection; no mutation of the working set can be allowed. This will cause programs to 'freeze' periodically (and generally unpredictab'ly), making real-time and time-critical applications impossible.
  • Tri-Color Marking:
    • This is an example of tracing collector.
    • White-Set (Condemned Set)
      • Objects that are candidates for having their memory recycled.
    • Black-Set
      • Objects that can cheaply be proven to have no references to objects in the white set, but are also not chosen to be candidates for recycling; in many implementations, the black set starts off empty.
      • In many implementations, the black set starts off empty.
    •  Grey-Set
      • Objects that are reachable from root references but the objects referenced by grey objects haven't been scanned yet. 
      • The grey state means we still need to check any objects that the object references.
      • Grey objects will eventually end up in the black set.
    •  The grey set is initialised to objects which are referenced directly at root level; typically all other objects are initially placed in the white set.
    • Objects can move from white to grey to black, never in the other direction.
    • Advantage: it can be performed 'on-the-fly'.
  • Mark and Don't sweep
    • This is an example of tracing collector.
    • All reachable objects are always black.
    • A white object is unused memory and may be allocated.
    • The interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black).
    • If an allocation operation ever fails to find any available memory, the sense of the black/white bit is then inverted (0=black, 1=white). Thus all objects become white. A marking phase begins immediatly to mark reachable objects black.

Adaptive Collectors

  • An adaptive algorithm monitors the current situation on the heap and adjusts its garbage collection algorithm and configuration.

Oracle Java 7 G1 (Garbage First) Garbage Collector:

  • Compact free space without lengthy and more predictable GC induced pause times.
  • Its still not suitable for Real-Time systems.
  • Targetted for multi-processors with large memories.
  • G1 is both concurrent and parallel. 
    • It uses all available CPUs to speed up its “stop-the-world” pauses. 
    • It also works concurrently with running Java threads to minimize whole-heap operations during stop-the-world pauses.

Monday, November 4, 2013

Java Virtual Machine - Introduction

Typical Java Environment consists of following:
  •  Java Programming Language
    • JDK
    • Source code is in Java.
  • Java Class File Format
    • Bytecode
  • Java API
    • Standard classes and interfaces of JDK.
  • JVM
    • Java Virtual Machine

JVM:

  • The Java Virtual Machine, or JVM, is code executing component of Java platform.
  • Abstract machine that runs compiled Java programs.
  • The JVM is "virtual" because it is implemented in software on top of a "real" hardware platform and operating system.
  • The Java Virtual Machine is an abstract computing machine. Like a real computing machine, it has an instruction set and manipulates various memory areas at run time. 
  • Oracle JVM specifications specifies an abstract machine, it doesn't describe any particular implementation of JVM.
  • Java is for platform-independence, each Java platform has different constraints. Java specification doesn’t give hard and fast rules about the design of JVM with respect to memory and other aspects. JVM is implemented specifically for each platform, and implementation of JVM is left to jvm implementers.

Role:

  • Read the Class file format (Bytecode) and correctly performs the operations specified in it.

Key points:

  • Any implementation details like memory structures, garbage-collection etc are not part of the Java Virtual Machine's specification would unnecessarily constrain the creativity of implementers.
  • The Java Virtual Machine knows nothing of the Java programming language, it only cares for Virtual Machine instructions (or bytecodes).
  • All Java Virtual Machine implementations must recognize the Java class file format, but individual implementations may also recognize other binary formats.
  • Any language with functionality that can be expressed in terms of a valid class file can be hosted by the Java Virtual Machine.
  • For the sake of security, the Java Virtual Machine imposes strong syntactic and structural constraints on the code in a class file for security concerns.

Advantages:  

  • Hardware and operating system independence. 
  • Small size of its compiled code. 
  • Protect users from malicious programs. 
    • Bytecode is carefully inspected by JVM before execution.

JVM Implementation:

  • Like most virtual machines, the Java virtual machine machine has a stack-based architecture akin to a microcontroller/microprocessor. 
  • JVM doesn't have registers for storing arbitrary values. All computation in the JVM centers on the stack, everything must be pushed onto the stack before it can be used in a calculation.
  • JVM are usually coded for a particular OS but it can also be coded directly on hardware.
  • JVM also has low level support for Java like classes and methods, which amounts to a highly idiosyncratic memory model and capability-based architecture.
  • JVM has no built-in support for dynamically typed languages.
  • JVM supports following 7 primitive types:
    • byte     -    one-byte signed two's complement integer
    • short    -    two-byte signed two's complement integer
    • int        -    4-byte signed two's complement integer
    • long     -    8-b yte signed two's complement integer
    • float     -    4-byte IEEE 754 single-precision float
    • double -    8-byte IEEE 754 double-precision float
    • char     -    2-byte unsigned Unicode character
  • Few Examples:
    • Most popular JVM implementation is from Oracle and is named HotSpot, written in C++.
    • Jinitiator is another popular JVM implementation, especially in Oracle Apps community, developed by Oracle for supporting forms on websites before they purchased Sun. 
    • Mac OS Runtime for Java (MRJ, originally Macintosh Runtime for Java).
    • SAPJVM (SAP) is a licensed and modified SUN JVM ported to all supported platforms of SAP NetWeaver

About Bytecode

  • Machine language of the Java virtual machine.
  • Each instruction consists of a one-byte opcode(represents operation) followed by zero or more operands.
  • An opcode may optionally be followed by operands(represents parameters required by opcodes).
  • The total number of opcodes is less (~200), each opcodes requires only one byte. 
    • This helps in minimizing the size of class files and JVM implementation.
  • Java Bytecode is an intermediate language which is typically compiled from Java but it can also be compiled from other programming language, ex ADA.
  • Class files contain bytecode and a symbol table, as well as other ancillary information.
  • Any language with functionality that can be expressed in terms of a valid class file can be hosted by the Java Virtual Machine.
  • JVM has instructions for Load/Store of variables, Arithmetic, Type Conversions, Stack Management, Exception management, and Monitor based concurrency control etc.
  • Examples of Opcodes:
    • iload, lload, fload, and dload  are used to load int, long, float and double operand on the stack respectively.
    • istore, fstore are used to pop int and float from the stack to a variable.
    • aload, astore operates on references.
  • There is no support for loops and binary logical operators (|| &&). These are implemented by compiler with jump instructions.

Class File

  • Each class file contains the definition of a single class or interface. 
  • A class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively. 
  • Multibyte data items are always stored in big-endian.

Execution Engine:

  • It consists of following:
    • Bytecode Interpreter
    • JIT (Just In Time) Compiler
      • Since byte code is interpreted it executes slower than compiled machine code.
      • Just-in-time compilation (JIT) is also known as dynamic translation.
      • JIT is used to improve the runtime performance of byte code.
      • The program is stored in memory as byte code, but the code segment currently running is preparatively compiled to physical machine code in order to run faster.
    • Virtual Processor

JRE:

  • Java Execution Environment (Java Runtime Environment). 
  • JRE contained JVM and java class libraries that implement java API.

Other JVM Languages

  • Groovy
  • Scala
  • Jython
  • JRuby
  • Kotlin