Tuesday, November 12, 2013

Java Virtual Machine - Garbage Collection


  • Garbage Collection is not restricted to Java, its a generic concept that applies to most of the programming languages.
  • Garbage collection is the process of automatically freeing objects that are no longer referenced by the program. 
  • The garbage collector must determine which objects are no longer being referenced by the program.
  • Garbage collector must run any finalizers of objects being freed.
  • Garbage collection is automatic, thus it removes the possibility of developer errors due to incorrectly freeing of memory.
  • Runtime.gc() and System.gc() only suggests JVM to run Garbage Collection, it is not mandatory that Garbage Collection will definitely run after call to these methods.
  • Java virtual machine has an instruction that allocates memory on the heap for a new object (created in source coce), but has no instruction for freeing that memory.
  • Garbage Collection in Java is carried by JVM's  daemon threads for Garbage Collection.
  • Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.

Java provides many garbage collectors, following are some examples:

  • Parallel Scavenge
  • Serial Garbage Collector
  • Parallel New + Serial GC
  • CMS (Concurrent Mark and Sweep)
  • G1 (Garbage First, available in Java7)

Regarding JVM Specification:

  • JVM specification does not specify any particular garbage collection technique, the decision to choose the garbage collection algorithm is left to the discretion of the implementors as they better know the constraints of the target device.
  • JVM specification mentions the need of garbage collection on heap, it leaves garbage collection or compacting the method area on implementers.
  • A garbage collector is not strictly required but heap memory should be managed.
  • The Java language specification states that a finalizer code will be executed only once. Finalizer are allowed to "resurrect" the object again, that is it'll be referenced again, but finalizer will not run next time.

Additional Responsibilities:

  • Heap Defragmentation

Disadvantages:

  • Developer has no control on Garbage Collection, reducing the use the finalizers.
  • At times, it may lead to uneven performance of the application components, as we don't know when Garbage collection will run and how much resources and time it'll consume.

Steps:

  • Garbage Detection
    • Identification of live and garbage objects.
    • Objects on heap which don't have any active references are available for garbage collection.
    • This process is also called Marking.
  • Reclaim Memory
    • Removes unreferenced objects leaving referenced objects and pointers to free space.
  • Optional Step: Compacting
    • Defragments the memory space, compacts the remaining referenced objects. This makes new memory allocation much easier and faster.

Stop the World Pauses:

  • Stop-the-world is term for JVM stopping the application (all threads) from running to execute a GC.
  • The interrupted tasks will resume only after the GC task has completed. 
  • GC tuning often means reducing this stop-the-world time.

Disruptive Garbage Collection vs Deterministic Garbage Collection:

  • Garbage collection with noticable Stop the World pauses are called Disruptive Garbage collection.
  • Disadvantages - These are not suitable for Real-time systems. For end-users noticable pauses are irritating.
  • Solution Approach - Garbage collection should work incrementally, on small areas of heap at a time.
    • If each incremental collection can be completed in < Threashhold time, then it is suitable for Real-time system.
    • Example - Generational Collector used in Hotspot JVM, discussed below.

Tracing Garbage Collector:

  • Tracing garbage collectors are the most common type of garbage collector. They first determine which objects are reachable (or potentially reachable), and then discard all remaining objects.

Syntactic vs Semantic Garbage:

  •  The reachability definition of "garbage" is not optimal, as there are two cases.
    • CASE #1: An object variable is now reassigned thus original object doesn't have any references left.
    • CASE #2: An object is still available in the program, but it is not used anymore.
  • Syntactic Garbage (Objects that a program cannot possibly reach)
    • It'll handle CASE #1 but not CASE #2.
  • Semantic Garbage (Objects that a program will never use again)
    • It'll handle CASE #1 and CASE #2 both.

Weak vs Strong References:

  • An object is eligible for garbage collection if there are no strong (i.e. ordinary) references to it, even though there still might be some weak references to it.
  • Weak Reference is usually reserved for a properly managed category of special reference objects which are safe to use even after the object disappears because they lapse to a safe value.
  • Weak References will be discussed in another post soon.

Precise vs Conservative:

  • Collectors that correctly identify all references to an object are called precise (also exact or accurate) collectors.
  • Collectors that don't identify all references to an object correctly are called conservative or partly conservative collector.
  • Conservative garbage collectors have traditionally been used in conjunction with programming languages like C and C++, which were not originally designed with a garbage collector in mind.
  • Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object.

Moving (Compacting) vs Non-Moving (Non-compacting) Garbage Collectors

  • A moving GC strategy appears costly and low performance but it gives better performance. Following are details on this:
    • Since memory is defragmented, large contigous memory is available, new objects can be allocated very quickly.
    • No additional work is required to reclaim the space freed by dead objects; the entire region of memory from which reachable objects were moved can be considered free space.
    • While moving the objects special care is needed for objects passed to native methods, as there might be native pointers pointing to this.
  • Updating references after moving objects is made simpler by adding another layer of abstraction in between. Object references will point to a table of object handles and not directly refer to objects. Thus after de-fragmentation, only the object handle table will get updated.

Garbage Detection:

  • An object becomes eligible for garbage collection if its all references are null, i.e. If its not reachable from any live threads or any static references.
  • Achieved by defining a set of roots and determining reachability from the roots.
  • A distinguished set of objects are assumed to be reachable: these are known as the roots.
  • Typically, set of roots include all the objects referenced from anywhere in the call stack, and any global variables.
    • Object references in Local variables and 
    • Operand stack frame, 
    • Static members of classes in method area, 
    • Object references passed to currently executing native methods, etc.
  • An object is reachable if there is some path of references from the roots by which the executing program can access the object. Reachability is a transitive closure.
  • The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. 
  • Objects that are not reachable are considered garbage.
  • Cyclic dependencies are also taken care this way.

Approaches for Garbage Detection:

  • Tracing
    • Trace the references from roots.
    • Mark and Swipe is basic Tracing Collector.
  • Reference Counting
    • Keeps count of the references to an object.
    • Advantage - It can run in small chunks of time closely interwoven with the execution of the program. This characteristic makes it particularly suitable for real-time environments where the program can't be interrupted for very long.
    • Disadvantage - It does not detect cycles: two or more objects that refer to one another.
    • This technique is currently out of favor.

Approaches for Heap Fragmentation Problem:

  • Compacting Collector
    • Slide/Move live objects to defragment the heap area.
    • Examples:
      • Mark and Swipe
      • Tri-color Marking
  • Copying Collector
    • Copying garbage collectors move all live objects to a new area.
    • Examples:
      • Stop-and-Copy

Generational Garbage Collection

  • The generational collection technique can be applied to mark and sweep algorithms as well as copying algorithms.
  • Dividing the heap into generations of objects can help improve the efficiency of the basic underlying garbage collection algorithm.
  • The most recently created objects are most likely to become unreachable quickly (known as infant mortality or the generational hypothesis).
  • Heap is divided into two or more sub-heaps, each of which serves one "generation" of objects.
  • The younger generation is garbage collected more frequently than older generation. 
  • Once an object has survived a few garbage collections as a member of the youngest generation, the object is promoted to the next generation: it is moved to another sub-heap.
  • Every sub-heap except the mature sub-heap can be given a maximum size.
  • Any objects that don't fit in the sub-heaps of the younger generations must go into the mature object space, thus mature object space cannot be given a maximum size.
  • It ensures that most incremental GC will be complete in less than a threshhold time. Only exception is Mature Generation.
  • Following image shows generational garbage collection in Hotspot JVM:

Train Algorithm:

  • Specifies an organization for the mature object space of a generational collector for time bound incremental collections of the mature object space.
  • The train algorithm, which was first proposed by Richard Hudson and Eliot Moss and is currently used by Sun's Hotspot virtual machine.
  • Divides the mature object space into fixed-sized blocks of memory, each of which is collected individually during a separate invocation of the algorithm.
    • Each block belongs to one set
    • The blocks within a set are ordered, and the sets themselves are ordered. 
  • Small number indicate an older block within a set and Set with the heap.
  • For more details on this alogorithm, please google or refer to references section at the end of the Blog post.


Types of Garbage Collectors (Garbage Collection Approaches/Strategies/Algorithms)

  • Stop-and-Copy (Semi-space) Collector
    • Memory is partitioned into a "from space" and "to space". 
    • Initially, objects are allocated into "to space" until they become full and a collection is triggered. 
    • At the start of a collection, the "to space" becomes the "from space", and vice versa. 
    • The objects reachable from the root set are copied from the "from space" to the "to space". 
  • Mark and Sweep
    • This is an example of tracing collector.
    • Each object in memory has a flag (typically a single bit) reserved for garbage collection use only.
    • In Garbage detection, every object that is ultimately pointed to from the root set is marked.
    • In Memory Reclaim step, entire memory is scanned, all non-reachable objects are freed.
    • Disadvantage - Entire system must be suspended during garbage collection; no mutation of the working set can be allowed. This will cause programs to 'freeze' periodically (and generally unpredictab'ly), making real-time and time-critical applications impossible.
  • Tri-Color Marking:
    • This is an example of tracing collector.
    • White-Set (Condemned Set)
      • Objects that are candidates for having their memory recycled.
    • Black-Set
      • Objects that can cheaply be proven to have no references to objects in the white set, but are also not chosen to be candidates for recycling; in many implementations, the black set starts off empty.
      • In many implementations, the black set starts off empty.
    •  Grey-Set
      • Objects that are reachable from root references but the objects referenced by grey objects haven't been scanned yet. 
      • The grey state means we still need to check any objects that the object references.
      • Grey objects will eventually end up in the black set.
    •  The grey set is initialised to objects which are referenced directly at root level; typically all other objects are initially placed in the white set.
    • Objects can move from white to grey to black, never in the other direction.
    • Advantage: it can be performed 'on-the-fly'.
  • Mark and Don't sweep
    • This is an example of tracing collector.
    • All reachable objects are always black.
    • A white object is unused memory and may be allocated.
    • The interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black).
    • If an allocation operation ever fails to find any available memory, the sense of the black/white bit is then inverted (0=black, 1=white). Thus all objects become white. A marking phase begins immediatly to mark reachable objects black.

Adaptive Collectors

  • An adaptive algorithm monitors the current situation on the heap and adjusts its garbage collection algorithm and configuration.

Oracle Java 7 G1 (Garbage First) Garbage Collector:

  • Compact free space without lengthy and more predictable GC induced pause times.
  • Its still not suitable for Real-Time systems.
  • Targetted for multi-processors with large memories.
  • G1 is both concurrent and parallel. 
    • It uses all available CPUs to speed up its “stop-the-world” pauses. 
    • It also works concurrently with running Java threads to minimize whole-heap operations during stop-the-world pauses.

No comments:

Post a Comment