Geeks With Blogs
Josh Reuben

The JVM (Java Virtual Machine) is a virtual "execution engine" instance that executes the bytecodes in Java class files on a CPU. Knowing how to tune its myriad flags lets you control how your application executes.



JIT Compiler Tuning

Hotspot Compilation Mechanism Selection

  • -client - client compiler (C1) - begins compiling earlier -> optimizes startup time. Default on 32-bit OSes; choose if heap < 3GB.
  • -server - server compiler (C2) - better optimizes performance for long-running apps. 2 subtypes:
    • -server (32-bit) - 5-20% faster, but total process size must be < 4GB
    • -d64 (64-bit) - DEFAULT on modern machines
  • -XX:+TieredCompilation - tiered compilation (combines both client and server) - the best performance, but requires more native memory for the extra compiled code. DEFAULT.

Advanced JIT Tuning:

  • -XX:ReservedCodeCacheSize= - reserves space for the code cache - use with tiered compilation, or on a depleted code cache warning. Default = 240 MB
  • -XX:InitialCodeCacheSize= - allocates the initial code cache space - uncommon.
  • -XX:CompileThreshold= - sets #times a method/loop is executed before compiling it. Lowering it causes more methods to be compiled, & sooner. Default = 10,000 before OSR (on-stack replacement)
  • -XX:+PrintCompilation - diagnostic log of JIT compiler operations - inspect the compilation, e.g. check if an important method is being compiled
  • -XX:CICompilerCount= - sets #threads used by the JIT compiler. Reduce if too many compiler threads are being started (e.g. when running multiple JVMs on a machine).
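
As a rough complement to -XX:+PrintCompilation, the standard java.lang.management API can report which JIT compiler is running and how much time it has spent compiling. This is only a sketch; the class name JitInfo is made up for illustration, and the compiler name printed depends on the JVM and flags in use.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitInfo {
    /** One-line summary of the running JIT compiler, or a note if there is none. */
    static String summary() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            // The platform API returns null when the JVM has no compilation system.
            return "no JIT compiler (interpreter only)";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compile time: " + time;
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Running it once with -client or -server and once with -XX:+TieredCompilation is a quick way to confirm which compilation mechanism a given JVM actually selected.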

Garbage Collection Tuning

Consider GC tuning if the GC log indicates >= 4% of time spent in GC. First tweak the desired pause time, then increase heap size (which may increase pause time) or the young gen ratio.
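
A minimal sketch of the 4% check, assuming the accumulated collection times reported by the standard GarbageCollectorMXBeans are a good enough approximation of "time in GC" (the GC log remains the authoritative source, and concurrent collectors report time differently from stop-the-world ones). The class name GcOverhead is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Percentage of elapsed time spent in GC; 0 if no time has elapsed. */
    static double percent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    /** Sums the accumulated collection time over all collectors of this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        double pct = percent(totalGcMillis(), uptime);
        System.out.printf("GC overhead: %.2f%%%n", pct);
        System.out.println(pct >= 4.0 ? "consider GC tuning" : "GC overhead is below the 4% mark");
    }
}
```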

Sizing the Heap

Heap size is a space/time tradeoff: smaller -> more frequent GC, larger -> longer pauses. The sum of the heap sizes of all JVMs on a machine must be smaller than physical memory - 1GB. Do not count swap space: a heap that swaps makes GC very slow and can lead to, e.g., concurrent mode failures. Set the initial & max heap size and let the JVM tune itself within that range according to workload. Rather than hand-sizing further, specify performance goals for the GC algorithm: tolerable pause times and % of time to spend in GC. Rule of thumb: size the heap to be 30% occupied after a full GC (force one with jcmd & observe how much memory is used afterwards).

  • -Xms - initial heap size - default [linux:Min(512MB, 1/64 RAM), osx:64MB]
  • -Xmx - max heap size - default [linux: min(32GB, 1/4 RAM) osx: min (1GB, 1/4 RAM)]
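
The 30% rule of thumb is simple arithmetic: if L bytes are live after a full GC, the heap should be about L / 0.3. A small sketch, assuming the "used" figure is read right after a forced full GC (otherwise it includes garbage and overestimates); HeapSizing is a made-up helper name.

```java
final class HeapSizing {
    /** Heap size at which liveBytes would occupy 30% of the heap. */
    static long heapFor30PercentOccupancy(long liveBytes) {
        return liveBytes * 10 / 3;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("max heap (-Xmx or default): " + rt.maxMemory() + " bytes");
        System.out.println("currently committed:        " + rt.totalMemory() + " bytes");
        System.out.println("currently used:             " + used + " bytes");
        System.out.println("30% rule suggests a heap of " + heapFor30PercentOccupancy(used) + " bytes");
    }
}
```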

GC Collection Goal Hints

  • -XX:MaxGCPauseMillis=N - how long pauses should be. Main tuning point. Default = 200 ms (for G1). Be realistic. Takes precedence over GCTimeRatio.
  • -XX:GCTimeRatio=N - how much time to spend in GC: GCTimeRatio = Throughput / (1 - Throughput). Default = 99, i.e. 1% of the time in GC. For a throughput goal of 95% (0.95), this equation yields a GCTimeRatio of 19.
  • -XX:+AggressiveHeap - enables a set of tuning flags optimized for high-memory machines running a single JVM with a large heap.
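
The GCTimeRatio arithmetic can be checked in a few lines. The sketch below assumes the usual relationship GCTimeRatio = Throughput / (1 - Throughput), which is equivalent to a GC share of 1 / (1 + GCTimeRatio); the class and method names are illustrative only.

```java
final class GcTimeRatio {
    /** GCTimeRatio value for a target throughput strictly between 0 and 1. */
    static long forThroughput(double throughput) {
        return Math.round(throughput / (1.0 - throughput));
    }

    /** Fraction of total time the JVM may spend in GC for a given GCTimeRatio. */
    static double gcFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("95% throughput -> GCTimeRatio=" + forThroughput(0.95));
        System.out.println("GCTimeRatio=99 -> GC share " + gcFraction(99) * 100 + "%");
    }
}
```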

GC algorithm Selection

Serial

  • -XX:+UseSerialGC - simple, single-threaded GC algorithm. default for client JIT Compiler. use for single CPU, < 100MB Heap.

Throughput (parallel)

DEFAULT for server. use for best AVERAGE response time, if app can tolerate small full GC pauses.

  • -XX:+UseParallelOldGC - Uses multiple threads to collect OLD gen while app threads are stopped. When app can tolerate occasional long pauses, maximize throughput while minimizing CPU usage.
  • -XX:+UseParallelGC - Uses multiple threads to collect the YOUNG gen while app threads are stopped. Use with UseParallelOldGC.

CMS (Concurrent Mark & Sweep)

Scans without pausing but uses more CPU - minimizes the effect of pauses on response times. Uses background thread(s) to periodically scan old gen & discard unused objects; only short pauses during minor GC. The heap can fragment - no compaction. Young gen is never resized unless a full GC occurs; CMS aims to never have a full collection --> never resizes its young gen (if tuned correctly). Concurrent mode failure (CMF) - the concurrent collection of the tenured gen did not finish before the tenured gen became full. To avoid: increase heap size, increase collection frequency via CMSInitiatingOccupancyFraction, or increase # background threads.

  • -XX:+UseConcMarkSweepGC - Uses BACKGROUND thread(s) to collect OLD gen with minimal pauses. short GC pauses, but requires extra core for background thread, suitable for a relatively small heap.
  • -XX:+UseParNewGC - Uses multiple threads to collect young gen while app threads are stopped. Use with UseConcMarkSweepGC.

G1 (Garbage First)

Designed to process large heaps (> 4GB) divided into regions - can move objects between them, partially compacting the heap during normal processing. Tuning goal is to avoid full GC - increase old gen size (ratio or total heap size), increase # background threads & the frequency of their runs.

  • -XX:+UseG1GC - Uses multiple threads to collect young gen while app threads are stopped, and background thread(s) to collect old gen with minimal pauses. short GC pauses for a relatively large heap, but requires extra core for background thread.

Sizing the Generations

GC generations: young (eden + survivor spaces S0, S1) and old (tenured). Minor GC - on young gen; always stop-the-world. Full GC - on the entire heap. Metaspace - holds class metadata used by the JIT compiler & GC.

  • -XX:NewRatio=N - ratio of old gen to young gen: initial young gen size = initial heap size / (1 + NewRatio). DEFAULT=2, i.e. young gen is 1/3 of the heap. Note adaptive sizing (default enabled) -> proportion will change (except for CMS, where the young-gen size is constant). If a generation size is reduced then it will experience more GCs.
  • -XX:NewSize - init size of young gen. DEFAULT = 1/3 Xms
  • -XX:MaxNewSize - max size of young gen.
  • -Xmn - Sets both init and max size of young gen.
  • -XX:MetaspaceSize=N - (PermSize for pre-JDK 8) - initial size of metaspace. Increase for apps that use lots of classes.
  • -XX:MaxMetaspaceSize=N - (MaxPermSize for pre-JDK 8) - max size of the metaspace. Reduce to limit the amount of native space used by class metadata.
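
To see how NewRatio translates into sizes, the sketch below applies young = heap / (1 + NewRatio) and then lists the memory pools the running JVM actually exposes (pool names such as eden, survivor and old gen vary by collector). GenerationSizes is a hypothetical class name; the numbers are only as good as the formula's assumption that adaptive sizing has not yet moved the boundaries.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class GenerationSizes {
    /** Initial young gen size implied by a heap size and -XX:NewRatio. */
    static long initialYoungGen(long initialHeap, int newRatio) {
        return initialHeap / (1 + newRatio);
    }

    public static void main(String[] args) {
        long heapMb = 3072;
        System.out.println("3072 MB heap, NewRatio=2 -> young gen "
                + initialYoungGen(heapMb, 2) + " MB");
        // Print the pools of this JVM for comparison with the computed value.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is invalid
            if (usage != null) {
                System.out.println(pool.getName() + ": used=" + usage.getUsed()
                        + " committed=" + usage.getCommitted());
            }
        }
    }
}
```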

Advanced GC Tuning

Adaptive Sizing

  • -XX:+UseAdaptiveSizePolicy - Default: JVM will resize the generations to meet GC goals. Turn off if heap sizes have been finely tuned, if Xms == Xmx, or for apps that go through phases with different profiles.
  • -XX:+PrintAdaptiveSizePolicy - Add gen resize info to GC log. check output for G1 to see if full GCs are triggered by humongous object allocation.

Tenuring and Survivor Space Hints

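  • -XX:+PrintTenuringDistribution - logs the age distribution of objects in the survivor spaces
  • -XX:InitialSurvivorRatio=N - initial share of young gen reserved for survivor spaces, as a ratio: each survivor space = young gen size / (N + 2). Give the survivor spaces more room (lower N) if short-lived objects are promoted into old gen too frequently
  • -XX:MinSurvivorRatio=N - bound on the adaptive share of young gen reserved for survivor spaces.
  • -XX:TargetSurvivorRatio=N - desired % of survivor space in use after a young GC.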
  • -XX:InitialTenuringThreshold=N - initial #GC-cycles to keep an object in survivor spaces.
  • -XX:MaxTenuringThreshold=N - max #GC-cycles to keep an object in survivor spaces.
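
Survivor sizing follows the same kind of arithmetic. The sketch assumes the formula S = Y / (R + 2), where Y is the young gen size, R is the survivor ratio and the 2 stands for the two survivor spaces; SurvivorSizing is a name invented for this example.

```java
final class SurvivorSizing {
    /** Size of one survivor space for a young gen size and a survivor ratio. */
    static long survivorSize(long youngGen, int survivorRatio) {
        return youngGen / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long youngMb = 1000;
        // A lower ratio leaves more room for survivors and less for eden.
        for (int ratio : new int[] {8, 6, 3}) {
            long s = survivorSize(youngMb, ratio);
            System.out.println("ratio=" + ratio + " -> survivor=" + s
                    + " MB each, eden=" + (youngMb - 2 * s) + " MB");
        }
    }
}
```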

CMS collector hints

  • -XX:CMSInitiatingOccupancyFraction=N when to begin background scanning of old gen. reduce on CMF
  • -XX:+UseCMSInitiatingOccupancyOnly - use only CMSInitiatingOccupancyFraction to determine when to start CMS background scanning.
  • -XX:ConcGCThreads=N - #threads to use for CMS background scanning. Use on high CPU machine with CMF.
  • -XX:+CMSPermGenSweepingEnabled - sweep the permgen - use if performing lots of class unloading.
  • -XX:CMSInitiatingPermOccupancyFraction=N - when to scan permgen - use if full GCs occur because permgen is filling too fast.
  • -XX:+CMSClassUnloadingEnabled - unload classes after permgen is scanned.
  • -XX:+CMSIncrementalMode - Use on low CPU machine
  • -XX:CMSIncrementalModeSafetyFactor=N - affect frequency of incremental CMS background threads - increase on CMF
  • -XX:CMSIncrementalDutyCycleMin=N - ditto
  • -XX:CMSIncrementalDutyCycleMax=N - ditto
  • -XX:CMSIncrementalDutyCycle=N - ditto

G1 collector Hints

  • -XX:ConcGCThreads=N - #threads to use for background scanning. Use on high CPU machine with CMFs
  • -XX:InitiatingHeapOccupancyPercent=N - threshold to begin background scanning - reduce on concurrent mode failures.
  • -XX:G1MixedGCCountTarget=N - #mixed GCs for freeing garbage old gen regions. Reduce on CMF, increase if mixed GC cycles take too long.
  • -XX:G1HeapRegionSize=N - size of a G1 region. Increase for very large heaps, or when allocating huge objects.

Memory Management

Out of memory Errors

  • -XX:+HeapDumpOnOutOfMemoryError - Generates a heap dump when JVM throws out of memory error. ENABLE!
  • -XX:HeapDumpPath= - filepath for the automatic heap dump (default filename: java_pid<pid>.hprof)
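
Whether these flags are actually in effect can be verified from inside the process. The sketch below assumes a HotSpot-based JVM, since it relies on com.sun.management.HotSpotDiagnosticMXBean, which is not part of the portable java.lang.management API; HeapDumpCheck is a hypothetical class name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class HeapDumpCheck {
    /** Current value of a HotSpot VM option, as a string. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option does not exist.
        return hotspot.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + flagValue("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = '" + flagValue("HeapDumpPath") + "'");
    }
}
```

Started with -XX:+HeapDumpOnOutOfMemoryError, the first line should report true; an empty HeapDumpPath means the default location is used.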

Misc

  • -XX:SoftRefLRUPolicyMSPerMB=N - Controls how long soft references survive after being used. Decrease in low-memory machines.
  • -XX:MaxDirectMemorySize=N - Controls how much native memory (NIO) can be allocated via ByteBuffer.allocateDirect()
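
For context, this is the kind of allocation that MaxDirectMemorySize limits. A minimal sketch with an arbitrary 1 MB buffer; when the limit is exceeded, allocateDirect fails with an OutOfMemoryError rather than growing the Java heap. DirectBufferDemo is an illustrative name.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    /** Allocates native (off-heap) memory that counts against MaxDirectMemorySize. */
    static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1024 * 1024);
        buf.putLong(0, 42L); // written to native memory, not to the Java heap
        System.out.println("direct=" + buf.isDirect()
                + " capacity=" + buf.capacity()
                + " first long=" + buf.getLong(0));
    }
}
```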

Large Pages

  • page mappings are held in a global page table
  • most frequently used mappings are held in translation lookaside buffers (TLBs) - fast cache maximizes hit rate
  • grep Hugepagesize /proc/meminfo - Determine huge page sizes that the kernel supports - based on CPU & boot params. Typically 2048 KB
  • calculate HugePageCount needed: (JVM Heap size / Hugepagesize) * 1.1
  • echo $HugePageCount > /proc/sys/vm/nr_hugepages
  • in /etc/sysctl.conf , vm.nr_hugepages=HugePageCount
  • in /etc/security/limits.conf , add soft / hard memlock entries for user permissions to modify
  • enable Transparent large pages: echo always > /sys/kernel/mm/transparent_hugepage/enabled
    • -XX:+UseLargePages - increase page size - JVM will allocate pages from the OS’s large page system - ENABLE!
    • -XX:LargePageSizeInBytes=N - Solaris only
    • -XX:StringTableSize=N - size of hashtable used to hold interned strings.
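
The HugePageCount calculation from the list above, worked through for a 4 GB heap and 2048 KB huge pages. This is only a sketch of the arithmetic (heap / page size, plus 10% headroom, rounded up); HugePages is a made-up helper and the sizes are example values.

```java
final class HugePages {
    /** Number of huge pages to reserve: ceil(heap / pageSize), plus 10%, rounded up. */
    static long pagesNeeded(long heapBytes, long hugePageBytes) {
        long pages = (heapBytes + hugePageBytes - 1) / hugePageBytes; // round up
        return (pages * 11 + 9) / 10; // 10% headroom, rounded up
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // -Xmx4g
        long pageSize = 2048L * 1024;        // Hugepagesize: 2048 kB
        System.out.println("vm.nr_hugepages=" + pagesNeeded(heap, pageSize));
    }
}
```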

TLABs (thread local allocation buffers)

  • for frequent creation of large objects
  • -XX:+PrintTLAB - TLAB summary in GC log - diagnostic. ENABLE!
  • -XX:TLABSize=N - size of TLABs. When the app is performing a lot of allocation outside of TLABs, use this value to increase the TLAB size.
  • -XX:-ResizeTLAB - Disables resizing of TLABs.

Thread Management

  • -XX:ParallelGCThreads=N - Control GC parallelism: sets #threads used by GC. Reduce on multi-JVM systems; increase for large heaps, decrease for small heaps. Default: on a machine with C CPUs, the JVM uses C threads if C <= 8, else 8 + ((C - 8) * 5 / 8)
  • -Xss - size of thread native stack - only decrease on 32-bit JVMs to make more memory available for other parts of the JVM. Default: 64-bit: 1MB / 32-bit: 320KB
  • -XX:-UseBiasedLocking - Disables the biased locking algorithm of the JVM to improve performance of threadpool-based apps. Note: Java-level priority of a thread has very little effect
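
The default GC thread count formula is easy to get wrong by hand, so here it is as code. The sketch implements the formula exactly as stated above with integer arithmetic; actual defaults may differ between JVM versions, and GcThreads is a hypothetical class name.

```java
final class GcThreads {
    /** Default ParallelGCThreads for a CPU count, per the formula in the text. */
    static int defaultParallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " CPUs -> " + defaultParallelGcThreads(cpus) + " GC threads");
        System.out.println("16 CPUs -> " + defaultParallelGcThreads(16) + " GC threads");
    }
}
```

On a 16-CPU machine running four JVMs, each JVM would by default start 13 GC threads, which is why reducing the value on multi-JVM systems is worth considering.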

Miscellaneous JVM flags

  • -XX:+AlwaysLockClassLoader - disable parallel classloading on low CPU machines to improve startup performance
  • -XX:+PrintFlagsFinal - show defaults for all flags
  • -XX:-StackTraceInThrowable - Prevents stack traces on thrown exception. Enable if deep stacks or frequently thrown exceptions (that cannot be addressed)
  • -XX:-RestrictContended - set to FALSE to allow non-JDK code to use the @Contended annotation to pad variables to prevent false sharing.
  • -XX:+UseCompressedOops - Compressed ordinary object pointers - use 32-bit object references within a 64-bit JVM - enabled by default on 64-bit JVMs for heaps below 32 GB. To compensate for the GC impact of uncompressed pointers above that, add 20% to planned heap size ???
  • -XX:+AggressiveOpts - enable 'experimental default' optimizations. Unnecessary in JVM 1.8 ?

Diagnostic Flags

GC Diagnostic Logging

  • -verbose:gc - Enables basic GC logging. ENABLE!
  • -Xloggc: - Directs GC log to filepath rather than stdout. ENABLE!
  • -XX:+PrintGC - enables basic GC logging. ENABLE!
  • -XX:+PrintGCDetails - enables detailed GC logging. (overhead is minimal). ENABLE!
  • -XX:+PrintGCTimeStamps - Prints a relative timestamp for each entry in GC log.
  • -XX:+PrintGCDateStamps - Prints a readable time-of-day stamp for each entry in GC log. slightly more overhead
  • -XX:+PrintReferenceGC - Prints information about soft and weak reference processing during GC - use to determine their effect on GC overhead.
  • -XX:+UseGCLogFileRotation - Enables rotations of GC log to conserve file space.
  • -XX:NumberOfGCLogFiles=N - When logfile rotation is enabled, indicates the number of logfiles to retain.
  • -XX:GCLogFileSize=N - size trigger of logfile before rotating it.

Java Flight Recorder

  • -XX:+UnlockCommercialFeatures - Allows JVM to use Flight Recorder (non open source)
  • -XX:+FlightRecorder - ENABLE! tiny overhead when idle, small overhead when recording
  • -XX:FlightRecorderOptions= - options
Posted on Monday, April 11, 2016 7:25 AM Scala , Performance


Comments on this post: JVM Tuning

# JVm
Java Virtual Machine or JVM is one of the ultimate paltforms machine. The Best in Burgers
Left by arwabharmal on May 21, 2016 8:26 AM

# re: JVM Tuning
There is much to learn from this information. - Morgan Exteriors
Left by Thomas Miller on Dec 21, 2016 3:03 PM



Copyright © JoshReuben | Powered by: GeeksWithBlogs.net