JVM Performance Tuning

May 2, 2017
Java

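How do you tune JVM performance?

The Java Virtual Machine Memory Model

Stack

Heap

  • -Xmx: sets the maximum heap size.
  • -Xms: sets the minimum heap size, that is, the amount of operating system memory the JVM claims at startup. The JVM tries to keep its memory use within -Xms, so a Full GC is triggered when usage reaches the size given by -Xms. Setting -Xms equal to -Xmx therefore reduces the number and cost of GCs early in the application's life.
  • -Xmn: sets the size of the young generation. It is equivalent to setting -XX:NewSize and -XX:MaxNewSize to the same value. If those two are set to different values, the young generation grows and shrinks repeatedly, which adds unnecessary overhead.
    • -XX:NewSize: sets the initial size of the young generation.
    • -XX:MaxNewSize: sets the maximum size of the young generation.

After mistakenly passing -Xmn where -Xmx was intended: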

Getting the free memory, the currently allocated heap, and the maximum available heap (in MB):

Runtime.getRuntime().freeMemory() / 1024 / 1024
Runtime.getRuntime().totalMemory() / 1024 / 1024
Runtime.getRuntime().maxMemory() / 1024 / 1024
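
The three calls above can be wrapped into a small runnable program. This is only a minimal sketch; the class name HeapInfo and the helper toMiB are made up for illustration.

```java
class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes, rounding down. */
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // freeMemory: unused part of the heap that is currently allocated
        System.out.println("free  = " + toMiB(rt.freeMemory()) + " MiB");
        // totalMemory: heap currently allocated by the JVM (grows from -Xms toward -Xmx)
        System.out.println("total = " + toMiB(rt.totalMemory()) + " MiB");
        // maxMemory: upper bound the heap may grow to (roughly -Xmx)
        System.out.println("max   = " + toMiB(rt.maxMemory()) + " MiB");
    }
}
```

Running it once with `-Xms64m -Xmx64m` and once with only `-Xmx64m` is a quick way to see how -Xms changes the initial total value.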

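Escape Analysis

Starting with Java 7 (the feature first appeared in late Java 6 updates), HotSpot supports escape analysis and stack-style allocation. With it, an object that never escapes the method that creates it no longer needs a heap allocation: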

void myMethod() {
    V v = new V();
    // use v
    v = null;
}
  • -server: escape analysis is available only in server mode.
  • -XX:+DoEscapeAnalysis: enables escape analysis.

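Method Area

The method area mainly holds class metadata: types, the constant pool, fields, and methods. In HotSpot before JDK 8 the method area is also called the permanent generation (PermGen), and it too can be reclaimed by the GC. The size of the permanent generation directly determines how many class definitions and constants the system can hold. For applications that use dynamic bytecode generation tools such as CGLIB or Javassist, a sensible permanent generation size helps keep the system stable.

If the system uses dynamic proxies, it may generate a large number of classes at run time. In that case, set the permanent generation large enough that it does not run out of memory.

  • -XX:MaxPermSize=4M: sets the maximum size of the permanent generation.
  • -XX:PermSize=4M: sets the initial size of the permanent generation.

In JDK 1.8 the permanent generation was removed entirely and replaced by the metaspace. The metaspace lives in native memory outside the heap; if no limit is specified, by default the JVM can keep growing it until all available system memory is used up.

  • -XX:MaxMetaspaceSize: sets the maximum size of the metaspace.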

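Where Are Interned Strings Stored?

The String constant pool is a special case. There are two main ways to use it:

  • A String declared directly with double quotes is stored in the constant pool.
  • A String that was not declared with double quotes can be added through String's intern method. intern first checks whether an equal string is already in the pool; if not, it adds the current string to the pool.

In JDK 6 the string constant pool lives in the Perm area, whose default size is only 4 MB. Starting with JDK 7 it lives in the Java heap.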
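
The two ways of getting a string into the pool can be seen with reference comparisons. This is a minimal sketch; the class name InternDemo and the helper sameObject are illustrative only.

```java
class InternDemo {
    /** Returns true when both references point to the same String object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "jvm";            // double-quoted literal: taken from the pool
        String built = new String("jvm");  // explicit new: a separate object outside the pool

        System.out.println(sameObject(literal, built));          // false
        System.out.println(sameObject(literal, built.intern())); // true: intern returns the pooled instance
    }
}
```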

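Region Ratios

  • -XX:SurvivorRatio=8: sets the ratio of the eden space to one survivor space (S0) in the young generation.
  • -XX:NewRatio=2: sets the ratio of the old generation to the young generation.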
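
To make the two ratios concrete, the arithmetic can be sketched as follows. The class and method names are hypothetical, and the numbers are idealized: a real collector aligns region sizes and may resize them adaptively, so actual logs rarely match exactly.

```java
class GenerationSizes {
    /** Young generation size for a given heap and -XX:NewRatio (old / young). */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** Size of one survivor space for a given young size and -XX:SurvivorRatio (eden / survivor). */
    static long survivorSize(long young, int survivorRatio) {
        // young = eden + 2 survivors = (survivorRatio + 2) survivor-sized units
        return young / (survivorRatio + 2);
    }

    /** Eden is whatever remains of the young generation after the two survivor spaces. */
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMiB = 900;                        // assumed heap size, for illustration
        long young = youngSize(heapMiB, 2);        // 300
        System.out.println("young    = " + young + " MiB");
        System.out.println("old      = " + (heapMiB - young) + " MiB");   // 600
        System.out.println("survivor = " + survivorSize(young, 8) + " MiB each"); // 30
        System.out.println("eden     = " + edenSize(young, 8) + " MiB");  // 240
    }
}
```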

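Garbage Collection Algorithms

  • Reference counting: circular references can be handled with the Recycler algorithm, but in a multithreaded environment every change to a reference count also needs expensive synchronization, so performance is poor. Early programming languages used this approach.
  • Mark-sweep:
    1. Mark the objects reachable from the roots.
    2. Sweep away every object that was not marked.
    3. Main drawback: the space left after collection is not contiguous.
  • Copying (young generation):
    1. Split the memory into two halves and use only one at a time.
    2. Copy the live objects into the unused half.
    3. Clear every object in the half that was in use.
    4. Swap the roles of the two halves.
    5. Well suited to the young generation, where garbage usually outnumbers live objects.
  • Mark-compact:
    1. Mark the objects reachable from the roots.
    2. Compact all live objects (the marked ones) toward one end of the memory.
    3. Reclaim everything beyond the boundary between marked and unmarked objects.

  • Generational collecting:
    1. Use a different algorithm for each memory region according to its characteristics. For example, the young generation (few live objects, lots of garbage) uses copying, and the old generation (mostly live objects) uses mark-compact.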
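
The mark-sweep steps listed above can be sketched on a toy object graph. This is an illustration of the idea only, not how HotSpot implements it; the Node class and the list standing in for the heap are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node reachable from the roots. */
    static void mark(Collection<Node> roots) {
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) continue;
            n.marked = true;
            for (Node r : n.refs) stack.push(r);
        }
    }

    /** Sweep phase: remove unmarked nodes, reset marks on survivors, return the freed names. */
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    /** Builds a four-object heap in which c and d reference each other but are unreachable. */
    static List<String> collectExample() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // a cycle that plain reference counting would never free
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println(collectExample()); // [c, d]
    }
}
```

Because reachability is computed from the roots, the c/d cycle is reclaimed without any special handling, which is the main advantage over reference counting noted in the list.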

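To support frequent young generation collections, the JVM may use a data structure called the card table. The card table is a set of bits, each of which records whether any object in a given region of the old generation holds a reference to a young generation object. During a young generation GC, the collector scans the card table first and can quickly tell whether a particular part of the old generation needs scanning; a region whose card is 0 is guaranteed to contain no references into the young generation.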
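
A rough sketch of the card table idea follows. The 512-byte card size, the class name, and the method names are assumptions made for the example; a real JVM updates the table from a write barrier emitted by the JIT rather than from an explicit method call.

```java
class CardTableDemo {
    static final int CARD_SIZE = 512; // bytes of old generation covered by one card (assumed value)
    private final boolean[] dirty;

    CardTableDemo(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    /** Stand-in for the write barrier: an old object at this offset now references a young object. */
    void recordOldToYoungWrite(int oldGenOffset) {
        dirty[oldGenOffset / CARD_SIZE] = true;
    }

    /** Number of cards a young generation GC would have to scan. */
    int dirtyCards() {
        int n = 0;
        for (boolean d : dirty) {
            if (d) n++;
        }
        return n;
    }

    int totalCards() {
        return dirty.length;
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(64 * 1024);
        table.recordOldToYoungWrite(100);
        table.recordOldToYoungWrite(300);  // falls in the same card as offset 100
        table.recordOldToYoungWrite(5000); // a different card
        System.out.println(table.dirtyCards() + " of " + table.totalCards() + " cards need scanning");
    }
}
```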

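Useful JVM Flags

  • Capturing a heap snapshot

When an OutOfMemoryError occurs, -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\m.hprof saves the current heap snapshot to a file. You can also add -XX:OnOutOfMemoryError=c:\reset.bat to run a script.

When an OutOfMemoryError occurs (this happened to me on a 32-bit Windows system), try increasing the available heap:

java -Xmx1024M -jar xxx.jar

TODO: think about how to find out how much memory a program really needs.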

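  • Getting GC information

Use -verbose:gc or -XX:+PrintGC for brief GC information, or -XX:+PrintGCDetails for more detail. To print the time at which each GC happens, add -XX:+PrintGCTimeStamps for time relative to JVM start or -XX:+PrintGCDateStamps for absolute time. To see the actual threshold at which young objects are promoted to the old generation, run the program with -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=15 (on JDK 8 the largest accepted value is 15). To print detailed heap information at each GC, turn on -XX:+PrintHeapAtGC.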
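
The same counters can also be read from inside the process through the standard java.lang.management API. The sketch below is an addition to the flags above, not a replacement for them; the class name GcStats is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Builds one line per collector with its cumulative collection count and time. */
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so the counters are likely to be non-zero
        System.out.print(summary());
    }
}
```

The collector names printed depend on which garbage collector the JVM was started with.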

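  • Controlling GC

-XX:+DisableExplicitGC disables explicit GC: calls to System.gc() in the program no longer trigger a Full GC. Another useful GC control flag is -Xincgc; once it is enabled, the system performs incremental GC.

The main steps of JVM tuning are: deciding the heap size (-Xmx, -Xms), dividing it sensibly between the young and old generations (-XX:NewRatio, -Xmn, -XX:SurvivorRatio), deciding the permanent generation size (-XX:PermSize, -XX:MaxPermSize), choosing a garbage collector, and configuring that collector appropriately. Beyond that, disabling explicit GC (-XX:+DisableExplicitGC), disabling class metadata collection (-Xnoclassgc), and disabling class verification (-Xverify:none) can also help performance to some extent.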

  • GC log examples

GC log produced with -XX:+PrintGC:

[GC (Allocation Failure)  heap used before GC (20M) -> heap used after GC (current heap capacity, 90M), time taken by this GC: 0.0028389 secs]
[GC (Allocation Failure)  20409K->432K(92672K), 0.0028389 secs]
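
Reading such a line programmatically is straightforward. The sketch below assumes the JDK 8 -XX:+PrintGC line format shown above; the class name, the method names, and the regular expression are illustrative and would need adjusting for other collectors or for unified logging in later JDKs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // usedBefore K -> usedAfter K ( capacity K ), pause secs
    private static final Pattern SUMMARY =
        Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs");

    /** Returns {usedBeforeKB, usedAfterKB, capacityKB}, or null when the line does not match. */
    static long[] parseSizesKb(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) return null;
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))
        };
    }

    /** Returns the pause in seconds, or NaN when the line does not match. */
    static double parsePauseSeconds(String line) {
        Matcher m = SUMMARY.matcher(line);
        return m.find() ? Double.parseDouble(m.group(4)) : Double.NaN;
    }

    public static void main(String[] args) {
        String line = "[GC (Allocation Failure)  20409K->432K(92672K), 0.0028389 secs]";
        long[] kb = parseSizesKb(line);
        System.out.println("reclaimed " + (kb[0] - kb[1]) + "K of " + kb[2] + "K capacity, pause "
            + parsePauseSeconds(line) + " s");
    }
}
```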

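GC log for the same code produced with -XX:+PrintGCDetails:

[GC (Allocation Failure) [young generation: 20M -> 0.4M (28M available)] whole heap: 20M -> 0.4M (90M available), 0.0151333 secs] [Times: user time, system time, wall-clock time of the GC]
    young generation: total 28M, used 13M [lower bound, current upper bound, upper bound]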
[GC (Allocation Failure) [PSYoungGen: 20409K->448K(28160K)] 20409K->456K(92672K), 0.0151333 secs] [Times: user=0.00 sys=0.00, real=0.02 secs] 
Heap
 PSYoungGen      total 28160K, used 13461K [0x00000000e1380000, 0x00000000e4a80000, 0x0000000100000000)
  eden space 24576K, 52% used [0x00000000e1380000,0x00000000e20356d0,0x00000000e2b80000)
  from space 3584K, 12% used [0x00000000e2b80000,0x00000000e2bf0020,0x00000000e2f00000)
  to   space 3584K, 0% used [0x00000000e4700000,0x00000000e4700000,0x00000000e4a80000)
 ParOldGen       total 64512K, used 8K [0x00000000a3a00000, 0x00000000a7900000, 0x00000000e1380000)
  object space 64512K, 0% used [0x00000000a3a00000,0x00000000a3a02000,0x00000000a7900000)
 Metaspace       used 3264K, capacity 4494K, committed 4864K, reserved 1056768K
  class space    used 363K, capacity 386K, committed 512K, reserved 1048576K

For more complete heap information, use -XX:+PrintHeapAtGC, which prints the heap before and after every GC:

{Heap before GC invocations=1 (full 0):
    ...
Heap after GC invocations=1 (full 0):
    ...
}

To analyze when GCs happen, add -XX:+PrintGCTimeStamps; the timestamp is the offset from JVM startup:

0.174: [GC (Allocation Failure)  20409K->504K(92672K), 0.0016586 secs]
0.179: [GC (Allocation Failure)  19415K->464K(92672K), 0.0031200 secs]
0.186: [GC (Allocation Failure)  19812K->432K(92672K), 0.0009531 secs]

Because GC also pauses the application, -XX:+PrintGCApplicationConcurrentTime prints how long the application ran between pauses, and -XX:+PrintGCApplicationStoppedTime prints how long the application was paused by GC:

Application time: 0.0084849 seconds
[GC (Allocation Failure)  20409K->520K(92672K), 0.0044274 secs]
Total time for which application threads were stopped: 0.0045452 seconds, Stopping threads took: 0.0000210 seconds
Application time: 0.0033066 seconds
[GC (Allocation Failure)  19431K->440K(117248K), 0.0020202 secs]
Total time for which application threads were stopped: 0.0021438 seconds, Stopping threads took: 0.0000258 seconds
Application time: 0.0082455 seconds

To trace soft, weak, and phantom references and the finalizer queue, turn on -XX:+PrintReferenceGC. Starting the JVM with -Xloggc:log/gc.log writes the GC log to the file gc.log:

Java HotSpot(TM) 64-Bit Server VM (25.111-b14) for linux-amd64 JRE (1.8.0_111-b14), built on Sep 22 2016 16:14:03 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 6052560k(316636k free), swap 6233084k(4248464k free)
CommandLine flags: -XX:InitialHeapSize=96840960 -XX:MaxHeapSize=1549455360 -XX:+PrintGC -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
0.183: Application time: 0.0107645 seconds
0.183: [GC (Allocation Failure)  20409K->432K(92672K), 0.0033748 secs]
0.187: Total time for which application threads were stopped: 0.0035825 seconds, Stopping threads took: 0.0000191 seconds
0.192: Application time: 0.0054269 seconds
0.193: [GC (Allocation Failure)  19343K->496K(117248K), 0.0108382 secs]
0.204: Total time for which application threads were stopped: 0.0116746 seconds, Stopping threads took: 0.0000766 seconds
0.212: Application time: 0.0084699 seconds

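Inspecting JVM settings:

  • -XX:+PrintVMOptions: prints the explicit command-line options the JVM received.
  • -XX:+PrintCommandLineFlags: prints the explicit and implicit flags passed to the JVM.
  • -XX:+PrintFlagsFinal: prints the final value of every JVM flag.
# Print the heap size settings (along with PermSize and ThreadStackSize)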
java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
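
From inside a running program, the explicit options can be listed with the standard RuntimeMXBean. This is a small sketch with an illustrative class name; it only shows the arguments that were passed on the command line, not every final flag value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class VmArgs {
    /** Returns the JVM options passed at launch, such as -Xmx or -XX flags. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        for (String arg : inputArguments()) {
            System.out.println(arg);
        }
        System.out.println("max heap bytes = " + Runtime.getRuntime().maxMemory());
    }
}
```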

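Minor GC, Major GC, and Full GC

  • Minor GC: collects garbage from the young generation. It is triggered when the JVM cannot allocate a new object, in other words when eden is full.
  • Major GC: cleans the tenured (old) generation.
  • Full GC: cleans the whole heap, both the young and the tenured generations.

JVM Operating Modes

  • java -version: reports the Server VM.
  • java -client -version: reports the Client VM.

Many flags can have very different defaults in client and server mode.

Heap Memory Best Practices

  • Are too many instances being allocated? Use jcmd 8998 GC.class_histogram to see how many instances of each class exist; jmap -histo 8998 gives the same kind of result.
  • Analyze a heap snapshot: use tools such as jhat, jvisualvm, or MAT to analyze the hprof file.
    • jcmd 8998 GC.heap_dump /path/to/heap_dump.hprof
    • jmap -dump:live,file=/path/to/heap_dump.hprof 8998: the live option forces a full GC.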

Common Java Monitoring Tools

jstack

jstack: dumps the thread stacks of a Java process.

jstack $PID > $DATE_DIR/jstack-$PID.dump 2>&1
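
When attaching jstack is not possible, a similar (less detailed) dump can be produced from inside the application with Thread.getAllStackTraces(). The sketch below is an assumption about how one might do that; the class name and output format are invented and do not match jstack's exact layout.

```java
import java.util.Map;

class ThreadDumpDemo {
    /** Formats the name, state, and stack frames of every live thread. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```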

jinfo

Jinfo: Provides visibility into the system properties of the JVM, and allows some system properties to be set dynamically.

# jinfo 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
Java System Properties:

com.sun.management.jmxremote.authenticate = false
java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 25.144-b01
...(省略好多)

VM Flags:
Non-default VM flags: -XX:CICompilerCount=3 -XX:InitialHeapSize=98566144 -XX:+ManagementServer -XX:MaxHeapSize=1549795328 -XX:MaxNewSize=516423680 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=32505856 -XX:OldSize=66060288 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseParallelGC 
Command line:  -Dcom.sun.management.jmxremote.port=5780 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/usr/lib/intellij_idea/idea-IC-172.3968.16/lib/idea_rt.jar=35487:/usr/lib/intellij_idea/idea-IC-172.3968.16/bin -Dfile.encoding=UTF-8

jstat

jstat: provides information about GC and class loading activity.

List the nine available options:

jstat -options

One useful option is -gcutil, which displays the time spent in GC as well as the percentage of each GC area that is currently filled. Other options to jstat will display the GC sizes in terms of KB.

Remember that jstat takes an optional argument—the number of milliseconds to repeat the command—so it can monitor over time the effect of GC in an application.

jstat -gcutil process_id 1000

Output:

# jstat -gcutil 18772
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
  0.00  71.53  97.93  34.02  96.70  93.37     29    0.133     1    0.040    0.172

The -gccapacity option shows the capacity and current size of each generation in the VM's memory (young, old, and perm or metaspace):

jstat -gccapacity process_id

Output:

# jstat -gccapacity 18772
 NGCMN    NGCMX     NGC     S0C   S1C       EC      OGCMN      OGCMX       OGC         OC       MCMN     MCMX      MC     CCSMN    CCSMX     CCSC    YGC    FGC 
 31744.0 504320.0  30720.0 4608.0 4608.0  21504.0    64512.0  1009152.0    44032.0    44032.0      0.0 1069056.0  22272.0      0.0 1048576.0   2560.0     32     1

jmap (Memory Map)

jmap: Provides heap dumps and other information about JVM memory usage.

jmap $PID

The output looks like this:

# jmap 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
0x0000000000400000	7K	/usr/lib/jvm/oracle_jdk8/jdk1.8.0_144/bin/java
0x00007f7072978000	98K	/lib/x86_64-linux-gnu/libresolv-2.23.so
0x00007f7072b93000	26K	/lib/x86_64-linux-gnu/libnss_dns-2.23.so
0x00007f7072d9a000	10K	/lib/x86_64-linux-gnu/libnss_mdns4_minimal.so.2
0x00007f70737a1000	87K	/lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007f70739b7000	251K	/usr/lib/jvm/oracle_jdk8/jdk1.8.0_144/jre/lib/amd64/libsunec.so
...(省略好多)

Print a histogram of the Java object heap; if the “live” suboption is specified, only live objects are counted:

jmap -histo $PID
jmap -histo:live $PID
# jmap -F -histo 18772
Object Histogram:

num 	  #instances	#bytes	Class description
--------------------------------------------------------------------------
1:		65711	10183976	char[]
2:		13523	8919400	byte[]
3:		54732	2159368	java.lang.Object[]
4:		7341	1451792	int[]
5:		56423	1354152	java.lang.String
6:		15476	619040	java.util.TreeMap$Entry
7:		16562	529984	java.io.ObjectStreamClass$WeakClassKey
8:		11915	476600	java.util.LinkedHashMap$Entry
9:		9716	466368	java.util.HashMap
10:		3993	453312	java.lang.Class
11:		11568	370176	java.util.concurrent.ConcurrentHashMap$Node
12:		6160	306952	java.util.HashMap$Node[]
13:		4210	279856	java.util.Hashtable$Entry[]
14:		8320	266240	java.util.Vector
15:		8070	258240	java.util.HashMap$Node
16:		10495	251880	org.jsoup.nodes.Attribute
17:		4181	200688	java.util.Hashtable
...(省略好多)

Print java heap summary:

jmap -heap $PID

The output looks like this:

# jmap -heap 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1549795328 (1478.0MB)
   NewSize                  = 32505856 (31.0MB)
   MaxNewSize               = 516423680 (492.5MB)
   OldSize                  = 66060288 (63.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 23068672 (22.0MB)
   used     = 11772712 (11.227333068847656MB)
   free     = 11295960 (10.772666931152344MB)
   51.03333213112571% used
From Space:
   capacity = 11010048 (10.5MB)
   used     = 2035424 (1.941131591796875MB)
   free     = 8974624 (8.558868408203125MB)
   18.48696754092262% used
To Space:
   capacity = 11534336 (11.0MB)
   used     = 0 (0.0MB)
   free     = 11534336 (11.0MB)
   0.0% used
PS Old Generation
   capacity = 45088768 (43.0MB)
   used     = 13718432 (13.082916259765625MB)
   free     = 31370336 (29.917083740234375MB)
   30.42538665061773% used

8999 interned Strings occupying 836656 bytes.

Best Practices for Heap Memory Usage

Heap Analysis

(1) Viewing the histogram

// jcmd performs a full GC by default
jcmd 6808 GC.class_histogram
jmap -histo 6808
// With the live option, a full GC is forced first
jmap -histo:live 6808
 num     #instances         #bytes  class name
----------------------------------------------
   1:         12227        1303424  [C
   2:          1003         627856  [B
   3:          1917         461864  [I
   4:          3828         421768  java.lang.Class
   5:         11665         279960  java.lang.String
   6:          6065         194080  java.util.concurrent.ConcurrentHashMap$Node
   7:          2794         173144  [Ljava.lang.Object;
   8:          3072         122880  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:          2760         110400  java.util.LinkedHashMap$Entry
  10:          1097         101144  [Ljava.util.HashMap$Node;
  11:          5440          87040  java.lang.Object
  12:          2680          85760  java.util.HashMap$Node
  13:           520          45760  java.lang.reflect.Method
  14:            44          44064  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  15:           781          43736  java.util.LinkedHashMap
  16:            96          41088  [Lorg.apache.lucene.index.RawPostingList;
...

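(2) Dumping the heap

// The live option forces a full GC
jmap -dump:live,file=/tmp/heap_dump.hprof 6808
// or
jmap -F -dump:format=b,file=filename.hprof 20961
// or, more simply
jmap -F -dump:file=filename.hprof 20961

Note: always give the path explicitly; otherwise it is hard to tell where the dump was written.

Three tools are commonly used to analyze .hprof files: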

  • jhat
  • jvisualvm
  • mat

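(3) Out of memory

An out-of-memory error usually occurs when:

  • Native memory is exhausted.
  • PermGen (Java 7) or metaspace (Java 8) is exhausted.
  • The Java heap is exhausted.
  • The JVM spends too much time performing GC.

Using Less Memory

(1) Reduce object size

(2) Lazy initialization

(3) Immutable objects

(4) String interning

Object Lifecycle Management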

JIT

(1) Compiled or interpreted?

Languages like C++ and Fortran are called compiled languages because their programs are delivered as binary (compiled) code: the program is written, and then a static compiler produces a binary. The assembly code in that binary is targeted to a particular CPU. Complementary CPUs can execute the same binary: for example, AMD and Intel CPUs share a basic, common set of assembly language instructions, and later versions of CPUs almost always can execute the same set of instructions as previous versions of that CPU.

Languages like PHP and Perl, on the other hand, are interpreted. The same program source code can be run on any CPU as long as the machine has the correct interpreter (that is, the program called php or perl). The interpreter translates each line of the program into binary code as that line is executed.

Java attempts to find a middle ground here. Java applications are compiled, but instead of being compiled into a specific binary for a specific CPU, they are compiled into an idealized assembly language. This assembly language (known as Java bytecodes) is then run by the java binary (in the same way that an interpreted PHP script is run by the php binary). This gives Java the platform independence of an interpreted language. Because it is executing an idealized binary code, the java program is able to compile the code into the platform binary as the code executes. This compilation occurs as the program is executed: it happens “just in time.”

(2) What the name HotSpot means

In a typical program, only a small subset of code is executed frequently, and the performance of an application depends primarily on how fast those sections of code are executed. These critical sections are known as the hot spots of the application; the more the section of code is executed, the hotter that section is said to be.

Hence, when the JVM executes code, it does not begin compiling the code immediately. There are two basic reasons for this. First, if the code is going to be executed only once, then compiling it is essentially a wasted effort; it will be faster to interpret the Java bytecodes than to compile them and execute (only once) the compiled code.

Second, the more times that the JVM executes a particular method or loop, the more information it has about that code. This allows the JVM to make a number of optimizations when it compiles the code.

(3) Registers and memory

If the value of sum were to be retrieved from (and stored back to) main memory on every iteration of this loop, performance would be dismal. Instead, the compiler will load a register with the initial value of sum, perform the loop using that value in the register, and then (at an indeterminate point in time) store the final result from the register back to main memory.

Register usage is a general optimization of the compiler, and when escape analysis is enabled (see the end of this chapter), register use is quite aggressive.
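
The loop discussed above is not included in these notes. A minimal sketch of what such code might look like, assuming an instance field named sum, is:

```java
class RegisterTest {
    private int sum;

    /** Adds 0..n-1 to the field; the JIT is free to keep sum in a register while the loop runs. */
    void calculateSum(int n) {
        for (int i = 0; i < n; i++) {
            sum += i;
        }
    }

    int getSum() {
        return sum;
    }

    public static void main(String[] args) {
        RegisterTest test = new RegisterTest();
        test.calculateSum(1000);
        System.out.println(test.getSum()); // 499500
    }
}
```

Because sum is an ordinary (non-volatile) field, other threads are not guaranteed to see intermediate values while the loop is running, which is the visible side effect of the register optimization.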

(4) Choosing a Java compiler

  • A 32-bit client version (-client)
  • A 32-bit server version (-server)
  • A 64-bit server version (-d64)

For the sake of compatibility, the argument specifying which compiler to use is not rigorously followed. If you have a 64-bit JVM and specify -client, the application will use the 64-bit server compiler anyway. If you have a 32 bit JVM and you specify -d64, you will get an error that the given instance does not support a 64-bit JVM.

The client compiler begins compiling sooner than the server compiler does, while code produced by the server compiler will be faster than that produced by the client compiler. So couldn’t the JVM start with the client compiler, and then use the server compiler as code gets hotter? That technique is known as tiered compilation. With tiered compilation, code is first compiled by the client compiler; as it becomes hot, it is recompiled by the server compiler.

# Must be enabled explicitly on Java 7; enabled by default on Java 8
-server -XX:+TieredCompilation
  • For GUI programs, use the client compiler by default. Performance is often all about perception: if the initial startup seems faster, and everything else seems fine, users will tend to view the program that has started faster as being faster overall.
  • For long-running applications, always choose the server compiler, preferably in conjunction with tiered compilation.

To check the default compiler:

java -version

(5) Further considerations

Code cache: When the JVM compiles code, it holds the set of assembly-language instructions in the code cache. The code cache has a fixed size, and once it has filled up, the JVM is not able to compile any additional code.


Compilation thresholds: The major factor involved here is how often the code is executed; once it is executed a certain number of times, its compilation threshold is reached, and the compiler deems that it has enough information to compile the code.

Compilation is based on two counters in the JVM: the number of times the method has been called, and the number of times any loops in the method have branched back. When the JVM executes a Java method, it checks the sum of those two counters and decides whether or not the method is eligible for compilation. This kind of compilation has no official name but is often called standard compilation.

But what if the method has a really long loop—or one that never exits and provides all the logic of the program? In that case, the JVM needs to compile the loop without waiting for a method invocation. So every time the loop completes an execution, the branching counter is incremented and inspected. If the branching counter has exceeded its individual threshold, then the loop (and not the entire method) becomes eligible for compilation.

This kind of compilation is called on-stack replacement (OSR), because even if the loop is compiled, that isn’t sufficient: the JVM has to have the ability to start executing the compiled version of the loop while the loop is still running. When the code for the loop has finished compiling, the JVM replaces the code (on-stack), and the next iteration of the loop will execute the much-faster compiled version of the code.

Standard compilation is triggered by the value of the -XX:CompileThreshold=N flag. The default value of N for the client compiler is 1,500; for the server compiler it is 10,000.


Inspecting the compilation process: -XX:+PrintCompilation.

jstat has two options to provide information about the compiler. The -compiler option supplies summary information about how many methods have been compiled (here 5003 is the process ID of the program to be inspected):

jstat -compiler 5003

Alternatively, you can use the -printcompilation option to get information about the last method that was compiled. In this example, jstat repeats the information for process ID 5003 every second (1,000 ms):

jstat -printcompilation 5003 1000

Number of compilation threads:


Inlining:

One of the most important optimizations the compiler makes is to inline methods.

public class Point {
    private int x, y;
    public int getX() { return x; }
    public void setX(int i) { x = i; }
}

When you write code like this:

Point p = getPoint();
p.setX(p.getX() * 2);

the compiled code will essentially execute:

Point p = getPoint();
p.x = p.x * 2;

The basic decision about whether to inline a method depends on how hot it is and how large it is. The JVM determines if a method is hot (i.e., called frequently) based on an internal calculation; it is not directly subject to any tunable parameters. If a method is eligible for inlining because it is called frequently, then it will be inlined only if its bytecode size is less than 325 bytes (or whatever is specified as the -XX:MaxFreqInlineSize=N flag). Otherwise, it is eligible for inlining only if it is small: less than 35 bytes (or whatever is specified as the -XX:MaxInlineSize=N flag).


Escape analysis:

The server compiler performs some very aggressive optimizations if escape analysis is enabled (-XX:+DoEscapeAnalysis, on by default).

public class Factorial {
    private BigInteger factorial;
    private int n;
    public Factorial(int n) {
        this.n = n;
    }
    public synchronized BigInteger getFactorial() {
        if (factorial == null)
            factorial = ...;
        return factorial;
    }
}

The factorial object is referenced only inside that loop; no other code can ever access that object. Hence, the JVM is free to perform a number of optimizations on that object:

  • It needn’t get a synchronization lock when calling the getFactorial() method.
  • It needn’t store the field n in memory; it can keep that value in a register. Similarly it can store the factorial object reference in a register.
  • In fact, it needn’t allocate an actual factorial object at all; it can just keep track of the individual fields of the object.
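
The loop that uses the Factorial class is not shown in these notes. The sketch below is one plausible shape for it, with the elided computation filled in by a straightforward BigInteger product; both the loop and that body are assumptions made so the example runs.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class FactorialDemo {
    static final class Factorial {
        private BigInteger factorial;
        private final int n;

        Factorial(int n) {
            this.n = n;
        }

        synchronized BigInteger getFactorial() {
            if (factorial == null) {
                BigInteger f = BigInteger.ONE; // assumed implementation of the elided computation
                for (int i = 2; i <= n; i++) {
                    f = f.multiply(BigInteger.valueOf(i));
                }
                factorial = f;
            }
            return factorial;
        }
    }

    /** Each Factorial instance is used only inside one loop iteration and never escapes it. */
    static List<BigInteger> firstFactorials(int count) {
        List<BigInteger> list = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Factorial factorial = new Factorial(i);
            list.add(factorial.getFactorial());
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(firstFactorials(6)); // [1, 1, 2, 6, 24, 120]
    }
}
```

Only the BigInteger results are stored in the list; the Factorial wrapper itself is what escape analysis can optimize away, including its lock.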

(6) Deoptimization

Deoptimization means that the compiler has to undo some previous optimization; the effect is that the performance of the application will be reduced, at least until the compiler can recompile the code in question. There are two cases of deoptimization: when code is “made not entrant,” and when code is “made zombie.”


Not entrant code:

There are two things that cause code to be made not entrant. One is due to the way classes and interfaces work, and one is an implementation detail of tiered compilation.

StockPriceHistory sph;
String log = request.getParameter("log");
if (log != null && log.equals("true")) {
    sph = new StockPriceHistoryLogger(...);
}
else {
    sph = new StockPriceHistoryImpl(...);
}
// Then the JSP makes calls to:
sph.getHighPrice();
sph.getStdDev();
// and so on

If a bunch of calls are made to http://localhost:8080/StockServlet (that is, without the log parameter), the compiler will see that the actual type of the sph object is StockPriceHistoryImpl. It will then inline code and perform other optimizations based on that knowledge. Later, say a call is made to http://localhost:8080/StockServlet?log=true. Now the assumption the compiler made regarding the type of the sph object is false; the previous optimizations are no longer valid. This generates a deoptimization trap, and the previous optimizations are discarded. If a lot of additional calls are made with logging enabled, the JVM will quickly end up compiling that code and making new optimizations.

In tiered compilation, code is compiled by the client compiler, and then later compiled by the server compiler (and actually it’s a little more complicated than that, as discussed in the next section). When the code compiled by the server compiler is ready, the JVM must replace the code compiled by the client compiler. It does this by marking the old code as not entrant and using the same mechanism to substitute the newly compiled (and more efficient) code.


Deoptimizing Zombie Code:

Recall that the compiled code is held in a fixed-size code cache; when zombie methods are identified, it means that the code in question can be removed from the code cache, making room for other classes to be compiled (or limiting the amount of memory the JVM will need to allocate later).

The possible downside here is that if the code for the class is made zombie and then later reloaded and heavily used again, the JVM will need to recompile and reoptimize the code.

Remote JVisualVM

Run jstatd on the remote machine. Without a security policy it fails with:

Could not create remote object
access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
	at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
	at java.security.AccessController.checkPermission(AccessController.java:884)
	at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
	at java.lang.System.setProperty(System.java:792)
	at sun.tools.jstatd.Jstatd.main(Jstatd.java:139)

You need to create a security policy file, jstatd.all.policy, containing this line:

grant codebase "file:/opt/java/jdk1.7.0_21/lib/tools.jar" { permission java.security.AllPermission; };

Then restart jstatd with:

jstatd -J-Djava.security.policy=/home/user/jstatd.all.policy

From your local machine, check that you can telnet to the jstatd service:

telnet 10.108.112.218 1099

Sometimes jstatd binds to the wrong network interface; set the host name explicitly:

-J-Djava.rmi.server.hostname=10.1.1.123

Force IPv4:

-J-Djava.net.preferIPv4Stack=true

Log RMI calls for troubleshooting:

-J-Djava.rmi.server.logCalls=true

The final command:

jstatd -J-Djava.security.policy=./jstatd.all.policy -J-Djava.rmi.server.hostname=10.108.112.218 -J-Djava.rmi.server.logCalls=true

What to Dump

The following excerpt from Dubbo's dump.sh shows what it saves:

DUMP_DATE=`date +%Y%m%d%H%M%S`
DATE_DIR=$DUMP_DIR/$DUMP_DATE

echo -e "Dumping the $SERVER_NAME ...\c"
for PID in $PIDS ; do
	jstack $PID > $DATE_DIR/jstack-$PID.dump 2>&1
	echo -e ".\c"
	jinfo $PID > $DATE_DIR/jinfo-$PID.dump 2>&1
	echo -e ".\c"
	jstat -gcutil $PID > $DATE_DIR/jstat-gcutil-$PID.dump 2>&1
	echo -e ".\c"
	jstat -gccapacity $PID > $DATE_DIR/jstat-gccapacity-$PID.dump 2>&1
	echo -e ".\c"
	jmap $PID > $DATE_DIR/jmap-$PID.dump 2>&1
	echo -e ".\c"
	jmap -heap $PID > $DATE_DIR/jmap-heap-$PID.dump 2>&1
	echo -e ".\c"
	jmap -histo $PID > $DATE_DIR/jmap-histo-$PID.dump 2>&1
	echo -e ".\c"
	if [ -r /usr/sbin/lsof ]; then
	/usr/sbin/lsof -p $PID > $DATE_DIR/lsof-$PID.dump
	echo -e ".\c"
	fi
done

if [ -r /bin/netstat ]; then
/bin/netstat -an > $DATE_DIR/netstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/iostat ]; then
/usr/bin/iostat > $DATE_DIR/iostat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/mpstat ]; then
/usr/bin/mpstat > $DATE_DIR/mpstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/vmstat ]; then
/usr/bin/vmstat > $DATE_DIR/vmstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/free ]; then
/usr/bin/free -t > $DATE_DIR/free.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/sar ]; then
/usr/bin/sar > $DATE_DIR/sar.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/uptime ]; then
/usr/bin/uptime > $DATE_DIR/uptime.dump 2>&1
echo -e ".\c"
fi

As the script shows, the items usually collected are:

  • jstack: thread information.
  • jinfo: configuration information, including Java system properties and Java Virtual Machine (JVM) command-line flags.
  • jstat -gcutil: garbage collection statistics.
  • jstat -gccapacity: displays statistics about the capacities of the generations and their corresponding spaces.
  • jmap: prints shared object memory maps or heap memory details for a process, core file, or remote debug server.
  • jmap -heap: prints a heap summary of the garbage collector in use, the heap configuration, and generation-wise heap usage. In addition, the number and size of interned Strings are printed.
  • jmap -histo: prints a histogram of the heap.
  • lsof -p: lists the files opened by the process.
  • netstat -an: lists all sockets with numeric addresses.
  • iostat: reports Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions, and network filesystems (NFS).
  • mpstat: reports processor-related statistics.
  • vmstat: vmstat (virtual memory statistics) is a computer system monitoring tool that collects and displays summary information about operating system memory, processes, interrupts, paging, and block I/O.
  • free -t: displays the amount of free and used memory in the system; -t adds a line showing the column totals.
  • sar: sar (System Activity Report) is a Unix System V-derived system monitor command used to report on various system loads, including CPU activity, memory/paging, device load, and network. Linux distributions provide sar through the sysstat package.
  • uptime: gives a one-line display of the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.

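How can you observe a running JVM clearly in practice?

  • GUI tools: JProfiler, JConsole, Java VisualVM
  • Commands: jps, jstack, jmap, jhat, jstat

Going Further with the JVM

Q: How do I go further with the JVM? I have read the second edition of Zhou Zhiming's 《深入理解JVM》 twice and can recite most of the book from its table of contents. What else should I learn?

A: Zhou Zhiming's book is only an introduction to the JVM. Next, read **The Java Virtual Machine Specification**; much of Zhou's book comes from it, but the specification itself is more detailed, and you should read the English original. Then read Oracle's documentation: **Hotspot Memory Management white paper** and **Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide**. After that, go deeper on **memory management** with a book such as **《垃圾回收算法与实现》**, and if that still leaves you wanting more, read **The Garbage Collection Handbook: The Art of Automatic Memory Management**. By then your theory will be solid, and it is time to download the HotSpot source, build it, and step through it with a debugger. At that point I recommend **《揭秘Java虚拟机》**, published this year by an Alibaba engineer, although it works better if you first read **Computer Systems: A Programmer's Perspective**. From there on you explore on your own, as I am doing too.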


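JVM Analysis

High CPU

Q: In production, CPU usage is very high while memory usage is low. Is there a quick way to find the cause?

A: Here is a script; on Linux, save it as a .sh file and run it directly.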

#!/bin/sh
ts=$(date +"%s")
jvmPid=$1
defaultLines=100
defaultTop=20

threadStackLines=${2:-$defaultLines}
topThreads=${3:-$defaultTop}

jvmCapture=$(top -b -n1 | grep java )
threadsTopCapture=$(top -b -n1 -H | grep java )
jstackOutput=$(echo "$(jstack $jvmPid )" )
topOutput=$(echo "$(echo "$threadsTopCapture" | head -n $topThreads | perl -pe 's/\e\[?.*?[\@-~] ?//g' | awk '{gsub(/^ +/,"");print}' | awk '{gsub(/ +|[+-]/," ");print}' | cut -d " " -f 1,9 )\n ")

echo "*************************************************************************************************************"

uptime

echo "Analyzing top $topThreads threads"

echo "*************************************************************************************************************"

printf %s "$topOutput" | while IFS= read  line

do
   pid=$(echo $line | cut -d " " -f 1)
   hexapid=$(printf "%x" $pid)
   cpu=$(echo $line | cut -d " " -f 2)
   echo -n $cpu"% [$pid] " 
   echo "$jstackOutput" | grep "tid.*0x$hexapid " -A $threadStackLines | sed -n -e '/0x'$hexapid'/,/tid/ p' | head -n -1
   echo "\n"
done

echo "\n" 

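The script prints the JVM's threads sorted by CPU usage, together with their stacks.


Q: Hello, a fairly basic JVM question. Garbage collection today is generational, so when the young generation is collected, does the old generation have to be scanned too? Is it a full scan, or is there some strategy, such as G1's remembered set, which only records reference relationships? What about other generational combinations, such as CMS with ParNew: do they have to scan the old generation whenever they collect the young generation? Wouldn't that hurt efficiency considerably?

A: For references from the old generation into the young generation, the JVM provides a data structure called the card table, so it does not need to traverse the whole old generation each time; it only needs to traverse the card table.

CPU Usage

Run top -Hp pid. The leftmost column that top shows is then the ID of each thread inside the Java application, in decimal; use printf "%x\n" tid to convert it to hexadecimal, which is how jstack prints it in the nid field.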
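
The same conversion in Java, as a minimal sketch (the class and method names are illustrative):

```java
class ThreadIdHex {
    /** Formats a decimal thread ID from top the way jstack shows it in the nid field. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        System.out.println(toNid(18772)); // 0x4954
    }
}
```

Searching the jstack output for nid=0x4954 then leads directly to the stack of the busy thread.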

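OOM

Q: When tracking down a JVM out-of-memory error in production, is there any approach other than printing stack traces and analyzing them?

A: Export a JVM heap dump and analyze it locally with MAT, the Eclipse plugin. Visual analysis is the most convenient, intuitive, and effective approach.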

MAT

1) The Dominator Tree:

The key to understanding your retained heap is looking at the dominator tree. The dominator tree is a tree produced from the complex object graph in your system, and it allows you to identify the largest memory graphs. An object X is said to dominate an object Y if every path from the root to Y must pass through X.

https://javaeesupportpatterns.blogspot.jp/2013/03/openjpa-memory-leak-case-study.html

JVM Diagnosis Examples

1) A healthy JVM:

2) Memory surge at startup:

3) Spike:

4) Memory leak:

JVisualVM

Install the Visual GC plugin:

to see the GC process in detail:

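How to Debug in Production with BTrace

Most problems are solved by setting breakpoints locally or by adding log output in a test environment. This approach is simple and blunt, but the process is tedious: it requires repeated redeployments and restarts, and there is no guarantee of finding the root cause in one attempt.

BTrace is a dynamic, safe tracing (monitoring) tool for Java released by Sun. It can monitor a running system without a restart and makes it easy to obtain run-time data such as method arguments, return values, global variables, and stack information, with minimal intrusion and minimal use of system resources.

Because BTrace injects the script logic directly into the running code, it imposes many restrictions on scripts:

  1. No object creation
  2. No arrays
  3. No throwing or catching exceptions
  4. No loops
  5. No synchronized keyword
  6. Fields and methods must be static

According to the official statement, improper use of BTrace can crash the JVM, for example when a BTrace script uses the wrong class file, so be sure to verify the script thoroughly on a local machine before using it in production.

What can BTrace do?

  • When an interface becomes slow, analyze how long each method takes.
  • When a large amount of data is inserted into a Map, analyze how it resizes.
  • Find out which method called System.gc().
  • When a method throws an exception, analyze its run-time arguments.

Suppose the server is running the following code: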

import java.util.Random;

public class BtraceCase {
    public static Random random = new Random();
    public int size;
 
    public static void main(String[] args) throws Exception {
        new BtraceCase().run();
    }
 
    public void run() throws Exception {
        while (true) {
            add(random.nextInt(10), random.nextInt(10));
        }
    }
 
    public int add(int a, int b) throws Exception {
        Thread.sleep(random.nextInt(10) * 100);
        return a + b;
    }
}

We want to analyze the arguments, return value, and execution time of the add method:

Use jps to get the server process ID (8454 here), then run:

btrace 8454 Debug.java

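to monitor the running code:

As you can see, BTrace captures the data for every call to add. It can do far more than this, of course, such as reporting the current JVM heap usage or the current thread's stack.

Parameters

// clazz: the class to monitor
// method: the method to monitor
//       clazz and method can be specified with regular expressions, interfaces, annotations, and so on
// location: where to intercept
//     Kind.ENTRY: invoke the script on method entry
//     Kind.RETURN: invoke the script when the method returns
//     Only with RETURN can you obtain the return value (@Return) and the duration (@Duration)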
@OnMethod(clazz="com.metty.rpc.common.BtraceCase",
          method="add",
          location=@Location(Kind.RETURN))

How to Locate Problems with BTrace

  • Find every Filter that takes longer than 1 ms

@Duration reports time in nanoseconds, so it has to be converted.
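
The unit conversion itself can be sketched in plain Java. This is not a BTrace script (BTrace scripts are subject to the restrictions listed earlier); the class and method names are invented for the illustration.

```java
import java.util.concurrent.TimeUnit;

class DurationFilter {
    /** Converts a duration in nanoseconds to fractional milliseconds. */
    static double toMillis(long durationNanos) {
        return durationNanos / 1_000_000.0;
    }

    /** True when the duration exceeds the 1 ms threshold. */
    static boolean slowerThanOneMs(long durationNanos) {
        return durationNanos > TimeUnit.MILLISECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long elapsed = System.nanoTime() - start;
        System.out.println(toMillis(elapsed) + " ms, slow = " + slowerThanOneMs(elapsed));
    }
}
```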

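  • Which method called System.gc(), and what does the call stack look like?

  • Count how many times a method is called, and print the count every minute

BTrace's @OnTimer annotation runs a method in the script at a fixed interval.

  • Inspect an object's instance fields while a method is executing

With reflection, it is easy to get the field values of the current instance.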
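
In plain Java, outside of BTrace, reading a private instance field through reflection looks roughly like this. The Sample class and the helper readField are hypothetical and exist only to make the sketch runnable.

```java
import java.lang.reflect.Field;

class FieldPeek {
    /** Hypothetical target class with a private instance field. */
    static final class Sample {
        private int size = 42;
    }

    /** Reads a declared field by name, bypassing access checks. */
    static Object readField(Object target, String name) {
        try {
            Field field = target.getClass().getDeclaredField(name);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readField(new Sample(), "size")); // 42
    }
}
```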

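Summary

BTrace can do a great deal, but always check that a script is sound before using it: once a BTrace script has been injected into the system, only a restart will remove it.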
