JVM Performance Tuning
How do you tune JVM performance?
Java Virtual Machine memory model
stack
HEAP
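- -Xmx: sets the maximum heap size.
- -Xms: sets the minimum heap size, that is, the amount of operating-system memory the JVM claims at startup. The JVM tries to keep its footprint within -Xms, so when usage reaches the -Xms size a collection is triggered, potentially a Full GC, before the heap is expanded. Setting -Xms equal to -Xmx therefore reduces the number and cost of GCs early in a run.
- -Xmn: sets the young generation size. It is equivalent to setting -XX:NewSize and -XX:MaxNewSize to the same value. Giving those two different values lets the young generation resize at run time (memory oscillation), which adds unnecessary overhead.
- -XX:NewSize: sets the initial size of the young generation.
- -XX:MaxNewSize: sets the maximum size of the young generation.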
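A common mistake is to pass -Xmn where -Xmx was intended: that sizes the young generation instead of raising the maximum heap.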
Free memory within the committed heap, the currently committed heap, and the maximum heap, each in MB:
Runtime.getRuntime().freeMemory() / 1024 / 1024
Runtime.getRuntime().totalMemory() / 1024 / 1024
Runtime.getRuntime().maxMemory() / 1024 / 1024
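As a minimal sketch, the three Runtime calls can be wrapped in a small program. The class name HeapReport and the helper toMb are illustrative names, not part of any library.

```java
/** Prints the three Runtime heap figures in MB. */
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (integer division). */
    static long toMb(long bytes) {
        return bytes / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // freeMemory: unused part of the heap committed so far
        System.out.println("free  (MB): " + toMb(rt.freeMemory()));
        // totalMemory: heap currently committed by the JVM
        System.out.println("total (MB): " + toMb(rt.totalMemory()));
        // maxMemory: upper bound the heap may grow to (-Xmx)
        System.out.println("max   (MB): " + toMb(rt.maxMemory()));
    }
}
```

Running it with different -Xms and -Xmx values shows how total starts near -Xms and max tracks -Xmx.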
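Escape analysis
Since Java 7, HotSpot applies escape analysis by default. When the JIT compiler can prove that an object never escapes the method that creates it, it can avoid the heap allocation and keep the object's state on the stack or in registers instead: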
void myMethod() {
V v = new V();
// use v
v = null;
}
- -server: escape analysis is available only in server mode.
- -XX:+DoEscapeAnalysis: enables escape analysis.
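The following sketch shows the kind of allocation escape analysis targets. EscapeDemo and Vec are hypothetical names. Whether the JIT actually removes the allocation depends on the JVM version and flags, so treat this as an illustration of a non-escaping object, not a guarantee.

```java
/** A loop that allocates an object which never leaves the method. */
final class EscapeDemo {
    /** Small value holder; instances created below never escape sumOfSquares(). */
    static final class Vec {
        final int x;
        final int y;

        Vec(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int lengthSquared() {
            return x * x + y * y;
        }
    }

    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            // v is not stored in a field, returned, or passed to other code,
            // so the JIT is free to keep x and y in registers.
            Vec v = new Vec(i, i + 1);
            total += v.lengthSquared();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Comparing GC logs for runs with -XX:+DoEscapeAnalysis and -XX:-DoEscapeAnalysis is one way to observe the difference.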
method area
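The method area mainly stores class metadata: types, the constant pool, fields, and methods. In the HotSpot VM the method area is also called the permanent generation (PermGen), and it too can be garbage collected. The size of the permanent generation directly determines how many class definitions and constants the system can hold. For applications that use dynamic bytecode generators such as CGLIB or Javassist, a sensibly sized permanent generation helps keep the system stable.
Because the method area bounds the number of classes that can be loaded, a system that relies on dynamic proxies may generate a large number of classes at run time. In that case, size the permanent generation so that it does not overflow.
- -XX:MaxPermSize=4M: sets the maximum size of the permanent generation.
- -XX:PermSize=4M: sets the initial size of the permanent generation.
In JDK 1.8 the permanent generation was removed entirely and replaced by Metaspace. Metaspace lives in native memory outside the heap; if no limit is specified, it can by default keep growing until it exhausts the available system memory.
- -XX:MaxMetaspaceSize: sets the maximum Metaspace size.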
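Where are interned Strings stored?
The String constant pool is a special case. Strings get into it in two main ways:
- A String declared directly as a double-quoted literal is stored in the constant pool.
- A String that was not declared as a literal can be added by calling String's intern method. intern first checks whether an equal string is already in the pool; if not, it adds the current string to the pool.
In JDK 6 the pool lives in the Perm area, whose default size is only 4 MB. Since JDK 7 it lives in the heap.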
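A short sketch of the two cases above, using reference comparison to show which strings share a pool entry. InternDemo and sameReference are illustrative names.

```java
/** Shows that literals and interned strings share one pooled instance. */
final class InternDemo {
    /** True only when both variables point at the very same object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "jvm";            // literal: taken from the pool
        String built = new String("jvm");  // explicit new: a separate heap object

        System.out.println(sameReference(literal, built));          // different objects
        System.out.println(sameReference(literal, built.intern())); // intern returns the pooled one
    }
}
```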
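Region ratios
- -XX:SurvivorRatio=8: sets the ratio of the eden space to one survivor space (S0) in the young generation, here 8:1.
- -XX:NewRatio=2: sets the ratio of the old generation to the young generation, here 2:1.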
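The arithmetic behind these two ratios can be sketched as follows. GenerationSizes is a hypothetical helper. A real JVM aligns region sizes and may resize them adaptively, so the figures are nominal.

```java
/** Nominal generation sizes derived from NewRatio and SurvivorRatio. */
final class GenerationSizes {
    /** NewRatio = old / young, so young is 1 part out of (newRatio + 1). */
    static long young(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    static long old(long heap, int newRatio) {
        return heap - young(heap, newRatio);
    }

    /** SurvivorRatio = eden / one survivor; the young generation holds eden plus two survivors. */
    static long survivor(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    static long eden(long young, int survivorRatio) {
        return young - 2 * survivor(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 900;   // example heap size, chosen for round numbers
        long youngMb = young(heapMb, 2);
        System.out.println("old      = " + old(heapMb, 2) + " MB");
        System.out.println("young    = " + youngMb + " MB");
        System.out.println("eden     = " + eden(youngMb, 8) + " MB");
        System.out.println("survivor = " + survivor(youngMb, 8) + " MB (each)");
    }
}
```

With a 900 MB heap, NewRatio=2 and SurvivorRatio=8 give a 600 MB old generation and a 300 MB young generation split into a 240 MB eden and two 30 MB survivor spaces.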
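Garbage collection algorithms
- Reference counting: the circular-reference problem can be solved with the Recycler algorithm, but in a multithreaded environment every reference-count update needs expensive synchronization, so performance is poor. Early programming languages used this approach.
- Mark-sweep:
  - Mark the objects reachable from the root nodes.
  - Sweep all unmarked objects.
  - Main drawback: the reclaimed space is not contiguous.
- Copying (young generation):
  - Split memory into two blocks and use only one at a time.
  - Copy live objects into the unused block.
  - Clear every object in the block that was in use.
  - Swap the roles of the two blocks.
  - Suited to the young generation, where garbage usually outnumbers live objects.
- Mark-compact:
  - Mark the objects reachable from the root nodes.
  - Compact all live (marked) objects toward one end of memory.
  - Reclaim everything beyond the boundary between the compacted live objects and the rest.
- Generational collecting:
  - Use a different algorithm for each memory region according to its characteristics: the young generation (few live objects, mostly garbage) uses copying, and the old generation (mostly live objects) uses mark-compact.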
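The mark-sweep steps can be sketched on a toy object graph. This is a simplified model, not how HotSpot implements collection; MarkSweepDemo, Obj, and the list-based heap are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy mark-sweep over an explicit object graph. */
final class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    /** Mark phase: flag everything reachable from the roots. */
    static void mark(Collection<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (o.marked) {
                continue;   // already visited; this also terminates on cycles
            }
            o.marked = true;
            worklist.addAll(o.refs);
        }
    }

    /** Sweep phase: remove unmarked objects, clear marks, return the freed names. */
    static List<String> sweep(List<Obj> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                freed.add(o.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);   // a -> b, reachable from the root a
        c.refs.add(d);   // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("freed: " + demo());
    }
}
```

Because reachability is traced from the roots, the unreachable cycle c and d is collected, which is exactly the case plain reference counting cannot handle.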
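To support frequent young-generation collections, the VM may use a data structure called a card table. A card table is a set of bits, each recording whether any object in a given region of the old generation holds a reference to a young-generation object. During a young GC the collector scans the card table first to learn which old-generation regions need scanning at all; a region whose bit is 0 is guaranteed to hold no references into the young generation.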
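A minimal sketch of the card table idea, assuming one bit per 512-byte card and addressing the old generation by byte offset. Both choices, and the class name CardTableDemo, are assumptions for illustration; a real VM maintains the table through a write barrier in compiled code.

```java
import java.util.BitSet;

/** One bit per fixed-size region ("card") of the old generation. */
final class CardTableDemo {
    static final int CARD_SIZE = 512;   // bytes covered by one card (assumed)

    private final BitSet dirty;

    CardTableDemo(int oldGenBytes) {
        this.dirty = new BitSet((oldGenBytes + CARD_SIZE - 1) / CARD_SIZE);
    }

    /** Write barrier: an old-gen object at this offset now references a young object. */
    void recordOldToYoungStore(int oldGenOffset) {
        dirty.set(oldGenOffset / CARD_SIZE);
    }

    /** Cards a young GC must scan; every other card is skipped. */
    int[] cardsToScan() {
        return dirty.stream().toArray();
    }

    public static void main(String[] args) {
        CardTableDemo table = new CardTableDemo(4096);   // 8 cards
        table.recordOldToYoungStore(0);
        table.recordOldToYoungStore(1300);
        System.out.println(java.util.Arrays.toString(table.cardsToScan()));
    }
}
```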
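Useful JVM flags
- Capture a heap snapshot.
When an OutOfMemoryError occurs, -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:\m.hprof saves the current heap snapshot to a file. You can also add -XX:OnOutOfMemoryError=c:\reset.bat to run a script.
When an OutOfMemoryError occurs (this happened on a 32-bit Windows system), try enlarging the available heap:
java -Xmx1024M -jar xxx.jar
TODO: how can you find out how much memory a program actually needs?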
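- Get GC information
Use -verbose:gc or -XX:+PrintGC for brief GC information, or -XX:+PrintGCDetails for more detail. To print when each GC happened, add -XX:+PrintGCTimeStamps for time relative to JVM start or -XX:+PrintGCDateStamps for absolute time. To see the actual threshold at which young objects are promoted to the old generation, run the program with -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=15 (15 is the largest value current HotSpot versions accept). To print detailed heap information at each GC, turn on -XX:+PrintHeapAtGC.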
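The same GC counts and times can also be read from inside the process through the standard java.lang.management API. A minimal sketch follows; GcStats is an illustrative name, and the collector names it prints depend on which collector the JVM runs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads per-collector GC counts and accumulated time via JMX beans. */
final class GcStats {
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.gc();   // request a collection so the counters are likely non-zero
        System.out.print(summary());
    }
}
```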
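- Control GC
-XX:+DisableExplicitGC disables explicit GC, that is, it prevents System.gc() calls in the program from triggering a Full GC. Another useful GC control flag is -Xincgc; once it is enabled, the system performs incremental GC.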
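The main steps of JVM tuning are: determine the heap size (-Xmx, -Xms), divide it sensibly between the young and old generations (-XX:NewRatio, -Xmn, -XX:SurvivorRatio), determine the permanent generation size (-XX:PermSize, -XX:MaxPermSize), choose a garbage collector, and configure that collector appropriately. Beyond that, disabling explicit GC (-XX:+DisableExplicitGC), disabling class metadata collection (-Xnoclassgc), and disabling class verification (-Xverify:none) can also help performance to some extent.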
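- GC log examples
GC log produced with -XX:+PrintGC:
[GC (Allocation Failure) heap used before GC (20M) -> heap used after GC (current heap capacity 90M), time spent in this GC: 0.0028389 s]
[GC (Allocation Failure) 20409K->432K(92672K), 0.0028389 secs]
GC log for the same code with -XX:+PrintGCDetails:
[GC (Allocation Failure) [young generation: 20M -> 0.4M (capacity 28M)] whole heap: 20M -> 0.4M (capacity 90M), 0.0151333 secs] [Times: user-mode CPU time, kernel-mode CPU time, wall-clock time of the GC]
Young generation: total 28M, used 13M [lower bound, current upper bound, upper bound]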
[GC (Allocation Failure) [PSYoungGen: 20409K->448K(28160K)] 20409K->456K(92672K), 0.0151333 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
Heap
PSYoungGen total 28160K, used 13461K [0x00000000e1380000, 0x00000000e4a80000, 0x0000000100000000)
eden space 24576K, 52% used [0x00000000e1380000,0x00000000e20356d0,0x00000000e2b80000)
from space 3584K, 12% used [0x00000000e2b80000,0x00000000e2bf0020,0x00000000e2f00000)
to space 3584K, 0% used [0x00000000e4700000,0x00000000e4700000,0x00000000e4a80000)
ParOldGen total 64512K, used 8K [0x00000000a3a00000, 0x00000000a7900000, 0x00000000e1380000)
object space 64512K, 0% used [0x00000000a3a00000,0x00000000a3a02000,0x00000000a7900000)
Metaspace used 3264K, capacity 4494K, committed 4864K, reserved 1056768K
class space used 363K, capacity 386K, committed 512K, reserved 1048576K
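As a sketch of how the before, after, and capacity fields of such a line can be extracted, the following parser handles the simple -XX:+PrintGC format shown above. GcLogLine is a hypothetical helper and the regular expression is an assumption about this one log format, not a general GC log parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses "<before>K-><after>K(<capacity>K), <seconds> secs" from a GC log line. */
final class GcLogLine {
    private static final Pattern HEAP_CHANGE =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs");

    final long beforeKb;
    final long afterKb;
    final long capacityKb;
    final double seconds;

    private GcLogLine(long beforeKb, long afterKb, long capacityKb, double seconds) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.capacityKb = capacityKb;
        this.seconds = seconds;
    }

    static GcLogLine parse(String line) {
        Matcher m = HEAP_CHANGE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a recognized GC line: " + line);
        }
        return new GcLogLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    /** Heap space reclaimed by this collection. */
    long freedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        GcLogLine gc = parse("[GC (Allocation Failure) 20409K->432K(92672K), 0.0028389 secs]");
        System.out.println("freed " + gc.freedKb() + "K in " + gc.seconds + " s");
    }
}
```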
For more complete heap information, use -XX:+PrintHeapAtGC, which prints the heap before and after each GC:
{Heap before GC invocations=1 (full 0):
...
Heap after GC invocations=1 (full 0):
...
}
To analyze when GCs occur, add -XX:+PrintGCTimeStamps; the timestamp is the offset since JVM startup:
0.174: [GC (Allocation Failure) 20409K->504K(92672K), 0.0016586 secs]
0.179: [GC (Allocation Failure) 19415K->464K(92672K), 0.0031200 secs]
0.186: [GC (Allocation Failure) 19812K->432K(92672K), 0.0009531 secs]
Because GC also pauses the application, -XX:+PrintGCApplicationConcurrentTime prints how long the application ran between pauses, and -XX:+PrintGCApplicationStoppedTime prints how long the application was stopped by GC:
Application time: 0.0084849 seconds
[GC (Allocation Failure) 20409K->520K(92672K), 0.0044274 secs]
Total time for which application threads were stopped: 0.0045452 seconds, Stopping threads took: 0.0000210 seconds
Application time: 0.0033066 seconds
[GC (Allocation Failure) 19431K->440K(117248K), 0.0020202 secs]
Total time for which application threads were stopped: 0.0021438 seconds, Stopping threads took: 0.0000258 seconds
Application time: 0.0082455 seconds
To trace soft, weak, and phantom references and the Finalize queue, turn on -XX:+PrintReferenceGC. Starting the VM with -Xloggc:log/gc.log writes the GC log to the file gc.log:
Java HotSpot(TM) 64-Bit Server VM (25.111-b14) for linux-amd64 JRE (1.8.0_111-b14), built on Sep 22 2016 16:14:03 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 6052560k(316636k free), swap 6233084k(4248464k free)
CommandLine flags: -XX:InitialHeapSize=96840960 -XX:MaxHeapSize=1549455360 -XX:+PrintGC -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
0.183: Application time: 0.0107645 seconds
0.183: [GC (Allocation Failure) 20409K->432K(92672K), 0.0033748 secs]
0.187: Total time for which application threads were stopped: 0.0035825 seconds, Stopping threads took: 0.0000191 seconds
0.192: Application time: 0.0054269 seconds
0.193: [GC (Allocation Failure) 19343K->496K(117248K), 0.0108382 secs]
0.204: Total time for which application threads were stopped: 0.0116746 seconds, Stopping threads took: 0.0000766 seconds
0.212: Application time: 0.0084699 seconds
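Inspecting system parameters:
- -XX:+PrintVMOptions: prints the explicit command-line options the VM received
- -XX:+PrintCommandLineFlags: prints the explicit and implicit flags passed to the VM
- -XX:+PrintFlagsFinal: prints the values of all system parameters
# Print the system's heap sizes
java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'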
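On a HotSpot JVM the same flag values can be read at run time through com.sun.management.HotSpotDiagnosticMXBean. This is a sketch that assumes a HotSpot-based JDK; the bean is not part of the portable java.* API, and VmFlags is an illustrative name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Looks up the final value of a HotSpot VM flag from inside the process. */
final class VmFlags {
    /** Throws IllegalArgumentException if the running JVM has no such flag. */
    static String value(String flag) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(flag).getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxHeapSize     = " + value("MaxHeapSize"));
        System.out.println("ThreadStackSize = " + value("ThreadStackSize"));
    }
}
```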
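Minor GC, Major GC, and Full GC
- Minor GC: collects garbage from the young generation. It is triggered when the JVM cannot allocate a new object, that is, when the Eden region is full.
- Major GC: cleans the Tenured region.
- Full GC: cleans the entire heap, including both the Young and Tenured regions.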
JVM working modes
- java -version: shows the Server VM
- java -client -version: shows the Client VM
Many parameters can differ considerably between Client and Server mode.
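Heap memory best practices
- Are too many instances being allocated? Use jcmd 8998 GC.class_histogram to see how many instances of each class exist; jmap -histo 8998 gives the same result.
- Analyze a heap snapshot: use tools such as jhat, jvisualvm, or MAT to analyze the hprof file.
jcmd 8998 GC.heap_dump /path/to/heap_dump.hprof
jmap -dump:live,file=/path/to/heap_dump.hprof 8998
Including live forces a full GC.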
Common Java monitoring tools
jstack
jstack: dumps the stacks of a Java process
jstack $PID > $DATE_DIR/jstack-$PID.dump 2>&1
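A process can also capture its own thread stacks through the standard Thread.getAllStackTraces() call, which is useful when attaching jstack is not possible. A minimal sketch, with ThreadDump as an illustrative name and an output layout that only loosely imitates jstack:

```java
import java.util.Map;

/** Builds a jstack-like text dump of all live threads in this JVM. */
final class ThreadDump {
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```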
jinfo
Jinfo: Provides visibility into the system properties of the JVM, and allows some system properties to be set dynamically.
# jinfo 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
Java System Properties:
com.sun.management.jmxremote.authenticate = false
java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 25.144-b01
... (many lines omitted)
VM Flags:
Non-default VM flags: -XX:CICompilerCount=3 -XX:InitialHeapSize=98566144 -XX:+ManagementServer -XX:MaxHeapSize=1549795328 -XX:MaxNewSize=516423680 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=32505856 -XX:OldSize=66060288 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseParallelGC
Command line: -Dcom.sun.management.jmxremote.port=5780 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -javaagent:/usr/lib/intellij_idea/idea-IC-172.3968.16/lib/idea_rt.jar=35487:/usr/lib/intellij_idea/idea-IC-172.3968.16/bin -Dfile.encoding=UTF-8
jstat
jstat: provides information about GC and class-loading activity
List the available options (nine of them):
jstat -options
One useful option is -gcutil, which displays the time spent in GC as well as the percentage of each GC area that is currently filled. Other options to jstat display the GC sizes in KB.
Remember that jstat takes an optional argument (the number of milliseconds between repetitions of the command), so it can monitor the effect of GC in an application over time.
jstat -gcutil process_id 1000
Output:
# jstat -gcutil 18772
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 71.53 97.93 34.02 96.70 93.37 29 0.133 1 0.040 0.172
-gccapacity shows the capacity and usage of the generations in VM memory (young, old, and perm, or Metaspace on Java 8)
jstat -gccapacity process_id
Output:
# jstat -gccapacity 18772
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC MCMN MCMX MC CCSMN CCSMX CCSC YGC FGC
31744.0 504320.0 30720.0 4608.0 4608.0 21504.0 64512.0 1009152.0 44032.0 44032.0 0.0 1069056.0 22272.0 0.0 1048576.0 2560.0 32 1
jmap (Memory Map)
jmap: Provides heap dumps and other information about JVM memory usage.
jmap $PID
The output looks like this:
# jmap 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
0x0000000000400000 7K /usr/lib/jvm/oracle_jdk8/jdk1.8.0_144/bin/java
0x00007f7072978000 98K /lib/x86_64-linux-gnu/libresolv-2.23.so
0x00007f7072b93000 26K /lib/x86_64-linux-gnu/libnss_dns-2.23.so
0x00007f7072d9a000 10K /lib/x86_64-linux-gnu/libnss_mdns4_minimal.so.2
0x00007f70737a1000 87K /lib/x86_64-linux-gnu/libgcc_s.so.1
0x00007f70739b7000 251K /usr/lib/jvm/oracle_jdk8/jdk1.8.0_144/jre/lib/amd64/libsunec.so
... (many lines omitted)
Print a histogram of the Java object heap; if the “live” suboption is specified, only live objects are counted:
jmap -histo $PID
jmap -histo:live $PID
# jmap -F -histo 18772
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 65711 10183976 char[]
2: 13523 8919400 byte[]
3: 54732 2159368 java.lang.Object[]
4: 7341 1451792 int[]
5: 56423 1354152 java.lang.String
6: 15476 619040 java.util.TreeMap$Entry
7: 16562 529984 java.io.ObjectStreamClass$WeakClassKey
8: 11915 476600 java.util.LinkedHashMap$Entry
9: 9716 466368 java.util.HashMap
10: 3993 453312 java.lang.Class
11: 11568 370176 java.util.concurrent.ConcurrentHashMap$Node
12: 6160 306952 java.util.HashMap$Node[]
13: 4210 279856 java.util.Hashtable$Entry[]
14: 8320 266240 java.util.Vector
15: 8070 258240 java.util.HashMap$Node
16: 10495 251880 org.jsoup.nodes.Attribute
17: 4181 200688 java.util.Hashtable
... (many lines omitted)
Print java heap summary:
jmap -heap $PID
The output looks like this:
# jmap -heap 18772
Attaching to process ID 18772, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1549795328 (1478.0MB)
NewSize = 32505856 (31.0MB)
MaxNewSize = 516423680 (492.5MB)
OldSize = 66060288 (63.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 23068672 (22.0MB)
used = 11772712 (11.227333068847656MB)
free = 11295960 (10.772666931152344MB)
51.03333213112571% used
From Space:
capacity = 11010048 (10.5MB)
used = 2035424 (1.941131591796875MB)
free = 8974624 (8.558868408203125MB)
18.48696754092262% used
To Space:
capacity = 11534336 (11.0MB)
used = 0 (0.0MB)
free = 11534336 (11.0MB)
0.0% used
PS Old Generation
capacity = 45088768 (43.0MB)
used = 13718432 (13.082916259765625MB)
free = 31370336 (29.917083740234375MB)
30.42538665061773% used
8999 interned Strings occupying 836656 bytes.
Heap memory usage best practices
Heap analysis
(1) View the histogram
// jcmd performs a full GC by default
jcmd 6808 GC.class_histogram
jmap -histo 6808
// With the live option, a full GC is forced
jmap -histo:live 6808
num #instances #bytes class name
----------------------------------------------
1: 12227 1303424 [C
2: 1003 627856 [B
3: 1917 461864 [I
4: 3828 421768 java.lang.Class
5: 11665 279960 java.lang.String
6: 6065 194080 java.util.concurrent.ConcurrentHashMap$Node
7: 2794 173144 [Ljava.lang.Object;
8: 3072 122880 org.apache.lucene.index.FreqProxTermsWriter$PostingList
9: 2760 110400 java.util.LinkedHashMap$Entry
10: 1097 101144 [Ljava.util.HashMap$Node;
11: 5440 87040 java.lang.Object
12: 2680 85760 java.util.HashMap$Node
13: 520 45760 java.lang.reflect.Method
14: 44 44064 [Ljava.util.concurrent.ConcurrentHashMap$Node;
15: 781 43736 java.util.LinkedHashMap
16: 96 41088 [Lorg.apache.lucene.index.RawPostingList;
...
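(2) Dump the heap
// With live, a full GC is forced
jmap -dump:live,file=/tmp/heap_dump.hprof 6808
// Or
jmap -F -dump:format=b,file=filename.hprof 20961
// Or, more simply
jmap -F -dump:file=filename.hprof 20961
Note: always specify the path explicitly; otherwise it is hard to tell where the dump was saved.
Three tools are commonly used to analyze .hprof files:
- jhat
- jvisualvm
- mat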
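(3) Out of memory
Out-of-memory errors usually occur when:
- Native memory is exhausted
- PermGen (Java 7) or Metaspace (Java 8) is exhausted
- The Java heap is exhausted
- The JVM spends too much time in GC
Using less memory
(1) Reduce object size
(2) Lazy initialization
(3) Immutable objects
(4) String interning
Object lifecycle management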
JIT
(1) Compiled or interpreted
Languages like C++ and Fortran are called compiled languages because their programs are delivered as binary (compiled) code: the program is written, and then a static compiler produces a binary. The assembly code in that binary is targeted to a particular CPU. Complementary CPUs can execute the same binary: for example, AMD and Intel CPUs share a basic, common set of assembly language instructions, and later versions of CPUs almost always can execute the same set of instructions as previous versions of that CPU.
Languages like PHP and Perl, on the other hand, are interpreted. The same program source code can be run on any CPU as long as the machine has the correct interpreter (that is, the program called php or perl). The interpreter translates each line of the program into binary code as that line is executed.
Java attempts to find a middle ground here. Java applications are compiled, but instead of being compiled into a specific binary for a specific CPU, they are compiled into an idealized assembly language. This assembly language (known as Java bytecodes) is then run by the java binary (in the same way that an interpreted PHP script is run by the php binary). This gives Java the platform independence of an interpreted language. Because it is executing an idealized binary code, the java program is able to compile the code into the platform binary as the code executes. This compilation occurs as the program is executed: it happens “just in time.”
(2) The meaning of the name HotSpot
In a typical program, only a small subset of code is executed frequently, and the performance of an application depends primarily on how fast those sections of code are executed. These critical sections are known as the hot spots of the application; the more the section of code is executed, the hotter that section is said to be.
Hence, when the JVM executes code, it does not begin compiling the code immediately. There are two basic reasons for this. First, if the code is going to be executed only once, then compiling it is essentially a wasted effort; it will be faster to interpret the Java bytecodes than to compile them and execute (only once) the compiled code.
Second, the more times that the JVM executes a particular method or loop, the more information it has about that code. This allows the JVM to make a number of optimizations when it compiles the code.
(3) Registers and memory
If the value of sum were to be retrieved from (and stored back to) main memory on every iteration of this loop, performance would be dismal. Instead, the compiler will load a register with the initial value of sum, perform the loop using that value in the register, and then (at an indeterminate point in time) store the final result from the register back to main memory.
Register usage is a general optimization of the compiler, and when escape analysis is enabled (see the end of this chapter), register use is quite aggressive.
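The loop the passage refers to is not shown in these notes. A representative sketch of such a loop, where sum is an instance field that the compiler may keep in a register while the loop runs, could look like this (RegisterTest and its members are illustrative names):

```java
/** A loop that repeatedly updates a field; the JIT may hold sum in a register. */
final class RegisterTest {
    private int sum;

    void calculateSum(int n) {
        for (int i = 0; i < n; i++) {
            // Conceptually a load and a store of this.sum on every iteration;
            // compiled code can defer the store until after the loop.
            sum += i;
        }
    }

    int sum() {
        return sum;
    }

    public static void main(String[] args) {
        RegisterTest test = new RegisterTest();
        test.calculateSum(100);
        System.out.println(test.sum());
    }
}
```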
(4) Choosing a Java compiler
- A 32-bit client version (-client)
- A 32-bit server version (-server)
- A 64-bit server version (-d64)
For the sake of compatibility, the argument specifying which compiler to use is not rigorously followed. If you have a 64-bit JVM and specify -client, the application will use the 64-bit server compiler anyway. If you have a 32-bit JVM and you specify -d64, you will get an error that the given instance does not support a 64-bit JVM.
The client compiler begins compiling sooner than the server compiler does, while code produced by the server compiler will be faster than that produced by the client compiler. Couldn’t the JVM start with the client compiler, and then use the server compiler as code gets hotter? That technique is known as tiered compilation. With tiered compilation, code is first compiled by the client compiler; as it becomes hot, it is recompiled by the server compiler.
# Must be enabled explicitly on Java 7; on by default in Java 8
-server -XX:+TieredCompilation
- For GUI programs, use the client compiler by default. Performance is often all about perception: if the initial startup seems faster, and everything else seems fine, users will tend to view the program that has started faster as being faster overall.
- For long-running applications, always choose the server compiler, preferably in conjunction with tiered compilation.
Check the default compiler:
java -version
(5) Further considerations
Code cache: When the JVM compiles code, it holds the set of assembly-language instructions in the code cache. The code cache has a fixed size, and once it has filled up, the JVM is not able to compile any additional code.
Compilation thresholds: The major factor involved here is how often the code is executed; once it is executed a certain number of times, its compilation threshold is reached, and the compiler deems that it has enough information to compile the code.
Compilation is based on two counters in the JVM: the number of times the method has been called, and the number of times any loops in the method have actually iterated. When the JVM executes a Java method, it checks the sum of those two counters and decides whether or not the method is eligible for compilation. This kind of compilation has no official name but is often called standard compilation.
But what if the method has a really long loop—or one that never exits and provides all the logic of the program? In that case, the JVM needs to compile the loop without waiting for a method invocation. So every time the loop completes an execution, the branching counter is incremented and inspected. If the branching counter has exceeded its individual threshold, then the loop (and not the entire method) becomes eligible for compilation.
This kind of compilation is called on-stack replacement (OSR), because even if the loop is compiled, that isn’t sufficient: the JVM has to have the ability to start executing the compiled version of the loop while the loop is still running. When the code for the loop has finished compiling, the JVM replaces the code (on-stack), and the next iteration of the loop will execute the much-faster compiled version of the code.
Standard compilation is triggered by the value of the -XX:CompileThreshold=N flag. The default value of N for the client compiler is 1,500; for the server compiler it is 10,000.
To watch the compilation process, use -XX:+PrintCompilation.
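A small program with one deliberately hot method gives -XX:+PrintCompilation something to report. This is a sketch: HotMethodDemo and its methods are made-up names, and with tiered compilation the effective thresholds differ from the CompileThreshold defaults quoted above.

```java
/** Calls a tiny method often enough to cross typical compilation thresholds. */
final class HotMethodDemo {
    static int addOne(int x) {
        return x + 1;
    }

    /** Invokes addOne() 'calls' times and accumulates the results. */
    static long run(int calls) {
        long acc = 0;
        for (int i = 0; i < calls; i++) {
            acc += addOne(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // 20,000 calls exceeds both default thresholds mentioned above.
        System.out.println(run(20_000));
    }
}
```

Running it as java -XX:+PrintCompilation HotMethodDemo should list addOne and run among the compiled methods, with the loop in run a candidate for OSR.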
jstat has two options to provide information about the compiler. The -compiler option supplies summary information about how many methods have been compiled (here 5003 is the process ID of the program to be inspected):
jstat -compiler 5003
Alternately, you can use the -printcompilation option to get information about the last method that was compiled. In this example, jstat repeats the information for process ID 5003 every second (1,000 ms):
jstat -printcompilation 5003 1000
Number of compilation threads:
Inlining:
One of the most important optimizations the compiler makes is to inline methods.
public class Point {
    private int x, y;
    public int getX() { return x; }
    public void setX(int i) { x = i; }
}
When you write code like this:
Point p = getPoint();
p.setX(p.getX() * 2);
the compiled code effectively executes:
Point p = getPoint();
p.x = p.x * 2;
The basic decision about whether to inline a method depends on how hot it is and how large it is. The JVM determines if a method is hot (i.e., called frequently) based on an internal calculation; it is not directly subject to any tunable parameters. If a method is eligible for inlining because it is called frequently, then it will be inlined only if its bytecode size is less than 325 bytes (or whatever is specified as the -XX:MaxFreqInlineSize=N flag). Otherwise, it is eligible for inlining only if it is small: less than 35 bytes (or whatever is specified as the -XX:MaxInlineSize=N flag).
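The size rule just described can be written down as a tiny decision function. This is a simplified model of the two limits only; InlinePolicy is a hypothetical name, and the real JVM applies further heuristics that are not modeled here.

```java
/** Simplified model of the size limits that gate method inlining. */
final class InlinePolicy {
    static final int MAX_FREQ_INLINE_SIZE = 325;   // default for -XX:MaxFreqInlineSize
    static final int MAX_INLINE_SIZE = 35;         // default for -XX:MaxInlineSize

    /** Hot methods get the larger bytecode budget; cold ones must be small. */
    static boolean eligible(boolean hot, int bytecodeSize) {
        int limit = hot ? MAX_FREQ_INLINE_SIZE : MAX_INLINE_SIZE;
        return bytecodeSize < limit;
    }

    public static void main(String[] args) {
        System.out.println(eligible(true, 100));    // hot, under the 325-byte limit
        System.out.println(eligible(false, 100));   // cold, over the 35-byte limit
    }
}
```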
Escape analysis:
The server compiler performs some very aggressive optimizations if escape analysis is enabled (-XX:+DoEscapeAnalysis, on by default).
public class Factorial {
private BigInteger factorial;
private int n;
public Factorial(int n) {
this.n = n;
}
public synchronized BigInteger getFactorial() {
if (factorial == null)
factorial = ...;
return factorial;
}
}
The factorial object is referenced only inside that loop; no other code can ever access that object. Hence, the JVM is free to perform a number of optimizations on that object:
- It needn’t get a synchronization lock when calling the getFactorial() method.
- It needn’t store the field n in memory; it can keep that value in a register. Similarly it can store the factorial object reference in a register.
- In fact, it needn’t allocate an actual factorial object at all; it can just keep track of the individual fields of the object.
(6) Deoptimization
Deoptimization means that the compiler has to undo some earlier optimization; the effect is that the performance of the application will be reduced, at least until the compiler can recompile the code in question. There are two cases of deoptimization: when code is “made not entrant,” and when code is “made zombie”.
Not Entrant Code:
There are two things that cause code to be made not entrant. One is due to the way classes and interfaces work, and one is an implementation detail of tiered compilation.
StockPriceHistory sph;
String log = request.getParameter("log");
if (log != null && log.equals("true")) {
sph = new StockPriceHistoryLogger(...);
}
else {
sph = new StockPriceHistoryImpl(...);
}
// Then the JSP makes calls to:
sph.getHighPrice();
sph.getStdDev();
// and so on
If a bunch of calls are made to http://localhost:8080/StockServlet (that is, without the log parameter), the compiler will see that the actual type of the sph object is StockPriceHistoryImpl. It will then inline code and perform other optimizations based on that knowledge. Later, say a call is made to http://localhost:8080/StockServlet?log=true. Now the assumption the compiler made regarding the type of the sph object is false; the previous optimizations are no longer valid. This generates a deoptimization trap, and the previous optimizations are discarded. If a lot of additional calls are made with logging enabled, the JVM will quickly end up compiling that code and making new optimizations.
In tiered compilation, code is compiled by the client compiler, and then later compiled by the server compiler (and actually it’s a little more complicated than that, as discussed in the next section). When the code compiled by the server compiler is ready, the JVM must replace the code compiled by the client compiler. It does this by marking the old code as not entrant and using the same mechanism to substitute the newly compiled (and more efficient) code.
Deoptimizing Zombie Code:
Recall that the compiled code is held in a fixed-size code cache; when zombie methods are identified, it means that the code in question can be removed from the code cache, making room for other classes to be compiled (or limiting the amount of memory the JVM will need to allocate later).
The possible downside here is that if the code for the class is made zombie and then later reloaded and heavily used again, the JVM will need to recompile and reoptimize the code.
Remote JVisualVM
Running jstatd on the remote machine fails with:
Could not create remote object
access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
java.security.AccessControlException: access denied ("java.util.PropertyPermission" "java.rmi.server.ignoreSubClasses" "write")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.System.setProperty(System.java:792)
at sun.tools.jstatd.Jstatd.main(Jstatd.java:139)
You need to create a security policy file, jstatd.all.policy, containing this line:
grant codebase "file:/opt/java/jdk1.7.0_21/lib/tools.jar" { permission java.security.AllPermission; };
Then restart with the following command:
jstatd -J-Djava.security.policy=/home/user/jstatd.all.policy
From your local machine, test whether you can telnet to the jstatd service:
telnet 10.108.112.218 1099
Sometimes jstatd binds to the wrong network interface:
-J-Djava.rmi.server.hostname=10.1.1.123
Force IPv4:
-J-Djava.net.preferIPv4Stack=true
Show some log output:
-J-Djava.rmi.server.logCalls=true
The final command:
jstatd -J-Djava.security.policy=./jstatd.all.policy -J-Djava.rmi.server.hostname=10.108.112.218 -J-Djava.rmi.server.logCalls=true
What to dump
Below is the content of Dubbo's dump.sh, kept here for reference:
DUMP_DATE=`date +%Y%m%d%H%M%S`
DATE_DIR=$DUMP_DIR/$DUMP_DATE
echo -e "Dumping the $SERVER_NAME ...\c"
for PID in $PIDS ; do
jstack $PID > $DATE_DIR/jstack-$PID.dump 2>&1
echo -e ".\c"
jinfo $PID > $DATE_DIR/jinfo-$PID.dump 2>&1
echo -e ".\c"
jstat -gcutil $PID > $DATE_DIR/jstat-gcutil-$PID.dump 2>&1
echo -e ".\c"
jstat -gccapacity $PID > $DATE_DIR/jstat-gccapacity-$PID.dump 2>&1
echo -e ".\c"
jmap $PID > $DATE_DIR/jmap-$PID.dump 2>&1
echo -e ".\c"
jmap -heap $PID > $DATE_DIR/jmap-heap-$PID.dump 2>&1
echo -e ".\c"
jmap -histo $PID > $DATE_DIR/jmap-histo-$PID.dump 2>&1
echo -e ".\c"
if [ -r /usr/sbin/lsof ]; then
/usr/sbin/lsof -p $PID > $DATE_DIR/lsof-$PID.dump
echo -e ".\c"
fi
done
if [ -r /bin/netstat ]; then
/bin/netstat -an > $DATE_DIR/netstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/iostat ]; then
/usr/bin/iostat > $DATE_DIR/iostat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/mpstat ]; then
/usr/bin/mpstat > $DATE_DIR/mpstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/vmstat ]; then
/usr/bin/vmstat > $DATE_DIR/vmstat.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/free ]; then
/usr/bin/free -t > $DATE_DIR/free.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/sar ]; then
/usr/bin/sar > $DATE_DIR/sar.dump 2>&1
echo -e ".\c"
fi
if [ -r /usr/bin/uptime ]; then
/usr/bin/uptime > $DATE_DIR/uptime.dump 2>&1
echo -e ".\c"
fi
As the script shows, the items usually collected are:
- jstack: thread information.
- jinfo: configuration information, which includes Java system properties and Java Virtual Machine (JVM) command-line flags.
- jstat -gcutil: garbage collection statistics.
- jstat -gccapacity: displays statistics about the capacities of the generations and their corresponding spaces.
- jmap: prints shared object memory maps or heap memory details for a process, core file, or remote debug server.
- jmap -heap: prints a heap summary of the garbage collection used, the heap configuration, and generation-wise heap usage. In addition, the number and size of interned Strings are printed.
- jmap -histo: prints a histogram of the heap.
- lsof -p
- netstat -an
- iostat: reports Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions, and network filesystems (NFS).
- mpstat: reports processor-related statistics.
- vmstat: vmstat (virtual memory statistics) is a computer system monitoring tool that collects and displays summary information about operating system memory, processes, interrupts, paging, and block I/O.
- free -t: displays the amount of free and used memory in the system; -t adds a line showing the column totals.
- sar: sar (System Activity Report) is a Unix System V-derived system monitor command used to report on various system loads, including CPU activity, memory/paging, device load, and network. Linux distributions provide sar through the sysstat package.
- uptime: gives a one-line display of the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
How can you observe a running JVM clearly in practice?
- Graphical tools: JProfiler, JConsole, Java VisualVM
- Commands: jps, jstack, jmap, jhat, jstat
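Going deeper into the JVM
Q: How do I go deeper into the JVM? I have read the second edition of Zhou Zhiming's 《深入理解JVM》 twice and can recite most of it from the table of contents. What else should I learn?
A: Zhou Zhiming's book is only an introduction to the JVM. Next, read **《Java虚拟机规范》** (The Java Virtual Machine Specification); much of Zhou's book comes from it, but the specification itself is more detailed, and you should read the English original. Then read Oracle's documents: **《Hotspot Memory Management white paper》** and **《Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide》**. After that, deepen your knowledge of **memory management** by reading, for example, **《垃圾回收算法与实现》**; if that is not enough, read **The Garbage Collection Handbook: The Art of Automatic Memory Management**. By then you will have a solid grasp of the theory, and it is time to download the HotSpot source, build it, and step through it in a debugger. At that point I recommend **《揭秘Java虚拟机》**, published this year by an Alibaba engineer, though it works better if you first read **《深入理解计算机系统》** (Computer Systems: A Programmer's Perspective). From there on you are exploring on your own, as am I.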
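JVM analysis
High CPU
Q: In production, CPU is very high while memory usage is low. Is there a quick way to find the cause?
A: Here is a script; on Linux, save it as a .sh file and run it directly.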
#!/bin/sh
ts=$(date +"%s")
jvmPid=$1
defaultLines=100
defaultTop=20
threadStackLines=${2:-$defaultLines}
topThreads=${3:-$defaultTop}
jvmCapture=$(top -b -n1 | grep java )
threadsTopCapture=$(top -b -n1 -H | grep java )
jstackOutput=$(echo "$(jstack $jvmPid )" )
topOutput=$(echo "$(echo "$threadsTopCapture" | head -n $topThreads | perl -pe 's/\e\[?.*?[\@-~] ?//g' | awk '{gsub(/^ +/,"");print}' | awk '{gsub(/ +|[+-]/," ");print}' | cut -d " " -f 1,9 )\n ")
echo "*************************************************************************************************************"
uptime
echo "Analyzing top $topThreads threads"
echo "*************************************************************************************************************"
printf %s "$topOutput" | while IFS= read line
do
pid=$(echo $line | cut -d " " -f 1)
hexapid=$(printf "%x" $pid)
cpu=$(echo $line | cut -d " " -f 2)
echo -n $cpu"% [$pid] "
echo "$jstackOutput" | grep "tid.*0x$hexapid " -A $threadStackLines | sed -n -e '/0x'$hexapid'/,/tid/ p' | head -n -1
echo "\n"
done
echo "\n"
The script prints the JVM's threads sorted by CPU usage, each with its stack trace.
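Q: Hello, I have a fairly basic JVM question. Today's garbage collectors are all generational. When the young generation is collected, does the old generation have to be scanned as well? Is it a full scan, or is there some strategy, such as G1's remembered set, which only records reference relationships? For other generational collectors, such as the CMS and ParNew combination, is scanning the old generation the only option when collecting the young generation? Wouldn't that reduce efficiency considerably?
A: For references from the old generation into the young generation, the JVM provides a data structure called the card table, so there is no need to traverse the whole old generation each time; scanning the card table is enough.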
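CPU usage
Run top -Hp pid. The leftmost column that top then shows is the ID of each thread inside the Java application, in decimal. Use printf "%x\n" tid to convert it to hexadecimal, the form in which jstack prints thread IDs in its nid field.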
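The decimal-to-hex step can be sketched in Java as well. NativeThreadId and toNid are illustrative names; the 0x prefix matches how jstack prints nid values.

```java
/** Converts a decimal thread ID from top into the nid form used by jstack. */
final class NativeThreadId {
    static String toNid(long decimalTid) {
        return "0x" + Long.toHexString(decimalTid);   // lowercase hex, as in jstack output
    }

    public static void main(String[] args) {
        // Pass the thread ID shown by top -Hp as the first argument.
        long tid = args.length > 0 ? Long.parseLong(args[0]) : 18772L;
        System.out.println("search the jstack output for nid=" + toNid(tid));
    }
}
```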
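OOM
Q: To track down a JVM out-of-memory error in production, is there any approach other than printing stack traces and analyzing them?
A: Export a JVM dump file and analyze it locally with the Eclipse MAT plugin; visual analysis is the most convenient, intuitive, and effective approach.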
MAT
1) The Dominator Tree:
The key to understanding your retained heap, is looking at the dominator tree. The dominator tree is a tree produced by the complex object graph in your system. The dominator tree allows you to identify the largest memory graphs. An Object X is said to dominate an Object Y if every path from the Root to Y must pass through X.
https://javaeesupportpatterns.blogspot.jp/2013/03/openjpa-memory-leak-case-study.html
JVM diagnosis examples
1) A healthy JVM:
2) Memory surge at startup:
3) Spike:
4) Memory leak
JVisualVM
You need to install the Visual GC plugin to see the GC process in detail:
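How to debug in production with BTrace
Most problems are solved by setting breakpoints locally or by adding log output in a test environment. This is simple and direct, but tedious: it takes repeated releases and restarts, with no guarantee of finding the root cause on the first try.
BTrace is a dynamic, safe tracing (monitoring) tool for Java from Sun. It can monitor a running system without a restart and conveniently capture run-time data such as method arguments, return values, global variables, and stack traces, with minimal intrusion and minimal resource usage.
Because BTrace injects script logic directly into running code, it places many restrictions on scripts:
- No creating objects
- No arrays
- No throwing or catching exceptions
- No loops
- No synchronized keyword
- Fields and methods must be static
According to the official statement, improper use of BTrace can crash the JVM, for example when a BTrace script uses the wrong class file, so be sure to verify the script thoroughly locally before using it in production.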
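What can BTrace do?
- When an interface slows down, analyze the time spent in each method;
- When large amounts of data are inserted into a Map, analyze its resizing behavior;
- Find out which method called System.gc()
- When a method throws an exception, analyze its run-time arguments;
- …
Suppose the server is running the following code: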
import java.util.Random;

public class BtraceCase {
    public static Random random = new Random();
    public int size;

    public static void main(String[] args) throws Exception {
        new BtraceCase().run();
    }

    public void run() throws Exception {
        while (true) {
            add(random.nextInt(10), random.nextInt(10));
        }
    }

    // Sleeps 0 to 900 ms so that call durations vary, then returns the sum.
    public int add(int a, int b) throws Exception {
        Thread.sleep(random.nextInt(10) * 100);
        return a + b;
    }
}
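We want to analyze the arguments, return value, and execution time of the add method:
Use jps to get the server process ID (8454 here), then run
btrace 8454 Debug.java
to monitor the running code:
As you can see, BTrace captures the data from every call to add. BTrace can do far more than this, such as reading the current JVM heap usage or the execution stack of the current thread.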
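Parameter reference
// clazz: the class to monitor
// method: the method to monitor
// clazz and method can be specified with regular expressions, interfaces, annotations, and so on
// location: where to intercept
// Kind.ENTRY: invoke the script on method entry
// Kind.RETURN: invoke the script when the method finishes
// Only with RETURN can the script obtain the return value (@Return) and the duration (@Duration)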
@OnMethod(clazz="com.metty.rpc.common.BtraceCase",
method="add",
location=@Location(Kind.RETURN))
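How to use BTrace to locate problems
- Find every Filter that takes longer than 1 ms
@Duration reports time in nanoseconds, so it needs to be converted.
- Which method called System.gc(), and what does the call stack look like?
- Count method invocations and print the count every minute
BTrace's @OnTimer annotation can run a method in the script on a timer.
- Inspect an object's instance fields while a method executes
With reflection it is easy to read the field values of the current instance.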
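The nanosecond conversion needed for the 1 ms threshold is plain arithmetic. The sketch below uses ordinary Java rather than a BTrace script; DurationCheck and its members are illustrative names. Inside a real BTrace script the same comparison would have to respect the restrictions listed earlier.

```java
import java.util.concurrent.TimeUnit;

/** Helpers for comparing a nanosecond duration against a 1 ms threshold. */
final class DurationCheck {
    static final long THRESHOLD_NANOS = TimeUnit.MILLISECONDS.toNanos(1);

    /** True when the call took strictly longer than 1 ms. */
    static boolean isSlow(long durationNanos) {
        return durationNanos > THRESHOLD_NANOS;
    }

    /** Converts nanoseconds to fractional milliseconds for printing. */
    static double toMillis(long durationNanos) {
        return durationNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        long sample = 1_500_000L;   // example duration in nanoseconds
        System.out.println(toMillis(sample) + " ms, slow=" + isSlow(sample));
    }
}
```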
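Summary
BTrace can do a great deal, but always check that a script is sound before using it: once a BTrace script has been injected into a system, only a restart can undo it.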
References
- 《Java 程序性能优化》
- Java (JVM) Memory Model – Memory Management in Java
- find which type of garbage collector is running
- Default garbage collector for Java 8
- Getting Started with the G1 Garbage Collector
- cms
- Minor GC vs Major GC vs Full GC
- 《Java Performance: The Definitive Guide》
- 《大话 Java 性能调优》
- 《深入理解 JVM & G1 GC》