Cassandra G1 GC

Cassandra G1 GC使用,案例
参考文档:
https://medium.com/@mlowicki/move-cassandra-2-1-to-g1-garbage-collector-b9fb27365509#.7ugg78nq7
http://www.oracle.com/technetwork/cn/articles/java/g1gc-1984535-zhs.html

G1 GC

关键词:调整

G1 GC 有一个力求达到的暂停时间目标(软实时)。在年轻代回收期间,G1 GC 会调整其年轻代空间(eden 和存活空间大小)以满足软实时目标。
在混合回收期间,G1 GC 会根据混合垃圾回收的目标次数调整所回收的旧区域数量,并调整堆的每个区域中存活对象的百分比,以及总体可接受的堆废物百分比。

从CMS迁移到G1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#MAX_HEAP_SIZE="8192M"
#HEAP_NEWSIZE="2048M"
MAX_HEAP_SIZE="16384M"

#calculate_heap_sizes函数定义的地方可以不用删除,只要调用的地方去掉就可以
#if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" ]; then
# calculate_heap_sizes
#else
# if [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" ]; then
# echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs (see cassandra-env.sh)"
# exit 1
# fi
#fi

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
#JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}" #G1不需要配置New
#JVM_OPTS="$JVM_OPTS -Xms16G" #这些原本是不存在的,都去掉
#JVM_OPTS="$JVM_OPTS -Xmx16G"
#JVM_OPTS="$JVM_OPTS -Xmn4G"
#JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=6G"
#JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"

#JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
#JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
#JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
#JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
#JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
#JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
#JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
#JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=10000"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

## note: bash evals ‘1.7.x’ as > ‘1.7’ so this is really a >= 1.7 jvm check
#if { [ "$JVM_VERSION" \> "1.7" ] && [ "$JVM_VERSION" \< "1.8.0" ] && [ "$JVM_PATCH_VERSION" -ge "60" ]; } || [ "$JVM_VERSION" \> "1.8" ] ; then
# JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000"
#fi

cassandra.yaml需要修改的配置:

1
2
table_allocation_type: offheap_objects
memtable_offheap_space_in_mb: 2048

CMS切换为G1脚本:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
cd /usr/install/cassandra/conf
mv cassandra-env.sh /usr/install/cassandra/conf/cassandra-env.sh.cms_20160621
host=`ifconfig | grep "192.168.4" | awk '/inet addr/{sub("addr:",""); print $2}'`
comp=`cat /usr/install/cassandra/conf/cassandra.yaml | grep compactors`
wget http://192.168.47.211:8000/cassandra-env.sh
sed -i -e "s/localhost/$host/g" cassandra-env.sh
sed -i -e "s/heap_buffers/offheap_objects/g" cassandra.yaml

echo "memtable_offheap_space_in_mb: 2048" >> cassandra.yaml
sed -i -e "s/$comp/concurrent_compactors: 4/g" cassandra.yaml

/usr/install/cassandra/bin/nodetool flush
/usr/install/cassandra/bin/nodetool stopdaemon
/usr/install/cassandra/bin/cassandra

比较GC时间

使用CMS,FGC问题很严重

1
2
3
4
5
6
7
8
9
10
11
12
13
14
163仍然使用CMS,22号:
[admin@fp-cass048163 ~]$ jstat -gcutil 30764 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 3.91 58.22 29.55 59.81 1554565 45480.263 7069 2087.903 47568.166
6.93 0.00 5.53 29.62 59.81 1554566 45480.290 7069 2087.903 47568.193
6.93 0.00 46.36 29.62 59.81 1554566 45480.290 7069 2087.903 47568.193
24号:
[admin@fp-cass048163 ~]$ jstat -gcutil 30764 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 6.79 26.05 68.01 59.81 1613189 47191.337 7335 2159.359 49350.696
0.00 6.79 63.72 68.01 59.81 1613189 47191.337 7335 2159.359 49350.696
6.86 0.00 2.59 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
6.86 0.00 38.73 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
6.86 0.00 77.89 68.13 59.81 1613190 47191.368 7335 2159.359 49350.726
统计时间 GC次数(超过200ms) 超过1s
2016-06-01 73 16
2016-06-02 469 103
2016-06-03 81 2
2016-06-04 228 28
2016-06-05 618 137
2016-06-06 312 47
2016-06-07 63 9
2016-06-08 273 60
2016-06-09 64 7
2016-06-10 66 8
2016-06-11 79 8
2016-06-12 1037 244
2016-06-13 1160 277
2016-06-14 65 9
2016-06-15 82 10
2016-06-16 880 221
2016-06-17 622 137
2016-06-18 1454 343
2016-06-19 68 5
2016-06-20 654 175
2016-06-21 338 87

最后一条数据6-21号这天数据不是很准,因为这一天从CMS切换到G1

使用G1

在162上测试,YGC超过200ms的都很少了

1
2
3
4
5
[admin@fp-cass048162 logs]$ tail -f system.log | grep GC
2016-06-21 11:14:12,947 - G1 Young Generation GC in 246ms. G1 Eden Space: 8917090304 -> 0; G1 Old Gen: 6059796976 -> 6071479536; G1 Survivor Space: 167772160 -> 293601280
2016-06-21 11:15:36,990 - G1 Young Generation GC in 245ms. G1 Eden Space: 8506048512 -> 0; G1 Old Gen: 6200014560 -> 6216498552;
2016-06-21 13:22:22,631 - G1 Young Generation GC in 283ms. G1 Eden Space: 8849981440 -> 0; G1 Old Gen: 5812656096 -> 5823218856; G1 Survivor Space: 327155712 -> 335544320
2016-06-21 13:50:59,920 - G1 Young Generation GC in 241ms. G1 Eden Space: 10032775168 -> 0; G1 Old Gen: 4619885032 -> 4633499088; G1 Survivor Space: 268435456 -> 276824064

✅ 162开启G1,22号到24号为止,没有发生一次FGC,下面是24号统计的信息:YGCT/YGC=2496/28780=0.086=86ms

1
2
3
4
5
6
7
[admin@fp-cass048162 ~]$ jstat -gcutil 47069 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 12.13 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 24.17 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 35.87 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 47.74 78.58 97.10 28780 2496.678 0 0.000 2496.678
0.00 100.00 59.27 78.58 97.10 28780 2496.678 0 0.000 2496.678

✅ 171开启G1,22号(运行一天,没有发生一次FGC):YGCT/YGC=494/6258=0.078=78ms

1
2
3
4
5
6
[admin@cass048171 ~]$ jstat -gcutil 27229 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 60.11 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 74.05 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 88.07 80.74 96.16 6258 494.761 0 0.000 494.761
0.00 100.00 7.06 80.81 96.16 6259 494.846 0 0.000 494.846

GC统计历史数据

注:由于Cassandra日志文件会进行滚动,所以当前日志文件用grep,以前的文件用zgrep统计
统计CMS一般在压缩文件中查找:ConcurrentMarkSweep
统计G1一般在当前文件中查找:G1 Old Generation GC(当然也可以在压缩文件中找)

所有GC统计(当前文件,压缩文件):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
当前文件,所有GC,按天统计
$ grep 'GC' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-25 29
2016-06-26 55
压缩文件,所有GC,按天统计
$ zgrep 'GC' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-16 880
2016-06-17 622
2016-06-18 1454
2016-06-19 68
2016-06-20 654
2016-06-21 362
2016-06-22 22
2016-06-23 84
2016-06-24 46
2016-06-25 62

超过10s的GC(当前文件,压缩文件)

1
2
3
4
5
6
7
8
9
10
11
12
13
$ zgrep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-13 280
2016-06-14 10
2016-06-15 11
2016-06-16 224
2016-06-17 139
2016-06-18 346
2016-06-19 5
2016-06-20 177
2016-06-21 88
2016-06-23 1

$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail

当前文件统计(所有GC,超过10s)***

1
2
3
4
5
6
当前文件,所有GC,按时段统计
grep 'GC' /usr/install/cassandra/logs/system.log | awk '{print $4":"substr($5,0,2)}' | uniq -c | awk '{print $2" "$1}' | sort
当前文件,超过10s的所有GC,按天统计
grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4}' | uniq -c | awk '{print $2" "$1}' | sort
当前文件,超过10s的GC,按时段统计
grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | awk '{print $4":"substr($5,0,2)}' | uniq -c | awk '{print $2" "$1}' | sort

压缩文件统计(所有GC,超过10s)

1
2
3
4
压缩文件,所有GC,按时段统计
zgrep 'GC' /usr/install/cassandra/logs/system.log.1.zip | awk '{print $4":"substr($5,0,2)}' | sort | uniq -c | awk '{print $2" "$1}'
压缩文件,超过10s的所有GC,按天统计
zgrep 'GC in [0-9][0-9][0-9][0-9]ms' /usr/install/cassandra/logs/system.log.1.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}'

压缩文件的CMS,当前文件的G1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
压缩文件,CMS GC,按天统计
$ zgrep 'ConcurrentMarkSweep' /usr/install/cassandra/logs/system.log.*.zip | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}' | tail
2016-06-12 1023
2016-06-13 1138
2016-06-14 60
2016-06-15 75
2016-06-16 868
2016-06-17 612
2016-06-18 1438
2016-06-19 62
2016-06-20 635
2016-06-21 326
当前文件,G1 GC,按天统计
$ grep 'G1 Old Generation GC' /usr/install/cassandra/logs/system.log | awk '{print $4}' | sort | uniq -c | awk '{print $2" "$1}'

并不是说不会FGC了

1
2
3
4
[admin@fp-cass048162 ~]$ grep 'GC' /usr/install/cassandra/logs/system.log | wc -l
134
[admin@fp-cass048162 ~]$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | wc -l
1

咦,还是有一个超过10s的GC:

1
2
3
4
2016-06-23 08:09:25,970 - G1 Young Generation GC in 206ms.   G1 Eden Space: 9646899200 -> 0; G1 Old Gen: 4639205352 -> 5004258312;   G1 Survivor Space: 511705088 -> 176160768
2016-06-23 09:47:38,757 - G1 Young Generation GC in 3770ms. G1 Eden Space: 8027897856 -> 0; G1 Old Gen: 6979980640 -> 13109433016; G1 Survivor Space: 150994944 -> 1023410176
2016-06-23 09:47:39,607 - G1 Young Generation GC in 780ms. G1 Eden Space: 159383552 -> 0; G1 Old Gen: 13109433016 -> 14160738744; G1 Survivor Space: 1023410176 -> 75497472
2016-06-23 10:18:11,557 - G1 Young Generation GC in 257ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 6197828584 -> 5781983928; G1 Survivor Space: 83886080 -> 92274688

同样的情况发生在171上:24号:
YGCT/YGC=3051/57162=0.053=53ms
FTCT/FGC=63/2=30s!!!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
[admin@cass048171 ~]$ jstat -gcutil 27229 1000
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 100.00 25.22 86.72 96.99 57162 3051.206 2 63.878 3115.084
0.00 100.00 53.48 86.72 96.99 57162 3051.206 2 63.878 3115.084
0.00 100.00 78.70 86.65 96.99 57162 3051.206 2 63.878 3115.084

[admin@cass048171 ~]$ grep 'GC in [0-9][0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log | wc -l
4
[admin@cass048171 ~]$ grep 'GC' /usr/install/cassandra/logs/system.log | wc -l
24

2016-06-23 10:33:10,706 - G1 Young Generation GC in 353ms. G1 Eden Space: 10049552384 -> 0; G1 Old Gen: 5018700376 -> 5039128024; G1 Survivor Space: 134217728 -> 914358272
2016-06-23 10:33:36,366 - G1 Young Generation GC in 21135ms. G1 Eden Space: 8438939648 -> 0; G1 Old Gen: 5039128024 -> 13030534304; G1 Survivor Space: 914358272 -> 1174405120 🙅
2016-06-23 10:33:37,667 - G1 Young Generation GC in 1209ms. G1 Eden Space: 8388608 -> 0; G1 Old Gen: 13030534304 -> 14225910944; G1 Survivor Space: 1174405120 -> 58720256 🙅
2016-06-23 10:33:38,638 - G1 Young Generation GC in 259ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 14225910944 -> 14276242592; G1 Survivor Space: 58720256 -> 75497472
2016-06-23 10:33:48,066 - G1 Young Generation GC in 1096ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 16117542048 -> 15689723040; G1 Survivor Space: 83886080 -> 33554432 🙅
2016-06-23 17:58:26,833 - G1 Young Generation GC in 261ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 13005169216 -> 12363114936;
2016-06-23 21:01:43,202 - G1 Young Generation GC in 268ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 14009140736 -> 13586786088; G1 Survivor Space: 58720256 -> 67108864
2016-06-23 22:54:34,508 - G1 Old Generation GC in 35275ms. G1 Old Gen: 14394600712 -> 10991951608; G1 Survivor Space: 25165824 -> 0 🙅(FGC)
2016-06-23 22:56:53,104 - G1 Young Generation GC in 229ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 6362278144 -> 4860703680; G1 Survivor Space: 58720256 -> 33554432
2016-06-23 22:56:54,109 - G1 Young Generation GC in 263ms. G1 Eden Space: 822083584 -> 0; G1 Old Gen: 4860703680 -> 3774189968; G1 Survivor Space: 33554432 -> 50331648
2016-06-24 01:15:31,753 - G1 Young Generation GC in 251ms. G1 Eden Space: 10141827072 -> 0; G1 Old Gen: 4901503760 -> 4907845240; G1 Survivor Space: 134217728 -> 687865856
2016-06-24 01:15:38,930 - G1 Young Generation GC in 252ms. G1 Eden Space: 9026142208 -> 0; G1 Old Gen: 4907845240 -> 4959650376; G1 Survivor Space: 687865856 -> 645922816
2016-06-24 01:15:46,310 - G1 Young Generation GC in 263ms. G1 Eden Space: 9059696640 -> 0; G1 Old Gen: 4959650376 -> 5025814888; G1 Survivor Space: 645922816 -> 637534208
2016-06-24 01:15:53,692 - G1 Young Generation GC in 232ms. G1 Eden Space: 9009364992 -> 0; G1 Old Gen: 5025814888 -> 5035267128; G1 Survivor Space: 637534208 -> 645922816
2016-06-24 01:16:01,092 - G1 Young Generation GC in 234ms. G1 Eden Space: 8984199168 -> 0; G1 Old Gen: 5035267128 -> 5635307288; G1 Survivor Space: 645922816 -> 100663296
2016-06-24 02:40:23,736 - G1 Young Generation GC in 205ms. G1 Eden Space: 771751936 -> 0; G1 Old Gen: 5298934024 -> 4154593728;
2016-06-24 04:47:09,843 - G1 Young Generation GC in 429ms. G1 Eden Space: 8833204224 -> 0; G1 Old Gen: 6350837368 -> 6645919712; G1 Survivor Space: 75497472 -> 1115684864
2016-06-24 04:47:10,624 - G1 Young Generation GC in 476ms. G1 Eden Space: 545259520 -> 0; G1 Old Gen: 6645919712 -> 7641490784; G1 Survivor Space: 1115684864 -> 142606336
2016-06-24 04:47:16,581 - G1 Young Generation GC in 796ms. G1 Eden Space: 6660554752 -> 0; G1 Old Gen: 7779693536 -> 8886989792; G1 Survivor Space: 142606336 -> 855638016
2016-06-24 04:47:17,526 - G1 Young Generation GC in 875ms. G1 Eden Space: 8388608 -> 0; G1 Old Gen: 8886989792 -> 8136209376; G1 Survivor Space: 855638016 -> 58720256
2016-06-24 04:47:18,447 - G1 Young Generation GC in 286ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 8136209376 -> 7452538096; G1 Survivor Space: 58720256 -> 109051904
2016-06-24 04:47:19,135 - G1 Young Generation GC in 217ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 7452538096 -> 7683224816;
2016-06-24 04:47:24,022 - G1 Young Generation GC in 202ms. G1 Eden Space: 2147483648 -> 0; G1 Old Gen: 8652109040 -> 8836658416; G1 Survivor Space: 109051904 -> 285212672
2016-06-24 04:47:29,255 - G1 Young Generation GC in 263ms. G1 Eden Space: 536870912 -> 0; G1 Old Gen: 8853435632 -> 8970333808; G1 Survivor Space: 318767104 -> 58720256

如果将GC时间和Compaction对照,会发现时间是对应的。但是这能说明什么问题,是因为Compaction导致GC的吗?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
[admin@cass048171 ~]$ cat /usr/install/cassandra/logs/system.log | grep CompactionManager
INFO [Service Thread] 2016-06-23 10:33:10,755 StatusLogger.java:75 - CompactionManager 4 17
INFO [Service Thread] 2016-06-23 10:33:37,642 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:37,726 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:38,680 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 10:33:48,084 StatusLogger.java:75 - CompactionManager 5 18
INFO [Service Thread] 2016-06-23 17:58:26,881 StatusLogger.java:75 - CompactionManager 6 61
INFO [Service Thread] 2016-06-23 21:01:43,258 StatusLogger.java:75 - CompactionManager 6 73
INFO [Service Thread] 2016-06-23 22:54:34,576 StatusLogger.java:75 - CompactionManager 5 81
INFO [Service Thread] 2016-06-23 22:56:53,115 StatusLogger.java:75 - CompactionManager 5 71
INFO [Service Thread] 2016-06-23 22:56:54,124 StatusLogger.java:75 - CompactionManager 5 71
INFO [Service Thread] 2016-06-24 01:15:31,762 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:38,940 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:46,321 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:15:53,717 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 01:16:01,106 StatusLogger.java:75 - CompactionManager 5 66
INFO [Service Thread] 2016-06-24 02:40:23,752 StatusLogger.java:75 - CompactionManager 5 44
INFO [Service Thread] 2016-06-24 04:47:09,879 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:10,636 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:17,491 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:17,555 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:18,458 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:19,150 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:24,031 StatusLogger.java:75 - CompactionManager 6 35
INFO [Service Thread] 2016-06-24 04:47:29,287 StatusLogger.java:75 - CompactionManager 6 35

长时间的YGC案例

业务监控系统:

c_ygc_monitor

OpsCenter:

c_ygc_ops

Cassandra日志文件:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
[admin@cass048171 ~]$ grep 'GC in [0-9][0-9][0-9]' /usr/install/cassandra/logs/system.log
INFO [Service Thread] 2016-07-07 08:19:08,106 GCInspector.java:258 - G1 Young Generation GC in 204ms. G1 Eden Space: 696254464 -> 0; G1 Old Gen: 7281313536 -> 5909453064; G1 Survivor Space: 159383552 -> 92274688;
INFO [Service Thread] 2016-07-07 08:19:09,324 GCInspector.java:258 - G1 Young Generation GC in 313ms. G1 Eden Space: 763363328 -> 0; G1 Old Gen: 5909453064 -> 5497545648; G1 Survivor Space: 92274688 -> 100663296;
INFO [Service Thread] 2016-07-07 08:59:00,400 GCInspector.java:258 - G1 Young Generation GC in 270ms. G1 Eden Space: 763363328 -> 0; G1 Old Gen: 3972003048 -> 3531601136; G1 Survivor Space: 92274688 -> 67108864;
INFO [Service Thread] 2016-07-07 09:49:47,102 GCInspector.java:258 - G1 Young Generation GC in 17887ms.G1 Eden Space: 8875147264 -> 0; G1 Old Gen: 6224121000 -> 11796004824; G1 Survivor Space: 125829120 -> 1132462080;
INFO [Service Thread] 2016-07-07 09:49:48,795 GCInspector.java:258 - G1 Young Generation GC in 536ms. G1 Eden Space: 1241513984 -> 0; G1 Old Gen: 11796004824 -> 13020741592; G1 Survivor Space: 1132462080 -> 301989888;
INFO [Service Thread] 2016-07-07 09:49:51,681 GCInspector.java:258 - G1 Young Generation GC in 464ms. G1 Eden Space: 1619001344 -> 0; G1 Old Gen: 13020741592 -> 13700218840; G1 Survivor Space: 301989888 -> 243269632;
INFO [Service Thread] 2016-07-07 09:49:53,068 GCInspector.java:258 - G1 Young Generation GC in 311ms. G1 Eden Space: 1006632960 -> 0; G1 Old Gen: 13700218840 -> 14111260632; G1 Survivor Space: 243269632 -> 159383552;
INFO [Service Thread] 2016-07-07 09:49:54,076 GCInspector.java:258 - G1 Young Generation GC in 263ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 14111260632 -> 14463582168; G1 Survivor Space: 159383552 -> 117440512;
INFO [Service Thread] 2016-07-07 09:49:55,003 GCInspector.java:258 - G1 Young Generation GC in 236ms. G1 Eden Space: 738197504 -> 0; G1 Old Gen: 14463582168 -> 14803320800; G1 Survivor Space: 117440512 -> 109051904;
INFO [Service Thread] 2016-07-07 09:49:55,916 GCInspector.java:258 - G1 Young Generation GC in 225ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 14803320800 -> 15096922080;
INFO [Service Thread] 2016-07-07 09:49:57,943 GCInspector.java:258 - G1 Young Generation GC in 202ms. G1 Eden Space: 746586112 -> 0; G1 Old Gen: 15390523360 -> 15717679072;
INFO [Service Thread] 2016-07-07 09:50:02,743 GCInspector.java:258 - G1 Young Generation GC in 251ms. G1 Eden Space: 830472192 -> 0; G1 Old Gen: 10585068072 -> 9267343240; G1 Survivor Space: 25165824 -> 33554432;
INFO [Service Thread] 2016-07-07 09:50:04,138 GCInspector.java:258 - G1 Young Generation GC in 326ms. G1 Eden Space: 822083584 -> 0; G1 Old Gen: 9267343240 -> 8527731712;
INFO [Service Thread] 2016-07-07 09:50:28,182 GCInspector.java:258 - G1 Young Generation GC in 204ms. G1 Eden Space: 796917760 -> 0; G1 Old Gen: 8527855792 -> 8195194872; G1 Survivor Space: 58720256 -> 100663296;

2016-07-07 09:49:47发生17s的YGC,堆中各个代的大小:

在/var/log/cassandra/gc.log中是:2016-07-07T09:49:47

1
2
3
4
G1 Young Generation GC in 17887ms.                  YGC花费了17s
G1 Eden Space: 8,875,147,264 -> 0; 8G -> 0 新生代
G1 Old Gen: 6,224,121,000 -> 11,796,004,824; 6G -> 11G
G1 Survivor Space: 125,829,120 -> 1,132,462,080; 125M -> 1G

Eden从8G减少到0,Survivor从125M增加到1G,其实很好理解:
对象从Eden转移到了Survivor,所以Eden减少,Survivor增加。
Old也会增加的原因是:在GC的时候,Survivor的对象到达年龄计数器后,也会转移到Old。
所以Survivor增加的空间并不一定都是从Eden转移过来的,因为Survivor有一部分会转移到Old。
同时并不是Eden的所有对象都会转移到Survivor,只有需要的对象才会转移到Survivor。

按照正常转移方式,Old增加的大小,最多不会超过Survivor的大小。但是上面Old从6G增加到11G,
而Survivor在GC前只有125M,所以正常来说,就算Survivor的所有对象都转移到Old最多也不会超过7G!

那么唯一的解释是:Eden的对象直接被转移到了Old!
Eden总共有8G,转移了1G到Survivor,并且有4G跳过Survivor直接转移到Old!

新生代8G看起来有点大,对比其他日志,最大的也才1.6G。

年轻代的GC:存活对象会被拷贝/移动到一个或多个“Survivor”区域,存活时间够长的直接移到“老年代”区域。
这里会stop the world,但young GC的过程是多线程执行的。

结论:YGC会STW,如果YGC时间很长,应用程序暂停时间就很长!

总体GC时间

上gc.log:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
下面这两条只是作为对比:
2016-07-07T09:49:22.341+0800: 942859.604: Total time for which application threads were stopped: 0.0814731 seconds, Stopping threads took: 0.0002340 seconds
2016-07-07T09:49:29.213+0800: 942866.476: Total time for which application threads were stopped: 0.0012369 seconds, Stopping threads took: 0.0002430 seconds

这里才开始比较严重的YGC:
{Heap before GC invocations=159966 (full 2):
garbage-first heap total 16777216K, used 14876451K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 1074 young (8798208K), 15 survivors (122880K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
2016-07-07T09:49:29.213+0800: 942866.477: [GC pause (GCLocker Initiated GC) (young) →YGC,暂停在GCLocker Initiated GC阶段
Desired survivor size 566231040 bytes, new threshold 15 (max 15) 中间这一段打印的是存活区的信息
- age 1: 16845608 bytes, 16845608 total
- age 2: 6895480 bytes, 23741088 total
- age 3: 7210040 bytes, 30951128 total
- age 4: 6712376 bytes, 37663504 total
- age 5: 6050320 bytes, 43713824 total
- age 6: 5356392 bytes, 49070216 total
- age 7: 5514552 bytes, 54584768 total
- age 8: 5607616 bytes, 60192384 total
- age 9: 5430416 bytes, 65622800 total
- age 10: 5450352 bytes, 71073152 total
- age 11: 5550480 bytes, 76623632 total
- age 12: 5110560 bytes, 81734192 total
- age 13: 5440256 bytes, 87174448 total
- age 14: 5565200 bytes, 92739648 total
- age 15: 5550672 bytes, 98290320 total
(to-space exhausted), 17.8874292 secs] → YGC花费了17s

关于G1日志文件的分析参考:https://blogs.oracle.com/g1gc/entry/g1gc_logs_how_to_print

1
2
2014-07-16T11:14:35.792-0700: 11.871: [GC pause (G1 Evacuation Pause) (young),0.0178446 secs]
Date-and-time: timestamp [GC (gc-cause-is-evacuation-pause) (gc-type-is-young-gc), total-gc-pause-in-sec]

对应我们这里的日志文件:
2016-07-07T09:49:29.213+0800: 942866.477: [GC pause (GCLocker Initiated GC) (young) (to-space exhausted), 17.8874292 secs]

在GC类型(这里是young,具体阶段是:GCLocker Initiated GC)之后的是GC的原因:to空间耗尽了。

问题:为什么to空间不够?

http://bboniao.com/jvm/2014-03/g1garbage-first.html
https://www.infoq.com/articles/tuning-tips-G1-GC

解决方案:

  • -XX:G1ReservePercent 增加预存内存量
  • -XX:InitiatingHeapOccupancyPercent 减少此值,提前启动标记周期
  • -XX:ConcGCThreads 增加并行标记线程的数目

参考线上API的GC设置:

1
-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=30 -XX:ConcGCThreads=4

JVM_OPTS=”$JVM_OPTS -XX:G1ReservePercent=30” 10
JVM_OPTS=”$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=30” 45
JVM_OPTS=”$JVM_OPTS -XX:ConcGCThreads=4”

GC各个阶段

接下来打印了各个阶段的GC时间:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[Parallel Time: 16666.1 ms, GC Workers: 13]  ==》并行收集花了16s
[GC Worker Start (ms): Min: 942866476.9, Avg: 942866477.0, Max: 942866477.1, Diff: 0.2]
[Ext Root Scanning (ms): Min: 2.9, Avg: 3.0, Max: 3.1, Diff: 0.2, Sum: 39.0]
[Update RS (ms): Min: 71.8, Avg: 72.2, Max: 72.7, Diff: 0.9, Sum: 938.2]
[Processed Buffers: Min: 52, Avg: 75.5, Max: 102, Diff: 50, Sum: 982]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 2.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 16589.9, Avg: 16590.4, Max: 16590.6, Diff: 0.7, Sum: 215675.3] ==>主要在这里!
[Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 1.3]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.7]
[GC Worker Total (ms): Min: 16665.8, Avg: 16665.9, Max: 16666.0, Diff: 0.2, Sum: 216656.7]
[GC Worker End (ms): Min: 942883142.9, Avg: 942883142.9, Max: 942883142.9, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 2.0 ms]
[Other: 1219.2 ms]
[Evacuation Failure: 1212.5 ms] ==>疏散失败!
[Choose CSet: 0.0 ms]
[Ref Proc: 0.7 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 4.4 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.7 ms]
[Eden: 8472.0M(8464.0M)->0.0B(1184.0M) Survivors: 120.0M->1080.0M Heap: 14.2G(16.0G)->12.0G(16.0G)]

堆内存变化

上面最后一行是本次GC的堆信息,区域:GC前大小->GC后大小,括号里是Capacity(也有可能变化)

Heap的Capacity等于-Xms和-Xmx的值=16G,这个是不会变的,但是Eden的大小是会变化,比如GC前Capacity=8G,GC后变成了1G。

Young中分成Eden和Survivor,其中Eden和Survivor是成比例的。

1
2
3
[Eden: used-enden-before-gc(eden-capacity)->used-eden-after-gc(eden-capacity-for-next-gc) 
Survivors: used-survivors-before-gc->used-survivors-before-gc
Heap: used-heap-before-gc(heap-capacity)->used-heap-after-gc(heap-capacity-for-next-gc)]

这里打印的堆信息其实和Cassandra的system.log是一致的,不过上面的日志并没有打印Old!

1
2
3
4
G1 Young Generation GC in 17887ms.                  YGC花费了17s
G1 Eden Space: 8,875,147,264 -> 0; 8G -> 0 新生代GC前8G,GC后清空
G1 Old Gen: 6,224,121,000 -> 11,796,004,824; 6G -> 11G 老年代从6G增加到11G
G1 Survivor Space: 125,829,120 -> 1,132,462,080; 125M -> 1G 存活区从125M增加到1G

查看其它正常的堆变化情况,会发现基本上Survivor不会增大到1G,而且Heap一般

1
2
3
4
5
6
7
8
9
10
11
[admin@cass048171 ~]$ cat /var/log/cassandra/gc.log.4.current | grep Eden | tail
[Eden: 4952.0M(4952.0M)->0.0B(696.0M) Survivors: 120.0M->120.0M Heap: 14.1G(16.0G)->9511.6M(16.0G)]
[Eden: 696.0M(696.0M)->0.0B(5056.0M) Survivors: 120.0M->80.0M Heap: 10207.6M(16.0G)->9455.0M(16.0G)]
[Eden: 5056.0M(5056.0M)->0.0B(5040.0M) Survivors: 80.0M->88.0M Heap: 14.2G(16.0G)->9463.0M(16.0G)]
[Eden: 5040.0M(5040.0M)->0.0B(4992.0M) Survivors: 88.0M->112.0M Heap: 14.2G(16.0G)->9483.0M(16.0G)]
[Eden: 4992.0M(4992.0M)->0.0B(4976.0M) Survivors: 112.0M->120.0M Heap: 14.1G(16.0G)->9491.0M(16.0G)]
[Eden: 4976.0M(4976.0M)->0.0B(696.0M) Survivors: 120.0M->120.0M Heap: 14.1G(16.0G)->9495.0M(16.0G)]
[Eden: 696.0M(696.0M)->0.0B(5024.0M) Survivors: 120.0M->80.0M Heap: 10191.0M(16.0G)->9488.6M(16.0G)]
[Eden: 5024.0M(5024.0M)->0.0B(5008.0M) Survivors: 80.0M->88.0M Heap: 14.2G(16.0G)->9496.6M(16.0G)]
[Eden: 5008.0M(5008.0M)->0.0B(4960.0M) Survivors: 88.0M->112.0M Heap: 14.2G(16.0G)->9524.6M(16.0G)]
[Eden: 4968.0M(4960.0M)->0.0B(4944.0M) Survivors: 112.0M->120.0M Heap: 14.1G(16.0G)->9532.6M(16.0G)]

GC前后统计

最后是Heap在GC前后的统计,这里把GC前的也放在一起进行比较:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
{Heap before GC invocations=159966 (full 2):
garbage-first heap total 16777216K, used 14876451K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 1074 young (8798208K), 15 survivors (122880K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K

Heap after GC invocations=159967 (full 2):
garbage-first heap total 16777216K, used 12625455K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 135 young (1105920K), 135 survivors (1105920K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
}
[Times: user=41.47 sys=3.94, real=17.88 secs]
2016-07-07T09:49:47.101+0800: 942884.364: Total time for which application threads were stopped: 17.8884392 seconds, Stopping threads took: 0.0001048 seconds

其他调优

The ratio user/real should be close to the number of parallel gc threads.
user和real的比值是并行的GC线程,这里41.47/17.88=2.3

-XX:ConcGCThreads:Max((ParallelGCThreads#+2)/4,1)

RegionSize默认=HeapSize/2048,这里Heap=16G,RegionSize=8M =》 region size 8192K

后面的一些正常GC日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
2016-07-07T09:49:48.103+0800: 942885.366: Total time for which application threads were stopped: 0.0013006 seconds, Stopping threads took: 0.0002328 seconds
{Heap before GC invocations=159967 (full 2):
garbage-first heap total 16777216K, used 13837871K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 283 young (2318336K), 135 survivors (1105920K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
2016-07-07T09:49:48.248+0800: 942885.511: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.5353097 secs]
[Parallel Time: 528.2 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 942885512.0, Avg: 942885512.1, Max: 942885512.2, Diff: 0.2]
[Ext Root Scanning (ms): Min: 2.9, Avg: 3.0, Max: 3.1, Diff: 0.2, Sum: 38.7]
[Update RS (ms): Min: 28.9, Avg: 29.2, Max: 30.6, Diff: 1.8, Sum: 379.2]
[Processed Buffers: Min: 54, Avg: 84.8, Max: 133, Diff: 79, Sum: 1103]
[Scan RS (ms): Min: 14.6, Avg: 16.2, Max: 16.5, Diff: 1.9, Sum: 210.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 479.2, Avg: 479.4, Max: 479.5, Diff: 0.3, Sum: 6231.7]
[Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 3.1]
[Termination Attempts: Min: 1, Avg: 395.1, Max: 486, Diff: 485, Sum: 5136]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
[GC Worker Total (ms): Min: 527.9, Avg: 528.0, Max: 528.1, Diff: 0.2, Sum: 6863.7]
[GC Worker End (ms): Min: 942886040.0, Avg: 942886040.0, Max: 942886040.1, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.4 ms]
[Other: 5.6 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.4 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 3.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.7 ms]
[Eden: 1184.0M(1184.0M)->0.0B(1544.0M) Survivors: 1080.0M->288.0M Heap: 13.2G(16.0G)->12.4G(16.0G)]
Heap after GC invocations=159968 (full 2):
garbage-first heap total 16777216K, used 13010479K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 36 young (294912K), 36 survivors (294912K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
}
[Times: user=6.90 sys=0.01, real=0.54 secs]
2016-07-07T09:49:48.784+0800: 942886.047: [GC concurrent-root-region-scan-start]
2016-07-07T09:49:48.784+0800: 942886.047: Total time for which application threads were stopped: 0.5368812 seconds, Stopping threads took: 0.0002119 seconds
2016-07-07T09:49:48.973+0800: 942886.236: [GC concurrent-root-region-scan-end, 0.1896420 secs]
2016-07-07T09:49:48.973+0800: 942886.236: [GC concurrent-mark-start]
{Heap before GC invocations=159968 (full 2):
garbage-first heap total 16777216K, used 14591535K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 229 young (1875968K), 36 survivors (294912K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
2016-07-07T09:49:51.216+0800: 942888.479: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 121634816 bytes, new threshold 1 (max 15)
- age 1: 277632680 bytes, 277632680 total
, 0.4639409 secs]
[Parallel Time: 455.9 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 942888479.3, Avg: 942888479.4, Max: 942888479.5, Diff: 0.3]
[Ext Root Scanning (ms): Min: 2.8, Avg: 3.0, Max: 3.1, Diff: 0.2, Sum: 38.4]
[Update RS (ms): Min: 166.4, Avg: 167.6, Max: 168.7, Diff: 2.4, Sum: 2178.7]
[Processed Buffers: Min: 59, Avg: 81.7, Max: 109, Diff: 50, Sum: 1062]
[Scan RS (ms): Min: 6.8, Avg: 7.9, Max: 8.2, Diff: 1.4, Sum: 102.4]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 277.0, Avg: 277.0, Max: 277.1, Diff: 0.2, Sum: 3601.5]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.8]
[GC Worker Total (ms): Min: 455.5, Avg: 455.6, Max: 455.7, Diff: 0.2, Sum: 5923.0]
[GC Worker End (ms): Min: 942888935.0, Avg: 942888935.0, Max: 942888935.1, Diff: 0.1]
[Code Root Fixup: 0.3 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.2 ms]
[Other: 6.6 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 3.6 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.7 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.5 ms]
[Eden: 1544.0M(1544.0M)->0.0B(960.0M) Survivors: 288.0M->232.0M Heap: 13.9G(16.0G)->13.0G(16.0G)]
Heap after GC invocations=159969 (full 2):
garbage-first heap total 16777216K, used 13616687K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 29 young (237568K), 29 survivors (237568K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
}
[Times: user=5.99 sys=0.00, real=0.46 secs]
2016-07-07T09:49:51.680+0800: 942888.943: Total time for which application threads were stopped: 0.4655046 seconds, Stopping threads took: 0.0001802 seconds
2016-07-07T09:49:51.680+0800: 942888.943: [GC concurrent-mark-reset-for-overflow]
{Heap before GC invocations=159969 (full 2):
garbage-first heap total 16777216K, used 14599727K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 149 young (1220608K), 29 survivors (237568K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
2016-07-07T09:49:52.756+0800: 942890.019: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 79691776 bytes, new threshold 1 (max 15)
- age 1: 243266912 bytes, 243266912 total
, 0.3106738 secs]
[Parallel Time: 303.5 ms, GC Workers: 13]
[GC Worker Start (ms): Min: 942890019.9, Avg: 942890020.0, Max: 942890020.0, Diff: 0.1]
[Ext Root Scanning (ms): Min: 3.0, Avg: 3.0, Max: 3.1, Diff: 0.1, Sum: 39.2]
[Update RS (ms): Min: 112.8, Avg: 114.5, Max: 115.1, Diff: 2.3, Sum: 1488.6]
[Processed Buffers: Min: 52, Avg: 75.5, Max: 115, Diff: 63, Sum: 981]
[Scan RS (ms): Min: 3.4, Avg: 3.9, Max: 4.1, Diff: 0.6, Sum: 50.8]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 181.6, Avg: 181.7, Max: 181.8, Diff: 0.2, Sum: 2362.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 13]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.7]
[GC Worker Total (ms): Min: 303.3, Avg: 303.3, Max: 303.4, Diff: 0.1, Sum: 3943.4]
[GC Worker End (ms): Min: 942890323.3, Avg: 942890323.3, Max: 942890323.4, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.9 ms]
[Other: 6.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 3.8 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 1.1 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 0.3 ms]
[Eden: 960.0M(960.0M)->0.0B(712.0M) Survivors: 232.0M->152.0M Heap: 13.9G(16.0G)->13.3G(16.0G)]
Heap after GC invocations=159970 (full 2):
garbage-first heap total 16777216K, used 13936175K [0x00000003c0000000, 0x00000003c0804000, 0x00000007c0000000)
region size 8192K, 19 young (155648K), 19 survivors (155648K)
Metaspace used 35401K, capacity 35916K, committed 36224K, reserved 1081344K
class space used 3719K, capacity 3839K, committed 3968K, reserved 1048576K
}
[Times: user=4.00 sys=0.01, real=0.31 secs]
2016-07-07T09:49:53.067+0800: 942890.330: Total time for which application threads were stopped: 0.3122465 seconds, Stopping threads took: 0.0002199 seconds
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
0.0 90112.0 0.0 90112.0 2080768.0 1032192.0 14606336.0 11107990.3 36220.0 35414.9 3964.0 3758.7 3240 230.268 0 0.000 230.268
0.0 90112.0 0.0 90112.0 2080768.0 1556480.0 14606336.0 11107990.3 36220.0 35414.9 3964.0 3758.7 3240 230.268 0 0.000 230.268
0.0 90112.0 0.0 90112.0 2080768.0 1974272.0 14606336.0 11107990.3 36220.0 35414.9 3964.0 3758.7 3240 230.268 0 0.000 230.268
0.0 90112.0 0.0 90112.0 2080768.0 1974272.0 14606336.0 11107990.3 36220.0 35414.9 3964.0 3758.7 3240 230.268 0 0.000 230.268
0.0 90112.0 0.0 90112.0 2080768.0 319488.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 90112.0 0.0 90112.0 2080768.0 606208.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 90112.0 0.0 90112.0 2080768.0 892928.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 90112.0 0.0 90112.0 2080768.0 1228800.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 90112.0 0.0 90112.0 2080768.0 1581056.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 90112.0 0.0 90112.0 2080768.0 1941504.0 14606336.0 11111529.3 36220.0 35414.9 3964.0 3758.7 3241 230.308 0 0.000 230.308
0.0 73728.0 0.0 73728.0 2105344.0 286720.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 606208.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 950272.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 1269760.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 1622016.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 1974272.0 14598144.0 11118135.9 36220.0 35414.9 3964.0 3758.7 3242 230.347 0 0.000 230.347
0.0 73728.0 0.0 73728.0 2105344.0 311296.0 14598144.0 11119663.4 36220.0 35414.9 3964.0 3758.7 3243 230.387 0 0.000 230.387
0.0 73728.0 0.0 73728.0 2105344.0 688128.0 14598144.0 11119663.4 36220.0 35414.9 3964.0 3758.7 3243 230.387 0 0.000 230.387
0.0 73728.0 0.0 73728.0 2105344.0 1097728.0 14598144.0 11119663.4 36220.0 35414.9 3964.0 3758.7 3243 230.387 0 0.000 230.387
0.0 73728.0 0.0 73728.0 2105344.0 1425408.0 14598144.0 11119663.4 36220.0 35414.9 3964.0 3758.7 3243 230.387 0 0.000 230.387
0.0 73728.0 0.0 73728.0 2105344.0 1769472.0 14598144.0 11119663.4 36220.0 35414.9 3964.0 3758.7 3243 230.387 0 0.000 230.387
0.0 81920.0 0.0 81920.0 2088960.0 90112.0 14606336.0 11122090.6 36220.0 35414.9 3964.0 3758.7 3244 230.429 0 0.000 230.429
0.0 81920.0 0.0 81920.0 2088960.0 409600.0 14606336.0 11113898.6 36220.0 35414.9 3964.0 3758.7 3244 230.429 0 0.000 230.429
0.0 81920.0 0.0 81920.0 2088960.0 860160.0 14606336.0 11113898.6 36220.0 35414.9 3964.0 3758.7 3244 230.429 0 0.000 230.429
0.0 81920.0 0.0 81920.0 2088960.0 1294336.0 14606336.0 11113898.6 36220.0 35414.9 3964.0 3758.7 3244 230.429 0 0.000 230.429
0.0 81920.0 0.0 81920.0 2088960.0 1744896.0 14606336.0 11113898.6 36220.0 35414.9 3964.0 3758.7 3244 230.429 0 0.000 230.429
0.0 81920.0 0.0 81920.0 802816.0 237568.0 15892480.0 11121098.8 36220.0 35414.9 3964.0 3758.7 3245 230.472 0 0.000 230.472
0.0 73728.0 0.0 73728.0 2113536.0 98304.0 14589952.0 11111715.9 36220.0 35414.9 3964.0 3758.7 3246 230.523 0 0.000 230.523
0.0 73728.0 0.0 73728.0 2113536.0 704512.0 14589952.0 11111715.9 36220.0 35414.9 3964.0 3758.7 3246 230.523 0 0.000 230.523
0.0 73728.0 0.0 73728.0 2113536.0 1327104.0 14589952.0 11111715.9 36220.0 35414.9 3964.0 3758.7 3246 230.523 0 0.000 230.523
0.0 73728.0 0.0 73728.0 2113536.0 1949696.0 14589952.0 11111715.9 36220.0 35414.9 3964.0 3758.7 3246 230.523 0 0.000 230.523
0.0 73728.0 0.0 73728.0 2105344.0 507904.0 14598144.0 11121792.3 36220.0 35414.9 3964.0 3758.7 3247 230.570 0 0.000 230.570
0.0 73728.0 0.0 73728.0 2105344.0 1138688.0 14598144.0 11121792.3 36220.0 35414.9 3964.0 3758.7 3247 230.570 0 0.000 230.570
0.0 73728.0 0.0 73728.0 2105344.0 1769472.0 14598144.0 11121792.3 36220.0 35414.9 3964.0 3758.7 3247 230.570 0 0.000 230.570
0.0 73728.0 0.0 73728.0 2105344.0 204800.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2105344.0 548864.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2105344.0 884736.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2105344.0 1228800.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2105344.0 1597440.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2105344.0 1916928.0 14598144.0 11125318.7 36220.0 35414.9 3964.0 3758.7 3248 230.613 0 0.000 230.613
0.0 73728.0 0.0 73728.0 2097152.0 262144.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3249 230.658 0 0.000 230.658
0.0 73728.0 0.0 73728.0 2097152.0 598016.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3249 230.658 0 0.000 230.658
0.0 73728.0 0.0 73728.0 2097152.0 942080.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3249 230.658 0 0.000 230.658
0.0 73728.0 0.0 73728.0 2097152.0 1277952.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3249 230.658 0 0.000 230.658
0.0 73728.0 0.0 73728.0 2097152.0 1630208.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3249 230.658 0 0.000 230.658
0.0 73728.0 0.0 73728.0 2097152.0 1990656.0 14606336.0 11127363.3 36220.0 35414.9 3964.0 3758.7 3250 230.658 0 0.000 230.658
0.0 90112.0 0.0 90112.0 2064384.0 311296.0 14622720.0 11126943.6 36220.0 35414.9 3964.0 3758.7 3250 230.702 0 0.000 230.702
0.0 90112.0 0.0 90112.0 2064384.0 655360.0 14622720.0 11126943.6 36220.0 35414.9 3964.0 3758.7 3250 230.702 0 0.000 230.702
0.0 90112.0 0.0 90112.0 2064384.0 999424.0 14622720.0 11126943.6 36220.0 35414.9 3964.0 3758.7 3250 230.702 0 0.000 230.702
0.0 90112.0 0.0 90112.0 2064384.0 1351680.0 14622720.0 11126943.6 36220.0 35414.9 3964.0 3758.7 3250 230.702 0 0.000 230.702
0.0 90112.0 0.0 90112.0 2064384.0 1720320.0 14622720.0 11126943.6 36220.0 35414.9 3964.0 3758.7 3250 230.702 0 0.000 230.702

Improving Apache Cassandra Performance

Cassandra’s internal operations can also be a major driver of performance issues. One example is “tombstones”. A major advantage of Cassandra is its dynamic structure, where developers can create and remove columns on the fly. When rows have dynamic columns and the application deletes those columns, Cassandra creates a “tombstone” that exists until the next compaction runs and cleans up any that have expired (ones that have passed gc_grace_seconds). If the client application then performs a slice query to read a few columns, Cassandra will scan all the tombstones while looking for the non- deleted columns requested. For applications with hundreds of tombstones, Cassandra can fill up heap memory in Java reading them all and is likely to trigger a long pause for garbage collection, stalling the node.

Compaction can also be a problem. Cassandra by default uses a single thread for compaction. During this process, it reads each row, cleans up expired tombstones, then sorts the columns and writes the row to a new SSTable. Compaction only uses memory up to the limit you set (the default is 64 MB). Depending on how the process is throttled, objects can live in the heap too long and get promoted to the old generation causing it to fill up. Once it fills, the JVM will pause to perform a full GC and stall the node.

Similarly, Cassandra-based applications sometimes make use of wide rows, or those with hundreds of columns. If the application reads hundreds of columns from a single wide row, it will quickly fill the heap’s new generation and can cause promotion failures or concurrent mode failures. Once that happens, the JVM will pause for a full GC and stall the node.

A separate issue is the creation of short-lived objects for read requests under heavy load. Each read allocates short-lived objects in the heap for results returned to the client/coordinator and for any objects that actually process the request. As the number of read requests grows, the new gen quickly fills up with objects that aren’t yet garbage when the new gen GC runs (but will be soon). Some are moved into the old gen space to free up space in new gen for more objects. When the old gen fills up with these short-lived objects, a major GC is triggered that stops Cassandra for a long time, often several seconds or more.


文章目录
  1. 1. G1 GC
  2. 2. 从CMS迁移到G1
  3. 3. 比较GC时间
    1. 3.1. 使用CMS,FGC问题很严重
    2. 3.2. 使用G1
    3. 3.3. GC统计历史数据
      1. 3.3.1. 所有GC统计(当前文件,压缩文件):
      2. 3.3.2. 超过10s的GC(当前文件,压缩文件)
      3. 3.3.3. 当前文件统计(所有GC,超过10s)***
      4. 3.3.4. 压缩文件统计(所有GC,超过10s)
      5. 3.3.5. 压缩文件的CMS,当前文件的G1
  4. 4. 并不是说不会FGC了
  5. 5. 长时间的YGC案例
  6. 6. 其他调优
  7. 7. 后面的一些正常GC日志