Spark Metrics

Adding monitoring from the command line

Append the following directly to the command line:

--files=/yourPath/metrics.properties --conf spark.metrics.conf=metrics.properties

The --files flag sends /yourPath/metrics.properties to every executor,
and spark.metrics.conf=metrics.properties tells each executor to load that file
when it initializes its MetricsSystem.
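
The flow described here can be sketched end to end as follows. This is only an illustration under stated assumptions: the scratch directory, `your-app.jar`, and `com.example.YourApp` are placeholders that do not come from the original, and the command is printed rather than executed so the sketch runs without a Spark installation.

```shell
#!/bin/sh
# Hypothetical scratch directory; any writable path works.
workdir=$(mktemp -d)
conf_file="$workdir/metrics.properties"

# A minimal metrics config: report every metric to the console every 10 seconds.
cat > "$conf_file" <<'EOF'
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
EOF

# --files sends the file to every executor; spark.metrics.conf then refers
# to it by bare file name, as in the command shown in the text.
# your-app.jar and com.example.YourApp are placeholders.
submit_cmd="spark-submit --files=$conf_file --conf spark.metrics.conf=$(basename "$conf_file") --class com.example.YourApp your-app.jar"

# Print instead of executing.
echo "$submit_cmd"
```

Printing the command first makes it easy to check that the path passed to --files and the bare name given to spark.metrics.conf agree before anything is submitted.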

Alternatively, pass the settings as --conf options (quoted so that the shell does not try to expand the *):

--conf "spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink" \
--conf "spark.metrics.conf.*.sink.graphite.host=..."
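
The --conf form is the same set of properties with a `spark.metrics.conf.` prefix on every key. A rough sketch of that conversion follows; `metrics_conf_flags` is a hypothetical helper name, and `graphite.example.com` and `2003` are placeholder values, not settings from the original.

```shell
#!/bin/sh
# Turn "key=value" lines from a metrics.properties-style file into
# "--conf spark.metrics.conf.<key>=<value>" flags, one per line.
# Blank lines and "#" comment lines are skipped.
metrics_conf_flags() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" |
    sed 's/^/--conf spark.metrics.conf./'
}

# Example input; host and port are placeholder values.
props=$(mktemp)
cat > "$props" <<'EOF'
# Graphite sink
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
EOF

flags=$(metrics_conf_flags "$props")
printf '%s\n' "$flags"
```

When pasting the generated flags into an interactive shell, quote each key=value pair. An unquoted `*` is a glob pattern, and zsh in particular rejects it when nothing matches.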

Spark Metrics

*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=minutes
*.sink.csv.directory=/tmp/
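
Each key in this file has the shape instance.sink.name.option, where `*` as the instance applies the setting to every instance. As a small sanity check, the sketch below lists each sink with its reporting interval; `sink_intervals` is a hypothetical helper name and the temporary file is just a copy of the configuration above.

```shell
#!/bin/sh
# List every sink in a metrics.properties file together with its
# reporting period and unit, one sink per line, sorted by name.
sink_intervals() {
  awk -F= '
    /^[^#]*\.sink\.[^.]*\.(period|unit)=/ {
      split($1, part, /\./)        # instance . sink . name . option
      name = part[3]; opt = part[4]
      val[name, opt] = $2
      seen[name] = 1
    }
    END { for (name in seen) print name, val[name, "period"], val[name, "unit"] }
  ' "$1" | sort
}

# Copy of the configuration shown above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=1
*.sink.csv.unit=minutes
*.sink.csv.directory=/tmp/
EOF

sink_intervals "$cfg"
```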

With this configuration in place, spark-shell prints a console report every 10 seconds and writes CSV files to /tmp/ once a minute:

➜  spark-2.0.1-bin-hadoop2.7 bin/spark-shell
Spark context Web UI available at http://10.57.2.5:4040
Spark context available as 'sc' (master = local[*], app id = local-1495078254084).
Spark session available as 'spark'.
scala> 17-5-18 11:31:05 ===============================================================

-- Gauges ----------------------------------------------------------------------
local-1495078254084.driver.BlockManager.disk.diskSpaceUsed_MB value = 0
local-1495078254084.driver.BlockManager.memory.maxMem_MB value = 366
local-1495078254084.driver.BlockManager.memory.memUsed_MB value = 0
local-1495078254084.driver.BlockManager.memory.remainingMem_MB value = 366
local-1495078254084.driver.DAGScheduler.job.activeJobs value = 0
local-1495078254084.driver.DAGScheduler.job.allJobs value = 0
local-1495078254084.driver.DAGScheduler.stage.failedStages value = 0
local-1495078254084.driver.DAGScheduler.stage.runningStages value = 0
local-1495078254084.driver.DAGScheduler.stage.waitingStages value = 0

-- Histograms ------------------------------------------------------------------
local-1495078254084.driver.CodeGenerator.compilationTime
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00
local-1495078254084.driver.CodeGenerator.generatedClassSize
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00
local-1495078254084.driver.CodeGenerator.generatedMethodSize
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00
local-1495078254084.driver.CodeGenerator.sourceCodeSize
count = 0
min = 0
max = 0
mean = 0.00
stddev = 0.00
median = 0.00
75% <= 0.00
95% <= 0.00
98% <= 0.00
99% <= 0.00
99.9% <= 0.00

-- Timers ----------------------------------------------------------------------
local-1495078254084.driver.DAGScheduler.messageProcessingTime
count = 0
mean rate = 0.00 calls/second
1-minute rate = 0.00 calls/second
5-minute rate = 0.00 calls/second
15-minute rate = 0.00 calls/second
min = 0.00 milliseconds
max = 0.00 milliseconds
mean = 0.00 milliseconds
stddev = 0.00 milliseconds
median = 0.00 milliseconds
75% <= 0.00 milliseconds
95% <= 0.00 milliseconds
98% <= 0.00 milliseconds
99% <= 0.00 milliseconds
99.9% <= 0.00 milliseconds


17-5-18 11:31:15 ===============================================================

scala> sc.parallelize(List(1,2,3,4,5)).count
res1: Long = 5

scala> 17-5-18 11:33:15 ===============================================================

-- Timers ----------------------------------------------------------------------
local-1495078254084.driver.DAGScheduler.messageProcessingTime
count = 10
mean rate = 0.07 calls/second
1-minute rate = 0.16 calls/second
5-minute rate = 0.03 calls/second
15-minute rate = 0.01 calls/second
min = 0.03 milliseconds
max = 1207.28 milliseconds
mean = 125.02 milliseconds
stddev = 358.42 milliseconds
median = 0.32 milliseconds
75% <= 16.58 milliseconds
95% <= 1207.28 milliseconds
98% <= 1207.28 milliseconds
99% <= 1207.28 milliseconds
99.9% <= 1207.28 milliseconds

The CSV sink writes one file per metric to /tmp/:

➜  ~ ll /tmp/ -rth
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.DAGScheduler.stage.waitingStages.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.DAGScheduler.stage.runningStages.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.DAGScheduler.stage.failedStages.csv
-rw-r--r-- 1 zhengqh wheel 1.3K 5 18 11:36 local-1495078254084.driver.DAGScheduler.messageProcessingTime.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.DAGScheduler.job.allJobs.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.DAGScheduler.job.activeJobs.csv
-rw-r--r-- 1 zhengqh wheel 676B 5 18 11:36 local-1495078254084.driver.CodeGenerator.sourceCodeSize.csv
-rw-r--r-- 1 zhengqh wheel 676B 5 18 11:36 local-1495078254084.driver.CodeGenerator.generatedMethodSize.csv
-rw-r--r-- 1 zhengqh wheel 676B 5 18 11:36 local-1495078254084.driver.CodeGenerator.generatedClassSize.csv
-rw-r--r-- 1 zhengqh wheel 676B 5 18 11:36 local-1495078254084.driver.CodeGenerator.compilationTime.csv
-rw-r--r-- 1 zhengqh wheel 113B 5 18 11:36 local-1495078254084.driver.BlockManager.memory.remainingMem_MB.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.BlockManager.memory.memUsed_MB.csv
-rw-r--r-- 1 zhengqh wheel 113B 5 18 11:36 local-1495078254084.driver.BlockManager.memory.maxMem_MB.csv
-rw-r--r-- 1 zhengqh wheel 99B 5 18 11:36 local-1495078254084.driver.BlockManager.disk.diskSpaceUsed_MB.csv

➜ /tmp cat local-1495078254084.driver.DAGScheduler.messageProcessingTime.csv
t,count,max,mean,min,stddev,p50,p75,p95,p98,p99,p999,mean_rate,m1_rate,m5_rate,m15_rate,rate_unit,duration_unit
1495078315,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,calls/second,milliseconds
1495078375,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,calls/second,milliseconds
1495078435,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.055257,0.082101,0.028931,0.010599,calls/second,milliseconds
1495078495,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.041499,0.030203,0.023686,0.009915,calls/second,milliseconds
1495078555,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.033225,0.011111,0.019393,0.009276,calls/second,milliseconds
1495078577,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.030895,0.007962,0.018142,0.009072,calls/second,milliseconds
1495078577,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.030890,0.007962,0.018142,0.009072,calls/second,milliseconds
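
Because every CSV file starts with a header row, individual columns can be selected by name rather than by position. The sketch below does this with awk; `timer_summary` is a hypothetical helper name, and the sample file holds the header and two rows copied from the output above.

```shell
#!/bin/sh
# Print the "t", "count" and "mean" columns of a CsvSink timer file,
# locating the columns through the header row.
timer_summary() {
  awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    { print $(col["t"]), $(col["count"]), $(col["mean"]) }
  ' "$1"
}

# Sample data: header plus two rows from messageProcessingTime.csv.
sample=$(mktemp)
cat > "$sample" <<'EOF'
t,count,max,mean,min,stddev,p50,p75,p95,p98,p99,p999,mean_rate,m1_rate,m5_rate,m15_rate,rate_unit,duration_unit
1495078375,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,calls/second,milliseconds
1495078435,10,1207.284400,125.017564,0.027442,358.422668,0.317114,16.580495,1207.284400,1207.284400,1207.284400,1207.284400,0.055257,0.082101,0.028931,0.010599,calls/second,milliseconds
EOF

summary=$(timer_summary "$sample")
printf '%s\n' "$summary"
```

Looking columns up by name keeps the script working for the gauge and histogram files too, as long as the requested columns exist in their headers.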

Spark Cassandra Metrics

executor.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource
driver.source.cassandra-connector.class=org.apache.spark.metrics.CassandraConnectorSource

Spark Influx Metrics

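The InfluxDB sink comes from https://github.com/palantir/spark-influx-sink. Add its JARs to the driver and executor classpaths, then configure the sink in metrics.properties: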

spark.driver.extraClassPath=spark-influx-sink.jar:metrics-influxdb.jar
spark.executor.extraClassPath=spark-influx-sink.jar:metrics-influxdb.jar

*.sink.influx.class=org.apache.spark.metrics.sink.InfluxDbSink
*.sink.influx.protocol=https
*.sink.influx.host=localhost
*.sink.influx.port=8086
*.sink.influx.database=my_metrics
*.sink.influx.auth=metric_client:PASSWORD
*.sink.influx.tags=product:my_product,parent:my_service
