Log Monitoring System

Collectd: gathers statistics on performance, processes, and the overall status of the system
Graphite: stores numeric time-series data and renders graphs of this data on demand
Grafana: a visualization and dashboarding tool for time-series data

A log monitoring system consists of log collection, log storage, and log visualization.
These map respectively to Collectd, Graphite (without Graphite's own web tool), and Grafana.
Alternative time-series stores are Elasticsearch and InfluxDB.
Logstash can be regarded as the bridge between collection and storage, because its input/output configuration makes it easy to move data between different systems.
Without Logstash as the bridge, getting collected logs into storage becomes a problem of its own: you would have to call the storage client APIs yourself.

log systems

So how do these systems communicate, and how are they organized?

  1. collectd collects the metrics and, via its network plugin, sends them to the port Logstash listens on
  2. Logstash's input listens on the port from step 1; its output writes to a storage system such as ES or InfluxDB
  3. Grafana is configured with that storage system as a data source and visualizes the data
Software       Version  Node            Description
elasticSearch  2.3.3    192.168.6.52    index database
logstash       2.3.2    local machine   has input/output plugins, so it can connect different pipelines
collectd       -        192.168.6.52    collects machine info: performance, processes, etc.
Grafana        -        192.168.6.52    visualization; supports different data sources (ES and InfluxDB)
influxdb       0.13     192.168.6.52    time-series database

ELK

LogStash

Standard I/O

Input and output on the command line, executed through the logstash script; codec specifies how events are decoded.

$ cd logstash-2.3.2
$ bin/logstash -e "input {stdin{}} output {stdout{}}" <<< 'Hello World'
Settings: Default pipeline workers: 4
Pipeline main started
2016-05-25T04:00:29.531Z zqhmac Hello World
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

$ bin/logstash -e 'input{stdin{}}output{stdout{codec=>rubydebug}}' <<< 'Hello World'
{
"message" => "Hello World",
"@version" => "1",
"@timestamp" => "2016-05-25T04:01:09.095Z",
"host" => "zqhmac"
}

$ vi logstash-simple.conf
input { stdin {} }
output {
  stdout { codec => rubydebug }
}
$ bin/logstash agent -f logstash-simple.conf --verbose

File Input

$ bin/logstash agent -f logstash-file.conf  --verbose
input {
  file {
    path => "/usr/install/cassandra/logs/system.log"
    start_position => beginning
    type => "cassandra"
  }
}
output {
  stdout { codec => rubydebug }
}

The log file

[qihuang.zheng@dp0652 logstash-1.5.0]$ head /usr/install/cassandra/logs/system.log
ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 GraphiteReporter.java:281 - Error sending to Graphite:
java.net.SocketException: 断开的管道
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_51]
at java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_51]
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[na:1.7.0_51]
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[na:1.7.0_51]
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[na:1.7.0_51]
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[na:1.7.0_51]
at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129) ~[na:1.7.0_51]

By default every line becomes one event; for a log file that contains exception stack traces, that clearly will not do without further processing:

Using version 0.1.x input plugin 'file'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x codec plugin 'plain'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x output plugin 'stdout'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Using version 0.1.x codec plugin 'rubydebug'. This plugin isn't well supported by the community and likely has no maintainer. {:level=>:info}
Registering file input {:path=>["/usr/install/cassandra/logs/system.log"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/qihuang.zheng/.sincedb_261f7476b9c9830f1fa5a51db2793e1e", :path=>["/usr/install/cassandra/logs/system.log"], :level=>:info}
Pipeline started {:level=>:info}
Logstash startup completed
{
"message" => "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 GraphiteReporter.java:281 - Error sending to Graphite:",
"@version" => "1",
"@timestamp" => "2016-06-20T08:42:35.543Z",
"type" => "cassandra",
"host" => "dp0652",
"path" => "/usr/install/cassandra/logs/system.log"
}
{
"message" => "java.net.SocketException: 断开的管道",
"@version" => "1",
"@timestamp" => "2016-06-20T08:42:35.544Z",
"type" => "cassandra",
"host" => "dp0652",
"path" => "/usr/install/cassandra/logs/system.log"
}
{
"message" => "\tat java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]",
"@version" => "1",
"@timestamp" => "2016-06-20T08:42:35.544Z",
"type" => "cassandra",
"host" => "dp0652",
"path" => "/usr/install/cassandra/logs/system.log"
}

Adding multiline support

input {
  file {
    path => "/usr/install/cassandra/logs/system.log"
    start_position => beginning
    type => "cassandra"
    codec => multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}

Any line beginning with whitespace is merged into the previous one, but this is still not ideal: the two records below should really be a single event. A sketch of a pattern that anchors on the log-level prefix follows the sample output.

{
"@timestamp" => "2016-06-20T08:49:54.521Z",
"message" => "ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,850 GraphiteReporter.java:281 - Error sending to Graphite:",
"@version" => "1",
"type" => "cassandra",
"host" => "dp0652",
"path" => "/usr/install/cassandra/logs/system.log"
}
{
"@timestamp" => "2016-06-20T08:49:54.521Z",
"message" => "java.net.SocketException: 断开的管道\n\tat java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.7.0_51]\n\tat java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113) ~[na:1.7.0_51]\n\tat java.net.SocketOutputStream.write(SocketOutputStream.java:159) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[na:1.7.0_51]\n\tat sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[na:1.7.0_51]\n\tat java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[na:1.7.0_51]\n\tat java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129) ~[na:1.7.0_51]\n\tat java.io.BufferedWriter.write(BufferedWriter.java:230) ~[na:1.7.0_51]\n\tat java.io.Writer.write(Writer.java:157) ~[na:1.7.0_51]\n\tat com.yammer.metrics.reporting.GraphiteReporter.sendToGraphite(GraphiteReporter.java:271) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.sendObjToGraphite(GraphiteReporter.java:265) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:304) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:26) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.core.Gauge.processWith(Gauge.java:28) [metrics-core-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.printRegularMetrics(GraphiteReporter.java:247) [metrics-graphite-2.2.0.jar:na]\n\tat com.yammer.metrics.reporting.GraphiteReporter.run(GraphiteReporter.java:213) [metrics-graphite-2.2.0.jar:na]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]\n\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_51]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_51]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_51]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]\n\tat java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]",
"@version" => "1",
"tags" => [
[0] "multiline"
],
"type" => "cassandra",
"host" => "dp0652",
"path" => "/usr/install/cassandra/logs/system.log"
}
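A stricter approach is to merge every line that does not start with a log level into the previous event, so the exception line and its stack frames end up in the same record as the ERROR line. A minimal sketch, assuming the multiline codec's negate option and the default Cassandra log layout:

input {
  file {
    path => "/usr/install/cassandra/logs/system.log"
    start_position => beginning
    type => "cassandra"
    codec => multiline {
      # lines that do NOT begin with a log level are appended to the previous event
      pattern => "^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+\["
      negate => true
      what => "previous"
    }
  }
}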

A Logstash config for Cassandra log files found online: https://github.com/rustyrazorblade/dotfiles/blob/master/logstash.conf

output {
  elasticsearch {
    hosts => ["192.168.6.52:9200"]
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
    document_type => "%{type}"
    workers => 2
    flush_size => 1000
    idle_flush_time => 5
    template_overwrite => true
  }
  stdout {
  }
}

input {
  file {
    path => "/usr/install/cassandra/logs/system.log"
    start_position => beginning
    type => "cassandra_system"
  }
}

filter {
  if [type] == "cassandra_system" {
    grok {
      match => {"message" => "%{LOGLEVEL:level} \[%{WORD:class}:%{NUMBER:line}\] %{TIMESTAMP_ISO8601:timestamp} %{WORD:file}\.java:%{NUMBER:line2} - %{GREEDYDATA:msg}"}
    }
  }
}
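One thing to watch: the grok above expects [class:line] inside the brackets, while the system.log sample earlier carries a thread name there (metrics-graphite-reporter-thread-1), which %{WORD} cannot match because of the hyphens, so the pattern would not apply to those lines. A hedged sketch for the default Cassandra 2.x layout (LEVEL [thread] timestamp File.java:line - message); the field names are my own choice:

filter {
  if [type] == "cassandra_system" {
    grok {
      # e.g. ERROR [metrics-graphite-reporter-thread-1] 2016-06-13 18:30:39,502 GraphiteReporter.java:281 - ...
      match => { "message" => "%{LOGLEVEL:level}\s+\[%{DATA:thread}\] %{TIMESTAMP_ISO8601:timestamp} %{WORD:file}\.java:%{NUMBER:line} - %{GREEDYDATA:msg}" }
    }
  }
}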

Performance test

input {
  generator {
    count => 30000000
  }
}
output {
  stdout {
    codec => dots
  }
  kafka {
    broker_list => "localhost:9092"
    topic_id => "test"
    compression_codec => "snappy"
  }
}

The generator input produces 30 million synthetic events and the dots codec prints one dot per event, so piping the output through pv shows the event throughput:

bin/logstash agent -f out.conf | pv -Wbart > /dev/null

topic_id => "test" 
compression_codec => "snappy"
request_required_acks => 1
serializer_class => "kafka.serializer.StringEncoder"
request_timeout_ms => 10000
producer_type => 'async'
message_send_max_retries => 5
retry_backoff_ms => 100
queue_buffering_max_ms => 5000
queue_buffering_max_messages => 10000
queue_enqueue_timeout_ms => -1
batch_num_messages => 1000
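To confirm that the generated events actually reach Kafka, they can be read back with the console consumer (a sketch; the ZooKeeper address is an assumption to match the localhost broker above):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning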

Collectd

Install collectd on the .52 machine; the network plugin sends this machine's metrics to port 25826 on the remote server 10.57.2.26.

$ sudo yum install collectd
$ sudo mv /etc/collectd.conf /etc/collectd_backup.conf
$ sudo vi /etc/collectd.conf
Hostname "dp0652"
FQDNLookup true
LoadPlugin interface
LoadPlugin cpu
LoadPlugin memory
LoadPlugin network
LoadPlugin df
LoadPlugin disk
<Plugin interface>
  Interface "eth0"
  IgnoreSelected false
</Plugin>
<Plugin network>
  Server "10.57.2.26" "25826"
</Plugin>
Include "/etc/collectd.d"
$ sudo service collectd start
Starting collectd: [ OK ]
$ sudo service collectd status
collectd (pid 9295) is running...
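Before wiring up Logstash it can help to verify that the UDP packets actually arrive on the receiving host (a sketch; the interface name eth0 is an assumption):

$ sudo tcpdump -i eth0 -c 5 udp port 25826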

The development machine is 10.57.2.26; it listens on port 25826 and receives the metrics collectd sends.

So the flow is: collectd on 192.168.6.52 collects system metrics and sends them to Logstash on 10.57.2.26.

$ bin/plugin list | grep collect
logstash-codec-collectd

$ vi collectd.conf
input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
$ bin/logstash -f collectd.conf
Settings: Default pipeline workers: 4
Pipeline main started
{
"host" => "dp0652",
"@timestamp" => "2016-05-25T03:49:52.000Z",
"plugin" => "cpu",
"plugin_instance" => "21",
"collectd_type" => "cpu",
"type_instance" => "system",
"value" => 1220568,
"@version" => "1"
}....

ElasticSearch

Starting ES

$ cd elasticsearch-2.3.3
$ vi config/elasticsearch.yml
cluster.name: es52
network.host: 192.168.6.52
#discovery.zen.ping.multicast.enabled: false
#http.cors.allow-origin: "/.*/"
#http.cors.enabled: true

$ bin/elasticsearch -d
$ curl http://192.168.6.52:9200/
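A quick check against the standard cluster-health API confirms the node is up:

$ curl http://192.168.6.52:9200/_cluster/health?pretty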

LogStash+CollectD+ElasticSearch

Above, the data collected from collectd was printed to the console; now change the output so that it is stored in Elasticsearch instead.

$ vi elastic.conf
input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}
output {
  elasticsearch {
    hosts => ["192.168.6.52:9200"]
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
    document_type => "%{type}"
    workers => 2
    flush_size => 1000
    idle_flush_time => 5
    template_overwrite => true
  }
}

$ bin/logstash -f elastic.conf

$ curl http://192.168.6.52:9200/_search?pretty

{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 613,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-%{type}-2016.05.25",
"_type" : "%{type}",
"_id" : "AVTmTRv8ihrmowv7fKbo",
"_score" : 1.0,
"_source" : {
"host" : "dp0652",
"@timestamp" : "2016-05-25T05:04:42.000Z",
"plugin" : "cpu",
"plugin_instance" : "22",
"collectd_type" : "cpu",
"type_instance" : "steal",
"value" : 0,
"@version" : "1"
}
}, ......]
}
}
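Note that the index came out as logstash-%{type}-2016.05.25: the collectd events carry no type field, so the %{type} placeholder in the output is left as a literal. A sketch of a fix is to set the type on the input (the value collectd is my own choice):

input {
  udp {
    port => 25826
    buffer_size => 1452
    type => "collectd"   # the index then resolves to logstash-collectd-YYYY.MM.dd
    codec => collectd { }
  }
}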

The final network topology looks like this:

collectd logstash es

In practice collectd runs on many different server nodes (Nginx servers, for example), while Logstash and ES only need a single machine:

logstash

Kibana

After downloading, if Elasticsearch is installed on the same machine, starting bin/kibana with the default config is enough.

https://www.elastic.co/guide/en/kibana/current/getting-started.html

wget -c https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json
wget -c https://github.com/bly2k/files/blob/master/accounts.zip?raw=true
wget -c https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz
unzip accounts.zip
gunzip logs.jsonl.gz

Structure of the shakespeare data set:

{
  "line_id": INT,
  "play_name": "String",
  "speech_number": INT,
  "line_number": "String",
  "speaker": "String",
  "text_entry": "String"
}

Create the index mapping:

curl -XPUT http://192.168.6.52:9200/shakespeare -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "speaker" : {"type": "string", "index" : "not_analyzed" },
        "play_name" : {"type": "string", "index" : "not_analyzed" },
        "line_id" : { "type" : "integer" },
        "speech_number" : { "type" : "integer" }
      }
    }
  }
}
';

Structure of the bank (accounts) data set:

{
  "account_number": INT,
  "balance": INT,
  "firstname": "String",
  "lastname": "String",
  "age": INT,
  "gender": "M or F",
  "address": "String",
  "employer": "String",
  "email": "String",
  "city": "String",
  "state": "String"
}



# the same mapping has to be applied to logstash-2015.05.18, logstash-2015.05.19 and logstash-2015.05.20
curl -XPUT http://192.168.6.52:9200/logstash-2015.05.20 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';

curl -XPOST '192.168.6.52:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST '192.168.6.52:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST '192.168.6.52:9200/_bulk?pretty' --data-binary @logs.jsonl

curl '192.168.6.52:9200/_cat/indices?v'

[qihuang.zheng@dp0652 ~]$ curl '192.168.6.52:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open logstash-%{type}-2016.05.25 5 1 12762 0 1.5mb 1.5mb
yellow open logstash-cassandra_system-2016.06.20 5 1 17125 0 3.9mb 3.9mb
yellow open logstash-cassandra_system-2016.06.21 5 1 262 0 233.7kb 233.7kb
yellow open bank 5 1 1000 0 442.2kb 442.2kb
yellow open .kibana 1 1 5 0 23.3kb 23.3kb
yellow open shakespeare 5 1 111396 0 18.4mb 18.4mb
green open graylog_0 1 0 8123 0 2.3mb 2.3mb
yellow open logstash-2015.05.20 5 1 4750 0 28.7mb 28.7mb
yellow open logstash-2015.05.18 5 1 4631 0 27.4mb 27.4mb
yellow open logstash-cassandra_system-2016.06.22 5 1 42 0 146.5kb 146.5kb
yellow open logstash-2015.05.19 5 1 4624 0 27.8mb 27.8mb

InfluxDB

wget -c https://dl.influxdata.com/influxdb/releases/influxdb-0.13.0_linux_amd64.tar.gz
wget -c https://dl.influxdata.com/telegraf/releases/telegraf-0.13.1_linux_i386.tar.gz
wget -c https://dl.influxdata.com/chronograf/releases/chronograf-0.13.0-1.x86_64.rpm
wget -c https://dl.influxdata.com/kapacitor/releases/kapacitor-0.13.1_linux_amd64.tar.gz
sudo yum localinstall chronograf-0.13.0-1.x86_64.rpm

Generate a config file and point to it when starting:

$ cd influxdb-0.13.0-1
$ usr/bin/influxd config > influxdb.generated.conf
$ nohup usr/bin/influxd -config influxdb.generated.conf &

http://192.168.6.52:8083/
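Besides the admin UI on port 8083, the HTTP API on 8086 offers a quick liveness check (a sketch; the /ping endpoint returns 204 when the daemon is up):

$ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.6.52:8086/ping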

Use the command-line client to create a database, insert data, and query it:

$ usr/bin/influx
Connected to http://localhost:8086 version 0.13.x
InfluxDB shell 0.13.x
> CREATE DATABASE mydb
> USE mydb
> INSERT cpu,host=serverA,region=us_west value=0.64
> INSERT cpu,host=serverA,region=us_east value=0.45
> INSERT temperature,machine=unit42,type=assembly external=25,internal=37
> SELECT host, region, value FROM cpu
> SELECT * FROM temperature
> SELECT * FROM /.*/ LIMIT 1

Data can also be written from the web UI:

influxdb write
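Points can also be written over the HTTP API with the same line protocol as the INSERT statements above (a sketch against the /write endpoint; the value is arbitrary):

$ curl -i -XPOST 'http://192.168.6.52:8086/write?db=mydb' --data-binary 'cpu,host=serverA,region=us_west value=0.51'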

Grafana

grafana-1.9.1

Since Grafana 1.x is purely static, you only need to download and extract it and serve it from Nginx, or simply run it with Python's SimpleHTTPServer.

$ wget http://grafanarel.s3.amazonaws.com/grafana-1.9.1.tar.gz
$ cd grafana-1.9.1
$ python -m SimpleHTTPServer 8383

http://192.168.6.52:8383

With no data source configured, the page is blank:

grafana1

After adding a data source, e.g. InfluxDB:

$ cp config.sample.js config.js
$ vi config.js
datasources: {
  influxdb: {
    type: 'influxdb',
    url: "http://192.168.6.52:8086/db/cassandra-metrics",
    username: 'admin',
    password: 'admin',
  },
  grafana: {
    type: 'influxdb',
    url: "http://192.168.6.52:8086/db/grafana",
    username: 'admin',
    password: 'admin',
    grafanaDB: true
  },
},

After restarting the Python process there is a bit more to see (if InfluxDB has no admin user, the username and password above can be dropped),

grafana2

but the configuration buttons are nowhere to be found. Missing permissions? Worse, this version does not even present a login page.

grafana-2.x

https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.3-1463994644.linux-x64.tar.gz

Versions 2.5 and later, including 3.0, have a directory layout that differs greatly from 1.9:

[qihuang.zheng@dp0652 grafana-1.9.1]$ tree -L 1
.
├── app
├── build.txt
├── config.js
├── config.sample.js
├── css
├── font
├── img
├── index.html
├── LICENSE.md
├── NOTICE.md
├── plugins
├── README.md
├── test
└── vendor

[qihuang.zheng@dp0652 grafana-3.0.3-1463994644]$ tree -L 2
.
├── bin
│   ├── grafana-cli
│   ├── grafana-cli.md5
│   ├── grafana-server
│   └── grafana-server.md5
├── conf
│   ├── defaults.ini
│   └── sample.ini
├── LICENSE.md
├── NOTICE.md
├── public
│   ├── app
│   ├── css
│   ├── dashboards
│   ├── emails
│   ├── fonts
│   ├── img
│   ├── robots.txt
│   ├── sass
│   ├── test
│   ├── vendor
│   └── views
├── README.md
└── vendor
└── phantomjs

If it is also served with python -m SimpleHTTPServer 8383, the browser shows nothing at all.

grafana3

Following the official installation guide it only takes a few minutes, and it apparently does not depend on an external web server.

$ sudo yum install https://grafanarel.s3.amazonaws.com/builds/grafana-3.0.4-1464167696.x86_64.rpm
$ sudo service grafana-server start
Starting Grafana Server: .... FAILED
$ ps -ef|grep grafana
grafana 16078 1 0 13:19 ? 00:00:00 /usr/sbin/grafana-server
--pidfile=/var/run/grafana-server.pid --config=/etc/grafana/grafana.ini
cfg:default.paths.data=/var/lib/grafana cfg:default.paths.logs=/var/log/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins
$ vi /var/log/grafana/grafana.log
$ sudo service grafana-server status
grafana-server (pid 16078) is running...

Although the start command looks like it failed, the log file shows no errors. Opening http://192.168.6.52:3000/
brings up the login page; log in with admin/admin.

grafana4

This time Grafana's icons are clickable and data sources work. First add an InfluxDB data source.

grafana5

influx-grafana

Add a dashboard (1), then add a panel (2), and choose influxdb as the data source.

grafana6

The default query is: SELECT mean("value") FROM "measurement" WHERE $timeFilter GROUP BY time($interval) fill(null)

grafana7

Change it to: SELECT mean("value") FROM "cpu" WHERE "host" = 'serverA' AND $timeFilter GROUP BY time($interval)

grafana8

The graph now appears at the top; click Close to leave edit mode. Do not click the eye icon: once it turns grey, the query is no longer sent.
Normally the q parameter carries the query, for example: curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb" --data-urlencode "q=SELECT value FROM cpu_load_short WHERE region='us-west'"

grafana9

Insert a few more points into InfluxDB:

cpu,host=serverA,region=us_west value=0.36
cpu,host=serverA,region=us_west value=0.85

grafana10

The time range can be chosen in the upper-right corner; if the selected range contains no data, N/A is shown.

grafana11

Cassandra+InfluxDB+Grafana

Monitoring Cassandra with Grafana takes two steps:

  1. Collect Cassandra's metrics into InfluxDB
  2. Configure InfluxDB as a data source in Grafana and visualize the Cassandra metrics

References:
http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2
https://www.pythian.com/blog/monitoring-cassandra-grafana-influx-db/

Step 1 can be done in several ways:

  1. Use the Graphite reporter to collect the metrics and send them to InfluxDB
  2. Use InfluxDB's telegraf input plugin to collect the metrics

Graphite

1) Download metrics-graphite.jar and put it into Cassandra's lib directory.

2) Edit InfluxDB's config file. The database setting must match a database name inside InfluxDB, here cassandra-metrics (this database has to be created manually in InfluxDB).
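Creating that database from the influx CLI (a sketch; the name contains a hyphen, so it has to be double-quoted in InfluxQL):

$ usr/bin/influx
> CREATE DATABASE "cassandra-metrics"
> SHOW DATABASES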

In the reference the config file is config.toml; in the newer version, the file generated when starting influxd serves the same purpose.
The old input section is [input_plugins.graphite]; in the new version it is [[graphite]].

[qihuang.zheng@dp0652 influxdb-0.13.0-1]$ vi influxdb.generated.conf
[[graphite]]
enabled = true
bind-address = ":2003"
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
# database = "graphite"
database = "cassandra-metrics"
udp_enabled = true

The config above tells InfluxDB to open port 2003 and accept graphite-protocol metric data.

3) Restart InfluxDB so the configuration takes effect.

$ service influxdb reload  # when installed from the RPM; no full restart needed
$ influxdb -config influxdb.generated.conf reload

4) Create the reporter config file in Cassandra's conf directory. host is the InfluxDB server address, and the port matches port 2003 configured in influxdb.generated.conf above.
If InfluxDB is installed on a different node, host below must point to that node. prefix is usually set to the IP address of the current machine.

[qihuang.zheng@dp0652 conf]$ vi /usr/install/cassandra/conf/influx-reporting.yaml
graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    prefix: 'Node1'
    hosts:
      - host: '192.168.6.52'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - ".*"

The config above means: every metric of this Cassandra node gets the prefix Node1 and is sent to port 2003 on the .52 host, i.e. Cassandra's metrics are pushed to InfluxDB.
It also shows that only an address and a port are specified, so the target can be any store that accepts graphite-protocol data; it does not have to be InfluxDB.

5) Install Grafana. There is no need to edit config.js; data sources can be added from the web UI.

6) Add a -D startup option to cassandra-env.sh.

[qihuang.zheng@dp0652 conf]$ vi /usr/install/cassandra/conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=influx-reporting.yaml"

7) Restart Cassandra.

INFO [main] YYYY-MM-DD HH:MM:SS,SSS CassandraDaemon.java:353 - Trying to load metrics-reporter-config from file: influx-reporting.yaml
INFO [main] YYYY-MM-DD HH:MM:SS,SSS GraphiteReporterConfig.java:68 - Enabling GraphiteReporter to 192.168.6.52:2003

8) Verify that the Cassandra metrics are being collected into InfluxDB through metrics-graphite.jar.

cass influxd1

Pick a measurement and verify that data is being written: select * FROM "Node1.jvm.daemon_thread_count"
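The same check can be run from the CLI (a sketch; -database and -execute are standard influx client flags):

$ usr/bin/influx -database 'cassandra-metrics' -execute 'SHOW MEASUREMENTS' | head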

cass influxd2

9) Configure the InfluxDB data source in Grafana and change the database name to cassandra-metrics.

cass influxd3

Configure a monitoring graph and change the measurement.

cass influxd4

10) To summarize, the flow of collecting Cassandra metrics through Graphite + InfluxDB:

graphite-cass-influxdb

Metrics from multiple nodes can be aggregated with a regular expression:

select mean(value) from /.*org.apache.cassandra.metrics.ClientRequest.Read/
select mean(value) from "192.168.6.53.org.apache.cassandra.metrics.ClientRequest.Read.Latency.15MinuteRate"

Telegraf

Hekad

http://hekad.readthedocs.io/en/v0.10.0/index.html

Atlas

https://github.com/Netflix/atlas

Graylog

http://www.cnblogs.com/wjoyxt/p/4961262.html
http://docs.graylog.org/en/2.0/pages/installation/manual_setup.html

Preparation: install and start MongoDB.

mkdir -p /home/qihuang.zheng/data/mongodb
curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.2.7.tgz
tar zxf mongodb-linux-x86_64-3.2.7.tgz
nohup mongodb-linux-x86_64-3.2.7/bin/mongod --dbpath /home/qihuang.zheng/data/mongodb &

sudo rpm -ivh pwgen-2.07-1.el6.x86_64.rpm && sudo yum install perl-Digest-SHA
[qihuang.zheng@dp0652 ~]$ pwgen -N 1 -s 96
XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic
[qihuang.zheng@dp0652 ~]$ echo -n XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic | shasum -a 256
cb535aa3ff35e81f69f9014005bcf1ad032048cc123dad735bbf87970eb2cacb -
[qihuang.zheng@dp0652 ~]$ echo -n admin | shasum -a 256
8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918 -

With the default settings in Graylog's config file, a single-node setup needs no mandatory changes, but it is best to replace localhost with the machine's real IP.

$ wget https://fossies.org/linux/misc/graylog-2.0.2.tgz
$ tar zxf graylog-VERSION.tgz && cd graylog
$ vi graylog.conf.example
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret =
root_password_sha2 =
rest_listen_uri = http://127.0.0.1:12900/
#elasticsearch_cluster_name = graylog
#elasticsearch_discovery_zen_ping_unicast_hosts = 127.0.0.1:9300, 127.0.0.2:9500
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
mongodb_uri = mongodb://localhost/graylog

The config file path /etc/graylog/server/server.conf is hard-coded in the bin/graylogctl script.

$ sudo mkdir -p /etc/graylog/server/
$ sudo cp graylog.conf.example /etc/graylog/server/server.conf
$ cat /etc/graylog/server/server.conf |grep -v grep|grep -v ^#|grep -v ^$

password_secret=XNamqyHbxtV46AHXNMlMAvVeV2dutp2pQaeY9IaOSf9XnwgyYGXqN97SSCQ2OLyR2HR41BtCpxwSMH4kFnr1VHmRNUQvYyic
password=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
cluster=`cat ~/elasticsearch-2.3.3/config/elasticsearch.yml |grep cluster.name | awk '{print $2}'`
sed -i -e "s#password_secret =#password_secret = $password_secret#g" server.conf
sed -i -e "s#root_password_sha2 =#root_password_sha2 = $password#g" server.conf
sed -i -e "s#127.0.0.1#192.168.6.52#g" server.conf
sed -i -e "s#localhost#192.168.6.52#g" server.conf
sed -i -e "s#elasticsearch_shards = 4#elasticsearch_shards = 1#g" server.conf
sed -i -e "s#web_listen_uri = http://127.0.0.1:9000/#web_listen_uri = http://192.168.6.52:9999/#g" server.conf
######################################
elasticsearch_cluster_name = $cluster
elasticsearch_discovery_zen_ping_unicast_hosts = 192.168.6.52:9300

Graylog's startup script is bin/graylogctl; the actual start command is simply java -jar graylog.jar.

[qihuang.zheng@dp0652 graylog-2.0.2]$ ll
drwxr-xr-x 2 qihuang.zheng users 4096 6月 20 13:37 bin
-rw-r--r-- 1 qihuang.zheng users 35147 5月 26 23:31 COPYING
drwxr-xr-x 4 qihuang.zheng users 4096 6月 20 12:58 data
-rw-r--r-- 1 qihuang.zheng users 23310 5月 26 23:31 graylog.conf.example
-rw-r--r-- 1 qihuang.zheng users 80950701 5月 26 23:34 graylog.jar ⬅️
drwxr-xr-x 3 qihuang.zheng users 4096 6月 20 11:55 lib
drwxr-xr-x 2 qihuang.zheng users 4096 6月 20 13:02 log
drwxr-xr-x 2 qihuang.zheng users 4096 5月 26 23:33 plugin

To customize the logging configuration, edit the start section of bin/graylogctl and add the config file path before -jar:

-Dlog4j.configurationFile=file:///home/qihuang.zheng/graylog-2.0.2/log4j2.xml -jar "${GRAYLOG_SERVER_JAR}" server

Start Graylog; this also starts the web service. Its default port is 9000, which clashes with HDFS here, so web_listen_uri was changed to 9999 above.

[qihuang.zheng@dp0652 ~]$ sudo graylog-2.0.2/bin/graylogctl start
Starting graylog-server ...
[qihuang.zheng@dp0652 ~]$ sudo graylog-2.0.2/bin/graylogctl status
graylog-server running with PID 8600
[qihuang.zheng@dp0652 ~]$ ll /etc/graylog/server/
-rw-r--r-- 1 root root 36 6月 20 12:58 node-id
-rw-r--r-- 1 root root 23496 6月 20 12:57 server.conf
[qihuang.zheng@dp0652 ~]$ cat /etc/graylog/server/node-id
eb943e44-3464-47be-9c07-2a554de71428
[qihuang.zheng@dp0652 graylog-2.0.2]$ ps -ef|grep graylog
root 33748 1 48 14:17 pts/0 00:01:20 /home/qihuang.zheng/jdk1.8.0_91/bin/java -Djava.library.path=bin/../lib/sigar -Xms1g -Xmx1g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -jar graylog.jar server -f /etc/graylog/server/server.conf -p /tmp/graylog.pid
506 38283 29183 0 14:20 pts/0 00:00:00 grep graylog
[qihuang.zheng@dp0652 graylog-2.0.2]$ cat /tmp/graylog.pid
33748

Log file

[qihuang.zheng@dp0652 graylog-2.0.2]$ cat log/graylog-server.log
2016-06-20 14:17:46,602 INFO : kafka.log.LogManager - Loading logs.
2016-06-20 14:17:46,662 INFO : kafka.log.LogManager - Logs loading complete.
2016-06-20 14:17:46,662 INFO : org.graylog2.shared.journal.KafkaJournal - Initialized Kafka based journal at data/journal
2016-06-20 14:17:46,676 INFO : org.graylog2.shared.buffers.InputBufferImpl - Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 2 parallel message handlers.
2016-06-20 14:17:46,706 INFO : org.mongodb.driver.cluster - Cluster created with settings {hosts=[192.168.6.52:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000}
2016-06-20 14:17:46,731 INFO : org.mongodb.driver.cluster - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, all=[ServerDescription{address=192.168.6.52:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2016-06-20 14:17:46,752 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:40}] to 192.168.6.52:27017
2016-06-20 14:17:46,753 INFO : org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=192.168.6.52:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 7]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=512946}
2016-06-20 14:17:46,758 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:41}] to 192.168.6.52:27017
2016-06-20 14:17:46,929 INFO : org.graylog2.plugin.system.NodeId - Node ID: eb943e44-3464-47be-9c07-2a554de71428
2016-06-20 14:17:46,978 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] version[2.3.2], pid[33748], build[b9e4a6a/2016-04-21T16:03:47Z]
2016-06-20 14:17:46,978 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] initializing ...
2016-06-20 14:17:46,984 INFO : org.elasticsearch.plugins - [graylog-eb943e44-3464-47be-9c07-2a554de71428] modules [], plugins [graylog-monitor], sites []
2016-06-20 14:17:48,243 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] initialized
2016-06-20 14:17:48,304 INFO : org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.2.4.Final
2016-06-20 14:17:48,419 INFO : org.graylog2.shared.buffers.ProcessBuffer - Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2016-06-20 14:17:49,918 INFO : org.graylog2.bindings.providers.RulesEngineProvider - No static rules file loaded.
2016-06-20 14:17:50,510 INFO : org.graylog2.bootstrap.ServerBootstrap - Graylog server 2.0.2 (4da1379) starting up
2016-06-20 14:17:50,514 WARN : org.graylog2.shared.events.DeadEventLoggingListener - Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2016-06-20 14:17:50,532 INFO : org.graylog2.shared.initializers.PeriodicalsService - Starting 24 periodicals ...
2016-06-20 14:17:50,538 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] starting ...
2016-06-20 14:17:50,544 INFO : org.graylog2.periodical.IndexRetentionThread - Elasticsearch cluster not available, skipping index retention checks.
2016-06-20 14:17:50,546 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:43}] to 192.168.6.52:27017
2016-06-20 14:17:50,546 INFO : org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:42}] to 192.168.6.52:27017
2016-06-20 14:17:50,553 INFO : org.graylog2.periodical.IndexerClusterCheckerThread - Indexer not fully initialized yet. Skipping periodic cluster check.
2016-06-20 14:17:50,573 INFO : org.graylog2.shared.initializers.PeriodicalsService - Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2016-06-20 14:17:50,648 INFO : org.elasticsearch.transport - [graylog-eb943e44-3464-47be-9c07-2a554de71428] publish_address {127.0.0.1:9350}, bound_addresses {[::1]:9350}, {127.0.0.1:9350}
2016-06-20 14:17:50,654 INFO : org.elasticsearch.discovery - [graylog-eb943e44-3464-47be-9c07-2a554de71428] es52/rWpoduohQ1CXcyoAuwzNtg
2016-06-20 14:17:53,239 INFO : org.glassfish.grizzly.http.server.NetworkListener - Started listener bound to [192.168.6.52:9999]
2016-06-20 14:17:53,241 INFO : org.glassfish.grizzly.http.server.HttpServer - [HttpServer] Started.
2016-06-20 14:17:53,242 INFO : org.graylog2.initializers.WebInterfaceService - Started Web Interface at <http://192.168.6.52:9999/>
2016-06-20 14:17:53,657 WARN : org.elasticsearch.discovery - [graylog-eb943e44-3464-47be-9c07-2a554de71428] waited for 3s and no initial state was set by the discovery
2016-06-20 14:17:53,657 INFO : org.elasticsearch.node - [graylog-eb943e44-3464-47be-9c07-2a554de71428] started
2016-06-20 14:17:53,730 INFO : org.elasticsearch.cluster.service - [graylog-eb943e44-3464-47be-9c07-2a554de71428] detected_master {Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300}, added {{Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300},}, reason: zen-disco-receive(from master [{Daisy Johnson}{ZVzCrnWoRsKVnRryYLE6BQ}{192.168.6.52}{192.168.6.52:9300}])
2016-06-20 14:17:56,443 INFO : org.glassfish.grizzly.http.server.NetworkListener - Started listener bound to [192.168.6.52:12900]
2016-06-20 14:17:56,443 INFO : org.glassfish.grizzly.http.server.HttpServer - [HttpServer-1] Started.
2016-06-20 14:17:56,444 INFO : org.graylog2.shared.initializers.RestApiService - Started REST API at <http://192.168.6.52:12900/>
2016-06-20 14:17:56,445 INFO : org.graylog2.shared.initializers.ServiceManagerListener - Services are healthy
2016-06-20 14:17:56,446 INFO : org.graylog2.shared.initializers.InputSetupService - Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2016-06-20 14:17:56,446 INFO : org.graylog2.bootstrap.ServerBootstrap - Services started, startup times in ms: {JournalReader [RUNNING]=1, BufferSynchronizerService [RUNNING]=1, OutputSetupService [RUNNING]=1, InputSetupService [RUNNING]=2, MetricsReporterService [RUNNING]=2, KafkaJournal [RUNNING]=4, PeriodicalsService [RUNNING]=51, WebInterfaceService [RUNNING]=2705, IndexerSetupService [RUNNING]=3211, RestApiService [RUNNING]=5912}
2016-06-20 14:17:56,451 INFO : org.graylog2.bootstrap.ServerBootstrap - Graylog server up and running.
2016-06-20 14:18:00,548 INFO : org.graylog2.indexer.Deflector - Did not find an deflector alias. Setting one up now.
2016-06-20 14:18:00,552 INFO : org.graylog2.indexer.Deflector - There is no index target to point to. Creating one now.
2016-06-20 14:18:00,554 INFO : org.graylog2.indexer.Deflector - Cycling deflector to next index now.
2016-06-20 14:18:00,555 INFO : org.graylog2.indexer.Deflector - Cycling from <none> to <graylog_0>
2016-06-20 14:18:00,555 INFO : org.graylog2.indexer.Deflector - Creating index target <graylog_0>...
2016-06-20 14:18:00,614 INFO : org.graylog2.indexer.indices.Indices - Created Graylog index template "graylog-internal" in Elasticsearch.
2016-06-20 14:18:00,698 INFO : org.graylog2.indexer.Deflector - Waiting for index allocation of <graylog_0>
2016-06-20 14:18:00,800 INFO : org.graylog2.indexer.Deflector - Done!
2016-06-20 14:18:00,800 INFO : org.graylog2.indexer.Deflector - Pointing deflector to new target index....
2016-06-20 14:18:00,845 INFO : org.graylog2.system.jobs.SystemJobManager - Submitted SystemJob <bd3c64c0-36ae-11e6-bcbc-02423384d6ab> [org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob]
2016-06-20 14:18:00,845 INFO : org.graylog2.indexer.Deflector - Done!
2016-06-20 14:18:00,845 INFO : org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob - Calculating ranges for index graylog_0.
2016-06-20 14:18:00,964 INFO : org.graylog2.indexer.ranges.MongoIndexRangeService - Calculated range of [graylog_0] in [117ms].
2016-06-20 14:18:00,973 INFO : org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob - Created ranges for index graylog_0.
2016-06-20 14:18:00,973 INFO : org.graylog2.system.jobs.SystemJobManager - SystemJob <bd3c64c0-36ae-11e6-bcbc-02423384d6ab> [org.graylog2.indexer.ranges.CreateNewSingleIndexRangeJob] finished in 128ms.

Check whether ES has created the index:

[qihuang.zheng@dp0652 graylog-2.0.2]$ curl http://192.168.6.52:9200/_cat/indices
yellow open logstash-%{type}-2016.05.25 5 1 12762 0 1.5mb 1.5mb
green open graylog_0 1 0 0 0 159b 159b

Check whether MongoDB has created the database:

[qihuang.zheng@dp0652 graylog-2.0.2]$ ~/mongodb-linux-x86_64-3.2.7/bin/mongo
MongoDB shell version: 3.2.7
connecting to: test
> show dbs;
graylog 0.001GB
local 0.000GB
test 0.000GB
> show collections
cluster_config
cluster_events
collectors
content_packs
grok_patterns
index_failures
index_ranges
nodes
notifications
pipeline_processor_pipelines
pipeline_processor_pipelines_streams
pipeline_processor_rules
roles
sessions
system_messages
users
> db.nodes.count()
1
> db.cluster_config.count()
11
> db.nodes.find()
{ "_id" : ObjectId("5767780ab01412219843147b"), "is_master" : true, "hostname" : "dp0652", "last_seen" : 1466404508, "transport_address" : "http://192.168.6.52:12900/", "type" : "SERVER", "node_id" : "eb943e44-3464-47be-9c07-2a554de71428" }
> db.cluster_config.distinct("type")
[
"org.graylog2.bundles.ContentPackLoaderConfig",
"org.graylog2.cluster.UserPermissionMigrationState",
"org.graylog2.indexer.management.IndexManagementConfig",
"org.graylog2.indexer.retention.strategies.ClosingRetentionStrategyConfig",
"org.graylog2.indexer.retention.strategies.DeletionRetentionStrategyConfig",
"org.graylog2.indexer.rotation.strategies.MessageCountRotationStrategyConfig",
"org.graylog2.indexer.rotation.strategies.SizeBasedRotationStrategyConfig",
"org.graylog2.indexer.rotation.strategies.TimeBasedRotationStrategyConfig",
"org.graylog2.indexer.searches.SearchesClusterConfig",
"org.graylog2.periodical.IndexRangesMigrationPeriodical.MongoIndexRangesMigrationComplete",
"org.graylog2.plugin.cluster.ClusterId"
]

Note: the elasticsearch_cluster_name setting in server.conf must match the name of the installed Elasticsearch cluster, otherwise the following error occurs:

2016-06-20 14:06:47,920 ERROR: org.graylog2.shared.rest.exceptionmappers.AnyExceptionClassMapper - Unhandled exception in REST resource
org.elasticsearch.discovery.MasterNotDiscoveredException
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226) ~[graylog.jar:?]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:236) ~[graylog.jar:?]
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:804) ~[graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]

Log in with admin/admin at http://192.168.6.52:9999; the password is the admin that was hashed with echo -n admin | shasum -a 256.

graylog-overview

Sending data to Graylog

UDP

1. On the web UI, under System/Inputs, add a Raw plaintext UDP input.

gray-inputs

2. Simulate sending data to the port:

[qihuang.zheng@dp0652 graylog-2.0.2]$ echo "Hello Graylog, let's be friends." | nc -w 1 -u 127.0.0.1 5555

3. The message can then be found under Search.

gray-udp

GELF

1. Configure a GELF input.

gray-gelf

2. Send GELF data:

curl -XPOST http://192.168.6.52:12201/gelf -p0 -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'

Logstash

Configure the input as GELF UDP on port 12202, to keep it separate from GELF HTTP on 12201.

➜  logstash-2.3.2 bin/logstash -e 'input { stdin {} } output { gelf {host => "192.168.6.52" port => 12202 } }'
logstash message

syslog

gray-syslog

kafka

1. Create a topic and produce a test message:

bin/kafka-topics.sh --create --zookeeper 192.168.6.55:2181,192.168.6.56:2181,192.168.6.57:2181 --replication-factor 1 --partitions 1 --topic graylog-test
bin/kafka-console-producer.sh --broker-list 192.168.6.55:9092,192.168.6.56:9092,192.168.6.57:9092 --topic graylog-test
hello msg from kafka

2. Configure the Kafka input.

gray-kafka

3. View the collected messages.

gray-kafka-msg

Graylog-Cassandra

https://github.com/Graylog2/graylog-plugin-metrics-reporter

Ref

http://www.cnblogs.com/kylinlin/p/4692073.html
http://rmoff.net/2016/05/12/monitoring-logstash-ingest-rates-with-influxdb-and-grafana/

