Getting Started with Elasticsearch

You Know, For Search

Start ES:

bin/elasticsearch

Hello CRUD:

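# Index a document. The database (index) is twitter, the table (type) is user, the primary key (id) is kimchy, and the field name has the value Shay Banon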
$ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{
"name" : "Shay Banon"
}'

# Index a document with several fields. The primary key is 1; the tweet table has three fields: user, post_date, message
$ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
"user": "kimchy",
"post_date": "2009-11-15T13:12:00",
"message": "Trying out elasticsearch, so far so good?"
}'

# Index another record. Note that the id in the URL is different: the primary key is 2
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "You know, for Search"
}'

# Index again. A PUT to an existing id acts as an update and increments the version by 1
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
"user": "kimchy",
"post_date": "2009-11-15T14:12:12",
"message": "You know, for Search"
}'

# Get: fetch the record with primary key 2 from the tweet table of the twitter database
$ curl -XGET http://localhost:9200/twitter/tweet/2

# Query with Lucene query-string syntax: find records in the tweet table whose user field is kimchy
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'

# Query with the query DSL: same as above
$ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
"query" : {
"term" : { "user": "kimchy" }
}
}'

# Query with the query DSL: find records whose post_date falls within the given range
$ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{
"query" : {
"range" : {
"post_date" : {
"from" : "2009-11-15T13:00:00",
"to" : "2009-11-15T14:30:00"
}
}
}
}'
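
As a rough Python counterpart to the curl calls above, the following sketch builds (but does not send) the same index, get, and search requests using only the standard library. The helper names `index_request`, `get_request`, and `search_request` are illustrative and not part of any Elasticsearch client, and the base URL assumes a local node on port 9200 as in the examples.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def index_request(index, doc_type, doc_id, doc):
    """Build a PUT request that indexes `doc` under /index/type/id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return Request(url, data=json.dumps(doc).encode("utf-8"), method="PUT")


def get_request(index, doc_type, doc_id):
    """Build a GET request that fetches one document by id."""
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="GET")


def search_request(index, doc_type, query):
    """Build a search request carrying a query DSL body."""
    url = f"{BASE_URL}/{index}/{doc_type}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    # POST is used here because some HTTP clients drop the body of a GET.
    return Request(url, data=body, method="POST")


put = index_request("twitter", "tweet", 1, {"user": "kimchy", "message": "Trying out elasticsearch"})
get = get_request("twitter", "tweet", 2)
search = search_request("twitter", "tweet", {"term": {"user": "kimchy"}})

for req in (put, get, search):
    print(req.get_method(), req.full_url)
```

To actually send one of these, pass it to `urllib.request.urlopen` while a node is running; the sketch stops short of that so it can be read and run without a cluster.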
Example: Elasticsearch concepts and their database counterparts (the numbers 1-3 match the markers in the mapping below)
1. twitter: Index 👉 Database
2. tweet: Type 👉 Table
3. user, post_date, message: Field

The corresponding mapping structure: http://localhost:9200/_all?pretty

"twitter" : {							    1
"aliases" : { },
"mappings" : {
"user" : { 2
"properties" : {
"name" : { 3️
"type" : "string"
}
}
},
"tweet" : { 2
"properties" : {
"message" : { 3
"type" : "string"
},
"post_date" : { 3
"type" : "date",
"format" : "dateOptionalTime"
},
"user" : { 3
"type" : "string"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1435222918809",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"version" : {
"created" : "1060099"
},
"uuid" : "jk1_iEyvTaibjDyLymZO5g"
}
},
"warmers" : { }
},
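
To make the index / type / field nesting in this mapping easier to see, here is a small sketch that walks a mapping dictionary and lists each field with its type. The `describe` helper is hypothetical, and the `mapping` literal is a trimmed copy of the output above rather than a live response.

```python
# Trimmed copy of the mapping shown above (Elasticsearch 1.x layout).
mapping = {
    "twitter": {
        "mappings": {
            "user": {"properties": {"name": {"type": "string"}}},
            "tweet": {
                "properties": {
                    "message": {"type": "string"},
                    "post_date": {"type": "date", "format": "dateOptionalTime"},
                    "user": {"type": "string"},
                }
            },
        }
    }
}


def describe(mapping):
    """Return sorted (index, type, field, field_type) rows from a mapping dict."""
    rows = []
    for index, index_body in mapping.items():
        for doc_type, type_body in index_body.get("mappings", {}).items():
            for field, spec in type_body.get("properties", {}).items():
                # Assumption: a field without an explicit type is a nested object.
                rows.append((index, doc_type, field, spec.get("type", "object")))
    return sorted(rows)


for row in describe(mapping):
    print("{}.{}.{}: {}".format(*row))
```

Each printed row reads like `database.table.column: type`, which mirrors the analogy used in this post.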

Log output

➜  elasticsearch-1.6.0  bin/elasticsearch
[2015-06-25 17:01:05,297][INFO ][node ] [Y'Garon] version[1.6.0], pid[5619], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-06-25 17:01:05,298][INFO ][node ] [Y'Garon] initializing ...
[2015-06-25 17:01:05,313][INFO ][plugins ] [Y'Garon] loaded [], sites []
[2015-06-25 17:01:05,517][INFO ][env ] [Y'Garon] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [61.3gb], net total_space [111.8gb], types [hfs]
[2015-06-25 17:01:13,075][INFO ][node ] [Y'Garon] initialized
[2015-06-25 17:01:13,078][INFO ][node ] [Y'Garon] starting ...
[2015-06-25 17:01:13,457][INFO ][transport ] [Y'Garon] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.7.130:9300]}
[2015-06-25 17:01:13,532][INFO ][discovery ] [Y'Garon] elasticsearch/5K_ytFi9QfGnt0aPACBo-A
[2015-06-25 17:01:17,368][INFO ][cluster.service ] [Y'Garon] new_master [Y'Garon][5K_ytFi9QfGnt0aPACBo-A][zqhmac][inet[/192.168.7.130:9300]], reason: zen-disco-join (elected_as_master)
[2015-06-25 17:01:17,399][INFO ][http ] [Y'Garon] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.7.130:9200]}
[2015-06-25 17:01:17,399][INFO ][node ] [Y'Garon] started
[2015-06-25 17:01:17,419][INFO ][gateway ] [Y'Garon] recovered [0] indices into cluster_state
[2015-06-25 17:01:59,932][INFO ][cluster.metadata ] [Y'Garon] [twitter] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [tweet]
[2015-06-25 17:02:00,497][INFO ][cluster.metadata ] [Y'Garon] [twitter] update_mapping [tweet] (dynamic)
[2015-06-25 17:05:01,428][INFO ][cluster.metadata ] [Y'Garon] [spark] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2015-06-25 17:05:02,010][INFO ][cluster.metadata ] [Y'Garon] [spark] update_mapping [docs] (dynamic)
[2015-06-25 17:05:03,284][INFO ][cluster.metadata ] [Y'Garon] [spark] update_mapping [docs] (dynamic)
[2015-06-25 17:05:03,402][INFO ][cluster.metadata ] [Y'Garon] [spark] update_mapping [json-trips] (dynamic)
[2015-06-25 17:17:11,309][INFO ][cluster.metadata ] [Y'Garon] [twitter] update_mapping [user] (dynamic)
[2015-06-25 17:25:32,611][INFO ][cluster.metadata ] [Y'Garon] [my-collection] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [game, music, book]
[2015-06-25 17:25:32,800][INFO ][cluster.metadata ] [Y'Garon] [my-collection] update_mapping [music] (dynamic)
[2015-06-25 17:25:32,811][INFO ][cluster.metadata ] [Y'Garon] [my-collection] update_mapping [book] (dynamic)
[2015-06-25 17:25:32,819][INFO ][cluster.metadata ] [Y'Garon] [my-collection] update_mapping [game] (dynamic)
[2015-06-25 18:04:25,860][INFO ][cluster.metadata ] [Y'Garon] [airports] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2015-06-25 18:04:26,148][INFO ][cluster.metadata ] [Y'Garon] [airports] update_mapping [2015] (dynamic)
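
The log shows that indices are created automatically on first write (`auto(index api)`, `auto(bulk api)`) or explicitly (`api`). As a sketch of how such lines could be picked apart, the code below parses the bracketed prefix and extracts index-creation events. The regular expressions are inferred from the lines shown here only, so treat them as an assumption rather than a specification of the log format.

```python
import re

LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]"            # timestamp
    r"\[(?P<level>\w+)\s*\]"          # log level, padded with spaces
    r"\[(?P<component>[^\]]+?)\s*\]"  # logger name, padded with spaces
    r" \[(?P<node>[^\]]+)\]"          # node name
    r" (?P<message>.*)$"
)
CREATE_RE = re.compile(
    r"^\[(?P<index>[^\]]+)\] creating index, cause \[(?P<cause>.+?)\], templates"
)


def parse_line(line):
    """Split one log line into its named parts, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def created_indices(lines):
    """Return (index, cause) pairs for every 'creating index' message."""
    found = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        match = CREATE_RE.match(record["message"])
        if match:
            found.append((match.group("index"), match.group("cause")))
    return found


sample = [
    "[2015-06-25 17:01:59,932][INFO ][cluster.metadata ] [Y'Garon] [twitter] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [tweet]",
    "[2015-06-25 17:02:00,497][INFO ][cluster.metadata ] [Y'Garon] [twitter] update_mapping [tweet] (dynamic)",
    "[2015-06-25 17:05:01,428][INFO ][cluster.metadata ] [Y'Garon] [spark] creating index, cause [api], templates [], shards [5]/[1], mappings []",
]
print(created_indices(sample))
```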

Basic query API

👉 1. Count the records across all databases
➜ ~ curl http://localhost:9200/_count
{"count":9,"_shards":{"total":10,"successful":10,"failed":0}}

➜ ~ curl http://localhost:9200/_cat/count
1435224149 17:22:29 9

👉 2. List the REST services available under _cat
➜ ~ curl http://localhost:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

👉 3. All ES information (every index with its mappings and settings)
➜ ~ curl http://localhost:9200/_all

👉 4. List the databases. An index corresponds to a database
➜ ~ curl http://localhost:9200/_cat/indices
yellow open spark 5 1 6 0 16kb 16kb
yellow open twitter 5 1 3 0 8.7kb 8.7kb

👉 5. List the tables in a database
➜ ~ curl http://localhost:9200/spark

👉 6. Count the records in a database
➜ ~ curl http://localhost:9200/spark/_count
{"count":6,"_shards":{"total":5,"successful":5,"failed":0}}

👉 7. Count the records in one table of a database; the table name comes from step 5
➜ ~ curl http://localhost:9200/spark/people/_count

👉 8. Fetch all records in one table of a database
➜ ~ curl 'http://localhost:9200/spark/people/_search?q=*&pretty'
➜ ~ curl 'http://localhost:9200/spark/people/_search?pretty'
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X9",
"_score" : 1.0,
"_source":{"name":"Andy","surname":"Feng","age":30}
}, {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X8",
"_score" : 1.0,
"_source":{"name":"Michael","surname":"Jackson","age":29}
}, {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X-",
"_score" : 1.0,
"_source":{"name":"Justin","surname":"Beeper","age":19}
} ]
}
}

👉 Search all fields of a table for a value
➜ ~ curl 'http://localhost:9200/spark/people/_search?q=*s*'
{
"took" : 12,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X8",
"_score" : 1.0,
"_source":{"name":"Michael","surname":"Jackson","age":29}
}, {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X-",
"_score" : 1.0,
"_source":{"name":"Justin","surname":"Beeper","age":19}
} ]
}
}

👉 Search one field of a table for a value
➜ ~ curl 'http://localhost:9200/spark/people/_search?q=name:*ch*&pretty'
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "spark",
"_type" : "people",
"_id" : "AU4tjNnbu9a9LJalW5X8",
"_score" : 1.0,
"_source":{"name":"Michael","surname":"Jackson","age":29}
} ]
}
}
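
Every search response above has the same shape: the matching documents sit under `hits.hits`, each with its original JSON in `_source`. A minimal sketch of pulling the documents out in Python follows; the `sources` helper is hypothetical and the response text is copied from the last query above.

```python
import json

# Response copied from the name:*ch* query above.
response_text = """
{
  "took": 14,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 1.0,
    "hits": [
      {"_index": "spark", "_type": "people", "_id": "AU4tjNnbu9a9LJalW5X8",
       "_score": 1.0,
       "_source": {"name": "Michael", "surname": "Jackson", "age": 29}}
    ]
  }
}
"""


def sources(response):
    """Return the _source document of every hit, or an empty list if there are none."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


response = json.loads(response_text)
docs = sources(response)
print(response["hits"]["total"], docs)
```

Note that `hits.total` is a plain integer in the version used in this post; code written against it may need adjusting for other versions.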
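URL Tips
http://localhost:9200 Verify that the installation succeeded
http://localhost:9200/_count Record count across all of ES
http://localhost:9200/_cat Available REST services
http://localhost:9200/_cat/indices Overview of all databases in ES
http://localhost:9200/spark The database spark
http://localhost:9200/spark/_count Record count of the database spark
http://localhost:9200/spark/people/_count Record count of the table people in the database spark
http://localhost:9200/spark/people/_search All records in the given table of the given database
http://localhost:9200/spark/people/_search?q=* All records in the given table of the given database
http://localhost:9200/spark/people/_search?q=name:*a* Records in the table whose name field matches a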
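
Search parameters go in the query string after `?`, and characters such as `*` and `&` are easy to mangle when typed by hand or pasted into a shell. One way to sidestep that is to build the URL in code. The sketch below uses only the standard library; `search_url` is an illustrative helper, not an Elasticsearch API.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9200"  # assumed local node


def search_url(index, doc_type=None, q=None, pretty=False):
    """Build a _search URL with a percent-encoded query string."""
    parts = [quote(index, safe="")]
    if doc_type:
        parts.append(quote(doc_type, safe=""))
    parts.append("_search")
    params = {}
    if q is not None:
        params["q"] = q
    if pretty:
        params["pretty"] = "true"
    url = BASE_URL + "/" + "/".join(parts)
    return url + ("?" + urlencode(params) if params else "")


print(search_url("spark", "people"))
print(search_url("spark", "people", q="name:*a*", pretty=True))
```

The second URL comes out with `:` and `*` percent-encoded, which the server decodes back to the same query.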

spark-es-hdfs

http://chenlinux.com/2014/09/04/spark-to-elasticsearch/

1. How to get the schema when you only have the ES IP

List the databases
➜ ~ curl http://localhost:9200/_cat/indices
green forseti-20140526 2 1 30056 6 125.6mb 62.7mb
...

Count all records
➜ ~ curl http://localhost:9200/_count
{"count":631463275,"_shards":{"total":201,"successful":201,"failed":0}}

Pick one of the databases, e.g. forseti-20140407
➜ ~ curl http://localhost:9200/forseti-20140407

Count the records in this database's activity table; the table name comes from the previous step
➜ ~ curl http://localhost:9200/forseti-20140407/activity/_count
{"count":118,"_shards":{"total":2,"successful":2,"failed":0}}

Fetch all the data in this table; only hits.hits._source matters, since it shows the structure of the documents
➜ ~ curl http://localhost:9200/forseti-20140407/activity/_search\?pretty
"_source": {
"geo": {
},
"device": {
}
}
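
Reading the structure off `_source` by eye works for a handful of fields. For larger documents, a small recursive walk can flatten one sample document into dotted field paths. This is only a sketch: `infer_schema` is a hypothetical helper, and in the sample below only the top-level keys `geo` and `device` come from the output above, while the nested fields and values are invented for illustration.

```python
def infer_schema(doc, prefix=""):
    """Map dotted field paths to Python type names, recursing into nested objects."""
    schema = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(infer_schema(value, prefix=path + "."))
        else:
            schema[path] = type(value).__name__
    return schema


# Hypothetical document: the nested fields are made up, not taken from the real index.
sample_source = {
    "geo": {"city": "Hangzhou", "lat": 30.27},
    "device": {"os": "android"},
}

for path, type_name in sorted(infer_schema(sample_source).items()):
    print(path, type_name)
```

A single document only shows the fields it happens to contain, so the index mapping remains the more complete description of the schema.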

Python ES

1.Build an Elasticsearch Index with Python:
http://blog.qbox.io/building-an-elasticsearch-index-with-python

2.Elasticsearch in Apache Spark with Python:
http://blog.qbox.io/elasticsearch-in-apache-spark-python

3.Deploying Elasticsearch and Apache Spark to the Cloud:
http://blog.qbox.io/deploy-elasticsearch-and-apache-spark-to-the-cloud

4.Sparse Matrix Multiplication with Elasticsearch and Apache Spark:
http://blog.qbox.io/sparse-matrix-multiplication-elasticsearch-apache-spark

5.Rectangular Matrix Multiplication with Elasticsearch and Apache Spark:
http://blog.qbox.io/rectangular-matrix-multiplication-elasticsearch-apache-spark

6.Running Asynchronous Apache Spark Jobs from a Web App with Flask, Celery, & Elasticsearch:
http://blog.qbox.io/asynchronous-apache-spark-flask-celery-elasticsearch

Import JSON data:

➜  ~  curl -XPOST 'http://192.168.6.140:9200/forseti-201501/activity/' -d @/Users/zhengqh/data/koudai_201501.json
{"_index":"forseti-201501","_type":"activity","_id":"AU4","_version":1,"created":true}
➜ ~ curl -XDELETE 'http://192.168.6.140:9200/forseti-201501/activity/'
{"acknowledged":true}
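
The POST above stores the whole file as a single document, as the one `created` response suggests. If a file holds many records, the `_bulk` endpoint is the usual route: its body alternates an action line and a document line, each ending with a newline. The sketch below only builds such a body. `bulk_body` is a hypothetical helper, and the sample documents are borrowed from the spark/people example earlier in this post rather than from the imported file.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk body that indexes each document."""
    lines = []
    for doc in docs:
        # Action line: tells the bulk endpoint where the next document goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The body must end with a trailing newline.
    return "\n".join(lines) + "\n"


docs = [
    {"name": "Andy", "surname": "Feng", "age": 30},
    {"name": "Justin", "surname": "Beeper", "age": 19},
]
body = bulk_body("spark", "people", docs)
print(body, end="")
```

The resulting text could be saved to a file and sent with `curl -XPOST 'http://localhost:9200/_bulk' --data-binary @file`, where `--data-binary` keeps the newlines intact.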

2. Spark-ES Demo

Using spark-shell:

➜  spark-1.4.0-bin-hadoop2.6  bin/spark-shell --master local[2] \
--jars /Users/zhengqh/Downloads/bigdata/elasticsearch-hadoop-2.1.0.rc1.jar

15/06/29 11:17:16 INFO spark.SparkContext: Added JAR file:/Users/zhengqh/Downloads/bigdata/elasticsearch-hadoop-2.1.0.rc1.jar at http://192.168.6.140:52359/jars/elasticsearch-hadoop-2.1.0.rc1.jar with timestamp 1435547836823
