2015-08-01

Getting Started with ElasticSearch

You Know, For Search

Start ES:

bin/elasticsearch

Hello CRUD:

// Index a document. Database: twitter, table: user, primary key: kimchy, field name set to Shay Banon
$ curl -XPUT http://localhost:9200/twitter/user/kimchy -d '{
    "name" : "Shay Banon"
}'

// Index a document with several fields. Primary key is 1; the tweet table has three fields: user, post_date, message
$ curl -XPUT http://localhost:9200/twitter/tweet/1 -d '{
    "user": "kimchy",
    "post_date": "2009-11-15T13:12:00",
    "message": "Trying out elasticsearch, so far so good?"
}'

// Index another record. Note that the id in the URL is different: the primary key is 2
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "You know, for Search"
}'

// Index again. A PUT to an existing id acts as an update and increments the version by 1
$ curl -XPUT http://localhost:9200/twitter/tweet/2 -d '{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "You know, for Search"
}'
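
The PUT calls above can also be prepared from Python. The following is a minimal sketch using only the standard library. `build_index_request` and `ES_URL` are hypothetical names introduced here, and the request is only constructed, not sent, because sending it needs a running node on localhost:9200.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def build_index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT request that indexes `doc` under the given id."""
    url = "{}/{}/{}/{}".format(ES_URL, index, doc_type, doc_id)
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="PUT")


req = build_index_request("twitter", "tweet", 2, {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "You know, for Search",
})
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. As with curl, sending the same id a second time should act as an update.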

// Get: fetch the record with primary key 2 from the tweet table of the twitter database
$ curl -XGET http://localhost:9200/twitter/tweet/2

// Lucene query-string syntax: find records in the tweet table whose user field is kimchy
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'

// Query DSL: the same query as above
$ curl -XGET http://localhost:9200/twitter/tweet/_search -d '{
    "query" : {
        "term" : { "user": "kimchy" }
    }
}'

// Query DSL: find the records whose post_date falls within the given range
$ curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -d '{
    "query" : {
        "range" : {
            "post_date" : {
                "from" : "2009-11-15T13:00:00",
                "to" : "2009-11-15T14:30:00"
            }
        }
    }
}'
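
The two query DSL bodies above are plain JSON, so they can be generated instead of typed by hand. This is a small sketch under that assumption. `term_query` and `range_query` are hypothetical helper names, and the `from`/`to` keys simply mirror the curl example.

```python
import json


def term_query(field, value):
    """Query DSL body matching documents whose `field` holds the term `value`."""
    return {"query": {"term": {field: value}}}


def range_query(field, start, end):
    """Query DSL body matching documents whose `field` lies between `start` and `end`."""
    return {"query": {"range": {field: {"from": start, "to": end}}}}


by_user = term_query("user", "kimchy")
by_date = range_query("post_date", "2009-11-15T13:00:00", "2009-11-15T14:30:00")
print(json.dumps(by_date, indent=4))
```

The printed JSON can be passed to curl with `-d`, exactly like the hand-written body.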

#	Example	ElasticSearch	Database
1	twitter	Index	Database
2	tweet	Type	Table
3	user, post_date, message	-	Field

The corresponding mapping structure: http://localhost:9200/_all?pretty

"twitter" : {							    1
    "aliases" : { },
    "mappings" : {
      "user" : {						  2
        "properties" : {
          "name" : {					3️
            "type" : "string"
          }
        }
      },
      "tweet" : {					    2
        "properties" : {
          "message" : {				3
            "type" : "string"
          },
          "post_date" : {			3
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "user" : {					3
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1435222918809",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1060099"
        },
        "uuid" : "jk1_iEyvTaibjDyLymZO5g"
      }
    },
    "warmers" : { }
  },
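
The index / type / field hierarchy marked 1, 2 and 3 above can be read out of the mapping response programmatically. A minimal sketch, assuming the response has already been parsed into a dict. `describe` is a hypothetical helper, and the sample is a trimmed copy of the output above.

```python
# Trimmed copy of the mapping response shown above (aliases, settings and warmers omitted).
mapping_response = {
    "twitter": {
        "mappings": {
            "user": {"properties": {"name": {"type": "string"}}},
            "tweet": {"properties": {
                "message": {"type": "string"},
                "post_date": {"type": "date", "format": "dateOptionalTime"},
                "user": {"type": "string"},
            }},
        }
    }
}


def describe(response):
    """Return {index: {type: {field: field_type}}} from a mapping response."""
    result = {}
    for index, body in response.items():
        types = {}
        for doc_type, mapping in body.get("mappings", {}).items():
            props = mapping.get("properties", {})
            types[doc_type] = {name: spec.get("type") for name, spec in props.items()}
        result[index] = types
    return result


schema = describe(mapping_response)
for doc_type, fields in schema["twitter"].items():
    print(doc_type, fields)
```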

Log output

➜  elasticsearch-1.6.0  bin/elasticsearch
[2015-06-25 17:01:05,297][INFO ][node                     ] [Y'Garon] version[1.6.0], pid[5619], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-06-25 17:01:05,298][INFO ][node                     ] [Y'Garon] initializing ...
[2015-06-25 17:01:05,313][INFO ][plugins                  ] [Y'Garon] loaded [], sites []
[2015-06-25 17:01:05,517][INFO ][env                      ] [Y'Garon] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [61.3gb], net total_space [111.8gb], types [hfs]
[2015-06-25 17:01:13,075][INFO ][node                     ] [Y'Garon] initialized
[2015-06-25 17:01:13,078][INFO ][node                     ] [Y'Garon] starting ...
[2015-06-25 17:01:13,457][INFO ][transport                ] [Y'Garon] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.7.130:9300]}
[2015-06-25 17:01:13,532][INFO ][discovery                ] [Y'Garon] elasticsearch/5K_ytFi9QfGnt0aPACBo-A
[2015-06-25 17:01:17,368][INFO ][cluster.service          ] [Y'Garon] new_master [Y'Garon][5K_ytFi9QfGnt0aPACBo-A][zqhmac][inet[/192.168.7.130:9300]], reason: zen-disco-join (elected_as_master)
[2015-06-25 17:01:17,399][INFO ][http                     ] [Y'Garon] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.7.130:9200]}
[2015-06-25 17:01:17,399][INFO ][node                     ] [Y'Garon] started
[2015-06-25 17:01:17,419][INFO ][gateway                  ] [Y'Garon] recovered [0] indices into cluster_state
[2015-06-25 17:01:59,932][INFO ][cluster.metadata         ] [Y'Garon] [twitter] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [tweet]
[2015-06-25 17:02:00,497][INFO ][cluster.metadata         ] [Y'Garon] [twitter] update_mapping [tweet] (dynamic)
[2015-06-25 17:05:01,428][INFO ][cluster.metadata         ] [Y'Garon] [spark] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2015-06-25 17:05:02,010][INFO ][cluster.metadata         ] [Y'Garon] [spark] update_mapping [docs] (dynamic)
[2015-06-25 17:05:03,284][INFO ][cluster.metadata         ] [Y'Garon] [spark] update_mapping [docs] (dynamic)
[2015-06-25 17:05:03,402][INFO ][cluster.metadata         ] [Y'Garon] [spark] update_mapping [json-trips] (dynamic)
[2015-06-25 17:17:11,309][INFO ][cluster.metadata         ] [Y'Garon] [twitter] update_mapping [user] (dynamic)
[2015-06-25 17:25:32,611][INFO ][cluster.metadata         ] [Y'Garon] [my-collection] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [game, music, book]
[2015-06-25 17:25:32,800][INFO ][cluster.metadata         ] [Y'Garon] [my-collection] update_mapping [music] (dynamic)
[2015-06-25 17:25:32,811][INFO ][cluster.metadata         ] [Y'Garon] [my-collection] update_mapping [book] (dynamic)
[2015-06-25 17:25:32,819][INFO ][cluster.metadata         ] [Y'Garon] [my-collection] update_mapping [game] (dynamic)
[2015-06-25 18:04:25,860][INFO ][cluster.metadata         ] [Y'Garon] [airports] creating index, cause [api], templates [], shards [5]/[1], mappings []
[2015-06-25 18:04:26,148][INFO ][cluster.metadata         ] [Y'Garon] [airports] update_mapping [2015] (dynamic)

Basic query API

👉 1. Count the records across all databases
➜  ~  curl http://localhost:9200/_count
{"count":9,"_shards":{"total":10,"successful":10,"failed":0}}

➜  ~  curl http://localhost:9200/_cat/count
1435224149 17:22:29 9

👉 2. List the REST services available under the _cat path
➜  ~  curl http://localhost:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

👉 3. All ES information (aliases, mappings and settings of every index)
➜  ~  curl http://localhost:9200/_all

👉 4. List the databases. An index corresponds to a database
➜  ~  curl http://localhost:9200/_cat/indices
yellow open spark   5 1 6 0  16kb  16kb
yellow open twitter 5 1 3 0 8.7kb 8.7kb
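
The `_cat/indices` output is whitespace-separated text, so it is easy to turn into records. The sketch below assumes the column names that ES 1.6 prints as a header when `?v` is appended to the URL. `COLUMNS` and `parse_cat_indices` are names introduced here, and other ES versions may print different columns.

```python
# Column names are an assumption based on the header ES 1.6 prints for /_cat/indices?v.
COLUMNS = ["health", "status", "index", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]

cat_output = """\
yellow open spark   5 1 6 0  16kb  16kb
yellow open twitter 5 1 3 0 8.7kb 8.7kb
"""


def parse_cat_indices(text):
    """Split each non-empty line on whitespace and pair the values with COLUMNS."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(COLUMNS, line.split())))
    return rows


indices = parse_cat_indices(cat_output)
print([row["index"] for row in indices])
```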

👉 5. List the tables in a database
➜  ~  curl http://localhost:9200/spark

👉 6. Count the records in a database
➜  ~  curl http://localhost:9200/spark/_count
{"count":6,"_shards":{"total":5,"successful":5,"failed":0}}

👉 7. Count the records of one table in a database; the table names come from step 5
➜  ~  curl http://localhost:9200/spark/people/_count

👉 8. Fetch all records of a table in a database
➜  ~  curl 'http://localhost:9200/spark/people/_search?q=*&pretty'
➜  ~  curl 'http://localhost:9200/spark/people/_search?pretty'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X9",
      "_score" : 1.0,
      "_source":{"name":"Andy","surname":"Feng","age":30}
    }, {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X8",
      "_score" : 1.0,
      "_source":{"name":"Michael","surname":"Jackson","age":29}
    }, {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X-",
      "_score" : 1.0,
      "_source":{"name":"Justin","surname":"Beeper","age":19}
    } ]
  }
}

👉 Search all fields of a table for a value
➜  ~  curl 'http://localhost:9200/spark/people/_search?q=*s*'
{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X8",
      "_score" : 1.0,
      "_source":{"name":"Michael","surname":"Jackson","age":29}
    }, {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X-",
      "_score" : 1.0,
      "_source":{"name":"Justin","surname":"Beeper","age":19}
    } ]
  }
}

👉 Search one field of a table for a value
➜  ~  curl 'http://localhost:9200/spark/people/_search?q=name:*ch*&pretty'
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "spark",
      "_type" : "people",
      "_id" : "AU4tjNnbu9a9LJalW5X8",
      "_score" : 1.0,
      "_source":{"name":"Michael","surname":"Jackson","age":29}
    } ]
  }
}
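
Characters such as `*`, `?` and `&` in these search URLs are special to the shell, so the URL has to be quoted on the command line. When the URL is built in code, percent-encoding the query string avoids the problem. A minimal sketch, where `search_url` and `ES_URL` are hypothetical names:

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: local node, as in the curl examples


def search_url(index, doc_type, q, pretty=True):
    """Build a _search URL with the Lucene query string percent-encoded."""
    params = {"q": q}
    if pretty:
        params["pretty"] = "true"
    return "{}/{}/{}/_search?{}".format(ES_URL, index, doc_type, urlencode(params))


url = search_url("spark", "people", "name:*ch*")
print(url)
```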

URL	Tips
http://localhost:9200	Verify that the installation succeeded
http://localhost:9200/_count	Record count across all of ES
http://localhost:9200/_cat	REST services
http://localhost:9200/_cat/indices	Overview of all databases in ES
http://localhost:9200/spark	The spark database
http://localhost:9200/spark/_count	Record count of the spark database
http://localhost:9200/spark/people/_count	Record count of the people table in the spark database
http://localhost:9200/spark/people/_search	All records of the given table in the given database
http://localhost:9200/spark/people/_search?q=*	All records of the given table in the given database
http://localhost:9200/spark/people/_search?q=name:a	Records of a table whose name field matches a

spark-es-hdfs

http://chenlinux.com/2014/09/04/spark-to-elasticsearch/

1. How to get the schema when only the ES IP is known

List the databases
➜  ~  curl http://localhost:9200/_cat/indices
green forseti-20140526 2 1    30056      6 125.6mb  62.7mb
...

Count the total number of records
➜  ~  curl http://localhost:9200/_count
{"count":631463275,"_shards":{"total":201,"successful":201,"failed":0}}

Pick one of the databases, for example forseti-20140407
➜  ~  curl http://localhost:9200/forseti-20140407

Count the records in this database's activity table; the table name comes from the previous step
➜  ~  curl http://localhost:9200/forseti-20140407/activity/_count
{"count":118,"_shards":{"total":2,"successful":2,"failed":0}}

Query all the data in this table. Only hits.hits._source matters: it is the document's table structure
➜  ~  curl http://localhost:9200/forseti-20140407/activity/_search\?pretty
"_source": {
  "geo": {
  },
  "device": {
  }
}
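
Reading the structure out of `_source` can be automated. The sketch below walks a parsed `_search` response and records a type name for every field, recursing into nested objects such as `geo` and `device`. `infer_schema` and `schema_from_hits` are hypothetical helpers, and the sample documents are taken from the spark/people search output earlier in this post, not from the forseti index.

```python
def infer_schema(source):
    """Map each field of a _source document to a Python type name, recursing into objects."""
    schema = {}
    for field, value in source.items():
        if isinstance(value, dict):
            schema[field] = infer_schema(value)
        else:
            schema[field] = type(value).__name__
    return schema


def schema_from_hits(response):
    """Merge (shallowly) the inferred schema of every hit in a _search response."""
    merged = {}
    for hit in response["hits"]["hits"]:
        merged.update(infer_schema(hit["_source"]))
    return merged


# Sample taken from the spark/people search output earlier in this post.
response = {"hits": {"hits": [
    {"_source": {"name": "Andy", "surname": "Feng", "age": 30}},
    {"_source": {"name": "Michael", "surname": "Jackson", "age": 29}},
]}}

print(schema_from_hits(response))
```

A search only returns a page of hits, so fields that appear only in other documents would be missed. The mapping API shown earlier is the more complete source when it is reachable.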

Python ES

1.Build an Elasticsearch Index with Python:
http://blog.qbox.io/building-an-elasticsearch-index-with-python

2.Elasticsearch in Apache Spark with Python:
http://blog.qbox.io/elasticsearch-in-apache-spark-python

3.Deploying Elasticsearch and Apache Spark to the Cloud:
http://blog.qbox.io/deploy-elasticsearch-and-apache-spark-to-the-cloud

4.Sparse Matrix Multiplication with Elasticsearch and Apache Spark:
http://blog.qbox.io/sparse-matrix-multiplication-elasticsearch-apache-spark

5.Rectangular Matrix Multiplication with Elasticsearch and Apache Spark:
http://blog.qbox.io/rectangular-matrix-multiplication-elasticsearch-apache-spark

6.Running Asynchronous Apache Spark Jobs from a Web App with Flask, Celery, & Elasticsearch:
http://blog.qbox.io/asynchronous-apache-spark-flask-celery-elasticsearch

Import JSON data:

➜  ~  curl -XPOST 'http://192.168.6.140:9200/forseti-201501/activity/' -d @/Users/zhengqh/data/koudai_201501.json
{"_index":"forseti-201501","_type":"activity","_id":"AU4","_version":1,"created":true}
➜  ~  curl -XDELETE 'http://192.168.6.140:9200/forseti-201501/activity/'
{"acknowledged":true}
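
The POST above stores the whole file as one document, as the single `"created":true` response shows. If the data were a list of separate records, one option would be the bulk API, which the log earlier in this post shows being used ("auto(bulk api)"). The sketch below only builds the newline-delimited body. `to_bulk_body` is a hypothetical helper, the sample records are illustrative, and the target URL (for example `/forseti-201501/activity/_bulk`) is an assumption.

```python
import json


def to_bulk_body(docs):
    """Build a bulk body: an `index` action line followed by each document, one per line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # index and type are assumed to come from the URL
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


docs = [{"name": "Andy", "age": 30}, {"name": "Justin", "age": 19}]  # illustrative data
body = to_bulk_body(docs)
print(body, end="")
```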

2. Spark-ES Demo

Using spark-shell:

➜  spark-1.4.0-bin-hadoop2.6  bin/spark-shell --master 'local[2]' \
   --jars /Users/zhengqh/Downloads/bigdata/elasticsearch-hadoop-2.1.0.rc1.jar

15/06/29 11:17:16 INFO spark.SparkContext: Added JAR file:/Users/zhengqh/Downloads/bigdata/elasticsearch-hadoop-2.1.0.rc1.jar at http://192.168.6.140:52359/jars/elasticsearch-hadoop-2.1.0.rc1.jar with timestamp 1435547836823

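Title: Getting Started with ElasticSearch

Author: 任何忧伤,都抵不过世界的美丽

Published: 2015-08-01, 00:00

Last updated: 2019-02-14, 21:42

Original link: http://github.com/zqhxuyuan/2015/08/01/2015-08-01-ElasticSearch/

License: "Attribution-NonCommercial-ShareAlike 3.0". Please keep the original link and author when reposting.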