SnappyData


SnappyData

Development Mode

Since the downloaded SnappyData distribution already bundles Spark, there is no need to use --packages.

$ cd snappydata-0.9-bin
$ bin/spark-shell --driver-memory=4g \
--conf spark.snappydata.store.sys-disk-dir=quickstartdatadir \
--conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log \
--driver-java-options="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:MaxNewSize=1g"
Spark context Web UI available at http://192.168.6.52:4042
>

Perform CRUD operations:

val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
import snappy.implicits._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val ds = Seq((1,"a"), (2, "b"), (3, "c")).toDS()
val tableSchema = StructType(Array(StructField("CustKey", IntegerType, false),StructField("CustName", StringType, false)))

snappy.createTable(tableName = "colTable", provider = "column", schema = tableSchema, options = Map.empty[String, String], allowExisting = false)
snappy.createTable(tableName = "rowTable", provider = "row", schema = tableSchema, options = Map.empty[String, String], allowExisting = false)

ds.write.insertInto("colTable")
ds.write.insertInto("rowTable")

snappy.insert("colTable", Row(10, "f"))
snappy.insert("rowTable", Row(4, "d"))

snappy.table("colTable").count
snappy.table("colTable").orderBy("CustKey").show
snappy.table("rowTable").count
snappy.table("rowTable").orderBy("CUSTKEY").show

// Update and delete work on row tables; the current version does not support update and delete on column tables.

// update rowTable set custname='d' where custkey=1
snappy.update(tableName = "rowTable", filterExpr = "CUSTKEY=1", newColumnValues = Row("d"), updateColumns = "CUSTNAME")
snappy.table("rowTable").orderBy("CUSTKEY").show
// delete rowTable where custkey=1
snappy.delete(tableName = "rowTable", filterExpr = "CUSTKEY=1")
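
The same CRUD steps can also be expressed as plain SQL through snappy.sql; a minimal sketch, using a hypothetical table name (rowTable2) so it does not clash with the tables created above:

// SQL equivalents of the API calls above; "using row" creates a row table.
snappy.sql("create table rowTable2 (CustKey int not null, CustName varchar(20) not null) using row")
snappy.sql("insert into rowTable2 values (1, 'a')")
snappy.sql("insert into rowTable2 values (2, 'b')")
snappy.sql("select * from rowTable2 order by CustKey").show
// Update/delete via SQL are likewise limited to row tables in this version.
snappy.sql("update rowTable2 set CustName = 'd' where CustKey = 1")
snappy.sql("delete from rowTable2 where CustKey = 1")
snappy.sql("drop table rowTable2")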

Open http://192.168.6.52:4042/dashboard/ to view the dashboard page of the web UI.

(Figure: SnappyData dashboard in the web UI)

Inspect quickstartdatadir: the on-disk store uses GF (GemFire) disk-store files.

$ tree quickstartdatadir/
quickstartdatadir/
├── BACKUPGFXD-DEFAULT-DISKSTORE_1.crf
├── BACKUPGFXD-DEFAULT-DISKSTORE_1.drf
├── BACKUPGFXD-DEFAULT-DISKSTORE.if
├── datadictionary
│   ├── BACKUPGFXD-DD-DISKSTORE_1.crf
│   ├── BACKUPGFXD-DD-DISKSTORE_1.drf
│   ├── BACKUPGFXD-DD-DISKSTORE.if
│   └── DRLK_IFGFXD-DD-DISKSTORE.lk
├── DRLK_IFGFXD-DEFAULT-DISKSTORE.lk
├── gemfirexdtemp_1015622261.d
└── quickstart.log

A simple performance test:

def benchmark(name: String, times: Int = 10, warmups: Int = 6)(f: => Unit) {
  for (i <- 1 to warmups) {
    f
  }
  val startTime = System.nanoTime
  for (i <- 1 to times) {
    f
  }
  val endTime = System.nanoTime
  println(s"Average time taken in $name for $times runs: " +
    (endTime - startTime).toDouble / (times * 1000000.0) + " millis")
}

val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
val testDF = snappy.range(100000000).selectExpr("id", "concat('sym', cast((id % 100) as varchar(10))) as sym")
snappy.sql("drop table if exists snappyTable")
snappy.sql("create table snappyTable (id bigint not null, sym varchar(10) not null) using column")
benchmark("Snappy insert perf", 1, 0) {testDF.write.insertInto("snappyTable") }
benchmark("Snappy perf") {snappy.sql("select sym, avg(id) from snappyTable group by sym").collect()}

Single-Machine Mode

The figure on the left shows local mode; the one on the right shows pseudo-distributed mode, where the locator (bottom left), server (DataServer, top right), and lead (top left) are started separately, with the shared-nothing store directory at the bottom right.

(Figures: local mode on the left, pseudo-distributed mode on the right)

In pseudo-distributed mode all three components are started on the local machine, each in its own directory.

$ cd snappydata-0.9-bin
$ mkdir -p node-a/locator1 node-b/server1 node-c/lead1

$ bin/snappy locator start -dir=node-a/locator1

Starting SnappyData Locator using peer discovery on: 0.0.0.0[10334]
Starting Thrift server for SnappyData at address localhost/127.0.0.1[1527]
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/node-a/locator1/snappylocator.log
SnappyData Locator pid: 27651 status: running

$ bin/snappy server start -dir=node-b/server1 -locators=dp0652:10334

Starting SnappyData Server using locators for peer discovery: dp0652:10334
Starting Thrift server for SnappyData at address localhost/127.0.0.1[1528]
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/node-b/server1/snappyserver.log
SnappyData Server pid: 29595 status: running
Distributed system now has 2 members.
Other members: dp0652(27651:locator)<v0>:32709

$ bin/snappy leader start -dir=node-c/lead1 -locators=dp0652:10334

Starting SnappyData Leader using locators for peer discovery: localhost:10334
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/node-c/lead1/snappyleader.log
SnappyData Leader pid: 29860 status: running
Distributed system now has 3 members.
Other members: dp0652(27651:locator)<v0>:32709, dp0652(29595:datastore)<v7>:9553

To change an address, pass the corresponding parameter as -name=value, for example to [change the locator address](https://snappydatainc.github.io/snappydata/reference/configuration_parameters/start-locator/):

bin/snappy locator start -dir=node-a/locator1 -start-locator=192.168.6.52[1529]

Stop each component:

bin/snappy locator stop -dir=node-a/locator1
bin/snappy server stop -dir=node-b/server1
bin/snappy leader stop -dir=node-c/lead1

Run spark-shell, pointing the SnappyData connection at localhost:1527.

bin/spark-shell --driver-memory=4g \
--conf spark.snappydata.connection=localhost:1527 \
--conf spark.snappydata.store.sys-disk-dir=quickstartdatadir2 \
--conf spark.snappydata.store.log-file=quickstartdatadir2/quickstart.log \
--driver-java-options="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:MaxNewSize=1g"

If you open http://192.168.6.52:4042, you see the Spark application pages but no dashboard page.
Open http://192.168.6.52:5050/dashboard/ to view the SnappyData web UI.

Port 5050 plays a role similar to the Spark standalone master web UI (8080 by default), while 4040 is the usual per-application Spark UI port.
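
In this setup the spark-shell acts as a client of the running cluster: a SnappySession created in the shell reads and writes tables that live on the data servers rather than in the shell's own JVM. A minimal sketch (querying colTable assumes it was created on this cluster, for example by re-running the CRUD example above):

// Inside the spark-shell started with spark.snappydata.connection=localhost:1527
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
snappy.sql("show tables").show        // tables defined on the cluster
snappy.table("colTable").count        // assumes colTable exists on this cluster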

(Figure: SnappyData web UI on port 5050)

Starting All Three Components with One Script

The three start commands above can be run with a single script; in that case the default working directories are created under work.

sbin/snappy-start-all.sh
sbin/snappy-stop-all.sh
sbin/snappy-status-all.sh

snappy-start-all.sh starts one locator, one server, and one lead on the local machine.

$ sbin/snappy-start-all.sh
Starting SnappyData Locator using peer discovery on: localhost[10334], other locators: localhost[10334]
Starting Thrift server for SnappyData at address localhost/127.0.0.1[1527]
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/localhost-locator-1/snappylocator.log
SnappyData Locator pid: 7949 status: running

Starting SnappyData Server using locators for peer discovery: localhost[10334]
Starting Thrift server for SnappyData at address localhost/127.0.0.1[1528]
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/localhost-server-1/snappyserver.log
SnappyData Server pid: 8176 status: running
Distributed system now has 2 members.
Other members: localhost(7949:locator)<v0>:37846

Starting SnappyData Leader using locators for peer discovery: localhost[10334]
Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/localhost-lead-1/snappyleader.log
SnappyData Leader pid: 8488 status: running
Distributed system now has 3 members.
Other members: localhost(7949:locator)<v0>:37846, dp0652(8176:datastore)<v1>:24462

Inspect the directory layout under the default work directory:

  • lead: plays a role similar to the Spark driver; its spark-jobserver directory holds submitted jobs and jars
  • locator: keeps cluster membership state (locator10334state.dat, locator10334views.log) and the data dictionary
  • server: holds the shared-nothing data store (the BACKUPGFXD-* disk-store files) and the data dictionary
$ tree work/
work/
├── localhost-lead-1
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE.if
│   ├── DRLK_IFGFXD-DEFAULT-DISKSTORE.lk
│   ├── snappyleader.gfs
│   ├── snappyleader.log
│   ├── snappyleader.pid
│   ├── spark-jobserver
│   │   ├── filedao
│   │   │   └── data
│   │   │       ├── configs.data
│   │   │       ├── jars.data
│   │   │       └── jobs.data
│   │   └── upload
│   │       └── files.data
│   └── start_snappyleader.log
├── localhost-locator-1
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE_1.crf
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE_1.drf
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE.if
│   ├── datadictionary
│   │   ├── BACKUPGFXD-DD-DISKSTORE_1.crf
│   │   ├── BACKUPGFXD-DD-DISKSTORE_1.drf
│   │   ├── BACKUPGFXD-DD-DISKSTORE.if
│   │   └── DRLK_IFGFXD-DD-DISKSTORE.lk
│   ├── DRLK_IFGFXD-DEFAULT-DISKSTORE.lk
│   ├── locator10334state.dat
│   ├── locator10334views.log
│   ├── snappylocator.gfs
│   ├── snappylocator.log
│   ├── snappylocator.pid
│   └── start_snappylocator.log
├── localhost-server-1
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE_1.crf
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE_1.drf
│   ├── BACKUPGFXD-DEFAULT-DISKSTORE.if
│   ├── datadictionary
│   │   ├── BACKUPGFXD-DD-DISKSTORE_1.crf
│   │   ├── BACKUPGFXD-DD-DISKSTORE_1.drf
│   │   ├── BACKUPGFXD-DD-DISKSTORE.if
│   │   └── DRLK_IFGFXD-DD-DISKSTORE.lk
│   ├── DRLK_IFGFXD-DEFAULT-DISKSTORE.lk
│   ├── snappyserver.gfs
│   ├── snappyserver.log
│   ├── snappyserver.pid
│   └── start_snappyserver.log
└── members.txt

client

First stop SnappyData, then edit the servers, locators, and leads files under conf on the remote machine,
replacing localhost with the host address 192.168.6.52, and restart SnappyData.

Note: by default everything starts on localhost, so the directories under work are also prefixed with localhost.
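
Each of conf/locators, conf/servers, and conf/leads is a plain host list: one line per member, the hostname or address optionally followed by extra options. A minimal sketch for this single-host setup (the trailing comments are only annotations for this example):

192.168.6.52      # conf/locators
192.168.6.52      # conf/servers
192.168.6.52      # conf/leads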

[qihuang.zheng@dp0652 snappydata-0.9-bin]$ sbin/snappy-start-all.sh
192.168.6.52: Starting SnappyData Locator using peer discovery on: 192.168.6.52[10334], other locators: 192.168.6.52:10334
192.168.6.52: Starting Thrift server for SnappyData at address /192.168.6.52[1527]
192.168.6.52: Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/192.168.6.52-locator-1/snappylocator.log
192.168.6.52: SnappyData Locator pid: 45151 status: running
192.168.6.52: Starting SnappyData Server using locators for peer discovery: 192.168.6.52:10334
192.168.6.52: Starting Thrift server for SnappyData at address /192.168.6.52[1528]
192.168.6.52: Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/192.168.6.52-server-1/snappyserver.log
192.168.6.52: SnappyData Server pid: 45860 status: running
192.168.6.52: Distributed system now has 2 members.
192.168.6.52: Other members: dp0652(45151:locator)<v0>:48205
192.168.6.52: Starting SnappyData Leader using locators for peer discovery: 192.168.6.52:10334
192.168.6.52: Logs generated in /home/qihuang.zheng/snappydata-0.9-bin/work/192.168.6.52-lead-1/snappyleader.log
192.168.6.52: SnappyData Leader pid: 46726 status: running
192.168.6.52: Distributed system now has 3 members.
192.168.6.52: Other members: dp0652(45860:datastore)<v1>:8287, dp0652(45151:locator)<v0>:48205

Check the processes:

45860 io.snappydata.tools.ServerLauncher server -critical-heap-percentage=90 -eviction-heap-percentage=81 locators=192.168.6.52:10334 log-file=snappyserver.log -client-bind-address=192.168.6.52
46726 io.snappydata.tools.LeaderLauncher server locators=192.168.6.52:10334 log-file=snappyleader.log -run-netserver=false
45151 io.snappydata.tools.LocatorLauncher server locators=192.168.6.52:10334 start-locator=192.168.6.52:10334 log-file=snappylocator.log -client-bind-address=192.168.6.52 -peer-discovery-address=192.168.6.52 jmx-manager=true

On the local machine, download the SnappyData binary package, launch the snappy shell script, and connect to the remote SnappyData cluster over Thrift/JDBC.

➜  snappydata-0.9-bin bin/snappy
SnappyData version 0.9
snappy> connect client '192.168.6.52:1527';
Sep 14, 2017 3:43:43 PM java.util.logging.LogManager$RootLogger log
INFO: Starting client on '10.57.4.219' with ID='7059|2017/09/14 15:43:43.185 CST'
Using CONNECTION0
snappy> show connections ;
CONNECTION0* - jdbc:snappydata:thrift://192.168.6.52[1527]
* = current connection
snappy> show tables;
TABLE_SCHEM |TABLE_NAME |TABLE_TYPE |REMARKS
--------------------------------------------------------------------------------------
SYS |ASYNCEVENTLISTENERS |SYSTEM TABLE|
SYS |GATEWAYRECEIVERS |SYSTEM TABLE|
SYS |GATEWAYSENDERS |SYSTEM TABLE|
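
The same Thrift/JDBC endpoint can also be used programmatically. A minimal sketch in Scala, assuming the SnappyData client JDBC jar is on the classpath; the URL form follows the connection string reported by show connections above:

import java.sql.DriverManager

// Connect over the cluster's client port (1527) and list tables via standard JDBC metadata,
// which yields the same TABLE_SCHEM / TABLE_NAME columns shown by `show tables`.
val conn = DriverManager.getConnection("jdbc:snappydata://192.168.6.52:1527/")
val rs = conn.getMetaData.getTables(null, null, "%", null)
while (rs.next()) {
  println(rs.getString("TABLE_SCHEM") + " | " + rs.getString("TABLE_NAME"))
}
rs.close()
conn.close()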
