Cassandra Operations

  • Gracefully shutting down a node
  • Data synchronization between clusters
    • Incremental backups
    • Preparation: building the new cluster and creating tables
    • Getting started: migrate one table as a test
    • A full snapshot migration of the whole cluster
    • Incremental backups (run repeatedly)
    • Data-center migration
      • Streaming directly from the current cluster to the Shanghai cluster
      • New cluster installation
        1. Table schema
        2. Data migration
        3. Incremental data migration
        4. Data verification
  • The nodetool utility
    • Cluster status: nodetool describecluster
    • Removing a node: removenode
    • Replacing a node: replace_address
    • Rejoining the ring
    • Network status netstats: shows a node's active streams
    • Node info: nodetool info
    • Gossip info: nodetool gossipinfo
    • Table statistics: nodetool cfstats forseti.velocity
    • compactionhistory
    • Abnormal sstable count on a node
  • tpstats
  • sstable writer
    • Too many files
    • After importing data, use refresh
    • Increase sstable size by increasing memory (when memory is insufficient)
    • Split the table?
  • sstableloader
  • jHiccup

Gracefully shutting down a node

[admin@cass048169 ~]$ /usr/install/cassandra/bin/nodetool stopdaemon
Cassandra has shutdown.
error: 拒绝连接
-- StackTrace --
java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:147)
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.close(Unknown Source)
at javax.management.remote.rmi.RMIConnector.close(RMIConnector.java:512)
at javax.management.remote.rmi.RMIConnector.close(RMIConnector.java:452)
at org.apache.cassandra.tools.NodeProbe.close(NodeProbe.java:237)
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:295)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)

[admin@cass048169 ~]$
[admin@cass048169 ~]$
[admin@cass048169 ~]$ ps -ef|grep cassandra
admin 40084 37927 0 09:46 pts/0 00:00:00 grep cassandra

Data synchronization between clusters

1. The directory passed to sstableloader must be laid out as keyspace/table, otherwise it fails with a NullPointerException:

[qihuang.zheng@dp0652 ~]$ /usr/install/cassandra/bin/sstableloader -d localhost 1445938244634
Exception in thread "main" java.lang.NullPointerException
at org.apache.cassandra.io.sstable.SSTableLoader.<init>(SSTableLoader.java:59)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:80)

$ cd /home/admin/data/cassandra/data/forseti/velocity/snapshots
$ /usr/install/cassandra/bin/sstableloader -d 192.168.6.52 1445938244634
Exception in thread "main" java.lang.NullPointerException
at org.apache.cassandra.io.sstable.SSTableLoader.<init>(SSTableLoader.java:59)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:80)

2. Taking a snapshot creates a snapshots folder under the table directory, so sstableloader cannot be run directly against the table directory either:

$ /usr/install/cassandra/bin/sstableloader -d 192.168.6.52 /home/admin/data/cassandra/data/forseti/velocity/snapshots/1445938244634
Could not retrieve endpoint ranges:
InvalidRequestException(why:No such keyspace: snapshots)
Run with --debug to get full stack trace or --help to get help.

[qihuang.zheng@mysql006070 ~]$ /usr/install/cassandra/bin/sstableloader -d 192.168.6.52 /home/admin/data/cassandra/data/forseti/velocity
Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" FSWriteError in /home/admin/data/cassandra/data/forseti/velocity/forseti-velocity-jb-414-Summary.db
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:122)
at org.apache.cassandra.io.sstable.SSTableReader.loadSummary(SSTableReader.java:546)
at org.apache.cassandra.io.sstable.SSTableReader.openForBatch(SSTableReader.java:173)
at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:107)
at java.io.File.list(File.java:1155)
at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:68)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:150)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
Caused by: java.nio.file.AccessDeniedException: /home/admin/data/cassandra/data/forseti/velocity/forseti-velocity-jb-414-Summary.db
at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:118)
... 7 more
[qihuang.zheng@mysql006070 ~]$ sudo -u admin /usr/install/cassandra/bin/sstableloader -d 192.168.6.52 /home/admin/data/cassandra/data/forseti/velocity
Caused by: java.net.UnknownHostException: mysql006070: 未知的名称或服务
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
at java.net.InetAddress.getLocalHost(InetAddress.java:1469)
... 11 more

3. When running sstableloader, the table directory must not contain a snapshots directory (http://tonywutao.github.io/2013/10/17/SSS-Table-Loader-in-Cassandra/).
But taking a snapshot puts the snapshots directory right under the table directory, so sstableloader cannot be used on it directly; the workaround is to copy the snapshot folder somewhere else and arrange it in keyspace/table layout.
If the machine running sstableloader does not have enough memory it will fail with an OOM; you can raise the heap, but if the machine itself is short on RAM the best option is to scp the files to a machine with more memory.
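
A minimal relayout sketch based on the paths from the errors above (the /tmp/load staging directory is an assumption; any location with enough space works):

# Stage the snapshot under an <anything>/<keyspace>/<table> layout that sstableloader accepts
snap=1445938244634
src=/home/admin/data/cassandra/data/forseti/velocity/snapshots/$snap
dst=/tmp/load/forseti/velocity
mkdir -p "$dst"
cp "$src"/* "$dst"/        # or scp to a machine with more memory, as noted above
/usr/install/cassandra/bin/sstableloader -d 192.168.6.52 "$dst"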

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.TreeMap.put(TreeMap.java:569)
at java.util.TreeSet.add(TreeSet.java:255)
at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:226)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:76)
at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:56)
at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:340)
at org.apache.cassandra.streaming.StreamPlan.transferFiles(StreamPlan.java:138)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:178)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
ERROR 13:32:10,104 Error in ThreadPoolExecutor
java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.utils.BackgroundActivityMonitor.readAndCompute(BackgroundActivityMonitor.java:84)
at org.apache.cassandra.utils.BackgroundActivityMonitor.getIOWait(BackgroundActivityMonitor.java:125)
at org.apache.cassandra.utils.BackgroundActivityMonitor$BackgroundActivityReporter.run(BackgroundActivityMonitor.java:153)
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:80)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:226)
at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:76)
at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:56)
at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:340)
at org.apache.cassandra.streaming.StreamPlan.transferFiles(StreamPlan.java:138)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:178)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)

http://stackoverflow.com/questions/32715401/cassandra-migrate-keyspace-data-from-multinode-cluster-to-singlenode-cluster

ERROR [StreamReceiveTask:124] 2016-06-10 18:09:31,717 StreamReceiveTask.java:183 - Error applying streamed data:
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:399) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:365) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:174) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableWriter.finish(SSTableWriter.java:463) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:447) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:442) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:141) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:888) ~[na:1.7.0_51]
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:392) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 11 common frames omitted
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[na:1.7.0_51]
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:885) ~[na:1.7.0_51]
... 12 common frames omitted
ERROR [StreamReceiveTask:124] 2016-06-10 18:09:31,717 JVMStabilityInspector.java:117 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Map failed
ERROR [CompactionExecutor:10060] 2016-07-10 15:29:45,520 CassandraDaemon.java:229 - Exception in thread Thread[CompactionExecutor:10060,1,main]
java.lang.RuntimeException: Out of native memory occured, You can avoid it by increasing the system ram space or by increasing bloom_filter_fp_chance.
at org.apache.cassandra.utils.obs.OffHeapBitSet.<init>(OffHeapBitSet.java:48) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.FilterFactory.createFilter(FilterFactory.java:84) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.FilterFactory.getFilter(FilterFactory.java:78) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:592) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:141) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.createCompactionWriter(CompactionTask.java:308) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:190) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

Incremental backups

First take a snapshot, then change the configuration and restart the cluster.

Newer Cassandra versions have a nodetool enablebackup command that flips this setting directly, without a restart (a sketch follows).
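
A minimal sketch of that route, assuming a nodetool recent enough to ship the backup subcommands:

/usr/install/cassandra/bin/nodetool enablebackup      # turn on incremental backups without editing cassandra.yaml
/usr/install/cassandra/bin/nodetool statusbackup      # should report that backups are running
/usr/install/cassandra/bin/nodetool disablebackup     # turn them off again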

host=`ifconfig | grep "192.168.4" | awk '/inet addr/{sub("addr:",""); print $2}'`
sed -i -e "s/incremental_backups: false/incremental_backups: true/g" /usr/install/cassandra/conf/cassandra.yaml
cat /usr/install/cassandra/conf/cassandra.yaml | grep incremental_backups
/usr/install/cassandra/bin/nodetool -h $host snapshot forseti_fp
/usr/install/cassandra/bin/nodetool flush
kill -9 `/usr/install/java/bin/jps | grep CassandraDaemon |awk '{print $1}'`
ps -ef | grep cassandra
sleep 10s
/usr/install/cassandra/bin/cassandra

ll /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots
ll /home/admin/cassandra/data/forseti_fp/android_device_session/backups
du -sh /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots
du -sh /home/admin/cassandra/data/forseti_fp/android_device_session/backups

After a system-wide snapshot is performed, you can enable incremental backups on each node to backup data that has changed
since the last snapshot: each time an SSTable is flushed, a hard link is copied into a /backups subdirectory of the data directory

The procedure is: take a cluster-wide snapshot first (on every node), then enable incremental backups by changing the configuration and restarting each node.
If, instead of taking the global snapshot first, each node is snapshotted, reconfigured and restarted in turn before moving on to the next one, does that cause any problem?
Note that nodetool's -h option lets a single host take the snapshot on every node, so there is no need to log in to each node one by one, and no need for pssh at all (see the loop sketch below).
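
A sketch of driving the cluster-wide snapshot from a single host; this is the same loop used later in the data-center migration section:

/usr/install/cassandra/bin/nodetool status | grep RAC | awk '{print $2}' | while read ip
do
  echo $ip
  # snapshot each node remotely over JMX instead of logging in to it
  /usr/install/cassandra/bin/nodetool -h $ip snapshot forseti_fp
done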

Snapshot files end up in data_directory_location/keyspace_name/table_name/snapshots/UUID/
Incremental backup files end up in data_directory_location/keyspace_name/table_name/backups/

Incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism.

Question: if the snapshots are deleted, do incremental backups still work correctly?

Preparation: building the new cluster and creating tables

cluster_name: fp_back
data_file_directories: [/data01,/data02,/data03,/data04,/data05,/data06,/data07,/data08,/data09,/data10,/data11,/data12]
saved_caches_directory: /home/admin/cassandra/saved_caches
commitlog_directory: /home/admin/cassandra/commitlog
wget http://192.168.47.211:8000/apache-cassandra-2.1.13.tar.gz
tar zxf apache-cassandra-2.1.13.tar.gz
host=`ifconfig | grep "192.168.50" | awk '/inet addr/{sub("addr:",""); print $2}'`
sed -i -e "s#192.168.47.211#$host#g" apache-cassandra-2.1.13/conf/cassandra.yaml
sed -i -e "s#192.168.47.211#$host#g" apache-cassandra-2.1.13/conf/cassandra-env.sh
mkdir /home/admin/cassandra
apache-cassandra-2.1.13/bin/cassandra

Export the schema from the source cluster to a .cql file, then connect to the target cluster and load it; the replication factor can be adjusted in the file before importing. There is no need to log in to the target cluster for this!

/usr/install/cassandra/bin/cqlsh 192.168.48.159 -e 'desc keyspace forseti_fp' > forseti_fp.cql
/usr/install/cassandra/bin/cqlsh 192.168.50.20 -f forseti_fp.cql
CREATE KEYSPACE forseti_fp WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC2': '1',
'DC1': '1'
};

Getting started: migrate one table as a test

Snapshot files cannot be fed to sstableloader in place:

$ ll /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots
1464167616242
$ /usr/install/cassandra/bin/sstableloader -d 192.168.50.20 /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots
Could not retrieve endpoint ranges:
InvalidRequestException(why:No such keyspace: android_device_session)
$ /usr/install/cassandra/bin/sstableloader -d 192.168.50.20 /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots/1464167616242
Could not retrieve endpoint ranges:
InvalidRequestException(why:No such keyspace: snapshots)

snap=`ls /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots`
mv /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots/$snap /home/admin/cassandra/ && cd /home/admin/cassandra && mkdir forseti_fp
mv $snap android_device_session && mv android_device_session forseti_fp
nohup /usr/install/cassandra/bin/sstableloader -d 192.168.50.20,192.168.50.21,192.168.50.22,192.168.50.23,192.168.50.24 /home/admin/cassandra/forseti_fp/android_device_session &

By default sstableloader gets only 512M of heap; after raising it to 3072M the import succeeds:
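
One way to raise it, assuming the heap is hard-coded as MAX_HEAP_SIZE in the bin/sstableloader wrapper script (the exact default string may vary by version, so check with grep first):

grep -n MAX_HEAP_SIZE /usr/install/cassandra/bin/sstableloader
sed -i -e 's/MAX_HEAP_SIZE="512M"/MAX_HEAP_SIZE="3072M"/' /usr/install/cassandra/bin/sstableloader
grep -n MAX_HEAP_SIZE /usr/install/cassandra/bin/sstableloader    # verify the change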

[admin@spark015010 ~]$ du -sh /data*/forseti_fp/android_device_session-*
4.0K /data01/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data02/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data03/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data04/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data05/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data06/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data07/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data08/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data09/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
3.5G /data10/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data11/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c
4.0K /data12/forseti_fp/android_device_session-980e940023af11e6a5ea87f007ccb41c

The error when memory is insufficient:

ERROR 03:58:12 [Stream #4f0fadf0-23b1-11e6-bbf2-2592342d3b2e] Streaming error occurred
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.TreeMap.put(TreeMap.java:569) ~[na:1.7.0_51]
at java.util.TreeSet.add(TreeSet.java:255) ~[na:1.7.0_51]
at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:287) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.serialize(FileMessageHeader.java:172) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:82) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
progress: [/192.168.50.24]0:0/19 0 % [/192.168.50.21]0:1/19 0 % [/192.168.50.20]0:0/19 0 % [/192.168.50.23]0:2/19 0 % [/192.168.50.22]0:1/19 0 % total: 0% 0 MB/s(avg: 0 MB/s)
ERROR 03:58:12 [Stream #4f0fadf0-23b1-11e6-bbf2-2592342d3b2e] Remote peer 192.168.50.23 failed stream session.
progress: [/192.168.50.24]0:0/19 0 % [/192.168.50.21]0:1/19 0 % [/192.168.50.20]0:0/19 0 % [/192.168.50.23]0:2/19 0 % [/192.168.50.22]0:3/19 0 % total: 0% 0 MB/s(avg: 0 MB/s)
/usr/install/cassandra/bin/sstableloader: line 53:
15459 已杀死 "$JAVA" $JAVA_AGENT -ea -cp "$CLASSPATH" $JVM_OPTS -Xmx$MAX_HEAP_SIZE -Dcassandra.storagedir="$cassandra_storagedir" -Dlogback.configurationFile=logback-tools.xml org.apache.cassandra.tools.BulkLoader "$@"

Log output after sstableloader completes:

Established connection to initial hosts
Opening sstables and calculating sections to stream
....
100% total: 100% 0 MB/s(avg: 24 MB/s)
Summary statistics:
Connections per host: : 1
Total files transferred: : 85
Total bytes transferred: : 2291137192904
Total duration (ms): : 87700111 --> 24 hours!
Average transfer rate (MB/s): : 24
Peak transfer rate (MB/s): : 24

Timing of one full snapshot run of the android_device_session table on each node:

Node             Load      Duration (ms)   Bytes transferred   Size/actual size
192.168.48.228 2.51 TB 69222991 1808724189277 80G/1.7T
192.168.48.227 2.67 TB 24897246 590477418187 284G/1.9T
192.168.48.226 2.58 TB 73820237 1928301531978 249G/1.8T
192.168.48.176 5.9 TB 77773491 2032068642522 1018G/1.9T
192.168.48.161 4.28 TB 78947850 2060434319116 46G/?
192.168.48.160 4.34 TB 87700111 2291137192904 ?/1.1T
192.168.48.175 3.01 TB 68711040 1794574379750 178G/?
192.168.48.159 4.46 TB 88088722 2299781089484 ?/1.4T

A full snapshot migration of the whole cluster

Prepare the directory structure (run only once): sh table.sh
This creates the per-table directories under /home/admin/cassandra/forseti_fp.

cd /home/admin/cassandra/forseti_fp
for file in /home/admin/cassandra/data/forseti_fp/*
do
if test -d $file
then
table=`basename $file`
snap=`ls $file/snapshots`
if [ -n "$snap" ]; then
echo $table $snap
mkdir $table
fi
fi
done

Run the full snapshot data migration once: nohup sh snap.sh > snap_alltable.log &
It moves /home/admin/cassandra/data/forseti_fp/$table/snapshots/$snap/* to /home/admin/cassandra/forseti_fp/$table.

for file in /home/admin/cassandra/data/forseti_fp/*
do
if test -d $file
then
table=`basename $file`
snap=`ls $file/snapshots`
if [ -n "$snap" ]; then
mv $file/snapshots/$snap/* /home/admin/cassandra/forseti_fp/$table
if [ "smart_device_map" == $table ]||[ "android_device_session_temp" == $table ]||[ "android_device_session" == $table ]||[ "android_device" == $table ]||[ "analysis" == $table ]||[ "device_session" == $table ]; then
echo " "
else
echo $table $snap
/usr/install/cassandra/bin/sstableloader -d 192.168.50.20,192.168.50.21,192.168.50.22,192.168.50.23,192.168.50.24 /home/admin/cassandra/forseti_fp/$table
fi
fi
fi
done

Note: the android_device_session table has already been synchronized, so it does not need to be loaded again.

Incremental backups (run repeatedly)

Run an incremental migration: nohup sh increment.sh > increment0616.log 2>&1 &
It moves /home/admin/cassandra/data/forseti_fp/$table/backups/* to /home/admin/cassandra/forseti_fp/$table.

for file in /home/admin/cassandra/data/forseti_fp/*
do
if test -d $file
then
table=`basename $file`
if [ -d "$file/backups" ]; then
rm /home/admin/cassandra/forseti_fp/$table/*
cd $file/backups
ls | xargs -t -I {} mv {} /home/admin/cassandra/forseti_fp/$table/

if [ "smart_device_map" == $table ]||[ "android_device_session_temp" == $table ]||[ "android_device" == $table ]||[ "analysis" == $table ]||[ "device_session" == $table ]; then
echo " "
else
echo $table
/usr/install/cassandra/bin/sstableloader -d 192.168.50.20,192.168.50.21,192.168.50.22,192.168.50.23,192.168.50.24 /home/admin/cassandra/forseti_fp/$table
fi
fi
fi
done

Note: this time android_device_session must be included; every backups directory has to be synchronized.

vi increment_timer.sh

ps -fe|grep BulkLoader |grep -v grep
if [ $? -ne 0 ]
then
start=$(date +%s)
echo "可以开始处理。。。"
sh increment.sh
end=$(date +%s)
timeT=$(( $end - $start ))
echo "COST:$timeT, start:$start, end: $end"
else
echo "BulkLoader正在运行,请稍等"
fi

Cron job setup:

0 */2 * * * sh /home/admin/cassandra/increment_timer.sh
tail -f /var/spool/mail/admin
crontab -e

Time for all tables on node 228 (one backups pass); roughly 1 minute per 1G:

[admin@192-168-48-228 cassandra]$ cat nohup.out  | grep "Total duration"
50M android_device_index Total duration (s): : 26
525G android_device_session Total duration (s): : 25291(7h=420m)
71M android_device_time Total duration (s): : 30
5.3G api_invoke_result Total duration (s): : 561
1.6G device Total duration (s): : 288
3.9G devicedbmap Total duration (s): : 316
66G device_session_map Total duration (s): : 4563(76min)
2.1G ios_device_index Total duration (s): : 151
4.5G ios_device_info Total duration (s): : 342
4.8G ios_device_session Total duration (s): : 353
373M ip_device_id Total duration (s): : 53
7.3G page_info Total duration (s): : 536
3.3G tcp_stack_ua Total duration (s): : 276
9.6G tcp_syn_data Total duration (s): : 686

Data-center migration

Preparation:

  1. Synchronize the old cluster's snapshots and backups to the new cluster
  2. ??

Streaming directly from the current cluster to the Shanghai cluster

A node is down:

[admin@192-168-48-228 ~]$ /usr/install/cassandra/bin/sstableloader -d 10.21.21.20 -t 50 /home/admin/cassandra/forseti_fp/device
ERROR 11:11:56 Error creating pool to /10.21.21.22:9042
com.datastax.driver.core.TransportException: [/10.21.21.22:9042] Cannot connect
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:156) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.driver.core.Connection$1.operationComplete(Connection.java:139) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.net.ConnectException: 拒绝连接: /10.21.21.22:9042
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_51]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) ~[na:1.7.0_51]
at com.datastax.shaded.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224) ~[cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
at com.datastax.shaded.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281) [cassandra-driver-core-2.2.0-rc2-SNAPSHOT-20150617-shaded.jar:na]
... 6 common frames omitted
Established connection to initial hosts
Opening sstables and calculating sections to stream
...Streaming relevant part of /home/admin/cassan

Version mismatch, connection refused:

[admin@fp-cass048159 device]$ /usr/install/cassandra/bin/sstableloader -d 10.21.21.20 -t 50 /home/admin/cassandra/forseti_fp/$table
Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:342)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:109)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: 拒绝连接
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.createThriftClient(BulkLoader.java:380)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:302)
... 2 more
Caused by: java.net.ConnectException: 拒绝连接
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 6 more
[admin@fp-cass048159 ~]$ /usr/install/cassandra/bin/cqlsh 10.21.21.20
Connection error: ('Unable to connect to any servers', {'10.21.21.20':
ProtocolError("cql_version '3.2.1' is not supported by remote (w/ native protocol). Supported versions: [u'3.3.1']",)})
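
A quick way to confirm the mismatch is to compare the release versions on both sides (the remote path is an assumption based on the new-cluster install below):

/usr/install/cassandra/bin/nodetool version             # old cluster (2.1.x here)
ssh 10.21.21.20 '~/cassandra/bin/nodetool version'      # new cluster (2.2.x here)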

sstableloader gets stuck after reaching 100%:

[admin@192-168-48-228 ~]$ tail -c 500 nohup.out
progress: [/10.21.21.23]0:60/60 100% [/10.21.21.21]0:65/65 100% [/10.21.21.20]0:63/63 100% [/10.21.21.19]0:60/60 100%
[/10.21.21.26]0:60/60 100% [/10.21.21.25]0:63/63 100% [/10.21.21.24]0:64/64 100%
progress: [/10.21.21.23]0:60/60 100% [/10.21.21.21]0:65/65 100% [/10.21.21.20]0:63/63 100% [/10.21.21.19]0:60/60 100%
[/10.21.21.26]0:60/60 100% [/10.21.21.25]0:63/63 100% [/10.21.21.24]0:64/64 100% total: 100% 0 MB/s(avg: 2 MB/s)
 S0C    S1C     S0U     S1U      EC       EU        OC         OU       PC     PU         YGC    YGCT    FGC    FGCT     GCT
11264.0 11264.0 3525.9 0.0 1032704.0 346823.5 342016.0 9527.1 21504.0 21335.9 24 0.466 0 0.000 0.466
11264.0 11264.0 3525.9 0.0 1032704.0 346823.5 342016.0 9527.1 21504.0 21335.9 24 0.466 0 0.000 0.466
11264.0 11264.0 3525.9 0.0 1032704.0 346823.5 342016.0 9527.1 21504.0 21335.9 24 0.466 0 0.000 0.466
11264.0 11264.0 3525.9 0.0 1032704.0 346823.5 342016.0 9527.1 21504.0 21335.9 24 0.466 0 0.000 0.466
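
The GC columns above look like jstat -gc output for the stuck loader JVM; a sketch of collecting them (the PID lookup is an assumption):

pid=$(/usr/install/java/bin/jps | grep BulkLoader | awk '{print $1}')
/usr/install/java/bin/jstat -gc $pid 5s     # sample the loader's GC counters every 5 seconds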

Migration of the main FP tables:

device
android_device_index
android_device_time
ios_device_index
ios_device_info

Check the data sizes:

$ cd /home/admin/cassandra/data/forseti_fp
$ du -sh * | egrep 'android_device_index|android_device_time|ios_device_index|ios_device_info'
869M android_device_index
1023M android_device_time
23G ios_device_index
54G ios_device_info

48.228: device and ios_device_info still left
227: upgrade done, 7-7 9
226: upgrade done, 7-8 2
159: upgrade done, 7-8 6

##### Test with a single table first
ll /home/admin/cassandra/data/forseti_fp/android_device_index/snapshots/$snapTime
mkdir -p /home/admin/cassandra/forseti_fp/device
cd /home/admin/cassandra/data/forseti_fp/device/snapshots/$snapTime
ls | xargs -t -I {} mv {} /home/admin/cassandra/forseti_fp/device/
/usr/install/cassandra/bin/sstableloader -d 10.21.21.20 -t 50 /home/admin/cassandra/forseti_fp/device

#### Loop over tables in a script: load.sh
rm -rf /home/admin/cassandra/forseti_fp/
/usr/install/cassandra/bin/nodetool snapshot forseti_fp > snap.log
snapTime=`cat snap.log | grep directory | awk '{print $3}'`
nums=('device' 'android_device_index' 'android_device_time' 'ios_device_index' 'ios_device_info')
for table in ${nums[@]};
do
mkdir -p /home/admin/cassandra/forseti_fp/$table
cd /home/admin/cassandra/data/forseti_fp/$table/snapshots/$snapTime
ls | xargs -t -I {} mv {} /home/admin/cassandra/forseti_fp/$table/
/usr/install/cassandra/bin/sstableloader -d 10.21.21.20 -t 100 /home/admin/cassandra/forseti_fp/$table
done
echo "end."

crontab -e
00 6 8 7 * sh /home/admin/load.sh > load.log 2>&1

Taking snapshots on multiple nodes

nodetool status |grep RAC|awk '{print $2}' | while read ip; do echo $ip; nodetool -h $ip snapshot forseti_fp; done
192.168.48.227
Snapshot directory: 1467716730094
192.168.48.226
Snapshot directory: 1467716742257
192.168.48.176
Snapshot directory: 1467716753938
192.168.48.161
Requested creating snapshot(s) for [forseti_fp] with snapshot name [1467716770566]
192.168.48.228
Snapshot directory: 1467716780650
192.168.48.160
Requested creating snapshot(s) for [forseti_fp] with snapshot name [1467716792487]
192.168.48.175
Requested creating snapshot(s) for [forseti_fp] with snapshot name [1467716802910]
192.168.48.159
Requested creating snapshot(s) for [forseti_fp] with snapshot name [1467716815575]

Taking 159 as an example:
[admin@fp-cass048159 ~]$ ll /home/admin/cassandra/data/forseti_fp/android_device_session/snapshots/
总用量 4
drwxr-xr-x. 2 admin admin 4096 7月 5 19:07 1467716815575

declare -A map=(["192.168.48.227"]="1467716730094" ["192.168.48.226"]="1467716742257"
["192.168.48.176"]="1467716753938" ["192.168.48.161"]="1467716770566"
["192.168.48.228"]="1467716780650" ["192.168.48.160"]="1467716792487"
["192.168.48.175"]="1467716802910" ["192.168.48.159"]="1467716815575")
host=`ifconfig | grep "192.168.4" | awk '/inet addr/{sub("addr:",""); print $2}'`
snapshot=${map["$host"]}
snapshotAbs="/home/admin/cassandra/data/forseti_fp/android_device_session/snapshots/$snapshot"

#rm -rf /home/admin/cassandra/forseti_fp/
#mkdir -p /home/admin/cassandra/forseti_fp/
table="device"
mkdir -p /home/admin/cassandra/forseti_fp/$table
cd $snapshotAbs
ls | xargs -t -I {} mv {} /home/admin/cassandra/forseti_fp/$table/
/usr/install/cassandra/bin/sstableloader -t 50 -d 10.21.21.20 /home/admin/cassandra/forseti_fp/$table

New cluster installation

#rm -rf apache-cassandra-2.2.6 & rm apache-cassandra-2.2.6-bin.tar.gz & rm -rf test
wget http://192.168.47.211:8000/apache-cassandra-2.2.6-bin.tar.gz
tar zxf apache-cassandra-2.2.6-bin.tar.gz

#cluster=sh_velocity
#seeds=10.21.21.11,10.21.21.12,10.21.21.13

#cluster=sh_fp
#seeds=10.21.21.19,10.21.21.20,10.21.21.21

cluster=sh_galaxy
seeds=10.21.21.131,10.21.21.132

mkdir data
host=`ifconfig | grep "10.21.21." | awk '/inet addr/{sub("addr:",""); print $2}'`
sed -i -e "s/localhost/$host/g" apache-cassandra-2.2.6/conf/cassandra.yaml
sed -i -e "s#localhost#$host#g" apache-cassandra-2.2.6/conf/cassandra-env.sh
sed -i -e "s/127.0.0.1/$seeds/g" apache-cassandra-2.2.6/conf/cassandra.yaml
sed -i -e "s/Test Cluster/$cluster/g" apache-cassandra-2.2.6/conf/cassandra.yaml
ln -s apache-cassandra-2.2.6 cassandra

cassandra/bin/cassandra
cassandra/bin/nodetool status
cassandra/bin/nodetool -h 10.21.21.10 status
cassandra/bin/nodetool -h 10.21.21.19 status
cassandra/bin/nodetool -h 10.21.21.131 status

1. Table schema

/usr/install/cassandra/bin/cqlsh -e 'desc keyspace forseti' 192.168.47.202 > forseti_api.cql
/usr/install/cassandra/bin/cqlsh -e 'desc keyspace forseti' 192.168.48.162 > forseti.cql
/usr/install/cassandra/bin/cqlsh -e 'desc keyspace forseti_fp' 192.168.48.159 > forseti_fp.cql
/usr/install/cassandra/bin/cqlsh -e 'desc keyspace realtime' 192.168.49.56 > forseti_galaxy.cql

wget http://192.168.48.162:8000/forseti.cql
wget http://192.168.48.162:8000/forseti_fp.cql
wget http://192.168.48.162:8000/forseti_galaxy.cql
cassandra/bin/cqlsh 10.21.21.11 -f forseti.cql
cassandra/bin/cqlsh 10.21.21.19 -f forseti_fp.cql
cassandra/bin/cqlsh 10.21.21.131 -f forseti_galaxy.cql

2. Data migration

3. Incremental data migration

4. Data verification

/usr/install/hadoop/bin/hadoop fs -cat /user/tongdun/velocity/raw/2016-6-9/part-00000 | head
115.55.206|sdo|sdo_client|ip3|1465401548233|1465401548238838F30790E445243217|{"state":"0","ipAddressProvince":"河南省","ipAddressCity":"鹤壁市","ipAddress":"115.55.206.177","ext_is_gplus_login":"0","ext_is_bingding_mobile":"0","eventType":"Login","eventOccurTime":"1465401548233","deviceId":"00-0C-29-BC-62-D1","accountLogin":"166f751ebf90b9fbb38977d75b1265f6","eventId":"login_client","partnerCode":"sdo","ip3":"115.55.206","location":"鹤壁市","status":"Review"}

select * from velocity_app where attribute='115.55.206' and type='ip3' and partner_code='sdo' and app_name='sdo_client' and sequence_id > '1465401548238838F30790E445243210' and sequence_id < '1465401548238838F30790E445243220';

115.55.206 | sdo | sdo_client | ip3 | 1465401548238838F30790E445243217 | {"state":"0","ipAddressProvince":"河南省","ipAddressCity":"鹤壁市","ipAddress":"115.55.206.177","ext_is_gplus_login":"0","ext_is_bingding_mobile":"0","eventType":"Login","eventOccurTime":"1465401548233","deviceId":"00-0C-29-BC-62-D1","accountLogin":"166f751ebf90b9fbb38977d75b1265f6","eventId":"login_client","partnerCode":"sdo","ip3":"115.55.206","location":"鹤壁市","status":"Review"} | 1465401548233

The nodetool utility

Cluster status: nodetool describecluster

After starting OpsCenter, creating a new cluster on port 8888 reports an exception:

2015-10-31 13:46:07+0800 [forseti_cluster]  WARN: Node 192.168.47.206 is reporting a schema disagreement: 
{UUID('bc083db4-b544-38d1-a37c-be3e78db1df1'): ['192.168.47.229'],
UUID('67462d4d-a9c9-38a1-afd8-1a3917232b8a'): ['192.168.47.203', '192.168.47.228', '192.168.47.227', '192.168.47.202', '192.168.47.225', '192.168.47.204', '192.168.47.205', '192.168.47.206', '192.168.47.221', '192.168.47.222', '192.168.47.224']}
2015-10-31 13:46:07+0800 [] WARN: [control connection] No schema built on connect; retrying without wait for schema agreement
2015-10-31 13:46:13+0800 [] WARN: Host 192.168.47.229 has been marked down
2015-10-31 13:46:14+0800 [] WARN: Failed to create connection pool for new host 192.168.47.229: errors=Timed out creating connection, last_host=None
2015-10-31 13:47:07+0800 [] INFO: Host 192.168.47.229 may be up; will prepare queries and open connection pool

Following the approach described here, if a node's status is UNREACHABLE, restart it and check again:
http://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootSchemaDisagree.html

Sure enough, one node is UNREACHABLE:

[qihuang.zheng@cass047202 ~]$ nodetool describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3e4c6580-3d3c-327b-986c-63086e93e94f: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.202, 192.168.47.203, 192.168.47.228, 192.168.47.224, 192.168.47.225, 192.168.47.227]
UNREACHABLE: [192.168.47.229]

Node 229 turned out to be down; after restarting it, the schema versions disagree:

[qihuang.zheng@cass047202 ~]$ nodetool describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
3e4c6580-3d3c-327b-986c-63086e93e94f: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.202, 192.168.47.203, 192.168.47.228, 192.168.47.224, 192.168.47.225, 192.168.47.227]

d2601534-c7ff-394d-a656-33d9da3dc574: [192.168.47.229]

But from 229 itself, every node looks unreachable:

[qihuang.zheng@192-168-47-229 ~]$ nodetool describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.202, 192.168.47.203, 192.168.47.228, 192.168.47.229, 192.168.47.224, 192.168.47.225, 192.168.47.227]

Some time later 229 was restarted once more because of GC; on the next check the schema versions finally agree:

[qihuang.zheng@192-168-47-229 ~]$ nodetool describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
977ca50d-70ab-3f45-a1e6-ca18560b0c29: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.202, 192.168.47.203, 192.168.47.228, 192.168.47.229, 192.168.47.224, 192.168.47.225, 192.168.47.227]

Removing a node: removenode

http://zhaoyanblog.com/archives/533.html

A. To take a node whose Cassandra process is still running out of the ring, run decommission on it, or run it from another node with -h pointing at the target. The node running the command will show a NodeCmd process.

/usr/install/apache-cassandra-2.0.16/bin/nodetool decommission
/usr/install/apache-cassandra-2.0.16/bin/nodetool decommission -h 192.168.47.226

B. If the node is already down, for example its process was killed, remove it from a healthy node:

/usr/install/apache-cassandra-2.0.16/bin/nodetool status    # get the host-id of the dead node
/usr/install/apache-cassandra-2.0.16/bin/nodetool removenode 2591cec9-42f9-4f60-a622-00d463910994

C. What to do when the operation hangs:
1) After running decommission in production, it ran for a very long time without completing. It turned out to be a streaming bug:
nodetool removenode hangs: https://issues.apache.org/jira/browse/CASSANDRA-6542,
http://stackoverflow.com/questions/25943261/nodetool-removenode-stuck-during-removal
Streams hang in repair: https://issues.apache.org/jira/browse/CASSANDRA-8472

Streaming hanging is a familiar trouble in my cluster (2.0/2.1). In my experience,
the node that being restarted recently won’t hang the streaming.
So each time when I want to add or remove a node, I will restart all nodes one by one.
That approach is not feasible in production, because it means restarting every node; so we tried using force to cut the operation short.

2) Kill the Cassandra process on 226, then on 202 use method B (removenode) to delete 226. However, removenode can hang as well:

[qihuang.zheng@cass047202 ~]$ nodetool removenode status
RemovalStatus: Removing token (-9184682133698409841). Waiting for replication confirmation from [/192.168.47.206,/192.168.47.222,/192.168.47.221,/192.168.47.204,/192.168.47.205,/192.168.47.202,/192.168.47.224,/192.168.47.225].

3) nodetool commands are just JMX calls, so even after killing the NodeCmd process, the removal it triggered is still in progress:

[qihuang.zheng@cass047202 ~]$ jps
8596 NodeCmd
[qihuang.zheng@cass047202 ~]$ kill -9 8596
[qihuang.zheng@cass047202 ~]$ nodetool removenode status
RemovalStatus: Removing token (-9184682133698409841). Waiting for replication confirmation from [/192.168.47.206,/192.168.47.222,/192.168.47.221,/192.168.47.204,/192.168.47.205,/192.168.47.202,/192.168.47.224,/192.168.47.225].

4) While a removal is in progress, removenode cannot be run again, but the error message points to another way out:

[qihuang.zheng@cass047202 ~]$ /usr/install/cassandra/bin/nodetool status
UN 192.168.47.202 612.13 GB 256 7.4% abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7 RAC1
DL 192.168.47.226 406.34 GB 256 7.6% 2591cec9-42f9-4f60-a622-00d463910994 RAC1

[qihuang.zheng@cass047202 ~]$ nodetool removenode 2591cec9-42f9-4f60-a622-00d463910994
Exception in thread "main" java.lang.UnsupportedOperationException: This node is already processing a removal. Wait for it to complete, or use 'removenode force' if this has failed.
at org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3342)
...
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)

5) If removenode fails, force the removal with nodetool removenode force; no host-id argument is needed:

[qihuang.zheng@cass047202 ~]$ nodetool removenode force 2591cec9-42f9-4f60-a622-00d463910994
Missing an argument for removenode (either status, force, or an ID)
usage: java org.apache.cassandra.tools.NodeCmd --host <arg> <command>

[qihuang.zheng@cass047202 ~]$ nodetool removenode force
RemovalStatus: Removing token (-9184682133698409841). Waiting for replication confirmation from [/192.168.47.206,/192.168.47.222,/192.168.47.221,/192.168.47.204,/192.168.47.205,/192.168.47.202,/192.168.47.224,/192.168.47.225].

6) After a successful removal, the removed node 226 no longer appears in the status output, the cluster's Owns percentages change, and gossipinfo shows its status as removed:

[qihuang.zheng@cass047202 ~]$ nodetool removenode status
RemovalStatus: No token removals in process.

[qihuang.zheng@cass047202 ~]$ /usr/install/cassandra/bin/nodetool status
UN 192.168.47.202 612.13 GB 256 8.1% abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7 RAC1

[qihuang.zheng@cass047204 ~]$ nodetool gossipinfo
/192.168.47.226
STATUS:removed,2591cec9-42f9-4f60-a622-00d463910994,1446339305944
REMOVAL_COORDINATOR:REMOVER,abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7

D. Active Streams:

The figure below shows 225 being decommissioned; its data is moved to the other nodes via streams:

[figure: streams]

Replacing a node: replace_address

http://zhaoyanblog.com/archives/591.html
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

CAUTION:
1.Wait at least 72 hours to ensure that old node information is removed from gossip.
If removed from the property file too soon, problems may result.
2. auto_bootstrap must not be set to false:
java.lang.RuntimeException: Trying to replace_address with auto_bootstrap disabled will not work, check your configuration

The node being replaced must show as DN; start the replacement node with the -Dcassandra.replace_address option:

sudo -u admin /usr/install/cassandra/bin/cassandra -Dcassandra.replace_address=192.168.47.229

WARN 13:49:37,423 Token -3920394820366823756 changing ownership from /192.168.47.229 to /192.168.48.160
INFO 13:50:40,122 Node /192.168.47.229 is now part of the cluster
WARN 13:50:40,127 Not updating token metadata for /192.168.47.229 because I am replacing it
INFO 13:50:40,127 Nodes /192.168.47.229 and /192.168.48.160 have the same token -1001533036333769848. Ignoring /192.168.47.229
INFO 13:51:10,272 FatClient /192.168.47.229 has been silent for 30000ms, removing from gossip

Check node status afterwards:

[qihuang.zheng@fp-cass048160 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 460.9 GB 256 20.1% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 6.02 MB 256 21.2% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 5.4 TB 256 19.2% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
UN 192.168.48.159 483.48 GB 256 19.5% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 686.97 GB 256 20.0% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

[qihuang.zheng@192-168-47-227 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 460.78 GB 256 20.1% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.47.228 5.4 TB 256 19.2% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
DN 192.168.47.229 937.05 GB 256 21.2% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.48.159 483.41 GB 256 19.5% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 686.93 GB 256 20.0% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

1. The odd thing above is that 229 and 160 have the same Host ID, which is why the log keeps saying "have the same token ... Ignoring".
2. 227 does not know about 160 at all; it still shows 229 in state DN.
3. 160 can see every other node, but not 229.

A week later the situation was unchanged, and 160's log was still frequently printing "Ignoring…":

[qihuang.zheng@fp-cass048160 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 972.25 GB 256 20.1% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 250.64 GB 256 21.2% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 5.89 TB 256 19.2% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
UN 192.168.48.159 1 TB 256 19.5% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.15 TB 256 20.0% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

[qihuang.zheng@192-168-47-227 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 972.16 GB 256 20.1% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.47.228 5.89 TB 256 19.2% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
DN 192.168.47.229 937.05 GB 256 21.2% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.48.159 1 TB 256 19.5% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.15 TB 256 20.0% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

So 160 was simply stopped and restarted (remember to set auto_bootstrap: false; a sketch follows below); now 227 can see 160, but 160 still cannot see 229.
Stranger still, 229's Host ID has become null (once 160 appears, 229 can no longer show up under the same ID), which makes removenode impossible.
In fact removenode would not be safe anyway: 229 and 160 used to share the same Host ID, so removing by that host-id would take 160 out as well.
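
A sketch of that 160 restart, reusing the kill/start pattern from the backup script earlier (the grep guard is just to avoid appending auto_bootstrap twice):

grep -q '^auto_bootstrap' /usr/install/cassandra/conf/cassandra.yaml || \
  echo 'auto_bootstrap: false' >> /usr/install/cassandra/conf/cassandra.yaml
kill -9 `/usr/install/java/bin/jps | grep CassandraDaemon | awk '{print $1}'`
sleep 10s
/usr/install/cassandra/bin/cassandra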

[qihuang.zheng@192-168-47-227 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 973.32 GB 256 17.4% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 250.93 GB 256 15.4% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 5.89 TB 256 16.5% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
DN 192.168.47.229 937.05 GB 256 17.8% null RAC1
UN 192.168.48.159 1 TB 256 16.4% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.15 TB 256 16.5% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

Since we cannot removenode 229 by host-id, how do we get rid of it?

The JMX approach below does not work either: https://gist.github.com/justenwalker/8338334

[qihuang.zheng@192-168-47-227 ~]$ java -jar ./jmxterm.jar
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.47.229
#calling operation unsafeAssassinateEndpoint of mbean org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException

https://gist.github.com/nstielau/3373649

java -jar jmxsh-R*.jar -h 192.168.47.227 -p 7199

% [Hit Enter to go into Browse Mode]
Select a domain: [Enter number for org.apache.cassandra.net]
Select an mbean: [Enter number for org.apache.cassandra.net:type=Gossiper]
Select an attribute or operation: [Enter number for unsafeAssassinateEndpoint(String p1)]
p1 (String): 192.168.47.229

It may also be possible to run it directly (untested):
% jmx_invoke -m org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint <STALE-IP-ADDRESS>

But it still ends with an NPE:

%
Entering browse mode.
====================================================
Available Domains:
1. org.apache.cassandra.internal
2. org.apache.cassandra.metrics
6. org.apache.cassandra.request
7. org.apache.cassandra.net ⬅️
10. org.apache.cassandra.db
SERVER: service:jmx:rmi:///jndi/rmi://192.168.47.227:7199/jmxrmi

Exception caught: java.lang.NullPointerException

Looking at each node's own system.peers table, 229 and 160 share the same host_id:

[qihuang.zheng@192-168-47-227 ~]$ cqlsh -e "select peer, host_id from system.peers;" 192.168.47.227
peer | host_id
----------------+--------------------------------------
192.168.47.228 | f3d26148-d2da-479c-ae9e-ae41aced1be9
192.168.47.229 | 18a87c56-dcba-4614-adba-804aa7761a06
192.168.48.160 | 18a87c56-dcba-4614-adba-804aa7761a06
192.168.48.161 | 54b3a7e0-f778-4087-98c3-ac84e56f77e6
192.168.48.159 | df9a693b-efc1-41bc-9a42-cf868ea75e65

[qihuang.zheng@192-168-47-227 ~]$ cqlsh -e "select peer, host_id from system.peers;" 192.168.48.159

peer | host_id
----------------+--------------------------------------
192.168.47.228 | f3d26148-d2da-479c-ae9e-ae41aced1be9
192.168.47.227 | 02575631-6ccb-4803-81fd-5bf7a978726d
192.168.47.229 | 18a87c56-dcba-4614-adba-804aa7761a06
192.168.48.160 | 18a87c56-dcba-4614-adba-804aa7761a06
192.168.48.161 | 54b3a7e0-f778-4087-98c3-ac84e56f77e6

[qihuang.zheng@192-168-47-227 ~]$ nodetool status -h 192.168.48.159
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 1.15 TB 256 17.4% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 461.6 GB 256 15.4% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 6.09 TB 256 16.5% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
DN 192.168.47.229 937.05 GB 256 17.8% null RAC1
UN 192.168.48.159 1.2 TB 256 16.4% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.34 TB 256 16.5% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

Because the primary key of the peers table is peer, a row can be deleted by peer:

cqlsh -e "delete from system.peers where peer='192.168.47.229';" 192.168.47.227
cqlsh -e "delete from system.peers where peer='192.168.47.229';" 192.168.47.228
cqlsh -e "delete from system.peers where peer='192.168.47.229';" 192.168.48.159
cqlsh -e "delete from system.peers where peer='192.168.47.229';" 192.168.48.161

[qihuang.zheng@192-168-47-227 ~]$ cqlsh -e "select peer, host_id from system.peers;" 192.168.48.159
peer | host_id
----------------+--------------------------------------
192.168.47.228 | f3d26148-d2da-479c-ae9e-ae41aced1be9
192.168.47.227 | 02575631-6ccb-4803-81fd-5bf7a978726d
192.168.48.160 | 18a87c56-dcba-4614-adba-804aa7761a06
192.168.48.161 | 54b3a7e0-f778-4087-98c3-ac84e56f77e6

nodetool status -h 192.168.48.159
Although 229 has been removed from peers, status still shows 229 with a null Host ID:
UN 192.168.48.160 467.14 GB 256 15.4% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
DN 192.168.47.229 937.05 GB 256 17.8% null RAC1

http://stackoverflow.com/questions/20549284/cassandra-how-to-remove-a-dead-node

Another approach: restart 229 so it gets a new Host ID, then stop it and remove it with removenode:

[qihuang.zheng@192-168-47-227 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 1.16 TB 256 17.1% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 473.62 GB 256 16.1% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 6.1 TB 256 16.8% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
UN 192.168.47.229 44.07 KB 256 16.7% 41211503-27cd-47b0-b7c0-e5a7a4074d34 RAC1 ⬅️
UN 192.168.48.159 1.22 TB 256 17.8% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.35 TB 256 15.4% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

[qihuang.zheng@192-168-47-227 ~]$ nodetool removenode status
RemovalStatus: Removing token (-9189073978895940412). Waiting for replication confirmation from [/192.168.48.161,/192.168.48.160,/192.168.47.228,/192.168.48.159,/192.168.47.227].

[qihuang.zheng@192-168-47-227 ~]$ nodetool ring | grep 9189073978895940412
192.168.47.229 RAC1 Down Leaving 44.07 KB 16.73% -9189073978895940412

[qihuang.zheng@192-168-47-227 ~]$ nodetool removenode force
RemovalStatus: Removing token (-9189073978895940412). Waiting for replication confirmation from [/192.168.48.161,/192.168.48.160,/192.168.47.228,/192.168.48.159,/192.168.47.227].
[qihuang.zheng@192-168-47-227 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.161 1.16 TB 256 20.6% 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 474.38 GB 256 20.2% 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.47.228 6.1 TB 256 20.4% f3d26148-d2da-479c-ae9e-ae41aced1be9 RAC1
UN 192.168.48.159 1.22 TB 256 20.1% df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
UN 192.168.47.227 1.35 TB 256 18.8% 02575631-6ccb-4803-81fd-5bf7a978726d RAC1

Rejoining the ring

Node 228 was busy running sstableloader; it had not actually died, but the other nodes decided it was down. After a manual removenode, the other nodes treat 228 as gone:

[admin@fp-cass048160 ~]$ /usr/install/cassandra/bin/nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.227 2.89 TB 256 ? f6136233-08b3-4cad-bd37-57ab6dc4622b RAC1
UN 192.168.48.226 2.81 TB 256 ? 6f4416e1-4493-4909-8427-3738ae72fd82 RAC1
UN 192.168.48.176 6.12 TB 256 ? 08491b7a-8e9f-4b5e-b22b-2e73c455fd3f RAC1
UN 192.168.48.161 4.48 TB 256 ? 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 4.57 TB 256 ? 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.48.175 3.21 TB 256 ? 5bbd4400-2132-42b6-91c5-389592b75423 RAC1
UN 192.168.48.159 4.69 TB 256 ? df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1
DN 192.168.48.228 2.72 TB 256 ? dbf53446-35b6-4d18-80e2-396bc633924c RAC1
[admin@fp-cass048160 ~]$ nohup nodetool removenode dbf53446-35b6-4d18-80e2-396bc633924c &
[admin@fp-cass048160 ~]$ nodetool removenode force
[admin@fp-cass048160 ~]$ /usr/install/cassandra/bin/nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.227 2.89 TB 256 ? f6136233-08b3-4cad-bd37-57ab6dc4622b RAC1
UN 192.168.48.226 2.81 TB 256 ? 6f4416e1-4493-4909-8427-3738ae72fd82 RAC1
UN 192.168.48.176 6.12 TB 256 ? 08491b7a-8e9f-4b5e-b22b-2e73c455fd3f RAC1
UN 192.168.48.161 4.48 TB 256 ? 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.160 4.57 TB 256 ? 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.48.175 3.21 TB 256 ? 5bbd4400-2132-42b6-91c5-389592b75423 RAC1
UN 192.168.48.159 4.69 TB 256 ? df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1

But node 228 itself still thinks all the other nodes are fine. If it tries to rejoin, it says it has already joined the ring, even though it actually no longer belongs to it:

[admin@192-168-48-228 ~]$ /usr/install/cassandra/bin/nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.227 2.88 TB 256 ? f6136233-08b3-4cad-bd37-57ab6dc4622b RAC1
UN 192.168.48.226 2.79 TB 256 ? 6f4416e1-4493-4909-8427-3738ae72fd82 RAC1
UN 192.168.48.176 6.1 TB 256 ? 08491b7a-8e9f-4b5e-b22b-2e73c455fd3f RAC1
UN 192.168.48.161 4.47 TB 256 ? 54b3a7e0-f778-4087-98c3-ac84e56f77e6 RAC1
UN 192.168.48.228 2.72 TB 256 ? dbf53446-35b6-4d18-80e2-396bc633924c RAC1
UN 192.168.48.160 4.55 TB 256 ? 18a87c56-dcba-4614-adba-804aa7761a06 RAC1
UN 192.168.48.175 3.2 TB 256 ? 5bbd4400-2132-42b6-91c5-389592b75423 RAC1
UN 192.168.48.159 4.67 TB 256 ? df9a693b-efc1-41bc-9a42-cf868ea75e65 RAC1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
[admin@192-168-48-228 ~]$ /usr/install/cassandra/bin/nodetool join
nodetool: This node has already joined the ring.
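If 228 genuinely needs to come back after being removed, one common approach (a sketch only; the paths and the wipe step are assumptions to verify, not a tested procedure for this cluster) is to stop Cassandra on 228, move its old data aside so it bootstraps as a brand-new node with a fresh host ID, and start it again:

# On 228: stop the daemon, move the stale state aside, then start fresh.
# WARNING: this discards the node's old host ID, tokens and local data;
# the node will re-stream its ranges during bootstrap. Paths are assumed.
/usr/install/cassandra/bin/nodetool stopdaemon
mv /home/admin/cassandra/data /home/admin/cassandra/data.bak.$(date +%Y%m%d)
mv /home/admin/cassandra/commitlog /home/admin/cassandra/commitlog.bak.$(date +%Y%m%d)
/usr/install/cassandra/bin/cassandra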

Network status with netstats: showing a node's active streams

Normally there are no streams:

[admin@cass047202 ~]$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 749
Mismatch (Blocking): 4
Mismatch (Background): 13
Pool Name Active Pending Completed
Commands n/a 0 581681910
Responses n/a 0 519983825

Streams come in several types: bulk load (i.e. via sstableloader), nodetool repair, and so on.

[admin@cass047202 ~]$ nodetool netstats
Mode: NORMAL
Bulk Load 839a5ef0-44cf-11e6-b6f3-dff3aadcb7ef
/127.0.0.1 (using /192.168.48.168)
Receiving 272 files, 4388357561 bytes total. Already received 2 files, 46586511 bytes total
/home/admin/cassandra/data/md5s/md5_id_0-e842c330440a11e69e5f2bcdca057dae/md5s-md5_id_0-tmp-ka-63-Data.db 16194058/16194058 bytes(100%) received from idx:0/127.0.0.1
/home/admin/cassandra/data/md5s/md5_id_0-e842c330440a11e69e5f2bcdca057dae/md5s-md5_id_0-tmp-ka-64-Data.db 16202304/16202304 bytes(100%) received from idx:0/127.0.0.1
/home/admin/cassandra/data/md5s/md5_id_0-e842c330440a11e69e5f2bcdca057dae/md5s-md5_id_0-tmp-ka-65-Data.db 14190149/16008570 bytes(88%) received from idx:0/127.0.0.1
Read Repair Statistics:
Attempted: 749
Mismatch (Blocking): 4
Mismatch (Background): 13
Pool Name Active Pending Completed
Commands n/a 0 581694648
Responses n/a 0 520000034
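When a stream is in flight the file list can get very long; a simple way to watch only the unfinished files is to filter out the "(100%)" marker seen above (a sketch):

# Refresh every 5 seconds, hiding files that have already finished streaming
watch -n 5 '/usr/install/cassandra/bin/nodetool netstats | grep -v "(100%)"'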

1. A node that is joining the cluster (JOINING):

[qihuang.zheng@cass047224 cassandra]$ /usr/install/cassandra/bin/nodetool netstats
Mode: JOINING
Bootstrap fcca1bf0-4a6e-11e5-8677-6da22c7e072c
/192.168.47.222
Receiving 929 files, 259726626064 bytes total
/home/admin/cassandra/data/forseti_fp/tcp_syn_data/forseti_fp-tcp_syn_data-tmp-jb-198-Data.db 120433605/120433605 bytes(100%) received from /192.168.47.222
/192.168.47.203
Receiving 976 files, 214785965744 bytes total
/home/admin/cassandra/data/forseti/ip_mobilephone/forseti-ip_mobilephone-tmp-jb-16-Data.db 30801577/30801577 bytes(100%) received from /192.168.47.203

2. A node that is leaving the cluster (LEAVING):

[qihuang.zheng@cass047225 snapshots]$ nodetool netstats | head
Mode: LEAVING
Restore replica count dad915d0-7d6f-11e5-8681-9f6f9f8ad5ca
/192.168.47.221
Sending 243 files, 7877174416 bytes total

Besides sending data to other nodes, a decommissioning node also receives data.

[qihuang.zheng@cass047225 ~]$ nodetool netstats | grep "files"
Sending 183 files, 2169787693 bytes total
Sending 223 files, 6492707205 bytes total
....
Receiving 242 files, 10438497913 bytes total
Receiving 279 files, 12407597530 bytes total

3. A normal node (NORMAL) sends files to other nodes and receives files from the node that is leaving:

[qihuang.zheng@spark047245 ~]$ /usr/install/cassandra/bin/nodetool netstats | head
Mode: NORMAL
Restore replica count db2a4310-7d6f-11e5-a4ef-8f719a2aece0
/192.168.47.205
Sending 309 files, 25090493062 bytes total

Unbootstrap 00ecfbb0-7ea6-11e5-9266-f38b27f65aa6
/192.168.47.225
Receiving 556 files, 58062261421 bytes total

Node information: nodetool info

The main things to look at are the KeyCache: capacity 512 M (key_cache_size_in_mb), about 500 M used, 65% hit rate; RowCache is not enabled (row_cache_size_in_mb: 0).
The heap is 16 G with about 6 G used (at the instant the command ran, not a constant figure), off-heap memory usage is about 1 G, and there have been 50 exceptions.

[qihuang.zheng@cass047202 ~]$ nodetool info
Token : (invoke with -T/--tokens to see all 256 tokens)
ID : abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 618.81 GB
Generation No : 1445525962
Uptime (seconds) : 651191
Heap Memory (MB) : 6167.96 / 15974.44
Off Heap Memory (MB) : 1154.13
Data Center : DC1
Rack : RAC1
Exceptions : 50
Key Cache : size 520120120 (bytes), capacity 536870912 (bytes), 112077861 hits, 182128983 requests, 0.656 recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

To look at just the KeyCache line on every node:

./pssh.sh ip_all.txt "/usr/install/cassandra/bin/nodetool info | tail -2 | head -1"
./pssh.sh ip_all.txt "/usr/install/cassandra/bin/nodetool info | sed -n '14p'"
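The sed -n '14p' trick depends on nodetool info always having the same number of lines; matching on the label is less brittle. A sketch (the cassandra.yaml path is an assumption based on the install prefix used above):

# Grep the Key Cache line by label instead of by line number
./pssh.sh ip_all.txt "/usr/install/cassandra/bin/nodetool info | grep 'Key Cache'"
# And check the configured capacities
grep -E 'key_cache_size_in_mb|row_cache_size_in_mb' /usr/install/cassandra/conf/cassandra.yaml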

Gossip information: nodetool gossipinfo

nodetool gossipinfo
/192.168.47.205
STATUS:NORMAL,-1027890057681052346
/192.168.47.206
STATUS:NORMAL,-1004054309250591595
/192.168.47.202
STATUS:NORMAL,-1062467338696068910
/192.168.47.204
STATUS:NORMAL,-1130382709140432588
/192.168.47.224
STATUS:NORMAL,-1137590811836749748
/192.168.47.203
STATUS:NORMAL,-1024266383569750575
/192.168.47.222
STATUS:NORMAL,-105769922317262244
/192.168.47.225
STATUS:NORMAL,-1061781412716791842
/192.168.47.221
STATUS:NORMAL,-104095308743394398
/192.168.47.229
STATUS:LEFT,-9088453765955214068,1448966282585

Table-level information: nodetool cfstats forseti.velocity

1. Viewing a table's statistics; the first four lines are the current read/write latencies and read/write counts.

[qihuang.zheng@cass047202 cassandra]$ nodetool cfstats forseti.velocity
Keyspace: forseti
Read Count: 10470099
Read Latency: 1.3186399419909973 ms.
Write Count: 146970362
Write Latency: 0.06062576270989929 ms.
Pending Tasks: 0
Table: velocity
SSTable count: 2144
SSTables in each level: [1, 10, 96, 723, 1314, 0, 0, 0, 0]
Space used (live), bytes: 509031385679
Space used (total), bytes: 523815500936
Off heap memory used (total), bytes: 558210701
SSTable Compression Ratio: 0.23635049381008288
Number of keys (estimate): 269787648
Memtable cell count: 271431
Memtable data size, bytes: 141953019
Memtable switch count: 1713
Local read count: 10470099
Local read latency: 1.266 ms
Local write count: 146970371
Local write latency: 0.053 ms
Pending tasks: 0
Bloom filter false positives: 534721
Bloom filter false ratio: 0.13542
Bloom filter space used, bytes: 180529808
Bloom filter off heap memory used, bytes: 180512656
Index summary off heap memory used, bytes: 118613037
Compression metadata off heap memory used, bytes: 259085008
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 190420296972
Compacted partition mean bytes: 8656
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

[qihuang.zheng@spark047211 ~]$ nodetool -h 192.168.48.159 cfstats forseti_fp.android_device_session
Keyspace: forseti_fp
Read Count: 3436820
Read Latency: 0.7271564521854504 ms.
Write Count: 1242325989
Write Latency: 0.01608114556074058 ms.
Pending Flushes: 0
Table: android_device_session
SSTable count: 14
Space used (live): 3315056965329
Space used (total): 3315312862453
Space used by snapshots (total): 0
Off heap memory used (total): 1813094460
SSTable Compression Ratio: 0.37206192803944754
Number of keys (estimate): 954623103 (~0.95 billion)
Memtable cell count: 92654
Memtable data size: 104729033
Memtable off heap memory used: 0
Memtable switch count: 13386
Local read count: 3436820
Local read latency: 0.728 ms
Local write count: 1242326281
Local write latency: 0.017 ms
Pending flushes: 0
Bloom filter false positives: 15
Bloom filter false ratio: 0.00000
Bloom filter space used: 607412928
Bloom filter off heap memory used: 607412816
Index summary off heap memory used: 138144620
Compression metadata off heap memory used: 1067537024
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 4866323
Compacted partition mean bytes: 10051
Average live cells per slice (last five minutes): 0.10287086386072478
Maximum live cells per slice (last five minutes): 7.0
Average tombstones per slice (last five minutes): 0.14188850295078587
Maximum tombstones per slice (last five minutes): 2164.0

[qihuang.zheng@spark047211 ~]$ nodetool -h 192.168.48.162 cfstats forseti.velocity_app
Keyspace: forseti
Read Count: 120443046
Read Latency: 1.1013535886995087 ms.
Write Count: 2058015125
Write Latency: 0.016679299918653415 ms.
Pending Tasks: 0
Table: velocity_app
SSTable count: 24
Space used (live), bytes: 2670044515793
Space used (total), bytes: 2670239936283
Off heap memory used (total), bytes: 2958340937
SSTable Compression Ratio: 0.28255581535232427
Number of keys (estimate): 1976325248 (~2 billion)
Memtable cell count: 613404
Memtable data size, bytes: 233170920
Memtable switch count: 4214
Local read count: 120443053
Local read latency: 0.931 ms
Local write count: 2058015151
Local write latency: 0.015 ms
Pending tasks: 0
Bloom filter false positives: 45097
Bloom filter false ratio: 0.00732
Bloom filter space used, bytes: 1230425688
Bloom filter off heap memory used, bytes: 1230425496
Index summary off heap memory used, bytes: 578011001
Compression metadata off heap memory used, bytes: 1149904440
Compacted partition minimum bytes: 150
Compacted partition maximum bytes: 158,683,580,810 (~158 GB)
Compacted partition mean bytes: 5225

[qihuang.zheng@spark047211 ~]$ nodetool -h 192.168.48.162 cfstats forseti.velocity_partner
Keyspace: forseti
Read Count: 3664048
Read Latency: 1.265197941730021 ms.
Write Count: 1963754310
Write Latency: 0.017543929064120042 ms.
Pending Tasks: 0
Table: velocity_partner
SSTable count: 5798
SSTables in each level: [1, 11/10, 90, 881, 4813, 0, 0, 0, 0]
Space used (live), bytes: 1237251258441
Space used (total), bytes: 1240223108780
Off heap memory used (total), bytes: 1220821758
SSTable Compression Ratio: 0.2518164031088747
Number of keys (estimate): 821395584 (~0.8 billion)
Memtable cell count: 775752
Memtable data size, bytes: 265533044
Memtable switch count: 4217
Local read count: 3664048
Local read latency: 1.200 ms
Local write count: 1963754352
Local write latency: 0.017 ms
Pending tasks: 0
Bloom filter false positives: 599
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 503778320
Bloom filter off heap memory used, bytes: 503731936
Index summary off heap memory used, bytes: 144796454
Compression metadata off heap memory used, bytes: 572293368
Compacted partition minimum bytes: 259
Compacted partition maximum bytes: 91,830,775,932
Compacted partition mean bytes: 6265
Average live cells per slice (last five minutes): 1.0
Average tombstones per slice (last five minutes): 0.0

[qihuang.zheng@spark047211 ~]$ nodetool -h 192.168.48.162 cfstats forseti.velocity_global
Keyspace: forseti
Read Count: 27019937
Read Latency: 0.8387888214173111 ms.
Write Count: 2057412727
Write Latency: 0.01946266525841317 ms.
Pending Tasks: 0
Table: velocity_global
SSTable count: 5468
SSTables in each level: [7/4, 12/10, 77, 703, 4667, 0, 0, 0, 0]
Space used (live), bytes: 1476150066588
Space used (total), bytes: 1477238480867
Off heap memory used (total), bytes: 1092884433
SSTable Compression Ratio: 0.24933201816062378
Number of keys (estimate): 558798080 (~0.56 billion)
Memtable cell count: 120625
Memtable data size, bytes: 38130612
Memtable switch count: 4250
Local read count: 27019937
Local read latency: 1.128 ms
Local write count: 2057412797
Local write latency: 0.016 ms
Pending tasks: 0
Bloom filter false positives: 1700
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 319642480
Bloom filter off heap memory used, bytes: 319598736
Index summary off heap memory used, bytes: 100862649
Compression metadata off heap memory used, bytes: 672423048
Compacted partition minimum bytes: 311
Compacted partition maximum bytes: 568,591,960,032 (~568 GB)
Compacted partition mean bytes: 10777
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

velocity has about 270 million keys on this node, and the largest partition is 190 GB! That means some very wide rows exist. The mean partition size is 8,656 bytes (the minimum is only 104 bytes).
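To see how partition sizes are distributed on a node (and confirm the wide rows), cfhistograms prints per-table percentiles; a quick sketch:

# Partition-size and cell-count percentiles for forseti.velocity (Cassandra 2.x command name)
nodetool cfhistograms forseti velocity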

http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html

sstable_size_in_mb defaults to 160 MB and is the target size for SSTables using the leveled compaction strategy.
Although SSTable sizes should be less than or equal to sstable_size_in_mb, it is possible to get a larger SSTable during compaction.
This occurs when the data for a given partition key is exceptionally large: the data for one partition key is never split into two SSTables.
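For reference, sstable_size_in_mb is a compaction sub-property set per table; a hedged example via cqlsh (the value 160 is just the default repeated, and ALTER replaces the whole compaction map):

cqlsh 192.168.47.202 -e "ALTER TABLE forseti.velocity WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};"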

2. Thread pool status: tpstats shows roughly the same information as the StatsLogger lines printed in system.log.

[qihuang.zheng@cass047202 cassandra]$ nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 2 0 10960442 0 0
RequestResponseStage 0 0 447942623 0 0
MutationStage 0 0 239315421 0 0
ReadRepairStage 0 0 1124868 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 1477282 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 46 0 0
MemoryMeter 0 0 10694 0 0
FlushWriter 0 0 11360 0 511
ValidationExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MemtablePostFlusher 0 0 16487 0 0
MiscStage 0 0 0 0 0
PendingRangeCalculator 0 0 39 0 0
CompactionExecutor 3 3 47113 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 1 867 0 0

Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 2
COUNTER_MUTATION 0

For example, 3 CompactionExecutor tasks are active here; compactionstats shows which tables are currently being compacted:

[qihuang.zheng@cass047202 cassandra]$ nodetool compactionstats
pending tasks: 4
compaction type keyspace table completed total unit progress
Compaction forseti velocity 9600330302 12886138624 bytes 74.50%
Compaction forseti device_account 64712962 1923986398 bytes 3.36%
Tombstone Compaction forseti velocity 95382333 170991248015 bytes 0.06%
Active compaction remaining time : 0h00m38s

compactionhistory

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCompactionHistory.html

最后一列表示rows_merged:{tables:rows}. For example: {1:3, 3:1} means 3 rows were taken from one SSTable (1:3)
and 1 row taken from 3 SSTables (3:1) to make the one SSTable in that compaction operation.

nodetool compactionhistory | grep velocity | sort -r -k 4 > compactionhistory.log
nodetool compactionhistory | grep velocity | sort -r -k 4 | awk '{for(i=4;i<=NF;i++) printf"%s\t",$i} {print ""}' | head
nodetool compactionhistory | grep velocity | sort -r -k 4 | awk '{if(NF>9) print $0}'

3cb01450-e5d7-11e5-8fe9-2bcdca057dae forseti velocity 1457514830357 985256393 869306399 {1:325289, 2:11651}
ff100470-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514726967 161269131 159732166 {1:207378, 2:9918}
bafef020-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514612770 1171434739 1167942768 {1:811453, 2:28598}
b7c08e00-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514607328 144946713 143529681 {1:187320, 2:10070}
6f66d380-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514485944 128493647 127148807 {1:167405, 2:9312}
2be5ca30-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514372691 112061194 111124311 {1:151315, 2:6405}
2a335f40-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514369844 2078394794 2073816751 {1:1298699, 2:39956}
04070880-e5d6-11e5-8fe9-2bcdca057dae forseti velocity 1457514305799 103532684 102206816 {1:137231, 2:9020}
bb648530-e5d5-11e5-8fe9-2bcdca057dae forseti velocity 1457514183938 86832984 85571719 {1:115214, 2:8749}
750254f0-e5d5-11e5-8fe9-2bcdca057dae forseti velocity 1457514065855 69678994 68516377 {1:93817, 2:7968}
3b9a2260-e5d5-11e5-8fe9-2bcdca057dae forseti velocity 1457513969542 53715409 51918781 {1:69930, 2:6969, 3:2522}

Explanation of the compactionhistory fields:

id                                     ks           cf                  compactedAt       bytes_in        bytes_out       rows_compacted
3cb01450-e5d7-11e5-8fe9-2bcdca057dae forseti velocity 1457514830357 985,256,393 869,306,399 {1:325289, 2:11651}

985 MB: 325,289 rows came from a single SSTable, and 11,651 rows were merged from two SSTables.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx model_result credit_id_labels xxxxxxxxxxxxx 7,166,010 7,166,549 {1:8954}

7 MB: 8,954 rows came from one SSTable, so the average row size is 7,166,010 / 8,954 ≈ 0.8 KB.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx model_result credit_id_labels_v2 xxxxxxxxxxxxx 65,738,206 65,799,090 {1:113483}

65 MB: 113,483 rows came from one SSTable, so each row is about 65,738,206 / 113,483 ≈ 579 bytes.

The difference between these two tables: credit_id_labels has many separate columns, whereas credit_id_labels_v2 stores all fields as one JSON value.
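The per-row size calculations above can be automated; a rough sketch against the 2.1 output format shown earlier (the field positions are an assumption):

# For each velocity compaction, sum the counts in the rows_merged map and
# divide bytes_out by that sum to get an average bytes-per-row figure.
nodetool compactionhistory | grep velocity | awk '{
    rows = 0
    for (i = 7; i <= NF; i++) {       # the {n:count, ...} map may span several fields
        s = $i; gsub(/[{},]/, "", s)  # strip braces and commas
        split(s, kv, ":"); rows += kv[2]
    }
    if (rows > 0)
        printf "%s  bytes_out=%s  rows=%d  avg=%.0f bytes/row\n", $1, $6, rows, $6/rows
}'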

system.log records, for every compaction, how many SSTables were merged:

INFO [CompactionExecutor:48884] 2016-03-11 08:08:34,162 CompactionTask.java (line 120) Compacting [
SSTableReader(path='/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947338-Data.db'),
SSTableReader(path='/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947335-Data.db'),
SSTableReader(path='/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947334-Data.db'),
SSTableReader(path='/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947336-Data.db'),
SSTableReader(path='/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947337-Data.db')]
INFO [CompactionExecutor:48884] 2016-03-11 08:08:34,181 CompactionTask.java (line 299) Compacted 5 sstables to [
/home/admin/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-947339,].
1,803 bytes to 1,044 (~57% of original) in 18ms = 0.055313MB/s.
12 total partitions merged to 8. Partition merge counts were {1:8, 2:2, }

1:8 means 8 rows were taken from a single SSTable each.
2:2 means 2 rows were merged from two SSTables each.

Question: the partition count dropped from 12 to 8 during the merge. Why would partitions decrease?
totalSourceRows=12, totalKeysWritten=8.
Rows and keys are different concepts: the same key can appear as a row in more than one SSTable.
Among the 12 source rows, 4 of them shared a partition key with rows from other SSTables, so after merging only 8 distinct keys were written.

Abnormal SSTable count on a node

One node's read latency is abnormally high, around 700 ms:

(figure: fp_readrt_high)

[qihuang.zheng@fp-cass048160 ~]$ nodetool cfstats forseti_fp.android_device_session
Keyspace: forseti_fp
Read Count: 608718
Read Latency: 414.3243900525366 ms.
Write Count: 594753618
Write Latency: 0.025195973536725928 ms.
Pending Tasks: 0
Table: android_device_session
SSTable count: 32459
Space used (live), bytes: 2788804636549
Space used (total), bytes: 2795041174866
Off heap memory used (total), bytes: 2909029224
SSTable Compression Ratio: 0.3635126499232498
Number of keys (estimate): 1299672448
Memtable cell count: 30494
Memtable data size, bytes: 109313490
Memtable switch count: 19297
Local read count: 608718
Local read latency: 866.253 ms
Local write count: 594753660
Local write latency: 0.024 ms

The other nodes in the same cluster show very low read latency, and their SSTable count stays below 20.

[qihuang.zheng@fp-cass048159 ~]$ nodetool cfstats forseti_fp.android_device_session
Keyspace: forseti_fp
Read Count: 266951
Read Latency: 0.9562793471461055 ms.
Write Count: 799860155
Write Latency: 0.024167126153446163 ms.
Pending Tasks: 0
Table: android_device_session
SSTable count: 16
Space used (live), bytes: 4644725077858
Space used (total), bytes: 4659648681740
Off heap memory used (total), bytes: 2942472988
SSTable Compression Ratio: 0.3659509172064071
Number of keys (estimate): 1344413568
Memtable cell count: 30468
Memtable data size, bytes: 107505453
Memtable switch count: 25428
Local read count: 266951
Local read latency: 0.774 ms
Local write count: 799860178
Local write latency: 0.022 ms
[qihuang.zheng@cass048169 ~]$ nodetool status |grep RAC|awk '{print $2}' | while read ip; do echo $ip; nodetool -h $ip cfstats forseti.velocity_app | grep "SSTable count"; done
192.168.48.163
SSTable count: 42
192.168.48.162
SSTable count: 34
192.168.48.174
SSTable count: 26
192.168.48.173
SSTable count: 26
192.168.48.171
SSTable count: 30
192.168.48.169
SSTable count: 2620

Disk usage and the load reported by nodetool status do not match, and the gap is anything but small:

[qihuang.zheng@cass048169 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 271G 8.3G 249G 4% /
tmpfs 16G 0 16G 0% /dev/shm
/dev/sda1 190M 30M 151M 17% /boot
/dev/sdb1 11T 8.3T 1.6T 85% /home

[qihuang.zheng@cass048169 ~]$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
-- Address Load Tokens Owns Host ID Rack
UN 192.168.48.163 2.34 TB 256 15.9% 003a7621-a62b-4c29-83ac-6724d2d749ab RAC1
UN 192.168.48.162 2.11 TB 256 15.3% 9d507979-f309-4a8b-98e2-92385701dcfe RAC1
UN 192.168.48.174 2.35 TB 256 17.9% bc66ae3b-d694-4286-9c89-09ea46ea740d RAC1
UN 192.168.48.173 2.39 TB 256 17.4% 6de0bf8b-620b-4b47-b47b-1d8ecb82f20d RAC1
UN 192.168.48.171 2.24 TB 256 15.6% d9b4f2e3-54a4-47d6-95e1-55a6f9734e79 RAC1
UN 192.168.48.169 2.63 TB 256 17.9% 2a80470e-1dc4-4734-a268-946886fe25b7 RAC1

[qihuang.zheng@cass048169 ~]$ du -sh /home/admin/cassandra/data/forseti/*
3.4T /home/admin/cassandra/data/forseti/velocity_app
2.5T /home/admin/cassandra/data/forseti/velocity_global
2.5T /home/admin/cassandra/data/forseti/velocity_partner
[qihuang.zheng@cass048169 ~]$ nodetool compactionstats
pending tasks: 4117
compaction type keyspace table completed total unit progress
Compaction forseti velocity_partner 454792709 608062664 bytes 74.79%
Compaction forseti velocity_app 5711586451164 6322585431145 bytes 90.34%
Compaction forseti velocity_global 11137129414 79062862183 bytes 14.09%
Compaction forseti velocity_partner 3724946719 3887650106 bytes 95.81%
Compaction forseti velocity_app 145281458 167205065 bytes 86.89%
Compaction forseti velocity_partner 112338305659 774832769029 bytes 14.50%
Compaction forseti velocity_global 24832092819 41353336206 bytes 60.05%
Compaction forseti velocity_global 4981924282 6189383778 bytes 80.49%
Active compaction remaining time : 2h48m48s

[qihuang.zheng@fp-cass048162 velocity_app]$ nodetool compactionstats
pending tasks: 3082
compaction type keyspace table completed total unit progress
Compaction forseti velocity_global 11519537972 58891073447 bytes 19.56%
Compaction forseti velocity_partner 519435163 900884457 bytes 57.66%
Compaction forseti velocity_partner 218779482632 268314967832 bytes 81.54%
Compaction forseti velocity_partner 53136709412 77669440754 bytes 68.41%
Compaction forseti velocity_partner 96402475 225578061 bytes 42.74%
Compaction forseti velocity_partner 13945308 205008051 bytes 6.80%
Compaction forseti velocity_app 92188611730 284042451190 bytes 32.46%
Compaction forseti velocity_partner 205665564 700947805 bytes 29.34%
Active compaction remaining time : 0h39m03s

[qihuang.zheng@cass047202 ~]$ nodetool compactionstats
pending tasks: 1
compaction type keyspace table completed total unit progress
Compaction forseti velocity 4094787925 6749331650 bytes 60.67%
Active compaction remaining time : 0h00m19s
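To compare the compaction backlog across the whole cluster at a glance, the pending-tasks line can be collected from every node (a sketch reusing the pssh.sh wrapper from earlier):

./pssh.sh ip_all.txt "/usr/install/cassandra/bin/nodetool compactionstats | grep 'pending tasks'"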

tpstats

Under normal conditions there should be no Dropped messages at all, yet the driver was reporting: All host(s) tried for query failed, ... connection has been closed.
Could it be that the number of connections is too high and requests are simply being refused?

[admin@mysql047012 ~]$ /usr/install/cassandra/bin/nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
CounterMutationStage 0 0 0 0 0
ReadStage 0 0 4302492 0 0
RequestResponseStage 0 0 0 0 0
MutationStage 15 0 44176401 0 0
ReadRepairStage 0 0 0 0 0
GossipStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MigrationStage 0 0 0 0 0
Sampler 0 0 0 0 0
ValidationExecutor 0 0 0 0 0
CommitLogArchiver 0 0 0 0 0
MiscStage 0 0 0 0 0
MemtableFlushWriter 0 0 920 0 0
MemtableReclaimMemory 0 0 925 0 0
PendingRangeCalculator 0 0 1 0 0
MemtablePostFlush 0 0 5245 0 0
CompactionExecutor 1 1 328827 0 0
InternalResponseStage 0 0 0 0 0
HintedHandoff 0 0 0 0 0
Native-Transport-Requests 19 0 13670768 0 6885

Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 0
MUTATION 1808
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
INFO  [SharedPool-Worker-40] 2016-06-23 17:38:19,848 Message.java:532 - Unexpected exception during request; channel = [id: 0x04539b0a, /192.168.47.34:64733 :> /192.168.47.12:9042]
java.io.IOException: Error while read(...): Connection reset by peer
at io.netty.channel.epoll.Native.readAddress(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
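To tell whether the MUTATION drops are historical or still happening, the dropped-message counters can be sampled periodically (a simple sketch; a rising MUTATION count between samples means drops are ongoing):

while true; do
    date
    /usr/install/cassandra/bin/nodetool tpstats | sed -n '/Message type/,$p'
    sleep 10
done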

sstable writer

nodetool refresh: loads newly placed SSTables onto the system without a restart.

http://www.planetcassandra.org/blog/bulk-loading-options-for-cassandra/

# Small-batch test
cp /home/admin/md5id_20160601/data/md5_id/data-md5_id-ka-2165*
cd ~/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/
rename data md5 *
nodetool -h 192.168.47.202 refresh md5 md5_id

# Rename all the files
cd /home/admin/md5id_20160601/data/md5_id/
rename data md5 *

# List the distinct SSTable generation prefixes
for f in $( ls | cut -d'-' -f-4 | uniq | head -5 ); do
echo $f
done

# Several generations per batch; but how to loop until the source directory is empty? Use a while loop that runs while files remain
ls | cut -d'-' -f-4 | uniq | head -5 | while read f; do # group by prefix: each Data file has about 8 companion components
mv $f-* ~/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/ # move the group into Cassandra's table directory
done
nodetool -h 192.168.47.202 refresh md5 md5_id # refresh so the new sstables are picked up

# Simulate moving the files in directory "test" to directory "test1" in several batches
rm -rf test test1 && mkdir test && cd test && touch 10 11 12 13 14 15 20 21 22 23 30 31 32 && cd ~/ && mkdir test1

cd test
flag="BEGIN"
while [ ! -d test ]; do
ls | head -2 | while read f; do
mv $f ~/test1
done
# Make sure the "DONE" branch runs only once: when everything has been moved to test1, test is empty (ls -A prints nothing)
# Without the flag the while loop keeps spinning, because its condition only checks for a (non-existent) subdirectory
if [[ "`ls -A ~/test`" = "" && $flag != "OVER" ]]; then
flag="OVER"
echo "DONE"
fi
done

# Better: put the termination check into the while condition itself
cd test
flag="BEGIN"
while [[ ! -d test && $flag != "OVER" ]]; do
ls | head -2 | while read f; do
mv $f ~/test1
done
if [ "`ls -A ~/test`" = "" ]; then
flag="OVER"
fi
done
echo "OVER..."

# Or fold the whole condition into the while
cd test
while [[ ! -d test && "`ls -A ~/test`" != "" ]]; do
ls | head -2 | while read f; do
mv $f ~/test1
done
done
echo "OVER..."

## Final script
startT=$(date +%s)
cd /home/admin/md5id_20160601/data/md5_id
# Keep looping as long as the source directory still contains files
while [[ "`ls -A /home/admin/md5id_20160601/data/md5_id`" != "" ]]; do
echo "......"
start=$(date +%s)
ls | cut -d'-' -f-4 | uniq | head -5 | while read f; do
cfile="$f-*"
echo $cfile
mv $cfile /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/
done
nodetool -h 192.168.47.202 refresh md5 md5_id
end=$(date +%s)
time=$(( $end - $start ))
echo "batch took: ${time}s"
done
echo "OVER..."
endT=$(date +%s)
timeT=$(( $endT - $startT ))
echo "total took: ${timeT}s"

Or a single one-liner instead; if it runs too long, just Ctrl+Z, then bg, and check it with jobs:

cd /home/admin/md5id_20160601/data/md5_id
ls | cut -d'-' -f-4 | uniq | while read f; do echo "batch $f"; mv $f-* /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/; nodetool -h 192.168.47.202 refresh md5 md5_id; done

Too many files

Running large batches concurrently crashed the JVM and produced hs_err_pid9998.log:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 350224384 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2726), pid=9998, tid=140593796617984
#
# JRE version: (7.0_51-b13) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

--------------- T H R E A D ---------------

Current thread (0x00007fde84009800): JavaThread "Unknown thread" [_thread_in_vm, id=10000, stack(0x00007fde8b3e1000,0x00007fde8b4e2000)]

Stack: [0x00007fde8b3e1000,0x00007fde8b4e2000], sp=0x00007fde8b4e01a0, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x992f4a] VMError::report_and_die()+0x2ea
V [libjvm.so+0x4931ab] report_vm_out_of_memory(char const*, int, unsigned long, char const*)+0x9b
V [libjvm.so+0x81338e] os::Linux::commit_memory_impl(char*, unsigned long, bool)+0xfe
V [libjvm.so+0x81383f] os::Linux::commit_memory_impl(char*, unsigned long, unsigned long, bool)+0x4f
V [libjvm.so+0x813a2c] os::pd_commit_memory(char*, unsigned long, unsigned long, bool)+0xc
V [libjvm.so+0x80daea] os::commit_memory(char*, unsigned long, unsigned long, bool)+0x2a
V [libjvm.so+0x87fcd3] PSVirtualSpace::expand_by(unsigned long)+0x53
V [libjvm.so+0x86eaf3] PSOldGen::initialize(ReservedSpace, unsigned long, char const*, int)+0x103
V [libjvm.so+0x299043] AdjoiningGenerations::AdjoiningGenerations(ReservedSpace, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)+0x3e3
V [libjvm.so+0x8341e0] ParallelScavengeHeap::initialize()+0x550
V [libjvm.so+0x9664ca] Universe::initialize_heap()+0xca
V [libjvm.so+0x967699] universe_init()+0x79
V [libjvm.so+0x5a9625] init_globals()+0x65
V [libjvm.so+0x94ef8d] Threads::create_vm(JavaVMInitArgs*, bool*)+0x1ed
V [libjvm.so+0x6307e4] JNI_CreateJavaVM+0x74
C [libjli.so+0x2f8e] JavaMain+0x9e

The process died outright because it ran out of memory, so there is no error message in the application log at all!

After the node died, restarting it failed with:

ERROR 01:47:33 Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:668) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) [apache-cassandra-2.1.13.jar:2.1.13]
Caused by: java.lang.NullPointerException: null
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:660) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 3 common frames omitted
FSReadError in Failed to remove unfinished compaction leftovers (file: /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-1051-Statistics.db). See log for details.
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:668)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:308)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:660)
... 3 more
Exception encountered during startup: java.lang.NullPointerException

Polling page always armed
Deoptimize 4
GenCollectForAllocation 1
CMS_Initial_Mark 1
CMS_Final_Remark 1
EnableBiasedLocking 1
RevokeBias 77
BulkRevokeBias 5
Exit 1
15 VM operations coalesced during safepoint
Maximum sync time 6133 ms
Maximum vm operation time (except for Exit VM operation) 521 ms

Normally, one Data file is accompanied by several other component files:

Normal files:
[admin@cass047202 md5_id-f88d3930345811e694a62bcdca057dae]$ ll md5-md5_id-ka-996-*
-rw-rw-r--. 1 admin admin 66859 6月 2 23:06 md5-md5_id-ka-996-CompressionInfo.db
-rw-rw-r--. 1 admin admin 298635289 6月 2 23:06 md5-md5_id-ka-996-Data.db
-rw-rw-r--. 1 admin admin 10 6月 2 23:06 md5-md5_id-ka-996-Digest.sha1
-rw-rw-r--. 1 admin admin 16 6月 2 23:06 md5-md5_id-ka-996-Filter.db
-rw-rw-r--. 1 admin admin 242118240 6月 2 23:06 md5-md5_id-ka-996-Index.db
-rw-rw-r--. 1 admin admin 9895 6月 17 17:53 md5-md5_id-ka-996-Statistics.db
-rw-r--r--. 1 admin admin 113248 6月 16 16:23 md5-md5_id-ka-996-Summary.db
-rw-rw-r--. 1 admin admin 91 6月 2 23:06 md5-md5_id-ka-996-TOC.txt

But the SSTable named in the error has only three component files; Statistics.db and the rest are missing...
[admin@cass047202 md5_id-f88d3930345811e694a62bcdca057dae]$ ll md5-md5_id-ka-1051*
-rw-rw-r--. 1 admin admin 282M 6月 1 17:56 /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-1051-Data.db
-rw-rw-r--. 1 admin admin 231M 6月 1 17:56 /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-1051-Index.db
-rw-rw-r--. 1 admin admin 1.8M 6月 17 17:55 /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-1051-Summary.db

They are not among the tmp temporary files either:
[admin@cass047202 md5_id-f88d3930345811e694a62bcdca057dae]$ ll | grep tmp
-rw-rw-r--. 2 admin admin 9967157248 6月 17 18:54 md5-md5_id-tmp-ka-354-Data.db
-rw-rw-r--. 2 admin admin 7710769152 6月 17 18:54 md5-md5_id-tmp-ka-354-Index.db
-rw-rw-r--. 2 admin admin 1224817304 6月 17 10:01 md5-md5_id-tmp-ka-4329-Data.db
-rw-rw-r--. 2 admin admin 950730752 6月 17 10:01 md5-md5_id-tmp-ka-4329-Index.db
-rw-rw-r--. 1 admin admin 0 6月 17 18:54 md5-md5_id-tmp-ka-48053-Data.db
-rw-rw-r--. 1 admin admin 0 6月 17 18:54 md5-md5_id-tmp-ka-48053-Index.db
-rw-rw-r--. 2 admin admin 9049241296 6月 17 18:54 md5-md5_id-tmp-ka-526-Data.db
-rw-rw-r--. 2 admin admin 6999441408 6月 17 18:54 md5-md5_id-tmp-ka-526-Index.db
-rw-rw-r--. 2 admin admin 9967157248 6月 17 18:54 md5-md5_id-tmplink-ka-354-Data.db
-rw-rw-r--. 2 admin admin 7710769152 6月 17 18:54 md5-md5_id-tmplink-ka-354-Index.db
-rw-rw-r--. 2 admin admin 1224817304 6月 17 10:01 md5-md5_id-tmplink-ka-4329-Data.db
-rw-rw-r--. 2 admin admin 950730752 6月 17 10:01 md5-md5_id-tmplink-ka-4329-Index.db
-rw-rw-r--. 2 admin admin 9049241296 6月 17 18:54 md5-md5_id-tmplink-ka-526-Data.db
-rw-rw-r--. 2 admin admin 6999441408 6月 17 18:54 md5-md5_id-tmplink-ka-526-Index.db

After deleting these files the node restarts without problems, but queries then fail with:

WARN  [SharedPool-Worker-52] 2016-06-19 10:23:30,376 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-52,5,main]: {}
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-139-Data.db (打开的文件过多)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2244) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.13.jar:2.1.13]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-139-Data.db (打开的文件过多)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:85) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:2075) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:84) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableScanner.getScanner(SSTableScanner.java:63) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1859) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:67) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:2074) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2191) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.RangeSliceCommand.executeLocally(RangeSliceCommand.java:132) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1567) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2241) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 4 common frames omitted
Caused by: java.io.FileNotFoundException: /home/admin/cassandra/data/md5/md5_id-f88d3930345811e694a62bcdca057dae/md5-md5_id-ka-139-Data.db (打开的文件过多)
at java.io.RandomAccessFile.open(Native Method) ~[na:1.7.0_51]
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241) ~[na:1.7.0_51]
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:65) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:70) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:48) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 15 common frames omitted

WARN [epollEventLoopGroup-2-1] 2016-06-19 10:23:54,839 Slf4JLogger.java:151 - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Error during accept(...): 打开的文件过多
at io.netty.channel.epoll.Native.accept(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollServerSocketChannel$EpollServerSocketUnsafe.epollInReady(EpollServerSocketChannel.java:102) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [netty-all-4.0.23.Final.jar:4.0.23.Final]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
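Before (or instead of) deleting files, it is worth checking how close the process is to its file-descriptor limit; a sketch assuming a standard Linux /proc layout:

# Count the Cassandra process's open file descriptors and show its nofile limit
pid=$(pgrep -f CassandraDaemon | head -1)
ls /proc/$pid/fd | wc -l
grep "open files" /proc/$pid/limits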

Too many files: when generating the SSTables, make each file larger, e.g. around 1 GB.

[admin@cass047202 md5_id-f88d3930345811e694a62bcdca057dae]$ ll -rth | grep Data | wc -l
2127
[admin@cass047202 md5_id-f88d3930345811e694a62bcdca057dae]$ ll -rth | grep Data | head
-rw-rw-r--. 1 admin admin 281M 6月 1 11:20 md5-md5_id-ka-4225-Data.db
-rw-rw-r--. 1 admin admin 282M 6月 1 11:21 md5-md5_id-ka-46880-Data.db
-rw-rw-r--. 1 admin admin 285M 6月 1 11:22 md5-md5_id-ka-632-Data.db
-rw-rw-r--. 1 admin admin 284M 6月 1 11:23 md5-md5_id-ka-47652-Data.db
-rw-rw-r--. 1 admin admin 281M 6月 1 11:25 md5-md5_id-ka-3954-Data.db
-rw-rw-r--. 1 admin admin 287M 6月 1 11:26 md5-md5_id-ka-47131-Data.db
-rw-rw-r--. 1 admin admin 281M 6月 1 11:27 md5-md5_id-ka-1002-Data.db
-rw-rw-r--. 1 admin admin 281M 6月 1 11:28 md5-md5_id-ka-491-Data.db
-rw-rw-r--. 1 admin admin 286M 6月 1 11:29 md5-md5_id-ka-210-Data.db
-rw-rw-r--. 1 admin admin 281M 6月 1 11:30 md5-md5_id-ka-43627-Data.db

After importing data, use refresh

[admin@cass047202 cassandra]$ cqlsh 192.168.47.202
Connected to forseti_cluster at 192.168.47.202:9042.
[cqlsh 5.0.1 | Cassandra 2.1.13 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh> use data;
cqlsh:data> select * from md5_id limit 1;
Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
Schema metadata was not refreshed. See log for details.
cqlsh:data> quit
[admin@cass047202 cassandra]$ cqlsh 192.168.47.202
Connection error: ('Unable to connect to any servers', {'192.168.47.202': OperationTimedOut('errors=None, last_host=None',)})
[admin@cass047202 cassandra]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.47.206 802.88 GB 256 ? 75f42842-e3ac-4bbe-947d-6b7537a521da RAC1
UN 192.168.47.222 670.08 GB 256 ? 1cc2c236-8def-4f2b-8149-28d591fc6b05 RAC1
UN 192.168.47.204 627.63 GB 256 ? 91ad3d42-4207-46fe-8188-34c3f0b2dbd2 RAC1
UN 192.168.47.221 677.56 GB 256 ? 87e100ed-85c4-44cb-9d9f-2d602d016038 RAC1
UN 192.168.47.205 724.58 GB 256 ? ac6313c8-e0b5-463b-8f90-55dc0f59e476 RAC1
UN 192.168.47.202 1.72 TB 256 ? abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7 RAC1
UN 192.168.47.203 1.01 TB 256 ? 19b0b9cc-cad2-4b61-8da6-95423fe94af8 RAC1
UN 192.168.47.224 739.47 GB 256 ? 27e84abe-fb06-47ff-8861-130767ee006b RAC1
UN 192.168.47.225 677.46 GB 256 ? 216c67cf-de7d-4190-9d0d-441fc16a7f71 RAC1


select key,bootstrapped,broadcast_address,cluster_name,cql_version,data_center,gossip_generation,host_id,listen_address,native_protocol_version,partitioner,
rack,release_version,rpc_address,schema_version,thrift_version from system.local;

key | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | rack | release_version | rpc_address | schema_version | thrift_version
-------+--------------+-------------------+-----------------+-------------+-------------+-------------------+--------------------------------------+----------------+-------------------------+-------+-----------------+----------------+--------------------------------------+----------------
local | COMPLETED | 192.168.47.203 | forseti_cluster | 3.2.1 | DC1 | 1464055043 | 19b0b9cc-cad2-4b61-8da6-95423fe94af8 | 192.168.47.203 | 3 | RAC1 | 2.1.13 | 192.168.47.203 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c | 19.39.0

select peer,data_center,host_id,preferred_ip,rack,release_version,rpc_address,schema_version from system.peers;
peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version
----------------+-------------+--------------------------------------+--------------+------+-----------------+----------------+--------------------------------------
192.168.47.205 | DC1 | ac6313c8-e0b5-463b-8f90-55dc0f59e476 | null | RAC1 | 2.1.13 | 192.168.47.205 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.221 | DC1 | 87e100ed-85c4-44cb-9d9f-2d602d016038 | null | RAC1 | 2.1.13 | 192.168.47.221 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.202 | DC1 | abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7 | null | RAC1 | 2.1.13 | 192.168.47.202 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.225 | DC1 | 216c67cf-de7d-4190-9d0d-441fc16a7f71 | null | RAC1 | 2.1.13 | 192.168.47.225 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.224 | DC1 | 27e84abe-fb06-47ff-8861-130767ee006b | null | RAC1 | 2.1.13 | 192.168.47.224 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.204 | DC1 | 91ad3d42-4207-46fe-8188-34c3f0b2dbd2 | null | RAC1 | 2.1.13 | 192.168.47.204 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.206 | DC1 | 75f42842-e3ac-4bbe-947d-6b7537a521da | null | RAC1 | 2.1.13 | 192.168.47.206 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
192.168.47.222 | DC1 | 1cc2c236-8def-4f2b-8149-28d591fc6b05 | null | RAC1 | 2.1.13 | 192.168.47.222 | 73e0e7c4-03b1-3db6-8967-d7c0144faa5c

[admin@cass047202 cassandra]$ nodetool describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
73e0e7c4-03b1-3db6-8967-d7c0144faa5c: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.202, 192.168.47.203, 192.168.47.224, 192.168.47.225]

[admin@cass047202 cassandra]$ nodetool -h 192.168.47.203 describecluster
Cluster Information:
Name: forseti_cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
73e0e7c4-03b1-3db6-8967-d7c0144faa5c: [192.168.47.206, 192.168.47.222, 192.168.47.221, 192.168.47.204, 192.168.47.205, 192.168.47.203, 192.168.47.224, 192.168.47.225]
UNREACHABLE: [192.168.47.202]

[admin@cass047203 ~]$ nodetool status
-- Address Load Tokens Owns Host ID Rack
UN 192.168.47.205 724.58 GB 256 ? ac6313c8-e0b5-463b-8f90-55dc0f59e476 RAC1
DN 192.168.47.202 1.72 TB 256 ? abaa0cbc-09d3-4990-8698-ff4d2f2bb4f7 RAC1
UN 192.168.47.203 1.01 TB 256 ? 19b0b9cc-cad2-4b61-8da6-95423fe94af8 RAC1


INFO 01:50:05 Harmless error reading saved cache /home/admin/cassandra/saved_caches/KeyCache-ba.db
java.lang.RuntimeException: Cache schema version b48bf712-fad2-3951-bb18-aa178e738b30 does not match current schema version 73e0e7c4-03b1-3db6-8967-d7c0144faa5c
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:188) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:148) [apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:144) [apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]


As long as nobody runs `select * from md5_id limit 1;`, the other nodes do not mark 202 as DN.
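To spot a schema disagreement across all nodes in one shot (a sketch reusing the pssh.sh wrapper; the -A3 context length is arbitrary):

# Every node should report the same single version under "Schema versions"
./pssh.sh ip_all.txt "/usr/install/cassandra/bin/nodetool describecluster | grep -A3 'Schema versions'"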

Increasing the SSTable size by increasing memory (out of memory)

With the buffer at 512 M, every data file and index file comes out around the same size (281 M data, 231 M index). Can the Index files' footprint be reduced??

[admin@cass047203 md5_id]$ ll -rth|grep Data | wc -l
2576
[admin@cass047203 md5_id]$ cd ..
[admin@cass047203 data]$ du -sh *
1.3T md5_id
[admin@cass047204 md5_id]$ ll data-md5_id-ka-2010-* -h
-rw-rw-r--. 1 admin admin 66K 6月 2 22:26 data-md5_id-ka-2010-CompressionInfo.db
-rw-rw-r--. 1 admin admin 281M 6月 2 22:26 data-md5_id-ka-2010-Data.db
-rw-rw-r--. 1 admin admin 10 6月 2 22:26 data-md5_id-ka-2010-Digest.sha1
-rw-rw-r--. 1 admin admin 16 6月 2 22:26 data-md5_id-ka-2010-Filter.db
-rw-rw-r--. 1 admin admin 231M 6月 2 22:26 data-md5_id-ka-2010-Index.db
-rw-rw-r--. 1 admin admin 9.7K 6月 2 22:26 data-md5_id-ka-2010-Statistics.db
-rw-rw-r--. 1 admin admin 91 6月 2 22:26 data-md5_id-ka-2010-TOC.txt

With memory set to 2048, running directly on a live node that is already running Cassandra:

nohup java -cp guava-19.0.jar:rainbow-table-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard -table md5_id -partition 13 -memory 1024 > rainbow-table-idcard.log.1 2>&1 &

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.last1(AbstractGenData.java:220)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genIdCardOneProvince(AbstractGenData.java:82)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genIdCard(AbstractGenData.java:63)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genData(AbstractGenData.java:55)
at cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard.main(BulkLoadIdCard.java:116)

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000632f80000, 436731904, 0) failed; error='无法分配内存' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 436731904 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/admin/hs_err_pid33681.log

Because Cassandra itself occupies a lot of memory, the SSTable-writing program does not have enough; Cassandra could be stopped first. To avoid impacting production, only one node is stopped at a time: generate the data there, then move on and stop the next node.

[admin@cass047225 ~]$ free -h
total used free shared buffers cached
Mem: 31G 30G 470M 716K 30M 3.8G
-/+ buffers/cache: 27G 4.3G
Swap: 0B 0B 0B

With the heap size set and G1 enabled, things look fine at first, but after a while the WriteCount printed on the console barely moves. Checking GC at that point shows OC=OU; printing only resumes after the full GC finishes, and before long the same situation repeats.

nohup java -Xmx16g -Xms16g -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -cp guava-19.0.jar:rainbow-table-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard -table md5_id -partition $myRange -memory 4096 > rainbow-table-idcard.log 2>&1 &

jstat -gc -h 10 `jps | grep BulkLoadIdCard |awk '{print $1}'` 1000 | \
awk '{printf("%10s\t%10s\t%10s\t%10s\t%10s\t%10s\t%10s\t%10s\t%10s\t%10s\t\n",$1,$5,$3,$4,$6,$8,$11,$12,$13,$14)}'

[admin@cass047225 ~]$ jstat -gc -h 10 `jps | grep BulkLoadIdCard |awk '{print $1}'` 1000
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
0.0 106496.0 0.0 106496.0 4603904.0 507904.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 106496.0 0.0 106496.0 4603904.0 1138688.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 106496.0 0.0 106496.0 4603904.0 2007040.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 106496.0 0.0 106496.0 4603904.0 2883584.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 106496.0 0.0 106496.0 4603904.0 3702784.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 106496.0 0.0 106496.0 4603904.0 4333568.0 12066816.0 10133504.0 16384.0 12784.5 52 30.335 0 0.000 30.335
0.0 425984.0 0.0 425984.0 4333568.0 229376.0 12017664.0 10129408.0 16384.0 12784.5 53 30.975 0 0.000 30.975
0.0 425984.0 0.0 425984.0 4333568.0 942080.0 12017664.0 10129408.0 16384.0 12784.5 53 30.975 0 0.000 30.975
0.0 425984.0 0.0 425984.0 4333568.0 1433600.0 12017664.0 10129408.0 16384.0 12784.5 53 30.975 0 0.000 30.975
0.0 425984.0 0.0 425984.0 4333568.0 1785856.0 12017664.0 10129408.0 16384.0 12784.5 54 30.975 0 0.000 30.975
0.0 106496.0 0.0 106496.0 4128768.0 245760.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 106496.0 0.0 106496.0 4128768.0 1064960.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 106496.0 0.0 106496.0 4128768.0 1671168.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 106496.0 0.0 106496.0 4128768.0 2433024.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 106496.0 0.0 106496.0 4128768.0 3170304.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 106496.0 0.0 106496.0 4128768.0 3833856.0 12541952.0 10579968.0 16384.0 12784.5 54 32.056 0 0.000 32.056
0.0 409600.0 0.0 409600.0 3457024.0 147456.0 12910592.0 10575872.0 16384.0 12784.5 55 32.641 0 0.000 32.641
0.0 409600.0 0.0 409600.0 3457024.0 655360.0 12910592.0 10575872.0 16384.0 12784.5 55 32.641 0 0.000 32.641
0.0 409600.0 0.0 409600.0 3457024.0 1269760.0 12910592.0 10575872.0 16384.0 12784.5 55 32.641 0 0.000 32.641
0.0 409600.0 0.0 409600.0 3457024.0 1900544.0 12910592.0 10575872.0 16384.0 12784.5 55 32.641 0 0.000 32.641

0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770048.5 16384.0 12784.5 133 65.236 2 61.382 126.618
0.0 0.0 0.0 0.0 884736.0 450560.0 15892480.0 15681440.2 16384.0 12784.5 133 65.236 2 122.783 188.019
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
0.0 57344.0 0.0 57344.0 827392.0 376832.0 15892480.0 15680244.2 16384.0 12784.5 134 65.394 2 122.783 188.177
0.0 106496.0 0.0 106496.0 778240.0 163840.0 15892480.0 15701920.2 16384.0 12784.5 135 65.691 2 122.783 188.474
0.0 106496.0 0.0 106496.0 778240.0 729088.0 15892480.0 15701920.2 16384.0 12784.5 136 65.691 2 122.783 188.474
0.0 73728.0 0.0 73728.0 811008.0 606208.0 15892480.0 15796935.3 16384.0 12784.5 136 66.004 2 122.783 188.786
0.0 65536.0 0.0 65536.0 819200.0 532480.0 15892480.0 15869856.2 16384.0 12784.5 137 66.209 2 122.783 188.992
0.0 65536.0 0.0 65536.0 819200.0 770048.0 15892480.0 15869856.2 16384.0 12784.5 138 66.209 2 122.783 188.992
0.0 65536.0 0.0 65536.0 819200.0 770048.0 15892480.0 15869856.2 16384.0 12784.5 138 66.209 2 122.783 188.992
0.0 65536.0 0.0 65536.0 352256.0 352256.0 16359424.0 16353184.2 16384.0 12784.5 139 67.845 2 122.783 190.628
0.0 65536.0 0.0 65536.0 352256.0 352256.0 16359424.0 16353184.2 16384.0 12784.5 139 67.845 2 122.783 190.628
0.0 65536.0 0.0 65536.0 352256.0 352256.0 16359424.0 16353184.2 16384.0 12784.5 139 67.845 2 122.783 190.628
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
0.0 65536.0 0.0 65536.0 352256.0 352256.0 16359424.0 16353184.2 16384.0 12784.5 139 67.845 2 122.783 190.628
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16770976.2 16384.0 12784.5 140 71.751 3 122.783 194.533
0.0 0.0 0.0 0.0 811008.0 385024.0 15966208.0 15960817.7 16384.0 12784.5 140 71.751 3 173.703 245.454
0.0 0.0 0.0 0.0 811008.0 811008.0 15966208.0 15960817.7 16384.0 12784.5 141 71.751 3 173.703 245.454
0.0 0.0 0.0 0.0 811008.0 811008.0 15966208.0 15960817.7 16384.0 12784.5 141 71.751 3 173.703 245.454
0.0 0.0 0.0 0.0 286720.0 155648.0 16490496.0 16485105.7 16384.0 12784.5 141 73.666 3 173.703 247.369
0.0 0.0 0.0 0.0 286720.0 286720.0 16490496.0 16485105.7 16384.0 12784.5 142 73.666 3 173.703 247.369
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16771825.7 16384.0 12784.5 143 75.252 4 173.703 248.955
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16771825.7 16384.0 12784.5 143 75.252 4 173.703 248.955
0.0 0.0 0.0 0.0 0.0 0.0 16777216.0 16771825.7 16384.0 12784.5 143 75.252 4 173.703 248.955
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT

Why does almost the same picture reappear right after a Full GC? Because the data in memory has not yet been flushed to disk, that memory cannot be reclaimed, so the Full GC actually frees very little.

[admin@cass047225 ~]$ tail -f rainbow-table-idcard.log
WriteCount:65531,SAMPLE:92FF12AE4A27EC01692DC05F041ECF44,13010219701011217X
[admin@cass047225 ~]$ cat rainbow-table-idcard.log | grep "WriteCount" | wc -l
515
[admin@cass047225 ~]$ du -sh *
12K md5_id20160622

Nearly 515 × 65531 ≈ 30+ million records have been written, yet not a single SSTable file has been generated, because the buffer size is set to 4G: an SSTable is only flushed once the Memtable reaches 4G.
A record such as "92FF12AE4A27EC01692DC05F041ECF44,13010219701011217X" is only 52 characters (52 bytes). Even allowing generously for the other SSTable components such as the index and counting 1KB per record, 10,000 records ≈ 10MB, so 30 million records would still only be about 30GB on disk.

Take buffer = 4G: the total buffer is 4,000,000 KB ≈ 4 × 10^9 bytes. Counting only the ~50-byte payload per record, it could hold up to roughly `4 × 10^9 / 50 ≈ 80 million` records, so the ~30 million records written so far never reach the flush threshold and no SSTable appears. Worse, the on-heap representation of each buffered row (HeapByteBuffer, BufferCell, TreeMap entries, …, see the jmap histogram below) costs several hundred bytes per record, far more than the 52-byte payload, so tens of millions of buffered records exhaust the heap long before the 4G payload threshold is reached — which is why the Full GCs above reclaim almost nothing.

The only remedy is to shrink the buffer to 2G, 1G, …, so that the in-memory data is flushed to disk, and therefore released, sooner. Otherwise data keeps streaming into memory faster than it can be freed, and even a Full GC is powerless.
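The buffer in question is set through CQLSSTableWriter's builder via withBufferSizeInMB. A minimal sketch of shrinking it, assuming cassandra-all 2.1 on the classpath; the schema, column names and output path here are illustrative assumptions, not the actual BulkLoadIdCard code:

```java
import org.apache.cassandra.dht.Murmur3Partitioner;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class Md5IdWriterSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative schema for the "md5,idcard" records above; the real table may differ.
        String schema = "CREATE TABLE md5.md5_id (md5 text PRIMARY KEY, idcard text)";
        String insert = "INSERT INTO md5.md5_id (md5, idcard) VALUES (?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/home/admin/md5/md5_id")   // must already exist, keyspace/table layout
                .forTable(schema)
                .using(insert)
                .withPartitioner(new Murmur3Partitioner())
                .withBufferSizeInMB(512)                  // smaller buffer => earlier, smaller SSTable flushes
                .build();

        writer.addRow("92FF12AE4A27EC01692DC05F041ECF44", "13010219701011217X");
        writer.close();                                   // flush the remaining buffer and finalize the SSTables
    }
}
```

With a 512MB buffer the writer flushes an SSTable after roughly 500MB of serialized data, instead of holding 4GB of rows (plus their much larger on-heap object overhead) in memory.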

While the job is running, inspect the thread stacks:

[admin@cass047225 md5_id]$ jps -lm
7594 cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard -table md5_id -partition 13,14,15,21 -memory 2048
[admin@cass047225 md5_id]$ top -Hp 7594
top - 10:19:32 up 211 days, 15:36, 3 users, load average: 2.77, 3.62, 3.68
Tasks: 34 total, 4 running, 30 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.6%us, 3.0%sy, 17.3%ni, 70.6%id, 0.2%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 32794428k total, 32493800k used, 300628k free, 31944k buffers
Swap: 0k total, 0k used, 0k free, 10400404k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7625 admin 20 0 21.8g 19g 16m R 101.7 63.6 17:55.41 java
7603 admin 20 0 21.8g 19g 16m R 99.7 63.6 28:13.74 java
7627 admin 20 0 21.8g 19g 16m R 99.7 63.6 13:13.43 java
7628 admin 20 0 21.8g 19g 16m R 99.7 63.6 13:13.73 java

[admin@cass047225 ~]$ ./show-busy-javathreads.sh
Busy(73.4%) thread(7603/0x1db3) stack of java process(7594) under user(admin):
"main" prio=10 tid=0x00007f7468009000 nid=0x1db3 runnable [0x00007f7471aa9000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.lang.StringCoding.safeTrim(StringCoding.java:79)
at java.lang.StringCoding.encode(StringCoding.java:365)
at java.lang.String.getBytes(String.java:939)
at org.apache.cassandra.utils.ByteBufferUtil.bytes(ByteBufferUtil.java:225)
at org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:49)
at org.apache.cassandra.serializers.AbstractTextSerializer.serialize(AbstractTextSerializer.java:26)
at org.apache.cassandra.db.marshal.AbstractType.decompose(AbstractType.java:73)
at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:142)
at org.apache.cassandra.io.sstable.CQLSSTableWriter.addRow(CQLSSTableWriter.java:118)
at cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard.batchWrite(BulkLoadIdCard.java:55)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genIdCardOneProvince(AbstractGenData.java:86)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genIdCard(AbstractGenData.java:63)
at cn.fraudmetrix.vulcan.rainbowtable.util.AbstractGenData.genData(AbstractGenData.java:55)
at cn.fraudmetrix.vulcan.rainbowtable.sstable.BulkLoadIdCard.main(BulkLoadIdCard.java:116)

Busy(46.8%) thread(7625/0x1dc9) stack of java process(7594) under user(admin):
"G1 Concurrent Refinement Thread#0" prio=10 tid=0x00007f746803c800 nid=0x1dc9 runnable

Busy(34.5%) thread(7628/0x1dcc) stack of java process(7594) under user(admin):
"Gang worker#1 (G1 Parallel Marking Threads)" prio=10 tid=0x00007f746806b800 nid=0x1dcc runnable

Busy(34.5%) thread(7627/0x1dcb) stack of java process(7594) under user(admin):
"Gang worker#0 (G1 Parallel Marking Threads)" prio=10 tid=0x00007f7468069800 nid=0x1dcb runnable

Busy(25.6%) thread(7688/0x1e08) stack of java process(7594) under user(admin):
"Thread-2" prio=10 tid=0x00007f7469e3b000 nid=0x1e08 waiting on condition [0x00007f73f012c000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000003fb131618> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)
at org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter$DiskWriter.run(SSTableSimpleUnsortedWriter.java:240)

Print a histogram of the live objects:

[admin@cass047225 ~]$ jmap -histo:live 7594

num #instances #bytes class name
----------------------------------------------
1: 48953579 2,349,771,792 java.nio.HeapByteBuffer
2: 48959365 2,154,707,824 [B
3: 48953434 1566509888 org.apache.cassandra.db.BufferCell
4: 24476718 1370696168 [Lorg.apache.cassandra.db.Cell;
5: 24533905 981356200 java.util.TreeMap$Entry
6: 24476720 979068800 org.apache.cassandra.io.sstable.CQLSSTableWriter$BufferedWriter$1
7: 24476718 783254976 org.apache.cassandra.db.composites.CompoundSparseCellName
8: 24476720 587441280 org.apache.cassandra.dht.LongToken
9: 24476720 587441280 org.apache.cassandra.db.DeletionInfo
10: 24476719 587441256 org.apache.cassandra.db.BufferDecoratedKey
11: 94817 9075720 [C
12: 27311 3820688 <constMethodKlass>
13: 27311 3505968 <methodKlass>
14: 2349 2919216 <constantPoolKlass>
15: 94761 2274264 java.lang.String
16: 2349 1619680 <instanceKlassKlass>
17: 1882 1474880 <constantPoolCacheKlass>
18: 57261 1374264 java.lang.Long
19: 28499 1367952 org.apache.cassandra.io.sstable.IndexSummaryBuilder$ReadableBoundary

With the memory buffer set to 2G, generating province 13 took 69385s; with 512M it took 75597s.
At 2G each SSTable Data file is a bit over 1GB; at 512M each Data file is about 281MB.
The Data file is only about half the buffer size because the Index file is nearly as large as the Data file (231M + 282M ≈ 513M for the 512M buffer; 924M + 1.2G ≈ 2.1G for the 2G buffer).

buffer = 512M
[admin@cass047221 md5_id]$ ll data-md5_id-ka-1578-* -rth
-rw-rw-r--. 1 admin admin 16 6月 24 06:23 data-md5_id-ka-1578-Filter.db
-rw-rw-r--. 1 admin admin 231M 6月 24 06:23 data-md5_id-ka-1578-Index.db
-rw-rw-r--. 1 admin admin 282M 6月 24 06:23 data-md5_id-ka-1578-Data.db
-rw-rw-r--. 1 admin admin 66K 6月 24 06:23 data-md5_id-ka-1578-CompressionInfo.db
-rw-rw-r--. 1 admin admin 9.7K 6月 24 06:23 data-md5_id-ka-1578-Statistics.db
-rw-rw-r--. 1 admin admin 8 6月 24 06:23 data-md5_id-ka-1578-Digest.sha1
-rw-rw-r--. 1 admin admin 91 6月 24 06:23 data-md5_id-ka-1578-TOC.txt

buffer = 2G
[admin@cass047225 md5_id]$ ll data-md5_id-ka-1-* -rth
-rw-rw-r--. 1 admin admin 16 6月 22 09:47 data-md5_id-ka-1-Filter.db
-rw-rw-r--. 1 admin admin 924M 6月 22 09:47 data-md5_id-ka-1-Index.db
-rw-rw-r--. 1 admin admin 1.2G 6月 22 09:47 data-md5_id-ka-1-Data.db
-rw-rw-r--. 1 admin admin 262K 6月 22 09:47 data-md5_id-ka-1-CompressionInfo.db
-rw-rw-r--. 1 admin admin 10 6月 22 09:47 data-md5_id-ka-1-Digest.sha1
-rw-rw-r--. 1 admin admin 9.7K 6月 22 09:47 data-md5_id-ka-1-Statistics.db
-rw-rw-r--. 1 admin admin 91 6月 22 09:47 data-md5_id-ka-1-TOC.txt

Split into per-province tables?

Data is generated per province; each province goes into its own table named md5_id_<province code>, e.g. md5_id_13.
key - value = B566B86D791DB56E4F149B42A2E84B5A,130100196303225584

The final tables are md5_id_13, md5_id_14, md5_id_15, … about 30 tables in total, each holding roughly 100 billion / 30 ≈ 3 billion rows.

Problem: given an md5 value, how do we know which table to query?
Either maintain a mapping from md5 to province, or skip the mapping and probe every table in turn at query time, as in the sketch below.
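A rough sketch of the "probe every table" option with the DataStax Java driver — the contact point, the column names (md5, idcard) and the province list are assumptions for illustration, not verified code:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class RainbowLookupSketch {
    // Province codes taken from the run table below; only tables that actually exist would be probed.
    private static final int[] PROVINCES = {
            11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33, 34, 35, 36, 37,
            41, 42, 43, 44, 45, 46, 50, 51, 52, 53, 54, 61, 62, 63, 64, 65};

    /** Probe md5.md5_id_<province> tables one by one until the md5 is found. */
    public static String lookup(Session session, String md5) {
        for (int p : PROVINCES) {
            Row row = session.execute(
                    "SELECT idcard FROM md5.md5_id_" + p + " WHERE md5 = ?", md5).one();
            if (row != null) {
                return row.getString("idcard");
            }
        }
        return null; // not present in any province table
    }

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("192.168.47.202").build();
             Session session = cluster.connect()) {
            System.out.println(lookup(session, "B566B86D791DB56E4F149B42A2E84B5A"));
        }
    }
}
```

The obvious cost is up to ~30 reads per miss, so sequential probing only makes sense if lookups are rare or can tolerate the extra latency.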

du -sh * | grep md5_id201606
cat rainbow-table-idcard.log.2 | grep seconds
cat rainbow-table-idcard.log.2 | grep WriteCount | wc -l
ll md5_id20160623/data/md5_id | grep Data | wc -l
| Node | Provinces | Source files | Generation time (s) | Data size | Rows | SSTables | Load time (s) | Status |
|---|---|---|---|---|---|---|---|---|
| 192.168.47.202 | 22,23,31,32 | 156,301,32,289=778 | 29580,59443,6066,57166 | 1.1T | 173812*65531+16516 | 2164 | | |
| 192.168.47.203 | 33,34,35,36 | 230,256,201,239=926 | 38062,41404,32837,41002 | 1.3T | 206875*65531 | 2576 | | |
| 192.168.47.204 | 37,41 | 383,367 | 61807,60992 | | | | | |
| 192.168.47.206 | 46,50,51 | 72,51,512=635 | 13851,10217,95683 | 885G | 141864*65531 | 1767 | | |
| 192.168.47.221 | 52,53,54 | 161,246,161=568 | 20690,31109,20306 | 791G | 126896*65531+20992=8315642768 ≈ 8.3 billion | 1580 | 25758 | ✅✅ |
| 192.168.47.225 | 13,14,15,21 | 390,274,210,209=1083 | 69385,47905,36801,36486 | 1.5T | | 784 | | |
| 192.168.48.168 | 61,62,63,64,65,11,12 | 231,199,70,65,172,28,23=788 | 29579,25761,9046,8602,22679,3699,3040 | 1.1T | 176047*65531 | 2192 | | |
| 192.168.48.165 | 43,44,45 | 284,383,253=920 | 36764,49562,32799 | | | | | |

,42 ==?259

sstableloader

168

# rename the output dir data/ to md5/ and the sstable file prefix data- to md5- so they match the target keyspace md5
cd md5_id20160623 && mv data md5 && cd md5_id && rename data md5 *
# sstableloader's default max heap is only 256M (MAX_HEAP_SIZE in its launch script); raise it for a large load
MAX_HEAP_SIZE="256M"
MAX_HEAP_SIZE="3096M"
nohup /usr/install/cassandra/bin/sstableloader -d 192.168.47.202 /home/admin/md5_id20160623/md5/md5_id &
nohup apache-cassandra-2.1.13/bin/sstableloader -d 192.168.47.202 /home/admin/md5_id20160627/md5/md5_id &

Summary statistics:
Connections per host: : 1
Total files transferred: : 14220
Total bytes transferred: : 579(gb),024(mb),927(kb),946(b)
Total duration (ms): : 25758,956 = 7hour
Average transfer rate (MB/s): : 21
Peak transfer rate (MB/s): : 21

[admin@cass047202 ~]$ nodetool cfstats md5.md5_id | grep keys
Number of keys (estimate): 5567257543

If a node goes down during the load, the stream fails:

[/192.168.47.222, /192.168.47.204, /192.168.47.203, /192.168.47.224]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:125)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:607)
at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:471)
at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
at java.lang.Thread.run(Thread.java:744)
ERROR [CompactionExecutor:19] 2016-06-29 10:06:19,266 CassandraDaemon.java:229 - Exception in thread Thread[CompactionExecutor:19,1,main]
java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 1, expected write size = 221195610
at org.apache.cassandra.db.compaction.CompactionTask.checkAvailableDiskSpace(CompactionTask.java:296) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:124) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:626) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR [HintedHandoffManager:1] 2016-06-29 10:06:19,267 CassandraDaemon.java:229 - Exception in thread Thread[HintedHandoffManager:1,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 1, expected write size = 221195610
at org.apache.cassandra.db.HintedHandOffManager.compact(HintedHandOffManager.java:282) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:522) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:93) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:182) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 1, expected write size = 221195610
at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_51]
at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_51]
at org.apache.cassandra.db.HintedHandOffManager.compact(HintedHandOffManager.java:278) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 11 common frames omitted
Caused by: java.lang.RuntimeException: Not enough space for compaction, estimated sstables = 1, expected write size = 221195610
at org.apache.cassandra.db.compaction.CompactionTask.checkAvailableDiskSpace(CompactionTask.java:296) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:124) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:626) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_51]
... 3 common frames omitted
WARN [STREAM-IN-/127.0.0.1] 2016-06-29 10:06:29,500 CompressedStreamReader.java:115 - [Stream ab888f50-3d8f-11e6-8b24-dff3aadcb7ef] Error while reading partition DecoratedKey(-677751571849691691, 4435344444443536393133383636353144303931304334383341363032413842) from stream on ks='md5' and table='md5_id'./usr

jHiccup

http://hao.jobbole.com/jhiccup/

  1. Add jHiccup to the startup command in bin/cassandra:
sudo vi /usr/install/cassandra/bin/cassandra
exec "/usr/install/jHiccup-2.0.6/jHiccup" $NUMACTL "$JAVA" $JVM_OPTS $cassandra_parms -cp "$CLASSPATH" $props "$class"

Note that Cassandra must be stopped first and then started again (obviously — starting it again without stopping the running instance is bound to fail).

HiccupMeter: Failed to open log file.
INFO 06:02:01 Classpath: /usr/install/cassandra/bin/../conf:.../usr/install/jHiccup-2.0.6/jHiccup.jar:/usr/install/cassandra/bin/../lib/jamm-0.3.0.jar
INFO 06:02:01 JVM Arguments: [-javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar=, -ea, -javaagent:/usr/install/cassandra/bin/../lib/jamm-0.3.0.jar, -XX:
WARN 06:02:01 Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN 06:02:01 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
ERROR 06:02:01 Error starting local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: 地址已在使用

[qihuang.zheng@dp0652 ~]$ export _JAVA_OPTIONS='-javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar="-d 20000 -i 1000"' && sudo -u admin /usr/install/cassandra/bin/cassandra
[qihuang.zheng@dp0652 ~]$ Picked up _JAVA_OPTIONS: -javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar="-d 20000 -i 1000"
HiccupMeter: Failed to open log file.
INFO 06:24:09 JVM Arguments: [-ea, -javaagent:/usr/install/cassandra/bin/../lib/jamm-0.3.0.jar, -Dcassandra.storagedir=/usr/install/cassandra/bin/../data, -javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar=-d 20000 -i 1000]
WARN 06:24:09 Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN 06:24:09 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
ERROR 06:24:09 Error starting local jmx server:
java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: 地址已在使用

[qihuang.zheng@dp0652 ~]$ jps -lm
Picked up _JAVA_OPTIONS: -javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar="-d 20000 -i 1000"
35054 sun.tools.jps.Jps -lm
[qihuang.zheng@dp0652 ~]$ sudo -u admin jps -lm
Picked up _JAVA_OPTIONS: -javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar="-d 20000 -i 1000"
HiccupMeter: Failed to open log file.

After Cassandra starts normally, jHiccup.jar is attached as a javaagent:

[qihuang.zheng@dp0652 ~]$ ps -ef|grep cassandra
/usr/java/jdk1.7.0_51/bin/java -javaagent:/usr/install/jHiccup-2.0.6/jHiccup.jar= -ea -javaagent:/usr/install/cassandra/bin/../lib/jamm-0.3.0.jar -XX:...

A successful run produces a file named like hiccup.160505.1429.34233.hlog in the working directory of the target pid; when the pid is the Cassandra process, the file appears under /usr/install/cassandra.

But when testing on the production environment, no log file was generated:

PS: the problem of the jHiccup tool not working:

We use `jHiccup -p $pid` to attach jHiccup to the Cassandra service, but unfortunately no hlog file is generated (it should appear in the Cassandra install home). Last week I gave a demonstration to Daniel, and he said he would check it out. Here are our steps on the production environment:

[admin@192-168-48-47 ~]$ ps -ef|grep cassandra
admin 27021 1 10 Apr29 ? 1-22:50:40 /usr/java/jdk1.7.0_51/bin/java ... org.apache.cassandra.service.CassandraDaemon

[admin@192-168-48-47 ~]$ /usr/install/jHiccup-2.0.6/jHiccup -p 27021 -v
jHiccup version 2.0.5
jHiccup executing: /usr/java/jdk1.7.0_51/bin/java -cp /usr/java/jdk1.7.0_51/lib/tools.jar:/usr/install/jHiccup-2.0.6/jHiccup.jar
org.jhiccup.HiccupMeterAttacher -v -j /usr/install/jHiccup-2.0.6/jHiccup.jar -p 27021
Attaching to process 27021 and launching jHiccup agent from jar 27021 with args: -d 0 -i 5000 -s 2 -r 1.0 -v

[admin@192-168-48-47 ~]$ ll /usr/install/cassandra/
drwxr-xr-x. 2 admin admin 4096 4月 25 11:08 bin
-rw-r--r--. 1 admin admin 263216 1月 26 22:21 CHANGES.txt
drwxr-xr-x. 3 admin admin 4096 4月 29 10:35 conf
drwxr-xr-x. 2 admin admin 4096 4月 25 11:08 interface
drwxr-xr-x. 3 admin admin 4096 4月 25 11:08 javadoc
drwxr-xr-x. 3 admin admin 4096 4月 25 11:08 lib
-rw-r--r--. 1 admin admin 11609 1月 26 22:21 LICENSE.txt
drwxr-xr-x. 2 admin admin 4096 5月 13 12:43 logs
-rw-r--r--. 1 admin admin 67603 1月 26 22:21 NEWS.txt
-rw-r--r--. 1 admin admin 2117 1月 26 22:21 NOTICE.txt
drwxr-xr-x. 3 admin admin 4096 4月 25 11:08 pylib
drwxr-xr-x. 4 admin admin 4096 4月 25 11:08 tools

The cause: Cassandra was started with sudo -u admin from a regular user's shell, so the daemon's working directory is that user's home (here /home/qihuang.zheng), which the admin user cannot access — so jHiccup cannot create its hlog file there:

[admin@fp-cass048162 ~]$ pwdx 19649
19649: /home/qihuang.zheng

[admin@fp-cass048162 ~]$ lsof -p 19649 | grep cwd
java 19649 admin cwd DIR 8,17 4096 490733569 /home/qihuang.zheng

[admin@fp-cass048162 ~]$ readlink -e /proc/19649/cwd

Fix: stop Cassandra, log in as admin, and start it directly as admin (no sudo -u admin needed). The hlog file is then generated under /home/admin. A normal log looks like this:

# Executing: HiccupMeter -d 0 -i 5000 -s 2 -r 1.0 -v
#[Logged with jHiccup version 2.0.6]
#[Histogram log format version 1.1]
#[StartTime: 1463708353.327 (seconds since epoch), Fri May 20 09:39:13 CST 2016]
"StartTimestamp","Interval_Length","Interval_Max","Interval_Compressed_Histogram"
1166.248,5.002,570.425,HISTIgAAAKp42pNpmazIwMApwAABTBDKT4GBgdnNYMcCBvsPUJlfDO8Yt3GfY1jCwM3AzsDIwAIUYwRCJjCJCZiwimITIx4w0kAlpW4amTYzUmAqI0UuGiibGfHqYCTaPEaK4oVxgGTR5RkpkB08qkkxi5EompEGqgfOTEweAjNiiGDDxKkaCiqxQyacMuSqpL6J1LGbCQyZoTQyxCZGvErq66aNe1igkBVMMgMAUHALXQ==
1171.250,5.000,42.729,HISTIgAAAHt42pNpmazIwMDqxgABTBDKT4GBgdnNYMcCBvsPUJk/jDwMf/huM8gzcDOwglWyMjDC1EMBIwN2gEscP2AkWjcjmSaT77aRaTP++GWkoe6BsxlTDyMBmxjJlCVs9mCxmRCfiWaqGQnELCMJsqSpZsRKM+KVxVAFADZ8CIQ=
1176.250,5.000,406.847,HISTIgAAAJt42pNpmazIwMDRwgABTBDKT4GBgdnNYMcCBvsPUJlnDJ8YhXjDGOoYBBnYgOpYGBiBJEg9IxgzMjCDSUYG8gAjDeQoN2PgbGagui5KzWCksbsYB0gv6boZaSY7VMxmJKCXdqoHymZGkmhqqqKFmeTYzIiCGTFEcOGBU0lN83BBJjxy5KijhUpqmMgEhsxQGhUyUyBGgm4AXGIJ5w==
1181.250,5.000,125.829,HISTIgAAAG942pNpmazIwMD2gAECmCCUnwIDA7ObwY4FDPYfoDK/Gf4wcvFlMnQzcABVsTEwAiEDFEPY2AAu8cEFGEdtHgRmMg4ZmxlpJjtUzGYkoJd2qgfKZkaSaGqqooWZ5NjMiIIZMURw4YFTSYJ5AGq7CUM=
1186.250,5.000,444.596,HISTIgAAAIR42pNpmazIwMCxggECmCCUnwIDA7ObwY4FDPYfoDIXGVkZvvGaMBxm4GBgAapjBGJmoDgjFA9mwEgnPSPNZka6u3BgbWakmSwtzaa2zYwk8SmRHTyqGUmi6a+K9mYyEsSMRKkaKiqxicAgEwoPH2SisrqBsxtU2zNhQErEaKIbAEcEChs=
