Cassandra Pitfalls

A few pitfalls of Cassandra:

  • Count queries
  • Dropping a keyspace (drop keyspace)
  • Truncating a table (truncate)
  • Query restrictions
    • where
    • fetchSize and limit
  • Deleting by secondary index
  • Renaming a table
  • query

Count Queries

By default CQL returns at most 100,000 rows. If the actual count is not much larger, you can simply increase the limit value:

cqlsh:forseti> select count(*) from velocity_app limit 100000000;
count
--------
983872

But if there is a lot of data, even a larger limit will time out, because the default range-query timeout is 15 seconds.

cqlsh:forseti> select count(*) from velocity limit 10000000;
Request did not complete within rpc_timeout.

The CQL count reports about 980,000, yet the estimated number of keys is only a bit over 600. That is because "Number of keys" counts partition keys,
while CQL's count counts columns: one partition key can hold many columns, and every column counts as one!

➜  ~  nodetool cfstats forseti.velocity_app -h 192.168.6.52
Keyspace: forseti
Pending Tasks: 0
Table: velocity_app
SSTable count: 1
SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live), bytes: 13401166
Space used (total), bytes: 13401166
Off heap memory used (total), bytes: 6548
SSTable Compression Ratio: 0.27455019357879173
👉Number of keys (estimate): 640
👉Memtable cell count: 393345
Memtable data size, bytes: 60653197
Memtable switch count: 4
Compression metadata off heap memory used, bytes: 5912
Compacted partition minimum bytes: 17085
Compacted partition maximum bytes: 785939
Compacted partition mean bytes: 91169
👉Average live cells per slice (last five minutes): 918.0

Each slice holds close to 1,000 live cells; a cell corresponds to a column and a slice to a row. So columns ≈ 600 × 1000 = 600,000.
Add the Memtable cell count of nearly 400,000, and 600k + 400k comes close to the ~980,000 reported by the CQL count.
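
If an exact row count is really needed, one workaround is to page through the table from the client instead of issuing one big count(*): every page is a small request, so no single query has to beat the timeout. A minimal sketch, assuming the DataStax Java driver 2.1 (class name and fetch size are arbitrary; host, keyspace and table are taken from the examples above):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedCount {
    public static void main(String[] args) {
        // 192.168.6.52 is one of the nodes used in the examples above.
        Cluster cluster = Cluster.builder().addContactPoint("192.168.6.52").build();
        Session session = cluster.connect("forseti");

        // Page through the table 1000 rows at a time instead of running a
        // single count(*) that must finish within the server-side timeout.
        Statement stmt = new SimpleStatement("SELECT sequence_id FROM velocity_app");
        stmt.setFetchSize(1000);

        long count = 0;
        for (Row row : session.execute(stmt)) {
            count++; // iterating fetches the next page transparently
        }
        System.out.println("rows: " + count);
        cluster.close();
    }
}

Note this still scans the whole table; it merely trades one long request for many short ones.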

Dropping a Keyspace (drop keyspace)

After drop keyspace, the keyspace's directory still exists on disk. To remove it physically, you have to go to every node and delete the directory manually with rm -rf.

cqlsh> desc keyspaces;

forseti_velocity system stresscql forseti
quote "OpsCenter" wuzei velocityrealtime
volecityrealtime forseti_fp system_traces forseti_old

cqlsh> use data
cqlsh:data> drop keyspace data;
OperationTimedOut: errors={}, last_host=192.168.47.202 # on this error, be sure to run the drop once more
cqlsh:data> drop keyspace data;
cqlsh:data>
CREATE KEYSPACE data WITH replication = {
'class': 'NetworkTopologyStrategy',
'DC2': '1',
'DC1': '1'
};
AlreadyExists: Keyspace 'data' already exists
cqlsh:data>
CREATE TABLE md5_id (
md5 text,
id text,
PRIMARY KEY (md5)
);
ServerError: <ErrorMessage code=0000 [Server error] message="java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: javax.management.d: org.apache.cassandra.db:type=ColumnFamilies,keyspace=data,columnfamily=md5_id">
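
The same timeout-then-retry dance applies when dropping a keyspace from code. A hedged sketch, assuming the DataStax Java driver 2.1, of guarding the schema statement with a single retry, mirroring the "run it again" advice in the transcript above (the exact exception raised client-side can vary, so the broad DriverException is caught here):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.DriverException;

public class DropWithRetry {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.47.202").build();
        Session session = cluster.connect();
        try {
            session.execute("DROP KEYSPACE data");
        } catch (DriverException e) {
            // Mirror the cqlsh transcript: on a timeout, simply issue the drop again.
            // IF EXISTS makes the retry a no-op if the first attempt actually succeeded.
            session.execute("DROP KEYSPACE IF EXISTS data");
        }
        cluster.close();
    }
}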

Truncating a Table (truncate)

After truncating a table in CQL, df -h shows that no space has been freed. The table directory no longer contains data files, but truncate actually takes a snapshot.
So to remove the table and its data completely, finish by running on every node: nodetool clearsnapshot -h 192.168.47...

[qihuang.zheng@cass047221 forseti]$ cd /home/admin/cassandra/data/forseti && for i in $(ls -l |grep '^d' |du -s * |sort -nr|awk '{print $2}');do du -sh $i;done | head -10
590G activity_detail
391G velocity
62G account_association_info
8.1G account_attached_ip
7.7G account_binded_property
[qihuang.zheng@cass047221 forseti]$ ll activity_detail/
drwxr-xr-x. 3 admin admin 4096 Oct 20 15:07 snapshots ⬅️ snapshot directory created by truncate; running clearsnapshot deletes it

Timeout

<stdin>:1:OperationTimedOut: errors={}, last_host=192.168.47.202

Open question: does truncate need to be followed by a flush?

Query Restrictions

A typical time-series data model puts many records under the same partition key, e.g. weather_station_id.
If a station produces one record per minute, the primary key can be PRIMARY KEY (station_id, occur_min),
or use a compound partition key: PRIMARY KEY ((station_id, day), occur_min)

desc table activity_detail;
CREATE TABLE activity_detail (
sequence_id text,
occur_time bigint,
activity_map map<text, text>,
...
PRIMARY KEY ((sequence_id), occur_time)
)

The WHERE clause can only restrict the partition key; you cannot filter on a clustering column alone (except with ALLOW FILTERING):

--querying by a non-partition-key column (even though occur_time is part of the primary key) requires ALLOW FILTERING, and is slower
select * from activity_detail where occur_time=1432292854170;
select * from activity_detail where occur_time <1398873600000 limit 1;
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.
If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

select * from activity_detail where occur_time <1398873600000 limit 1 ALLOW FILTERING;
select count(*) from activity_detail where occur_time <1398873600000 ALLOW FILTERING;

An UPDATE must carry the complete primary key:

--subqueries are not supported!
UPDATE activity_detail USING TTL 10
SET occur_time=(select occur_time from activity_detail where sequence_id='1415881081492-08553997')
where sequence_id='1415881081492-08553997';

--Bad Request: PRIMARY KEY part occur_time found in SET part (updating a primary key column is not allowed!)
UPDATE activity_detail USING TTL 10 SET occur_time=1415881081493
where sequence_id='1415881081492-08553997';

--Bad Request: Missing mandatory PRIMARY KEY part occur_time (all primary key columns must be supplied)
UPDATE activity_detail USING TTL 30 SET geo_map={'test':'test'}
where sequence_id='1415881081492-08553997';

--OK, but the TTL doesn't really work as expected! Note that ALL primary key columns must be supplied together!
UPDATE activity_detail USING TTL 30 SET geo_map={'test':'test'}
where sequence_id='1415881081492-08553997' and occur_time=1415881081493;

--Bad Request: Cannot use selection function ttl on collections
select ttl(geo_map) from activity_detail where sequence_id='1415881081492-08553997' ;

--Bad Request: Cannot use selection function ttl on PRIMARY KEY part occur_time
select ttl(occur_time) from activity_detail where sequence_id='1415881081492-08553997' ;

null vs. deletion:
updating a column with a TTL leaves the column null once the TTL expires;
inserting a row with a TTL makes the whole row disappear once the TTL expires;
DELETE removes the row entirely.

--the row still exists, only the value is null.
select geo_map from activity_detail where sequence_id='1415881081492-08553997' ;
geo_map
---------
null

--DELETE FROM removes it successfully!
DELETE FROM activity_detail where sequence_id='1415881081492-08553997';
select * from activity_detail where sequence_id='1415881081492-08553997' ;
(0 rows)

where

http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

fetchSize和limit

http://docs.datastax.com/en/developer/java-driver/2.1/java-driver/jd-faq.html
http://stackoverflow.com/questions/22854558/datastax-driver-limit-option

What's the difference between using setFetchSize() and LIMIT?
Basically, LIMIT controls the maximum number of results returned on the Cassandra side (i.e., a server-side limit),
while the setFetchSize() method controls the amount of data transferred between Cassandra and the client per page.

Suppose a single row has 5,000 columns and we only set fetchSize=1000: the first fetch returns 1,000 rows. When do the remaining 4,000 come back?
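
With the driver's automatic paging, the answer is: lazily. fetchSize only sets the page size; whenever iteration exhausts the buffered page, the driver requests the next one from the coordinator. A sketch, assuming the Java driver 2.1 (host and keyspace from the examples above, class name arbitrary):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagingDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.6.52").build();
        Session session = cluster.connect("forseti");

        Statement stmt = new SimpleStatement(
                "SELECT * FROM activity_detail WHERE sequence_id='1415881081492-08553997'");
        stmt.setFetchSize(1000); // page size, not a cap on the total

        ResultSet rs = session.execute(stmt);
        // Only the first page (up to 1000 rows) has been transferred so far:
        System.out.println("buffered: " + rs.getAvailableWithoutFetching());

        int total = 0;
        for (Row row : rs) {
            total++; // crossing a page boundary triggers fetching the next page
        }
        System.out.println("total: " + total); // eventually all 5000 rows
        cluster.close();
    }
}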

Deleting by Secondary Index

1. Deleting by a secondary index is not supported

CREATE INDEX ON velocity_app (sequence_id);

Querying through the secondary index works fine:
select * from velocity_app where sequence_id='1439981942982-51788461';
select * from velocity_app where attribute='229.195.196.98' and partner_code='demo' and app_name='autoTest' and type='ipAddress';
select * from velocity_app where attribute='229.195.196.98' and partner_code='demo' and app_name='autoTest' and type='ipAddress'
and sequence_id='1439981942982-51788461';
select * from velocity_app where attribute='229.195.196.98' and partner_code='demo' and app_name='autoTest' and type='ipAddress'
and timestamp=1439981942982 and sequence_id='1439981942982-51788461';

But deleting through the secondary index is not supported:
🈲 delete from velocity_app where sequence_id='1439981942982-51788461';
🈲 delete from velocity_app where attribute='229.195.196.98' and sequence_id='1439981942982-51788461';
🈲 delete from velocity_app where attribute='229.195.196.98' and partner_code='demo' and app_name='autoTest' and type='ipAddress' and sequence_id='1439981942982-51788461';
>Bad Request: Non PRIMARY KEY sequence_id found in where clause

You must delete like this instead:
✅ delete from velocity_app where attribute='229.195.196.98' and partner_code='demo' and app_name='autoTest' and type='ipAddress';

The delete above cannot be restricted by sequence_id at all. To be able to delete by sequence_id, sequence_id has to become a clustering key.

2. Make sequence_id the only clustering key: 👈 We Choose This!!

CREATE TABLE velocity_seq (
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code, app_name, type), sequence_id)
)WITH CLUSTERING ORDER BY (sequence_id DESC)

insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12345,'event1','12345');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12346,'event2','12346');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12347,'event1','12347');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12348,'event1','12348');

✅ delete from velocity_seq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345';

A range query on the last clustering key is allowed:
✅ select * from velocity_seq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id>='1234' and sequence_id<='1235';

Originally timestamp was the clustering key, not only to sort rows by time descending but also to bound TTL'd queries and avoid scanning tombstones: where timestamp>=currentTime-3M.
Since sequence_id itself carries the timestamp as its business prefix, it can replace the timestamp bound for the same tombstone-avoiding purpose: where sequence_id>='currentTime-3M'

Suppose timestamp=1439981914929, sequence_id='1439981914929-81748374', and currentTime-3M=1439981914930.
Then where timestamp>=1439981914930 filters this row out, and where sequence_id>='1439981914930' filters it too, because as strings '1439981914930' > '1439981914929-81748374'.

insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',1439981914929,'event1','1439981914929-81748374');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',1439981914920,'event1','1439981914920-81748374');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',1439981914940,'event1','1439981914940-81748374');
insert into velocity_seq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',1439981914920,'event1','1439981914920-81748375');

 attribute | partner_code | app_name |    type |            sequence_id |  event |     timestamp
-----------+--------------+----------+---------+------------------------+--------+---------------
   test001 |      test001 |  test001 | test001 | 1439981914940-81748374 | event1 | 1439981914940
   test001 |      test001 |  test001 | test001 | 1439981914929-81748374 | event1 | 1439981914929
   test001 |      test001 |  test001 | test001 | 1439981914920-81748375 | event1 | 1439981914920
   test001 |      test001 |  test001 | test001 | 1439981914920-81748374 | event1 | 1439981914920

✅ select * from velocity_seq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id>='1439981914930';

3. With the pre-existing clustering key timestamp, a delete must specify the full primary key:

CREATE TABLE velocity_tseq (
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code, app_name, type), "timestamp",sequence_id)
)WITH CLUSTERING ORDER BY ("timestamp" desc,sequence_id DESC);

insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12345,'event1','12345');
insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12346,'event2','12346');

🈲 delete from velocity_tseq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345';
Bad Request: Missing PRIMARY KEY part timestamp since sequence_id is set

✅ delete from velocity_tseq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and timestamp=12345 and sequence_id='12345';

Specifying sequence_id while range-querying timestamp fails!

insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12345,'event1','12345');
insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12346,'event1','12346');
insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12347,'event1','12347');
insert into velocity_tseq(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12348,'event1','12348');

You cannot skip timestamp and query sequence_id directly:
🈲 select * from velocity_tseq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345';

With sequence_id specified, a range query on timestamp is not allowed:
🈲 select * from velocity_tseq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and timestamp>=12346 and timestamp<=12347 and sequence_id='12345';
Bad Request: PRIMARY KEY column "sequence_id" cannot be restricted (preceding column "timestamp" is restricted by a non-EQ relation)

If the query is rejected, the delete naturally is too:
🈲 delete from velocity_tseq where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and timestamp>=12346 and timestamp<=12347 and sequence_id='12345';
Bad Request: Invalid operator >= for PRIMARY KEY part timestamp

http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
That post says that before 2.2, only the last clustering column may carry a range restriction. The layout below, with sequence_id queried by equality and timestamp bounded by a range, seems to fit that rule.

4. Swap sequence_id and timestamp, specify sequence_id, and range-query timestamp:

CREATE TABLE velocity_seqt (
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code, app_name, type), sequence_id, "timestamp")
)WITH CLUSTERING ORDER BY (sequence_id DESC, "timestamp" desc);

insert into velocity_seqt(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12345,'event1','12345');
insert into velocity_seqt(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12346,'event1','12346');
insert into velocity_seqt(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12347,'event1','12347');
insert into velocity_seqt(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)values('test001','test001','test001','test001',12348,'event1','12348');

Querying with no clustering key at all, as long as all partition key columns are given:
✅ select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001';
With the first clustering key:
✅ select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345';
With all clustering keys:
✅ select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345' and timestamp=12345;
But you cannot skip an earlier clustering key and restrict only a later one:
🈲 select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and timestamp=12345;

A range query on the last clustering key is also fine:
✅ select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345' and timestamp>=12345;
✅ select * from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345' and timestamp>=12345 and timestamp<=12347;

A query working does not mean the delete works: deletes only accept equality, not a >= range.
🈲 delete from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345' and timestamp>=12345 and timestamp<=12347;
Bad Request: Invalid operator >= for PRIMARY KEY part timestamp

✅ delete from velocity_seqt where attribute='test001' and partner_code='test001' and app_name='test001' and type='test001' and sequence_id='12345';
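
Since the range delete is rejected, one common workaround (a sketch with the Java driver 2.1, not something the original tried) is to read back the clustering keys in the range with the legal SELECT, then issue one full-primary-key delete per row:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class RangeDeleteWorkaround {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.6.52").build();
        Session session = cluster.connect("forseti");

        // 1. The range SELECT above is legal: read the timestamps to delete.
        ResultSet rs = session.execute(
            "SELECT \"timestamp\" FROM velocity_seqt" +
            " WHERE attribute='test001' AND partner_code='test001'" +
            " AND app_name='test001' AND type='test001'" +
            " AND sequence_id='12345' AND \"timestamp\">=12345 AND \"timestamp\"<=12347");

        // 2. Delete each row with the full primary key, which IS legal.
        PreparedStatement del = session.prepare(
            "DELETE FROM velocity_seqt" +
            " WHERE attribute=? AND partner_code=? AND app_name=? AND type=?" +
            " AND sequence_id=? AND \"timestamp\"=?");
        for (Row row : rs) {
            session.execute(del.bind("test001", "test001", "test001", "test001",
                                     "12345", row.getLong("timestamp")));
        }
        cluster.close();
    }
}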

Renaming a Table

https://issues.apache.org/jira/browse/CASSANDRA-1585
http://stackoverflow.com/questions/18112384/how-to-rename-table-in-cassandra-cql3

Experiment 1: same PK

create table fun(k int, v int, primary key((k)));
create table fun2(k int, v int, primary key((k)));

insert into fun(k,v)values(1,1);
insert into fun2(k,v)values(3,3);
insert into fun2(k,v)values(2,2);

Flush, shut down the cluster,
delete fun's directory, rename fun2's directory (and its data files) to fun,
then restart the cluster and query:
cd ~/ && nodetool flush && sudo -u admin kill -9 `sudo -u admin jps | grep CassandraDaemon |awk '{print $1}'`

cqlsh:test> select * from fun;
 k | v
---+---
 2 | 2
 3 | 3

Experiment 2: same PK, different clustering order

create table test1(k int, v int, ts int, primary key((k),ts));
create table test2(k int, v int, ts int, primary key((k),ts)) WITH CLUSTERING ORDER BY (ts DESC);

insert into test1(k,v,ts)values(1,1,1);
insert into test1(k,v,ts)values(2,2,2);
insert into test2(k,v,ts)values(3,3,3);
insert into test2(k,v,ts)values(4,4,4);

cd ~/ && nodetool flush && sudo -u admin kill -9 `sudo -u admin jps | grep CassandraDaemon |awk '{print $1}'`
cd /home/admin/cassandra/data/test
sudo cp -r test1 test1_back
sudo cp -r test2 test2_back
sudo rm -r test1
sudo mv test2 test1 && cd test1
sudo rename test2 test1 *
cqlsh 192.168.6.52

Since the table itself was never dropped, test1's schema still sorts ascending; the data is test2's, but it merely keeps the order it had when it was written to test2.

insert into test1(k,v,ts)values(5,5,5);
cqlsh:test> select * from test1;

 k | ts | v
---+----+---
 5 |  5 | 5
 4 |  4 | 4
 3 |  3 | 3

(3 rows)

cqlsh:test>     insert into test1(k,v,ts)values(1,1,1);
cqlsh:test> select * from test1;

 k | ts | v
---+----+---
 5 |  5 | 5
 1 |  1 | 1
 4 |  4 | 4
 3 |  3 | 3

So a real rename would require modifying the table metadata in Cassandra's system tables.
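
In Cassandra 2.x that metadata lives in system tables such as system.schema_columnfamilies. A read-only sketch, assuming the Java driver 2.1, that just lists the rows in question; rewriting them by hand is unsupported and risky, which is presumably why CASSANDRA-1585 never shipped a rename:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ShowTableMetadata {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.6.52").build();
        Session session = cluster.connect();

        // These are the rows a "rename" would have to rewrite (inspection only):
        for (Row row : session.execute(
                "SELECT keyspace_name, columnfamily_name FROM system.schema_columnfamilies")) {
            System.out.println(row.getString("keyspace_name") + "."
                    + row.getString("columnfamily_name"));
        }
        cluster.close();
    }
}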

Experiment 3: different PK

1. Prepare two tables whose names and PKs both differ

CREATE TABLE velocity_partner(
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code), sequence_id)
) WITH CLUSTERING ORDER BY (sequence_id DESC);

CREATE TABLE velocity_partner2(
attribute text,
partner_code text,
app_name text,
type text,
"timestamp" bigint,
event text,
sequence_id text,
PRIMARY KEY ((attribute, partner_code, type), sequence_id)
) WITH CLUSTERING ORDER BY (sequence_id DESC);

insert into velocity_partner(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)
values('test001','test001','test001','ip',123451,'event1','12345-1');
insert into velocity_partner(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)
values('test002','test002','test002','account',123452,'event1','12345-2');

insert into velocity_partner2(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)
values('test003','test003','test003','ip',123453,'event1','12345-3');
insert into velocity_partner2(attribute,partner_code,app_name,type,"timestamp",event,sequence_id)
values('test004','test004','test004','account',123454,'event1','12345-4');

select attribute,sequence_id,timestamp from velocity_partner;
select attribute,sequence_id,timestamp from velocity_partner2;

2. nodetool flush

3. Shut down Cassandra

4. Delete the velocity_partner directory, rename velocity_partner2 to velocity_partner, and rename the data files:

sudo rm -r velocity_partner
sudo mv velocity_partner2 velocity_partner && cd velocity_partner
sudo rename partner2 partner *

5. Restart Cassandra

cqlsh:test> select * from velocity_partner;
TSocket read 0 bytes
cqlsh:test> select * from velocity_partner2;
TSocket read 0 bytes
cqlsh:test> desc table velocity_partner;
[Errno 32] Broken pipe
cqlsh:test> desc table velocity_partner2;
[Errno 32] Broken pipe

Conclusion: when the primary keys differ, the directory-swapping trick breaks things outright; afterwards neither table can be queried or even described. Renaming by swapping files only stands a chance when the schemas, including the PK, are identical.

query

WARN  [SharedPool-Worker-5] 2016-06-26 22:27:47,129 NoSpamLogger.java:94 - Unlogged batch covering 15 partitions detected against tables 
[forseti.account_attached_device, forseti.device_attached_ip, forseti.ip_attached_account, forseti.ip_attached_device,
forseti.account_attached_ip, forseti.device_attached_account].
You should use a logged batch for atomicity, or asynchronous writes for performance.
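
The warning itself points at the fix: a logged batch when you need atomicity, or independent asynchronous writes when you only want throughput. A minimal sketch of the async variant, assuming the Java driver 2.1; for a runnable example the inserts reuse the velocity_seq table defined earlier, standing in for the six association tables named in the log:

import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class AsyncWrites {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.6.52").build();
        Session session = cluster.connect("forseti");

        // Stand-ins for the per-table inserts the unlogged batch contained.
        List<Statement> statements = new ArrayList<Statement>();
        statements.add(new SimpleStatement(
            "INSERT INTO velocity_seq(attribute,partner_code,app_name,type,\"timestamp\",event,sequence_id)"
          + " VALUES('test001','test001','test001','test001',12349,'event1','12349')"));
        statements.add(new SimpleStatement(
            "INSERT INTO velocity_seq(attribute,partner_code,app_name,type,\"timestamp\",event,sequence_id)"
          + " VALUES('test002','test002','test002','test002',12350,'event1','12350')"));

        // Fire all writes concurrently, then wait for each one to finish.
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (Statement s : statements) {
            futures.add(session.executeAsync(s));
        }
        for (ResultSetFuture f : futures) {
            f.getUninterruptibly(); // throws if the corresponding write failed
        }
        cluster.close();
    }
}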
