HBase Q & A

INCONSISTENT status

SYSTEM.FUNCTION table corrupted

15/12/16 19:16:14 DEBUG client.ConnectionManager$HConnectionImplementation: 
locateRegionInMeta parentTable=hbase:meta, metaLocation=, attempt=6 of 35 failed; retrying after sleep of 4039 because:
No server address listed in hbase:meta for region SYSTEM.FUNCTION,,1450264148924.1125a5495cbbf583864609c67a2b804b. containing row

Repair the metadata; the SYSTEM.FUNCTION table is inconsistent in HBase:

[qihuang.zheng@spark047213 bin]$ hbase hbck -fixMeta -fixAssignments
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
Number of Tables: 6
ERROR: There is a hole in the region chain between and . You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table SYSTEM.FUNCTION
Summary:
hbase:meta is okay.
Number of regions: 1
Deployed on: spark047245,16020,1450263724611
SYSTEM.CATALOG is okay.
Number of regions: 1
Deployed on: spark047243,16020,1450263724657
table is okay.
Number of regions: 1
Deployed on: spark047244,16020,1450263724611
hbase:namespace is okay.
Number of regions: 1
Deployed on: spark047244,16020,1450263724611
SYSTEM.SEQUENCE is okay.
Number of regions: 256
Deployed on: spark047241,16020,1450263725671 spark047242,16020,1450263724591 spark047243,16020,1450263724657 spark047244,16020,1450263724611 spark047245,16020,1450263724611
SYSTEM.FUNCTION is okay.
Number of regions: 0
Deployed on:
SYSTEM.STATS is okay.
Number of regions: 1
Deployed on: spark047242,16020,1450263724591
1 inconsistencies detected.
Status: INCONSISTENT

When HBase starts, list in the hbase shell shows no tables at all. After starting sqlline, Phoenix creates the four system tables above.
In the figure below, the first row, SYSTEM.FUNCTION, is marked in red. Normally, if the table were not corrupted, it would not appear under Regions in Transition.

[Figure: hbase_function]

Solution: delete the /hbase node in ZooKeeper and restart the HBase cluster.

[qihuang.zheng@spark047213 ~]$ hbase zkcli
[zk: 192.168.47.84:2181,192.168.47.83:2181,192.168.47.86:2181(CONNECTED) 10] rmr /hbase

Note that deleting only the FUNCTION table (delete /hbase/table/SYSTEM.FUNCTION) does not seem to work; the whole /hbase directory has to be removed.

Node pseudo-death

A node was detected as down, but on the master web UI this RegionServer looks perfectly fine.

hbase(main):003:0> scan 'hbase:meta'
ROW COLUMN+CELL

ERROR: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server spark047217,16020,1450843709048 not running, aborting
at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:903)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2003)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2033)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)

Checking with hbase hbck:

ERROR: RegionServer: spark047209,16020,1450843709733 Unable to fetch region information. org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server spark047209,16020,1450843709733 not running, aborting
at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:903)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getOnlineRegion(RSRpcServices.java:1132)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:21110)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2033)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)

ERROR: hbase:meta is not found on any region.
ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...

Repairing with hbase hbck -fix also fails:

ERROR: Region { meta => data.md5_id2,e4f02e929cd2c7bf53d63e6645904dcf,1450579287871.01431d11f5dfb2c318209ac2bb397340., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_id2/01431d11f5dfb2c318209ac2bb397340, deployed =>  } not deployed on any region server.
Trying to fix unassigned region...
Exception in thread "main" java.io.IOException: Region {ENCODED => 01431d11f5dfb2c318209ac2bb397340, NAME => 'data.md5_id2,e4f02e929cd2c7bf53d63e6645904dcf,1450579287871.01431d11f5dfb2c318209ac2bb397340.', STARTKEY => 'e4f02e929cd2c7bf53d63e6645904dcf', ENDKEY => 'e559899b9'} failed to move out of transition within timeout 120000ms

Solution: again, rmr /hbase in ZooKeeper and restart the cluster.

Table creation failure

hbase(main):001:0> create 'data.md5_id2', 'id', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}
ERROR: Only 2 of 16 regions are online; retries exhausted.

hbase(main):005:0> scan "data.md5_id2"
ROW COLUMN+CELL
ERROR: No server address listed in hbase:meta for region data.md5_id2,,1450350723788.d22366cc7199a7a524c910f35e5fe06b. containing row

The RegionServers log the following error in the background (probably unrelated, since every RegionServer node reports the same error):

0    [ReplicationExecutor-0] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper  - ZooKeeper multi failed after 4 attempts
99 [PriorityRpcServer.handler=9,queue=1,port=16020] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper getData failed after 4 attempts
101 [PriorityRpcServer.handler=9,queue=1,port=16020] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher -
regionserver:16020-0x1515d5299ea3b15, quorum=192.168.47.84:2181,192.168.47.83:2181,192.168.47.86:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/recovering-regions/aab33827b7f8a9a006fd49059728d900
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKSplitLog.isRegionMarkedRecoveringInZK(ZKSplitLog.java:159)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1411)
102 [PriorityRpcServer.handler=9,queue=1,port=16020] ERROR org.apache.hadoop.hbase.regionserver.RSRpcServices - Can't retrieve recovering state from zookeeper
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/recovering-regions/aab33827b7f8a9a006fd49059728d900

[Figure: hbase-create-err]

Solution: once more, rmr /hbase and restart the cluster. Afterwards the regions are deployed across every node:
since the cluster has only 14 RegionServers but 16 regions, two of the nodes end up hosting 2 regions each.

hdfs fsck /hbase

Because of corrupt HDFS blocks, regions got stuck in "Region in transition ... failed open" when HBase started.
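
Before running hbck it helps to confirm which files under /hbase actually sit on corrupt blocks, which is what hdfs fsck /hbase reports. The snippet below is a small sketch of the same check through the Java FileSystem API (the class name is made up; it assumes core-site.xml/hdfs-site.xml are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper: roughly what `hdfs fsck /hbase -list-corruptfileblocks` prints.
public class ListCorruptHBaseFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        // Ask the NameNode for files under /hbase that have at least one corrupt block.
        RemoteIterator<Path> corrupt = fs.listCorruptFileBlocks(new Path("/hbase"));
        while (corrupt.hasNext()) {
            System.out.println("corrupt: " + corrupt.next());
        }
        fs.close();
    }
}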

[qihuang.zheng@spark047213 ~]$ hbase hbck -fixMeta -fixAssignments
Number of live region servers: 14
Number of dead region servers: 0
Master: spark047213,16020,1454292347429
Number of backup masters: 0
Average load: 151.14285714285714
Number of requests: 0
Number of regions: 2116
Number of regions in transition: 6
...
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
Number of Tables: 4
ERROR: Region { meta => data.md5_mob2,1c00182ae,1450693138672.30d0ca1e3c34e95d029e8a9abf877c85., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/30d0ca1e3c34e95d029e8a9abf877c85, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => data.md5_mob2,f40010ac,1450686032580.61d1c4a83ebc953d35fc819c698a8b0f., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/61d1c4a83ebc953d35fc819c698a8b0f, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => data.md5_mob2,1800067f2,1450693138672.680e340c757929c0d5befb690106d562., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/680e340c757929c0d5befb690106d562, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => data.md5_mob2,55ffe6f3b,1450697992540.8b00ce8e47d8cce2fced405665cada13., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/8b00ce8e47d8cce2fced405665cada13, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => data.md5_mob2,fbffd2a37,1450688301202.b6732381185758ec64af441618cf120c., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/b6732381185758ec64af441618cf120c, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => data.md5_mob2,23fff727,1450695213674.e2e9b928571072f2d71d12fdc32ed0b3., hdfs => hdfs://tdhdfs/hbase/data/default/data.md5_mob2/e2e9b928571072f2d71d12fdc32ed0b3, deployed => } not deployed on any region server.
Trying to fix unassigned region...
ERROR: There is a hole in the region chain between 1800067f2 and 20000000. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between 23fff727 and 280000:. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between 55ffe6f3b and 57ffdc07. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between f40010ac and f7ffeaab. You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Last region should end with an empty key. You need to create a new region and regioninfo in HDFS to plug the hole.
ERROR: Found inconsistency in table data.md5_mob2
Summary:
hbase:meta is okay.
Number of regions: 1
Deployed on: spark047244,16020,1454292349019
Table data.md5_mob2 is inconsistent.
Number of regions: 64
Deployed on: spark047207,16020,1454292350149 spark047209,16020,1454292350575 spark047212,16020,1454292350624 spark047215,16020,1454292349389 spark047216,16020,1454292349182 spark047217,16020,1454292348998 spark047218,16020,1454292348642 spark047219,16020,1454292349586 spark047223,16020,1454292348854 spark047241,16020,1454292349570 spark047242,16020,1454292349743 spark047243,16020,1454292349509 spark047244,16020,1454292349019 spark047245,16020,1454292350582
data.md5_id2 is okay.
Number of regions: 2049
Deployed on: spark047207,16020,1454292350149 spark047209,16020,1454292350575 spark047212,16020,1454292350624 spark047215,16020,1454292349389 spark047216,16020,1454292349182 spark047217,16020,1454292348998 spark047218,16020,1454292348642 spark047219,16020,1454292349586 spark047223,16020,1454292348854 spark047241,16020,1454292349570 spark047242,16020,1454292349743 spark047243,16020,1454292349509 spark047244,16020,1454292349019 spark047245,16020,1454292350582
data.md5_mob is okay.
Number of regions: 1
Deployed on: spark047245,16020,1454292350582
hbase:namespace is okay.
Number of regions: 1
Deployed on: spark047241,16020,1454292349570
11 inconsistencies detected.
Status: INCONSISTENT
Version: 1.0.2
Number of live region servers: 14
Number of dead region servers: 0
Master: spark047213,16020,1454292347429
Number of backup masters: 0
Average load: 151.14285714285714
Number of requests: 0
Number of regions: 2116
Number of regions in transition: 6
...
Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
Number of Tables: 4
Summary:
hbase:meta is okay.
Number of regions: 1
Deployed on: spark047244,16020,1454292349019
data.md5_mob2 is okay.
Number of regions: 70
Deployed on: spark047207,16020,1454292350149 spark047209,16020,1454292350575 spark047212,16020,1454292350624 spark047215,16020,1454292349389 spark047216,16020,1454292349182 spark047217,16020,1454292348998 spark047218,16020,1454292348642 spark047219,16020,1454292349586 spark047223,16020,1454292348854 spark047241,16020,1454292349570 spark047242,16020,1454292349743 spark047243,16020,1454292349509 spark047244,16020,1454292349019 spark047245,16020,1454292350582
data.md5_id2 is okay.
Number of regions: 2049
Deployed on: spark047207,16020,1454292350149 spark047209,16020,1454292350575 spark047212,16020,1454292350624 spark047215,16020,1454292349389 spark047216,16020,1454292349182 spark047217,16020,1454292348998 spark047218,16020,1454292348642 spark047219,16020,1454292349586 spark047223,16020,1454292348854 spark047241,16020,1454292349570 spark047242,16020,1454292349743 spark047243,16020,1454292349509 spark047244,16020,1454292349019 spark047245,16020,1454292350582
data.md5_mob is okay.
Number of regions: 1
Deployed on: spark047245,16020,1454292350582
hbase:namespace is okay.
Number of regions: 1
Deployed on: spark047241,16020,1454292349570
0 inconsistencies detected.
Status: OK

RegionSplit

The first import loaded only part of the data. Although 16 regions were specified when the table was created, the region count increased after loading into the HTable.

[Figure: region_incr]

BulkLoad copies and splits files instead of moving them

http://chxt6896.github.io/hbase/2013/06/06/hbase-bulkload.html
BulkLoad is supposed to just move files, so it should be fast. In practice the first directory imported quickly, but problems showed up when importing the second directory:

[Figure: hbase_copy]

Copying happens across different clusters

http://blackwing.iteye.com/blog/1991901 points out that when the source and HBase are not on the same cluster, they are treated as different file systems, so the files are copied rather than moved.

HADOOP_CLASSPATH=`hbase-1.0.2/bin/hbase classpath` /usr/install/hadoop/bin/hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7 data.md5_id2
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7 data.md5_id2

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/tongdun/id_hbase/id_hbase3_11 data.md5_id2
HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

Even with the full hdfs:// prefix, files still appear under _tmp, which means copying is still going on, even though both sides belong to the same HDFS cluster here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.fs.HFileSystem;
import org.apache.hadoop.hbase.util.FSHDFSUtils;

public static void testSameHDFS() throws Exception {
    // File system as seen by the bulk-load source (core-site.xml/hdfs-site.xml on the classpath)
    Configuration hdfsConfig = new Configuration();
    hdfsConfig.addResource("core-site.xml");
    hdfsConfig.addResource("hdfs-site.xml");
    FileSystem hdfsFS = FileSystem.get(hdfsConfig);
    System.out.println(hdfsFS.getCanonicalServiceName());

    // File system as seen from the HBase side
    Configuration config = new Configuration();
    config.set("hbase.zookeeper.quorum", "192.168.6.55,192.168.6.56,192.168.6.57");
    config.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(config, "data.md5_id");

    FileSystem fs = FileSystem.get(config);
    FileSystem desFs = fs instanceof HFileSystem ? ((HFileSystem) fs).getBackingFs() : fs;
    String destName = desFs.getCanonicalServiceName();
    System.out.println(destName); // ha-hdfs:tdhdfs

    // true means both sides resolve to the same HDFS, so a plain rename should be possible
    boolean sameHDFS = FSHDFSUtils.isSameHdfs(config, hdfsFS, desFs);
    System.out.println(sameHDFS); // true
    table.close();
}

When running the commands above, no log output was printed because of conflicting slf4j bindings.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/install/hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/qihuang.zheng/hbase-1.0.2/lib/phoenix-4.6.0-HBase-1.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/qihuang.zheng/hbase-1.0.2/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/qihuang.zheng/hbase-1.0.2/lib/phoenix-server-4.6.0-HBase-1.0-runnable.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

After temporarily moving HBase's slf4j binding and the Phoenix jars out of the way, the logs appeared.

15/12/23 11:00:58 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7/_SUCCESS
15/12/23 11:00:58 WARN mapreduce.LoadIncrementalHFiles: Trying to bulk load hfile hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7/id/131aa70ab69548dfab88666fb165cc61 with size: 10992654874 bytes can be problematic as it may lead to oversplitting.

15/12/23 11:01:01 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7/id/131aa70ab69548dfab88666fb165cc61 first=b000000144f06b8bd0249cf57659354e last=b85372dde886be24a5745dd5cbcd7cd6
15/12/23 11:01:01 INFO mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/id_hbase2_7/id/131aa70ab69548dfab88666fb165cc61 no longer fits inside a single region. Splitting...

This explains why the files I saw under _tmp grow in 128 MB increments: some HFiles are too large to fit into a single region, so they have to be split.
The default hbase.hregion.max.filesize is 10 GB (107374182400), and some of our HFiles exceed 10 GB, so to avoid the split it can be raised to 50 GB (53687091200):

<property>
<name>hbase.hregion.max.filesize</name>
<value>53687091200</value>
</property>
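
Instead of (or in addition to) raising the cluster-wide value in hbase-site.xml, the limit can also be raised for just the target table through the Admin API. A minimal sketch, assuming the HBase 1.0 Java client (the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical helper: set MAX_FILESIZE = 50 GB on data.md5_id2 only.
public class RaiseMaxFileSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName name = TableName.valueOf("data.md5_id2");
            HTableDescriptor desc = admin.getTableDescriptor(name);
            desc.setMaxFileSize(53687091200L);   // 50 GB, overrides hbase.hregion.max.filesize for this table
            admin.modifyTable(name, desc);       // regions are reopened with the new setting
        }
    }
}

The same change can be made from the hbase shell with alter 'data.md5_id2', MAX_FILESIZE => '53687091200'.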

Split Retry Many Times

But it still does not work!!! Asking for help…

I have HFiles generated by importtsv; the files are really large, ranging from 100 MB to 10 GB.
I have changed hbase.hregion.max.filesize to 50 GB (53687091200), and also made sure the source CanonicalServiceName is the same as HBase's.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs://tdhdfs/user/tongdun/id_hbase/1 data.md5_id2 
HADOOP_CLASSPATH=`hbase classpath` hadoop jar hbase-1.0.2/lib/hbase-server-1.0.2.jar completebulkload /user/tongdun/id_hbase/1 data.md5_id2

But neither completebulkload nor LoadIncrementalHFiles simply mv/renamed the HFiles as expected;
instead, copying and splitting happened, which takes a long time.

Each "Split occured while grouping HFiles, retry attempt XXX" message in the log creates another nested _tmp directory, one level per attempt.

2015-12-23 15:52:04,909 INFO  [LoadIncrementalHFiles-0] hfile.CacheConfig: CacheConfig:disabled
2015-12-23 15:52:05,006 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:52:05,007 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae no longer fits inside a single region. Splitting...
2015-12-23 15:53:38,639 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top
2015-12-23 15:53:39,173 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 1 with 2 files remaining to group or split
2015-12-23 15:53:39,186 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.bottom first=f6eb30074a52ebb8c5f52ed1c85c2f0d last=f733d2c504f22f71b191014d72e4d124
2015-12-23 15:53:39,188 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top first=f733d2c6407f5758e860195b6d2c10c1 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:53:39,189 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/9f6fe2d28ddc4f209be62757ace8611b.top no longer fits inside a single region. Splitting...
2015-12-23 15:54:27,722 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top
2015-12-23 15:54:28,557 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 2 with 2 files remaining to group or split
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-4] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.bottom first=f733d2c6407f5758e860195b6d2c10c1 last=f77c7d357a76ff92bb16ec1ef79f31fb
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top first=f77c7d3915c9a8b71c83c414aabd587d last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:54:28,568 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/17ba0f42c4934f4c96218c784d3c3bb0.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:08,992 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top
2015-12-23 15:55:09,424 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 3 with 2 files remaining to group or split
2015-12-23 15:55:09,431 INFO [LoadIncrementalHFiles-7] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.bottom first=f77c7d3915c9a8b71c83c414aabd587d last=f7c525a83ee19ea166414e972c5d5541
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:09,433 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/f7162cec4e404eabbea479b2a5446294.top no longer fits inside a single region. Splitting...
2015-12-23 15:55:42,165 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top
2015-12-23 15:55:42,490 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 4 with 2 files remaining to group or split
2015-12-23 15:55:42,498 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.bottom first=f7c525aa2ec661c1c0707b02d1c4b4b3 last=f80dcce8a4a14be406ddd1bdebc2eda2
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top first=f80dccecf159d4999cb8e17446103d72 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:55:42,502 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/6610bd5d178e423fbe02db1865f834f0.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:09,560 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top
2015-12-23 15:56:09,933 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 5 with 2 files remaining to group or split
2015-12-23 15:56:09,942 INFO [LoadIncrementalHFiles-13] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.bottom first=f80dccecf159d4999cb8e17446103d72 last=f85673f473ead63c89e96c83b2058ca7
2015-12-23 15:56:09,943 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top first=f85673fde3138dac07ce08881c9d0ccc last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:09,944 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/8f07441d8b7c4d3ba37b6b0917860f68.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:30,890 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top
2015-12-23 15:56:31,145 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 6 with 2 files remaining to group or split
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-16] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.bottom first=f85673fde3138dac07ce08881c9d0ccc last=f89f12a56b5af206188639f736877563
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top first=f89f12a59e4a9c9bcbb42d0504318e25 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:31,151 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: HFile at hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/feaa0a6428f24a5294c87dd87c6bc5a6.top no longer fits inside a single region. Splitting...
2015-12-23 15:56:44,959 INFO [LoadIncrementalHFiles-17] mapreduce.LoadIncrementalHFiles: Successfully split into new HFiles hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom and hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top
2015-12-23 15:56:46,826 INFO [main] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles, retry attempt 7 with 2 files remaining to group or split
2015-12-23 15:56:46,832 INFO [LoadIncrementalHFiles-19] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.bottom first=f89f12a59e4a9c9bcbb42d0504318e25 last=f8e7bc423ca4799459898439bf0f68b2
2015-12-23 15:56:46,833 INFO [LoadIncrementalHFiles-20] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://tdhdfs/user/tongdun/id_hbase/1/id/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/3886569dba4041deb4487f49d0417ca6.top first=f8e7bc4bc8c2e7eac7f7e31bc116f8e0 last=f93061a29e9458fada2521ffe45ca385
2015-12-23 15:56:46,930 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-12-23 15:56:46,931 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3515d529acedbaa
2015-12-23 15:56:46,960 INFO [main] zookeeper.ZooKeeper: Session: 0x3515d529acedbaa closed
2015-12-23 15:56:46,960 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

Even though the process finished, the original HFile was not deleted. I was wondering why no mv/rename happened.

[qihuang.zheng@spark047213 ~]$ hadoop fs -du -h /user/tongdun/id_hbase/1/id/
3.3 G /user/tongdun/id_hbase/1/id/01114a58782b4c369819673e4b3678ae
6.0 G /user/tongdun/id_hbase/1/id/_tmp

A: This is because the table's regions have changed and no longer match the regions that existed when the HFiles were generated.

Why Split

http://chengjianxiaoxue.iteye.com/blog/2229591
http://koven2049.iteye.com/blog/982831
http://www.cnblogs.com/shitouer/archive/2013/02/20/hbase-hfile-bulk-load.html

If the region boundaries have changed during the course of bulk load preparation, or between the preparation and completion steps, the completebulkload utility will automatically split the data files into pieces corresponding to the new boundaries. This process is not optimally efficient, so users should take care to minimize the delay between preparing a bulk load and importing it into the cluster, especially if other clients are simultaneously loading data through other means.

A constraint on BulkLoad: it is only well suited to an initial data import, i.e. the table is empty, or contains no data before each load.

When bulk-loading HFiles into HBase, LoadIncrementalHFiles walks the HFile directory (structured as top directory -> column-family directory -> HFiles), collects the HFiles to import into a list, and processes them one by one. For each HFile it decides whether the file lies within a single region or spans regions (judged by the startkey/endkey mentioned earlier). If it lies within one region, it is imported directly: the RegionServer is called over RPC, the HFile is first copied into the region's file system (if they are on different file systems), then simply renamed into one of the region's StoreFiles, the region's StoreFile list is updated, and the HFile is loaded. If the HFile spans regions, it has to be split (into a .top and a .bottom file, each belonging to a single region and still in HFile format, placed under _tmp), and the split files are then loaded the same way.
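
A simplified illustration of that containment check (not the actual LoadIncrementalHFiles code; the key values are made up): an HFile covering row keys [first, last] fits a region [startKey, endKey) only if first >= startKey and last < endKey, where an empty endKey means the last region.

import org.apache.hadoop.hbase.util.Bytes;

// Simplified sketch of the "does this HFile fit one region?" decision.
public class RegionFitSketch {
    static boolean fitsInRegion(byte[] first, byte[] last, byte[] regionStart, byte[] regionEnd) {
        boolean afterStart = Bytes.compareTo(first, regionStart) >= 0;
        boolean beforeEnd = regionEnd.length == 0 || Bytes.compareTo(last, regionEnd) < 0;
        return afterStart && beforeEnd;   // false means the HFile must be split into .bottom/.top
    }

    public static void main(String[] args) {
        byte[] start = Bytes.toBytes("f0000000");
        byte[] end = Bytes.toBytes("f8000000");
        // Fits: both row keys lie inside [f0000000, f8000000)
        System.out.println(fitsInRegion(Bytes.toBytes("f6eb3007"), Bytes.toBytes("f733d2c5"), start, end));
        // Spans regions: the last key is past the region end, so a .bottom/.top split would happen
        System.out.println(fitsInRegion(Bytes.toBytes("f6eb3007"), Bytes.toBytes("f9306100"), start, end));
    }
}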

If the data is very large and the table already has regions, this splitting work is performed: the regions matching the data are located and the pieces are loaded into them.

Back to our scenario: we have multiple source files, so the HFiles produced by each MR job are sorted only within that job.
When all the directories are finally imported (each directory was produced by a separate MR job), every HFile still has to line up with the table's regions, and that is when the splits happen!!
If all the files had been read within a single job from the start, the reduce phase would be slower, but the output HFiles would be guaranteed to be globally ordered!
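
Another way to avoid the splits is to have every HFile-generating job partition its reducers by the table's current region boundaries, so each output HFile already fits one region. A sketch assuming the HBase 1.0 MapReduce API (the class name, job name and output path are made up; mapper/reducer setup is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job setup: align reducer partitions with data.md5_id2's region boundaries.
public class HFileGenJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hfile-gen-data.md5_id2");
        // ... set input path, mapper class and map output key/value classes here ...
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("data.md5_id2"));
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("data.md5_id2"))) {
            // Installs TotalOrderPartitioner with the table's region start keys, so every
            // reducer (and therefore every HFile) covers exactly one region's key range.
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
        }
        FileOutputFormat.setOutputPath(job, new Path("/user/tongdun/id_hbase/hfile_out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}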

Q: Why do HFiles need to be sorted?
A: Because the StoreFiles inside a region are sorted; only when an HFile is sorted can it be turned into a StoreFile by a simple split and rename, which is how the load works.

Q: Sometimes the original HFile is removed after a successful load and sometimes it is not. Why?
A: If the HFile's keys belong to a single region, it is imported by a plain rename, so the original file disappears. If the keys span regions, it is split into files under _tmp; what finally gets loaded are the split files, the original HFile is never touched, so it has to be deleted manually.

Q: Is it a problem if an HFile spans more than two regions?
A: Each split produces exactly two files: the first is guaranteed to belong to a single region, the second may not be, so the split files go through the same check again, which keeps splitting any HFile that still spans regions.

hbase(main):008:0> truncate "data.md5_id2"
Truncating 'data.md5_id2' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 54.3080 seconds

hbase(main):009:0> scan "data.md5_id2"
ROW COLUMN+CELL
0 row(s) in 0.0540 seconds

After truncating, the table is left with only one region, so it is better to drop the table and recreate it pre-split.
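
A sketch of that drop-and-recreate step through the Admin API (HBase 1.0 client; the class name is made up), producing the same 16 HexStringSplit regions as the original create statement:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.RegionSplitter;

// Hypothetical helper: drop data.md5_id2 and recreate it pre-split into 16 regions.
public class RecreatePresplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName name = TableName.valueOf("data.md5_id2");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            if (admin.tableExists(name)) {
                admin.disableTable(name);
                admin.deleteTable(name);       // drop, so the new table starts with clean split points
            }
            HTableDescriptor desc = new HTableDescriptor(name);
            desc.addFamily(new HColumnDescriptor("id"));
            // Same boundaries as the shell's {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}
            byte[][] splits = new RegionSplitter.HexStringSplit().split(16);
            admin.createTable(desc, splits);
        }
    }
}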
