Split

Description:
Split entire table or pass a region to split individual region.  With the 
second parameter, you can specify an explicit split key for the region.  
Examples:
split 'tableName'
split 'namespace:tableName'
split 'regionName' # format: 'tableName,startKey,id'
split 'tableName', 'splitKey'
split 'regionName', 'splitKey'

First, run scan 'hbase:meta' to inspect the metadata. There are three rows, one per region:

hbase(main):001:0> scan 'hbase:meta'
ROW  COLUMN+CELL 
 hbase:namespace,,15 column=info:regioninfo, timestamp=1500279250104, value={
 00279203600.844a078 ENCODED => 844a078aa357b6e4bf848b42ce46a713, NAME => 'hb
 aa357b6e4bf848b42ce ase:namespace,,1500279203600.844a078aa357b6e4bf848b42ce4
 46a713. 6a713.', STARTKEY => '', ENDKEY => ''}  
 hbase:namespace,,15 column=info:seqnumDuringOpen, timestamp=1500279210909, v
 00279203600.844a078 alue=\x00\x00\x00\x00\x00\x00\x00\x02   
 aa357b6e4bf848b42ce 
 46a713. 
 hbase:namespace,,15 column=info:server, timestamp=1500279210909, value=maste
 00279203600.844a078 r:16020 
 aa357b6e4bf848b42ce 
 46a713. 
 hbase:namespace,,15 column=info:serverstartcode, timestamp=1500279210909, va
 00279203600.844a078 lue=1500279162649   
 aa357b6e4bf848b42ce 
 46a713. 
 test,,1500293557739 column=info:regioninfo, timestamp=1500293644890, value={
 .e686fd30012f1c4cb6 ENCODED => e686fd30012f1c4cb6d78a241d911202, NAME => 'te
 d78a241d911202. st,,1500293557739.e686fd30012f1c4cb6d78a241d911202.', ST
 ARTKEY => '', ENDKEY => ''} 
 test,,1500293557739 column=info:seqnumDuringOpen, timestamp=1500293644890, v
 .e686fd30012f1c4cb6 alue=\x00\x00\x00\x00\x00\x00\x00\x02   
 d78a241d911202. 
 test,,1500293557739 column=info:server, timestamp=1500293644890, value=slave
 .e686fd30012f1c4cb6 1:16020 
 d78a241d911202. 
 test,,1500293557739 column=info:serverstartcode, timestamp=1500293644890, va
 .e686fd30012f1c4cb6 lue=1500279205588   
 d78a241d911202. 
 test1,,150028521182 column=info:regioninfo, timestamp=1500288301234, value={
 0.ee4ccd1877cf2d579 ENCODED => ee4ccd1877cf2d5793b41f62a1552257, NAME => 'te
 3b41f62a1552257.st1,,1500285211820.ee4ccd1877cf2d5793b41f62a1552257.', S
 TARTKEY => '', ENDKEY => ''}
 test1,,150028521182 column=info:seqnumDuringOpen, timestamp=1500288301234, v
 0.ee4ccd1877cf2d579 alue=\x00\x00\x00\x00\x00\x00\x00\x03   
 3b41f62a1552257.
 test1,,150028521182 column=info:server, timestamp=1500288301234, value=maste
 0.ee4ccd1877cf2d579 r:16020 
 3b41f62a1552257.
 test1,,150028521182 column=info:serverstartcode, timestamp=1500288301234, va
 0.ee4ccd1877cf2d579 lue=1500279162649   
 3b41f62a1552257.
3 row(s) in 2.0130 seconds

Then run split 'test':

hbase(main):003:0> split 'test'
0 row(s) in 1.4880 seconds

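Run scan 'hbase:meta' again. It now returns five rows, which shows that the split is in progress (presumably the parent region of test is still listed alongside its two daughter regions).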

Run scan 'hbase:meta' once more. It returns four rows, which shows that the split has completed.

merge_region

Description:

hbase(main):001:0> help 'merge_region'
Merge two regions. Passing 'true' as the optional third parameter will force
a merge ('force' merges regardless else merge will fail unless passed
adjacent regions. 'force' is for expert use only).

NOTE: You must pass the encoded region name, not the full region name so
this command is a little different from other region operations.  The encoded
region name is the hash suffix on region names: e.g. if the region name were
TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then
the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396

Examples:

  hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
  hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true

Merge example

hbase(main):002:0> merge_region '85ccd1a431c1f9c2736beeb544466616','234da87a49b63975c21b5b5f9c076b10',true
0 row(s) in 2.1020 seconds

Run scan 'hbase:meta' again. It returns three rows, which shows that the merge has completed.
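The merge_region help text above says the command takes the encoded region name, that is, the hash suffix of the full region name. Below is a minimal sketch of extracting that suffix from a row key copied out of hbase:meta. The class and method names (RegionNames, encodedName) are hypothetical and not part of the HBase API, and the sketch assumes the name ends with ".<encoded>." as in the help text.

```java
final class RegionNames {
    /**
     * Returns the encoded name (hash suffix) of a full region name such as
     * "TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396."
     * Assumes the name carries an encoded suffix; older-style names without
     * one are not handled.
     */
    static String encodedName(String fullRegionName) {
        // Drop the trailing '.' that terminates the region name, if present.
        String name = fullRegionName.endsWith(".")
                ? fullRegionName.substring(0, fullRegionName.length() - 1)
                : fullRegionName;
        // The encoded name is everything after the last remaining '.'.
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            throw new IllegalArgumentException("No encoded suffix in: " + fullRegionName);
        }
        return name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(encodedName(
                "TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396."));
    }
}
```

The printed value is the string that would be passed to merge_region as one of the two ENCODED_REGIONNAME arguments.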

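Region split policies
Region concepts
A Region is the basic element of availability and distribution for a table, and consists of one Store per column family. The object hierarchy is as follows: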

Table       (HBase table)

  Region       (Regions for the table)
       Store          (Store per ColumnFamily for each Region for the table)

           MemStore        (MemStore for each Store for each Region for the table)

             StoreFile       (StoreFiles for each Store for each Region for the table)

                 Block     (Blocks within a StoreFile within a Store for each Region for the table)
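  • Region size

Choosing a region size is tricky. The following factors need to be weighed.

A Region is the smallest unit of distributed storage and load balancing in HBase: different Regions are placed on different RegionServers. It is not, however, the smallest unit of storage.

A Region consists of one or more Stores, each holding one column family. Each Store in turn consists of one MemStore and zero or more StoreFiles. The MemStore lives in memory, while StoreFiles are stored on HDFS.
HBase achieves distribution by spreading regions across many machines. In other words, if you have 16GB of data in only 2 regions but 20 machines, 18 of them are wasted.
Too many regions degrade performance, although this is much better than it used to be. Still, for the same amount of data, 700 regions are better than 3000.

Too few regions hinder scalability and reduce parallelism, and sometimes the load is not spread widely enough. This is why most nodes sit idle when you load 200MB of data into a 10-node HBase cluster.

Within a RegionServer, the memory needed to index 1 region and 10 regions does not differ much.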
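The "2 regions on 20 machines" point above is simple arithmetic: a region is hosted by one RegionServer at a time, so the number of servers holding a table's data cannot exceed its region count. A small illustrative sketch follows; RegionSpread and its methods are hypothetical names, not HBase classes.

```java
final class RegionSpread {
    /** Upper bound on the RegionServers that can hold data for a table. */
    static int maxBusyServers(int regionCount, int serverCount) {
        return Math.min(regionCount, serverCount);
    }

    /** Servers that cannot receive any of the table's regions. */
    static int idleServers(int regionCount, int serverCount) {
        return serverCount - maxBusyServers(regionCount, serverCount);
    }

    public static void main(String[] args) {
        // 2 regions on a 20-machine cluster, as in the example above.
        System.out.println(idleServers(2, 20) + " servers idle");
    }
}
```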

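It is usually best to keep the default configuration. Hot tables can be given a smaller region size (or hot regions can be split manually to spread the load across the cluster). If your cells are large (100KB or more), you can set the region size to 1GB. The maximum region size is defined in the HBase configuration file: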

<property>
   <name>hbase.hregion.max.filesize</name>
   <value>10737418240</value>
 </property>

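Description:

When the StoreFile size in a region exceeds the value configured above, the region is split. The exact split policies are described below.

The value can also be set per table, for example in the hbase shell: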

create 't','f'
disable 't'
alter 't', METHOD => 'table_att', MAX_FILESIZE => '134217728'
enable 't'
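The size check described above can be sketched in plain Java. This is an illustrative model only, not HBase code: SplitCheck, Store and shouldSplit are hypothetical names, and the sketch assumes the check compares the total size of the files in each store (one store per column family) against the configured maximum.

```java
import java.util.Arrays;
import java.util.List;

final class SplitCheck {
    /** Models one Store: the StoreFiles of a single column family (sizes in bytes). */
    static final class Store {
        final List<Long> storeFileSizes;

        Store(List<Long> storeFileSizes) {
            this.storeFileSizes = storeFileSizes;
        }

        /** Total size of all StoreFiles in this store. */
        long size() {
            long total = 0;
            for (long s : storeFileSizes) {
                total += s;
            }
            return total;
        }
    }

    /** True if any store of the region has grown past maxFileSize. */
    static boolean shouldSplit(List<Store> regionStores, long maxFileSize) {
        for (Store store : regionStores) {
            if (store.size() > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long max = 128 * mb; // 134217728, the MAX_FILESIZE used in the shell example
        List<Store> small = Arrays.asList(new Store(Arrays.asList(40 * mb, 50 * mb)));
        List<Store> big = Arrays.asList(new Store(Arrays.asList(100 * mb, 60 * mb)));
        System.out.println(shouldSplit(small, max)); // 90 MB in the store
        System.out.println(shouldSplit(big, max));   // 160 MB in the store
    }
}
```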

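Region split policy
Region splits are carried out by the RegionServer on its own; the Master does not take part. To split a region, the RegionServer first takes the region offline, splits it, adds the daughter regions to the hbase:meta metadata, opens them on the RegionServer that hosted the parent, and finally reports the split to the Master.

The thread that performs the split is CompactSplitThread.

Custom split policies
The split policy is chosen by specifying an implementation of RegionSplitPolicy. The available implementations are: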

ConstantSizeRegionSplitPolicy
    IncreasingToUpperBoundRegionSplitPolicy
        DelimitedKeyPrefixRegionSplitPolicy
        KeyPrefixRegionSplitPolicy

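Setting hbase.hregion.max.filesize (default 10G) to a very large value does not by itself guarantee that regions will never split. The outcome depends on the policy in use:

IncreasingToUpperBoundRegionSplitPolicy is the default region split policy as of 0.94.0. The size that triggers a split is given by the formula min(r^2*flushSize,maxFileSize), where r is the number of online regions (of the same table on the same RegionServer) and maxFileSize is set by hbase.hregion.max.filesize.

ConstantSizeRegionSplitPolicy splits a region only when its size exceeds a constant value (hbase.hregion.max.filesize).

DelimitedKeyPrefixRegionSplitPolicy uses the prefix in front of a delimiter as the splitPoint, which guarantees that rows sharing the same RowKey prefix stay in one region.

KeyPrefixRegionSplitPolicy guarantees that rows with the same prefix stay in one region (the design must use prefixes of equal length). Regions are divided by a fixed number of leading rowkey bytes: the policy reads the table attribute prefix_split_key_policy.prefix_length, a number giving the prefix length, and truncates the splitPoint to that length when splitting. This policy suits rowkeys with a fixed-length prefix. If the table does not set the attribute, or its value is not an Integer, the policy behaves the same as IncreasingToUpperBoundRegionSplitPolicy.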
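The two prefix-based policies described above both shorten the candidate split point so that it falls on a prefix boundary. The sketch below illustrates the idea in plain Java; it is not the HBase implementation. PrefixSplitPoints and its methods are hypothetical names, and for simplicity the delimiter is assumed to be a single byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixSplitPoints {
    /** KeyPrefix style: keep only the first prefixLength bytes of the split point. */
    static byte[] truncateToPrefix(byte[] splitPoint, int prefixLength) {
        return Arrays.copyOf(splitPoint, Math.min(prefixLength, splitPoint.length));
    }

    /** DelimitedKeyPrefix style: keep the bytes in front of the first delimiter. */
    static byte[] truncateAtDelimiter(byte[] splitPoint, byte delimiter) {
        for (int i = 0; i < splitPoint.length; i++) {
            if (splitPoint[i] == delimiter) {
                return Arrays.copyOf(splitPoint, i);
            }
        }
        return splitPoint; // no delimiter found: leave the split point unchanged
    }

    public static void main(String[] args) {
        byte[] fixed = "ab1234".getBytes(StandardCharsets.UTF_8);
        byte[] delimited = "user01_20170717".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(truncateToPrefix(fixed, 2), StandardCharsets.UTF_8));
        System.out.println(new String(truncateAtDelimiter(delimited, (byte) '_'), StandardCharsets.UTF_8));
    }
}
```

Because the split point is cut back to the prefix, every row that starts with that prefix sorts at or after the split point and therefore lands in the same daughter region.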

IncreasingToUpperBoundRegionSplitPolicy

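This is the default region split policy as of 0.94.0. The size that triggers a split is given by the formula min(r^2*flushSize,maxFileSize). Assuming a flushSize of 128M:

First split size: min(10G,1*1*128M)=128M 
Second split size: min(10G,3*3*128M)=1152M 
Third split size: min(10G,5*5*128M)=3200M 
Fourth split size: min(10G,7*7*128M)=6272M 
Fifth split size: min(10G,9*9*128M)=10G 
Sixth split size: min(10G,11*11*128M)=10G 
As shown, the split size only reaches 10G from the fifth split onward.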
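The formula min(r^2*flushSize,maxFileSize) can be checked with a few lines of Java. This is a sketch of the arithmetic only, not the HBase class: SplitThreshold and thresholdBytes are hypothetical names, and falling back to maxFileSize for a non-positive region count is an assumption made here for robustness.

```java
final class SplitThreshold {
    /** Returns min(r^2 * flushSize, maxFileSize) in bytes. */
    static long thresholdBytes(int regionCount, long flushSize, long maxFileSize) {
        if (regionCount <= 0) {
            return maxFileSize; // assumption: no regions counted, use the upper bound
        }
        try {
            long squared = (long) regionCount * regionCount;
            long grown = Math.multiplyExact(squared, flushSize);
            return Math.min(grown, maxFileSize);
        } catch (ArithmeticException overflow) {
            // r^2 * flushSize no longer fits in a long, so the upper bound applies.
            return maxFileSize;
        }
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long flushSize = 128 * mb;          // flushSize of 128M, as assumed in the text
        long maxFileSize = 10L * 1024 * mb; // 10G
        for (int r = 1; r <= 11; r += 2) {
            long t = thresholdBytes(r, flushSize, maxFileSize);
            System.out.println("r=" + r + " -> " + (t / mb) + "M");
        }
    }
}
```

Running it prints 128M, 1152M, 3200M, 6272M, 10240M and 10240M for r = 1, 3, 5, 7, 9 and 11, which agrees with the list above (10240M being 10G).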

Configuring the split policy
You can define a global split policy in the HBase configuration file by setting hbase.regionserver.region.split.policy. You can also specify a policy when creating or altering a table:

// Update the split policy of an existing table

HBaseAdmin admin = new HBaseAdmin(conf);
HTable hTable = new HTable(conf, "test");
HTableDescriptor htd = hTable.getTableDescriptor();
HTableDescriptor newHtd = new HTableDescriptor(htd);
newHtd.setValue(HTableDescriptor.SPLIT_POLICY, KeyPrefixRegionSplitPolicy.class.getName()); // choose the policy
newHtd.setValue("prefix_split_key_policy.prefix_length", "2");
newHtd.setValue("MEMSTORE_FLUSHSIZE", "5242880"); // 5M
admin.disableTable("test");
admin.modifyTable(Bytes.toBytes("test"), newHtd);
admin.enableTable("test");

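Description:

The policies above suit different business scenarios. The third and fourth (DelimitedKeyPrefixRegionSplitPolicy and KeyPrefixRegionSplitPolicy) generally receive less attention and are used less often.
If you want to turn off automatic splitting and split manually instead, it is advisable to change both hbase.hregion.max.filesize and hbase.regionserver.region.split.policy.

Splitting and merging tables and regions programmatically
1. Split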

public static Configuration conf = null;
public static Connection conn = null;

/**
 * Static initializer: create the shared connection when the class is loaded.
 */
static {
    try {
        // Create the connection with the newer client API
        conf = HBaseConfiguration.create();
        conn = ConnectionFactory.createConnection(conf);
    } catch (IOException e) {
        e.printStackTrace();
    }

}

public static void main(String[] args) throws Exception {
    findRegions();
    splitTable();
}

/**
 * Print the number of regions of table "test".
 * @throws IOException
 */
public static void findRegions() throws IOException {
    try (Admin admin = conn.getAdmin()) {
        List<HRegionInfo> list = admin.getTableRegions(TableName.valueOf("test"));
        System.out.println(list.size());
    }
}

/**
 * Split table "test".
 * @throws IOException
 */
public static void splitTable() throws IOException {
    try (Admin admin = conn.getAdmin()) {
        // split() is asynchronous: returning only means the request was submitted
        admin.split(TableName.valueOf("test"));
        System.out.println("Table split succeeded");
    }
}

2. Partial output

2017-07-17 23:04:58,623 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.1.153:2181, sessionid = 0x15d502879ab0004, negotiated timeout = 90000
2017-07-17 23:04:59,790 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-07-17 23:04:59,792 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d502879ab0004 closed
1
2017-07-17 23:04:59,817 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase-admin-on-hconnection-0x397fbdb connecting to ZooKeeper ensemble=slave1:2181,slave2:2181
2017-07-17 23:04:59,818 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=slave1:2181,slave2:2181 sessionTimeout=90000 watcher=hbase-admin-on-hconnection-0x397fbdb0x0, quorum=slave1:2181,slave2:2181, baseZNode=/hbase
2017-07-17 23:04:59,824 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.1.153:2181. Will not attempt to authenticate using SASL (unknown error)
2017-07-17 23:04:59,827 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Socket connection established to slave2/192.168.1.153:2181, initiating session
2017-07-17 23:04:59,871 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.1.153:2181, sessionid = 0x15d502879ab0005, negotiated timeout = 90000
2017-07-17 23:05:00,008 INFO  [main] zookeeper.ZooKeeper: Session: 0x15d502879ab0005 closed

Split succeeded
2017-07-17 23:05:00,009 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
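3. Run echo "scan 'hbase:meta'" | hbase shell to inspect the metadata. There are four rows, which shows that the split has completed.

4. To split a single region, add the following code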

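/**
 * Split a single region.
 * @throws IOException
 */
public static void splitRegion() throws IOException {
    try (Admin admin = conn.getAdmin()) {
        // converBytes is a helper (not shown) that turns the region name String into a byte[]
        admin.splitRegion(converBytes("855cd5b9660a1dd68757ef8344902ab2"));
        System.out.println("Region split succeeded");
    }
}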

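5. To merge regions, add the following code

/**
 * Merge two regions, identified by their encoded region names.
 * @throws IOException
 */
public static void splitMerge_region() throws IOException {
    try (Admin admin = conn.getAdmin()) {
        // The third argument (true) forces the merge even if the regions are not adjacent
        admin.mergeRegions(converBytes("855cd5b9660a1dd68757ef8344902ab2"), converBytes("ed234e41da5b3ea3fdffec840e8eb29c"), true);
        System.out.println("Region merge succeeded");
    }
}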