表的设计
- 列蔟:推荐1-2个,能使用1个就不是使用2个
- 版本的设计:如果我们的项目不需要保存历史的版本,直接按照默认配置VERSIONS=1就OK。如果项目中需要保存历史的变更信息,就可以将VERSIONS设置为>1。但是设置为大于1也就意味着要占用更多的空间
- 数据的压缩:在创建表的时候,可以针对列蔟指定数据压缩方式(GZ、SNAPPY、LZO)。GZ方式是压缩比最高的,13%左右的空间,但是它的压缩和解压缩速度慢一些
避免热点的关键操作
-
预分区
- 在创建表的时候,配置一些策略,让一个table有多个region,分布在不同的HRegionServer中
- HBase会自动进行split,如果一个region过大,HBase会自动split成两个,就是根据rowkey来横向切分
-
rowkey设计
-
反转:举例:手机号码、时间戳,可以将手机号码反转 -
加盐:在rowkey前面加随机数,加了随机数之后,就会导致数据查询不出来,因为HBase默认是没有二级索引的 -
hash:根据rowkey中的某个部分取hash,因为hash每次计算都一样的值。所以,我们可以用hash操作获取数据 -
这几种策略,因为要将数据均匀分布在集群中的每个RegionServer,所以其核心就是把rowkey打散后放入到集群节点中,所以数据不再是有序的存储,会导致scan的效率下降
预分区
-
预分区有两种策略
- startKey、endKey来预分区 [10, 40, 50]
- 直接指定数量,startKey、endKey由hbase自动生成,还需要指定key的算法
-
HBase的数据都是存放在HDFS中
- /hbase/data/命名空间/表/列蔟/StoreFiles
建表指令
create_namespace 'MOMO_CHAT'
list_namespace
drop_namespace 'MOMO_CHAT'
describe_namespace 'MOMO_CHAT'
describe_namespace 'default'
create "MOMO_CHAT:MSG", "C1"
alter "MOMO_CHAT:MSG", {NAME => "C1", COMPRESSION => "GZ"}
disable "MOMO_CHAT:MSG"
drop "MOMO_CHAT:MSG"
create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}
可以看到已经有了六个region。
随机生成一条消息
- 通过ExcelReader工具类从Excel文件中读取数据,放入到一个Map结构中
- key:字段名
- value:List,字段对应的数据列表
- 创建getOneMessage方法,这个方法专门用来根据Excel读取到的数据,随机生成一个Msg实体对象
- 调用ExcelReader.randomColumn方法来随机获取一个列的数据
- 注意:消息使用的是系统当前时间,时间的格式是:年-月-日 小时:分钟:秒
public class MoMoMsgGen {
public static void main(String[] args) {
// 读取Excel文件中的数据
Map<String, List<String>> resultMap =
ExcelReader.readXlsx("D:\\课程研发\\51.V8.0_NoSQL_MQ\\2.HBase\\3.代码\\momo_chat_app\\data\\测试数据集.xlsx", "陌陌数据");
System.out.println(getOneMessage(resultMap));
}
/**
* 基于从Excel表格中读取的数据随机生成一个Msg对象
* @param resultMap Excel读取的数据(Map结构)
* @return 一个Msg对象
*/
public static Msg getOneMessage(Map<String, List<String>> resultMap) {
// 1. 构建Msg实体类对象
Msg msg = new Msg();
// 将当前系统的时间设置为消息的时间,以年月日 时分秒的形式存储
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
// 获取系统时间
Date now = new Date();
msg.setMsg_time(simpleDateFormat.format(now));
// 2. 调用ExcelReader中的randomColumn随机生成一个列的数据
// 初始化sender_nickyname字段,调用randomColumn随机取nick_name设置数据
msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname"));
msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account"));
msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex"));
msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip"));
msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os"));
msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type"));
msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network"));
msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps"));
msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname"));
msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip"));
msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account"));
msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os"));
msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type"));
msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network"));
msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps"));
msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex"));
msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type"));
msg.setDistance(ExcelReader.randomColumn(resultMap, "distance"));
msg.setMessage(ExcelReader.randomColumn(resultMap, "message"));
// 3. 注意时间使用系统当前时间
return msg;
}
}
生成rowkey
- ROWKEY = MD5Hash_发件人账号_收件人账号_消息时间戳
- MD5Hash.getMD5AsHex生成MD5值,为了缩短rowkey,取前8位
public static byte[] getRowkey(Msg msg) throws ParseException {
StringBuilder builder = new StringBuilder();
builder.append(msg.getSender_account());
builder.append("_");
builder.append(msg.getReceiver_account());
builder.append("_");
String msgDateTime = msg.getMsg_time();
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Date msgDate = simpleDateFormat.parse(msgDateTime);
long timestamp = msgDate.getTime();
builder.append(timestamp);
String md5AsHex = MD5Hash.getMD5AsHex(builder.toString().getBytes());
String md5Hex8bit = md5AsHex.substring(0, 8);
String rowkeyString = md5Hex8bit + "_" + builder.toString();
System.out.println(rowkeyString);
return Bytes.toBytes(rowkeyString);
}
将随机生成的数据推入到HBase
public static void main(String[] args) throws ParseException, IOException {
Map<String, List<String>> resultMap =
ExcelReader.readXlsx("D:\\课程研发\\51.V8.0_NoSQL_MQ\\2.HBase\\3.代码\\momo_chat_app\\data\\测试数据集.xlsx", "陌陌数据");
Configuration config = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
int i = 0;
int MAX = 100000;
while (i < MAX) {
Msg msg = getOneMessage(resultMap);
byte[] rowkey = getRowkey(msg);
String cf = "C1";
String colMsg_time = "msg_time";
String colSender_nickyname = "sender_nickyname";
String colSender_account = "sender_account";
String colSender_sex = "sender_sex";
String colSender_ip = "sender_ip";
String colSender_os = "sender_os";
String colSender_phone_type = "sender_phone_type";
String colSender_network = "sender_network";
String colSender_gps = "sender_gps";
String colReceiver_nickyname = "receiver_nickyname";
String colReceiver_ip = "receiver_ip";
String colReceiver_account = "receiver_account";
String colReceiver_os = "receiver_os";
String colReceiver_phone_type = "receiver_phone_type";
String colReceiver_network = "receiver_network";
String colReceiver_gps = "receiver_gps";
String colReceiver_sex = "receiver_sex";
String colMsg_type = "msg_type";
String colDistance = "distance";
String colMessage = "message";
Put put = new Put(rowkey);
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_time), Bytes.toBytes(msg.getMsg_time()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_nickyname), Bytes.toBytes(msg.getSender_nickyname()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_account), Bytes.toBytes(msg.getSender_account()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_sex), Bytes.toBytes(msg.getSender_sex()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_ip), Bytes.toBytes(msg.getSender_ip()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_os), Bytes.toBytes(msg.getSender_os()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_phone_type), Bytes.toBytes(msg.getSender_phone_type()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_network), Bytes.toBytes(msg.getSender_network()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_gps), Bytes.toBytes(msg.getSender_gps()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_nickyname), Bytes.toBytes(msg.getReceiver_nickyname()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_ip), Bytes.toBytes(msg.getReceiver_ip()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_account), Bytes.toBytes(msg.getReceiver_account()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_os), Bytes.toBytes(msg.getReceiver_os()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_phone_type), Bytes.toBytes(msg.getReceiver_phone_type()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_network), Bytes.toBytes(msg.getReceiver_network()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_gps), Bytes.toBytes(msg.getReceiver_gps()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_sex), Bytes.toBytes(msg.getReceiver_sex()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_type), Bytes.toBytes(msg.getMsg_type()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colDistance), Bytes.toBytes(msg.getDistance()));
put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMessage), Bytes.toBytes(msg.getMessage()));
table.put(put);
++i;
System.out.println(i + " / " + MAX);
}
table.close();
connection.close();
}
这里写入数据的数量为10w,可以看到这个请求是均匀分布在region中的。
实现getMessage数据服务接口
使用scan + filter实现的
- 构建scan对象
- 构建4个filter(开始日期查询、结束日期查询、发件人、收件人)
- 构建一个Msg对象列表
public List<Msg> getMessage(String date, String sender, String receiver) throws Exception {
Scan scan = new Scan();
String startDateStr = date + " 00:00:00";
String endDateStr = date + " 23:59:59";
SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
, Bytes.toBytes("msg_time")
, CompareFilter.CompareOp.GREATER_OR_EQUAL
, new BinaryComparator(Bytes.toBytes(startDateStr)));
SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
, Bytes.toBytes("msg_time")
, CompareFilter.CompareOp.LESS_OR_EQUAL
, new BinaryComparator(Bytes.toBytes(endDateStr)));
SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
, Bytes.toBytes("sender_account")
, CompareFilter.CompareOp.EQUAL
, new BinaryComparator(Bytes.toBytes(sender)));
SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1")
, Bytes.toBytes("receiver_account")
, CompareFilter.CompareOp.EQUAL
, new BinaryComparator(Bytes.toBytes(receiver)));
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL
, startDateFilter
, endDateFilter
, senderFilter
, receiverFilter);
scan.setFilter(filterList);
Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG"));
ResultScanner resultScanner = table.getScanner(scan);
Iterator<Result> iterator = resultScanner.iterator();
ArrayList<Msg> msgList = new ArrayList<>();
while (iterator.hasNext()) {
Result result = iterator.next();
Msg msg = new Msg();
String rowkey = Bytes.toString(result.getRow());
List<Cell> cellList = result.listCells();
for (Cell cell : cellList) {
String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength());
if (columnName.equals("msg_time")) {
msg.setMsg_time(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_nickyname")) {
msg.setSender_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_account")) {
msg.setSender_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_sex")) {
msg.setSender_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_ip")) {
msg.setSender_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_os")) {
msg.setSender_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_phone_type")) {
msg.setSender_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_network")) {
msg.setSender_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("sender_gps")) {
msg.setSender_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_nickyname")) {
msg.setReceiver_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_ip")) {
msg.setReceiver_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_account")) {
msg.setReceiver_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_os")) {
msg.setReceiver_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_phone_type")) {
msg.setReceiver_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_network")) {
msg.setReceiver_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_gps")) {
msg.setReceiver_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("receiver_sex")) {
msg.setReceiver_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("msg_type")) {
msg.setMsg_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("distance")) {
msg.setDistance(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
if (columnName.equals("message")) {
msg.setMessage(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
}
}
msgList.add(msg);
}
resultScanner.close();
table.close();
return msgList;
}
先执行下这个shell看下结果
scan 'MOMO_CHAT:MSG' , {COLUMNS => ['C1:sender_account', 'C1:receiver_account', 'C1:msg_time'], FILTER => "SingleColumnValueFilter('C1', 'sender_account', =, 'binary:13514684105') AND SingleColumnValueFilter('C1', 'receiver_account', = , 'binary:13647128512')"}
最后附上代码地址:https://github.com/fafeidou/momo_chat_app
|