一、需求背景
- 本次任务主要是完成对表中的每一行数据进行求和计算。
- 涉及的表数据如下所示:
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:2","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:0","MoSig:26","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:2","MoSig:15","MoData:8","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:4","MoData:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:2","MoData:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:9","MoData:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:0","MoData:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:13","MoSig:2","MoData:27","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:0","MoData:2","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:4","MoData:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
二、实现方法
(一) 通过对本次的需求任务的解读,将实现方法大概分为以下几个步骤:
- 完成数组中括号[]的去除工作。
·主要使用的是hive中的正则表达式替换函数:regexp_replace来完成本次的替换; ·语法: regexp_replace(string A, string B, string C) ·返回值: string
select regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}','') from 5gr1126;
效果图展示:
- 我们将每一行的字符串转换成map形式
·我们通过使用hive中的str_to_map函数来完成本次的转换操作; ·语法:str_to_map(字符串参数, 分隔符1, 分隔符2) ·返回值:使用两个分隔符将文本拆分为键值对
select str_to_map(regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}','')) from 5gr1126;
效果图展示:
- 获取map key-value对中的所有value值
·我们通过hive中的map_values函数来完成本次的获取值操作; ·语法:map_values(map) ·返回值:包含输入映射值的无序数组
select map_values(str_to_map(regexp_replace(SalaryArray,'\"|\\[|\\]|\\{|\\}',''))) from 5gr1126;
效果图展示:
- 每一行数值累加求和
·我们通过hive UDF自定义函数来完成本次的数值累加求和 ·输入参数:数组列表,开始位置,结束位置 ·输出结果:最终的和 ·逻辑主要包含对科学技术法数值的转换以及是否为数值的过滤
public class AddNumUDF extends UDF {
public static void main(String[] args) {
ArrayList<String> strArray = new ArrayList<String>();
strArray.add("1E2");
strArray.add("a");
System.out.println(new AddNumUDF().evaluate(strArray, 0, 2));
}
public double evaluate(ArrayList<String> list,
int from, int to) {
double result = 0;
if (list == null || list.size() < 1) {
return result;
}
List<String> subList;
if (to == -1) {
subList = list.subList(from, list.size());
} else {
subList = list.subList(from, to);
}
for (String i : subList) {
if (i.contains("e") || i.contains("E") || isInt(i)) {
BigDecimal bd = new BigDecimal(i);
String str = bd.toPlainString();
Double a = Double.parseDouble(str);
result += a;
} else {
i = "0";
BigDecimal bd = new BigDecimal(i);
String str = bd.toPlainString();
Double a = Double.parseDouble(str);
result += a;
}
}
return result;
}
public static boolean isInt(String v) {
if (v == null)
return false;
String regEx1 = "[\\-|\\+]?\\d+";
Pattern p;
Matcher m;
p = Pattern.compile(regEx1);
m = p.matcher(v);
if (m.matches()) return true;else return false;
}
}
- 程序打成jar包,上传HS2服务器本地或者HDFS;
- 客户端命令行添加jar包到hive的classpath:hive>add jar /xxx/udf.jar;
- 注册成为临时函数(给udf命名),create temporary function 函数名 as ‘UDF类全名称’;
- 测试
效果图如下所示:
|