Flink's AggregateFunction performs incremental computation on an intermediate accumulator state. Because it aggregates iteratively, record by record, the window operator does not need to buffer all of a window's elements during processing, which makes it very efficient.
@PublicEvolving
public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}
A custom aggregate function implements the AggregateFunction interface, which declares four methods:
a. ACC createAccumulator(); creates a new accumulator and starts a new aggregation; it is responsible for initializing the iterative state.
b. ACC add(IN value, ACC accumulator); adds one input record to the accumulator; this is where the per-record aggregation logic is implemented.
c. ACC merge(ACC a, ACC b); merges two accumulators and returns an accumulator with the merged state.
d. OUT getResult(ACC accumulator); extracts the final aggregation result from the accumulator.
3. Custom aggregate function MyCountAggregate
-
package com.hadoop.ljs.flink110.aggreagate;
import org.apache.flink.api.common.functions.AggregateFunction;
/**
* @author: Created By lujisen
* @company ChinaUnicom Software JiNan
* @date: 2020-04-15 22:00
* @version: v1.0
* @description: com.hadoop.ljs.flink110.aggreagate
* Type parameters: input type (IN), accumulator type (ACC), and output type (OUT).
*/
public class MyCountAggregate implements AggregateFunction<ProductViewData, Long, Long> {
    @Override
    public Long createAccumulator() {
        /* initialize the view count to 0 */
        return 0L;
    }
    @Override
    public Long add(ProductViewData value, Long accumulator) {
        /* each incoming record simply increments the count by 1 */
        return accumulator + 1;
    }
    @Override
    public Long getResult(Long accumulator) {
        return accumulator;
    }
    /* merge two partial counts */
    @Override
    public Long merge(Long a, Long b) {
        return a + b;
    }
}
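Both MyCountAggregate and the main program below reference a ProductViewData POJO that is not shown in the original post. A minimal sketch, assuming the field names and assuming that record[2] is the operation type and record[3] the event timestamp (to match the CSV parsing in the main program), might look like this:
package com.hadoop.ljs.flink110.aggreagate;
/* Assumed POJO for a product-view event; only productId, operationType and
 * timestamp are actually referenced by the example code. The name userId and
 * the order of the two Long fields are assumptions, not from the post. */
public class ProductViewData {
    public String productId;    /* product identifier, record[0] */
    public String userId;       /* assumed: user identifier, record[1] */
    public Long operationType;  /* assumed: operation type, 1 = click/view, record[2] */
    public Long timestamp;      /* assumed: event time in milliseconds, record[3] */
    public ProductViewData(String productId, String userId, Long operationType, Long timestamp) {
        this.productId = productId;
        this.userId = userId;
        this.operationType = operationType;
        this.timestamp = timestamp;
    }
}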
4. Custom window function -
package com.hadoop.ljs.flink110.aggreagate;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
/**
* @author: Created By lujisen
* @company ChinaUnicom Software JiNan
* @date: 2020-04-15 21:56
* @version: v1.0
* @description: com.hadoop.ljs.flink110.aggreagate
* Custom window function that wraps the aggregate result into strings.
* The first type parameter (IN) is the output of MyCountAggregate above, i.e. the per-product view count.
* The second type parameter (OUT) is the output type; for this demo we simply emit strings.
* The third type parameter is the key type, and the last one is the window type,
* which gives access to the window end time.
*/
public class MyCountWindowFunction2 implements WindowFunction<Long, String, String, TimeWindow> {
    @Override
    public void apply(String productId, TimeWindow window, Iterable<Long> input, Collector<String> out) throws Exception {
        /* emit the per-product view count for this window; the Iterable contains
           exactly one element, the pre-aggregated result from MyCountAggregate */
        out.collect("----------------window end time: " + window.getEnd());
        out.collect("productId: " + productId + " view count: " + input.iterator().next());
    }
}
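As a side note (not part of the original post), the same window logic can also be written with ProcessWindowFunction, the more general alternative to WindowFunction; a minimal sketch under that assumption:
package com.hadoop.ljs.flink110.aggreagate;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
/* Same per-product count output as MyCountWindowFunction2, but written as a
 * ProcessWindowFunction; the window is obtained from the Context object. */
public class MyCountProcessWindowFunction extends ProcessWindowFunction<Long, String, String, TimeWindow> {
    @Override
    public void process(String productId, Context context, Iterable<Long> input, Collector<String> out) throws Exception {
        out.collect("----------------window end time: " + context.window().getEnd());
        out.collect("productId: " + productId + " view count: " + input.iterator().next());
    }
}
Such a class could be passed to the same aggregate(new MyCountAggregate(), ...) call used in the main program below.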
5. The main program, as follows: -
package com.hadoop.ljs.flink110.aggreagate;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
/**
* @author: Created By lujisen
* @company ChinaUnicom Software JiNan
* @date: 2020-04-14 11:28
* @version: v1.0
* @description: com.hadoop.ljs.flink110.aggreagate
* Uses the custom aggregate function and window function classes to count product page views over a sliding time window and prints the per-product counts.
*/
public class AggregateFunctionMain2 {
    public static int windowSize = 6000;   /* sliding window size, in milliseconds */
    public static int windowSlider = 3000; /* slide interval of the window, in milliseconds */
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setParallelism(1);
        senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        /*DataStream<String> sourceData = senv.socketTextStream("localhost",9000);*/
        //read the input from a file; it could also be read from a socket as in the commented line above
        DataStream<String> sourceData = senv.readTextFile("D:\\projectData\\ProductViewData2.txt");
        DataStream<ProductViewData> productViewData = sourceData.map(new MapFunction<String, ProductViewData>() {
            @Override
            public ProductViewData map(String value) throws Exception {
                String[] record = value.split(",");
                return new ProductViewData(record[0], record[1], Long.valueOf(record[2]), Long.valueOf(record[3]));
            }
        }).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<ProductViewData>() {
            @Override
            public long extractAscendingTimestamp(ProductViewData element) {
                return element.timestamp;
            }
        });
        /* keep only records whose operationType is 1, i.e. click/view events */
        DataStream<String> productViewCount = productViewData.filter(new FilterFunction<ProductViewData>() {
            @Override
            public boolean filter(ProductViewData value) throws Exception {
                return value.operationType == 1;
            }
        }).keyBy(new KeySelector<ProductViewData, String>() {
            @Override
            public String getKey(ProductViewData value) throws Exception {
                return value.productId;
            }
        })
        //6-second time window, sliding every 3 seconds
        .timeWindow(Time.milliseconds(windowSize), Time.milliseconds(windowSlider))
        /* incremental aggregation per key and per window */
        .aggregate(new MyCountAggregate(), new MyCountWindowFunction2());
        //print the aggregated results
        productViewCount.print();
        senv.execute("AggregateFunctionMain2");
    }
}
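The contents of ProductViewData2.txt are not shown in the post. Given the map function above, each line is assumed to be a comma-separated record of the form productId,userId,operationType,timestamp; purely illustrative lines (values invented for this sketch) would look like:
product1,user1,1,1586872800000
product2,user1,1,1586872801000
product1,user2,1,1586872803000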
That concludes this demo of a custom aggregate function. Thanks for reading!