Flink
Course outline:
- Unified batch and stream processing
- Supported data types
- Programming model
- DataStream API (****): data sources (built-in, third-party, custom), transformations, sinks (built-in, third-party, custom)
- Time & windows & watermarks (*****)
- Connectors
- State management
- Table API & SQL (the 1.11 series is completely different from 1.10)
- CEP
- Project: build a pipeline that is real-time end to end, from data ingestion onward
1. Getting Started with Flink
1.1 Flink Batch Processing
Note: starting with Flink 1.11, flink-clients must be declared explicitly to run jobs from the IDE; it is no longer pulled in transitively.

<properties>
    <scala.binary.version>2.12</scala.binary.version>
    <flink.version>1.11.2</flink.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
</dependencies>
package com.hpznyf.flink.basic
import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._
object BatchWCApp {
  def main(args: Array[String]): Unit = {
    // Batch execution environment: the input file is a bounded data set
    val env = ExecutionEnvironment.getExecutionEnvironment
    val text = env.readTextFile("data/wc.txt")
    // Split each line on commas, emit (word, 1), group by word (field 0), sum counts (field 1)
    text.flatMap(_.split(",")).map((_, 1)).groupBy(0).sum(1).print()
  }
}
1.2 Flink Stream Processing
package com.hpznyf.flink.basic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.scala._
object StreamingWCApp {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Unbounded source: one record per line received on the socket
    val text = env.socketTextStream("hadoop001", 9527)
    text.flatMap(_.split(",")).map((_, 1)).keyBy(_._1).sum(1).print()
    // A streaming job only starts running once execute() is called
    env.execute(getClass.getCanonicalName)
  }
}
[root@hadoop001 ~]# nc -l -p 9527
ruoze,ruoze,ruoze
pk,pk,pk
Think about it: is there state in this job? There certainly is! The per-key counts keep accumulating across records, and each printed result is prefixed with a fixed subtask (partition) number, as illustrated by the sketch below.
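keyBy(_._1).sum(1) maintains a running total per key internally. As a rough illustration only (this is not how sum is implemented, and CountFunction is a hypothetical name), the same running count can be expressed with explicit keyed state via ValueState:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Hypothetical sketch: one ValueState[Int] per key holds the running count
class CountFunction extends KeyedProcessFunction[String, (String, Int), (String, Int)] {

  private var countState: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // Keyed state: Flink scopes this ValueState to the current key automatically
    countState = getRuntimeContext.getState(
      new ValueStateDescriptor[Int]("count", classOf[Int]))
  }

  override def processElement(
      value: (String, Int),
      ctx: KeyedProcessFunction[String, (String, Int), (String, Int)]#Context,
      out: Collector[(String, Int)]): Unit = {
    // value() is null before the first update; Scala unboxes that to 0
    val current = countState.value() + value._2
    countState.update(current)
    out.collect((value._1, current))
  }
}

Swapping sum(1) for process(new CountFunction) in the pipeline above should print the same per-key running counts.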
If you set the parallelism to 1 with setParallelism(1), there is effectively no parallelism: every record goes through the same single subtask.
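A minimal sketch of where the call goes (ParallelismDemo is a made-up name; parallelism can be set job-wide on the environment or per operator):

import org.apache.flink.streaming.api.scala._

object ParallelismDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1) // job-wide: every operator runs as a single subtask

    env.socketTextStream("hadoop001", 9527)
      .flatMap(_.split(","))
      .map((_, 1))
      .keyBy(_._1)
      .sum(1)
      .setParallelism(1) // or per operator, overriding the job-wide setting
      .print()

    env.execute("ParallelismDemo")
  }
}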
1.3 Batch Processing (Java)
package com.ruozedata.flink;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
public class BatchWCApp {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();
        executionEnvironment.readTextFile("data/wc.txt")
                .flatMap(new RuozedataFlatMapFunction()) // line -> words
                .map(new RuozedataMapFunction())         // word -> (word, 1)
                .groupBy(0) // group by the word (tuple field 0)
                .sum(1)     // sum the counts (tuple field 1)
                .print();
    }
}

class RuozedataFlatMapFunction implements FlatMapFunction<String, String> {
    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        String[] splits = value.split(",");
        for (String word : splits) {
            out.collect(word);
        }
    }
}

class RuozedataMapFunction implements MapFunction<String, Tuple2<String, Integer>> {
    @Override
    public Tuple2<String, Integer> map(String value) throws Exception {
        return new Tuple2<>(value, 1);
    }
}
1.4 Stream Processing (Java)
package com.ruozedata.flink;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
public class StreamWCApp {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
        executionEnvironment.socketTextStream("hadoop001", 9527)
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String value, Collector<String> out) throws Exception {
                        String[] split = value.split(",");
                        for (String word : split) {
                            out.collect(word);
                        }
                    }
                })
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        return new Tuple2<>(value, 1);
                    }
                })
                .keyBy(0) // key by the word (tuple field 0)
                .sum(1)
                .print();
        // A streaming job only starts running once execute() is called
        executionEnvironment.execute(StreamWCApp.class.getCanonicalName());
    }
}