Hudi + Hive integration: error when the job is submitted to YARN
User class threw exception: org.apache.spark.sql.streaming.StreamingQueryException: Following instants have timestamps >= compactionInstant (20210710211157) Instants :[[20210713205004__deltacommit__COMPLETED], [20210713205200__deltacommit__COMPLETED], [20210714091634__deltacommit__COMPLETED]]
=== Streaming Query ===
Identifier: action2hudi [id = d73882f0-9623-4591-b0a8-78747bf56ab4, runId = f3986e44-be32-4f22-bd92-a93d016daa38]
Current Committed Offsets: {KafkaV2[Subscribe[news]]: {"news":{"0":7545}}}
Current Available Offsets: {KafkaV2[Subscribe[news]]: {"news":{"0":8120}}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
Project [cast(value#8 as string) AS value#101]
+- StreamingExecutionRelation KafkaV2[Subscribe[news]], [key#7, value#8, topic#9, partition#10, offset#11L, timestamp#12, timestampType#13]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:297)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: java.lang.IllegalArgumentException: Following instants have timestamps >= compactionInstant (20210710211157) Instants :[[20210713205004__deltacommit__COMPLETED], [20210713205200__deltacommit__COMPLETED], [20210714091634__deltacommit__COMPLETED]]
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
Hive needs the HoodieParquetInputFormat class to read Hudi's data layout; this class ships in hudi-hadoop-mr-bundle-0.5.2-incubating.jar.
Copy the matching version of hudi-hadoop-mr-bundle-0.5.2-incubating.jar into the hive/lib/ directory, then restart the Hive services (metastore and hiveserver2).
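The steps above can be sketched as shell commands. This is a minimal sketch: the `$HUDI_HOME` and `$HIVE_HOME` paths and the background-restart style are assumptions about the deployment, not part of the original post; adjust them to your installation.

```shell
# Assumption: $HUDI_HOME points at the Hudi source/release tree and
# $HIVE_HOME at the Hive installation; adapt both to your environment.
cp "$HUDI_HOME"/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.5.2-incubating.jar \
   "$HIVE_HOME"/lib/

# Restart the Hive services so they pick up the new jar.
# (Stop any running metastore/hiveserver2 processes first.)
hive --service metastore &
hive --service hiveserver2 &
```

The jar version should match the Hudi version used by the Spark job that writes the table, otherwise Hive may still fail to resolve the input format.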