[Big Data] The column KEY._col2:0._col0 is not in the vectorization context...

Problem scenario

Running an HQL query from a shell script failed with this error:

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: The column KEY._col2:0._col0 is not in the vectorization context column map {KEY._col0=0, KEY._col1=1, KEY._col2=2, VALUE._col1=3}.

The SQL statement being run:

select content_sort, content_type, count(distinct postid) as content_cnt, sum(topic_num) as topic_cnt
from tmp_post where content_sort = 'post' group by content_type, content_sort
union
select content_sort, content_type, count(distinct ugc_id) as content_cnt, sum(topic_num) as topic_cnt
from tmp_ugc where content_sort = 'blog' group by content_type, content_sort

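I still don't know the root cause. After repeated testing, I concluded that a query combining count(distinct ...) with union makes Hive's vectorized execution fail (note: the execution engine here is Spark).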

Solution

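After reading an expert's blog post (https://www.codenong.com/jscb200f6bd25b/), I opened the Hive client and applied the following setting:

set hive.vectorized.execution.enabled = false;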

When I re-ran the shell script, however, it failed with exactly the same error (so apparently this setting cannot be applied globally?).
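On the second attempt, I added the setting directly to the script, immediately before the SQL statement, so that vectorized execution is turned off every time the statement runs. This time the script ran to completion.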
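
The post doesn't show the script itself, so here is a minimal sketch of what the fixed version might look like. It assumes the script drives the Hive CLI with `hive -f`, and the file name `query_with_fix.hql` is hypothetical. The point it illustrates is that the `set` line and the query go to the same Hive invocation, so both run in one session. The `command -v hive` guard is only there so the sketch can be dry-run on a machine without Hive.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the Hive CLI ("hive") is on PATH.
# query_with_fix.hql is a hypothetical file name.
set -euo pipefail  # bash strict mode; unrelated to Hive's own `set` command

hql_file="query_with_fix.hql"

# Write the HQL. The `set` line comes first so it applies to the
# statement that follows in the same Hive session.
cat > "$hql_file" <<'EOF'
set hive.vectorized.execution.enabled = false;
select content_sort, content_type, count(distinct postid) as content_cnt, sum(topic_num) as topic_cnt
from tmp_post where content_sort = 'post' group by content_type, content_sort
union
select content_sort, content_type, count(distinct ugc_id) as content_cnt, sum(topic_num) as topic_cnt
from tmp_ugc where content_sort = 'blog' group by content_type, content_sort;
EOF

# Run the file in a single Hive session if the CLI is available;
# otherwise just report that the file was generated.
if command -v hive >/dev/null 2>&1; then
  hive -f "$hql_file"
else
  echo "hive CLI not found; generated $hql_file only" >&2
fi
```

Putting the setting in the same file as the query keeps the two together, so the fix can't be lost by running the query in a fresh session without it.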
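Conclusion: running set hive.vectorized.execution.enabled = false; in an interactive Hive client has no effect on the script. A Hive `set` only applies to the current session, and each run of the script starts a new one. The setting has to be issued in the script itself, in the same session and before the SQL statement.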