Table of Contents
Prerequisites
Deployment Plan
Maven Installation
Download URL
Extract the Archive
Configure the Aliyun Mirror
Configure Environment Variables
Building Spark from Source
Download URL
Extract
Build
Configure Hive on Spark
Spark Configuration
Hive Configuration
Verification
References
Prerequisites
- hadoop-3.2.0
- Scala 2.11.8
Deployment Plan
| No. | Server IP | Hostname | Installed Software | Running Processes |
| --- | --- | --- | --- | --- |
| 1 | 192.168.21.131 | master | hadoop, zookeeper, hive-metastore, spark, hue | QuorumPeerMain, ResourceManager, JournalNode, DFSZKFailoverController, NameNode |
| 2 | 192.168.21.132 | slave1 | hadoop, zookeeper, spark | QuorumPeerMain, JournalNode, DataNode, NodeManager |
| 3 | 192.168.21.133 | slave2 | hadoop, zookeeper, spark | QuorumPeerMain, JournalNode, DataNode, NodeManager |
| 4 | 192.168.21.134 | slave3 | hadoop, spark | DataNode, NodeManager |
| 5 | 192.168.21.135 | master2 | hadoop, hive-Client | ResourceManager, DFSZKFailoverController, NameNode |
Maven Installation
Install Maven on master, slave1, slave2, and slave3, the nodes where Spark is planned to be installed.
Download URL
https://archive.apache.org/dist/maven/maven-3/
Extract the Archive
tar xzvf apache-maven-3.2.5-bin.tar.gz
Move the folder to the /opt directory:
mv apache-maven-3.2.5 /opt/maven-3.2.5
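If you prefer to install Maven once and copy it to the remaining nodes, a minimal sketch is below; it assumes passwordless SSH as root and the hostnames from the deployment plan, so adjust the user and paths to your environment. The environment-variable step further down still has to be repeated on each node.
# Sketch: distribute the Maven directory to the other planned nodes (assumes passwordless SSH as root)
for host in slave1 slave2 slave3; do
  scp -r /opt/maven-3.2.5 root@${host}:/opt/
done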
Configure the Aliyun Mirror
vim /opt/maven-3.2.5/conf/settings.xml
# Add the following configuration
<!-- mirrors
| This is a list of mirrors to be used in downloading artifacts from remote repositories.
|
| It works like this: a POM may declare a repository to use in resolving certain artifacts.
| However, this repository may have problems with heavy traffic at times, so people have mirrored
| it to several places.
|
| That repository definition will have a unique id, so we can create a mirror reference for that
| repository, to be used as an alternate download site. The mirror site will be the preferred
| server for that repository.
|-->
<mirrors>
<!-- mirror
| Specifies a repository mirror site to use instead of a given repository. The repository that
| this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
| for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
|
<mirror>
<id>mirrorId</id>
<mirrorOf>repositoryId</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://my.repository.com/repo/path</url>
</mirror>
-->
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>central</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
</mirrors>
Configure Environment Variables
Edit ~/.bashrc and add the following at the bottom:
#maven
export MAVEN_HOME=/opt/maven-3.2.5
export PATH=$PATH:$MAVEN_HOME/bin
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
source ~/.bashrc
Check whether Maven was installed successfully:
mvn -version
Building Spark from Source
Download URL
https://archive.apache.org/dist/spark/spark-2.3.0/
Extract
tar -zxvf spark-2.3.0.tgz
Build
Enter the spark-2.3.0 directory:
cd spark-2.3.0
Run:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=3.2.0 -DskipTests clean package
Enter the spark-2.3.0/dev directory:
cd dev/
Run:
./make-distribution.sh --name 3.2.0-hive --tgz -Pyarn -Phadoop-2.7 -Dhadoop.version=3.2.0
Notes on the parameters: --name sets the name suffix of the generated distribution file (3.2.0-hive here); -Pyarn enables YARN support; -Phadoop-2.7 selects the Hadoop build profile (the hadoop-3.2 profile was tried first, but the build reported that it does not exist in Spark 2.3.0, so hadoop-2.7 was used and the build succeeded); -Dhadoop.version=3.2.0 sets the Hadoop version of the runtime environment.
After the build finishes, the generated distribution package appears in the spark-2.3.0 directory.
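The exact file name follows make-distribution.sh's spark-<version>-bin-<name>.tgz convention, so with the options above it should be something like spark-2.3.0-bin-3.2.0-hive.tgz; this is an assumption, so check what was actually produced. A minimal sketch for unpacking it to /opt/spark-2.3.0, the path assumed by the configuration later in this article:
# The tarball name below is an assumption based on the --name value; adjust it to the file actually generated
tar -zxvf spark-2.3.0-bin-3.2.0-hive.tgz
mv spark-2.3.0-bin-3.2.0-hive /opt/spark-2.3.0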
Configure Hive on Spark
Spark Configuration
Update the environment variables:
vim ~/.bashrc
#spark
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
source ~/.bashrc
Configure Spark for YARN mode
Extract the compiled Spark package (see the sketch above) and enter its directory.
spark-env.sh
cd conf/
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export HADOOP_HOME=/opt/hadoop-3.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
spark-defaults.conf
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
spark.master yarn
hive.execution.engine spark
spark.eventLog.enabled true
spark.eventLog.dir hdfs://ns:8020/spark/logs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 4g
spark.executor.cores 2
spark.driver.memory 4g
spark.driver.cores 2
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.yarn.jars hdfs://ns:8020/spark/jars/*
spark.executor.instances 8
Hive Configuration
Copy spark-defaults.conf to the {hive-home}/conf directory.
Copy the following three jars from spark/jars to {hive-home}/lib (a copy sketch follows this list):
- scala-library
- spark-core
- spark-network-common
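A minimal sketch of the copy, assuming Spark was unpacked to /opt/spark-2.3.0 and Hive is installed at /opt/hive (substitute your actual {hive-home}); the version suffixes of the jars depend on the build, hence the wildcards:
# Paths below are assumptions; replace /opt/hive with your {hive-home}
cp /opt/spark-2.3.0/jars/scala-library-*.jar /opt/hive/lib/
cp /opt/spark-2.3.0/jars/spark-core_*.jar /opt/hive/lib/
cp /opt/spark-2.3.0/jars/spark-network-common_*.jar /opt/hive/lib/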
Upload the jar files from Spark's jars directory to the corresponding directory on HDFS (matching the Spark configuration above):
# Create the HDFS directories
hadoop fs -mkdir -p /spark/logs
hadoop fs -mkdir -p /spark/jars
hdfs dfs -chmod -R 777 /spark/logs
hdfs dfs -chmod -R 777 /spark/jars
# Upload the jars folder to the /spark directory on HDFS
hadoop fs -put /opt/spark-2.3.0/jars/ /spark/
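To confirm the upload succeeded, list the target directory; the jar files should appear under /spark/jars:
hadoop fs -ls /spark/jars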
Configure hive-site.xml:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>spark.yarn.jars</name>
<value>hdfs://ns:8020/spark/jars/*</value>
</property>
<property>
<name>spark.home</name>
<value>/opt/spark-2.3.0</value>
</property>
<property>
<name>spark.master</name>
<value>yarn</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>4g</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>2</value>
</property>
<property>
<name>spark.executor.instances</name>
<value>8</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>4g</value>
</property>
<property>
<name>spark.driver.cores</name>
<value>2</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
Verification
Type hive on the command line to enter the Hive CLI, then run:
set hive.execution.engine=spark; -- the default engine is mr; if spark is already set in the configuration, this step can be skipped
create table test(ts BIGINT, line STRING); -- create a test table
select count(*) from test; -- if the whole process runs without errors and returns the correct result, Hive on Spark is configured successfully
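A count over an empty table may return without actually launching a job (for example, if it is answered from table statistics). To be sure a Spark application is really submitted to YARN, one option, sketched here with arbitrary test values, is to insert a row and re-run the count while watching the ResourceManager UI for a Hive on Spark application:
insert into table test values (1, 'hello spark'); -- row values are arbitrary; the insert forces a real job
select count(*) from test; -- should now return 1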
References
- "hive on spark(yarn)安装部署" (Hive on Spark on YARN installation and deployment), original article by CSDN blogger 三石君1991: https://blog.csdn.net/weixin_43860247/article/details/89184081
- "Hive安装部署" (Hive installation and deployment), original article by CSDN blogger 三石君1991: https://blog.csdn.net/weixin_43860247/article/details/89087941