[Big Data] Integrating Spark with Hadoop

For setting up the Hadoop environment, see the hadoop3.2.2 cluster setup article.

Environment

CentOS 7, jdk1.8.0_311, scala-2.12.15, zookeeper-3.6.3, hadoop3.2.2, spark-3.2.1-bin-hadoop3.2

Spark configuration

  1. Edit ${SPARK_HOME}/conf/spark-defaults.conf and add the following:
spark.serializer                   org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://vmcluster/spark-history
spark.eventLog.compress            true
spark.yarn.historyServer.address   node-3:18080
spark.history.ui.port              18080
spark.history.fs.logDirectory      hdfs://vmcluster/spark-history
spark.history.retainedApplications 10
spark.history.fs.update.interval   5s

Note: rename spark-defaults.conf.template to spark-defaults.conf first.

  2. Edit ${SPARK_HOME}/conf/spark-env.sh and add the following:
export JAVA_HOME=/home/bigdata/env/jdk1.8.0_311
export SCALA_HOME=/home/bigdata/env/scala-2.12.15
export SPARK_HOME=/home/bigdata/env/spark-3.2.1-bin-hadoop3.2
export SPARK_CONF=${SPARK_HOME}/conf
export HADOOP_HOME=/home/bigdata/env/hadoop-3.2.2
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

Note: rename spark-env.sh.template to spark-env.sh first.
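
Two things this configuration relies on are easy to miss. Spark does not create the event log directory for you, so the path given in spark.eventLog.dir has to exist in HDFS before the first job runs or the history server starts. HADOOP_CONF_DIR also has to point at a directory that really holds the Hadoop client configs. Below is a minimal pre-flight sketch under those assumptions. The helper names `conf_value` and `has_hadoop_conf` are made up for illustration, the file is assumed to use whitespace-separated "key value" lines as shown above, and the `hdfs` call is only attempted when the `hdfs` command is on the PATH.

```shell
#!/usr/bin/env bash
# Pre-flight sketch; helper names are hypothetical, paths come from the config above.

# Print the value of a key from a spark-defaults.conf style file
# (whitespace-separated "key value" lines; commented lines never match).
conf_value() {
  local key="$1" file="$2"
  awk -v k="$key" '$1 == k { print $2; exit }' "$file"
}

# Succeed only if the directory holds the client configs Spark on YARN reads.
has_hadoop_conf() {
  local dir="$1"
  [ -f "$dir/core-site.xml" ] && [ -f "$dir/yarn-site.xml" ]
}

SPARK_CONF_FILE="${SPARK_HOME:-/home/bigdata/env/spark-3.2.1-bin-hadoop3.2}/conf/spark-defaults.conf"

if [ -f "$SPARK_CONF_FILE" ] && command -v hdfs >/dev/null 2>&1; then
  event_dir="$(conf_value spark.eventLog.dir "$SPARK_CONF_FILE")"
  # -p makes this a no-op when the directory already exists.
  if [ -n "$event_dir" ]; then
    hdfs dfs -mkdir -p "$event_dir" || echo "warning: could not create $event_dir" >&2
  fi
fi

if ! has_hadoop_conf "${HADOOP_CONF_DIR:-/home/bigdata/env/hadoop-3.2.2/etc/hadoop}"; then
  echo "warning: core-site.xml or yarn-site.xml not found in HADOOP_CONF_DIR" >&2
fi
```

Reading the directory back out of spark-defaults.conf, rather than typing it a second time, keeps the created path and the configured path from drifting apart.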

Start the history server

start-history-server.sh
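
start-history-server.sh ships in ${SPARK_HOME}/sbin, so calling it by bare name assumes that directory is on the PATH. Here is a small sketch that sets this up and then checks that the daemon is actually running. The SPARK_HOME default is the value from spark-env.sh, and `history_server_running` is a made-up helper that looks for the `HistoryServer` process name in `jps` output.

```shell
#!/usr/bin/env bash
# Sketch: put Spark's bin/ and sbin/ on the PATH, then check for the daemon.
export SPARK_HOME="${SPARK_HOME:-/home/bigdata/env/spark-3.2.1-bin-hadoop3.2}"
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"

# Reads `jps` output on stdin; succeeds if a HistoryServer JVM is listed.
# -w matches whole words only, so MapReduce's JobHistoryServer does not count.
# (hypothetical helper, not part of Spark)
history_server_running() {
  grep -qw 'HistoryServer'
}

if command -v jps >/dev/null 2>&1; then
  if jps | history_server_running; then
    echo "history server is running"
  else
    echo "history server not found; check the logs under \$SPARK_HOME/logs" >&2
  fi
fi
```

With the settings in spark-defaults.conf, the history UI should then answer on port 18080 of the node where the script was started.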

Test

Submit the SparkPi example that ships with Spark as a test. The submit command is:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--executor-memory 512m \
--executor-cores 1 \
--queue bigdata \
${SPARK_HOME}/examples/jars/spark-examples*.jar \
100
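
The memory flags in this command are not what YARN is asked for verbatim. Spark adds an off-heap overhead to every container request, by default 10% of the requested memory with a floor of 384 MB. In cluster mode, `--driver-memory 1g` therefore becomes a 1408 MB ApplicationMaster request, which is the figure the client log reports ("1408 MB memory including 384 MB overhead"). Here is a sketch of that arithmetic, assuming the Spark 3.2 defaults and no explicit spark.driver.memoryOverhead or spark.executor.memoryOverhead. `container_mb` is an illustrative name only.

```shell
#!/usr/bin/env bash
# Sketch of Spark's default container sizing (Spark 3.2 defaults assumed):
# overhead = max(10% of requested memory, 384 MB).
container_mb() {
  local mem_mb="$1"
  local overhead=$(( mem_mb / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( mem_mb + overhead ))
}

container_mb 1024   # --driver-memory 1g      -> prints 1408
container_mb 512    # --executor-memory 512m  -> prints 896
```

YARN may then round each request up to its own allocation increment, so the container sizes shown in the YARN web UI can be larger still.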

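Note: the SPARK_HOME system environment variable must be set for this command to work.
Because the job is submitted in cluster mode, the result is not printed to the console. The console log looks like this: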

2022-03-16 10:43:41,387 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-03-16 10:43:41,784 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
2022-03-16 10:43:42,334 INFO conf.Configuration: resource-types.xml not found
2022-03-16 10:43:42,335 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-16 10:43:42,357 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2022-03-16 10:43:42,358 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
2022-03-16 10:43:42,358 INFO yarn.Client: Setting up container launch context for our AM
2022-03-16 10:43:42,359 INFO yarn.Client: Setting up the launch environment for our AM container
2022-03-16 10:43:42,367 INFO yarn.Client: Preparing resources for our AM container
2022-03-16 10:43:42,487 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2022-03-16 10:43:43,802 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_libs__7226558732161014901.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_libs__7226558732161014901.zip
2022-03-16 10:43:56,526 INFO yarn.Client: Uploading resource file:/home/bigdata/env/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/spark-examples_2.12-3.2.1.jar
2022-03-16 10:43:57,009 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_conf__3589752284083344005.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_conf__.zip
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing modify acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: Changing modify acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bigdata); groups with view permissions: Set(); users  with modify permissions: Set(bigdata); groups with modify permissions: Set()
2022-03-16 10:43:57,254 INFO yarn.Client: Submitting application application_1647396476966_0002 to ResourceManager
2022-03-16 10:43:57,515 INFO impl.YarnClientImpl: Submitted application application_1647396476966_0002
2022-03-16 10:43:58,520 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:43:58,522 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:43:59,527 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:00,537 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:01,548 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:02,555 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:03,557 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:04,562 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:05,564 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:06,574 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:07,588 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:08,595 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:09,605 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:09,605 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:10,617 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:11,630 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:12,643 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:13,653 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:14,658 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:15,667 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:16,709 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:17,722 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:18,727 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:19,730 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:20,737 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:21,749 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:22,752 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:23,760 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:24,782 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:25,791 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:26,793 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:27,803 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:28,809 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:29,822 INFO yarn.Client: Application report for application_1647396476966_0002 (state: FINISHED)
2022-03-16 10:44:29,823 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: SUCCEEDED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:29,843 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-16 10:44:29,844 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82
2022-03-16 10:44:29,848 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-35dc976c-c371-4888-acc8-25e3a44d60a5
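
The log ends with `final status: SUCCEEDED` but never shows the computed value. In cluster mode the driver runs inside the ApplicationMaster container, so its stdout lands in the YARN container logs. SparkPi prints a line beginning `Pi is roughly`, so one way to retrieve it is to pull the aggregated logs and filter for that line. This sketch assumes YARN log aggregation is enabled on the cluster. `extract_pi` is a made-up helper, and the application ID is the one from the log above.

```shell
#!/usr/bin/env bash
# Sketch: pull the SparkPi result out of the aggregated YARN logs.
# Assumes yarn.log-aggregation-enable is true on the cluster.

# Reads log text on stdin and prints the first driver result line, if any.
# (hypothetical helper)
extract_pi() {
  grep -m 1 'Pi is roughly'
}

APP_ID="application_1647396476966_0002"   # taken from the client log above

if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" | extract_pi \
    || echo "result line not found for $APP_ID" >&2
fi
```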

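YARN web UI

[Screenshots of the YARN web UI]

Jumping from the YARN web UI to the Spark web UI

[Screenshots of the Spark web UI reached from the YARN web UI]

All of this is fairly straightforward, so I won't go into further detail.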

Added: 2022-03-17 22:14:33  Updated: 2022-03-17 22:18:42
 