1. Background

Spark version: 2.3.1
Scala version: 2.11.8

2. Configuration options

Options that share a number in the Group column are used together.

Option	Value	Description	Group
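spark.sql.crossJoin.enabled	true	When true, Spark SQL allows Cartesian-product (cross) joins; in Spark 2.x the default is false and an implicit Cartesian product (a join without a join condition) is rejected	1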
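spark.dynamicAllocation.enabled	true	When true, Spark starts an ExecutorAllocationManager that adds and removes executors dynamically	2
spark.shuffle.service.enabled	true	Enables the external shuffle service, which serves shuffle files independently of the executors so that executors can be removed without losing their shuffle output; required for dynamic allocation (works together with ExecutorAllocationManager)	2
spark.dynamicAllocation.initialExecutors	number	Initial number of executors	2
spark.dynamicAllocation.maxExecutors	number	Maximum number of executors	2
spark.dynamicAllocation.minExecutors	number	Minimum number of executors	2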
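spark.default.parallelism	number	Default number of tasks (partitions) for RDD shuffle operations such as reduceByKey and join; 2 to 3 times num-executors * executor-cores is a good choice. An important parameter (DataFrame/SQL shuffles use spark.sql.shuffle.partitions instead)	3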
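spark.sql.adaptive.enabled	true	Switch for the adaptive execution framework; default false	4
spark.sql.adaptive.skewedJoin.enabled	true	Switch for skewed-join handling; default false. This key is not part of upstream Apache Spark 2.3.1 (it comes from the Intel adaptive-execution patch shipped in some distributions); Spark 3.0+ uses spark.sql.adaptive.skewJoin.enabled	4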
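spark.driver.extraJavaOptions	-Dlog4j.configuration=file:log4j.properties or -Xss30M	Extra JVM options for the driver, e.g. a custom log4j configuration file or a larger thread stack size	5
spark.hadoop.ipc.client.fallback-to-simple-auth-allowed	true	Lets a Kerberos-secured client fall back to simple authentication when it connects to an insecure cluster; used for cross-cluster HDFS data migration	6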
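spark.shuffle.memoryFraction	0.3	Fraction of executor memory given to shuffle read tasks for aggregation; default 0.2 (20%). Legacy setting: since Spark 1.6 it only takes effect when spark.memory.useLegacyMode=true	7
spark.storage.memoryFraction	0.5	Fraction of executor memory that persisted (cached) RDD data may occupy; default 0.6, i.e. 60% of executor memory can hold cached RDDs. Also a legacy setting, only effective when spark.memory.useLegacyMode=true	8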
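hive.metastore.client.factory.class	com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory	Use the AWS Glue Data Catalog as the Hive metastore	9
hive.exec.dynamic.partition	true	Hive writes: enable dynamic partitions	10
hive.exec.dynamic.partition.mode	nonstrict	Hive writes: allow all partition columns to be dynamic (strict mode requires at least one static partition)	10
spark.sql.sources.partitionOverwriteMode	dynamic	Partition overwrite: replace only the partitions present in the data being written (the default, static, first deletes all matching partitions); does not affect Hive serde tables	10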
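
The group 3 rule of thumb is plain arithmetic. A minimal plain-Scala sketch, assuming a fixed executor count as given by --num-executors (under dynamic allocation the count varies, so treat the result only as a starting point); ParallelismSketch and suggestedParallelism are hypothetical names:

```scala
object ParallelismSketch {
  // Hypothetical helper: the range of spark.default.parallelism values suggested by
  // the "2 to 3 times num-executors * executor-cores" rule of thumb.
  def suggestedParallelism(numExecutors: Int, executorCores: Int): Range.Inclusive = {
    require(numExecutors > 0 && executorCores > 0, "counts must be positive")
    val totalCores = numExecutors * executorCores
    (2 * totalCores) to (3 * totalCores)
  }

  def main(args: Array[String]): Unit = {
    // e.g. --num-executors 10 --executor-cores 4, i.e. 40 cores in total
    val r = suggestedParallelism(numExecutors = 10, executorCores = 4)
    println(s"spark.default.parallelism: ${r.start} to ${r.end}") // prints "spark.default.parallelism: 80 to 120"
  }
}
```

For the spark-submit example in 3.2 (1 executor with 1 core) this gives 2 to 3.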

3. How to set the configuration

3.1 In code

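Two ways to set options in Scala: on the SparkSession builder before the session is created, or with spark.conf.set on an existing session: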

import org.apache.spark.sql.SparkSession


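val spark: SparkSession = SparkSession.builder()
  // Option 1: set on the builder, before the session is created
  .config(
    "hive.metastore.client.factory.class",
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
  ) // use the AWS Glue Data Catalog as the Hive metastore
  .enableHiveSupport()
  .config("hive.exec.dynamic.partition", true)
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()

// Option 2: set at runtime on an existing session. Only runtime SQL settings take
// effect this way; settings such as spark.dynamicAllocation.* must be set before
// the SparkContext starts.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")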
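
To see what partitionOverwriteMode=dynamic changes, here is a minimal local sketch. It assumes a local run (master local[1]) and writes plain Parquet files to a temporary directory instead of a Hive table, because the setting applies to data source writes; the object name, path, and sample rows are hypothetical:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object DynamicOverwriteSketch {
  // Writes two partitions, overwrites one of them, and returns the ids that remain.
  def run(): Seq[Int] = {
    val spark = SparkSession.builder()
      .master("local[1]") // local only, for trying this out
      .appName("dynamic-overwrite-sketch")
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()
    import spark.implicits._

    val path = Files.createTempDirectory("overwrite-demo").resolve("t").toString

    // Initial load: partitions dt=2021-11-23 (id 1) and dt=2021-11-24 (id 2)
    Seq((1, "2021-11-23"), (2, "2021-11-24")).toDF("id", "dt")
      .write.mode("overwrite").partitionBy("dt").parquet(path)

    // Overwrite with data for dt=2021-11-24 only; in dynamic mode dt=2021-11-23 is kept
    Seq((3, "2021-11-24")).toDF("id", "dt")
      .write.mode("overwrite").partitionBy("dt").parquet(path)

    val ids = spark.read.parquet(path).select("id").as[Int].collect().sorted.toSeq
    spark.stop()
    ids
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With the default static mode, the second write would clear the whole output directory first and only id 3 would remain; in dynamic mode ids 1 and 3 remain.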

3.2 At submit time (spark-submit)

spark-submit \
--name conf_example \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 1G \
--driver-memory 1G \
--class xxx.xxxx.xxxxx.xxx.xxxx \
--files conf.properties,log4j.properties,log4j2.xml \
--conf spark.hadoop.ipc.client.fallback-to-simple-auth-allowed=true \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties \
--jars sss.jar,wwqq.jar \
main.jar
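
One way to confirm that a --conf value reached the application is to read it back. A minimal sketch that runs locally and sets the same key on the builder instead of through spark-submit; it relies on Spark copying spark.hadoop.* properties, with the prefix stripped, into the Hadoop Configuration. ConfCheckSketch is a hypothetical name:

```scala
import org.apache.spark.sql.SparkSession

object ConfCheckSketch {
  // Returns the value as Spark sees it and as the Hadoop Configuration sees it.
  def run(): (String, String) = {
    val spark = SparkSession.builder()
      .master("local[1]") // local only; with spark-submit, --master and --conf supply these
      .appName("conf_example")
      .config("spark.hadoop.ipc.client.fallback-to-simple-auth-allowed", "true")
      .getOrCreate()

    // Spark-side key keeps the "spark.hadoop." prefix
    val sparkSide = spark.conf.get("spark.hadoop.ipc.client.fallback-to-simple-auth-allowed")
    // Hadoop-side key has the prefix stripped
    val hadoopSide = spark.sparkContext.hadoopConfiguration
      .get("ipc.client.fallback-to-simple-auth-allowed")

    spark.stop()
    (sparkSide, hadoopSide)
  }

  def main(args: Array[String]): Unit = println(run())
}
```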

Added: 2021-11-24 08:01:04  Updated: 2021-11-24 08:02:28