Spark-Shell and Spark-Submit
Spark-Shell
Overview
spark-shell is the interactive shell that ships with Spark. It makes interactive programming easy: at its prompt you can write Spark programs in Scala, which makes it well suited to learning and testing.
Commands
spark-shell                                              # defaults to --master local[*]
spark-shell --master local[N]                            # run locally with N worker threads
spark-shell --master local[*]                            # run locally with one thread per CPU core
spark-shell --master spark://node01:7077,node02:7077     # connect to a Standalone cluster (HA masters)
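A minimal sketch of an interactive session (assuming Spark is installed at /opt/server/spark, the path used below; `sc` is the SparkContext that spark-shell creates automatically):

```shell
/opt/server/spark/bin/spark-shell --master local[2]

# At the scala> prompt, Scala statements run against the local Spark context:
#   scala> val rdd = sc.parallelize(1 to 100)
#   scala> rdd.sum()    // 5050.0
#   scala> :quit
```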
Spark-Submit
Overview
The spark-submit command submits an application jar to a Spark Standalone cluster or to YARN for execution.
SPARK_HOME=/opt/server/spark
${SPARK_HOME}/bin/spark-submit --help   # list all supported options
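The general form of a spark-submit invocation (a template with placeholders in angle brackets; every example below follows this same pattern):

```shell
# <main-class>        entry point of the application (Java/Scala jobs)
# <master-url>        local[N], spark://..., yarn, mesos://..., k8s://...
# <client|cluster>    where the Driver runs; default: client
${SPARK_HOME}/bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <client|cluster> \
  --conf <key>=<value> \
  <application-jar> \
  [application-arguments]
```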
Commands
① Run a job locally
SPARK_HOME=/opt/server/spark
${SPARK_HOME}/bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.5.jar \
10
② Run a job on a Standalone cluster
SPARK_HOME=/opt/server/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node1:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.5.jar \
10
# With HA masters, list every master:
SPARK_HOME=/opt/server/spark
${SPARK_HOME}/bin/spark-submit \
--master spark://node1:7077,node2:7077 \
--class org.apache.spark.examples.SparkPi \
${SPARK_HOME}/examples/jars/spark-examples_2.11-2.4.5.jar \
10
Common parameters
① Basic parameters
--master                  cluster manager to connect to
--master local[2]         run locally with 2 worker threads
--master spark://node1:7077,node2:7077    Standalone cluster (HA masters)
--master yarn             submit to YARN
--deploy-mode cluster     where the Driver runs: client (default) or cluster
--class org.apache.spark.examples.SparkPi    main class of the application (Java/Scala)
--name NAME               name of the application
--jars JARS               comma-separated extra jars for the Driver and Executor classpaths
--conf PROP=VALUE         arbitrary Spark configuration property
--conf "PROP=VALUE"       quote values that contain spaces
--conf spark.eventLog.enabled=false
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
② Driver Program parameters
--driver-memory MEM       memory for the Driver (e.g. 512m, 2g; default 1024M)
--driver-class-path       extra classpath entries for the Driver
--driver-cores NUM        number of cores for the Driver (cluster deploy mode only)
--supervise               restart the Driver automatically if it fails (Standalone/Mesos cluster mode)
③ Executor parameters
--executor-memory MEM     memory per Executor (e.g. 512m, 2g; default 1G)
--executor-memory 512m    example: 512 MB per Executor
--executor-cores NUM      cores per Executor (Standalone and YARN)
--total-executor-cores NUM    total cores across all Executors (Standalone and Mesos only)
--num-executors           number of Executors to launch (YARN only; default 2)
--queue QUEUE_NAME        YARN queue to submit to (default: "default")
Examples
Local mode: run with 8 threads
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[8] \
/path/to/examples.jar \
100
Standalone cluster: the Driver runs on the client (client deploy mode, the default)
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node1:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
Standalone cluster: the Driver runs inside the cluster; with --supervise it is restarted automatically if it fails
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://node1:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
YARN cluster mode
export HADOOP_CONF_DIR=XXX   # directory containing the Hadoop/YARN configuration files
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
Standalone cluster: run a Python application
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
Mesos cluster: the Driver runs inside the cluster (cluster deploy mode)
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master mesos://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 20G \
--total-executor-cores 100 \
http://path/to/examples.jar \
1000
Kubernetes cluster (cluster deploy mode)
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master k8s://xx.yy.zz.ww:443 \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
http://path/to/examples.jar \
1000