21/08/09 11:24:52 WARN server.TransportChannelHandler: Exception in connection from xxxx/172.19.167.56:65256
java.io.IOException: Connection reset by peer
21/08/09 11:24:52 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1618886499908_1999_01_000004 on host: dn4.cdh.tcredit.com. Exit status: 52. Diagnostics: Exception from container-launch.
Stack trace: ExitCodeException exitCode=52:
21/08/09 11:24:52 ERROR cluster.YarnScheduler: Lost executor 3 on dn4.cdh.tcredit.com: Container marked as failed: container_1618886499908_1999_01_000004 on host: dn4.cdh.tcredit.com. Exit status: 52. Diagnostics: Exception from container-launch.
Stack trace: ExitCodeException exitCode=52:
21/08/09 11:24:52 WARN scheduler.TaskSetManager: Lost task 23.0 in stage 31.0 (TID 3688, dn4.cdh.tcredit.com, executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container marked as failed: container_1618886499908_1999_01_000004 on host: dn4.cdh.tcredit.com. Exit status: 52. Diagnostics: Exception from container-launch.
Stack trace: ExitCodeException exitCode=52:
Caused by data skew: during the shuffle, too many records were assigned to this node — shuffle output for the same key is always sent to a single node, so that executor ran out of memory and crashed. (In Spark, exit status 52 corresponds to ExecutorExitCode.OOM, i.e. the executor JVM threw an OutOfMemoryError.)
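A common mitigation for this kind of shuffle skew, besides adding resources, is to "salt" the hot keys: append a random suffix so one logical key is spread across many sub-keys, and therefore across many partitions. The sketch below shows the idea in plain Python with a simulated hash partitioner (no Spark dependency; the key name, salt factor, and partition count are illustrative, not taken from the job above):

```python
import random
from collections import Counter

NUM_PARTITIONS = 8   # illustrative: number of shuffle partitions
SALT_FACTOR = 32     # illustrative: sub-keys each hot key is split into

def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Mimics a hash partitioner: the same key always lands on the same partition.
    return hash(key) % num_partitions

def salted_key(key, salt_factor=SALT_FACTOR):
    # Append a random salt so one logical key maps to salt_factor sub-keys.
    return f"{key}#{random.randrange(salt_factor)}"

# 10,000 records sharing one hot key: without salting they all land on a
# single partition, reproducing the single-node overload described above.
records = ["hot_key"] * 10_000

unsalted = Counter(partition_of(k) for k in records)
salted = Counter(partition_of(salted_key(k)) for k in records)

print("partitions used without salting:", len(unsalted))
print("partitions used with salting:", len(salted))
```

After a salted aggregation, a second aggregation on the original (de-salted) key merges the partial results; the trade-off is an extra shuffle stage in exchange for balanced partitions.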
Original resource configuration:
spark_run_conf=" --master yarn-client --queue root.tianchuang --driver-memory 4G --executor-memory 2G --executor-cores 2 "
Updated resource configuration:
spark_run_conf=" --master yarn-client --queue root.tianchuang --num-executors 50 --executor-memory 4G --executor-cores 8 "
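Adding executors and memory raises the ceiling, but if the skew itself persists, increasing shuffle parallelism so each reduce task handles a smaller slice can also help. A possible variant of the config above (the parallelism values are illustrative, not tuned for this job):

```shell
# Illustrative: same resources as above, plus higher shuffle parallelism.
spark_run_conf=" --master yarn-client --queue root.tianchuang \
  --num-executors 50 --executor-memory 4G --executor-cores 8 \
  --conf spark.default.parallelism=800 \
  --conf spark.sql.shuffle.partitions=800 "
```

A common rule of thumb is 2-3 tasks per executor core, which with 50 executors × 8 cores suggests parallelism in the 800-1200 range.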