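Spark Source Code Analysis - The spark-submit Submission Process

This walkthrough is based on Spark 3.2.0. A Spark application submitted in yarn-cluster mode goes through the following steps.

How the client submits the AM to YARN

After a Spark job is submitted with spark-submit, the client submits the ApplicationMaster to YARN through the call chain below. Each entry has the form class#method:line, where line is the line number in the Spark 3.2.0 source: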

```
org.apache.spark.deploy.SparkSubmit#main:1052
org.apache.spark.deploy.SparkSubmit#doSubmit:1043
org.apache.spark.deploy.SparkSubmit#doSubmit:90
org.apache.spark.deploy.SparkSubmit#submit:203
org.apache.spark.deploy.SparkSubmit#submit:165
org.apache.spark.deploy.SparkSubmit#runMain:898
org.apache.spark.deploy.SparkSubmit#prepareSubmitEnvironment:748
org.apache.spark.deploy.SparkSubmit#runMain:939
org.apache.spark.deploy.SparkSubmit#runMain:955
org.apache.spark.deploy.yarn.YarnClusterApplication#start:1675
org.apache.spark.deploy.yarn.Client#run:1268
org.apache.spark.deploy.yarn.Client#submitApplication:203
org.apache.spark.deploy.yarn.Client#createApplicationSubmissionContext:1032
org.apache.spark.deploy.yarn.Client#submitApplication:207
// At this point, YARN launches a container that runs org.apache.spark.deploy.yarn.ApplicationMaster
```
```mermaid
sequenceDiagram
    participant SparkSubmit
    participant YarnClusterApplication
    participant Client as org.apache.spark.deploy.yarn.Client
    SparkSubmit->>SparkSubmit: main(1052)
    SparkSubmit->>SparkSubmit: doSubmit(1043)
    SparkSubmit->>SparkSubmit: doSubmit(90)
    SparkSubmit->>SparkSubmit: submit(203)
    SparkSubmit->>SparkSubmit: submit(165)
    SparkSubmit->>SparkSubmit: runMain(898)
    SparkSubmit->>SparkSubmit: prepareSubmitEnvironment(748)
    SparkSubmit->>SparkSubmit: runMain(939)
    SparkSubmit->>SparkSubmit: runMain(955)
    Note right of SparkSubmit: Launch YarnClusterApplication
    SparkSubmit->>YarnClusterApplication: start(1675)
    YarnClusterApplication->>Client: run(1268)
    Client->>Client: submitApplication(203)
    Client->>Client: createApplicationSubmissionContext(1032)
    Client->>Client: submitApplication(207)
    Note right of Client: YARN launches a container running ApplicationMaster
```
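The branch point in this chain is prepareSubmitEnvironment, which decides the "child main class", and runMain, which launches it. The Java sketch below models that decision. It assumes, as in Spark 3.x, that yarn-cluster mode selects org.apache.spark.deploy.yarn.YarnClusterApplication while client mode selects the user's own class, and that runMain instantiates classes implementing the SparkApplication trait directly but wraps any other class's static main method. SubmitDispatchSketch, AppLike (a stand-in for SparkApplication), chooseChildMainClass, toApp, PlainUserApp and DirectApp are hypothetical names; the real code also handles other cluster managers, Python and R applications, and a good deal of argument and configuration rewriting.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

/** Hypothetical, simplified model of SparkSubmit's child-main-class dispatch (not Spark's real API). */
public class SubmitDispatchSketch {

    // Class name that SparkSubmit uses for yarn-cluster submissions in Spark 3.x.
    static final String YARN_CLUSTER_SUBMIT_CLASS =
            "org.apache.spark.deploy.yarn.YarnClusterApplication";

    /** Hypothetical stand-in for Spark's SparkApplication trait; the real start() also receives a SparkConf. */
    interface AppLike {
        void start(String[] args);
    }

    /** Models the yarn branch of prepareSubmitEnvironment: which class runMain will load. */
    static String chooseChildMainClass(String master, String deployMode, String userClass) {
        boolean isYarnCluster = "yarn".equals(master) && "cluster".equals(deployMode);
        // yarn-cluster: the client only runs YarnClusterApplication, which submits the AM;
        // the user class runs later inside the AM container.
        // client mode: the user class runs directly in the spark-submit JVM.
        return isYarnCluster ? YARN_CLUSTER_SUBMIT_CLASS : userClass;
    }

    /** Models runMain's dispatch: AppLike classes are instantiated, other classes get their static main wrapped. */
    static AppLike toApp(Class<?> mainClass) {
        try {
            if (AppLike.class.isAssignableFrom(mainClass)) {
                return (AppLike) mainClass.getDeclaredConstructor().newInstance();
            }
            Method mainMethod = mainClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(mainMethod.getModifiers())) {
                throw new IllegalStateException("main method must be static");
            }
            // Wrap the plain static main so the caller can treat every class uniformly.
            return args -> {
                try {
                    mainMethod.invoke(null, (Object) args);
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            };
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot launch " + mainClass.getName(), e);
        }
    }

    /** Hypothetical user class with an ordinary static main (the common case). */
    public static class PlainUserApp {
        static String[] seenArgs;

        public static void main(String[] args) {
            seenArgs = args;
        }
    }

    /** Hypothetical class implementing AppLike, the way YarnClusterApplication implements SparkApplication. */
    public static class DirectApp implements AppLike {
        static String[] seenArgs;

        @Override
        public void start(String[] args) {
            seenArgs = args;
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseChildMainClass("yarn", "cluster", "com.example.MyApp"));
        System.out.println(chooseChildMainClass("yarn", "client", "com.example.MyApp"));
        toApp(PlainUserApp.class).start(new String[] {"a", "b"});
        System.out.println(Arrays.toString(PlainUserApp.seenArgs));
    }
}
```

Running main prints org.apache.spark.deploy.yarn.YarnClusterApplication, then com.example.MyApp, then [a, b]. Only the yarn-cluster case leads to YarnClusterApplication#start, which is why the spark-submit JVM never runs the user's code in that mode.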

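Driver startup process

In yarn-cluster mode the Driver runs inside the ApplicationMaster container. Its startup call chain is:
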
```
org.apache.spark.deploy.yarn.ApplicationMaster#main:913
org.apache.spark.deploy.yarn.ApplicationMaster#run:273
org.apache.spark.deploy.yarn.ApplicationMaster#runDriver:501
org.apache.spark.deploy.yarn.ApplicationMaster#startUserApplication:737
// At this point, the main method of the user's class has been invoked on the driver
```
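```mermaid
sequenceDiagram
    participant ApplicationMaster
    ApplicationMaster->>ApplicationMaster: main(913)
    ApplicationMaster->>ApplicationMaster: run(273)
    ApplicationMaster->>ApplicationMaster: runDriver(501)
    ApplicationMaster->>ApplicationMaster: startUserApplication(737)
    Note right of ApplicationMaster: Invoke the user class's main method
```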
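startUserApplication is where the user's class actually gets called. The Java sketch below illustrates the idea under an assumption about Spark 3.x: the ApplicationMaster looks up the user class's static main(String[]) via reflection and runs it on a separate thread named "Driver". The real method also installs a dedicated user class loader and reports success or failure to YARN as the application's final status; this sketch replaces that reporting with a CompletableFuture. DriverThreadSketch, UserApp and the finished future are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of running a user class's main on a "Driver" thread (not Spark's actual code). */
public class DriverThreadSketch {

    /** Starts mainClass.main(args) on a new thread named "Driver"; the future records how main ended. */
    static Thread startUserApplication(Class<?> mainClass, String[] args,
                                       CompletableFuture<Void> finished) {
        Thread userThread = new Thread(() -> {
            try {
                Method mainMethod = mainClass.getMethod("main", String[].class);
                mainMethod.invoke(null, (Object) args); // static method, so no receiver
                finished.complete(null);
            } catch (InvocationTargetException e) {
                // The user code itself threw: surface the real cause, not the reflection wrapper.
                finished.completeExceptionally(e.getCause());
            } catch (ReflectiveOperationException | RuntimeException e) {
                // No usable main method, or invoke rejected the call.
                finished.completeExceptionally(e);
            }
        });
        userThread.setName("Driver");
        userThread.start();
        return userThread;
    }

    /** Hypothetical user application; records which thread ran its main and what it received. */
    public static class UserApp {
        static volatile String ranOnThread;
        static volatile String[] receivedArgs;

        public static void main(String[] args) {
            ranOnThread = Thread.currentThread().getName();
            receivedArgs = args;
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Void> finished = new CompletableFuture<>();
        Thread driver = startUserApplication(
                UserApp.class, new String[] {"--input", "/tmp/in"}, finished);
        // Wait (with a timeout) for the user's main; a user exception arrives wrapped in ExecutionException.
        finished.get(10, TimeUnit.SECONDS);
        driver.join();
        System.out.println(UserApp.ranOnThread);                    // Driver
        System.out.println(String.join(" ", UserApp.receivedArgs)); // --input /tmp/in
    }
}
```

Running main on its own thread is what lets runDriver carry on in parallel. As far as we can tell from the 3.x source, it waits for the user code to create its SparkContext, registers the AM with the ResourceManager, and only then joins the "Driver" thread.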

From there, the user program running on the Driver calls into Spark to do the actual work, for example SparkSession#sql or DataFrame#show.

Author: jszero

Published: 2025-08-09

Updated: 2025-08-18