
Spark-Core Source Deep Dive (2): The Master's schedule Method Explained

In the previous article we walked through how the Master and Worker start up and register, so at this point the cluster is up and running. This article covers the Master's resource-scheduling source code that runs when an application is submitted to the cluster: how resources are allocated, and how the Driver and Executor processes are launched on the Workers and begin working. Let's dive straight in.

### When is schedule called? (resource scheduling)

The schedule method is called whenever a new application joins or the available resources change, so that resources can be reallocated. How exactly does it allocate them? Let's analyze it at the source level.

schedule

Here is the source of schedule:

private def schedule(): Unit = {
    // First check the Master's state: if it is not ALIVE, return immediately.
    // In other words, a standby Master never performs resource scheduling.
    if (state != RecoveryState.ALIVE) {
      return
    }
    // Drivers take strict precedence over executors
    // Schedule drivers first, then executors.
    // Random.shuffle randomly reorders the elements of the collection passed
    // in; here it shuffles the ALIVE workers so that drivers are not always
    // assigned to the same worker.
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
    // Number of ALIVE workers
    val numWorkersAlive = shuffledAliveWorkers.size
    var curPos = 0
    // Iterate over the queue of waiting drivers and place each one.
    /**
      * Why does the Master schedule drivers at all? Only when an application
      * is submitted in cluster deploy mode is the driver registered with the
      * Master and scheduled here. In client mode the driver is launched
      * directly on the submitting machine, is never registered with the
      * Master, and therefore is never scheduled by it.
      */
    for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
      // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
      // start from the last worker that was assigned a driver, and continue onwards until we have
      // explored all alive workers.
      var launched = false
      // Count how many alive workers we have visited for this driver.
      var numWorkersVisited = 0
      // Keep iterating as long as there are unvisited alive workers
      // and the current driver has not yet been launched (launched == false).
      while (numWorkersVisited < numWorkersAlive && !launched) {
        // Pick the worker at curPos; the index advances via
        // curPos = (curPos + 1) % numWorkersAlive after each attempt.
        val worker = shuffledAliveWorkers(curPos)
        numWorkersVisited += 1

        // If this worker's free memory is at least the driver's required memory
        // and its free cores are at least the driver's required cores...
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >=
          driver.desc.cores) {
          // ...launch the driver on it
          launchDriver(worker, driver)
          // and remove the driver from the waitingDrivers queue.
          waitingDrivers -= driver
          launched = true
        }
        // Move the pointer to the next worker.
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    // Once all drivers have been placed, start the executors.
    startExecutorsOnWorkers()
  }
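The driver-placement loop above can be modeled as a small self-contained sketch. `WorkerStub`, `DriverStub`, and `placeDrivers` below are simplified stand-ins invented for illustration, not Spark's actual `WorkerInfo`/`DriverInfo` classes; only the round-robin structure mirrors the real code:

```scala
import scala.collection.mutable
import scala.util.Random

case class WorkerStub(id: String, var memoryFree: Int, var coresFree: Int)
case class DriverStub(id: String, mem: Int, cores: Int)

// Round-robin placement: for each waiting driver, walk the shuffled workers
// starting from curPos until one has enough free memory and cores.
def placeDrivers(workers: Seq[WorkerStub], drivers: Seq[DriverStub]): Map[String, String] = {
  val placed = mutable.Map[String, String]()
  val alive = Random.shuffle(workers)
  var curPos = 0
  for (driver <- drivers) {
    var launched = false
    var visited = 0
    while (visited < alive.size && !launched) {
      val w = alive(curPos)
      visited += 1
      if (w.memoryFree >= driver.mem && w.coresFree >= driver.cores) {
        w.memoryFree -= driver.mem   // mirrors worker.addDriver bookkeeping
        w.coresFree -= driver.cores
        placed(driver.id) = w.id
        launched = true
      }
      curPos = (curPos + 1) % alive.size
    }
  }
  placed.toMap
}
```

With two 4-core workers and two 1-core drivers, both drivers get placed (on which worker depends on the shuffle); a driver whose demands no worker can meet is simply skipped, just as it stays in waitingDrivers in the real code.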
A quick look at the shuffle method used above.

Stepping into `Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))`:

def shuffle[T, CC[X] <: TraversableOnce[X]](xs: CC[T])(implicit bf: CanBuildFrom[CC[T], T, CC[T]]): CC[T] = {
    val buf = new ArrayBuffer[T] ++= xs

    def swap(i1: Int, i2: Int) {
      val tmp = buf(i1)
      buf(i1) = buf(i2)
      buf(i2) = tmp
    }
    // Walk the buffer from the end, swapping each element with a
    // randomly chosen earlier one (Fisher-Yates shuffle).
    for (n <- buf.length to 2 by -1) {
      val k = nextInt(n)
      swap(n - 1, k)
    }

    (bf(xs) ++= buf).result()
  }
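The loop above is a classic Fisher-Yates shuffle: it produces a uniformly random permutation of the input with exactly the same elements, which is precisely why the Master uses it here. A quick check:

```scala
import scala.util.Random

val workerNames = Seq("worker-0", "worker-1", "worker-2", "worker-3")
val shuffled = Random.shuffle(workerNames)

// The order may change, but no worker is added or dropped.
assert(shuffled.sorted == workerNames.sorted)
```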
Launching the Driver

The allocation flow is annotated in detail in the source above; now let's look at the launchDriver method:

// Launch a driver on a given worker
  private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
    // 1. Log the launch
    logInfo("Launching driver " + driver.id + " on worker " + worker.id)
    // 2. Add the driver to the worker's in-memory bookkeeping:
    //    the worker's used memory and used cores are increased by the
    //    amounts the driver requires.
    worker.addDriver(driver)
    // 3. Also record the worker inside the driver's bookkeeping.
    driver.worker = Some(worker)
    // 4. Send a LaunchDriver message over the worker's endpoint
    //    (an actor in older versions) so the worker starts the driver.
    worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
    // 5. Set the driver's state to RUNNING.
    driver.state = DriverState.RUNNING
  }

Next, let's see how the Worker handles the LaunchDriver message it receives. In the Worker's source, find the LaunchDriver case:

// How the Worker handles the LaunchDriver message
    case LaunchDriver(driverId, driverDesc) =>
      // 1. Log the request
      logInfo(s"Asked to launch driver $driverId")
      // 2. Instantiate a DriverRunner, the thread that runs and manages the driver
      val driver = new DriverRunner(
        conf,
        driverId,
        workDir,
        sparkHome,
        driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
        self,
        workerUri,
        securityMgr)
      // 3. Record the new runner in the drivers map
      drivers(driverId) = driver
      // 4. Start the driver (we follow start() next)
      driver.start()
      // 5. Record the change in used resources
      coresUsed += driverDesc.cores
      memoryUsed += driverDesc.mem

Following driver.start():

// Start a thread to run and manage the driver
  private[worker] def start() = {
    new Thread("DriverRunner for " + driverId) {
      override def run() {
        var shutdownHook: AnyRef = null
        try {
          shutdownHook = ShutdownHookManager.addShutdownHook { () =>
            logInfo(s"Worker shutting down, killing driver $driverId")
            kill()
          }

          // prepare driver jars and run driver
          // Creates the working directory, downloads the user jar, and so on;
          // step into this method -- the actual driver launch happens inside.
          val exitCode = prepareAndRunDriver()

Stepping into prepareAndRunDriver():

private[worker] def prepareAndRunDriver(): Int = {
    val driverDir = createWorkingDirectory()
    val localJarFilename = downloadUserJar(driverDir)

    def substituteVariables(argument: String): String = argument match {
      case "{{WORKER_URL}}" => workerUrl
      case "{{USER_JAR}}" => localJarFilename
      case other => other
    }

    // TODO: If we add ability to submit multiple jars they should also be added here
    val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
      driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
    // buildProcessBuilder constructs the ProcessBuilder that launches the driver process
    runDriver(builder, driverDir, driverDesc.supervise)
  }

As you can see, runDriver then launches the driver process (we won't analyze the details of the launch itself here).

When the driver's state changes (it finishes, fails, or is killed), a DriverStateChanged message is sent to the Worker. On receiving it, the Worker calls handleDriverStateChanged, which performs a series of steps we won't detail here; the key one is forwarding a driverStateChanged message to the Master, which then removes the driver's information:

case DriverStateChanged(driverId, state, exception) =>
      state match {
        case DriverState.ERROR | DriverState.FINISHED | DriverState.KILLED | DriverState.FAILED =>
          removeDriver(driverId, state, exception)
        case _ =>
          throw new Exception(s"Received unexpected state update for driver $driverId: $state")
      }
That completes allocating resources to the Driver and launching it. Next, let's look at launching the executors, i.e. the startExecutorsOnWorkers() flow.
// Start executors on the workers
  private def startExecutorsOnWorkers(): Unit = {
    // Scheduling is first-in, first-out
    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    for (app <- waitingApps if app.coresLeft > 0) {
      val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
      // Filter out workers that don't have enough resources to launch an executor
      // 1. Keep only ALIVE workers that satisfy the app's memory and core
      //    requirements, sorted by free cores in descending order.
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor.getOrElse(1))
        .sortBy(_.coresFree).reverse
      // 2. Decide how many cores to assign on each worker. spreadOutApps
      //    (true by default) selects between the two algorithms described below.
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      // 3. Perform the allocation.
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
      }
    }
  }

How does the step above decide how many cores to assign on each worker?

There are two algorithms, and the first is used by default:

Algorithm 1: spread executors across as many workers as possible. For example, with 10 worker nodes of 10 cores each and a spark-submit request for 20 cores, each worker contributes 2 cores to run your Spark application. The benefit is that as many workers as possible are put to use.

Algorithm 2: the opposite. With the same 10 workers of 10 cores each and a request for 20 cores, only the first 2 workers are used, 10 cores each, leaving the other 8 idle.

The first algorithm (spreadOutApps = true) is the default.
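The two behaviors can be sketched with a toy version of the core-assignment step. This is not the real scheduleExecutorsOnWorkers, which also honors coresPerExecutor, memory, and per-app executor limits; it only illustrates how spreadOut changes the distribution:

```scala
// Distribute `toAssign` cores over workers' free cores either one core at a
// time round-robin (spread out) or by filling each worker before moving on.
// Assumes toAssign does not exceed the total free cores.
def assignCores(free: Array[Int], toAssign: Int, spreadOut: Boolean): Array[Int] = {
  val assigned = Array.fill(free.length)(0)
  var left = toAssign
  if (spreadOut) {
    var pos = 0
    while (left > 0) {
      if (assigned(pos) < free(pos)) { assigned(pos) += 1; left -= 1 }
      pos = (pos + 1) % free.length
    }
  } else {
    for (i <- free.indices if left > 0) {
      val take = math.min(free(i), left)
      assigned(i) = take
      left -= take
    }
  }
  assigned
}

// 10 workers with 10 free cores each, submitting with 20 cores:
assignCores(Array.fill(10)(10), 20, spreadOut = true)   // 2 cores on every worker
assignCores(Array.fill(10)(10), 20, spreadOut = false)  // 10 on the first two, 0 elsewhere
```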

Next comes allocating the computed resources to executors and launching them:
private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
    // If the number of cores per executor is specified, we divide the cores assigned
    // to this worker evenly among the executors with no remainder.
    // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      // Record the executor in the application's bookkeeping
      val exec = app.addExecutor(worker, coresToAssign)
      // Launch the executor
      launchExecutor(worker, exec)
      app.state = ApplicationState.RUNNING
    }
  }
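The two val lines above decide the executor count and size for one worker. Extracted into a small helper (the name `split` is just for this illustration):

```scala
// Reproduces the two lines from allocateWorkerResourceToExecutors:
// how many executors to launch on this worker, and how many cores each gets.
def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  (numExecutors, coresToAssign)
}

split(7, Some(2))  // (3, 2): three 2-core executors; the leftover core stays unused
split(7, None)     // (1, 7): a single executor grabs all assigned cores
```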

Launching the executors:

// Ask a worker to launch an executor and notify the application's driver
  private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    // Add the executor to the worker's in-memory bookkeeping
    worker.addExecutor(exec)
    // Send the worker a LaunchExecutor message to start the executor
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    // Send an ExecutorAdded message to the driver of this executor's application
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

When the worker receives the LaunchExecutor message, it performs the actual launch and reports back to the Master.

The driver is also sent the executors' resource information (the ExecutorAdded message above); once the driver has its executors, the application can run. That completes the rough flow of allocating and launching executors.

To sum up: the Master schedules waiting Drivers round-robin onto shuffled alive Workers, then assigns cores and launches Executors on the Workers. (The original post closed with a diagram of this simplified flow.)
