Environment:
Spark version: 2.4.3
Kubernetes version: v1.16.2
Problem:
Submitting example.jar with spark-submit in cluster mode to the Kubernetes cluster, the driver pod fails with the following error:
19/11/06 07:06:54 INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes.
19/11/06 07:06:54 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/11/06 07:06:54 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/11/06 07:06:54 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Cause:
Some digging showed that EKS security patches cause Apache Spark jobs to fail with a permissions error; see the Stack Overflow question "EKS security patches cause Apache Spark jobs to fail with permissions error".
Spark community patches:
https://github.com/apache/spark/pull/25641
https://github.com/apache/spark/pull/25640
Solution:
Option 1: The issue is fixed in the spark-2.4.4 release and later, so in a test environment you can simply switch to a fixed Spark version, or cherry-pick the relevant commits.
Option 2: Since the root cause lies in jars that Spark depends on, you can instead replace the following three jars under spark/jars with version 4.4.2 or later:
kubernetes-client-4.4.2.jar
kubernetes-model-4.4.2.jar
kubernetes-model-common-4.4.2.jar
The jars can be fetched from the Maven repository, e.g.:
wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
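The same fetch works for all three jars. A minimal sketch, assuming the jars are downloaded into the current directory (the loop keeps going if the network is unavailable):

```shell
# Download the three fabric8 4.4.2 jars from Maven Central.
BASE=https://repo1.maven.org/maven2/io/fabric8
VER=4.4.2
for artifact in kubernetes-client kubernetes-model kubernetes-model-common; do
  url="$BASE/$artifact/$VER/$artifact-$VER.jar"
  echo "fetching $url"
  wget -q "$url" || echo "download failed: $url"  # keep going if offline
done
```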
Follow-up notes:
1. After replacing the jars, rebuilding and pushing the image, and resubmitting the job with spark-submit, the same error still occurred;
2. The likely cause is that the node's cached image had not been updated, so the old image was still being used;
3. Adding --conf spark.kubernetes.container.image.pullPolicy=Always to the spark-submit command forces the updated image to be pulled, which resolved the problem.
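The jar swap and image rebuild described above can be sketched as follows. This is only a sketch under assumptions: SPARK_HOME points at an unpacked Spark 2.4.3 distribution (which ships a Dockerfile at kubernetes/dockerfiles/spark/Dockerfile), the 4.4.2 jars were downloaded to /tmp, and the image name matches the one used in this post:

```shell
# Assumption: SPARK_HOME points at an unpacked Spark 2.4.3 distribution.
SPARK_HOME=${SPARK_HOME:-/opt/spark}
IMAGE=merrily01/repo:spark-2.4.3-image-merrily01
if [ -d "$SPARK_HOME/jars" ]; then
  # Swap the old fabric8 jars for the 4.4.2 builds (assumed already in /tmp).
  # The glob matches kubernetes-client/-model/-model-common but not spark-kubernetes_*.
  rm -f "$SPARK_HOME"/jars/kubernetes-*.jar
  cp /tmp/kubernetes-client-4.4.2.jar \
     /tmp/kubernetes-model-4.4.2.jar \
     /tmp/kubernetes-model-common-4.4.2.jar "$SPARK_HOME"/jars/
  # Rebuild and push the image using the Dockerfile shipped with the distribution.
  cd "$SPARK_HOME"
  docker build -t "$IMAGE" -f kubernetes/dockerfiles/spark/Dockerfile .
  docker push "$IMAGE"
else
  echo "Spark distribution not found at $SPARK_HOME" >&2
fi
```

Together with pullPolicy=Always from step 3, the next spark-submit run picks up the rebuilt image.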
With that, the complete submit command for the official Spark-on-Kubernetes demo is:
spark-submit \
--master k8s://https://172.16.192.128:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=merrily01/repo:spark-2.4.3-image-merrily01 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar