Spark on k8s
Spark overview
1. Introduction to Spark
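spark-submit can be used directly to submit a Spark application to a Kubernetes cluster. The submission mechanism works as follows:
Spark creates a Spark driver running within a Kubernetes pod.
The driver creates executors, which also run within Kubernetes pods, connects to them, and executes the application code.
When the application completes, the executor pods terminate and are cleaned up, but the driver pod keeps its logs and remains in the "Completed" state in the Kubernetes API until it is eventually garbage collected or manually cleaned up.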
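The lifecycle above can be observed from the command line. The following is a minimal sketch, assuming kubectl is already configured for the cluster and the application runs in the default namespace. It relies on the spark-role label that Spark attaches to driver and executor pods; the NAMESPACE variable and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect and clean up Spark pods by their spark-role label.
# NAMESPACE and the function names are assumptions for illustration.
NAMESPACE="${NAMESPACE:-default}"

# Build the label selector for a Spark role ("driver" or "executor").
role_selector() {
  echo "spark-role=$1"
}

# List the pods of one role, e.g. list_spark_pods executor.
list_spark_pods() {
  kubectl get pods -n "$NAMESPACE" -l "$(role_selector "$1")"
}

# Delete driver pods that have finished successfully (shown as Completed).
cleanup_finished_drivers() {
  kubectl delete pods -n "$NAMESPACE" -l "$(role_selector driver)" \
    --field-selector=status.phase=Succeeded
}
```

After sourcing the file, list_spark_pods driver shows the drivers that are still present, and cleanup_finished_drivers performs the manual cleanup mentioned above.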
2. Prerequisites
A deployed k8s cluster
More than 2 GB of available memory on each node
A Java environment, JDK 8 or later
Documentation: http://spark.apache.org/docs/latest/running-on-kubernetes.html
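The Java and memory requirements can be checked with a short script before installing. This is only a sketch under assumptions: the helper names java_major and mem_ok are made up here, the memory figure is read from /proc/meminfo (Linux only), and 2 GB is interpreted as 2097152 kB.

```shell
#!/bin/sh
# Sketch of a prerequisite check for one node; helper names are illustrative.

# Print the major Java version for strings such as "1.8.0_212" or "11.0.2".
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;      # legacy scheme: 1.8.0_212 -> 8.0_212
  esac
  echo "${v%%[._-]*}"        # drop everything from the first '.', '_' or '-'
}

# Succeed when the given amount of memory (in kB) exceeds 2 GB (2097152 kB).
mem_ok() {
  [ "$1" -gt 2097152 ]
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr, e.g.: openjdk version "1.8.0_212"
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$ver")
  if [ "${major:-0}" -ge 8 ]; then echo "java ok: $ver"; else echo "java too old: $ver"; fi
else
  echo "java not found"
fi

if [ -r /proc/meminfo ]; then
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  if mem_ok "${avail:-0}"; then echo "memory ok: ${avail} kB"; else echo "memory low: ${avail:-unknown} kB"; fi
fi
```

The script only reports; it does not stop the installation, so it can be run on every node without side effects.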
Installation and deployment
3. Download the release package
# wget http://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
# tar xf spark-2.4.3-bin-hadoop2.7.tgz
# mv spark-2.4.3-bin-hadoop2.7 /usr/local/spark-2.4.3
Add the Spark bin directory to PATH in /etc/profile:
# cat /etc/profile
export PATH=/usr/local/spark-2.4.3/bin:$PATH
4. Build the Docker images
# cd /usr/local/spark-2.4.3
# ./bin/docker-image-tool.sh -r wxtime -t 2.4.3 build
# docker images
REPOSITORY        TAG     IMAGE ID       CREATED      SIZE
wxtime/spark-r    2.4.3   592aff869ffb   4 days ago   756MB
wxtime/spark-py   2.4.3   47e104fe2827   4 days ago   462MB
wxtime/spark      2.4.3   24aab7c864da   4 days ago   371MB
# docker login
# ./bin/docker-image-tool.sh -r wxtime -t 2.4.3 push
# kubectl cluster-info
Kubernetes master is running at https://192.168.1.101:6443
5. Test
# ./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
res1: Long = 1000
Create a service account for the driver and grant it the edit role:
# kubectl create serviceaccount spark
# kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
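Whether the binding took effect can be verified with kubectl auth can-i, impersonating the new service account. The sketch below assumes the spark account lives in the default namespace as created above; sa_subject and check_rbac are hypothetical helper names, and the list of verbs is only a sample of what a driver needs in order to manage executor pods.

```shell
#!/bin/sh
# Sketch: check the pod permissions of a service account.
# Helper names are illustrative, not part of kubectl or Spark.

# Build the user name Kubernetes uses for a service account.
sa_subject() {
  echo "system:serviceaccount:$1:$2"
}

# Ask the API server whether the account may perform each verb on pods.
# Usage: check_rbac <namespace> <serviceaccount>
check_rbac() {
  for verb in create get list watch delete; do
    printf '%s pods: ' "$verb"
    kubectl auth can-i "$verb" pods -n "$1" --as "$(sa_subject "$1" "$2")"
  done
}
```

With the edit role bound as shown, running check_rbac default spark is expected to answer yes for each verb; a no usually points to a typo in the --serviceaccount value.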
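6. Launch SparkPi in cluster mode
Set --master to the API server address reported by kubectl cluster-info, prefixed with k8s://.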
bin/spark-submit \
--master k8s://https://10.10.0.224:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=wxtime/spark:2.4.3 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar
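Once the job has been submitted, its progress can be followed through the driver pod. The following is a rough sketch, assuming the job runs in the default namespace and that the newest pod labelled spark-role=driver belongs to this submission; the function names are hypothetical. SparkPi writes its result to the driver log as a line starting with "Pi is roughly".

```shell
#!/bin/sh
# Sketch: follow a submitted SparkPi job. Function names are illustrative.
NAMESPACE="${NAMESPACE:-default}"

# Print the most recently created driver pod in the form pod/<name>.
latest_driver() {
  kubectl get pods -n "$NAMESPACE" -l spark-role=driver \
    --sort-by=.metadata.creationTimestamp -o name | tail -n 1
}

# Keep only the result line that SparkPi prints.
pi_result() {
  grep 'Pi is roughly'
}

# Stream the driver log until the job ends and show the result line.
show_pi() {
  kubectl logs -f -n "$NAMESPACE" "$(latest_driver)" | pi_result
}
```

Calling show_pi blocks until the driver finishes. If several applications run at the same time, selecting the pod by its full name is safer than taking the newest one.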