
Deploying Zeppelin in yarn-cluster Mode

I. Environment:

Spark-2.2.1-bin-hadoop2.7, zeppelin-0.8.1-bin-all, and a working Hadoop and Hive environment

II. Steps:

1. Download Zeppelin 0.8.0 or later (yarn-cluster mode is supported from 0.8.0 onward). Download page:

https://zeppelin.apache.org/download.html (file: zeppelin-0.8.1-bin-all.tgz)

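(1) Extract: tar -zxvf zeppelin-0.8.1-bin-all.tgz -C /dir

(2) Set the Java and Zeppelin environment variables JAVA_HOME and ZEPPELIN_HOME (ZEPPELIN_HOME must point to the directory the archive was extracted into):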

export  ZEPPELIN_HOME=/usr/local/package/zeppelin-0.8.1-bin-all

export PATH=$PATH:$ZEPPELIN_HOME/bin
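A quick sanity check of these variables can save a failed start later. The following is a minimal sketch; `check_home` is a hypothetical helper name, not part of Zeppelin, and it only verifies that a variable is set and names an existing directory.

```shell
# check_home NAME: succeed only if the variable called NAME is set
# and its value is an existing directory. (Hypothetical helper.)
check_home() {
    name="$1"
    # Indirect lookup: read the value of the variable whose name is in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}
```

Typical use would be `check_home JAVA_HOME && check_home ZEPPELIN_HOME` in the shell that will start Zeppelin.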

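2. Edit the configuration files. In the conf directory under ZEPPELIN_HOME, create the working files from their templates: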

mv shiro.ini.template shiro.ini

mv zeppelin-env.sh.template zeppelin-env.sh

mv zeppelin-site.xml.template zeppelin-site.xml
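The `mv` commands consume the templates. If you would rather keep them for reference, a copy-based variant is one option. This is a sketch only; `init_conf` is a hypothetical helper name, and it assumes the three template files sit in the directory passed as its argument.

```shell
# init_conf CONF_DIR: create each config file from its .template,
# leaving existing files and the templates themselves untouched.
init_conf() {
    conf_dir="$1"
    for name in shiro.ini zeppelin-env.sh zeppelin-site.xml; do
        if [ -e "$conf_dir/$name" ]; then
            echo "keep $name"                       # never overwrite local edits
        elif [ -f "$conf_dir/$name.template" ]; then
            cp "$conf_dir/$name.template" "$conf_dir/$name"
            echo "create $name"
        else
            echo "missing $name.template"
        fi
    done
}
```

It would be called as `init_conf "$ZEPPELIN_HOME/conf"`, and is safe to run more than once.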

2.1 Edit zeppelin-env.sh:

export MASTER=yarn

export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

export SPARK_HOME=/opt/spark-2.2.1-bin-hadoop2.7

export SPARK_SUBMIT_OPTIONS="--files /opt/spark-2.2.1-bin-hadoop2.7/conf/hive-site.xml --conf spark.executor.heartbeatInterval=60s --conf spark.network.timeout=360s  --executor-memory 4G --driver-memory 6g  --executor-cores 40"

export ZEPPELIN_CONF_DIR=/opt/zeppelin-0.8.1-bin-all/conf

export ZEPPELIN_HOME=/opt/zeppelin-0.8.1-bin-all
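The long SPARK_SUBMIT_OPTIONS string is easier to review and change when it is built up one option per line. The sketch below produces the same options as above; the variable name `opts` is illustrative, and the values are simply copied from this article rather than recommended.

```shell
SPARK_HOME=/opt/spark-2.2.1-bin-hadoop2.7

# Ship hive-site.xml with the job so Spark can reach the Hive metastore
opts="--files $SPARK_HOME/conf/hive-site.xml"
# The heartbeat interval should stay well below spark.network.timeout
opts="$opts --conf spark.executor.heartbeatInterval=60s"
opts="$opts --conf spark.network.timeout=360s"
opts="$opts --executor-memory 4G"
opts="$opts --driver-memory 6g"
# Value from the article; it must fit within YARN's per-container vcore limit
opts="$opts --executor-cores 40"

export SPARK_SUBMIT_OPTIONS="$opts"
```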

2.2 Specify the jar location:

Method 1: pass the jars with --jars in SPARK_SUBMIT_OPTIONS, for example:

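dependenceDir=/opt/spark-2.2.1-bin-hadoop2.7
jar_files=""
# Build a comma-separated list of every jar under $dependenceDir/jars
for file in "${dependenceDir}"/jars/*.jar
do
    if [ -n "$jar_files" ]; then
        jar_files="$jar_files,$file"
    else
        jar_files="$file"
    fi
done
# Pass the list with --jars so the jars land on the driver and executor classpaths
export SPARK_SUBMIT_OPTIONS="--jars ${jar_files}"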

Note: with this approach, every job submission makes Spark upload all the jars under $SPARK_HOME/jars/ to the distributed cache, so jobs are slow to submit.
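Note also that a plain `export SPARK_SUBMIT_OPTIONS=...` replaces whatever zeppelin-env.sh set earlier. A variant that appends instead might look like the sketch below; `join_jars` and `append_jars` are hypothetical helper names introduced here for illustration.

```shell
# join_jars DIR: print the *.jar files in DIR as one comma-separated line.
join_jars() {
    list=""
    for f in "$1"/*.jar; do
        [ -e "$f" ] || continue        # the glob stays literal when nothing matches
        list="${list:+$list,}$f"       # add a comma only between entries
    done
    echo "$list"
}

# append_jars DIR: add "--jars <list>" to SPARK_SUBMIT_OPTIONS,
# keeping any options that are already set. Fails if DIR has no jars.
append_jars() {
    jars=$(join_jars "$1")
    [ -n "$jars" ] || return 1
    export SPARK_SUBMIT_OPTIONS="${SPARK_SUBMIT_OPTIONS:+$SPARK_SUBMIT_OPTIONS }--jars $jars"
}
```

In zeppelin-env.sh this would be called after the existing export, for example `append_jars "$SPARK_HOME/jars"`. The upload cost described in the note still applies.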

Method 2:

Upload the jars that Spark needs at runtime from $SPARK_HOME/jars/ to HDFS once, and point Spark at that location:

hadoop fs -mkdir -p hdfs://dbmtimehadoop/tmp/spark/lib_jars/

hadoop fs -put  $SPARK_HOME/jars/* hdfs://dbmtimehadoop/tmp/spark/lib_jars/

vi $SPARK_HOME/conf/spark-defaults.conf

Add the following line:

spark.yarn.jars hdfs://dbmtimehadoop/tmp/spark/lib_jars/*.jar

Reference: https://www.cnblogs.com/honeybee/p/6379599.html?utm_source=itdadao&utm_medium=referral
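Editing spark-defaults.conf by hand works, but a scripted edit is repeatable and avoids duplicate entries. The following is a sketch under the assumption that the file uses whitespace-separated `key value` lines; `set_spark_conf` is a hypothetical helper name.

```shell
# set_spark_conf FILE KEY VALUE: make FILE contain exactly one "KEY VALUE" line.
# Lines whose first whitespace-separated field equals KEY are dropped first,
# so running this twice does not create duplicates.
set_spark_conf() {
    file="$1"; key="$2"; value="$3"
    touch "$file"
    tmp="$file.tmp.$$"
    awk -v k="$key" '$1 != k' "$file" > "$tmp"     # keep every other line as-is
    printf '%s %s\n' "$key" "$value" >> "$tmp"     # append the new setting
    mv "$tmp" "$file"
}
```

For the setting above, the call would be `set_spark_conf "$SPARK_HOME/conf/spark-defaults.conf" spark.yarn.jars 'hdfs://dbmtimehadoop/tmp/spark/lib_jars/*.jar'`. The single quotes keep the shell from expanding the `*`.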

2.3 Add the following to zeppelin-site.xml:

<property>
  <name>zeppelin.spark.concurrentSQL</name>
  <value>true</value>
  <description>Execute multiple SQL statements concurrently when set to true</description>
</property>

This allows multiple Zeppelin tasks to run in parallel.

2.4 Start Zeppelin and change the following options:

[Image: Deploying Zeppelin in yarn-cluster mode]
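Starting Zeppelin for step 2.4 goes through the `zeppelin-daemon.sh` script shipped in the distribution's bin directory. The wrapper below is a minimal sketch; `zeppelin_ctl` is a hypothetical helper name, and it assumes ZEPPELIN_HOME is set as in step 1.

```shell
# zeppelin_ctl ACTION: run $ZEPPELIN_HOME/bin/zeppelin-daemon.sh with
# start, stop, restart or status, after checking that the script is present.
zeppelin_ctl() {
    case "$1" in
        start|stop|restart|status) ;;
        *) echo "usage: zeppelin_ctl start|stop|restart|status"; return 2 ;;
    esac
    if [ -z "${ZEPPELIN_HOME:-}" ]; then
        echo "ZEPPELIN_HOME is not set"
        return 1
    fi
    daemon="$ZEPPELIN_HOME/bin/zeppelin-daemon.sh"
    if [ ! -x "$daemon" ]; then
        echo "not found or not executable: $daemon"
        return 1
    fi
    "$daemon" "$1"
}
```

After `zeppelin_ctl start`, the web UI should be reachable on the configured port (8080 unless changed in zeppelin-site.xml), and `zeppelin_ctl restart` picks up later changes to zeppelin-env.sh.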

At this point the Zeppelin installation is complete. For permissions and other settings, see the official documentation: https://zeppelin.apache.org/download.html
