
Compiling and Installing a Spark 0.9.0 Cluster

For the rest of the base environment setup, see the previous post:

http://sofar.blog.51cto.com/353572/1352713

1. Scala Installation

http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
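Download the release tarball into a working directory first, e.g. with wget:

# wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz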

# tar xvzf scala-2.10.3.tgz -C /usr/local

# cd /usr/local

# ln -s scala-2.10.3 scala
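A quick way to verify the unpacked distribution is to ask Scala for its version, which should report 2.10.3:

# /usr/local/scala/bin/scala -version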

2. Spark Installation

http://d3kbcqa49mib13.cloudfront.net/spark-0.9.0-incubating.tgz
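As with Scala, fetch the release tarball first, e.g.:

# wget http://d3kbcqa49mib13.cloudfront.net/spark-0.9.0-incubating.tgz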

# tar xvzf spark-0.9.0-incubating.tgz -C /usr/local

# cd /usr/local

# ln -s spark-0.9.0-incubating spark

# cd /usr/local/spark

# export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"

# mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
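If the build succeeds, the self-contained assembly jar ends up under the assembly module; the exact file name encodes the Spark and Hadoop versions, so a glob is the safest way to check for it (the path below assumes the Scala 2.10 build above):

# ls assembly/target/scala-2.10/spark-assembly-*.jar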

# cd /usr/local/spark/conf

# mv spark-env.sh.template spark-env.sh

# mkdir -p /data/spark/tmp

# vim spark-env.sh

export JAVA_HOME=/usr/local/jdk

export SCALA_HOME=/usr/local/scala

export HADOOP_HOME=/usr/local/hadoop

SPARK_LOCAL_DIR="/data/spark/tmp"

# JVM options for Spark processes: longer block manager heartbeat, dedicated scratch dir, GC logging, and the CMS collector
SPARK_JAVA_OPTS="-Dspark.storage.blockManagerHeartBeatMs=60000 -Dspark.local.dir=$SPARK_LOCAL_DIR -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$SPARK_HOME/logs/gc.log -XX:+UseConcMarkSweepGC -XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=60"

[Note: apply the same configuration on the other nodes as well]
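One way to do that, assuming passwordless SSH between the nodes is already in place, is to push the whole directory out with rsync (datanode1 below is a hypothetical worker hostname; substitute your own):

# rsync -av /usr/local/spark/ datanode1:/usr/local/spark/

start-all.sh also needs conf/slaves on the Master node to know which hosts to start workers on, one hostname per line (again, hypothetical names):

# vim /usr/local/spark/conf/slaves

datanode1

datanode2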

## Run on the Master node

# cd /usr/local/spark && sbin/start-all.sh
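Once start-all.sh returns, jps should show a Master process on this node (and a Worker process on each slave), and the standalone master serves a web UI on port 8080 by default:

# jps

Browsing to http://namenode1:8080 should show all registered workers.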

3. Testing

(1) Local mode

# bin/run-example org.apache.spark.examples.SparkPi local
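Here local is the master URL and runs the example single-threaded inside one JVM; to use more local cores, pass local[N] instead, e.g.:

# bin/run-example org.apache.spark.examples.SparkPi local[2]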

(2) Standalone cluster mode

# bin/run-example org.apache.spark.examples.SparkPi spark://namenode1:7077

# bin/run-example org.apache.spark.examples.SparkLR spark://namenode1:7077

# bin/run-example org.apache.spark.examples.SparkKMeans spark://namenode1:7077 file:/usr/local/spark/data/kmeans_data.txt 2 1

(3) Cluster mode with HDFS

# hadoop fs -put README.md .

# MASTER=spark://namenode1:7077 bin/spark-shell

scala> val file = sc.textFile("hdfs://namenode1:9000/user/root/README.md")

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)

scala> count.collect()
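collect() brings the word counts back to the driver for inspection; to persist them in HDFS instead, the pair RDD can be written out with saveAsTextFile (the output path below is only an example and must not already exist):

scala> count.saveAsTextFile("hdfs://namenode1:9000/user/root/wordcount_out")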

scala> :quit
