
Setting up a local Spark environment and running your first Spark program

Setting up the local Spark environment

Setting up the Java environment

(1) Download the JDK from the official website

Official link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

(2) Extract it to the target directory

>sudo mkdir -p /usr/lib/jdk
>sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk  # adjust the version number to match your download

(3) Set the path and environment variables

>sudo vim /etc/profile      

Append the following at the end of the file:

export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91   
export JRE_HOME=${JAVA_HOME}/jre  
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
export PATH=${JAVA_HOME}/bin:$PATH      

(4) Apply the configuration

source /etc/profile      

(5) Verify the installation

~$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)      
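The JDK also provides the Java compiler, which sbt will need when building the project later. As an optional extra check (not part of the original steps), confirm that javac resolves through the PATH configured above:

$ javac -version
javac 1.8.0_181   # should report the same version as java -version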

Installing Scala

(1) Download the package from the official website

Official link: https://www.scala-lang.org/download/

(2) Extract it to the target directory

sudo mkdir -p /usr/lib/scala
sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala  # adjust the version number to match your download

(3) Set the path and environment variables

>sudo vim /etc/profile      

Append the following at the end of the file:

export SCALA_HOME=/usr/lib/scala/scala-2.11.8  # adjust the version number to match your download
export PATH=${SCALA_HOME}/bin:$PATH      

(4) Apply the configuration

source /etc/profile      

(5) Verify the installation

:~$ scala
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.

scala>       
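As a quick sanity check (an extra step, not in the original), evaluate a simple expression in the REPL and then leave it with :quit:

scala> 1 + 1
res0: Int = 2

scala> :quit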

Installing Spark

(1) Download the package from the official website

Official link: http://spark.apache.org/downloads.html

(2) Extract it to the target directory

sudo mkdir -p /usr/lib/spark
sudo tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /usr/lib/spark  # adjust the version number to match your download

(3) Set the path and environment variables

>sudo vim /etc/profile      

Append the following at the end of the file:

export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7  # adjust the version number to match your download
export PATH=${SPARK_HOME}/bin:$PATH      

(4) Apply the configuration

source /etc/profile      

(5) Verify the installation

:~$ cd spark-2.2.0-bin-hadoop2.7
:~/spark-2.2.0-bin-hadoop2.7$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/30 20:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/30 20:59:32 WARN Utils: Your hostname, pxh resolves to a loopback address: 127.0.1.1; using 10.22.48.4 instead (on interface wlan0)
18/09/30 20:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/09/30 20:59:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.22.48.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1538312374870).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.      
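With the shell running, the SparkContext is already available as sc, so you can run a tiny job to confirm everything works end to end (a minimal check, not part of the original output):

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050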

Installing sbt

(1) Download the package from the official website

Official link: https://www.scala-sbt.org/download.html

(2) Extract it to the target directory

sudo mkdir -p /usr/local/sbt
sudo chown $USER /usr/local/sbt   # so the later steps can write here without sudo
tar -zxvf sbt-0.13.9.tgz -C /usr/local/sbt  # adjust the version number to match your download

(3) Create an sbt script in /usr/local/sbt and add the following content

$ cd /usr/local/sbt
$ vim sbt
# Put the following lines in the sbt script file:
#!/bin/bash
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"

(4) After saving, make the sbt script executable

$ chmod u+x sbt      

(5) Set the path and environment variables

>sudo vim /etc/profile      

Append the following at the end of the file:

export PATH=/usr/local/sbt/:$PATH      

(6) Apply the configuration

source /etc/profile      

(7) Verify the installation

$ sbt sbt-version
# if this command does not work, use the following one instead
$ sbt sbtVersion
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/pxh/project
[info] Set current project to pxh (in build file:/home/pxh/)
[info] 1.2.1      
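Optionally (an extra check, not in the original steps), sbt about prints the sbt version together with the current project and plugins, which confirms that the launcher script on the PATH works from any directory:

$ sbt about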

Writing a Scala application

(1) In a terminal, create a folder called sparkapp as the application's root directory

cd ~
mkdir ./sparkapp
mkdir -p ./sparkapp/src/main/scala  # create the required directory structure

(2) Create a file named SimpleApp.scala under ./sparkapp/src/main/scala and add the following code

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
    def main(args: Array[String]) {
        val logFile = "file:///home/pxh/hello.ts"      // any local text file to analyse
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()  // read the file as an RDD with 2 partitions and cache it
        val numAs = logData.filter(line => line.contains("a")).count()  // count the lines containing the letter "a"
        println("Lines with a: %s".format(numAs))
        sc.stop()
    }
}
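The master URL is deliberately not set in the code, so it can be chosen at submit time (spark-submit defaults to local[*] when none is given). Before packaging, you can also try the same logic interactively in spark-shell, where sc already exists; this assumes /home/pxh/hello.ts is a text file on your machine, so adjust the path as needed:

scala> sc.textFile("file:///home/pxh/hello.ts").filter(_.contains("a")).count()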

(3) Declare the standalone application's metadata and its dependency on Spark

vim ./sparkapp/simple.sbt      

Add the following to the file (scalaVersion must match the Scala build your Spark release ships with, and the spark-core version must match the installed Spark; Spark 2.2.0 is built against Scala 2.11):

name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

(4) Check the application's overall file structure

cd ~/sparkapp
find .      

The file structure should look like this:

.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala      

(5) Package the whole application into a JAR (the first run will take quite a while to download dependencies, so be patient)

sparkapp$ /usr/local/sbt/sbt package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/pxh/sparkapp/project
[info] Loading settings for project sparkapp from simple.sbt ...
[info] Set current project to Simple Project (in build file:/home/pxh/sparkapp/)
[success] Total time: 2 s, completed 2018-10-1 0:04:59      

(6) Submit the generated JAR to Spark with spark-submit and run it

:~$ /home/pxh/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --class "SimpleApp" /home/pxh/sparkapp/target/scala-2.11/simple-project_2.11-1.0.jar 2>&1 | grep "Lines with a:"
Lines with a: 3      

END........