
Hadoop & Spark Cluster Setup

Contents: 1. Preparation  2. Passwordless SSH login  3. Hadoop cluster setup  4. Scala installation  5. Spark installation

1. Preparation

1.1 Download and install the JDK

1.2 Download from the official sites:

scala-2.10.4.tgz (the Scala version this Spark release is built against)

hadoop-2.6.0.tar.gz

spark-1.6.0-bin-hadoop2.6.tgz

1.3 Prepare three virtual machines

CentOS 6.3

Addresses: 172.16.100.01, 172.16.100.02, 172.16.100.03. Create a user on each machine:

useradd cluster

passwd cluster

Edit /etc/hosts on all three machines and add the following entries:

vim /etc/hosts
           
172.16.100.01 master
172.16.100.02 slave1
172.16.100.03 slave2
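
To confirm that the new hostnames resolve, a quick check such as the following can be run from each of the three machines:

ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2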
           

2. Passwordless SSH login

ssh-keygen -t rsa (press Enter at every prompt; the random-art image printed at the end indicates success)

cd /home/cluster/.ssh; two new files have appeared:

    Private key: id_rsa

    Public key: id_rsa.pub

Append the contents of id_rsa.pub from all three machines into authorized_keys:

In the /home/cluster/.ssh directory, run:

    cat id_rsa.pub >> authorized_keys

Copy authorized_keys to the other two machines and run the same command there, so that the final authorized_keys file, containing all three public keys, ends up on every machine.
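
One way to merge and distribute the keys from the master as the cluster user is sketched below; this assumes password authentication still works for scp at this point, and the slave1.pub / slave2.pub file names are just temporary names chosen here:

scp slave1:/home/cluster/.ssh/id_rsa.pub /home/cluster/.ssh/slave1.pub
scp slave2:/home/cluster/.ssh/id_rsa.pub /home/cluster/.ssh/slave2.pub
cat ~/.ssh/id_rsa.pub ~/.ssh/slave1.pub ~/.ssh/slave2.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave1:/home/cluster/.ssh/
scp ~/.ssh/authorized_keys slave2:/home/cluster/.ssh/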

Set the permissions of authorized_keys on all three machines: chmod 644 authorized_keys

Test that SSH works between every pair of machines (testing connectivity in both directions is important):

# ssh 172.16.100.02
           

3. Hadoop cluster setup

Configure on the master host first.

1) Extract the downloaded hadoop-2.6.0.tar.gz into the hadoop directory (/home/cluster/hadoop).

2) Create the directories:

mkdir -p /home/cluster/hadoop/{pids,storage}
mkdir -p /home/cluster/hadoop/storage/{hdfs,tmp}
mkdir -p /home/cluster/hadoop/storage/hdfs/{name,data}
           

3) Configure the environment variables: vim /etc/profile (alternatively, edit the current user's /home/cluster/.bashrc):

export HADOOP_HOME=/home/cluster/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
           
source /etc/profile
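
After sourcing the profile, a quick sanity check that the Hadoop binaries are on the PATH:

hadoop version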
           

4) Edit core-site.xml:

<configuration>  
        <property>  
                <name>hadoop.tmp.dir</name>  
                <value>file:/home/cluster/hadoop/storage/tmp</value>  
        </property>  
        <property>  
                <name>fs.defaultFS</name>  
                <value>hdfs://master:9000</value>  
        </property>  
        <property>  
                <name>io.file.buffer.size</name>  
                <value>131072</value>  
        </property>
        <property>
                <name>hadoop.proxyuser.spark.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.spark.groups</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.native.lib</name>
                <value>true</value>
        </property>
</configuration>
           

5) Edit hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:9001</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/cluster/hadoop/storage/hdfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/cluster/hadoop/storage/hdfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
</configuration>
           

6) Edit mapred-site.xml:

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
        </property>
        <property>
                <name>mapreduce.jobtracker.http.address</name>
                <value>master:50030</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>master:19888</value>
        </property>
        <property>
                <name>mapred.job.tracker</name>
                <value>http://master:9001</value>
        </property>
</configuration>
           

7) Edit yarn-site.xml:

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>master:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>master:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>master:8088</value>
        </property>
</configuration>
           

8) Edit hadoop-env.sh and yarn-env.sh, adding the following environment variable near the top (do not skip this step):

    export JAVA_HOME=<path to your JDK>
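
For example, if the JDK from step 1.1 were installed under /usr/java (an assumed location; use whatever directory your JDK actually lives in), the line added to both files would look like:

export JAVA_HOME=/usr/java/jdk1.7.0_79    # assumed JDK install path; adjust to your system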

9) Configure the slaves file:

master
slave1
slave2
           

10) Copy the entire hadoop folder to the other two machines with scp:

scp -r hadoop 172.16.100.02:/home/cluster
scp -r hadoop 172.16.100.03:/home/cluster
           

Also configure the environment variables on those machines and stop the firewall: service iptables stop
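
A small sketch of stopping the firewall on the slaves from the master over SSH; these commands need root privileges (run them as root, or prefix with sudo if the cluster user has sudo rights), and chkconfig keeps iptables disabled across reboots:

ssh slave1 "service iptables stop && chkconfig iptables off"
ssh slave2 "service iptables stop && chkconfig iptables off"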

11) Verify

In the Hadoop installation directory ($HADOOP_HOME), run:
bin/hdfs namenode -format    # run only once
sbin/start-dfs.sh            # start HDFS
sbin/stop-dfs.sh             # stop HDFS
sbin/start-all.sh / sbin/stop-all.sh    # start/stop HDFS and YARN together
Verify with the jps command.
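
Roughly what jps should report after sbin/start-all.sh with the layout above (the master is also listed in slaves, so it runs the worker daemons too); process IDs will of course differ:

# on master
NameNode
SecondaryNameNode
ResourceManager
DataNode
NodeManager
Jps

# on slave1 / slave2
DataNode
NodeManager
Jps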

           
HDFS web UI: http://master:50070 (port 50070 on the master)
           

4. Scala installation

1) Extract

Extract the downloaded scala-2.10.4.tgz into /home/cluster/scala.

2) Add the environment variables and run source /etc/profile to apply them.
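
A minimal sketch of the /etc/profile additions, assuming the archive was unpacked to /home/cluster/scala/scala-2.10.4 (adjust the path to match your extraction directory):

export SCALA_HOME=/home/cluster/scala/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin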

3) Verify: simply type scala and the Scala REPL should start.
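
Alternatively, scala -version should report the installed release; for 2.10.4 the output is along these lines (exact wording may vary by build):

scala -version
# Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL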


4) Repeat on the other two machines.

5. Spark installation

1) Extract

Extract the downloaded spark-1.6.0-bin-hadoop2.6.tgz into /home/cluster/spark (with tar).

Rename the directory: mv spark-1.6.0-bin-hadoop2.6 spark

2) Add the environment variables:

export SPARK_HOME=/home/cluster/spark/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
           

3) In the /home/cluster/spark/spark/conf directory, create and edit spark-env.sh:

mv spark-env.sh.template spark-env.sh

Add the environment variables:

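A minimal sketch of what spark-env.sh typically contains for this layout; the JAVA_HOME, SCALA_HOME, and worker-memory values are assumptions based on the earlier steps and your VM sizes, so adjust them to your installation:

export JAVA_HOME=/usr/java/jdk1.7.0_79                  # assumed JDK path
export SCALA_HOME=/home/cluster/scala/scala-2.10.4      # assumed Scala path
export HADOOP_HOME=/home/cluster/hadoop/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g                           # assumed; size to your VMs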

4) mv log4j.properties.template log4j.properties

5) mv slaves.template slaves

Edit its contents:

master
slave1
slave2
           

6) Fix the permissions on the directory, otherwise Spark will not start:

chmod -R 777 spark

7) Repeat on the other two machines.

8) Start and stop the Spark cluster:

/home/cluster/spark/spark/sbin/start-all.sh

/home/cluster/spark/spark/sbin/stop-all.sh

Check the running processes with jps: the master node should show a Master process (and a Worker, since it is also listed in slaves), and each slave a Worker process.

Monitoring web UI: http://<master ip>:8080
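
As a quick smoke test against the standalone master, the bundled SparkPi example can be submitted; the examples jar name below is the one shipped with the 1.6.0 binary distribution, but check the exact file name under the lib directory:

/home/cluster/spark/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  /home/cluster/spark/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar 10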

