1. Preparation
1.1 Download and install the JDK
1.2 Download from the official sites:
scala-2.10.4.tgz (works with Spark)
hadoop-2.6.0.tar.gz
spark-1.6.0-bin-hadoop2.6.tgz
1.3 Prepare three virtual machines
CentOS 6.3
Addresses: 172.16.100.01, 172.16.100.02, 172.16.100.03. Create a user on each machine:
useradd cluster
passwd cluster
Modify the hosts file on all three machines and add the following:
# vim /etc/hosts
172.16.100.01 master
172.16.100.02 slave1
172.16.100.03 slave2
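Since the hostnames are not resolvable until this file is in place on every machine, one way to push the edited file to the other two nodes is by IP (a sketch, assuming root SSH is allowed; otherwise copy the lines over by hand):
scp /etc/hosts root@172.16.100.02:/etc/hosts
scp /etc/hosts root@172.16.100.03:/etc/hosts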
2. Set up passwordless SSH login
ssh-keygen -t rsa (press Enter at every prompt; the randomart image indicates success)
Two new files appear under /home/cluster/.ssh:
Private key: id_rsa
Public key: id_rsa.pub
Append the contents of each machine's public key id_rsa.pub to authorized_keys:
In the /home/cluster/.ssh directory, run:
cat id_rsa.pub >> authorized_keys
Copy authorized_keys to the other two machines and run the same command there, so that the final authorized_keys file, containing all three machines' public keys, ends up on all three machines, as in the sketch below.
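One way to do the round trip, assuming the cluster user and the hostnames defined in /etc/hosts above:
scp ~/.ssh/authorized_keys cluster@slave1:~/.ssh/    # on master
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      # on slave1
scp ~/.ssh/authorized_keys cluster@slave2:~/.ssh/    # on slave1
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      # on slave2
scp ~/.ssh/authorized_keys cluster@master:~/.ssh/    # on slave2, push the merged file back
scp ~/.ssh/authorized_keys cluster@slave1:~/.ssh/    # on slave2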
Change the permissions of authorized_keys on all three machines: chmod 644 authorized_keys
Test SSH connectivity between the machines (test every direction; this is important):
# ssh 172.16.100.02
3. Hadoop cluster setup
Configure on the master host first.
1) Extract the downloaded hadoop-2.6.0.tar.gz into the hadoop directory
2) Create the directories:
mkdir -p /home/cluster/hadoop/{pids,storage}
mkdir -p /home/cluster/hadoop/storage/{hdfs,tmp}
mkdir -p /home/cluster/hadoop/storage/hdfs/{name,data}
3) Configure the environment variables: vim /etc/profile (alternatively, edit the current user's /home/cluster/.bashrc)
export HADOOP_HOME=/home/cluster/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
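A quick check that the variables took effect (this only confirms that the hadoop binary is on the PATH):
hadoop version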
4) Edit core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/cluster/hadoop/storage/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
</property>
</configuration>
5) Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/cluster/hadoop/storage/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/cluster/hadoop/storage/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
6) Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://master:9001</value>
</property>
</configuration>
7) Edit yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
8) Edit hadoop-env.sh and yarn-env.sh, adding the following environment variable at the top (do not skip this):
export JAVA_HOME=<your JDK installation path>
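For example, if the JDK from step 1.1 were installed under /usr/java (the path and version here are assumptions; use your actual install location):
export JAVA_HOME=/usr/java/jdk1.7.0_79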
9) Configure the slaves file (etc/hadoop/slaves):
master
slave1
slave2
10) Copy the entire hadoop directory to the other two machines with scp:
scp -r hadoop 172.16.100.02:/home/cluster
scp -r hadoop 172.16.100.03:/home/cluster
Configure the environment variables on those machines as well, and stop the firewall: service iptables stop (see the sketch below).
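On each of the three machines, run as root (chkconfig keeps the firewall disabled after a reboot; that second line is an optional addition):
service iptables stop
chkconfig iptables off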
11) Verify
Run the following in the hadoop directory:
bin/hdfs namenode -format (run only once)
sbin/start-dfs.sh # start HDFS
sbin/stop-dfs.sh # stop HDFS
sbin/start-all.sh or stop-all.sh
Verify with the jps command
HDFS web UI: http://<master IP>:50070
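With master also listed in slaves, jps on master should typically show NameNode, SecondaryNameNode, ResourceManager, DataNode and NodeManager, while the slaves show only DataNode and NodeManager. A quick HDFS smoke test:
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -ls /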
4. Scala installation
1) Extract
Extract the downloaded scala-2.10.4.tgz into /home/cluster/scala
2) Add the environment variables (see the sketch after this list) and run source /etc/profile to apply them
3) Verify by simply running scala
4) Repeat on the other two machines
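A minimal pair of variables for /etc/profile, assuming the tarball extracts into the directory below (the exact directory name is an assumption):
export SCALA_HOME=/home/cluster/scala/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin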
5. Spark installation
1) Extract
Extract the downloaded spark-1.6.0-bin-hadoop2.6.tgz into /home/cluster/spark (with tar)
Rename the directory: mv spark-1.6.0-bin-hadoop2.6 spark
2) Add the environment variables
export SPARK_HOME=/home/cluster/spark/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
3) In /home/cluster/spark/spark/conf, edit spark-env.sh:
mv spark-env.sh.template spark-env.sh
Add the environment variables (a sketch follows):
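A typical minimal spark-env.sh for this cluster might look like the following; the JDK path and the worker memory/core values are assumptions, adjust them to your machines:
export JAVA_HOME=/usr/java/jdk1.7.0_79
export SCALA_HOME=/home/cluster/scala/scala-2.10.4
export HADOOP_CONF_DIR=/home/cluster/hadoop/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1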
4) mv log4j.properties.template log4j.properties
5)mv slaves.template slaves
Edit its contents:
master
slave1
slave2
6) Adjust the permissions on the relevant directories, otherwise Spark will not start:
chmod -R 777 spark
7) Repeat on the other two machines
8) Start and stop the Spark cluster
/home/cluster/spark/spark/sbin/start-all.sh
/home/cluster/spark/spark/sbin/stop-all.sh
Use jps to check the running processes
Web UI: http://<master IP>:8080
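To confirm that the cluster actually accepts jobs, you can submit the bundled SparkPi example from the master (the examples jar name below follows the spark-1.6.0-bin-hadoop2.6 layout; double-check the exact file name under lib/):
/home/cluster/spark/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  /home/cluster/spark/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar 10
The driver output should contain a line like "Pi is roughly 3.14", and the application should appear as FINISHED on the 8080 page.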