Hadoop Cluster Setup
tools:
SecureCRT8.0
VMware12
RHEL 6.6 x64
Network plan:
hadoop0 192.168.248.150
hadoop1 192.168.248.151
hadoop2 192.168.248.152
hadoop3 192.168.248.153
Set a static IP:
sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR=192.168.248.15?   (replace ? with the node index, 0-3)
If the network does not come up (common after cloning a VM),
check /etc/udev/rules.d/70-persistent-net.rules
and make sure HWADDR in ifcfg-eth0 matches the clone's new MAC address
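A minimal ifcfg-eth0 for a static address might look like the fragment below. The GATEWAY and DNS1 values are assumptions (typical for a VMware NAT network) and are not from the original notes; adjust them to your environment:

```shell
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.248.150   # .151/.152/.153 on the other nodes
NETMASK=255.255.255.0
GATEWAY=192.168.248.2    # assumption: VMware NAT default gateway
DNS1=192.168.248.2       # assumption
```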
Set the hostname:
vim /etc/sysconfig/network
HOSTNAME=hadoop?   (hadoop0-hadoop3, matching the IP)
Disable the firewall and SELinux:
service iptables stop
vim /etc/selinux/config
SELINUX=disabled
setenforce 0
Create the hadoop user:
useradd hadoop
passwd hadoop   # password: hadoop
Configure host mappings:
vim /etc/hosts
192.168.248.150 hadoop0
192.168.248.151 hadoop1
192.168.248.152 hadoop2
192.168.248.153 hadoop3
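Since the four entries follow one pattern, they can be generated instead of typed; a small sketch (assumes the 192.168.248.150-153 plan above):

```shell
# Generate /etc/hosts entries for the four nodes from the base address.
gen_hosts() {
  base=192.168.248.15   # the node index is appended as the last digit
  for i in 0 1 2 3; do
    printf '%s%d hadoop%d\n' "$base" "$i" "$i"
  done
}
gen_hosts   # append the output to /etc/hosts on every node
```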
Enable passwordless SSH login (so the NameNode can log in to the DataNodes)
Generate an RSA key pair:
su - hadoop
ssh-keygen -t rsa
Copy the NameNode's public key to every other node:
[hadoop@hadoop0 ~]$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub hadoop1:~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub hadoop2:~/.ssh/authorized_keys
scp ~/.ssh/id_rsa.pub hadoop3:~/.ssh/authorized_keys
If passwordless login still fails, fix the permissions on the other nodes:
chmod 600 authorized_keys
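The three scp commands above can also be generated in a loop. A sketch that prints the commands for review before running them (note that, like the scp commands above, this overwrites any existing authorized_keys on the target):

```shell
# Print the per-node key-distribution commands; pipe to `sh` to actually run them.
keydist_cmds() {
  for node in hadoop1 hadoop2 hadoop3; do
    echo "scp ~/.ssh/id_rsa.pub ${node}:~/.ssh/authorized_keys"
  done
}
keydist_cmds
# keydist_cmds | sh   # uncomment to actually copy the key
```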
Upload files via sftp:
sftp> ls
sftp> lcd D:\歸檔\軟體\筆記本\linux
sftp> lls
jdk-8u40-linux-x64.tar.gz web
sftp> put jdk-8u40-linux-x64.tar.gz
scp jdk-8u40-linux-x64.tar.gz hadoop?:~/   (run once per node: hadoop1, hadoop2, hadoop3)
tar zxf jdk-8u40-linux-x64.tar.gz
su - root
mv /home/hadoop/jdk1.8.0_40/ /opt/
Configure environment variables:
vim /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_40
export JRE_HOME=/opt/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Apply the changes:
source /etc/profile
Verify:
java -version
Configure one node first (the master), then copy the result to the slaves.
sftp> put hadoop-2.6.4.tar.gz
[hadoop@hadoop0 ~]$ tar zxf hadoop-2.6.4.tar.gz
vim /etc/profile
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export PATH=$HADOOP_HOME/bin:$PATH
The configuration files live under $HADOOP_HOME/etc/hadoop/. Edit the following:
vim hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_40
vim yarn-env.sh
export JAVA_HOME=/opt/jdk1.8.0_40
vim slaves   (there is no masters file in this version; slaves lists the DataNodes)
hadoop1
hadoop2
hadoop3
vim core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop0</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop0:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>hadoop0:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop0:9001</value>
  </property>
</configuration>
vim yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop0:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop0:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop0:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop0:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop0:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Copy the configured installation to the other nodes:
[hadoop@hadoop0 ~]$ scp -r hadoop-2.6.4 hadoop?:~/   (run once per node: hadoop1-hadoop3)
Format the filesystem (on the NameNode):
$ hadoop namenode -format
(the hadoop command lives in ~/hadoop-2.6.4/bin; in 2.x, "hdfs namenode -format" is the preferred form)
Start the daemons:
[hadoop@hadoop0 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start namenode
[hadoop@hadoop0 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start datanode
Or start everything at once:
[hadoop@hadoop0 hadoop-2.6.4]$ sbin/start-all.sh
Check the cluster status:
hadoop dfsadmin -report
If this fails, common causes are a malformed tag in one of the XML config files,
or a refused connection (make sure the firewall is stopped on every node).
[hadoop@hadoop0 hadoop-2.6.4]$ jps
4272 ResourceManager
4032 NameNode
4534 Jps
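If a daemon is missing from the jps listing above, a small helper can flag it. A sketch that checks jps output for the master's expected processes (NameNode and ResourceManager here; a DataNode would show DataNode and NodeManager instead):

```shell
# Given `jps` output, report whether the expected master daemons are present.
check_daemons() {
  jps_out=$1
  for proc in NameNode ResourceManager; do
    case "$jps_out" in
      *"$proc"*) echo "$proc: running" ;;
      *)         echo "$proc: MISSING" ;;
    esac
  done
}
# On the master node:
# check_daemons "$(jps)"
```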
Shut down:
[hadoop@hadoop0 hadoop-2.6.4]$ sbin/stop-all.sh
YARN cluster web UI (ResourceManager):
http://192.168.248.150:8088/cluster
HDFS web UI (NameNode):
http://192.168.248.150:50070/dfshealth.html#tab-overview
References:
http://blog.csdn.net/yaoxtao/article/details/49488181
http://www.cnblogs.com/hanganglin/p/4349919.html
The Hadoop native library shipped on the Apache site is 32-bit, which causes problems on a 64-bit Linux server.
Running Hadoop commands on a 64-bit server then produces this warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To fix this, we need to compile a 64-bit build of Hadoop ourselves.
Software needed to compile Hadoop (2.6.4 here):
yum install gcc
yum install cmake
yum install gcc-c++
wget http://www-eu.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
Install Maven (see http://www.blogjava.net/caojianhua/archive/2011/04/02/347559.html):
tar xvf apache-maven-3.3.9-bin.tar.gz
mv apache-maven-3.3.9/ /usr/local/
vim /etc/profile
export MAVEN_HOME=/usr/local/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin
source /etc/profile
[root@hadoop0 hadoop]# mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/apache-maven-3.3.9
Java version: 1.8.0_40, vendor: Oracle Corporation
Java home: /opt/jdk1.8.0_40/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"
protobuf-2.5.0 is required:
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar zxf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0/
./configure --prefix=/usr/local/protobuf-2.5.0
make && make install
vim /etc/profile
export PROTOBUF=/usr/local/protobuf-2.5.0
export PATH=$PROTOBUF/bin:$PATH
protoc --version
http://www-eu.apache.org/dist//ant/binaries/apache-ant-1.9.7-bin.tar.gz
tar zxf apache-ant-1.9.7-bin.tar.gz
mv apache-ant-1.9.7/ /usr/local/
vim /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin
source /etc/profile
yum install autoconf automake libtool
yum install openssl-devel
Install findbugs:
http://findbugs.sourceforge.net/downloads.html
vim /etc/profile
export FINDBUGS_HOME=/usr/local/findbugs-3.0.1/
export PATH=$FINDBUGS_HOME/bin:$PATH
Download the Hadoop source:
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.4/hadoop-2.6.4-src.tar.gz
tar zxf hadoop-2.6.4-src.tar.gz
cd hadoop-2.6.4-src/
more BUILDING.txt   (explains how to compile and install)
mvn package -DskipTests -Pdist,native,docs   (include docs only if you need it)
or
mvn clean package -Pdist,native -DskipTests -Dtar
The build downloads a large number of dependencies, so expect a long wait; when every Hadoop module reports SUCCESS, the build is done (the run logged below took about three hours).
The built project lands in hadoop-2.6.4-src/hadoop-dist/target/hadoop-2.6.4/.
To use it, overwrite the lib/native folder of the existing 32-bit installation with hadoop-dist/target/hadoop-2.6.4/lib/native (overwriting the whole installation also works), then append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
Afterwards, "hadoop checknative -a" shows whether the native libraries load.
Because the build fetches packages over the network, an unstable connection can leave a download incomplete and fail the build.
Error 1:
Remote host closed connection during handshake: SSL peer shut down incorrectly...
Solution: a transient download failure; just re-run the build until it gets through:
mvn package -Pdist,native -DskipTests -Dtar
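The "just re-run it" advice can be scripted. A small retry helper (a sketch, not part of the original notes; the mvn invocation is commented out because it needs the full build environment):

```shell
# Retry a command up to N times; transient download failures often pass on a rerun.
retry() {
  n=$1; shift
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" && return 0
    echo "attempt $i failed" >&2
    i=$((i + 1))
  done
  return 1
}
# retry 5 mvn package -Pdist,native -DskipTests -Dtar
```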
Error 2:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on project hadoop-annotations: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - /home/hadoop/hadoop-2.6.4-src/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/InterfaceStability.java:27: error: unexpected end tag:
[ERROR] *
[ERROR] ^
[ERROR]
[ERROR] Command line was: /opt/jdk1.8.0_40/jre/../bin/javadoc @options @packages
[ERROR] Refer to the generated Javadoc files in '/home/hadoop/hadoop-2.6.4-src/hadoop-common-project/hadoop-annotations/target' dir.
[ERROR] -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]
[ERROR] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hadoop-annotations
Solution: switch to JDK 1.7 (JDK 8's stricter javadoc rejects these old comments):
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u80-oth-JPR
tar zxf jdk-7u80-linux-x64.gz
mv jdk1.7.0_80/ /opt/
Error 3:
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.6.4:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -
Solution: the installed protoc version is wrong; build and install 2.5.0 (mirror: http://pan.baidu.com/s/1pJlZubT):
cd protobuf-2.5.0
./configure --prefix=/usr/local/protobuf-2.5.0/
make
make install
export PATH=$PATH:/usr/local/protobuf-2.5.0/bin
Error 4:
[ERROR] Failed to execute goal org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3:compile (hdfs) on project hadoop-hdfs: Execution hdfs of goal org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3:compile failed: Plugin org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3 or one of its dependencies could not be resolved: The following artifacts could not be resolved: tomcat:jasper-compiler-jdt:jar:5.5.15, org.eclipse.jdt:core:jar:3.1.1: Could not transfer artifact tomcat:jasper-compiler-jdt:jar:5.5.15 from/to central (https://repo.maven.apache.org/maven2): GET request of: tomcat/jasper-compiler-jdt/5.5.15/jasper-compiler-jdt-5.5.15.jar from central failed: Connection reset -> [Help 1]
Solution: a Maven artifact failed to download; fetch it manually from
https://repo.maven.apache.org/maven2/tomcat/jasper-compiler-jdt/5.5.15/
and place it in the matching directory under ~/.m2/repository, then resume:
mvn package -Pdist,native -DskipTests -Dtar -e -X
Build finished:
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [04:57 min]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  7.926 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 11.552 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 37.191 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 11.227 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 11.344 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.236 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  6.616 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  3.877 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.171 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 10.513 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 10.549 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.274 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 37.510 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 26.697 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  8.445 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 16.746 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 14.682 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [02:22 min]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  4.852 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  9.098 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  8.473 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 44.417 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 56.786 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  5.743 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  9.956 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  8.721 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  6.602 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  4.001 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  6.550 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 16.753 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  9.256 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [06:50 min]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 15.419 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.856 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 10.593 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 22.169 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.118 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:20 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:01 h
[INFO] Finished at: 2016-07-23T11:25:52+08:00
[INFO] Final Memory: 90M/237M
[INFO] ------------------------------------------------------------------------
The executable distribution has been generated under hadoop-dist/target/; repeat the earlier Hadoop configuration steps with it.
When carrying over the earlier config files, edit them with nano.
[hadoop@hadoop0 hadoop-2.6.4]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop0]
hadoop0: starting namenode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hadoop0.out
hadoop2: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop1: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop1.out
hadoop3: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop3.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hadoop0.out
hadoop3: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop3.out
hadoop2: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop1: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop1.out
Eclipse plugin setup:
Download hadoop-eclipse-plugin-2.6.4.jar and put it in Eclipse's plugins/ directory.
Extract hadoop-2.6.4.tar.gz to D:\develop\hadoop-2.6.4, then set that as the installation directory under Window -> Preferences -> Hadoop Map/Reduce in Eclipse.
Open Window -> Open Perspective -> Map/Reduce; Hadoop development happens in this perspective.
Open Window -> Show View -> Map/Reduce Locations and choose New Hadoop location... to create a connection.
The Host and Port for Map/Reduce(V2) Master and DFS Master are the addresses and ports
configured in mapred-site.xml and core-site.xml respectively.
Under Advanced parameters, set hadoop.tmp.dir to the value configured in core-site.xml.
Possible error:
An internal error occurred during: "Map/Reduce location status updater".
java.lang.NullPointerException
This NullPointerException just means there is no directory in your HDFS yet.
Create one:
hdfs dfs -mkdir -p input
Test:
hadoop fs -put ./myTest*.txt input
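Beyond the -put smoke test, the bundled examples jar gives a quick end-to-end check. A sketch: the jar location follows the standard Hadoop 2.x tarball layout, and the wordcount invocation is commented out because it needs the running cluster:

```shell
# Build the path to the bundled MapReduce examples jar from the install root
# and version (standard tarball layout for Hadoop 2.x).
examples_jar() {
  echo "$1/share/hadoop/mapreduce/hadoop-mapreduce-examples-$2.jar"
}
# hadoop jar "$(examples_jar "$HOME/hadoop-2.6.4" 2.6.4)" wordcount input output
# hadoop fs -cat output/part-r-00000
```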