
Hadoop 2.6.4: installation, compilation, and cluster setup

Hadoop cluster setup

tools:

SecureCRT8.0

VMware12

RHEL 6.6 x64

Network plan:

hadoop0 192.168.248.150

hadoop1 192.168.248.151

hadoop2 192.168.248.152

hadoop3 192.168.248.153

Set a static IP:

sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0

IPADDR=192.168.248.15?   # replace ? with 0-3 to match each node in the plan
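For reference, a complete ifcfg-eth0 for hadoop0 could look like the following sketch. ifcfg files use shell variable syntax; IPADDR and NETMASK come from the plan above, while the GATEWAY and DNS1 values are assumptions for a typical VMware NAT subnet — substitute your own:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 on hadoop0 (sketch)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.248.150
NETMASK=255.255.255.0
GATEWAY=192.168.248.2    # assumption: VMware NAT default gateway
DNS1=192.168.248.2       # assumption
```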

If networking fails (a common result of cloning the VM):

Check /etc/udev/rules.d/70-persistent-net.rules

and make sure the HWADDR in ifcfg-eth0 matches the MAC address recorded there.

Set the hostname:

vim /etc/sysconfig/network

HOSTNAME=hadoop?

Disable the firewall and SELinux:

service iptables stop

chkconfig iptables off   # keep iptables off across reboots

vim /etc/selinux/config

SELINUX=disabled

setenforce 0

Create the hadoop user:

useradd hadoop

passwd hadoop   # password: hadoop

Configure host mappings (on every node):

vim /etc/hosts

192.168.248.150 hadoop0

192.168.248.151 hadoop1

192.168.248.152 hadoop2

192.168.248.153 hadoop3
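A small loop can generate this block from the plan above so the four entries stay consistent across nodes — a sketch, assuming the .150-.153 addressing; review the output, then append it to /etc/hosts on every node as root:

```shell
# Generate the host mappings for hadoop0..hadoop3 into hosts.append.
BASE=192.168.248
for i in 0 1 2 3; do
    printf '%s.%d hadoop%d\n' "$BASE" $((150 + i)) "$i"
done > hosts.append
cat hosts.append    # review, then: cat hosts.append >> /etc/hosts (as root)
```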

Enable passwordless SSH login (so the NameNode can log in to the DataNodes)

Generate an RSA key pair:

su - hadoop

ssh-keygen -t rsa

Copy the NameNode's public key to the other nodes:

[hadoop@hadoop0 ~]$ cd ~/.ssh

[hadoop@hadoop0 .ssh]$ cp id_rsa.pub authorized_keys

scp ~/.ssh/id_rsa.pub hadoop1:~/.ssh/authorized_keys

scp ~/.ssh/id_rsa.pub hadoop2:~/.ssh/authorized_keys

scp ~/.ssh/id_rsa.pub hadoop3:~/.ssh/authorized_keys

If passwordless SSH still fails, run on the other nodes:

chmod 600 authorized_keys
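sshd silently ignores authorized_keys when the file or the ~/.ssh directory is group- or world-accessible, which is the usual reason a copied key still prompts for a password. A helper that tightens both permissions, demonstrated here on a scratch directory; on each DataNode you would run it against ~/.ssh:

```shell
# Tighten the permissions sshd requires before honoring authorized_keys.
fix_ssh_perms() {
    chmod 700 "$1"                    # the .ssh directory itself
    chmod 600 "$1/authorized_keys"    # the key file
}

demo=$(mktemp -d)                     # scratch stand-in for ~/.ssh
touch "$demo/authorized_keys"
chmod 644 "$demo/authorized_keys"     # simulate the too-open default
fix_ssh_perms "$demo"
stat -c '%a' "$demo" "$demo/authorized_keys"
```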

Upload with SFTP:

sftp> ls

sftp> lcd D:\归档\软件\笔记本\linux

sftp> lls

jdk-8u40-linux-x64.tar.gz web

sftp> put jdk-8u40-linux-x64.tar.gz

scp jdk-8u40-linux-x64.tar.gz hadoop?:~/   # ? is a placeholder — repeat for hadoop1-hadoop3

tar zxf jdk-8u40-linux-x64.tar.gz

su - root

mv /home/hadoop/jdk1.8.0_40/ /opt/

Configure environment variables:

vim /etc/profile

export JAVA_HOME=/opt/jdk1.8.0_40

export JRE_HOME=/opt/jdk1.8.0_40/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Apply the changes:

source /etc/profile

Verify:

java -version

Configure one node first (the master), then copy the result to the slaves.

sftp> put hadoop-2.6.4.tar.gz

[hadoop@hadoop0 ~]$ tar zxf hadoop-2.6.4.tar.gz

vim /etc/profile

export HADOOP_HOME=/home/hadoop/hadoop-2.6.4

export PATH=$HADOOP_HOME/bin:$PATH

The configuration files are in:

$HADOOP_HOME/etc/hadoop/

Edit the following configuration files:

vim hadoop-env.sh

export JAVA_HOME=/opt/jdk1.8.0_40

vim yarn-env.sh

export JAVA_HOME=/opt/jdk1.8.0_40

vim slaves   (the masters config file no longer exists here)

hadoop1

hadoop2

hadoop3

vim core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop0:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hadoop/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoop/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop0</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop0:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>

cp mapred-site.xml.template mapred-site.xml

vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>hadoop0:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop0:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop0:19888</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoop0:9001</value>
    </property>
</configuration>

vim yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop0</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop0:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop0:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop0:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop0:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop0:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Copy the tree to the other nodes:

[hadoop@hadoop0 ~]$ scp -r hadoop-2.6.4 hadoop?:~/
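The `hadoop?` in the command above is a placeholder, not a shell glob — the shell does not expand remote hostnames. An explicit loop avoids the ambiguity, sketched here as a dry run that only prints the commands so the list can be reviewed first:

```shell
# Build the copy command for every slave; echoed rather than executed.
cmds=$(for h in hadoop1 hadoop2 hadoop3; do
    echo "scp -r ~/hadoop-2.6.4 $h:~/"
done)
printf '%s\n' "$cmds"    # review, then run each line for real
```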

Format the filesystem

Run on the NameNode:

$ hadoop namenode -format   # deprecated alias; hdfs namenode -format is the current form

The hadoop command is in ~/hadoop-2.6.4/bin

Start the daemons:

[hadoop@hadoop0 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start namenode

[hadoop@hadoop0 hadoop-2.6.4]$ sbin/hadoop-daemon.sh start datanode

Or start everything at once:

[hadoop@hadoop0 hadoop-2.6.4]$ sbin/start-all.sh

Check the cluster status:

hadoop dfsadmin -report   # or the non-deprecated form: hdfs dfsadmin -report

Troubleshooting:

A malformed tag in one of the XML configs is a common cause;

"connection refused" usually means the firewall is still running — stop it.

[hadoop@hadoop0 hadoop-2.6.4]$ jps

4272 ResourceManager

4032 NameNode

4534 Jps

Shut down:

[hadoop@hadoop0 hadoop-2.6.4]$ sbin/stop-all.sh

YARN cluster web UI (ResourceManager):

http://192.168.248.150:8088/cluster

HDFS web UI (NameNode):

http://192.168.248.150:50070/dfshealth.html#tab-overview

References:

http://blog.csdn.net/yaoxtao/article/details/49488181

http://www.cnblogs.com/hanganglin/p/4349919.html

The Hadoop native libraries published on the Apache site are 32-bit builds; on a 64-bit Linux server this causes problems.

Running a Hadoop command on a 64-bit server then prints the following warning:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

To fix this, we compile a 64-bit Hadoop ourselves.

Software required to compile Hadoop 2.6.4

yum install gcc

yum install cmake

yum install gcc-c++

wget http://www-eu.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

Install Maven:

http://www.blogjava.net/caojianhua/archive/2011/04/02/347559.html
tar xvf apache-maven-3.3.9-bin.tar.gz     mv apache-maven-3.3.9/ /usr/local/     vim /etc/profile     export MAVEN_HOME=/usr/local/apache-maven-3.3.9     export PATH=$PATH:$MAVEN_HOME/bin     source /etc/profile     [root@hadoop0 hadoop]# mvn -v     Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)     Maven home: /usr/local/apache-maven-3.3.9     Java version: 1.8.0_40, vendor: Oracle Corporation     Java home: /opt/jdk1.8.0_40/jre     Default locale: zh_CN, platform encoding: UTF-8     OS name: "linux", version: "2.6.32-504.el6.x86_64", arch: "amd64", family: "unix"           

protobuf 2.5.0 is required (this exact version):

https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz

cd protobuf-2.5.0/

./configure --prefix=/usr/local/protobuf-2.5.0

make && make install

vim /etc/profile

export PROTOBUF=/usr/local/protobuf-2.5.0

export PATH=$PROTOBUF/bin:$PATH

protoc --version

http://www-eu.apache.org/dist//ant/binaries/apache-ant-1.9.7-bin.tar.gz
tar zxf apache-ant-1.9.7-bin.tar.gz
mv apache-ant-1.9.7/ /usr/local/
vim /etc/profile
export ANT_HOME=/usr/local/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin
source /etc/profile

yum install autoconf automake libtool
yum install openssl-devel

Install FindBugs:

http://findbugs.sourceforge.net/downloads.html

vim /etc/profile

export FINDBUGS_HOME=/usr/local/findbugs-3.0.1/
export PATH=$FINDBUGS_HOME/bin:$PATH

http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.4/hadoop-2.6.4-src.tar.gz
tar zxf hadoop-2.6.4-src.tar.gz
cd hadoop-2.6.4-src/

more BUILDING.txt

to see how to build and install.

mvn package -DskipTests -Pdist,native,docs   (include docs only if you need it)

or

mvn clean package -Pdist,native -DskipTests -Dtar

The build downloads a large number of dependencies, so expect to wait a while. Compilation itself takes roughly twenty to thirty minutes, though the downloads can stretch the total considerably. When every module reports SUCCESS, the build succeeded; the finished distribution is in hadoop-2.6.4-src/hadoop-dist/target/hadoop-2.6.4.

You only need to copy the lib/native directory from the newly built distribution over the 32-bit one in your existing installation (overwriting the whole installation also works), then append to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
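To confirm the 64-bit libraries are actually being picked up, Hadoop ships a `hadoop checknative -a` subcommand that reports each native component (hadoop, zlib, snappy, ...). A sketch, guarded so the snippet also runs on a machine without hadoop on the PATH:

```shell
# Report which native libraries Hadoop can load, if hadoop is installed.
if command -v hadoop >/dev/null 2>&1; then
    result=$(hadoop checknative -a 2>&1)
else
    result="skipped: hadoop not on PATH"
fi
printf '%s\n' "$result"
```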

The build downloads its artifacts over the network; a flaky connection can leave a download incomplete, which then surfaces as a build error.

Error 1:

Remote host closed connection during handshake: SSL peer shut down incorrectly ...

Fix: simply re-run the build a few times until the downloads complete.

mvn package -Pdist,native -DskipTests -Dtar

Error 2:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on project hadoop-annotations: MavenReportException: Error while creating archive:

[ERROR] Exit code: 1 - /home/hadoop/hadoop-2.6.4-src/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/InterfaceStability.java:27: error: unexpected end tag:

[ERROR] *

[ERROR] ^

[ERROR]

[ERROR] Command line was: /opt/jdk1.8.0_40/jre/../bin/javadoc @options @packages

[ERROR] Refer to the generated Javadoc files in '/home/hadoop/hadoop-2.6.4-src/hadoop-common-project/hadoop-annotations/target' dir.

[ERROR] -> [Help 1]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.

[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1]

http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

[ERROR] After correcting the problems, you can resume the build with the command

[ERROR] mvn -rf :hadoop-annotations

Fix: switch to JDK 1.7 — the javadoc tool in JDK 8 is stricter about malformed HTML in doc comments (alternatively, skip javadoc generation with -Dmaven.javadoc.skip=true).

http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase7-521261.html#jdk-7u80-oth-JPR

tar zxf jdk-7u80-linux-x64.gz

mv jdk1.7.0_80/ /opt/

Error 3:

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.6.4:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -

The protobuf version is wrong; install 2.5.0:

http://pan.baidu.com/s/1pJlZubT

cd protobuf-2.5.0

./configure --prefix=/usr/local/protobuf-2.5.0/

make

make install

export PATH=$PATH:/usr/local/protobuf-2.5.0/bin

Error 4:

[ERROR] Failed to execute goal org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3:compile (hdfs) on project hadoop-hdfs: Execution hdfs of goal org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3:compile failed: Plugin org.codehaus.mojo.jspc:jspc-maven-plugin:2.0-alpha-3 or one of its dependencies could not be resolved: The following artifacts could not be resolved: tomcat:jasper-compiler-jdt:jar:5.5.15, org.eclipse.jdt:core:jar:3.1.1: Could not transfer artifact tomcat:jasper-compiler-jdt:jar:5.5.15 from/to central (

https://repo.maven.apache.org/maven2

): GET request of: tomcat/jasper-compiler-jdt/5.5.15/jasper-compiler-jdt-5.5.15.jar from central failed: Connection reset -> [Help 1]

A Maven artifact failed to download; you can fetch it manually from

https://repo.maven.apache.org/maven2/tomcat/jasper-compiler-jdt/5.5.15/

and place it under the matching directory in ~/.m2/
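Maven's local repository mirrors the remote layout (groupId/artifactId/version), so the jar must land in exactly the matching directory — a sketch for this particular artifact:

```shell
# Create the directory Maven expects for the missing artifact and show
# where the manually downloaded jar has to go.
dest=~/.m2/repository/tomcat/jasper-compiler-jdt/5.5.15
mkdir -p "$dest"
echo "place jasper-compiler-jdt-5.5.15.jar into: $dest"
```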

mvn package -Pdist,native -DskipTests -Dtar -e -X

Build finished:

[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [04:57 min]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [  7.926 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 11.552 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 37.191 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 11.227 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 11.344 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [  0.236 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [  6.616 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [  3.877 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [  0.171 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 10.513 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 10.549 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [  0.274 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 37.510 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 26.697 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [  8.445 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 16.746 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 14.682 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [02:22 min]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [  4.852 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  9.098 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [  8.473 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 44.417 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 56.786 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  5.743 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  9.956 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  8.721 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  6.602 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [  4.001 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  6.550 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 16.753 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  9.256 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [06:50 min]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 15.419 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.856 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 10.593 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 22.169 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.118 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:20 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:01 h
[INFO] Finished at: 2016-07-23T11:25:52+08:00
[INFO] Final Memory: 90M/237M
[INFO] ------------------------------------------------------------------------

The built executables are now under hadoop-dist/target/; repeat the earlier Hadoop configuration steps with this build.

When copying the earlier configuration over, I edited the files with nano.

[hadoop@hadoop0 hadoop-2.6.4]$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop0]
hadoop0: starting namenode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-namenode-hadoop0.out
hadoop2: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop2.out
hadoop1: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop1.out
hadoop3: starting datanode, logging to /home/hadoop/hadoop-2.6.4/logs/hadoop-hadoop-datanode-hadoop3.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-hadoop0.out
hadoop3: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop3.out
hadoop2: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop2.out
hadoop1: starting nodemanager, logging to /home/hadoop/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-hadoop1.out

Download hadoop-eclipse-plugin-2.6.4.jar

and put it in Eclipse's plugins/ directory.

Unpack hadoop-2.6.4.tar.gz to D:\develop\hadoop-2.6.4, then set this as the installation directory under Window -> Preferences -> Hadoop Map/Reduce in Eclipse.

Open Window -> Open Perspective -> Map/Reduce; Hadoop development happens in this perspective.

Open Window -> Show View -> Map/Reduce Locations and choose New Hadoop location… to create a connection.

The Host and Port for Map/Reduce(V2) Master and DFS Master are the address and port configured in mapred-site.xml and core-site.xml respectively.

Under Advanced parameters, set hadoop.tmp.dir to the value configured in core-site.xml.

An internal error occurred during: “Map/Reduce location status updater”.

java.lang.NullPointerException

The "NullPointerException" here simply means there are no directories in your HDFS yet.

Create one:

hdfs dfs -mkdir -p input

Test:

hadoop fs -put ./myTest*.txt input
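An end-to-end smoke test can follow the upload, assuming the examples jar that ships with the distribution under share/hadoop/mapreduce/ and some local myTest*.txt files; the sketch is guarded so it degrades gracefully on a machine without a cluster:

```shell
# Run the bundled wordcount example over the uploaded input.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p input
    hadoop fs -put ./myTest*.txt input
    hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar \
        wordcount input output
    hadoop fs -cat 'output/part-r-*' | head
    status="ran wordcount; results in output/"
else
    status="skipped: hadoop not on PATH"
fi
printf '%s\n' "$status"
```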