
Hadoop 2.6.0 Cluster Setup

My installation environment: CentOS 6.7, Hadoop 2.6.0, JDK 1.7.

Compared with Hadoop 1.x, Hadoop 2 changes both the HDFS architecture and the MapReduce framework substantially, with large gains in speed and availability.

Hadoop 2.6.0 uses the new MapReduce framework (YARN). Its structure differs from the old one, so installation and configuration have changed accordingly.

Old (Hadoop 1.x) structure; the cluster roles were:

master: NameNode, JobTracker

slave1: DataNode, TaskTracker

slave2: DataNode, TaskTracker

New (Hadoop 2.x) structure:

ResourceManager and MR JobHistory Server

NameNode

SecondaryNameNode

datanode1: DataNode, NodeManager

datanode2: DataNode, NodeManager

Cluster layout for this installation:

192.168.199.241 ResourceManager

192.168.199.231 NameNode

192.168.199.232 DataNode1

192.168.199.242 DataNode2

NameNode web UI: http://192.168.199.231:50070/ (the ResourceManager web UI defaults to port 8088)

On the ResourceManager node:

1. Create the user

useradd hadoop

2. Install JDK 1.7

Set the Java environment variables:

cd ~

vi .bashrc

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JAVA_BIN=$JAVA_HOME/bin

export PATH=$PATH:$JAVA_BIN

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

source .bashrc
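A quick way to confirm the variables took effect (assuming the JDK really is installed at /usr/java/jdk1.7.0_80):

java -version     # should report 1.7.0_80
echo $JAVA_HOME   # should print /usr/java/jdk1.7.0_80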

3. Set up passwordless SSH login

ssh-keygen  -t rsa

cat /root/.ssh/id_rsa.pub>>/root/.ssh/authorized_keys

ssh localhost    //verify you can log in without a password

Note: for passwordless logins across the whole cluster, append the ResourceManager's and NameNode's public keys to each DataNode's ~/.ssh/authorized_keys, so the master nodes can reach the DataNodes without a password.
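One way to distribute the keys is ssh-copy-id; a sketch, assuming root-to-root SSH as in the shell prompts later in this article, and password authentication still enabled for the initial copy:

ssh-copy-id root@DataNode1   # run on both ResourceManager and NameNode
ssh-copy-id root@DataNode2
ssh root@DataNode1           # verify: should log in without a password prompt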

4. Set the hostname (/etc/sysconfig/network) and /etc/hosts

vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=ResourceManager

vi /etc/hosts

127.0.0.1   localhost

192.168.199.241 ResourceManager

192.168.199.231 NameNode

192.168.199.232 DataNode1

192.168.199.242 DataNode2
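A quick sanity check that every hostname resolves and is reachable (run from each node):

for h in ResourceManager NameNode DataNode1 DataNode2; do ping -c 1 $h; done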

Hadoop Installation and Configuration

Perform the following on every node.

I. Download Hadoop 2.6.0

wget  http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Extract it to /usr/local/hadoop and give the hadoop user ownership of that directory (the extraction itself is sketched after the ownership step):

useradd hadoop

cd /usr/local/hadoop/

chown -R hadoop.hadoop /usr/local/hadoop
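For reference, the extraction step itself might look like this (a sketch, assuming the tarball sits in the current directory):

tar -xzf hadoop-2.6.0.tar.gz -C /usr/local/
mv /usr/local/hadoop-2.6.0 /usr/local/hadoop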

Add the environment variables (e.g., to ~/.bashrc, then source it):

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JAVA_BIN=$JAVA_HOME/bin

export PATH=$PATH:$JAVA_BIN

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export HADOOP_HOME=/usr/local/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin

II. Configure Hadoop

1. Set environment variables in the Hadoop scripts

Edit etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh.

At a minimum, set JAVA_HOME:

export JAVA_HOME=/usr/java/jdk1.7.0_80

Point the JVM at Hadoop's native library directory:

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"

If the daemons later cannot find their PID files, also set HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR.

Other settings you may want to change:

Log directories: HADOOP_LOG_DIR / YARN_LOG_DIR

Maximum heap size in MB (default 1000): HADOOP_HEAPSIZE / YARN_HEAPSIZE
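For example, to raise both heaps to 2000 MB (an illustrative value, not a recommendation):

export HADOOP_HEAPSIZE=2000   # in hadoop-env.sh
export YARN_HEAPSIZE=2000     # in yarn-env.sh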

2. Edit the configuration files

core-site.xml

<configuration>

        <property>

               <name>fs.defaultFS</name>

               <value>hdfs://NameNode:9000</value>   <!-- "NameNode" here is the hostname of the NameNode machine -->

        </property>

        <property>

               <name>io.file.buffer.size</name>

               <value>131072</value>

        </property>

</configuration>

hdfs-site.xml

<configuration>

        <property>

               <name>dfs.namenode.name.dir</name>

               <value>file:///usr/local/hadoop_data/dfs/name</value>   <!-- NameNode metadata directory; create it in advance with write permission, or this setting is ignored -->

        </property>

         <property>

               <name>dfs.datanode.data.dir</name>

               <value>file:///usr/local/hadoop_data/dfs/data</value>   <!-- DataNode data directory; create it in advance with write permission, or this setting is ignored -->

        </property>

</configuration>
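The two directories above must exist before HDFS is formatted or started; a sketch of creating them (name dir on the NameNode, data dir on each DataNode):

mkdir -p /usr/local/hadoop_data/dfs/name   # on the NameNode
mkdir -p /usr/local/hadoop_data/dfs/data   # on DataNode1 and DataNode2
chown -R hadoop:hadoop /usr/local/hadoop_data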

vi yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

        <property>

               <name>yarn.nodemanager.aux-services</name>

               <value>mapreduce_shuffle</value>

        </property>

</configuration>
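Note: with only the shuffle service configured, NodeManagers will look for the ResourceManager on their local host. In a multi-node layout like this one, yarn-site.xml on every node usually also needs to name the ResourceManager host, e.g.:

        <property>
               <name>yarn.resourcemanager.hostname</name>
               <value>ResourceManager</value>
        </property>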

mapred-site.xml does not exist by default; create it from the bundled template, then edit it:

cp mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

<configuration>

        <property>

               <name>mapreduce.framework.name</name>

               <value>yarn</value>   <!-- run MapReduce on the YARN framework -->

        </property>

</configuration>

The default configuration directory is $HADOOP_HOME/etc/hadoop. To change it,

edit hadoop-env.sh:

vi hadoop-env.sh

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/conf"}

List the slave nodes:

vi slaves

DataNode1

DataNode2
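The same configuration must end up on every node; one way to push it out from the node you just edited (a sketch, assuming passwordless SSH from step 3 and the Hadoop tree already unpacked on each target):

for h in NameNode DataNode1 DataNode2; do
    scp -r /usr/local/hadoop/etc/hadoop/* $h:/usr/local/hadoop/etc/hadoop/
done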

Check that Hadoop is installed and configured correctly: hadoop version

Starting Hadoop

1. Format HDFS on the NameNode

hdfs namenode -format NameNode   //here "NameNode" is used as the cluster name

Note: if this is not the first time the HDFS file system is formatted, first empty the dfs.namenode.name.dir directory on the NameNode and the dfs.datanode.data.dir directory on each DataNode before formatting. By default you will be prompted whether to re-format.
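A sketch of that cleanup (destructive; double-check the paths against your hdfs-site.xml first):

rm -rf /usr/local/hadoop_data/dfs/name/*   # on the NameNode
rm -rf /usr/local/hadoop_data/dfs/data/*   # on each DataNode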

2. Start YARN and HDFS

Log in to the ResourceManager and run start-yarn.sh to start the YARN cluster resource manager:

[root@ResourceManager sbin]# start-yarn.sh

starting yarn daemons

resourcemanager running as process 16431. Stop it first.

DataNode2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode2.out

DataNode1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode1.out

Log in to the NameNode and run start-dfs.sh to start the HDFS file system:

[root@NameNode hadoop]# start-dfs.sh

15/02/06 15:19:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [NameNode]

NameNode: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-NameNode.out

DataNode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode2.out

DataNode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode1.out

Starting secondary namenodes [0.0.0.0]

The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.

RSA key fingerprint is 10:cd:13:8b:04:c0:51:c2:54:cc:3d:3e:17:5d:0c:17.

Are you sure you want to continue connecting (yes/no)? yes

0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-NameNode.out

15/02/06 15:19:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
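To confirm the daemons are up, run jps on each node; roughly the following processes should appear (PIDs will differ):

jps   # on ResourceManager: ResourceManager
      # on NameNode: NameNode, SecondaryNameNode
      # on DataNode1/DataNode2: DataNode, NodeManager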

Basic Hadoop commands:

hadoop fs -ls /    //list files in the HDFS root directory

hadoop fs -ls /testdir    //list files under the HDFS directory /testdir

hadoop fs -cat /testdir/test1.txt    //print the contents of test1.txt

hadoop fs -put test2.txt /testdir    //upload the local file test2.txt into /testdir

mapred job -list all    //list all jobs; in versions before 2.6 this was hadoop job -list all
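Note that /testdir must exist before the -put above; it can be created first with:

hadoop fs -mkdir /testdir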
