First, my installation environment: CentOS 6.7, Hadoop 2.6, JDK 1.7.
Compared with Hadoop 1.x, Hadoop 2 makes substantial changes to both the HDFS and MapReduce architectures, with large improvements in speed and availability. Hadoop 2.6.0 adopts the new MapReduce architecture (YARN), so the installation and configuration steps have changed as well.
In the old (Hadoop 1.x) structure, the main cluster roles were:
master: NameNode, JobTracker
slave1: DataNode, TaskTracker
slave2: DataNode, TaskTracker
In the new structure:
ResourceManager and the MR JobHistory Server
NameNode
SecondaryNameNode
datanode1: NodeManager
datanode2: NodeManager
Cluster layout used in this installation:
192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1
192.168.199.242 DataNode2
Web UI: http://192.168.199.231:50070/
On the ResourceManager node:
1. Create the user
useradd hadoop
2. Install JDK 1.7 and set the Java environment variables
cd ~
vi .bashrc
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source .bashrc
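After sourcing .bashrc, it is worth confirming that the variables took effect. A minimal check, written as a hypothetical helper function (not part of Hadoop); the path it expects is the one used in this article:

```shell
# Sanity check for the Java environment set in .bashrc (hypothetical
# helper). Verifies JAVA_HOME is set and points at a real JDK.
check_java_env() {
    local home="$1"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable java under $home/bin"
        return 1
    fi
    echo "java env OK: $home"
}

check_java_env "$JAVA_HOME" || true   # e.g. "java env OK: /usr/java/jdk1.7.0_80"
```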
3. Set up passwordless SSH login
ssh-keygen -t rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh localhost    # verify that login now works without a password
Note: for cluster-wide passwordless login, authorize the ResourceManager and NameNode on the DataNode machines (append their public keys to each DataNode's authorized_keys).
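The key distribution described in the note can be sketched as a small loop. This is an illustration only: the hostnames come from this article's /etc/hosts, and the `hadoop` user is an assumption (use whichever user will start the daemons). The function prints each command instead of running it, so the list can be reviewed first:

```shell
# Sketch: build the ssh-copy-id commands that push the local public key
# to every node. Hostnames are the ones from this article's /etc/hosts;
# the "hadoop" user is an assumption.
NODES="ResourceManager NameNode DataNode1 DataNode2"

copy_key_cmd() {
    # Print the command rather than executing it; pipe the output of
    # the loop below to sh to actually run the commands.
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$1"
}

for node in $NODES; do
    copy_key_cmd "$node"
done
```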
4. Set /etc/hosts and the hostname
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=ResourceManager
vi /etc/hosts
127.0.0.1 localhost
192.168.199.241 ResourceManager
192.168.199.231 NameNode
192.168.199.232 DataNode1
192.168.199.242 DataNode2
Hadoop installation and configuration
Perform these steps on every node.
1. Download Hadoop 2.6.0
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Unpack it to /usr/local/hadoop and give the hadoop user ownership of the directory:
useradd hadoop
cd /usr/local/hadoop/
chown -R hadoop.hadoop /usr/local/hadoop
Add the environment variables:
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_BIN
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
2. Configure Hadoop
(1) Modify the environment scripts
Edit conf/hadoop-env.sh and conf/yarn-env.sh.
At minimum, set JAVA_HOME:
export JAVA_HOME=/usr/java/jdk1.7.0_80
Point the JVM at the native library directory:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/"
If PID files cannot be located later, also set HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR.
Other settings you may want to change:
Log directories: HADOOP_LOG_DIR / YARN_LOG_DIR
Maximum heap size in MB (default 1000): HADOOP_HEAPSIZE / YARN_HEAPSIZE
(2) Edit the configuration files
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000</value> <!-- "NameNode" is the NameNode hostname defined in /etc/hosts -->
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop_data/dfs/name</value> <!-- NameNode directory; create it in advance with write permission, otherwise this setting is ignored -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop_data/dfs/data</value> <!-- DataNode directory; create it in advance with write permission, otherwise this setting is ignored -->
</property>
</configuration>
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value> <!-- run the MapReduce framework on YARN -->
</property>
</configuration>
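Since the same files have to be identical on every node, the XML above can also be generated from a small script instead of edited by hand. A sketch for core-site.xml, using the NameNode hostname and port from this article (the function name is my own):

```shell
# Sketch: generate core-site.xml from a template so every node gets an
# identical copy. NN_HOST and the port match the values in this article.
NN_HOST="NameNode"

write_core_site() {
    # $1 is the configuration directory, e.g. /usr/local/hadoop/etc/hadoop
    cat > "$1/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${NN_HOST}:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
EOF
}

# Usage: write_core_site /usr/local/hadoop/etc/hadoop
```

The same pattern extends to hdfs-site.xml, yarn-site.xml, and mapred-site.xml.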
The configuration files live under $HADOOP_HOME/etc/hadoop by default; to change the directory, edit hadoop-env.sh:
vi hadoop-env.sh
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/conf"}
Set the slave list:
vi slaves
DataNode1
DataNode2
Check that Hadoop is installed and configured correctly: hadoop version
Starting Hadoop
1. Format HDFS on the NameNode
hdfs namenode -format NameNode    # "NameNode" is the cluster name
Note: if HDFS has been formatted before, first clear everything under the dfs.namenode.name.dir directory on the NameNode and under the dfs.datanode.data.dir directory on each DataNode; otherwise the format will prompt whether to re-format.
2. Start YARN and HDFS
Log in to the ResourceManager and run start-yarn.sh to start the YARN resource management system:
[root@ResourceManager sbin]# start-yarn.sh
starting yarn daemons
resourcemanager running as process 16431. Stop it first.
DataNode2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode2.out
DataNode1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-DataNode1.out
Log in to the NameNode and run start-dfs.sh to start the cluster HDFS filesystem:
[root@NameNode hadoop]# start-dfs.sh
15/02/06 15:19:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [NameNode]
NameNode: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-NameNode.out
DataNode2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode2.out
DataNode1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-DataNode1.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 10:cd:13:8b:04:c0:51:c2:54:cc:3d:3e:17:5d:0c:17.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-NameNode.out
15/02/06 15:19:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Basic Hadoop commands:
hadoop fs -ls /                     # list the files in the HDFS root directory
hadoop fs -ls /testdir              # list the files under /testdir
hadoop fs -cat /testdir/test1.txt   # print the contents of test1.txt
hadoop fs -put test2.txt /testdir   # upload the local file test2.txt to /testdir
mapred job -list all                # list all jobs; before 2.6 this was hadoop job -list all
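The commands above can be combined into a quick smoke test for a freshly started cluster. A sketch (the function name and the runner-parameter trick are my own): pass `echo` as the runner to preview the command sequence without a cluster, or `hadoop` to execute it for real.

```shell
# Smoke-test sketch: round-trip a file through HDFS using the commands
# from this section. The runner parameter lets you dry-run the sequence.
hdfs_smoke() {
    local run="$1"   # "echo" to preview, "hadoop" to execute
    $run fs -mkdir -p /testdir
    $run fs -put test2.txt /testdir
    $run fs -cat /testdir/test2.txt
    $run fs -ls /testdir
}

hdfs_smoke echo   # preview only; run "hdfs_smoke hadoop" on the cluster
```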