Hadoop
- Hadoop Components (Interview Focus)
  - HDFS Overview
  - YARN Overview
  - MapReduce Overview
- Configuring the Virtual Machines
- Installing the JDK and Hadoop
  - Copy all installation packages into software
  - Configure environment variables
- Fully Distributed Mode
  - Distribution Script
  - Cluster Configuration
    - 1. Cluster layout
    - Configuration files
  - Cluster Startup
  - Passwordless SSH Login
  - Cluster Startup
    - Edit the slaves configuration in Hadoop
    - Start
  - Uploading a Small File
  - Configuring the History Server
  - Cluster Time Synchronization
Hadoop Components (Interview Focus)
Differences between Hadoop 1.x and Hadoop 2.x:
In the Hadoop 1.x era, MapReduce handled both the business logic computation and resource scheduling, coupling the two tightly. Hadoop 2.x introduced YARN: YARN is responsible only for resource scheduling, while MapReduce is responsible only for computation.
HDFS Overview
1. NameNode (nn): stores file metadata, such as file names, directory structure, and file attributes (creation time, replica count, permissions), plus each file's block list and the DataNodes holding those blocks (essentially an index).
2. DataNode (dn): stores file block data and block checksums on the local file system.
3. Secondary NameNode (2nn): an auxiliary background process that monitors HDFS state and takes a snapshot of HDFS metadata at intervals (if the nn stops running, the 2nn can be used for recovery, though the recovery may be incomplete).
YARN Overview
(Figure: YARN architecture diagram; the original embedded image is not recoverable.)
MapReduce Overview
MapReduce runs in two phases, Map and Reduce:
Map phase: processes the input data in parallel.
Reduce phase: aggregates the Map results.
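The two phases can be illustrated with an ordinary shell pipeline for word counting, the canonical MapReduce example. This only simulates the data flow with pipes; it is not Hadoop code:

```shell
# Word count as a map -> shuffle -> reduce pipeline:
#   map:     emit one word per line
#   shuffle: sorting brings identical words together
#   reduce:  count each group of identical words
wordcount() {
  tr -s '[:space:]' '\n' |   # Map: split input into one word per line
    sort |                   # Shuffle: group identical keys
    uniq -c |                # Reduce: count each key
    awk '{print $2, $1}'     # format output as "word count"
}
```

For example, `printf 'a b a' | wordcount` prints `a 2` and `b 1`, mirroring how Map output is grouped by key before the Reduce step.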
Configuring the Virtual Machines
1. Disable the firewall
[yyx@hadoop01 ~]$ systemctl stop firewalld.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to manage system services or units.
Authenticating as: yyx
Password:
==== AUTHENTICATION COMPLETE ===
[yyx@hadoop01 ~]$ systemctl disable firewalld.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-unit-files ===
Authentication is required to manage system service or unit files.
Authenticating as: yyx
Password:
==== AUTHENTICATION COMPLETE ===
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: yyx
Password:
==== AUTHENTICATION COMPLETE ===
2. Create a regular user and set its password (the password used in these notes is 991120y).
3. Create software and module folders under /opt and change their ownership
drwxr-xr-x. 12 root root 4096 Oct 3 13:31 module
drwxr-xr-x. 2 root root 6 Oct 31 2018 rh
drwxrwxrwx. 2 root root 215 Sep 28 20:43 software
-rw-r--r--. 1 root root 0 Oct 9 18:32 text.txt
[root@hadoop01 opt]# chown yyx:yyx /opt/software /opt/module
[root@hadoop01 opt]# ll
total 4
drwxr-xr-x. 12 yyx yyx 4096 Oct 3 13:31 module
drwxr-xr-x. 2 root root 6 Oct 31 2018 rh
drwxrwxrwx. 2 yyx yyx 215 Sep 28 20:43 software
-rw-r--r--. 1 root root 0 Oct 9 18:32 text.txt
[root@hadoop01 opt]#
4. Add the user to sudoers
[root@hadoop01 opt]# vim /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
yyx ALL=(ALL) NOPASSWD:ALL
:wq! (force write; /etc/sudoers is read-only)
5. Edit the hosts file
[root@hadoop01 opt]# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.111.111 hadoop01 www.hadoop01.com
192.168.111.112 hadoop02 www.hadoop02.com
192.168.111.113 hadoop03 www.hadoop03.com
6. Configure a static IP
[root@hadoop01 opt]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
7. Set the hostname
hostnamectl set-hostname hadoop01 (run on each machine, substituting that machine's own name)
To take effect immediately: bash
vim /etc/sysconfig/network
The hostname must match the entry in the hosts file.
After cloning the two other virtual machines, repeat steps 6 and 7 on each clone.
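Since the hostnames in steps 5 and 7 must agree, it can help to extract the short hostnames declared in the hosts file and compare them against what `hostname` reports on each machine. A minimal sketch (takes any hosts-format file path as its argument):

```shell
# List the short hostnames declared in a hosts-format file,
# skipping comments, blank lines, and loopback entries, so they
# can be compared against `hostname` on each machine.
list_cluster_hosts() {
  awk '!/^[[:space:]]*#/ && NF >= 2 && $1 !~ /^(127\.|::1)/ {print $2}' "$1"
}
```

Running `list_cluster_hosts /etc/hosts` on the file above should print hadoop01, hadoop02, and hadoop03, one per line.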
Installing the JDK and Hadoop
Copy all installation packages into software
First run java -version; if a JDK is already installed, uninstall it:
rpm -qa | grep java | xargs sudo rpm -e --nodeps
Extract the archives into module:
tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
[yyx@hadoop01 software]$ cd /opt/module/
[yyx@hadoop01 module]$ ll
total 0
drwxr-xr-x. 9 yyx yyx 149 May 22 2017 hadoop-2.7.2
drwxr-xr-x. 8 yyx yyx 255 Jul 22 2017 jdk1.8.0_144
Configure environment variables (append to /etc/profile):
## JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
## HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then:
source /etc/profile
Verify:
[yyx@hadoop01 module]$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
[yyx@hadoop01 module]$ hadoop version
Hadoop 2.7.2
Subversion Unknown -r Unknown
Compiled by root on 2017-05-22T10:49Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
Fully Distributed Mode
Distribution Script
Secure remote copy: scp
Copy Hadoop and the JDK from /opt/module on hadoop01 to the same location on hadoop02:
scp -r hadoop01:/opt/module/hadoop-2.7.2 hadoop02:/opt/module
scp -r hadoop01:/opt/module/jdk1.8.0_144 hadoop02:/opt/module
The distribution script:
[yyx@hadoop01 ~]$ vim xsync
#!/bin/bash
# 1. Get the number of arguments; exit immediately if there are none
pcount=$#
if ((pcount==0)); then
  echo no args
  exit
fi
# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo fname=$fname
# 3. Resolve the parent directory to an absolute path
pdir=$(cd -P "$(dirname "$p1")"; pwd)
echo pdir=$pdir
# 4. Get the current user name
user=$(whoami)
# 5. Loop over the other hosts
for ((host=2; host<4; host++)); do
  echo ------------------- hadoop0$host --------------
  rsync -av "$pdir/$fname" "$user@hadoop0$host:$pdir"
done
Send /etc/profile to the other two machines, source it on each, and run java -version; if it prints a result, the setup succeeded.
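One subtle line in xsync is the absolute-path resolution: `cd -P` into the argument's parent directory followed by `pwd` yields a physical absolute path even when the argument is relative or reached through a symlink. A standalone sketch of that idiom:

```shell
# Resolve the parent directory of a path to a physical absolute
# path -- the same trick the xsync script uses before calling rsync
# (-P resolves symlinks; the subshell keeps the caller's cwd intact).
abs_parent_dir() {
  (cd -P "$(dirname "$1")" && pwd)
}
```

This is why xsync can be invoked with a relative path like `xsync slaves` and still rsync to the correct absolute directory on the remote hosts.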
Cluster Configuration
1. Cluster layout (as reflected in the configuration below): NameNode on hadoop01, ResourceManager on hadoop02, SecondaryNameNode on hadoop03, and a DataNode on all three machines.
Configuration files
1. Enter the etc/hadoop directory under the Hadoop installation and edit core-site.xml:
[yyx@hadoop01 hadoop]$ vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- NameNode address for HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
</configuration>
2. HDFS configuration files
Configure hadoop-env.sh:
[yyx@hadoop02 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configure hdfs-site.xml:
[yyx@hadoop01 hadoop]$ vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Secondary NameNode host -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop03:50090</value>
</property>
</configuration>
Configure yarn-env.sh:
[yyx@hadoop01 hadoop]$ vim yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configure yarn-site.xml:
[yyx@hadoop01 hadoop]$ vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address for YARN -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop02</value>
</property>
</configuration>
Configure mapred-env.sh:
export JAVA_HOME=/opt/module/jdk1.8.0_144
Configure mapred-site.xml:
[yyx@hadoop01 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[yyx@hadoop01 hadoop]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Distribute the configuration files with the xsync script.
Finally, verify on hadoop02:
[yyx@hadoop02 hadoop]$ vim /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml
Success.
Cluster Startup
Before starting the cluster for the first time, format the NameNode:
[yyx@hadoop01 hadoop]$ hdfs namenode -format
20/11/24 16:51:13 INFO common.Storage: Storage directory /opt/module/hadoop-2.7.2/data/tmp/dfs/name has been successfully formatted.
This line indicates the format succeeded.
Start the cluster daemons manually:
[yyx@hadoop01 hadoop]$ hadoop-daemon.sh start namenode
[yyx@hadoop01 hadoop]$ hadoop-daemon.sh start datanode
Likewise, on hadoop02:
[yyx@hadoop02 hadoop]$ hadoop-daemon.sh start datanode
And on hadoop03:
[yyx@hadoop03 ~]$ hadoop-daemon.sh start datanode
[yyx@hadoop03 ~]$ hadoop-daemon.sh start secondarynamenode
Check the running daemons with jps:
[yyx@hadoop03 ~]$ jps
7843 SecondaryNameNode
7718 DataNode
7884 Jps
[yyx@hadoop02 hadoop]$ jps
7977 Jps
7903 DataNode
[yyx@hadoop01 hadoop]$ jps
8194 Jps
8042 NameNode
8122 DataNode
If the NameNode web UI (http://hadoop01:50070) loads, the cluster started successfully.
To stop a daemon:
hadoop-daemon.sh stop namenode
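Starting and stopping daemons host by host like this quickly becomes repetitive. A sketch that only prints the per-host command plan so it can be reviewed before being wired up to ssh; the role assignments follow the layout used in these notes (NameNode on hadoop01, SecondaryNameNode on hadoop03, a DataNode everywhere):

```shell
# Print (rather than execute) the per-host daemon commands for a
# given action ("start" or "stop"), matching this cluster's layout.
plan_daemons() {
  action=$1
  echo "hadoop01: hadoop-daemon.sh $action namenode"
  for host in hadoop01 hadoop02 hadoop03; do
    echo "$host: hadoop-daemon.sh $action datanode"
  done
  echo "hadoop03: hadoop-daemon.sh $action secondarynamenode"
}
```

`plan_daemons start` prints five commands, one per daemon; the slaves file and start-dfs.sh in the next sections automate exactly this.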
Passwordless SSH Login
Generate a public/private key pair:
[yyx@hadoop01 .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/yyx/.ssh/id_rsa):
/home/yyx/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/yyx/.ssh/id_rsa.
Your public key has been saved in /home/yyx/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:RBfLxQTAln99Wo4XXe9qe7BJU9eftQi1Iej3YAdT2+8 yyx@hadoop01
The key's randomart image is:
+---[RSA 2048]----+
| .oo==+. |
| .++ *.oo .|
| .o.o =oo.=|
| . ..=.o..@|
| S o.= .OO|
| o*+=|
| . BE|
| = .|
| ..o |
+----[SHA256]-----+
[yyx@hadoop01 .ssh]$
Send the public key to the other two machines:
[yyx@hadoop01 .ssh]$ ssh-copy-id hadoop02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/yyx/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yyx@hadoop02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop02'"
and check to make sure that only the key(s) you wanted were added.
Verify:
[yyx@hadoop01 .ssh]$ ssh hadoop02
Last login: Tue Nov 24 15:38:21 2020 from 192.168.111.1
[yyx@hadoop02 ~]$ exit
logout
Connection to hadoop02 closed.
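The same key normally needs to be copied to all three hosts, including hadoop01 itself, since start-dfs.sh also connects over ssh to the local machine. A sketch of the loop; with DRY_RUN=1 it only prints the commands it would run:

```shell
# Copy this user's public key to every cluster node. Set DRY_RUN=1
# to print the commands instead of executing them (ssh-copy-id
# prompts for each host's password on a real run).
distribute_keys() {
  for host in hadoop01 hadoop02 hadoop03; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ssh-copy-id $host"
    else
      ssh-copy-id "$host"
    fi
  done
}
```

After a real run, `ssh hadoop02` should log in without a password prompt, as shown above.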
Cluster Startup
Edit the slaves configuration in Hadoop:
[yyx@hadoop01 .ssh]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/
[yyx@hadoop01 hadoop]$ vim slaves
hadoop01
hadoop02
hadoop03
Distribute it:
[yyx@hadoop01 hadoop]$ xsync slaves
fname=slaves
pdir=/opt/module/hadoop-2.7.2/etc/hadoop
------------------- hadoop02 --------------
sending incremental file list
slaves
sent 128 bytes received 41 bytes 112.67 bytes/sec
total size is 27 speedup is 0.16
------------------- hadoop03 --------------
sending incremental file list
slaves
sent 128 bytes received 41 bytes 338.00 bytes/sec
total size is 27 speedup is 0.16
Start:
[yyx@hadoop01 hadoop]$ start-dfs.sh
Starting namenodes on [hadoop01]
yyx@hadoop01's password:
hadoop01: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-yyx-namenode-hadoop01.out
yyx@hadoop01's password: hadoop03: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop02.out
hadoop01: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop01.out
Starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-yyx-secondarynamenode-hadoop03.out
At this point daemons are running on all three machines.
YARN must be started separately on hadoop02 (where the ResourceManager is configured):
[yyx@hadoop02 ~]$ start-yarn.sh
Uploading a Small File
Create a directory on HDFS:
[yyx@hadoop01 ~]$ hdfs dfs -mkdir -p /user/yyx/input
Verify:
[yyx@hadoop01 ~]$ hdfs dfs -ls /user/yyx
Found 1 items
drwxr-xr-x - yyx supergroup 0 2020-11-28 10:32 /user/yyx/input
Create a directory under ~:
[yyx@hadoop01 ~]$ mkdir yyx
Upload and verify:
[yyx@hadoop01 /]$ hdfs dfs -put /home/yyx/yyx /user/yyx/input/test
[yyx@hadoop01 /]$ hdfs dfs -ls /user/yyx/input
Found 1 items
drwxr-xr-x - yyx supergroup 0 2020-11-28 11:00 /user/yyx/input/test
View the contents of a file on HDFS:
hdfs dfs -cat <file>
[yyx@hadoop01 yyx]$ hdfs dfs -ls /user/yyx/input/test
Found 1 items
-rw-r--r-- 3 yyx supergroup 19 2020-11-28 11:11 /user/yyx/input/test/HDFStest.txt
[yyx@hadoop01 yyx]$ hdfs dfs -cat /user/yyx/input/test/HDFStest.txt
hdfs TEST
command
Delete a file from HDFS (rmr deletes recursively; it is deprecated in favor of rm -r):
[yyx@hadoop01 yyx]$ hdfs dfs -rmr /user/yyx/input/test
rmr: DEPRECATED: Please use 'rm -r' instead.
20/11/28 11:16:35 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/yyx/input/test
Configuring the History Server
[yyx@hadoop01 hadoop]$ vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- History server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- History server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
</configuration>
Cluster Time Synchronization
The goal is to keep the three servers' clocks in agreement. Configure hadoop01 as the NTP server (as root):
[root@hadoop01 hadoop-2.7.2]# vim /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
restrict 192.168.111.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 127.127.1.0
fudge 127.127.1.0 stratum 10
[root@hadoop01 hadoop-2.7.2]# vim /etc/sysconfig/ntpd
# Command line options for ntpd
OPTIONS="-g"
SYNC_HWCLOCK=yes
Restart the service:
[root@hadoop01 hadoop-2.7.2]# service ntpd start
Redirecting to /bin/systemctl start ntpd.service
[root@hadoop01 hadoop-2.7.2]# service ntpd status
Redirecting to /bin/systemctl status ntpd.service
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
Active: active (running) since Sat 2020-11-28 15:15:44 CST; 11s ago
Process: 10995 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 10996 (ntpd)
CGroup: /system.slice/ntpd.service
└─10996 /usr/sbin/ntpd -u ntp:ntp -g
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listen normally on 2 lo 127.0.0.1 UDP 123
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listen normally on 3 ens33 192.168.111.111 UDP 123
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listen normally on 4 lo ::1 UDP 123
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listen normally on 5 ens33 fe80::8f28:2c18:fe12:b5c1 UDP 123
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listen normally on 6 ens33 fe80::2bef:4450:62f9:7666 UDP 123
Nov 28 15:15:44 hadoop01 ntpd[10996]: Listening on routing socket on fd #23 for interface updates
Nov 28 15:15:44 hadoop01 ntpd[10996]: 0.0.0.0 c016 06 restart
Nov 28 15:15:44 hadoop01 ntpd[10996]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Nov 28 15:15:44 hadoop01 ntpd[10996]: 0.0.0.0 c011 01 freq_not_set
Nov 28 15:15:45 hadoop01 ntpd[10996]: 0.0.0.0 c514 04 freq_mode
Enable ntpd at boot:
[root@hadoop01 hadoop-2.7.2]# chkconfig ntpd on
On the other machines, configure periodic synchronization against hadoop01 (as root):
[root@hadoop02 yyx]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop01
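The `*/10` in the minute field means the entry fires on every minute divisible by 10 (0, 10, 20, 30, 40, 50), i.e. ntpdate runs every ten minutes. A tiny sketch that enumerates the minutes a step value matches:

```shell
# Enumerate the minute values matched by a cron step expression
# like */N in the minute field (0-59 inclusive).
cron_step_minutes() {
  step=$1
  seq 0 "$step" 59 | tr '\n' ' ' | sed 's/ $//'
}
```

So `cron_step_minutes 10` prints `0 10 20 30 40 50`, the schedule on which the slave nodes pull time from hadoop01.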