As a Hadoop newcomer, going straight into "Hadoop: The Definitive Guide" felt overwhelming, so following material I found online I set up a pseudo-distributed Hadoop on a machine I had and ran the WordCount example. I found this author's write-up very much on point, and more precise than most others. The article follows:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All commands in this example are run from /home/wangxing/hadoop-0.20.2.
1. Install and configure Java 1.6 (not covered in detail here). When done, run java -version on the command line; output like the following means the Java environment is installed correctly:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)
2. Download hadoop-0.20.2.tar.gz and put it under the user's home directory, e.g. /home/wangxing/hadoop-0.20.2:
Download: http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.20.2/
Extract: tar -zxvf hadoop-0.20.2.tar.gz
3. Configure Hadoop. The main configuration files live under hadoop-0.20.2/conf.
(1) In conf/hadoop-env.sh, set the Java environment as well as HADOOP_HOME and PATH, for example:
export JAVA_HOME=/usr/local/jre1.6.0_24
export HADOOP_HOME=/home/wangxing/hadoop-0.20.2
export PATH=$PATH:/home/wangxing/hadoop-0.20.2/bin
(2) Configure conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml.
[code] core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir</value>
  </property>
</configuration>
[code] hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/hdfs/data</value>
  </property>
</configuration>
[code] mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/mapred/system</value>
  </property>
</configuration>
4. Format the NameNode: bin/hadoop namenode -format. Only the NameNode needs formatting; the DataNode initializes its own storage directory the first time it starts.
5. Start all Hadoop daemons: bin/start-all.sh. I did not set up passwordless SSH, so starting each of the daemons (namenode, datanode and so on) asks for the Linux login password.
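Those password prompts can be avoided with a one-time passwordless-SSH setup for localhost. A minimal sketch, assuming the default OpenSSH layout under ~/.ssh:

```shell
# Create the key directory if it does not exist yet
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a passwordless RSA key pair, unless one is already present
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key for SSH logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, ssh localhost should log in without asking for a password, and start-all.sh can bring the daemons up unattended.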
6. Check that the Hadoop processes started: jps. Normally it should list NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker.
ps: the first time around my DataNode did not start because JAVA_HOME pointed to the wrong place; later the NameNode would not start, and deleting the file /home/wangxing/hadoop-0.20.2/tmpdir/hdfs/data/current/VERSION fixed it. The directory part of that path is what dfs.data.dir points to in hdfs-site.xml.
7. Check the cluster status: bin/hadoop dfsadmin -report
8. Under /home/wangxing/hadoop-0.20.2 create a directory test, and in it create text files file01 and file02, each containing a few words.
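For instance (run from the hadoop-0.20.2 directory; the words themselves are arbitrary examples):

```shell
# Create a local test directory with two small input files
mkdir -p test
printf 'hello hadoop hello world\n' > test/file01
printf 'goodbye hadoop\n' > test/file02
# Inspect what will be counted
cat test/file01 test/file02
```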
9. Create a directory input on the HDFS distributed file system: bin/hadoop fs -mkdir input; afterwards you can list it with bin/hadoop fs -ls
ps: remove a directory: bin/hadoop fs -rmr ***; remove a file: bin/hadoop fs -rm ***
10. Leave Hadoop safe mode: bin/hadoop dfsadmin -safemode leave
11. Put the text files into HDFS: bin/hadoop fs -put /home/wangxing/hadoop-0.20.2/test/* input
12. Run the bundled WordCount example: bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
13. View the results: bin/hadoop dfs -cat output/*
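As a quick sanity check, WordCount's counts can be reproduced locally with standard Unix tools. A sketch on sample text (uniq -c prints "count word" while the job prints "word<TAB>count", but the numbers should match for the same input):

```shell
# Split on whitespace, one word per line, then count duplicates
printf 'hello hadoop hello world\n' | tr -s ' \t' '\n' | sort | uniq -c
# → 1 hadoop, 2 hello, 1 world
```

Piping your own test/file01 and test/file02 through the same pipeline gives the counts to compare against output/*.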
14. Stop all Hadoop daemons: bin/stop-all.sh