
Installing Hadoop in Pseudo-Distributed Mode on Linux and Running the WordCount Example

As a Hadoop newcomer, diving straight into "Hadoop: The Definitive Guide" felt overwhelming, so based on material I found online I set up a Hadoop pseudo-distributed installation on a machine I had and ran the WordCount example. I found this author's write-up thorough and more precise than others. The content of the article follows:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

All commands in this example are executed from /home/wangxing/hadoop-0.20.2.

1. Install and configure Java 1.6 (not covered here). Once configured, run java -version; output like the following indicates the Java environment is set up correctly:

java version "1.6.0_20"

Java(TM) SE Runtime Environment (build 1.6.0_20-b02)

Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)

2. Download hadoop-0.20.2.tar.gz and place it under your home directory, e.g. /home/wangxing/hadoop-0.20.2:

Download: http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.20.2/

Extract: tar -zxvf hadoop-0.20.2.tar.gz

3. Configure Hadoop. The main configuration files live under hadoop-0.20.2/conf.

(1) In conf/hadoop-env.sh, set the Java environment as well as HADOOP_HOME and PATH, for example:

export JAVA_HOME=/usr/local/jre1.6.0_24

export HADOOP_HOME=/home/wangxing/hadoop-0.20.2

export PATH=$PATH:/home/wangxing/hadoop-0.20.2/bin

(2) Configure conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml:

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/hdfs/name</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/hdfs/data</value>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/mapred/local</value>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/home/wangxing/hadoop-0.20.2/tmpdir/mapred/system</value>
  </property>
</configuration>

4. Format the namenode and datanode: bin/hadoop namenode -format and bin/hadoop datanode -format (formatting the namenode is the essential step).

5. Start all Hadoop processes: bin/start-all.sh. I did not configure passwordless SSH, so starting the namenode, datanode, and other daemons each requires entering the Linux login password.
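To avoid typing the password for every daemon, passwordless SSH to localhost can be set up beforehand. A minimal sketch, assuming the default OpenSSH layout:

```shell
# Create an RSA key pair with an empty passphrase (skipped if one already exists)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, ssh localhost should log in without a password, and start-all.sh will no longer prompt.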

6. Check which Hadoop processes started: jps. Normally you should see NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker.


P.S.: The first time, my datanode did not start because JAVA_HOME pointed to the wrong path. Later the namenode would not start; deleting the file /home/wangxing/hadoop-0.20.2/tmpdir/hdfs/data/current/VERSION fixed it. The leading part of that path is the directory that dfs.data.dir points to in hdfs-site.xml.

7. Check the cluster status: bin/hadoop dfsadmin -report


8. Under /home/wangxing/hadoop-0.20.2, create a directory named test; inside it, create text files file01 and file02, each containing a few words.
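For example, the two input files can be created like this (run from the hadoop-0.20.2 directory; the words are arbitrary sample content):

```shell
# Create the local input directory and two small word files
mkdir -p test
printf 'hello hadoop\nhello world\n' > test/file01
printf 'goodbye hadoop\n' > test/file02
```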

9. Create a directory named input in the HDFS distributed file system: bin/hadoop fs -mkdir input; you can then list it with bin/hadoop fs -ls.

P.S.: To delete a directory: bin/hadoop fs -rmr ***; to delete a file: bin/hadoop fs -rm ***

10. Leave Hadoop's safe mode: bin/hadoop dfsadmin -safemode leave

11. Put the text files into HDFS: bin/hadoop fs -put /home/wangxing/hadoop-0.20.2/test/* input


12. Run the bundled WordCount example: bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output

13. View the results: bin/hadoop dfs -cat output/*
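What WordCount computes can be sketched locally with standard Unix tools: split the input into one word per line, then count occurrences of each. The sample words below are hypothetical; the output has the same word-then-count shape as the job's output files:

```shell
# Split on spaces, sort so identical words are adjacent, count runs with uniq -c,
# then reorder each line to "word<TAB>count"
printf 'hello hadoop\nhello world\ngoodbye hadoop\n' \
  | tr -s ' ' '\n' | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
```

For that input this prints goodbye 1, hadoop 2, hello 2, world 1, one word per line.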


14. Stop all Hadoop processes: bin/stop-all.sh