1. Word count using the bundled example program
(1) The wordcount program
The wordcount program ships in Hadoop's share directory:
<code>[root@leaf mapreduce]# pwd</code>
<code>/usr/local/hadoop/share/hadoop/mapreduce</code>
<code>[root@leaf mapreduce]# ls</code>
<code>hadoop-mapreduce-client-app-2.6.5.jar hadoop-mapreduce-client-jobclient-2.6.5-tests.jar</code>
<code>hadoop-mapreduce-client-common-2.6.5.jar hadoop-mapreduce-client-shuffle-2.6.5.jar</code>
<code>hadoop-mapreduce-client-core-2.6.5.jar hadoop-mapreduce-examples-2.6.5.jar</code>
<code>hadoop-mapreduce-client-hs-2.6.5.jar lib</code>
<code>hadoop-mapreduce-client-hs-plugins-2.6.5.jar lib-examples</code>
<code>hadoop-mapreduce-client-jobclient-2.6.5.jar sources</code>
The one we want is hadoop-mapreduce-examples-2.6.5.jar.
(2) Create the HDFS data directories
Create a directory to hold the MapReduce job's input files:
<code>[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount</code>
Create a directory to hold the MapReduce job's output files. Note that only the parent directory is created here: the job's final output directory (/output/wordcount below) must not exist when the job starts, because Hadoop creates it itself and fails if it is already present:
<code>[root@leaf ~]# hadoop fs -mkdir /output</code>
Check the two directories we just created:
<code>[root@leaf ~]# hadoop fs -ls /</code>
<code>drwxr-xr-x - root supergroup 0 2017-09-01 20:34 /data</code>
<code>drwxr-xr-x - root supergroup 0 2017-09-01 20:35 /output</code>
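As a side note, the -p flag of hadoop fs -mkdir behaves like the POSIX mkdir -p: missing parent directories are created in one step. A minimal local sketch (using an ordinary temp directory rather than HDFS):

```shell
# Like `hadoop fs -mkdir -p`, POSIX mkdir -p creates missing parents.
tmp=$(mktemp -d)
mkdir -p "$tmp/data/wordcount"   # creates both data/ and data/wordcount/
ls "$tmp/data"                   # wordcount
```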
(3) Create a word file and upload it to HDFS
The word file looks like this:
<code>[root@leaf ~]# cat myword.txt</code>
<code>leaf yyh</code>
<code>yyh xpleaf</code>
<code>katy ling</code>
<code>yeyonghao leaf</code>
<code>xpleaf katy</code>
Upload the file to HDFS:
<code>[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount</code>
Check the uploaded file and its contents in HDFS:
<code>[root@leaf ~]# hadoop fs -ls /data/wordcount</code>
<code>-rw-r--r-- 1 root supergroup 57 2017-09-01 20:40 /data/wordcount/myword.txt</code>
<code>[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt</code>
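The 57 bytes reported by hadoop fs -ls can be cross-checked by recreating the same file locally; this is a quick way to confirm the upload was not truncated:

```shell
# Recreate myword.txt locally; its size should match the 57 bytes
# that `hadoop fs -ls` reports for the uploaded copy.
cat > myword.txt <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF
wc -c < myword.txt   # 57
```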
(4) Run the wordcount program
Execute the following command:
<code>[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount</code>
<code>...</code>
<code>17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully</code>
<code>17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38</code>
<code>    File System Counters</code>
<code>        FILE: Number of bytes read=585940</code>
<code>        FILE: Number of bytes written=1099502</code>
<code>        FILE: Number of read operations=0</code>
<code>        FILE: Number of large read operations=0</code>
<code>        FILE: Number of write operations=0</code>
<code>        HDFS: Number of bytes read=114</code>
<code>        HDFS: Number of bytes written=48</code>
<code>        HDFS: Number of read operations=15</code>
<code>        HDFS: Number of large read operations=0</code>
<code>        HDFS: Number of write operations=4</code>
<code>    Map-Reduce Framework</code>
<code>        Map input records=5</code>
<code>        Map output records=10</code>
<code>        Map output bytes=97</code>
<code>        Map output materialized bytes=78</code>
<code>        Input split bytes=112</code>
<code>        Combine input records=10</code>
<code>        Combine output records=6</code>
<code>        Reduce input groups=6</code>
<code>        Reduce shuffle bytes=78</code>
<code>        Reduce input records=6</code>
<code>        Reduce output records=6</code>
<code>        Spilled Records=12</code>
<code>        Shuffled Maps =1</code>
<code>        Failed Shuffles=0</code>
<code>        Merged Map outputs=1</code>
<code>        GC time elapsed (ms)=92</code>
<code>        CPU time spent (ms)=0</code>
<code>        Physical memory (bytes) snapshot=0</code>
<code>        Virtual memory (bytes) snapshot=0</code>
<code>        Total committed heap usage (bytes)=241049600</code>
<code>    Shuffle Errors</code>
<code>        BAD_ID=0</code>
<code>        CONNECTION=0</code>
<code>        IO_ERROR=0</code>
<code>        WRONG_LENGTH=0</code>
<code>        WRONG_MAP=0</code>
<code>        WRONG_REDUCE=0</code>
<code>    File Input Format Counters</code>
<code>        Bytes Read=57</code>
<code>    File Output Format Counters</code>
<code>        Bytes Written=48</code>
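The counter arithmetic is worth a second look: 5 input lines of 2 words each give Map output records=10 (one (word, 1) pair per token), and the combiner merges pairs that share a key, leaving Combine output records=6, one per distinct word. A local shell sketch of those two phases (hypothetical file names, not part of the job above):

```shell
# Emulate the map phase: one (word, 1) pair per token.
cat > myword.txt <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF
tr -s ' ' '\n' < myword.txt | awk '{print $1"\t"1}' > map_out.txt
wc -l < map_out.txt                     # 10: Map output records
# Emulate the combiner: one record per distinct key.
cut -f1 map_out.txt | sort -u | wc -l   # 6: Combine output records
```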
(5) View the results
The counts are written to the part-r-00000 file under the output directory:
<code>[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000</code>
<code>katy 2</code>
<code>leaf 2</code>
<code>ling 1</code>
<code>xpleaf 2</code>
<code>yeyonghao 1</code>
<code>yyh 2</code>
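For an input this small, the same counts can be reproduced as a sanity check with standard shell tools (split into words, sort, count); the keys come out alphabetically sorted, just as MapReduce sorts them:

```shell
# A local word count equivalent for cross-checking the job output.
cat > myword.txt <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF
tr -s ' ' '\n' < myword.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```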
This article is reproduced from xpleaf's 51CTO blog. Original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author before republishing.