CDH5: Configuring LZO with Parcels

Contents: I. Parcel deployment steps; II. Hosting the LZO parcel locally; III. Modifying the configuration; IV. Verification

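I. Parcel deployment steps

    1. Download: the parcel must first be downloaded. Once downloaded, it resides in a local directory on the Cloudera Manager host.

    2. Distribute: after the download, the parcel is distributed to every host in the cluster and unpacked.

    3. Activate: after distribution, activating the parcel prepares it for use once the cluster restarts. An upgrade may also be required before activation.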

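II. Hosting the LZO parcel locally

    2. Alongside the LZO parcel itself, download manifest.json, and create a sha file from the hash value recorded in manifest.json (note: the sha file takes the same name as the parcel file, with .sha appended).

    3. On the command line, go to the web root of Apache (install it first if it is not already installed), which is /var/www/html by default. Create a directory named lzo there and place the three files (the parcel, manifest.json, and the sha file) in it.

    4. Start the httpd service and open the directory in a browser, for example http://ip/lzo, to confirm that the lzo directory is being served.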

    5. Add the URL of the locally published parcel directory to the Remote Parcel Repository URLs setting in Cloudera Manager.

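    6. On the Parcels page of Cloudera Manager, the LZO parcel now appears among the downloadable parcels. Click it to start the download.

    7. Distribute and activate the parcel, following the parcel deployment steps described earlier.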
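    The sha file from step 2 can be produced by hand, but a short script avoids copy-and-paste mistakes. The following is a minimal Python sketch, assuming manifest.json has a top-level "parcels" list whose entries carry "parcelName" and "hash" keys, and assuming the hash is a SHA-1 digest of the parcel file. The function names are hypothetical helpers, not part of any Cloudera tool.

```python
import hashlib
import json
from pathlib import Path


def write_sha_file(manifest_path, parcel_name, out_dir):
    """Write <parcel_name>.sha containing the hash listed in manifest.json."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    # Assumed layout: {"parcels": [{"parcelName": ..., "hash": ...}, ...]}
    for entry in manifest.get("parcels", []):
        if entry.get("parcelName") == parcel_name:
            sha_path = Path(out_dir) / (parcel_name + ".sha")
            sha_path.write_text(entry["hash"], encoding="utf-8")
            return sha_path
    raise KeyError(f"{parcel_name} not found in {manifest_path}")


def parcel_matches_sha(parcel_path, sha_path):
    """Return True if the parcel's SHA-1 digest equals the text in the sha file."""
    digest = hashlib.sha1()
    with open(parcel_path, "rb") as fh:
        # Read in 1 MiB chunks so large parcels are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    expected = Path(sha_path).read_text(encoding="utf-8").strip().lower()
    return digest.hexdigest() == expected
```

    Running write_sha_file against the downloaded manifest.json and then parcel_matches_sha against the parcel file gives a quick check that the download is intact before the files are copied into the lzo directory.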

III. Modifying the configuration

    Modify the HDFS configuration

    Append the following to the value of the io.compression.codecs property: org.apache.hadoop.io.compress.Lz4Codec,com.hadoop.compression.lzo.LzopCodec
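    Because io.compression.codecs is a single comma-separated string, it is easy to introduce a duplicate or a stray comma when editing it. The sketch below shows the intended result of the append; the existing value is a hypothetical example, so substitute whatever the cluster currently has.

```python
def append_codecs(current_value, extra_codecs):
    """Append codec class names to a comma-separated list, skipping duplicates."""
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in extra_codecs:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


LZO_CODECS = [
    "org.apache.hadoop.io.compress.Lz4Codec",
    "com.hadoop.compression.lzo.LzopCodec",
]

# Hypothetical existing value, for illustration only.
existing = (
    "org.apache.hadoop.io.compress.DefaultCodec,"
    "org.apache.hadoop.io.compress.GzipCodec"
)
print(append_codecs(existing, LZO_CODECS))
```

    Applying the function twice leaves the value unchanged, which makes it safe to rerun.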

    Modify the YARN configuration

    Set mapreduce.application.classpath to: $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*

    Set mapreduce.admin.user.env to: LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
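    Both YARN settings point into the activated parcel, so a job fails if those directories are missing on a worker host. As a rough pre-flight check, the sketch below inspects the parcel tree on the local host. The function name is hypothetical, and the check only looks for jar files and a native directory; it does not verify specific library names.

```python
from pathlib import Path

# Parcel location referenced by the two YARN settings.
PARCEL_ROOT = Path("/opt/cloudera/parcels/HADOOP_LZO")


def check_lzo_parcel(parcel_root=PARCEL_ROOT):
    """Return a list of problems found under the parcel's lib/hadoop/lib tree."""
    problems = []
    lib_dir = Path(parcel_root) / "lib" / "hadoop" / "lib"
    native_dir = lib_dir / "native"
    if not lib_dir.is_dir() or not any(lib_dir.glob("*.jar")):
        problems.append(f"no jar files in {lib_dir}")
    if not native_dir.is_dir():
        problems.append(f"missing native library directory {native_dir}")
    return problems
```

    An empty list means both paths exist on that host; the check would need to be run on every host that executes MapReduce tasks.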

IV. Verification

    The test table is defined as follows:

    create external table lzo(id int,name string)  row format delimited fields terminated by '#' stored as inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';

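    Create a file data.txt in which each line holds an id and a name separated by '#', matching the table definition.

    Then compress the file with the lzop command and upload the result to the /test directory on HDFS.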
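    The data preparation can be sketched in Python as follows. The sample rows are hypothetical, since the original file contents are not shown, and the helper names are illustrative. The commands are returned as argument lists rather than executed, so the sketch can be inspected on a machine without lzop or an HDFS client.

```python
from pathlib import Path

# Hypothetical sample rows; the original data.txt contents are not shown.
SAMPLE_ROWS = [(1, "aa"), (2, "bb"), (3, "cc")]


def write_sample(path, rows=SAMPLE_ROWS, sep="#"):
    """Write rows as '<id>#<name>' lines, matching the table's field delimiter."""
    text = "".join(f"{row_id}{sep}{name}\n" for row_id, name in rows)
    Path(path).write_text(text, encoding="utf-8")
    return Path(path)


def upload_commands(path, hdfs_dir="/test"):
    """Return the argv lists that compress the file with lzop and upload it."""
    compressed = f"{path}.lzo"  # lzop writes <file>.lzo next to the input by default
    return [
        ["lzop", str(path)],
        ["hdfs", "dfs", "-put", compressed, hdfs_dir],
    ]
```

    On a cluster host, each list can be passed to subprocess.run(cmd, check=True) in order.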

    Start hive, create the table, and query the data. The session looks like this:

hive> create external table lzo(id int,name string)  row format delimited fields terminated by '#' stored as inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';

OK

Time taken: 0.108 seconds

hive> select * from lzo where id>2;

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1404206497656_0002, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0002/

Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job  -kill job_1404206497656_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2014-07-01 17:30:27,547 Stage-1 map = 0%,  reduce = 0%

2014-07-01 17:30:37,403 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec

2014-07-01 17:30:38,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec

2014-07-01 17:30:39,527 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.84 sec

MapReduce Total cumulative CPU time: 2 seconds 840 msec

Ended Job = job_1404206497656_0002

MapReduce Jobs Launched: 

Job 0: Map: 1   Cumulative CPU: 2.84 sec   HDFS Read: 295 HDFS Write: 15 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 840 msec

3       sz

4       sz

5       bx

Time taken: 32.803 seconds, Fetched: 3 row(s)

hive> set hive.exec.compress.output=true;

hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

hive> create external table lzo2(id int,name string)  row format delimited fields terminated by '#' stored as inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';

Time taken: 0.092 seconds

hive> insert into table lzo2 select * from lzo;

Total MapReduce jobs = 3

Launching Job 1 out of 3

Starting Job = job_1404206497656_0003, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0003/

Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job  -kill job_1404206497656_0003

2014-07-01 17:33:47,351 Stage-1 map = 0%,  reduce = 0%

2014-07-01 17:33:57,114 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 sec

2014-07-01 17:33:58,170 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.96 sec

MapReduce Total cumulative CPU time: 1 seconds 960 msec

Ended Job = job_1404206497656_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop01.kt:8020/tmp/hive-hdfs/hive_2014-07-01_17-33-22_504_966970548620625440-1/-ext-10000

Loading data to table default.lzo2

Table default.lzo2 stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 171, raw_data_size: 0]

Job 0: Map: 1   Cumulative CPU: 1.96 sec   HDFS Read: 295 HDFS Write: 79 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 960 msec

Time taken: 36.625 seconds