While working through the Hadoop hmbbs (黑馬) log-analysis project, I needed to join several result tables. This post walks through those table joins in Hive in detail.
1. Count the daily pv (page views)
hive> create table hmbbs_pv
    > as select count(*) as pv from hmbbs_table;
Check the result:
hive> describe hmbbs_pv;
OK
pv bigint
Time taken: ... seconds
hive> select pv from hmbbs_pv;
Total MapReduce jobs = ...
Launching Job ... out of ...
Number of reduce tasks is set to ... since there's no reduce operator
Starting Job = job_1469064014798_0058, Tracking URL = http://hadoop22:.../proxy/application_1469064014798_0058/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0058
... (stage progress and CPU counters omitted) ...
Ended Job = job_1469064014798_0058
OK
...
Time taken: ... seconds
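Hive isn't always at hand for experimenting, so here is a minimal sketch of the same pv aggregation using Python's built-in sqlite3. The table and column names mirror the post; the sample log rows are invented:

```python
import sqlite3

# Toy stand-in for hmbbs_table: one row per access-log record.
# Columns (iplog, urllog) mirror the Hive table; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [("1.1.1.1", "/forum.php"),
     ("1.1.1.1", "/thread-1.html"),
     ("2.2.2.2", "/member.php?mod=register")],
)

# Same shape as the Hive statement: pv is simply COUNT(*) over the day's rows.
pv = conn.execute("SELECT COUNT(*) AS pv FROM hmbbs_table").fetchone()[0]
print(pv)  # 3
```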
2. Count the daily registrations (register)
hive> create table hmbbs_register
    > as select count(*) as register
    > from hmbbs_table
    > where instr(urllog,'member.php?mod=register') > 0;
Check the result:
hive> describe hmbbs_register;
OK
register bigint
Time taken: ... seconds
hive> select register from hmbbs_register;
Total MapReduce jobs = ...
Launching Job ... out of ...
Number of reduce tasks is set to ... since there's no reduce operator
Starting Job = job_1469064014798_0061, Tracking URL = http://hadoop22:.../proxy/application_1469064014798_0061/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0061
... (stage progress and CPU counters omitted) ...
Ended Job = job_1469064014798_0061
OK
...
Time taken: ... seconds
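SQLite's instr() happens to behave like Hive's (1-based position of the substring, 0 when absent), so the register filter can be sketched the same way. Again the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [("1.1.1.1", "/forum.php"),
     ("2.2.2.2", "/member.php?mod=register"),
     ("3.3.3.3", "/member.php?mod=register&x=1")],
)

# instr() returns the 1-based position of the substring, 0 if absent,
# so "> 0" keeps only the rows whose URL contains the register page.
register = conn.execute(
    "SELECT COUNT(*) FROM hmbbs_table "
    "WHERE instr(urllog, 'member.php?mod=register') > 0"
).fetchone()[0]
print(register)  # 2
```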
3. Count the daily unique IPs
hive> create table hmbbs_ip as
> select count(distinct iplog) as ip
> from hmbbs_table;
Check the result:
hive> describe hmbbs_ip;
OK
ip bigint
Time taken: ... seconds
hive> select ip from hmbbs_ip;
Total MapReduce jobs = ...
Launching Job ... out of ...
Number of reduce tasks is set to ... since there's no reduce operator
Starting Job = job_1469064014798_0063, Tracking URL = http://hadoop22:.../proxy/application_1469064014798_0063/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0063
... (stage progress and CPU counters omitted) ...
Ended Job = job_1469064014798_0063
OK
...
Time taken: ... seconds
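The unique-IP count works identically in the sqlite3 sketch (invented rows again); each IP is counted once no matter how many pages it hit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [("1.1.1.1", "/a"), ("1.1.1.1", "/b"), ("2.2.2.2", "/c")],
)

# COUNT(DISTINCT iplog): 1.1.1.1 appears twice but is counted once.
ip = conn.execute(
    "SELECT COUNT(DISTINCT iplog) FROM hmbbs_table"
).fetchone()[0]
print(ip)  # 2
```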
4. Count the daily bounces (jumper, the single-page visits used for the bounce rate)
hive> CREATE TABLE hmbbs_jumper AS
    > SELECT COUNT(*) AS jumper
    > FROM (SELECT COUNT(iplog) AS times FROM hmbbs_table
    >       GROUP BY iplog HAVING times = 1) e;
Check the result:
hive> describe hmbbs_jumper;
OK
jumper bigint
Time taken: ... seconds
hive> select jumper from hmbbs_jumper;
Total MapReduce jobs = ...
Launching Job ... out of ...
Number of reduce tasks is set to ... since there's no reduce operator
Starting Job = job_1469064014798_0066, Tracking URL = http://hadoop22:.../proxy/application_1469064014798_0066/
Kill Command = /usr/local/hadoop/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1469064014798_0066
... (stage progress and CPU counters omitted) ...
Ended Job = job_1469064014798_0066
OK
...
Time taken: ... seconds
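The nested jumper query can be sketched the same way: the inner query counts hits per IP, the outer one counts how many IPs hit exactly one page. Sample rows are invented; the HAVING clause repeats the aggregate rather than the `times` alias only to stay on SQLite's safest ground:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hmbbs_table (iplog TEXT, urllog TEXT)")
conn.executemany(
    "INSERT INTO hmbbs_table VALUES (?, ?)",
    [("1.1.1.1", "/a"), ("1.1.1.1", "/b"),  # 2 hits -> not a bounce
     ("2.2.2.2", "/c"),                     # 1 hit  -> bounce
     ("3.3.3.3", "/d")],                    # 1 hit  -> bounce
)

# Inner query: hits per IP.  Outer query: IPs with exactly one hit.
jumper = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT COUNT(iplog) AS times FROM hmbbs_table"
    "  GROUP BY iplog HAVING COUNT(iplog) = 1) e"
).fetchone()[0]
print(jumper)  # 2
```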
At this point all four tables above hold their results:
hive> show tables;
OK
hmbbs_ip
hmbbs_jumper
hmbbs_pv
hmbbs_register
hmbbs_table
Time taken: ... seconds
hive> select * from hmbbs_ip;
OK
...
Time taken: ... seconds
hive> select * from hmbbs_jumper;
OK
...
Time taken: ... seconds
hive> select * from hmbbs_pv;
OK
...
Time taken: ... seconds
hive> select * from hmbbs_register;
OK
...
Time taken: ... seconds
Next, join the tables:
Table join, step 1 (skeleton of the four-way join):
select * from hmbbs_pv
join hmbbs_register on ...
join hmbbs_ip on ...
join hmbbs_jumper on ...
Table join, step 2 (each table holds exactly one row, so the join condition can simply be 1=1):
select * from hmbbs_pv
join hmbbs_register on 1=1
join hmbbs_ip on 1=1
join hmbbs_jumper on 1=1
Table join, step 3 (give each table an alias: hmbbs_pv a, hmbbs_register b, hmbbs_ip c, hmbbs_jumper d):
select * from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
Table join, step 4 (pick the wanted column from each table):
select a.pv, b.register, c.ip, d.jumper
from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
Table join, step 5 (add a date constant, giving five columns in total):
select '2013_05_30', a.pv, b.register, c.ip, d.jumper
from hmbbs_pv a
join hmbbs_register b on 1=1
join hmbbs_ip c on 1=1
join hmbbs_jumper d on 1=1
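The final join can be sketched end to end with sqlite3. Each result table holds exactly one row (the metric values below are made up), so `ON 1=1` simply glues the four single rows into one wide row, prefixed with the date constant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One single-row table per metric, as in the post; values are invented.
for name, col, val in [("hmbbs_pv", "pv", 100),
                       ("hmbbs_register", "register", 5),
                       ("hmbbs_ip", "ip", 30),
                       ("hmbbs_jumper", "jumper", 10)]:
    conn.execute(f"CREATE TABLE {name} ({col} INTEGER)")
    conn.execute(f"INSERT INTO {name} VALUES (?)", (val,))

# With one row per table, ON 1=1 yields exactly one combined row.
row = conn.execute(
    "SELECT '2013_05_30', a.pv, b.register, c.ip, d.jumper "
    "FROM hmbbs_pv a "
    "JOIN hmbbs_register b ON 1=1 "
    "JOIN hmbbs_ip c ON 1=1 "
    "JOIN hmbbs_jumper d ON 1=1"
).fetchone()
print(row)  # ('2013_05_30', 100, 5, 30, 10)
```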
Questions and corrections are welcome!