Hive: Interview Questions

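Q1: What is the default location of Maven's offline (local) repository?

The default location is the .m2 directory under the current user's home directory (Maven keeps the repository itself in ~/.m2/repository). To set up the offline repository with the HBase and Hadoop dependencies needed here:

1. Create Maven's default repository directory, .m2, in the current user's home directory:

$ mkdir -p ~/.m2/

2. Extract the offline repository archive to the default location:

$ tar -zxf /opt/softwares/hbase+hadoop_repository.tar.gz -C ~/.m2/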

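Q2: What is Hive's main purpose?

Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-style queries over them.

In effect it acts as a client for MapReduce: queries are translated into MapReduce jobs.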

Q3: Which properties are set when configuring hive-env.sh? (Describe them.)

1. The JDK path: JAVA_HOME=/opt/modules/jdk1.7.0_67

2. The Hadoop installation path: HADOOP_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6/

3. The Hive configuration directory: export HIVE_CONF_DIR=/opt/modules/hive-0.13.1-cdh5.3.6/conf

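Q4: Which properties were changed in hive-site.xml? Give each property name and explain it.

1. javax.jdo.option.ConnectionURL: the JDBC URL used to connect to the metastore database.

2. javax.jdo.option.ConnectionDriverName: the JDBC driver class name.

3. javax.jdo.option.ConnectionUserName: the database account.

4. javax.jdo.option.ConnectionPassword: the database password.

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<description>password to use against metastore database</description>
</property>

The last two properties also need a <value> element holding your MySQL account name and password. Because the MySQL account and password are configured in Hive's settings, Hive connects to MySQL directly.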

Q5: When setting up MySQL, what are the commands to start the MySQL service on CentOS 6 and on CentOS 7?

CentOS 7: systemctl start mysqld.service

CentOS 6: service mysql start (use service mysql stop to stop it)

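Q6: How do you change the root password in MySQL?

1. Without logging in to MySQL first (here the old password is 123456 and the new one is 123):

mysqladmin -uroot -p123456 password 123

2. After logging in to MySQL.

Format:

mysql> set password for username@localhost = password('new_password');

Example:

mysql> set password for root@localhost = password('123');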

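Q7: How do you grant privileges to a user and host in MySQL? What does each parameter mean? Which command makes the grant take effect afterwards?

Log in to MySQL first in either case.

Method 1: grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;

Method 2: grant all on *.* to root@'hadoop102' identified by '123456';

Parameters: all [privileges] grants every privilege; *.* means all databases and all tables; 'root'@'%' is the user name and the host it may connect from (% matches any host); identified by sets that account's password; with grant option lets the user grant privileges to others.

Afterwards, run flush privileges; so that the grant takes effect.

(Note: if this grant is not executed, other users or hosts cannot connect to MySQL.)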

Q8: How do you set the directory where Hive writes its logs?

Edit conf/hive-log4j.properties under the Hive installation directory (create the directory first with mkdir logs):

hive.log.dir=/opt/modules/hive-0.13.1-cdh5.3.6/logs

Q9: What are the ways to start Hive?

1. bin/hive (the command-line client)

2. bin/hiveserver2 (the HiveServer2 service)

Q10: What does HiveServer2 do? You may draw a diagram to show its role.

HiveServer2 lets multiple hosts connect to it through beeline; those clients then reach the Hive data warehouse through HiveServer2.

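Q11: How do you connect to HiveServer2? Give the exact commands.

1. Start the service: bin/hiveserver2

2. Start the client: bin/beeline

3. Inside beeline, connect: !connect jdbc:hive2://hadoop102:10000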

Q12: What is the syntax for creating a Hive table with the columns id, name, and sex?

create table student(id int, name string, sex string)
row format delimited fields terminated by '\t';

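Q13: What are the two important Hive command-line options?

1. hive -e '<HQL>': run the given HQL statement from the command line.

2. hive -f <file>.hql: run the commands in a Hive script file.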

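Q14: How does a script pass parameters into an HQL file in Hive, and how does the HQL reference them?

Pass each value on the command line with --hivevar, for example hive --hivevar dt=2016-10-10 -f script.hql, and reference it inside the HQL as ${hivevar:dt}. (The names dt and script.hql are illustrative.)

Reference: https://mp.csdn.net/postedit/83180899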

Q15: How do you copy a table's structure in Hive (without copying its data)?

create table a like b;

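Q16: What are the four ways to load (append) data into Hive? Give the basic syntax.

1. Load from the local file system: load data local inpath '/home/1.txt' [overwrite] into table student;

2. Load from HDFS: load data inpath '/user/hive/warehouse/1.txt' [overwrite] into table student;

3. Create a table from a query: create table student1 as select * from student; (the query can also select specific columns or rows)

4. Insert query results: insert into table staff select * from track_log; (to replace the existing data instead, use insert overwrite table staff select * from track_log;)

With the optional overwrite keyword, load data replaces the table's existing data instead of appending to it.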

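Q17: How many ways are there to export data from Hive, and how is each done?

1. Export with insert overwrite.

To the local file system (the target directory is created recursively):

insert overwrite local directory '/home/robot/1/2' row format delimited fields terminated by '\t'
select * from staff;

To HDFS:

insert overwrite directory '/user/hive/1/2' row format delimited fields terminated by '\t'
select * from staff;

2. Export with shell redirection (> overwrites the file, >> appends to it).

For example: $ bin/hive -e "select * from staff;" > /home/z/backup.log

3. Export Hive data to an external system with Sqoop.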

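Q18: What are the characteristics of Hive's sorting clauses?

1. order by: global sort. All rows pass through a single reducer, so the whole result is ordered.

2. sort by: not a global sort. Rows are sorted within each reducer only.

3. distribute by: hash-partitions rows across reducers by the given column. It is often used together with sort by, to partition and then sort. The number of reducers must be set with mapreduce.job.reduces.

4. cluster by: when distribute by and sort by use the same column, the pair is equivalent to cluster by, which can be seen as a special case of distribute by + sort by.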
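The difference between these clauses is easier to see with a small simulation. The following is only a sketch in plain Python, not Hive code: the function names (stable_hash, order_by, distribute_sort, cluster_by) are hypothetical, and the CRC32 hash is an assumed stand-in for whatever hash the real partitioner uses.

```python
import zlib
from collections import defaultdict

def stable_hash(value):
    # Deterministic stand-in for the partitioner's hash (an assumption, not Hive's function).
    return zlib.crc32(str(value).encode("utf-8"))

def order_by(rows, sort_key):
    # order by: every row reaches one reducer, so the result is globally sorted.
    return sorted(rows, key=sort_key)

def distribute_sort(rows, dist_key, sort_key, num_reducers):
    # distribute by: rows with the same key always land on the same reducer.
    reducers = defaultdict(list)
    for row in rows:
        reducers[stable_hash(dist_key(row)) % num_reducers].append(row)
    # sort by: each reducer sorts only its own rows.
    return [sorted(reducers[i], key=sort_key) for i in range(num_reducers)]

def cluster_by(rows, key, num_reducers):
    # cluster by: distribute by and sort by on the same column.
    return distribute_sort(rows, key, key, num_reducers)

rows = [("A", 5), ("B", 25), ("A", 15), ("B", 5), ("C", 8)]
for i, part in enumerate(distribute_sort(rows, lambda r: r[0], lambda r: r[1], 2)):
    print("reducer", i, part)
```

Each printed list is sorted on its own, but concatenating the lists does not generally give a globally sorted result, which is the practical difference between sort by and order by.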

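Q19: How does Sqoop import data, and how does it export data?

Import: from relational databases such as MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, and HBase.

Export: from the Hadoop file system into a relational database.

1. Import MySQL data into Hive (--hive-import is what sends the data to Hive rather than only to HDFS):

bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 123456 \
--table staff \
--fields-terminated-by '\t' \
--hive-import \
-m 1

2. Export data from an HDFS directory (for example, the files behind a Hive table) into MySQL with Sqoop:

bin/sqoop export \
--connect jdbc:mysql://hadoop102/test \
--username root \
--password 123456 \
--table employee \
--export-dir /user/hadoop/emp/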

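Q20: How do you work with partitioned data in Hive?

1. A partition column is filtered in the where clause like an ordinary column:

insert into table staff
select * from staff1 where country = 'china';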

2. An HQL query exercise: monthly and cumulative totals per user.

create table t_access_times(username string, month string, salary int)
row format delimited fields terminated by ','; -- row format delimited sets the column delimiter the table expects when data is loaded

load data local inpath '/home/hadoop/t_access_times.dat' into table t_access_times;

Contents of t_access_times.dat:

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5

Step 1: compute each user's monthly total.

select username, month, sum(salary) as salary from t_access_times group by username, month;

+-----------+----------+---------+
| username  | month    | salary  |
+-----------+----------+---------+
| A         | 2015-01  | 33      |
| A         | 2015-02  | 10      |
| B         | 2015-01  | 30      |
| B         | 2015-02  | 15      |
+-----------+----------+---------+

Step 2: join the monthly totals table with itself on username.

+-------------+----------+-----------+-------------+----------+-----------+
| a.username  | a.month  | a.salary  | b.username  | b.month  | b.salary  |
+-------------+----------+-----------+-------------+----------+-----------+
| A           | 2015-01  | 33        | A           | 2015-01  | 33        |
| A           | 2015-01  | 33        | A           | 2015-02  | 10        |
| A           | 2015-02  | 10        | A           | 2015-01  | 33        |
| A           | 2015-02  | 10        | A           | 2015-02  | 10        |
| B           | 2015-01  | 30        | B           | 2015-01  | 30        |
| B           | 2015-01  | 30        | B           | 2015-02  | 15        |
| B           | 2015-02  | 15        | B           | 2015-01  | 30        |
| B           | 2015-02  | 15        | B           | 2015-02  | 15        |
+-------------+----------+-----------+-------------+----------+-----------+

Step 3: run a grouped query over the result of the previous step.

Group by a.username and a.month.

To get the running monthly total, sum b.salary over all rows where b.month <= a.month:

select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate
from
(select username, month, sum(salary) as salary from t_access_times group by username, month) A
inner join
(select username, month, sum(salary) as salary from t_access_times group by username, month) B
on A.username = B.username
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month;
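The logic of this self-join can be checked without a Hive cluster. The sketch below runs the same query against the sample rows using Python's built-in sqlite3 module as a stand-in for Hive. SQLite is an assumption made for convenience, and the helper name cumulative_totals is hypothetical.

```python
import sqlite3

# The ten sample rows from t_access_times.dat.
ROWS = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5),
]

QUERY = """
select A.username, A.month, max(A.salary) as salary, sum(B.salary) as accumulate
from (select username, month, sum(salary) as salary
      from t_access_times group by username, month) A
inner join
     (select username, month, sum(salary) as salary
      from t_access_times group by username, month) B
on A.username = B.username
where B.month <= A.month
group by A.username, A.month
order by A.username, A.month
"""

def cumulative_totals(rows):
    # Load the rows into an in-memory table and run the self-join query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t_access_times (username text, month text, salary integer)")
    conn.executemany("insert into t_access_times values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in cumulative_totals(ROWS):
    print(row)
```

For user A this yields a monthly total of 33 and then 10, with running totals of 33 and 43. For user B the monthly totals are 30 and 15, with running totals of 30 and 45.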

Reference (Hive sorting clauses, Q18): https://blog.csdn.net/weixin_38750084/article/details/83033525

Write the statement that loads the file test.txt into the '2016-10-10' partition of the Hive table test, whose partition column is l_date:

LOAD DATA LOCAL INPATH '/your/path/test.txt' OVERWRITE INTO TABLE test PARTITION (l_date='2016-10-10');

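Q26: How would you implement the following statement in Hive?

SELECT a.key, a.value
FROM a
WHERE a.key NOT IN (SELECT b.key FROM b)

Answer: "not exists" cannot follow a column the way "not in" does. The usual Hive rewrite is a left outer join that keeps only the rows of a with no match in b:

select a.key, a.value from a left outer join b on a.key = b.key where b.key is null;

Where the Hive version supports subqueries in the where clause, a correlated form also works:

select a.key, a.value from a where not exists (select 1 from b where b.key = a.key);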
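To see that a left outer join with an "is null" filter selects the same rows as NOT IN, here is a small sketch. It uses Python's sqlite3 module as a stand-in for Hive, which is an assumption, and the sample rows and the helper name anti_join_demo are made up for illustration.

```python
import sqlite3

def anti_join_demo():
    # Build two tiny tables: keys 1 and 3 exist only in a.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table a (key integer, value text);
        create table b (key integer);
        insert into a values (1, 'one'), (2, 'two'), (3, 'three');
        insert into b values (2), (4);
    """)
    # Original form: NOT IN with a subquery.
    not_in = conn.execute(
        "select a.key, a.value from a "
        "where a.key not in (select b.key from b) order by a.key").fetchall()
    # Rewritten form: left outer join, keep rows with no match in b.
    left_join = conn.execute(
        "select a.key, a.value from a left outer join b on a.key = b.key "
        "where b.key is null order by a.key").fetchall()
    conn.close()
    return not_in, left_join

not_in_rows, left_join_rows = anti_join_demo()
print(not_in_rows == left_join_rows, left_join_rows)
```

The two forms agree only while b.key contains no null. If it does, standard SQL makes NOT IN return no rows at all, whereas the join form still returns the unmatched rows of a.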

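Q27: Describe how the Hive functions split, coalesce, and collect_list are used (examples welcome).

split turns a string into an array:

split('a,b,c,d' , ',') ==> ["a","b","c","d"]

COALESCE(T v1, T v2, ...) returns the first non-null argument. If all arguments are NULL, it returns NULL.

collect_list gathers all values of a column into an array without removing duplicates: select collect_list(id) from table_name;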
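For readers who think more easily in a general-purpose language, the behaviour of the three functions can be approximated in plain Python. This is only an analogy under stated assumptions: the function names mirror the Hive ones, but the code is illustrative and is not how Hive implements them.

```python
import re
from collections import defaultdict

def split(text, pattern):
    # Like Hive's split(str, regex): break a string into a list around a regex.
    return re.split(pattern, text)

def coalesce(*values):
    # Return the first argument that is not None, or None if all are None.
    return next((v for v in values if v is not None), None)

def collect_list(rows, group_key, value_key):
    # Group rows and gather one column's values per group, keeping duplicates.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return dict(groups)

students = [
    {"sex": "F", "id": 1},
    {"sex": "M", "id": 2},
    {"sex": "F", "id": 1},
]
print(split("a,b,c,d", ","))
print(coalesce(None, None, "x"))
print(collect_list(students, "sex", "id"))
```

The duplicate id 1 is kept in the "F" group, which is the point of collect_list.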

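Q28: Briefly describe null in a database, say how Hive stores null underneath, and explain what the statement select a.* from t1 a left outer join t2 b on a.id=b.id where b.id is null; does.

Any operation between null and another value yields null. Use is null and is not null to test whether a value is null.

By default Hive stores null as '\N' in the underlying files. This can be changed with alter table test SET SERDEPROPERTIES('serialization.null.format' = 'a');

The statement returns the rows of t1 whose id has no matching id in t2: the left outer join keeps every row of t1, and where b.id is null keeps only those that found no match.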
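The effect of the null marker can be pictured with a tiny parser. The sketch below is not Hive's SerDe code; parse_row is a hypothetical helper that only shows how a configurable marker such as '\N' in a delimited text line would be read back as a null value.

```python
def parse_row(line, delimiter="\t", null_format="\\N"):
    # Split one delimited text line and map the null marker to None.
    return [None if field == null_format else field
            for field in line.rstrip("\n").split(delimiter)]

# Default marker: a backslash followed by N.
print(parse_row("1\t\\N\tbob\n"))
# Custom marker 'a', as in the alter table example above.
print(parse_row("1,a,x", delimiter=",", null_format="a"))
```

Changing the marker changes which stored strings are read back as null, so a marker that can also occur as real data (such as 'a') is risky.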

Original reference: https://blog.csdn.net/qq_26442553/article/details/78725690
