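Hive Data Types + Hive SQL
Primitive types
Integer types
tinyint (byte), smallint (short), int, bigint (long)
Floating-point types
float, double
Boolean
boolean
String types
string, char (fixed length), varchar (variable length)
Date/time types
timestamp, date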
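Reference/complex types
Complex types behave somewhat like containers, which makes structured data easier to work with
Complex types can be nested inside one another
Array
Holds elements of a single type
Elements are looked up by index; indexes start at 0
user[0]
Map
A set of key-value pairs; a value is accessed through its key
Keys must be unique; entries with the same key overwrite one another
map['first']
Struct (the same idea as a struct in C; Go has structs as well)
Defines the fields of an object; the set of fields is fixed
Values are accessed by field name
user.uname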
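To try the three access forms above without creating a table, the following sketch builds one value of each complex type inline. The aliases (t, fav, addr, usr) and the literal values are made up for illustration, and it assumes a Hive version that accepts a SELECT without a FROM clause (0.13 or later).

```sql
-- Build one array, one map and one struct inline, then read from each.
-- fav[0] uses an index, addr['first'] uses a key, usr.uname uses a field name.
SELECT
  t.fav[0]        AS first_fav,
  t.addr['first'] AS first_addr,
  t.usr.uname     AS uname
FROM (
  SELECT
    array('game', 'book')                    AS fav,
    map('first', 'auckland')                 AS addr,
    named_struct('uname', 'luna', 'age', 18) AS usr
) t;
```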
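Type conversion
Implicit
Any integer type can be implicitly converted to a wider integer type
All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
TINYINT, SMALLINT, and INT can all be converted to FLOAT.
BOOLEAN cannot be implicitly converted to any other type.
Explicit
CAST('1' AS INT)
When designing a table, give each column an appropriate data type
to avoid unnecessary trouble later on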
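A short sketch of both kinds of conversion. The literals are arbitrary, and it again assumes a Hive version that accepts a SELECT without a FROM clause.

```sql
-- Explicit conversion: the string is parsed as an integer before the addition.
SELECT CAST('1' AS INT) + 1;
-- A string that cannot be parsed as a number yields NULL instead of an error.
SELECT CAST('abc' AS INT);
-- Implicit conversion: the TINYINT operand is widened to match the BIGINT one.
SELECT CAST(1 AS TINYINT) + CAST(2 AS BIGINT);
```

Because a failed CAST returns NULL silently, it can be worth checking for unexpected NULLs after casting a string column.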
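DDL operations: databases
Follow naming conventions when naming databases, tables, columns, and so on
DDL statements define database objects (create, alter, drop)
Every HiveQL statement must end with a semicolon (;)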
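Databases
Create a database
Each database, and each table created in it, gets its own directory in HDFS
create database ronnie;
create database if not exists ronnie;
Create a database and specify where it is stored
create database ronnie location '/ronnie/ronnie_test';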
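Drop a database
drop database db_name;
drop database if exists db_name;
If the database is not empty, drop it together with its tables (cascade)
drop database if exists db_name cascade;
Alter a database
Only the database properties can be changed; the other metadata cannot be altered:
The database name
The directory where the database is stored.
alter database ronnie set dbproperties('createtime'='20170830'); [set a database property]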
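One way to confirm that the property was stored is to describe the database with the EXTENDED keyword, since the plain form does not list properties. A minimal sketch, reusing the ronnie database from above:

```sql
ALTER DATABASE ronnie SET DBPROPERTIES ('createtime' = '20170830');
-- EXTENDED adds the database properties to the usual name/location output.
DESCRIBE DATABASE EXTENDED ronnie;
```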
List databases
show databases;
hive> show databases;
OK
default
ronnie
Time taken: 0.228 seconds, Fetched: 2 row(s)
hive>
show databases like 'r*'; [wildcard match]
hive> show databases like'r*';
OK
ronnie
Time taken: 0.01 seconds, Fetched: 1 row(s)
hive>
Show database information
desc database ronnie;
Switch to a database
use ronnie;
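DDL operations: tables
A table is a mapping onto data, so a table is designed around the data it describes
Create a table
Never type tab characters in a CREATE TABLE statement; they garble the statement in the CLI
Create the data file and upload it to the Linux host
Create the ronnieInfo table; a directory named after the table is created inside the database directory
Load the data into the table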
ronnieInfo.txt
1,luna,00000
2,slark,11111
3,sven,22222
4,anit_mage,33333
create table ronnieInfo(
id int,
uname string,
password string
)
row format delimited fields terminated by ',' lines terminated by '\n';
load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnieInfo;
select * from ronnieInfo;
select id from ronnieInfo where id = 2;
Command-line output:
hive> select * from ronnieInfo;
OK
1	luna	00000
2	slark	11111
3	sven	22222
4	anit_mage	33333
Time taken: 0.322 seconds, Fetched: 4 row(s)
hive> select id from ronnieInfo where id = 2;
OK
2
Time taken: 0.151 seconds, Fetched: 1 row(s)
Key syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
(col_name data_type [COMMENT col_comment], ...)
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
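CREATE
Keyword that starts the statement
[EXTERNAL]
Table type: managed (internal) or external
TABLE
The kind of object being created
[IF NOT EXISTS]
Create the table only if it does not already exist
table_name
Table name; must follow the naming rules
(col_name data_type [COMMENT col_comment], ...)
Column definitions (col_name1 data_type1, col_name2 data_type2)
Columns are separated by commas; there is no comma after the last column
[COMMENT table_comment]
Table comment
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
Creates a partitioned table
[CLUSTERED BY (col_name, col_name, ...)
Bucketing: the columns rows are bucketed by
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
Bucketing: the sort order inside each bucket and the number of buckets
[ROW FORMAT row_format]
How each row is split into fields
[STORED AS file_format]
The file format the data is stored in
[LOCATION hdfs_path]
Location of the data files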
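To see most of these clauses together, here is a sketch of one statement that uses them. The table page_view_demo, its columns, and the bucket count are hypothetical and exist only to illustrate the clause order.

```sql
CREATE TABLE IF NOT EXISTS page_view_demo (
  user_id  BIGINT COMMENT 'visitor id',
  page_url STRING COMMENT 'visited page'
)
COMMENT 'hypothetical table used only to illustrate the clauses'
PARTITIONED BY (dt STRING COMMENT 'visit date')
CLUSTERED BY (user_id) SORTED BY (user_id ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

The partition column dt is declared only in PARTITIONED BY, not in the column list, while the bucketing column user_id must already appear in the column list.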
Alter a table
Renaming a table also renames its directory
ALTER TABLE ronnieInfo RENAME TO ronnie_info;
Change a column
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment][FIRST|AFTER column_name];
Add or replace columns
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...);
Show the table structure
desc table_name;
Drop a table
DROP TABLE [IF EXISTS] table_name;
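Applied to the ronnie_info table from above, the templates might be filled in as follows. The new column names (pwd, email) are made up for illustration.

```sql
-- Rename the password column to pwd; the type has to be restated.
ALTER TABLE ronnie_info CHANGE COLUMN password pwd STRING COMMENT 'login password';
-- Append a new column after the existing ones.
ALTER TABLE ronnie_info ADD COLUMNS (email STRING COMMENT 'hypothetical extra column');
-- Check the result.
DESC ronnie_info;
```

These statements change only the table metadata. The data files are not rewritten, so rows loaded earlier return NULL for the added column.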
Example (contents of personInfo.txt):
1,alex,18,game-exercise-book,stu_addr:auckland-work_addr:wellington
2,john,26,shop-lib-learn,stu_addr:queensland-work_addr:sydney
3,paul,20,cook-eat,stu_addr:brisbane-work_addr:gold_coast
create table personInfo(
id int,
name string,
age int,
fav array<string>,
addr struct<stu_addr:string,work_addr:string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
load data local inpath '/root/personInfo.txt' overwrite into table personInfo;
select * from personInfo;
Query output:
hive> select * from personInfo;
OK
1	alex	18	["game","exercise","book"]	{"stu_addr":"stu_addr:auckland","work_addr":"work_addr:wellington"}
2	john	26	["shop","lib","learn"]	{"stu_addr":"stu_addr:queensland","work_addr":"work_addr:sydney"}
3	paul	20	["cook","eat"]	{"stu_addr":"stu_addr:brisbane","work_addr":"work_addr:gold_coast"}
Time taken: 0.058 seconds, Fetched: 3 row(s)
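In the output above, each addr field still carries its "stu_addr:" or "work_addr:" prefix. A struct fills its fields by position, so the ':' in the data is never treated as a separator. If the goal is to look addresses up by key, one option is to declare addr as a map instead. The sketch below assumes the same personInfo.txt file; the table name personInfoMap is made up for illustration.

```sql
create table personInfoMap(
id int,
name string,
age int,
fav array<string>,
addr map<string,string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
load data local inpath '/root/personInfo.txt' into table personInfoMap;
-- read the first hobby by index and one address by its key
select name, fav[0], addr['stu_addr'] from personInfoMap;
```

With this definition the "map keys terminated by ':'" clause comes into play, and for the first row the query should return alex, game, and auckland.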
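Loading data: load
Once data has been loaded it cannot be modified
The data is stored in HDFS, and HDFS does not support modifying data in place
Syntax
load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
load data: fixed keywords
[local]: with local, the file is read from the local file system; without it, the file is read from HDFS
inpath '/opt/module/datas/student.txt': path of the data to load
overwrite: the newly loaded data replaces the data already in the table
into table student: the table to load into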
Linux
load data local inpath '/root/personInfo.txt' into table personInfo;
load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnie_info;
HDFS
load data inpath '/ronnie/hive/personInfo.txt' into table personInfo;
load data inpath '/ronnie/hive/ronnieInfo.txt' overwrite into table ronnie_info;
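Summary:
Wherever the data file comes from, for a managed table it ends up in the table's directory: a local file is copied there, an HDFS file is moved there
With an append load (no overwrite), a query reads every data file in the directory
When a data file is deleted from the table directory, its rows no longer appear in query results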
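The optional partition clause in the syntax above is not used in the examples, so here is a sketch of how it might look. The table student_part, its columns, and the partition value are hypothetical, and the file is assumed to hold comma-separated id,name rows.

```sql
create table student_part(
id int,
name string
)
partitioned by (dt string)
row format delimited fields terminated by ',';
-- the file lands in the dt=2019-09-24 subdirectory of the table directory
load data local inpath '/opt/module/datas/student.txt'
overwrite into table student_part partition (dt='2019-09-24');
show partitions student_part;
```

Here overwrite replaces only the files of the named partition, not the whole table.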
Loading data: insert
Insert the result of a query on t1 into t2
1,admin
2,zs
3,ls
4,ww
create table t1(
id string,
name string
)
row format delimited fields terminated by ','
lines terminated by '\n';
load data local inpath '/root/t1.txt' into table t1;
create table t2(
name string
);
-- launches a MapReduce job
insert overwrite table t2 select name from t1;
MapReduce output:
hive> insert overwrite table t2 select name from t1;
Query ID = root_20190924045312_e3340ec4-55ad-4250-80c0-bf5f958eb4ab
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1569214475993_0001, Tracking URL = http://node03:8088/proxy/application_1569214475993_0001/
Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job -kill job_1569214475993_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-09-24 04:53:20,136 Stage-1 map = 0%, reduce = 0%
2019-09-24 04:53:27,335 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
MapReduce Total cumulative CPU time: 960 msec
Ended Job = job_1569214475993_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-53-12_193_1698682512625223581-1/-ext-10000
Loading data to table ronnie.t2
Table ronnie.t2 stats: [numFiles=1, numRows=4, totalSize=15, rawDataSize=11]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.96 sec HDFS Read: 3008 HDFS Write: 80 SUCCESS
Total MapReduce CPU Time Spent: 960 msec
OK
Time taken: 16.388 seconds
Write the results of a single query into several tables
-- builds on the data above
create table t3(
id string
);
-- launches a MapReduce job
from t1
INSERT OVERWRITE TABLE t2 SELECT name
INSERT OVERWRITE TABLE t3 SELECT id ;
MapReduce output:
Query ID = root_20190924045620_5582ef76-bbdc-4b60-b9e1-ba9e63b65865
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1569214475993_0002, Tracking URL = http://node03:8088/proxy/application_1569214475993_0002/
Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job -kill job_1569214475993_0002
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2019-09-24 04:56:27,406 Stage-2 map = 0%, reduce = 0%
2019-09-24 04:56:33,559 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.07 sec
MapReduce Total cumulative CPU time: 1 seconds 70 msec
Ended Job = job_1569214475993_0002
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10000
Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t3/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10002
Loading data to table ronnie.t2
Loading data to table ronnie.t3
Table ronnie.t2 stats: [numFiles=1, numRows=0, totalSize=15, rawDataSize=0]
Table ronnie.t3 stats: [numFiles=1, numRows=0, totalSize=8, rawDataSize=0]
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Cumulative CPU: 1.07 sec HDFS Read: 3981 HDFS Write: 153 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 70 msec
OK
Time taken: 14.425 seconds
Insert rows with a standard SQL VALUES clause (each tuple is one row, in column order id, name)
insert into t1 values ('5','yyz');
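The examples above use both insert overwrite and insert into. The following sketch, reusing t1 and t2 from above, shows the difference between the two.

```sql
-- replace whatever t2 currently holds with the names from t1
insert overwrite table t2 select name from t1;
-- append the same names again, so every name is now stored twice
insert into table t2 select name from t1;
-- the count should be twice the number of rows in t1
select count(*) from t2;
```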
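Managed (internal) and external tables
Managed table
Typically used for data you own exclusively, which keeps others from deleting it by mistake
Dropping the table deletes the data files as well
Managed tables are not suited to sharing data with other tools.
External table
Can share its data with other tables
Dropping the table does not delete the data files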
create EXTERNAL table ronnie_ex(
id int,
name string,
age int,
fav array<string>,
addr struct<stu_addr:string,work_addr:string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';
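-- load a local file into the external table; the file is copied into the table directory
load data local inpath '/root/ronnie_ex.txt' into table ronnie_ex;
-- load an HDFS file into the external table; the file is still moved into the table directory
load data inpath '/ex/ronnie_ex.txt' into table ronnie_ex;
To share data, point the external table's location directly at the directory that holds the data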
create EXTERNAL table ronnie_ex_location(
id int,
name string,
age int,
fav array<string>,
addr struct<stu_addr:string,work_addr:string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n'
location '/ronnie/ex';
Switching between managed and external tables
-- managed -> external
alter table personInfo set tblproperties('EXTERNAL'='TRUE');
-- external -> managed
alter table personInfo set tblproperties('EXTERNAL'='FALSE');
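To check which type a table currently has, its formatted description can be inspected. A minimal sketch using the personInfo table from above:

```sql
alter table personInfo set tblproperties('EXTERNAL'='TRUE');
-- the Table Type row of the output should now read EXTERNAL_TABLE
desc formatted personInfo;
```

Checking the type before a DROP TABLE is a cheap safeguard, because dropping a managed table deletes its data files as well.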
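Table location
Set where the table's data is stored
Creating a table does not clear the directory: files already there become the table's data. Because the table below is managed, dropping it deletes the directory and everything in it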
create table ronnieUserPath111(
id int,
name string,
age int,
fav array<string>,
addr struct<stu_addr:string,work_addr:string>
)
row format delimited fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n'
location '/ronnie/ex';
Exporting data
Export query results to the local file system
insert overwrite local directory '/root/t11' select * from t1;
Export query results to the local file system with an explicit field delimiter
insert overwrite local directory '/root/t12'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from t1;
Export query results to HDFS (no local keyword)
insert overwrite directory '/ronnie/t13' select * from t1;
Export and import data with export/import
export table t1 to '/ronnie/hive/t1';
import from '/ronnie/hive/t1';
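A plain import from recreates the table under its original name, which can clash with a t1 that still exists. One way around this is to import under a different name. The sketch below assumes the export directory from above; the table name t1_copy is made up for illustration.

```sql
export table t1 to '/ronnie/hive/t1';
-- import under a new name so the existing t1 is left untouched
import table t1_copy from '/ronnie/hive/t1';
select * from t1_copy;
```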