Changing column types in Hive SQL: Hive data types + Hive SQL

Hive Data Types + Hive SQL

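Primitive types

Integers

int, tinyint (byte), smallint (short), bigint (long)

Floating point

float, double

Boolean

boolean

Strings

string, char (fixed length), varchar (variable length)

Date and time

timestamp, date

Reference / complex types

They behave somewhat like containers, which makes the data easier to work with.

Complex types can be nested inside one another.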

Array

Holds elements of the same type.

Elements are looked up by index; indexes start at 0.

user[0]

Map

A set of key-value pairs; a value is accessed through its key.

Keys must be unique; a repeated key overwrites the earlier entry.

map['first']

Struct (like a struct in C; Go has them too)

Defines the attributes of an object; a struct's set of fields is fixed.

Values are read by field name.

user.uname

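Type conversion

Implicit

Any integer type can be implicitly converted to a wider integer type.

All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.

TINYINT, SMALLINT, and INT can all be converted to FLOAT.

BOOLEAN cannot be converted to any other type.

Explicit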

CAST('1' AS INT)

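When designing a table, choose the most appropriate data type for each column up front.

This avoids unnecessary trouble in later operations.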
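
The explicit conversion above can be tried directly in the Hive shell. The statements below are a minimal sketch using literals only, so no table is needed; they assume a Hive release that allows SELECT without a FROM clause.

```sql
-- Explicit conversion with CAST on literal values.
SELECT CAST('1' AS INT) + 1;        -- the string '1' becomes an integer before the addition
SELECT CAST('abc' AS INT);          -- not a number: Hive yields NULL instead of raising an error
SELECT CAST(3.9 AS INT);            -- the fractional part is dropped
SELECT CAST('2019-09-24' AS DATE);  -- a yyyy-MM-dd string converted to DATE
```

Because a failed cast produces NULL rather than an error, it is worth checking for unexpected NULLs after converting a string column.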

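DDL operations: databases

Follow naming conventions for databases, tables, columns, and so on.

DDL statements define database objects (create, alter, drop).

Every Hive SQL statement must end with a semicolon (;).

Databases

Create a database

Each database you create gets its own directory in HDFS.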

create database ronnie;

create database if not exists ronnie;

Create a database and specify where it is stored

create database ronnie location '/ronnie/ronnie_test';

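Drop a database

drop database db_name;

drop database if exists db_name;

If the database is not empty, drop it together with its tables (cascade)

drop database if exists db_name cascade;

Alter database properties

Only the key-value pairs in dbproperties can be changed. The rest of a database's metadata cannot be altered (at least in the Hive release used here), including:

The database name

The directory where the database is stored.

alter database ronnie set dbproperties('createtime'='20170830'); [set a database property]

Show databases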

show databases;

hive> show databases;

default

ronnie

Time taken: 0.228 seconds, Fetched: 2 row(s)

hive>

show databases like 'r*'; [pattern matching]

hive> show databases like'r*';

ronnie

Time taken: 0.01 seconds, Fetched: 1 row(s)

hive>

View database details

desc database ronnie;
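
The plain desc output covers the basics such as name and location. To also see the key-value pairs set through dbproperties (for example the createtime property set earlier), the EXTENDED form can be used; a minimal sketch:

```sql
-- Adds the dbproperties key-value pairs to the output of desc database.
DESCRIBE DATABASE EXTENDED ronnie;
```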

Switch to a database

use ronnie;

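DDL operations: tables

How tables are created: a table is a mapping onto the data, so the table is designed around the data.

Create a table

Never type Tab characters when writing a create table statement in the CLI; the statement comes out garbled (most likely because the shell treats Tab as auto-completion).

Create the data file and upload it to the Linux host

Create the ronnieInfo table; this creates a directory named after the table inside the database's directory

Load the data into the table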

ronnieInfo.txt

1,luna,00000

2,slark,11111

3,sven,22222

4,anit_mage,33333

create table ronnieInfo(

id int,

uname string,

password string

)

row format delimited fields terminated by ',' lines terminated by '\n';

load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnieInfo;

select * from ronnieInfo;

select id from ronnieInfo where id = 2;

Command-line output:

hive> select * from ronnieInfo;

1       luna    00000

2       slark   11111

3       sven    22222

4       anit_mage       33333

Time taken: 0.322 seconds, Fetched: 4 row(s)

hive> select id from ronnieInfo where id = 2;

2

Time taken: 0.151 seconds, Fetched: 1 row(s)

Key syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

(col_name data_type [COMMENT col_comment], ...)

[COMMENT table_comment]

[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

[CLUSTERED BY (col_name, col_name, ...)

[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

[ROW FORMAT row_format]

[STORED AS file_format]

[LOCATION hdfs_path]

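CREATE

Keyword; creates the table

[EXTERNAL]

Table type: omit it for an internal (managed) table, include it for an external table

TABLE

The kind of object being created

[IF NOT EXISTS]

Create the table only if it does not already exist

table_name

Table name; must follow the naming conventions

(col_name data_type [COMMENT col_comment], ...)

Column definitions (col1 type1, col2 type2)

Columns are separated by commas; no comma after the last column

[COMMENT table_comment]

Table comment

[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]

Creates a partitioned table

[CLUSTERED BY (col_name, col_name, ...)

Bucketing: the columns that rows are hashed on

[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]

Bucketing: the optional sort order within each bucket, and the number of buckets

[ROW FORMAT row_format]

How each row of data is split into fields

[STORED AS file_format]

Storage format of the data

[LOCATION hdfs_path]

Location of the data files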
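
To show how the clauses listed above fit together, here is a sketch of one statement that uses most of them. The table name access_log, its columns, and the location '/tmp/access_log_example' are hypothetical and are not part of the walkthrough.

```sql
-- Hypothetical table, used only to illustrate the optional clauses in order.
CREATE EXTERNAL TABLE IF NOT EXISTS access_log (
  user_id BIGINT COMMENT 'user identifier',
  url     STRING COMMENT 'requested URL'
)
COMMENT 'illustrative table'
PARTITIONED BY (dt STRING COMMENT 'date, e.g. 2019-09-24')        -- one directory per dt value
CLUSTERED BY (user_id) SORTED BY (user_id ASC) INTO 4 BUCKETS     -- hash rows on user_id into 4 buckets
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/tmp/access_log_example';
```

Note that the partition column dt is declared only in PARTITIONED BY, not in the column list.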

Alter a table

Renaming a table also renames its directory.

ALTER TABLE ronnieInfo RENAME TO ronnie_info;

Change a column (name, type, comment, position)

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment][FIRST|AFTER column_name];
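
Changing a column's type uses the same CHANGE syntax, with the old and new names set to the same value. The sketch below uses a hypothetical scratch table demo_change so that it does not disturb the tables in this walkthrough.

```sql
-- Hypothetical scratch table for the example.
CREATE TABLE demo_change (id INT, uname STRING, password STRING);

-- Widen id from INT to BIGINT; CHANGE always takes the old name and the new name,
-- so the name is simply repeated when only the type changes.
ALTER TABLE demo_change CHANGE COLUMN id id BIGINT;

-- Rename password to pwd, keep the type, and attach a comment.
ALTER TABLE demo_change CHANGE COLUMN password pwd STRING COMMENT 'login password';

-- Confirm the new definition.
DESC demo_change;
```

These statements change only the table metadata; the data files are not rewritten, so the new type has to be able to read the values already stored. Depending on the Hive version and the hive.metastore.disallow.incompatible.col.type.changes setting, an incompatible change (for example STRING to INT) may be rejected.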

Add or replace columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...);
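
As a sketch of the difference between the two forms: ADD COLUMNS appends to the existing column list, while REPLACE COLUMNS redefines the whole list. The scratch table demo_cols is hypothetical.

```sql
-- Hypothetical scratch table for the example.
CREATE TABLE demo_cols (id INT, uname STRING);

-- Append a new column after the existing ones.
ALTER TABLE demo_cols ADD COLUMNS (email STRING COMMENT 'contact email');

-- Redefine the full column list; email is no longer part of the table definition.
ALTER TABLE demo_cols REPLACE COLUMNS (id INT, uname STRING);

DESC demo_cols;
```

Both are metadata-only operations, so dropping a column from the definition this way does not remove its values from the underlying files.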

View the table structure

desc table_name;

Drop a table

DROP TABLE [IF EXISTS] table_name;

Example (personInfo.txt):

1,alex,18,game-exercise-book,stu_addr:auckland-work_addr:wellington

2,john,26,shop-lib-learn,stu_addr:queensland-work_addr:sydney

3,paul,20,cook-eat,stu_addr:brisbane-work_addr:gold_coast

create table personInfo(

id int,

name string,

age int,

fav array<string>,

addr struct<stu_addr:string,work_addr:string>

)

row format delimited fields terminated by ','

collection items terminated by '-'

map keys terminated by ':'

lines terminated by '\n';

load data local inpath '/root/personInfo.txt' overwrite into table personInfo;

select * from personInfo;

Output:

hive> select * from personInfo;

1       alex    18      ["game","exercise","book"]      {"stu_addr":"stu_addr:auckland","work_addr":"work_addr:wellington"}

2       john    26      ["shop","lib","learn"]  {"stu_addr":"stu_addr:queensland","work_addr":"work_addr:sydney"}

3       paul    20      ["cook","eat"]  {"stu_addr":"stu_addr:brisbane","work_addr":"work_addr:gold_coast"}

Time taken: 0.058 seconds, Fetched: 3 row(s)
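
The complex columns can be queried with the access syntax from the data types section. In the output above, each struct value still carries its key prefix (for example "stu_addr:auckland"), because struct fields are split only on the collection delimiter '-' and the ':' delimiter is not applied to them. The second half of this sketch declares addr as a map instead, where ':' does act as the key/value separator; the table name personInfo_map is hypothetical.

```sql
-- Element access on the personInfo table defined above.
SELECT name,
       fav[0]        AS first_fav,   -- array index starts at 0
       size(fav)     AS fav_count,   -- number of elements in the array
       addr.stu_addr AS stu_addr     -- struct field read by name
FROM personInfo;

-- Hypothetical variant: addr declared as a map, so 'key:value' pairs are split on ':'.
CREATE TABLE personInfo_map (
  id   INT,
  name STRING,
  age  INT,
  fav  ARRAY<STRING>,
  addr MAP<STRING,STRING>
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n';

LOAD DATA LOCAL INPATH '/root/personInfo.txt' INTO TABLE personInfo_map;

-- Map value read by key.
SELECT name, addr['stu_addr'] FROM personInfo_map;
```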

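Loading data: load

Once data has been loaded it cannot be modified in place.

The data is stored on HDFS, and HDFS does not support modifying file contents.

Syntax

load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1, ...)];

load data: fixed keywords

[local]: with local, the file is read from the local filesystem; without it, the file is read from HDFS

inpath '/opt/module/datas/student.txt': path of the data to load

overwrite: the newly loaded data replaces the existing data; without it, the data is appended

into table student: the table to load into

From the local Linux filesystem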

load data local inpath '/root/personInfo.txt' into table personInfo;

load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnie_info;

HDFS

load data inpath '/ronnie/hive/personInfo.txt' into table personInfo;

load data inpath '/ronnie/hive/ronnieInfo.txt' overwrite into table ronnie_info;
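
The optional partition clause from the syntax line is not exercised by the examples above. A minimal sketch follows, assuming a hypothetical partitioned table student_part and assuming that student.txt holds comma-separated id,name rows.

```sql
-- Hypothetical partitioned table; dt is the partition column and is not a field in the file.
CREATE TABLE student_part (id INT, name STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load one file into one partition; every row of the file gets dt = '2019-09-24'.
LOAD DATA LOCAL INPATH '/opt/module/datas/student.txt'
OVERWRITE INTO TABLE student_part PARTITION (dt='2019-09-24');

-- List the partitions, then read back only that partition.
SHOW PARTITIONS student_part;
SELECT * FROM student_part WHERE dt = '2019-09-24';
```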

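Summary:

For an internal table the data files always end up in the table's directory, wherever they started: a local file is copied there, while a file already on HDFS is moved there.

With an appending load (into without overwrite), queries read every data file in that directory.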

When a data file is deleted from the table directory, its rows no longer appear in query results.

Loading data: insert

Insert rows queried from table t1 into table t2 (t1.txt shown below)

1,admin

2,zs

3,ls

4,ww

create table t1(

id string,

name string

)

row format delimited fields terminated by ','

lines terminated by '\n';

load data local inpath '/root/t1.txt' into table t1;

create table t2(

name string

);

-- this launches a MapReduce job

insert overwrite table t2 select name from t1;

MapReduce output:

hive> insert overwrite table t2 select name from t1;

Query ID = root_20190924045312_e3340ec4-55ad-4250-80c0-bf5f958eb4ab

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1569214475993_0001, Tracking URL = http://node03:8088/proxy/application_1569214475993_0001/

Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job -kill job_1569214475993_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2019-09-24 04:53:20,136 Stage-1 map = 0%, reduce = 0%

2019-09-24 04:53:27,335 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec

MapReduce Total cumulative CPU time: 960 msec

Ended Job = job_1569214475993_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-53-12_193_1698682512625223581-1/-ext-10000

Loading data to table ronnie.t2

Table ronnie.t2 stats: [numFiles=1, numRows=4, totalSize=15, rawDataSize=11]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 0.96 sec HDFS Read: 3008 HDFS Write: 80 SUCCESS

Total MapReduce CPU Time Spent: 960 msec

Time taken: 16.388 seconds

Write the results of a single query into multiple tables

-- builds on the data above

create table t3(

id string

);

-- this launches a MapReduce job

from t1

INSERT OVERWRITE TABLE t2 SELECT name

INSERT OVERWRITE TABLE t3 SELECT id ;

MapReduce output:

Query ID = root_20190924045620_5582ef76-bbdc-4b60-b9e1-ba9e63b65865

Total jobs = 5

Launching Job 1 out of 5

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1569214475993_0002, Tracking URL = http://node03:8088/proxy/application_1569214475993_0002/

Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job -kill job_1569214475993_0002

Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0

2019-09-24 04:56:27,406 Stage-2 map = 0%, reduce = 0%

2019-09-24 04:56:33,559 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.07 sec

MapReduce Total cumulative CPU time: 1 seconds 70 msec

Ended Job = job_1569214475993_0002

Stage-5 is selected by condition resolver.

Stage-4 is filtered out by condition resolver.

Stage-6 is filtered out by condition resolver.

Stage-11 is selected by condition resolver.

Stage-10 is filtered out by condition resolver.

Stage-12 is filtered out by condition resolver.

Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10000

Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t3/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10002

Loading data to table ronnie.t2

Loading data to table ronnie.t3

Table ronnie.t2 stats: [numFiles=1, numRows=0, totalSize=15, rawDataSize=0]

Table ronnie.t3 stats: [numFiles=1, numRows=0, totalSize=8, rawDataSize=0]

MapReduce Jobs Launched:

Stage-Stage-2: Map: 1 Cumulative CPU: 1.07 sec HDFS Read: 3981 HDFS Write: 153 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 70 msec

Time taken: 14.425 seconds
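
Each branch of a multi-table insert can carry its own filter and its own write mode, while t1 is still scanned only once. A sketch using the same three tables; the filter values are arbitrary.

```sql
-- Same tables as above: t1(id string, name string), t2(name string), t3(id string).
FROM t1
INSERT OVERWRITE TABLE t2 SELECT name WHERE id <= '2'   -- replaces t2 with the names whose id compares at or below '2'
INSERT INTO TABLE t3 SELECT id WHERE id > '2';          -- appends the remaining ids to t3
```

Since id is declared as string, the comparisons here are string comparisons.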

Insert rows the standard SQL way (insert ... values); the values follow the column order of t1 (id, name)

insert into t1 values ('5','yyz');

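Internal (managed) and external tables

Internal tables

Typically used for data that is exclusively yours, which guards against accidental deletion by others

Dropping the table also deletes its data files

Internal tables are not suited to sharing data with other tools.

External tables

Can share data with other tables

Dropping the table does not delete the data files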

create EXTERNAL table ronnie_ex(

id int,

name string,

age int,

fav array<string>,

addr struct<stu_addr:string,work_addr:string>

)

row format delimited fields terminated by ','

collection items terminated by '-'

map keys terminated by ':'

lines terminated by '\n';

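-- load a local file into the external table; the file is saved in the table directory

load data local inpath '/root/ronnie_ex.txt' into table ronnie_ex;

-- load a file from HDFS into the external table; the file is still moved into the table directory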

load data inpath '/ex/ronnie_ex.txt' into table ronnie_ex;

To share the data, point the external table's location directly at the directory that holds it

create EXTERNAL table ronnie_ex_location(

id int,

name string,

age int,

fav array<string>,

addr struct<stu_addr:string,work_addr:string>

)

row format delimited fields terminated by ','

collection items terminated by '-'

map keys terminated by ':'

lines terminated by '\n'

location '/ronnie/ex';
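
A quick way to see the point about external tables is to drop the table and then list its location. This is a sketch, assuming ronnie_ex_location was created as above; dfs is the filesystem command built into the Hive shell.

```sql
-- Dropping an external table removes only its metadata.
DROP TABLE ronnie_ex_location;

-- The files under the location should still be listed afterwards.
dfs -ls /ronnie/ex;
```

Running the create statement again attaches a table to the same files.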

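Switching between internal and external tables (the first statement converts internal to external, the second converts back)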

alter table personInfo set tblproperties('EXTERNAL'='TRUE');

alter table personInfo set tblproperties('EXTERNAL'='FALSE');
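
To confirm which kind of table you currently have, the table type can be read from the formatted description. A sketch using the personInfo table from above:

```sql
ALTER TABLE personInfo SET TBLPROPERTIES('EXTERNAL'='TRUE');
DESCRIBE FORMATTED personInfo;   -- look for the line "Table Type: EXTERNAL_TABLE"

ALTER TABLE personInfo SET TBLPROPERTIES('EXTERNAL'='FALSE');
DESCRIBE FORMATTED personInfo;   -- "Table Type:" should read MANAGED_TABLE again
```

Some Hive versions appear to be case-sensitive about the property value, so it is safest to keep 'TRUE' and 'FALSE' in upper case as written here.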

Table location

Set where a table's data is stored

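Reportedly, creating the table first clears any data already in that directory; verify this on your Hive version before pointing a managed table at a directory that holds data you need.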

create table ronnieUserPath111(

id int,

name string,

age int,

fav array<string>,

addr struct<stu_addr:string,work_addr:string>

)

row format delimited fields terminated by ','

collection items terminated by '-'

map keys terminated by ':'

lines terminated by '\n'

location '/ronnie/ex';

Exporting data

Export query results to the local filesystem

insert overwrite local directory '/root/t11' select * from t1;

Export query results to the local filesystem with a chosen field delimiter

insert overwrite local directory '/root/t12'

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from t1;

Export query results to HDFS (omit the local keyword)

insert overwrite directory '/ronnie/t13' select * from t1;
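
The delimiter clause used for the local export can be combined with an HDFS target as well. A sketch, assuming a reasonably recent Hive release; the directory '/ronnie/t14' is hypothetical.

```sql
-- Write tab-separated results to an HDFS directory (no local keyword).
INSERT OVERWRITE DIRECTORY '/ronnie/t14'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT * FROM t1;

-- List the files that were written, using the Hive shell's dfs command.
dfs -ls /ronnie/t14;
```

Because this is an overwrite, anything already in the target directory is replaced, so it should not point at a directory that holds other data.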

Export and import data with export/import

export table t1 to '/ronnie/hive/t1';

import from '/ronnie/hive/t1';
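
import can also restore the exported data under a different table name, which avoids clashing with a t1 that still exists in the current database. A sketch; the export path '/ronnie/hive/t1_export' and the table name t1_copy are hypothetical.

```sql
-- The export target should be a new or empty HDFS directory; it receives both metadata and data.
EXPORT TABLE t1 TO '/ronnie/hive/t1_export';

-- Import under a new table name.
IMPORT TABLE t1_copy FROM '/ronnie/hive/t1_export';

SELECT * FROM t1_copy;
```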