Hive 3.0.0: creating an ORC table to support DELETE, plus SQL statement tuning

2023-06-24 19:20:57

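Note: from what I had read, Hive 3.0 and later support ACID, but in practice DELETE did not work on my table. To save time, I directly changed the managed (internal) table from the textfile storage format to an ORC table stored as orcfile. After testing, DELETE worked, and performance roughly doubled.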

Also: the test environment had a meager 8 GB of memory, and the execution engine was yarn.
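
For context, Hive's DELETE only works on transactional (ACID) tables, and full ACID tables must be managed tables stored as ORC, which is consistent with DELETE failing on the textfile table. A minimal sketch of the session settings usually needed for ACID DML follows. The property names are Hive's own, but whether the cluster in this post needed them set by hand (rather than in hive-site.xml) is an assumption.

```sql
-- Assumed prerequisites for ACID DML; not taken from the original post.
-- Enable the lock manager support that transactions rely on.
set hive.support.concurrency=true;
-- Use the transaction manager that implements ACID semantics.
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
```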

# Old approach: cst_bsc_inf_dplt full table, bucketed by customer ID
create table if not exists cst_bsc_inf_dplt(
  cst_id string,
  ip_id string,
  .......,
  rmrk_1 string)
  comment 'this is the customer_basic_information_copy view table'
  clustered by (cst_id) into 8 buckets
  row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' with serdeproperties ('field.delim'='|@|')
  stored as textfile tblproperties ('serialization.encoding'='utf-8');

# New approach: create cst_bsc_inf_dplt as an ORC table, bucketed by customer ID
create table if not exists cst_bsc_inf_dplt(
  cst_id string,
  ip_id string,
  .......,
  rmrk_1 string)
  comment 'this is the customer_basic_information_copy view table'
  clustered by (cst_id) into 8 buckets
  stored as orcfile tblproperties ('serialization.encoding'='utf-8','transactional'='true');
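
One way to check that the new table really is transactional, and to copy rows over from the old textfile table, is sketched below. The post does not say how the data was migrated, so `cst_bsc_inf_dplt_txt` is a hypothetical name for the old table after renaming it out of the way.

```sql
-- Inspect the table properties; 'transactional'='true' should be listed.
show tblproperties cst_bsc_inf_dplt;
describe formatted cst_bsc_inf_dplt;

-- Hypothetical migration step: cst_bsc_inf_dplt_txt is an assumed name for
-- the old textfile table, not one used in the original post.
insert into table cst_bsc_inf_dplt select * from cst_bsc_inf_dplt_txt;
```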



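=======Before tuning=======
(Old approach, textfile table: keep the full-table rows whose cst_id does not appear in the incremental table, union them with all incremental rows, then overwrite the whole table. Full data on the order of ten million rows, 5.6 GB, plus 2.5 MB of incremental data (about 1,100 rows): 7 minutes)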

SQL script:
use cst_lbl;
insert overwrite table cst_bsc_inf_dplt
select * from cst_bsc_inf_dplt_mid
union all
(select a.* from cst_bsc_inf_dplt a left join cst_bsc_inf_dplt_mid b on a.cst_id = b.cst_id where b.cst_id is null);
exit;

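=======After tuning=======
(New approach, ORC table: first select the cst_id values in the incremental table and delete every full-table row with a matching cst_id; then insert all rows of the incremental table into the full table. Full data on the order of ten million rows, 5.6 GB, plus 2.5 MB of incremental data (about 1,100 rows): 3 minutes)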

SQL script:
use cst_lbl;
delete from cst_bsc_inf_dplt where cst_id in (select cst_id from cst_bsc_inf_dplt_mid);
insert into cst_bsc_inf_dplt select * from cst_bsc_inf_dplt_mid;
exit;
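
Each DELETE and INSERT on an ACID table writes delta files, so read performance can drift if this job runs repeatedly. Below is a hedged sketch of checking and requesting compaction by hand, using the table name from the DDL above. Whether automatic compaction was enabled on the cluster in this post is not stated, so treat this as optional housekeeping rather than part of the original procedure.

```sql
use cst_lbl;
-- List queued, running and finished compactions known to the metastore.
show compactions;
-- Request a major compaction, which rewrites base and delta files into a new base.
alter table cst_bsc_inf_dplt compact 'major';
```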
