
Data Warehouse — 09_Installing and Configuring Hive (installing Hive on Linux; integrating Hive with Tez)

Contents

  • 1 Installing Hive 2.3
  • 2 Integrating Hive with the Tez Engine
    • 2.1 Installing Tez
    • 2.2 Integrating Tez
    • 2.3 Testing
    • 2.4 Caveats
      • 2.4.1 Inserts fail after integrating Tez
      • 2.4.2 Fix

Feel free to visit my personal tech blog: http://rukihuang.xyz/

The tutorial videos come from 尚矽谷; video link: 尚矽谷大資料項目資料倉庫,電商數倉V1.2新版. Respect!

1 Installing Hive 2.3

  1. Upload apache-hive-2.3.6-bin.tar.gz to the /opt/software directory, then extract it to /opt/module:

tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /opt/module/
  2. Rename apache-hive-2.3.6-bin to hive:

mv apache-hive-2.3.6-bin hive
  3. Copy MySQL's mysql-connector-java-5.1.27-bin.jar into /opt/module/hive/lib/:

cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
  4. Create a hive-site.xml file under /opt/module/hive/conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
	<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
	<description>JDBC connect string for a JDBC metastore</description>
	</property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    
	<!-- add this after installing Tez -->
	<property>
		<name>hive.execution.engine</name>
		<value>tez</value>
	</property>
</configuration>
           
  5. Start Hive (from /opt/module/hive):

bin/hive

2 Integrating Hive with the Tez Engine

  • Tez is an execution engine for Hive that outperforms MR.

    (figure comparing a chain of dependent MR jobs with a single Tez DAG omitted)

  • When Hive compiles straight to MR programs, suppose there are four MR jobs with dependencies between them. In the figure, green marks the ReduceTasks and the cloud shapes mark write barriers, where intermediate results must be persisted to HDFS.
  • Tez can fuse multiple dependent jobs into a single job, so HDFS is written only once and there are fewer intermediate stages, which greatly improves job performance.
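To put rough numbers on that claim (my own back-of-the-envelope sketch, not from the original course material): a chain of N dependent MR jobs hits HDFS once per job, while the fused Tez DAG persists only the final result.

```shell
# Back-of-the-envelope count of HDFS writes for the four-job example
# (illustrative arithmetic only; the job count is assumed).
n=4
mr_writes=$n        # each MR job writes its output to HDFS: N-1 intermediate + 1 final
tez_writes=1        # the fused DAG writes only the final result
echo "MR: $mr_writes HDFS writes, Tez: $tez_writes HDFS write"
# -> MR: 4 HDFS writes, Tez: 1 HDFS write
```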

2.1 Installing Tez

  1. Copy apache-tez-0.9.1-bin.tar.gz to the /opt/software directory on hadoop102.
  2. Upload apache-tez-0.9.1-bin.tar.gz to the /tez directory on HDFS (so all cluster nodes can share it):

hadoop fs -mkdir /tez

hadoop fs -put /opt/software/apache-tez-0.9.1-bin.tar.gz /tez
  3. Extract apache-tez-0.9.1-bin.tar.gz:

tar -zxvf apache-tez-0.9.1-bin.tar.gz -C /opt/module
  4. Rename the directory (in /opt/module):

mv apache-tez-0.9.1-bin/ tez-0.9.1

2.2 Integrating Tez

  1. Go to Hive's configuration directory: /opt/module/hive/conf
  2. Under /opt/module/hive/conf, create a tez-site.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
       <name>tez.lib.uris</name>
       <value>${fs.defaultFS}/tez/apache-tez-0.9.1-bin.tar.gz</value>
    </property>
    <property>
       <name>tez.use.cluster.hadoop-libs</name>
       <value>true</value>
    </property>
    <property>
       <name>tez.history.logging.service.class</name>
       <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
</configuration>
           
  3. Add the Tez environment variables and dependency-jar settings to the hive-env.sh file (first create it from the template):

mv hive-env.sh.template hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # your Tez extraction directory
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
	export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done

for jar in `ls $TEZ_HOME/lib`; do
	export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
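The two loops simply build a colon-prefixed, colon-separated list of every jar directly under TEZ_HOME and under TEZ_HOME/lib; that leading colon is why HIVE_AUX_JARS_PATH can concatenate the lzo jar and $TEZ_JARS with no separator in between. You can sanity-check the loop logic locally against a throwaway directory (the jar names below are placeholders, not the real Tez contents):

```shell
# Replay the hive-env.sh loops against a temporary fake Tez install.
TEZ_HOME=$(mktemp -d)
mkdir "$TEZ_HOME/lib"
touch "$TEZ_HOME/tez-api-0.9.1.jar" "$TEZ_HOME/lib/commons-io-2.4.jar"

TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

echo "$TEZ_JARS"
# prints ":<tmpdir>/tez-api-0.9.1.jar:<tmpdir>/lib/commons-io-2.4.jar"
```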
           
  4. Add the following to hive-site.xml to switch Hive's execution engine (already added in step 4 of section 1):
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
           

2.3 Testing

  1. Start Hive from the /opt/module/hive directory:

bin/hive
  2. Create a table:
create table student(
id int,
name string);
           
  3. Insert some data (this step failed for me; see section 2.4 for the fix).

    (screenshot of the insert statement omitted)

  4. Run a query; if nothing errors out, the integration succeeded.

    (screenshot of the query result omitted)

2.4 Caveats

2.4.1 Inserts fail after integrating Tez

  1. When running Tez, the process is killed by the NodeManager because it detects excessive memory use:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already
shutdown. Application application_1546781144082_0005 failed 2 times due to
AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page:
http://hadoop103:8088/cluster/app/application_1546781144082_0005
Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001]
is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB
physical memory used; 2.6 GB of 2.1 GB virtual memory used.
Killing container.
           
  2. This happens when a Container on a worker node tries to use more memory than allowed and is killed by the NodeManager:
[Excerpt] The NodeManager is killing your container. It sounds like
you are trying to use hadoop streaming which is running as a child
process of the map-reduce task. The NodeManager monitors the entire
process tree of the task and if it eats up more memory than the
maximum set in mapreduce.map.memory.mb or
mapreduce.reduce.memory.mb respectively, we would expect the
Nodemanager to kill the task, otherwise your task is stealing memory
belonging to other containers, which you don't want.
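The numbers in the log are consistent with YARN's defaults: the virtual-memory ceiling is the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1 in Hadoop 2.x. A sketch of the arithmetic (the 1 GB container size is taken from the log above):

```shell
# Where the "2.1 GB virtual memory" limit in the log comes from:
#   vmem limit = container physical memory * yarn.nodemanager.vmem-pmem-ratio
pmem_mb=1024                                  # container size from the log (1 GB)
ratio=2.1                                     # Hadoop 2.x default vmem-pmem ratio
vmem_mb=$(awk "BEGIN { printf \"%.1f\", $pmem_mb * $ratio }")
echo "vmem limit: $vmem_mb MB"                # ~2.1 GB; the 2.6 GB usage exceeds it
# -> vmem limit: 2150.4 MB
```

Besides disabling the check as shown in 2.4.2, raising this ratio or the container's memory allocation are common alternative fixes.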
           

2.4.2 Fix

  1. Turn off the virtual-memory check by modifying yarn-site.xml:
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
           
  2. After the change, be sure to distribute the file to every node and restart the Hadoop cluster (xsync is the course's custom distribution script):
xsync yarn-site.xml
           
