Hadoop Cluster (5): Installing Hive

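As a DBA of many years, I find Hive the most approachable member of the Hadoop family. SQL is familiar territory, and I no longer have to go through the pain of writing MapReduce jobs.

First, a brief introduction to Hive.

Hive is a data warehouse solution built on Hadoop. Because Hadoop itself scales well and is highly fault tolerant for both storage and computation, a data warehouse built with Hive inherits those properties.

Put simply, Hive adds a SQL layer on top of Hadoop: it translates SQL into MapReduce jobs that run on the cluster, so data developers and analysts can do statistics and analysis over very large datasets in SQL instead of writing MapReduce programs by hand.

Now for the installation. Hive requires that HDFS and YARN are already installed and running. For the HDFS setup, see the earlier posts in this series:

Hadoop Cluster (1): Setting Up ZooKeeper

Hadoop Cluster (2): Setting Up HDFS

Hadoop Cluster (3): Setting Up HBase

I used hive-1.2.1, which is no longer available for download. Download a newer version as needed from:

http://hive.apache.org/downloads.html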

tar -xzvf apache-hive-1.2.1-bin.tar.gz

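Edit the database-related settings in hive-site.xml. A production deployment needs many more parameters, such as the LZO compression and Kerberos settings mentioned later; the handful discussed here are only the minimum needed to get Hive running.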
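
The configuration listing itself did not survive in this copy of the post. As a rough sketch of what such a minimal set usually looks like, the snippet below writes a small hive-site.xml from a shell function. The property names are standard Hive settings. The JDBC URL, driver, user name and metastore URI are the values that appear in the logs and notes elsewhere in this post, while the function name write_hive_site, the password placeholder and the warehouse path are assumptions to replace with your own values.

```shell
# Sketch only: write a minimal hive-site.xml to the path given as $1.
# CHANGE_ME and /user/hive/warehouse are placeholders (assumptions), not values from the post.
write_hive_site() {
    out=$1
    cat > "$out" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- MySQL database holding the metastore schema; "&" must be written as "&amp;" in XML -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://10.24.101.88:3306/hive_beta?useUnicode=true&amp;characterEncoding=UTF-8&amp;createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>envision</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>CHANGE_ME</value>
  </property>
  <!-- Use the host name here rather than an IP address when Kerberos is enabled -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://aznbhivel01.liang.com:9083</value>
  </property>
  <!-- HDFS directory where managed tables are stored -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
}

# Usage (not executed here):
#   write_hive_site /usr/local/hadoop/apache-hive-1.2.1/conf/hive-site.xml
```

Keeping the file generation in a function makes it easy to write to a scratch location first and diff against the live configuration before replacing it.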

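Create the corresponding directories

Create the HDFS directories

Initialize Hive

The first start of Hive failed. Many of these errors come down to configuration or startup mistakes that a Hive novice does not recognize; for an experienced hand they are non-issues.

Cause: the Hive Metastore Server process had not been started properly.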

[hive@aznbhivel01 ~]$ hive

Logging initialized using configuration in file:/usr/local/hadoop/apache-hive-1.2.1/conf/hive-log4j.properties

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:528)

at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)

at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)

......

Caused by: java.lang.reflect.InvocationTargetException

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)

... 14 more

Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: ···

7. Fix: start the Hive Metastore Server process. Starting it ran into the next problem:

[hive@aznbhivel01 ~]$ Starting Hive Metastore Server

hiveorg.apache.thrift.transport.TTransportException: java.io.IOException: Login failure for hive/[email protected] from keytab /etc/security/keytab/hive.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server.<init>(HadoopThriftAuthBridge.java:358)

at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge.createServer(HadoopThriftAuthBridge.java:102)

at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5990)

at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5909)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: java.io.IOException: Login failure for hive/[email protected] from keytab /etc/security/keytab/hive.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user

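8. The keytab could not be found. Fix the permissions on the hive.keytab file.

9. Restart the metastore.

10. Then start HiveServer2:

hive --service hiveserver2 &

11. The start still failed, which was puzzling. The message itself was clear: the Kerberos KDC could not find this server. Yet kinit had already succeeded, and earlier lines in the log reported successful authentication.

Regenerating the keytab did not help either. Finally I wondered whether the cause was that hive-site.xml used an IP address. Changing it to the host name, "thrift://aznbhivel01.liang.com:9083", resolved the problem.

12. Next came a permission error, with no indication of which permission was wrong. Files written by Hive were already visible in HDFS, so those permissions should have been correct. More digging...

A Google search turned up the strace approach for finding out which permission is at fault: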

strace is your friend if you are on Linux. Try the following from the

shell in which you are starting hive...

strace -f -e trace=file service hive-server2 start 2>&1 | grep ermission

You should see the file it can't read/write.

It turned out that the permissions on the /tmp/hive-security path were wrong. After fixing them, this problem went away.

13. On to the next problem:

I searched Google for the keyword org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory

and found this article:

https://www.cnblogs.com/wyl9527/p/6835620.html

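After running the start command, the Hive service has to be restarted.

After installation:

you will see several new configuration files.

Modify the hiveserver2-site.xml file

I am not using Ranger security at the moment, so I decided to disable it. But how?

I simply deleted hiveserver2-site.xml. That got me one step further: HiveServer2 started successfully and I could get into the hive shell, only to hit the next error.

14. Hive now starts normally and I can run queries from the hive command line. The statements execute OK, but no query results come back.

A Baidu search turned up this suggested fix:

http://blog.csdn.net/wodedipang_/article/details/72720257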

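My configuration, however, did not match the situation described there. I suspected a permission problem on that directory.

In the end, hive.log showed an error indicating that a jar was missing.

15. The cause: Hadoop's core-site.xml configures lzo.LzoCodec as a compression codec, so the corresponding jar must be present for MapReduce jobs to run.

Copying the required jars over from another working environment solved it.

Note that the LZO jar is needed not only on the Hive server but on every YARN/MapReduce node. Otherwise any MapReduce job that touches LZO compression will fail, not just jobs launched by Hive.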
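
One way to check a node for the jar is a small helper like the one below. This is only a sketch: the function name find_lzo_jars is made up, and the hadoop-lzo*.jar file name pattern and the directory in the usage comment are assumptions based on the usual hadoop-lzo packaging and Hadoop layout, so adjust both to match your environment.

```shell
# Print every hadoop-lzo jar found under the directory given as $1.
# Returns non-zero when no matching jar exists.
# ASSUMPTION: the jar follows the usual hadoop-lzo*.jar naming.
find_lzo_jars() {
    found=$(find "$1" -type f -name 'hadoop-lzo*.jar' 2>/dev/null)
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Usage on each YARN/MapReduce node (not executed here; the path is an assumption):
#   find_lzo_jars /usr/local/hadoop/share/hadoop/common || echo "LZO jar missing on $(hostname)"
```

Running the same check on every node before submitting jobs catches the case where only the Hive server received the jar.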

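With that, the Hive installation is complete.

After climbing out of one pit after another, here is what Hive query output looks like:

Points to note

1. The MySQL character set is latin1. That works for installing Hive, but later, especially once Chinese text is stored, it will not display correctly. So I recommend switching the character set to UTF8 after Hive is installed.

2. Change the character set

3. After the change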

Ways to connect to Hive

a. Connect directly with the hive command. If Kerberos is enabled, authenticate with kinit first.

b. Connect with beeline

beeline -u "jdbc:hive2://hive-hostname:10000/default;principal=hive/[email protected]"

With a HiveServer2 HA setup, connect like this:

beeline -u "jdbc:hive2://zookeeper1-ip:2181,zookeeper2-ip:2181,zookeeper3-ip:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk;principal=hive/[email protected]"

Without Kerberos or other security authentication, beeline needs the login user to be specified:

beeline -u "jdbc:hive2://127.0.0.1:10000/default;" -n hive
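
Since the direct beeline URLs differ only in whether a principal is appended, a tiny helper can build them. The sketch below is illustrative: the function name hive_jdbc_url is made up, and the principal hive/_HOST@EXAMPLE.COM in the usage comment is a hypothetical placeholder, not the value used on this cluster.

```shell
# Build a HiveServer2 JDBC URL for beeline.
#   $1 host, $2 port, $3 database, $4 optional Kerberos principal
hive_jdbc_url() {
    url="jdbc:hive2://$1:$2/$3"
    if [ -n "${4:-}" ]; then
        # Kerberos-secured HiveServer2: append the server principal
        url="$url;principal=$4"
    fi
    printf '%s\n' "$url"
}

# Usage (not executed here):
#   beeline -u "$(hive_jdbc_url 127.0.0.1 10000 default)" -n hive
#   beeline -u "$(hive_jdbc_url hive-hostname 10000 default 'hive/_HOST@EXAMPLE.COM')"
```

The ZooKeeper-based HA URL has a different shape (a host list plus discovery parameters), so it is deliberately left out of this helper.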

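Separately: does Hive always run MapReduce when executing a query?

Starting with Hive 0.10.0, for efficiency, simple queries (a plain SELECT without count, sum, group by and the like) can skip MapReduce and read the HDFS files directly, filtering as they go. The benefit is that no MR job is launched, which is considerably faster. The downside is a less friendly experience for the user: with a large amount of data you may still wait a long time while nothing is returned.

This is easy to change. hive-site.xml has a parameter called

hive.fetch.task.conversion

Set it to more and simple queries (SELECT, filters, LIMIT) skip MapReduce. Set it to minimal and only SELECT *, filters on partition columns, and LIMIT are served directly, so other simple SELECTs still go through MapReduce. Newer versions also accept none, which disables the shortcut entirely.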

----Update 2018.2.11-----

To re-initialize Hive's MySQL database, first log in to MySQL and drop the existing database. Otherwise you will hit the error below.

# su - hive

[hive@aznbhivel01 ~]$ schematool -initSchema -dbType mysql

Metastore connection URL: jdbc:mysql://10.24.101.88:3306/hive_beta?useUnicode=true&characterEncoding=UTF-8&createDatabaseIfNotExist=true

Metastore Connection Driver : com.mysql.jdbc.Driver

Metastore connection User: envision

Starting metastore schema initialization to 1.2.0

Initialization script hive-schema-1.2.0.mysql.sql

Error: Specified key was too long; max key length is 3072 bytes (state=42000,code=1071)

org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!

schemaTool failed

After dropping the old hive database, the initialization succeeds right away:

Initialization script completed

schemaTool completed

Starting and stopping Hive:

1. Start the metastore

nohup /usr/local/hadoop/hive-release/bin/hive --service metastore --hiveconf hive.log4j.file=/usr/local/hadoop/hive-release/conf/meta-log4j.properties > /data1/hiveLogs-security/metastore.log 2>&1 &

2. Start HiveServer2

nohup /usr/local/hadoop/hive-release/bin/hive --service hiveserver2 > /data1/hiveLogs-security/hiveserver2.log 2>&1 &

3. Stop HiveServer2

kill -9 $(ps ax --cols 2000 | grep java | grep HiveServer2 | grep -v 'ps ax' | awk '{print $1;}')

4. Stop the metastore

kill -9 $(ps ax --cols 2000 | grep java | grep MetaStore | grep -v 'ps ax' | awk '{print $1;}')
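
The kill -9 one-liners above give the JVM no chance to shut down cleanly. A gentler variant, sketched here under the assumption that the process exits on a plain SIGTERM, sends SIGTERM first and falls back to SIGKILL only after a timeout. The function name stop_gracefully and the 30-second default are made up for this example.

```shell
# Stop a process politely: SIGTERM first, SIGKILL only if it is still alive
# after the timeout.
#   $1 pid, $2 optional timeout in seconds (default 30)
stop_gracefully() {
    pid=$1
    timeout=${2:-30}
    # If the signal cannot be delivered, the process is already gone
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Last resort, equivalent to the kill -9 used above
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (not executed here), reusing the pipeline from the post to find the PIDs:
#   for p in $(ps ax --cols 2000 | grep java | grep HiveServer2 | grep -v 'ps ax' | awk '{print $1;}'); do
#       stop_gracefully "$p" 30
#   done
```

Stop HiveServer2 before the metastore, so that sessions still open are not left talking to a metastore that has already gone away.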

Reposted from hsbxxl's 51CTO blog. Original link: http://blog.51cto.com/hsbxxl/2054028. To republish, please contact the original author.
