天天看点

sqoop导出orc数据至mysql_请问sqoop 如何把sqlserver 数据导入 hive,且用orc格式存储...

1.环境

java version "1.8.0_65"

hadoop 2.7.2

hive 2.0

sqoop 1.4.6

2.所需软件包

①.sqljdbc_3.0.1301.101_chs.tar  //下载地址:http://www.microsoft.com/en-us/d ... ang=en&id=21599

②.sqoop-sqlserver-1.0.tar   //改软件包微软已经不提供下载,CSDN有收费

3.软件安装

①.$tar -xvf ljdbc_3.0.1301.101_chs.tar  将解压后拷贝sqljdbc4.jar到${SQOOP_HOME}/lib目录下

②.解压qoop-sqlserver-1.0.tar ,配置环境变量MSSQL_CONNECTOR_HOME="/soft/sqoop-sqlserver-1.0",进入,执行 sh install.sh,下述为成功例子:

[email protected]:/soft/sqoop-sqlserver-1.0$ sh install.sh

Starting 'Microsoft SQL Server - Hadoop' Connector installation ...

SQOOP_HOME set to: /soft/sqoop

SQOOP_CONF_DIR set to: /soft/sqoop/conf

Installing 'SQL Server - Hadoop' Connector library ...

Installed library

Installing 'SQL Server - Hadoop' Connector configuration ...

Installed configuration

'SQL Server - Hadoop' Connector Installation completed successfully.

4.这时准备工作已经完成,已经可以从SQL server将数据load过来,需要注意的是 --connect ''

①.单个表import

sqoop import \

--connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold' \

--table da_jxc_daysum \

--fields-terminated-by "," \

--lines-terminated-by "\n" \

--target-dir /user/ubuntu/sql/da_jxc_daysum \

-m 1

②.将SQL server表导入到hive里

sqoop import  \

--connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold' \

--table da_jxc_daysum \

--target-dir /user/sql/da_jxc_daysum \

-m 1 \

--fields-terminated-by "," \

--lines-terminated-by "\n" \

--hive-import \

--hive-overwrite \

--create-hive-table \

--hive-table da_jxc_daysum \

--delete-target-dir

5.增量导入

①.--check-column (col)        用来作为判断的列名,如id

②.--incremental (mode)        append:追加,比如对大于last-value指定的值之后的记录进行追加导入。lastmodified:最后的修改时间,追加last-value指定的日期之后的记录

③.--last-value (value)        指定自从上次导入后列的最大值(大于该指定的值),也可以自己设定某一值

以下是个以时间为增量的例子:

sqoop job --create sqoop_jop1 -- import  --connect 'jdbc:sqlserver://192.168.88.125; username=sa;password=123;database=testgold'  --table da_jxc_daysum  --target-dir  /user/hive/warehouse/my01 --append   --check-column "modify_date"  --incremental lastmodified --last-value "2016-07-19 00:00:00.001" -m 1