天天看點

sqoop搭建與使用

寫在前面:

安裝sqoop的前提是已經具備​

​Java​

​​和​

​Hadoop、Zookeeper、MySQL​

​​的環境,如何往​

​Hive​

​​和​

​HBase​

​​導入資料,應具備相關​

​Hive、HBase​

​​環境。

(1)将sqoop-1.4.6-cdh5.14.2.tar.gz壓縮包放到/opt/software/目錄下

[root@nodefour ~]# cd /opt/software/
[root@nodefour software]# ll
總用量 1684240
-rw-r--r-- 1 root root 433895552 12月  8 14:59 hadoop-2.6.0-cdh5.14.2.tar.gz
-rw-r--r-- 1 root root 267038262 12月 17 09:10 hbase-1.2.0-cdh5.14.2.tar.gz
-rw-r--r-- 1 root root        91 12月 14 15:08 jd.txt
-rw-r--r-- 1 root root  30742669 12月 21 11:25 sqoop-1.4.6-cdh5.14.2.tar.gz
-rw-r--r-- 1 root root 992975720 12月 12 17:32 zeppelin-0.8.1-bin-all.tgz      

(2)解壓

[root@nodefour software]# tar -zxf sqoop-1.4.6-cdh5.14.2.tar.gz  -C /opt/install/      

(3)添加軟連接配接

[root@nodefour software]# ln -s /opt/install/sqoop-1.4.6-cdh5.14.2/ /opt/install/sqoop      

(4)配置環境變量

[root@nodefour software]# vi /etc/profile      

配置内容如下

export SQOOP_HOME=/opt/install/sqoop
export PATH=$SQOOP_HOME/bin:$PATH      

使配置生效

[root@nodefour software]# source /etc/profile      

(5)在conf目錄下修改配置檔案

[root@nodefour software]# cd /opt/install/sqoop
[root@nodefour sqoop]# cd conf/      

重命名配置檔案

[root@nodefour conf]# mv sqoop-env-template.sh sqoop-env.sh
[root@nodefour conf]# ll
總用量 28
-rw-rw-r-- 1 root root 3895 3月  28 2018 oraoop-site-template.xml
-rwxr-xr-x 1 root root 1345 3月  28 2018 sqoop-env.sh
-rw-rw-r-- 1 root root 1404 3月  28 2018 sqoop-env-template.cmd
-rw-rw-r-- 1 root root 6044 3月  28 2018 sqoop-site-template.xml
-rw-rw-r-- 1 root root 6044 3月  28 2018 sqoop-site.xml      

配置sqoop-env.sh檔案

[root@nodefour conf]# vi sqoop-env.sh      
export HADOOP_COMMON_HOME=/opt/install/hadoop
export HADOOP_MAPRED_HOME=/opt/install/hadoop
export HIVE_HOME=/opt/install/hive
export ZOOKEEPER_HOME=/opt/install/zookeeper
export ZOOCFGDIR=/opt/install/zookeeper
export HBASE_HOME=/opt/install/hbase      

(6)拷貝JDBC驅動

拷貝jdbc驅動到sqoop的lib目錄下

[hadoop@nodefour software] cp mysql-connector-java-5.1.44-bin.jar /opt/install/sqoop/lib/      

(7)通過以下指令可以檢視sqoop的安裝情況

[root@nodefour conf]# sqoop help
Warning: /opt/intall/hive does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/install/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/install/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/12/21 11:37:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.      

二、RDB到HDFS的資料遷移

(1)測試連接配接

使用下列指令測試連接配接mysql,記得把端口号和密碼修改成自己的

[root@nodefour conf]# sqoop list-databases --connect jdbc:mysql://192.168.202.204:3306/ --username root --password 123QWEasd!
Warning: /opt/intall/hive does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /opt/install/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/install/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/12/21 11:38:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
20/12/21 11:38:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/12/21 11:38:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test      

如上,出現資料庫資訊即為連接配接成功。

(2)資料庫建表

1.在mysql資料庫中建立資料庫retail_db

mysql -uroot -p123QWEasd!
create database retail_db;      

2.使用retail_db.sql檔案建表

mysql> use retail_db;
Database changed
mysql> source /opt/datas/retail_db.sql      

3.檢視建表情況

mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories          |
| customers           |
| departments         |
| order_items         |
| orders              |
| products            |
+---------------------+
6 rows in set (0.00 sec)      

(3)使用sqoop将mysql資料庫表customers上傳到hdfs

sqoop import \
--connect jdbc:mysql://192.168.202.204:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table customers \
--username root \
--password 123QWEasd! \
--target-dir /data/retail_db/customers \
--delete-target-dir \
-m 3      

在hdfs中檢視

可能報的錯誤

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
  at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
  at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:784)
  at org.apache.sqoop.mapreduce.JobBase.putSqoopOptionsToConfiguration(JobBase.java:392)
  at org.apache.sqoop.mapreduce.JobBase.createJob(JobBase.java:378)
  at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:256)
  at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  ... 14 more