
Installing Hadoop, Hive, Spark, and HBase with Docker

0: Network and host planning

docker network create --subnet=172.18.0.0/16 mynetwork

Host plan:

"172.18.0.30 master" 

"172.18.0.31 slave1" 

"172.18.0.32 slave2" 

1: Install the base environment

docker pull ubuntu:16.04

docker run -it  ubuntu:16.04 /bin/bash

Use apt-get to install the SSH service, MySQL, and OpenJDK 8.

Determine JAVA_HOME:

root@<container>:/# ls -lrt /usr/bin/java

lrwxrwxrwx 1 root root 22 Jun 23 08:28 /usr/bin/java -> /etc/alternatives/java

root@<container>:/# ls -lrt /etc/alternatives/java

lrwxrwxrwx 1 root root 46 Jun 23 08:28 /etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java


So JAVA_HOME is: /usr/lib/jvm/java-8-openjdk-amd64

2: Download the big-data packages

wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz

wget http://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz

wget http://archive.apache.org/dist/hbase/1.2.4/hbase-1.2.4-bin.tar.gz

wget  http://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz

wget http://downloads.lightbend.com/scala/2.12.1/scala-2.12.1.tgz

Extract everything into /opt/tools.
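The extraction step can be scripted; this assumes the five tarballs are sitting in the current directory:

```shell
# Unpack each tarball into /opt/tools; every archive creates its own
# versioned top-level directory (hadoop-2.7.2, hbase-1.2.4, ...).
mkdir -p /opt/tools
for f in hadoop-2.7.2.tar.gz apache-hive-2.1.0-bin.tar.gz \
         hbase-1.2.4-bin.tar.gz spark-2.1.0-bin-hadoop2.7.tgz scala-2.12.1.tgz; do
    tar -xzf "$f" -C /opt/tools
done
```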

Create five symlinks:

ln -s hbase-1.2.4 hbase

ln -s hadoop-2.7.2 hadoop

ln -s apache-hive-2.1.0-bin hive

ln -s spark-2.1.0-bin-hadoop2.7 spark

ln -s scala-2.12.1 scala

Edit the environment-variable file and append the following at the end:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_PREFIX=/opt/tools/hadoop

export HADOOP_COMMON_HOME=/opt/tools/hadoop

export HADOOP_HDFS_HOME=/opt/tools/hadoop

export HADOOP_MAPRED_HOME=/opt/tools/hadoop

export HADOOP_YARN_HOME=/opt/tools/hadoop

export HADOOP_CONF_DIR=/opt/tools/hadoop/etc/hadoop

export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

export SCALA_HOME=/opt/tools/scala

export PATH=${SCALA_HOME}/bin:$PATH

export SPARK_HOME=/opt/tools/spark

export PATH="$SPARK_HOME/bin:$PATH"

export HIVE_HOME=/opt/tools/hive

export PATH=$PATH:$HIVE_HOME/bin

export HBASE_HOME=/opt/tools/hbase

export PATH=$PATH:$HBASE_HOME/bin

/etc/init.d/ssh start

/etc/init.d/mysql start

echo "172.18.0.30 master" >> /etc/hosts

echo "172.18.0.31 slave1" >> /etc/hosts

echo "172.18.0.32 slave2" >> /etc/hosts  

3: Commit the base big-data image

docker commit 8eb631a1a734 wx-bigdata-base

4: Run the three containers

docker run -i -t --name master -h master --net mynetwork  --ip 172.18.0.30    wx-bigdata-base   /bin/bash

docker run -i -t --name slave1 -h slave1 --net mynetwork  --ip 172.18.0.31    wx-bigdata-base   /bin/bash

docker run -i -t --name slave2 -h slave2 --net mynetwork  --ip 172.18.0.32    wx-bigdata-base   /bin/bash

Set up passwordless SSH login between the three machines.

Method: run ssh-keygen -t rsa and press Enter through every prompt; this generates /root/.ssh/id_rsa.pub.

Do the same on all three machines.

Run cat /root/.ssh/id_rsa.pub on each of the three machines, concatenate the three outputs, and write the result to /root/.ssh/authorized_keys on every machine.

The final authorized_keys file on every machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAyxS5rhm5etpm1eOSdBfaVKmRQPMI2TgY3PsUMWe1qo1NQAdNkpObSVN4Gq4HHso7SXLccd5Crb64fGrqYX9+jBVk3uUSQoKn8eoFtmnBU5Zpq7mRvGkctsubMa/EOh7DsjUWplo//p9+txvB45cvjwr8GSeBVPoTSyzRggleuERVVhRzDSXdg/z892JNoHukhGUrhOhtBnVemIV0wUlEoWFiuLJmJBo6Gj1yV7xJ5LDtWJ41XgkosKlKbEp8bc+w0e6NYN5k/DzaDtwfVc6utGE/7/mFs4gpWGzY0wRqP89QRnmlOYGm32v1I8+oXNqAmxfPKiWQdZ89jgZUS5RB root@master

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbHbwIO5zzzNBJX25rbIdUI0+fqA3YJIhcgqbY2cQxSfa1dK20Uy/JD3ZTlffajEJ20qrs3yDpzfRHP8E+0dyPET3CV7I7onzCy8eBOQSaYBqtWXiEvwzE8iOD4aJJ4ZA3G8dhE8jlSFphO62PoqblEpIfWgFS1WkLEmNMrqgyEUCwiwzxySs6StBQF1vQ4TT2rcG5+qXWOuKjeOjscekstA2DrYNBY8zOEP/kNF4tUPf7mf2uiMJCHg+keXP9b0aCDMvVqakMx4PJW36NYISQiKf6yvSt1RFTGY+SYMG2d4Ysx58iNTrk7ber2qwDBghgtcJhr2VvZbLC9xv2w4WN root@slave1

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDg8FVvLhkeT1/xMA/fTbzk9k0cf+5AX514z9Pw8A78ofWDir65eMJBEqLTSX87ynTvtg2BEN4Ht+SlS7ZUrzW3wbUPZw9T045GbiFzSRdwzCAuyXUWAFa+pY3Pi4MJhL1zjwkfX8WzRlUM+a5PSJ+B3i/JnoKMUin0HmjQ1XxIwMeG66b7pxXRAs/9SVY7k+f0zACJzTBN3eD9tKEpujrJmjlOYLg4M17NssGNK9vE5nAkCCv86GCRixyS8FNAxh0a8GsezUjimT1XRWokw9FSZdDuAamVCREZ3j6LuveCx58XzoM8UQ6u4KtObeWOPbJCotxyKR5SdFEgsSjrOJYP root@slave2
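The key collection and distribution described above can be scripted from master. This is only a sketch: it assumes password-based root SSH between the containers still works at this point, which in a stock Ubuntu image requires root login to be enabled in sshd_config.

```shell
# Gather every node's public key into one authorized_keys file on master...
for h in master slave1 slave2; do
    ssh root@"$h" cat /root/.ssh/id_rsa.pub
done > /root/.ssh/authorized_keys

# ...then push the merged file back out to the slaves.
for h in slave1 slave2; do
    scp /root/.ssh/authorized_keys root@"$h":/root/.ssh/authorized_keys
done
chmod 600 /root/.ssh/authorized_keys
```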

5: Install and run

Edit the Hadoop, HBase, Spark, and Hive configuration files on master.

Use scp to copy the Hadoop, HBase, Spark, and Hive configuration files to the other two hosts.
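One loop covers all four config directories (paths follow the /opt/tools layout used throughout this guide):

```shell
# Copy the edited config files from master to both slaves.
for h in slave1 slave2; do
    scp -r /opt/tools/hadoop/etc/hadoop/* root@"$h":/opt/tools/hadoop/etc/hadoop/
    scp -r /opt/tools/hbase/conf/*        root@"$h":/opt/tools/hbase/conf/
    scp -r /opt/tools/spark/conf/*        root@"$h":/opt/tools/spark/conf/
    scp -r /opt/tools/hive/conf/*         root@"$h":/opt/tools/hive/conf/
done
```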

Format the HDFS filesystem (hdfs namenode -format).

Start Hadoop, HBase, and Spark from master, then check that the daemons on the other two machines started correctly.
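A quick way to check each node is jps, which lists running JVM daemons. Roughly, master should show NameNode, SecondaryNameNode, ResourceManager, HMaster, and the Spark Master, while the slaves show DataNode, NodeManager, HRegionServer, and Worker; the exact list depends on which services are up:

```shell
# jps ships with the JDK; call it via its full path in case it is not on PATH.
for h in master slave1 slave2; do
    echo "== $h =="
    ssh root@"$h" /usr/lib/jvm/java-8-openjdk-amd64/bin/jps
done
```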

Initialize Hive on master: schematool -dbType mysql -initSchema
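Before schematool can run, MySQL must be reachable with the root/root credentials from hive-site.xml, and the MySQL Connector/J jar must be on Hive's classpath. Neither step appears in the original notes, so the jar filename below is illustrative:

```shell
HIVE_LIB=/opt/tools/hive/lib

# Confirm MySQL is up and accepts the credentials used in hive-site.xml.
# The hive database itself is created on demand, thanks to
# createDatabaseIfNotExist=true in the JDBC URL.
mysql -uroot -proot -e "SELECT VERSION();"

# Drop the JDBC driver into Hive's lib dir (version number is illustrative).
cp mysql-connector-java-5.1.40-bin.jar "$HIVE_LIB"/

schematool -dbType mysql -initSchema
```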

To start Hadoop:

/opt/tools/hadoop/sbin/start-all.sh

To stop it:

/opt/tools/hadoop/sbin/stop-all.sh

To start HBase:

/opt/tools/hbase/bin/start-hbase.sh

To stop it:

/opt/tools/hbase/bin/stop-hbase.sh

To start Spark:

/opt/tools/spark/sbin/start-all.sh

To stop it:

/opt/tools/spark/sbin/stop-all.sh

6: Configuration files

6.1 hadoop

Directory: /opt/tools/hadoop/etc/hadoop

core-site.xml:

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>fs.defaultFS</name>

    <value>hdfs://master:9000</value>

  </property>

  <property>

    <name>hadoop.tmp.dir</name>

    <value>file:/data/hadoop/tmp</value>

  </property>

</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>dfs.replication</name>

    <value>1</value>

  </property>

  <property>

    <name>dfs.datanode.data.dir</name>

    <value>file:/data/hadoop/data</value>

  </property>

  <property>

    <name>dfs.namenode.name.dir</name>

    <value>file:/data/hadoop/name</value>

  </property>

  <property>

    <name>dfs.namenode.secondary.http-address</name>

    <value>master:9001</value>

  </property>

</configuration>
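hadoop.tmp.dir, dfs.datanode.data.dir, and dfs.namenode.name.dir above all point under /data/hadoop, so create those directories on every node before formatting HDFS:

```shell
# Local storage backing the HDFS config; run this on master and both slaves.
mkdir -p /data/hadoop/tmp /data/hadoop/data /data/hadoop/name
```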

mapred-site.xml:

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

</configuration>

yarn-site.xml:

<?xml version="1.0"?>

<!--

  Licensed under the Apache License, Version 2.0 (the "License");

  you may not use this file except in compliance with the License.

  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software

  distributed under the License is distributed on an "AS IS" BASIS,

  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the License for the specific language governing permissions and

  limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!-- Site specific YARN configuration properties -->

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

    <property>

        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

        <value>org.apache.hadoop.mapred.ShuffleHandler</value>

    </property>

    <property>

        <name>yarn.resourcemanager.address</name>

        <value>master:8032</value>

    </property>

    <property>

        <name>yarn.resourcemanager.scheduler.address</name>

        <value>master:8030</value>

    </property>

    <property>

        <name>yarn.resourcemanager.resource-tracker.address</name>

        <value>master:8031</value>

    </property>

    <property>

        <name>yarn.resourcemanager.admin.address</name>

        <value>master:8033</value>

    </property>

    <property>

        <name>yarn.resourcemanager.webapp.address</name>

        <value>master:8088</value>

    </property>

</configuration>

The slaves file:

slave1

slave2

hadoop-env.sh:

Change JAVA_HOME to: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
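The edit can be made non-interactively; the sed pattern assumes the stock hadoop-env.sh still carries its default `export JAVA_HOME=` line:

```shell
# Point Hadoop's env script at the OpenJDK 8 install.
HADOOP_ENV=/opt/tools/hadoop/etc/hadoop/hadoop-env.sh
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64|' "$HADOOP_ENV"
```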

6.2 hbase

Directory: /opt/tools/hbase/conf

hbase-env.sh:

Change JAVA_HOME: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

hbase-site.xml:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

      <property>

        <name>hbase.rootdir</name>

        <value>hdfs://master:9000/hbase_db</value>

      </property>

      <property>

        <name>hbase.cluster.distributed</name>

        <value>true</value>

      </property>

      <property>

        <name>hbase.zookeeper.quorum</name> 

        <value>master,slave1,slave2</value> 

      </property>

       <property>

          <name>hbase.zookeeper.property.dataDir</name>

          <value>/opt/tools/hbase/zookeeper</value>

       </property>

</configuration>

The regionservers file:

slave1

slave2

6.3 spark

File: /opt/tools/spark/conf/spark-env.sh

Add at the top of the file:

export SCALA_HOME=/opt/tools/scala

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export SPARK_MASTER_IP=master

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=/opt/tools/hadoop/etc/hadoop

The slaves file:

master

slave1

slave2

6.4 hive

Directory: /opt/tools/hive/conf

hive-env.sh:

Add at the top of the file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/opt/tools/hadoop

export HIVE_HOME=/opt/tools/hive

export HIVE_CONF_DIR=/opt/tools/hive/conf 

hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--

   Licensed to the Apache Software Foundation (ASF) under one or more

   contributor license agreements.  See the NOTICE file distributed with

   this work for additional information regarding copyright ownership.

   The ASF licenses this file to You under the Apache License, Version 2.0

   (the "License"); you may not use this file except in compliance with

   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software

   distributed under the License is distributed on an "AS IS" BASIS,

   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

   See the License for the specific language governing permissions and

   limitations under the License.

-->

<configuration>

        <property>

            <name>hive.exec.scratchdir</name>

            <value>/opt/tools/hive/tmp</value>

        </property>

        <property>

            <name>hive.metastore.warehouse.dir</name>

            <value>/opt/tools/hive/warehouse</value>

        </property>

        <property>

            <name>hive.querylog.location</name>

            <value>/opt/tools/hive/log</value>

        </property>

        <property>

            <name>javax.jdo.option.ConnectionURL</name>

            <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>

        </property>

        <property>

            <name>javax.jdo.option.ConnectionDriverName</name>

            <value>com.mysql.jdbc.Driver</value>

        </property>

        <property>

            <name>javax.jdo.option.ConnectionUserName</name>

            <value>root</value>

        </property>

        <property>

            <name>javax.jdo.option.ConnectionPassword</name>

            <value>root</value>

        </property>

</configuration>
