
Complete Ganglia Monitoring Deployment for Hadoop

Environment of the cluster where Ganglia is installed:

Linux version:

[root@cloud0 hadoop]# lsb_release -a

LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch

Distributor ID: CentOS

Description:    CentOS Linux release 7.3.1611 (Core)

Release:        7.3.1611

Codename:       Core

If the lsb_release command is not available:

[root@cloud0 hadoop]# lsb_release -a

bash: lsb_release: command not found...


Install it with:

[root@cloud0 hadoop]# yum install lsb


Hadoop version:

[root@cloud0 hadoop]# hadoop version

Hadoop 2.7.2

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41

Compiled by jenkins on 2016-01-26T00:08Z

Compiled with protoc 2.5.0

From source with checksum d0fda26633fa762bff87ec759ebe689c

This command was run using /usr/local/hadoop/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

The test cluster has two network interfaces, one internal and one external; make sure the external interface is up.

Bring an interface up with: ifup <interface name>

Make sure every machine can reach the Internet, for example as checked below.
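A quick connectivity check, run on every node, is to ping the EPEL mirror host used in the next step (a minimal sketch; any reachable external host works just as well):

ping -c 3 dl.fedoraproject.org    # expect 0% packet loss on every node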

(If anything here is explained incorrectly, feel free to discuss it in the comments. Thanks!)

Installation layout:

Server 1 (master): install gmond, gmetad, and the web front end

Server 2 (slave1): install gmond only

Server 3 (slave2): install gmond only

Server n (slaven): install gmond only

Cluster used in this experiment: cloud0 (master), cloud2, cloud3, cloud4 (slaves).

Installation steps:

First, install EPEL on every machine. EPEL is a yum repository containing many packages that the base repositories lack; without it the Ganglia packages cannot be found.

Option 1:

wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

rpm -vih epel-release-latest-7.noarch.rpm

Option 2:

rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Option 3:

yum install epel-release

Step 1: Enable the EPEL yum repository

Machine cloud0

[root@cloud0 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

[root@cloud0 hadoop]# rpm -vih epel-release-latest-7.noarch.rpm

Machine cloud2

[root@cloud2 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

[root@cloud2 hadoop]# rpm -vih epel-release-latest-7.noarch.rpm

Machine cloud3

[root@cloud3 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

[root@cloud3 hadoop]# rpm -vih epel-release-latest-7.noarch.rpm

Machine cloud4

[root@cloud4 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

[root@cloud4 hadoop]# rpm -vih epel-release-latest-7.noarch.rpm

Check that the EPEL repository was installed successfully:

If epel/x86_64 appears in the repo id column, the installation succeeded.

[root@cloud0 ~]# yum repolist
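To narrow the output down to just the EPEL entry (a minimal check; the exact repo label can vary slightly between mirrors):

[root@cloud0 ~]# yum repolist | grep -i epel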

Step 2: Install Ganglia (run as root)

a: machine cloud0

[root@cloud0 ~]# yum -y install ganglia-gmetad

[root@cloud0 ~]# yum -y install ganglia-web

b: machines cloud0, cloud2, cloud3, cloud4

[root@cloud0 ~]# yum -y install ganglia-gmond

[root@cloud2 ~]# yum -y install ganglia-gmond

[root@cloud3 ~]# yum -y install ganglia-gmond

[root@cloud4 ~]# yum -y install ganglia-gmond

Step 3: Configuration

Configuration files:

gmetad.conf: configures which machines (data sources) gmetad collects from.

gmond.conf: configuration file for each monitored machine.

a: on machine cloud0

[root@cloud0 ~]# vim /etc/ganglia/gmetad.conf

# Change the line

# data_source "my cluster" localhost

# to

data_source "MyCluster_TEST" cloud0 cloud2 cloud3 cloud4
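gmetad also accepts an optional polling interval (in seconds) and an explicit host:port for each source; 8649 is gmond's default port. A sketch of the same line with those options spelled out would be:

data_source "MyCluster_TEST" 15 cloud0:8649 cloud2:8649 cloud3:8649 cloud4:8649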

[root@cloud0 ~]# vim /etc/httpd/conf.d/ganglia.conf

<Location /ganglia>

  Order deny,allow
  # Deny from all        <-- comment out this line
  Allow from all

  # Require local
  # Require ip 10.1.2.3
  # Require host example.org
</Location>

Apache configuration file httpd.conf

Modify it as follows:

[root@cloud0 ~]# vim /etc/httpd/conf/httpd.conf

<Directory />
    #AllowOverride none
    #Require all denied

    Options FollowSymLinks
    AllowOverride None
    Order deny,allow
    Allow from all

</Directory>

Note: without this change, opening the graphical web UI returns an error: 403 Forbidden (no permission to access).
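After changing the Apache configuration, httpd has to be (re)started for the new access rules to take effect. Once it is running (see Step 4 below), a quick local check from cloud0 might be:

[root@cloud0 ~]# systemctl restart httpd

[root@cloud0 ~]# curl -I http://localhost/ganglia/    # should return 200 OK rather than 403 Forbidden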

b: on machines cloud0, cloud2, cloud3, cloud4

[root@cloud0 ~]# vi /etc/ganglia/gmond.conf

cluster {

  name = "MyCluster_TEST"

  owner = "unspecified"

  latlong = "unspecified"

  url = "unspecified"

}

Note: the name here must match the cluster name given in the data_source line of /etc/ganglia/gmetad.conf.
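Editing each gmond.conf by hand works; as a sketch, the same change can be applied with a single sed command on each node, assuming the stock file still contains name = "unspecified":

sed -i 's/name = "unspecified"/name = "MyCluster_TEST"/' /etc/ganglia/gmond.conf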

Step 4: Start the services and enable them at boot

a: machine cloud0: start the gmetad, gmond, and httpd (Apache) services

[root@cloud0 ~]# service gmetad start

[root@cloud0 ~]# service gmond start

[root@cloud0 ~]# service httpd start

Check that they started successfully:

[root@cloud0 ~]# service gmetad status

...active (running) ...

[root@cloud0 ~]# service gmond status

...active (running)...

[root@cloud0 ~]# service httpd status

... active (running) ...

Enable them at boot:

[root@cloud0 ~]# chkconfig gmetad on

[root@cloud0 ~]# chkconfig gmond on

[root@cloud0 ~]# systemctl enable httpd.service
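On CentOS 7 the service and chkconfig commands are forwarded to systemd, so the same result can be achieved and verified directly with systemctl, for example:

[root@cloud0 ~]# systemctl enable gmetad gmond httpd

[root@cloud0 ~]# systemctl status gmetad gmond httpd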

b: machines cloud2, cloud3, cloud4

[root@cloud2 ~]# service gmond start

[root@cloud3 ~]# service gmond start

[root@cloud4 ~]# service gmond start

[root@cloud2 ~]# chkconfig gmond on

[root@cloud3 ~]# chkconfig gmond on

[root@cloud4 ~]# chkconfig gmond on

Finally, open the web UI at http://service_ip/ganglia (replace service_ip with the address of cloud0, where the web front end is installed).

A few things to note:

1. The data collected by gmetad is stored under /var/lib/ganglia/rrds/

2. You can check whether data is being transferred with the following command:

tcpdump port 8649  
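gmond also answers on TCP port 8649 with an XML dump of the metrics it currently holds, so another quick check (assuming nc/netcat is installed) is:

nc cloud0 8649 | head -n 20    # should print XML containing a <GANGLIA_XML> element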

Hadoop and HBase configuration:

1. Configure Hadoop

hadoop-metrics2.properties


# syntax: [prefix].[source|sink|jmx].[instance].[options]
# See package.html for org.apache.hadoop.metrics2 for details

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink

#namenode.sink.file.filename=namenode-metrics.out
#datanode.sink.file.filename=datanode-metrics.out
#jobtracker.sink.file.filename=jobtracker-metrics.out
#tasktracker.sink.file.filename=tasktracker-metrics.out
#maptask.sink.file.filename=maptask-metrics.out
#reducetask.sink.file.filename=reducetask-metrics.out

# Below are for sending metrics to Ganglia
#
# for Ganglia 3.0 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
#
# for Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31

*.sink.ganglia.period=10

# default for supportsparse is false
*.sink.ganglia.supportsparse=true

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=78.79.12.9:8649
datanode.sink.ganglia.servers=78.79.12.9:8649
jobtracker.sink.ganglia.servers=78.79.12.9:8649
tasktracker.sink.ganglia.servers=78.79.12.9:8649
maptask.sink.ganglia.servers=78.79.12.9:8649
reducetask.sink.ganglia.servers=78.79.12.9:8649
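hadoop-metrics2.properties lives in Hadoop's configuration directory and should be identical on every node before the daemons are restarted. A sketch of distributing it and restarting HDFS, using the install path shown by hadoop version above and assuming passwordless SSH between the nodes:

HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.2

# copy the metrics config to the slave nodes
for host in cloud2 cloud3 cloud4; do
  scp $HADOOP_HOME/etc/hadoop/hadoop-metrics2.properties $host:$HADOOP_HOME/etc/hadoop/
done

# restart HDFS so the daemons pick up the new sink configuration
$HADOOP_HOME/sbin/stop-dfs.sh && $HADOOP_HOME/sbin/start-dfs.sh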

2. Configure HBase

hadoop-metrics.properties


# See http://wiki.apache.org/hadoop/GangliaMetrics
# Make sure you know whether you are using ganglia 3.0 or 3.1.
# If 3.1, you will have to patch your hadoop instance with HADOOP-4675
# And, yes, this file is named hadoop-metrics.properties rather than
# hbase-metrics.properties because we're leveraging the hadoop metrics
# package and hadoop-metrics.properties is an hardcoded-name, at least
# for the moment.
#
# See also http://hadoop.apache.org/hbase/docs/current/metrics.html
# GMETADHOST_IP is the hostname (or) IP address of the server on which the ganglia
# meta daemon (gmetad) service is running

# Configuration of the "hbase" context for NullContextWithUpdateThread
# NullContextWithUpdateThread is a null context which has a thread calling
# periodically when monitoring is started. This keeps the data sampled
# correctly.
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=10

# Configuration of the "hbase" context for file
# hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# hbase.fileName=/tmp/metrics_hbase.log

# HBase-specific configuration to reset long-running stats (e.g. compactions)
# If this variable is left out, then the default is no expiration.
hbase.extendedperiod = 3600

# Configuration of the "hbase" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=10.171.29.191:8649

# Configuration of the "jvm" context for null
jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
jvm.period=10

# Configuration of the "jvm" context for file
# jvm.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# jvm.fileName=/tmp/metrics_jvm.log

# Configuration of the "jvm" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=10.171.29.191:8649

# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
rpc.period=10

# Configuration of the "rpc" context for file
# rpc.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# rpc.fileName=/tmp/metrics_rpc.log

# Configuration of the "rpc" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=10.171.29.191:8649

# Configuration of the "rest" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rest.period=10
rest.servers=10.171.29.191:8649

Restart Hadoop and HBase.
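HDFS can be restarted as shown after the hadoop-metrics2.properties listing above; for HBase, a sketch assuming a standard install under $HBASE_HOME:

$HBASE_HOME/bin/stop-hbase.sh

$HBASE_HOME/bin/start-hbase.sh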

Monitoring the Hadoop cluster (via the older hadoop-metrics.properties interface)

Modify the Hadoop configuration file /etc/hadoop/hadoop-metrics.properties and, following the comments in the file, change three sections:

dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31

dfs.period=30

dfs.servers=192.168.52.105:8649

mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31

mapred.period=30

mapred.servers=192.168.52.105:8649

# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31

jvm.period=30

jvm.servers=192.168.52.105:8649

Set every *.servers entry to the IP address of the machine where gmetad is installed.

Restart the Hadoop datanode: service hadoop-datanode restart

Restart gmond: service gmond restart
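Once metrics are flowing, new RRD files for the Hadoop daemons should appear under the gmetad data directory mentioned earlier, and the extra graphs show up on the web page. A quick check on cloud0 (the layout is rrds/<cluster name>/<host name>/, though the host directory may use an FQDN or IP depending on how gmond reports it) could be:

[root@cloud0 ~]# ls /var/lib/ganglia/rrds/MyCluster_TEST/cloud0/ | head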