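I. Concepts:
cacti (a monitoring tool: it collects data and draws graphs from it. Collected values such as CPU load 0.8 or 1.2 are concrete data points that are aggregated and then plotted. Alerting is provided by the thold plugin.)
nagios:
A monitoring tool;
Monitored objects (hosts, services/resources, contacts, time periods, commands)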
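Nagios reduces every check to one of four states (OK, WARNING, CRITICAL, UNKNOWN) and records only that state, not the measured value. For example, for CPU usage you might define 90% and above as CRITICAL, 80% and above as WARNING, anything lower as OK, and UNKNOWN when no reading can be obtained. Whatever the monitored object is, only these four states are reported. This simplification lets administrators focus on whether an object is healthy rather than on its current value. More importantly, nagios builds a very capable alerting system on top of these results, whereas cacti relies on the thold plugin for alerting, which is far less capable.
cacti and nagios have different focuses: cacti collects data, draws graphs, and shows trends; nagios evaluates check results, returns one of the four states, and triggers its notification mechanism when a state becomes critical. Nagios is widely adopted today and has become a de facto industry standard. It is highly plugin-driven: nagios core performs no checks itself and only provides the framework in which checks run, so it can be thought of as the platform. All monitoring is done by plugins. Many plugins are available officially, and users can write their own. Each time a plugin checks a host or resource it reports one of the four states, and nagios core uses that returned state to decide what to do next. This plugin architecture makes the way nagios works and is configured very flexible (and the more flexible, the more complex).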
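The four-state model can be sketched as a small shell function. This is only an illustration: the function name cpu_state and the 80/90 thresholds are assumptions taken from the CPU example above, while the status values 0, 1, 2, and 3 are the standard plugin codes for OK, WARNING, CRITICAL, and UNKNOWN.

```shell
#!/bin/sh
# Hypothetical helper: map a CPU-usage percentage to a Nagios state.
# Thresholds (80 = WARNING, 90 = CRITICAL) follow the example in the text.
cpu_state() {
    value=$1
    case $value in
        ''|*[!0-9]*)
            # Empty or non-numeric input: no usable reading.
            echo "UNKNOWN"
            return 3
            ;;
    esac
    if [ "$value" -ge 90 ]; then
        echo "CRITICAL"
        return 2
    elif [ "$value" -ge 80 ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Example call: cpu_state 85
```

A real plugin would finish with exit instead of return, so that nagios core can read the state from the process exit code.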
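Nagios does its work through several kinds of objects:
Host / host group (a host is one kind of object, and a host group is another)
Service or resource / service group (services and resources are both simply called services)
Contact / contact group (a key function of nagios is alerting when something goes wrong, so it has to know who can be reached and which person or group receives each notification)
Time period, timeperiod (defines when hosts and services are checked and when contacts may receive notifications. For example, policy may require a server to be healthy during the day, with a notification sent if it is not, while failures at night do not matter and need no notification.)
Command (a very important object. Nagios checks hosts and services through plugins, and a plugin is essentially a script. Which objects a script can check varies: Linux and Windows hosts are checked differently, and so are httpd and nginx even though both are web services. Different objects are usually checked with specific scripts, and a script has to be applied to a particular object. Even the same script may take different arguments and be used differently for different objects. For example, on one host 500 concurrent users may count as OK, 1000 as WARNING, and 1500 as CRITICAL, while on a weaker host 100 is OK, 200 is WARNING, and 500 is CRITICAL. A command wraps a plugin in a predefined command template that can then be applied to one or more monitored objects to perform the actual check.)
These objects are closely (and intricately) related. A host needs contacts (who is notified of a failure), a time period in which those contacts may be notified, and a command that checks it, so objects often reference one another. Every monitored object, whether host, service, or resource, has to be defined. Taking a host as an example: give it a name and a description, say which command checks it, which problems trigger a notification (at WARNING already, or only at CRITICAL), who receives the notification, and when it may be sent.
Nagios supports templates in its configuration. Sometimes N hosts have to be defined, and if they are all Linux servers they differ only in name and description while everything else to be monitored is identical. When many objects share most attributes, object templates can be used: there are contact templates as well as host and service templates. An object definition then simply applies a template, inherits its attributes, and adds the few attributes that are unique to it.
To do its job, nagios needs objects to be defined; these objects are the concrete, distinct entities being monitored.
As the figure below shows, to monitor an object nagios must obtain the state of the remote host's attributes by some means. cacti works over SNMP, and nagios can use SNMP as well. Nagios core performs no checks itself and relies on plugins, which fall into five categories: check_by_ssh, check_nrpe, snmp, NSCA, and check_xyz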
<a href="http://s2.51cto.com/wyfs02/M00/7E/94/wKioL1cE3gvRbjKpAAB_OBTlMf4057.jpg" target="_blank">Figure: how nagios reaches monitored hosts</a>
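ssh (run an sshd daemon on the remote (monitored) server so that it accepts ssh commands from the monitoring host; the plugin analyzes the result it obtains and returns it to nagios core, which decides whether to alert)
nrpe (a mechanism designed specifically for monitoring Linux or Unix hosts. An nrpe daemon is installed on the remote server and runs the checks there; it returns the results to check_nrpe on the monitoring host, which passes them to nagios core. This is a client/server design: check_nrpe on the monitoring host is the client and the nrpe daemon on the monitored host is the server.)
snmp (the monitoring host periodically runs snmp commands that contact snmpd on the monitored host (port 161); a local plugin analyzes the results and returns them to nagios core. snmp is mainly for devices that support neither ssh nor nrpe. Windows hosts support both snmp and nrpe, but nagios does not usually monitor Windows over snmp. It uses NSClient++ instead, an agent installed on the Windows host that can query Windows facilities such as WMI. Once running, it lets nagios communicate with the Windows host, read the state of its resources, and return the results to nagios core.)
nsca (snmp has a trap mechanism in which the monitored side proactively notifies the monitoring side. nsca works in a similar way and gives nagios passive checks.)
Linux/Unix hosts are monitored with nrpe, snmp, or nsca; Windows hosts with NSClient++ installed on them; routers, switches, and printers with snmp.
ssh, nrpe, NSClient++, snmp, and nsca are not checks in themselves. Some are geared toward hosts and some toward services, but the actual checking is done by plugins; these mechanisms are only the means by which plugins obtain their data. Some services can be checked directly by a plugin without any additional mechanism.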
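For example, to monitor a Linux host:
Define the host object (this instantiates the monitored object and states which host, by IP address, is being monitored);
Decide which commands to use (define command objects that say which plugin checks this host. The real work is done by plugins; there are many of them and they can check many kinds of objects. A command definition wraps a plugin, and the command object is then used to check the host object. Creating a command is how a plugin is made concrete, just as creating a host object is how a monitored target is made concrete. A host can be checked with several commands, for example some for host resources and some for services, so the relationship is not necessarily one-to-one);
Decide who is notified when the host fails (define contact objects with a name, email address, and mobile number to identify the recipients; contact groups can be used);
Decide when monitoring takes place (define time periods: monitor 24x7 or only on working days, and in which periods contacts accept notifications. A minor fault need not wake anyone at night. Routine maintenance windows during which no monitoring happens can also be defined.)
Nagios can also define dependencies between hosts. For example, a router has a switch below it and the switch has N hosts below it, and nagios monitors the router, the switch, and the hosts. If the switch fails, an alert must be sent, but the hosts behind it can no longer be reached. A dependency can say that when the switch is down the hosts behind it need not be checked; otherwise a flood of alerts arrives.
Dependencies can be mutual (two-way) or hierarchical (parent/child). If two hosts depend on each other, then when host1 fails no alerts are received for host2 (host2 is not checked), and when host2 fails none are received for host1. Likewise, if both a host and services on it are monitored, there is no point checking the services once the host is down.
Nagios can work out these dependencies, but they must be defined in advance.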
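The parent/child idea can be illustrated with a short shell sketch. It is not how nagios implements dependencies; the topology in PARENTS and the function names parent_of and blocked_by are hypothetical, chosen to mirror the router, switch, and hosts example above.

```shell
#!/bin/sh
# Hypothetical topology as "child=parent" pairs: hosts sit behind the switch,
# and the switch sits behind the router.
PARENTS="switch=router host1=switch host2=switch"

# Print the parent of the given node, or nothing if it has none.
parent_of() {
    for pair in $PARENTS; do
        if [ "${pair%%=*}" = "$1" ]; then
            echo "${pair#*=}"
            return 0
        fi
    done
    return 0
}

# usage: blocked_by NODE DOWN_NODE...
# Walk up the parent chain and print the nearest ancestor that is down.
# Empty output means no ancestor is down, so NODE should be checked normally.
blocked_by() {
    node=$1
    shift
    p=$(parent_of "$node")
    while [ -n "$p" ]; do
        for down in "$@"; do
            if [ "$down" = "$p" ]; then
                echo "$p"
                return 0
            fi
        done
        p=$(parent_of "$p")
    done
    return 0
}

# Example call: blocked_by host1 switch
```

When blocked_by prints an ancestor, alerts for the node can be suppressed because the real problem is further up the chain.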
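To summarize, nagios is a monitoring framework that checks through plugins and keeps state simple: a check returns one of only four states, OK, WARNING, CRITICAL, or UNKNOWN.
A notification is sent to administrators only when the state changes from one to another (for example OK --> CRITICAL). One special case: nagios checks a service that is too busy to answer in time (answering a check costs the monitored host some resources), and the state is then UNKNOWN.
States are either soft or hard. When the monitoring host sees a state change it rechecks several times. For example, OK --> UNKNOWN does not trigger a notification immediately; only if the result is still UNKNOWN after two more checks does the state become hard and a notification go out, because a soft-state error may be temporary or accidental.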
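The soft-to-hard transition can be sketched as follows. This is a simplification under stated assumptions: the function name hard_state_after is hypothetical, any non-OK result counts toward the retry limit regardless of which problem state it is, and the limit plays the role of max_check_attempts.

```shell
#!/bin/sh
# usage: hard_state_after MAX_ATTEMPTS RESULT...
# Reports the first check at which a problem has been seen MAX_ATTEMPTS
# times in a row, which is when a soft state would become hard.
hard_state_after() {
    max=$1
    shift
    count=0
    index=0
    for result in "$@"; do
        index=$((index + 1))
        if [ "$result" = "OK" ]; then
            count=0                      # recovery resets the retry counter
        else
            count=$((count + 1))
            if [ "$count" -ge "$max" ]; then
                echo "HARD $result at check $index"
                return 0
            fi
        fi
    done
    echo "no hard problem state"
}

# Example call: hard_state_after 3 OK UNKNOWN UNKNOWN UNKNOWN
```

With a limit of 3, one failing check followed by two failing rechecks produces a hard state, which matches the "two more checks" wording above.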
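There is also an abnormal condition called flapping, in which the state keeps changing (OK --> WARNING --> CRITICAL --> OK --> UNKNOWN --> OK); a notification is also sent when a host enters this condition.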
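One way to picture flapping is as the share of recent checks whose result differs from the previous one. The sketch below computes a plain, unweighted percentage; as far as I understand, real flap detection in nagios looks at a longer history and weights recent changes more heavily, so treat this as a simplification. The name percent_state_change is hypothetical.

```shell
#!/bin/sh
# usage: percent_state_change STATE...
# Prints the percentage of consecutive pairs whose states differ.
percent_state_change() {
    changes=0
    transitions=0
    prev=""
    for state in "$@"; do
        if [ -n "$prev" ]; then
            transitions=$((transitions + 1))
            if [ "$state" != "$prev" ]; then
                changes=$((changes + 1))
            fi
        fi
        prev=$state
    done
    if [ "$transitions" -eq 0 ]; then
        echo 0
    else
        echo $((changes * 100 / transitions))
    fi
}

# Example call: percent_state_change OK WARNING CRITICAL OK UNKNOWN OK
```

A high percentage suggests flapping; a stable object stays near zero.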
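Nagios provides a web interface that, like cacti's, displays status (and nagios sends alert notifications as well). Using it requires httpd and PHP, since the interface is made up of CGI programs and PHP pages. MySQL is needed only in some cases: state data is not stored in MySQL unless an additional tool is used for that. Install the MySQL packages before compiling, because the plugin that checks a MySQL server is built against the MySQL headers and libraries.
Nagios usually consists of the main program nagios (also called nagios core), a plugin package (nagios-plugins), and four optional addons (NRPE, NSCA, NSClient++, NDOUtils).
Note: NDOUtils stores nagios configuration and event data in a database so that it can be searched and processed quickly. It acts as a broker: it hooks into nagios core, adds a layer on top of it, and redirects information that core would otherwise write to files into a database.
On the nagios server, install nagios, nagios-plugins, and httpd.
NRPE (to monitor Linux through NRPE, install NRPE on the server and on the client as well. NRPE depends on nagios-plugins, so install nagios-plugins on the client before NRPE.)
To monitor other hosts over snmp, nagios-plugins already provides the snmp checks.
To monitor Windows, install NSClient++ on the Windows host.
To use NSCA, install send_nsca on the client; the server only needs NSCA enabled to receive the results (accepting passive checks is built into nagios).
Nagios can monitor Windows in two ways (snmp and NSClient++).
Note: NSClient++ is very capable. It can check all kinds of Windows resources, such as CPU, memory, disk space, processes, and services, and it also provides nrpe and nsca functionality.
Nagios communicates with NSClient++ in several ways. The default and simplest is the check_nt plugin: to check CPU on a Windows host, for example, check_nt passes some parameters to NSClient++, which runs the check locally and returns the result to check_nt. This is easy to use but the least capable. The nrpe function can be used instead through check_nrpe, which is recommended because it is more capable. Passive checks are possible through nsca, in which case the nagios host needs the nsca daemon to accept results sent by the other side.
Note: check_nt has limited checking ability; check_nrpe is preferable.
NRPE (Nagios Remote Plugin Executor)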
II. Procedure:
[root@localhost ~]# uname -a (Red Hat 6.5)
Linux localhost.localdomain 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Prepare the LAMP environment
Synchronize the system time
Prepare the software packages:
nagios-3.3.1.tar.gz
nagios-plugins-1.4.14.tar.gz
[root@localhost ~]# yum -y install httpd php php-mysql mysql mysql-devel mysql-server
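[root@localhost ~]# groupadd nagcmd (nagios needs a dedicated user and group; this group is essential, because many nagios management functions and CGI scripts can only run with this group's permissions)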
[root@localhost ~]# useradd -G nagcmd nagios
[root@localhost ~]# passwd nagios
[root@localhost ~]# vim /etc/httpd/conf/httpd.conf (httpd installed from binary packages runs as user and group apache; a source build uses daemon)
User apache
Group apache
[root@localhost ~]# usermod -a -G nagcmd apache
[root@localhost ~]# tar xf nagios-3.3.1.tar.gz
[root@localhost ~]# cd nagios
[root@localhost nagios]# ./configure --help | less
[root@localhost nagios]# ./configure --with-command-group=nagcmd --enable-event-broker --sysconfdir=/etc/nagios (--enable-event-broker, "enables integration of event broker routines", prepares for NDOUtils; without this option nagios has to be recompiled before NDOUtils can be used)
……
Review the options above for accuracy. If they look okay,
type 'make all' to compile the main program and CGIs.
[root@localhost nagios]# make all
[root@localhost nagios]# make install (installs nagios)
[root@localhost nagios]# make install-init (installs the init script, so that service nagios start|stop can be used)
[root@localhost nagios]# make install-commandmode (sets the permissions for external commands)
[root@localhost nagios]# make install-config (installs the sample configuration files)
/usr/bin/install -c -m 775 -o nagios -g nagios -d /etc/nagios
/usr/bin/install -c -m 775 -o nagios -g nagios -d /etc/nagios/objects
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/nagios.cfg /etc/nagios/nagios.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/cgi.cfg /etc/nagios/cgi.cfg
/usr/bin/install -c -b -m 660 -o nagios -g nagios sample-config/resource.cfg /etc/nagios/resource.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/templates.cfg /etc/nagios/objects/templates.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/commands.cfg /etc/nagios/objects/commands.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/contacts.cfg /etc/nagios/objects/contacts.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/timeperiods.cfg /etc/nagios/objects/timeperiods.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/localhost.cfg /etc/nagios/objects/localhost.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/windows.cfg /etc/nagios/objects/windows.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/printer.cfg /etc/nagios/objects/printer.cfg
/usr/bin/install -c -b -m 664 -o nagios -g nagios sample-config/template-object/switch.cfg /etc/nagios/objects/switch.cfg
*** Config files installed ***
Remember, these are *SAMPLE* config files. You'll need to read
the documentation for more information on how to actually define
services, hosts, etc. to fit your particular needs.
[root@localhost nagios]# make install-webconf (creates nagios.conf under /etc/httpd/conf.d/ for the web interface so that httpd can locate the nagios pages, which live in /usr/local/nagios/share/; the file is essentially a path alias, after which the interface is reachable at http://192.168.23.137/nagios)
/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/httpd/conf.d/nagios.conf
*** Nagios/Apache conf file installed ***
[root@localhost nagios]# htpasswd -c /etc/nagios/htpasswd.users nagiosadmin (nagios login authentication is handled by httpd)
New password:
Re-type new password:
Adding password for user nagiosadmin
[root@localhost nagios]# service httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
[root@localhost nagios]# chkconfig --add nagios
[root@localhost nagios]# chkconfig --list nagios
nagios 0:off 1:off 2:off 3:on 4:on 5:on 6:off
[root@localhost nagios]# service nagios start
Starting nagios: done.
[root@localhost nagios]# cd ..
[root@localhost ~]# tar xf nagios-plugins-1.4.14.tar.gz
[root@localhost ~]# cd nagios-plugins-1.4.14
[root@localhost nagios-plugins-1.4.14]# ./configure --help | less
[root@localhost nagios-plugins-1.4.14]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --sysconfdir=/etc/nagios
[root@localhost nagios-plugins-1.4.14]# make && make install
[root@localhost nagios-plugins-1.4.14]# service nagios restart (turn off SELinux enforcement first with setenforce 0, otherwise it blocks the CGI scripts)
Running configuration check...done.
Stopping nagios: done.
[root@localhost nagios-plugins-1.4.14]# cd
[root@localhost ~]# ls /etc/nagios
cgi.cfg htpasswd.users nagios.cfg objects resource.cfg
[root@localhost ~]# ls /etc/nagios/objects (the object files under objects/ can live anywhere, as long as the main configuration file nagios.cfg includes them)
commands.cfg contacts.cfg localhost.cfg printer.cfg switch.cfg templates.cfg timeperiods.cfg windows.cfg
Browse to http://192.168.23.137/nagios
[root@localhost ~]# vim /etc/nagios/nagios.cfg (every file in a directory named by cfg_dir is loaded)
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/etc/nagios/objects/commands.cfg
cfg_file=/etc/nagios/objects/contacts.cfg
cfg_file=/etc/nagios/objects/timeperiods.cfg
cfg_file=/etc/nagios/objects/templates.cfg
cfg_file=/etc/nagios/objects/localhost.cfg
#cfg_dir=/etc/nagios/servers
resource_file=/etc/nagios/resource.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
check_external_commands=1
command_check_interval=-1
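command_file=/usr/local/nagios/var/rw/nagios.cmd
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
log_rotation_method=d
Note: command_file=/usr/local/nagios/var/rw/nagios.cmd is the external command file through which the CGIs and other programs submit commands to nagios, so its ownership and permissions decide who may do so; it does not define command objects themselves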
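[root@localhost ~]# vim /etc/nagios/resource.cfg (to nagios, $USER1$ is a macro, that is, a variable, and this file is where such macros are defined. Nagios supports 32 of them, $USER1$ to $USER32$; $USER1$ is already used by default, leaving 31 free. They can be thought of as nagios environment variables. Besides these user macros, nagios has built-in macros that need no definition, such as $HOSTADDRESS$, which resolves to a different host depending on context. resource.cfg is normally not readable through the web interface: it keeps sensitive settings apart from the CGIs, so values defined here can be used by checks without being exposed in the web interface, which improves security.)
$USER1$=/usr/local/nagios/libexec
[root@localhost ~]# ls /usr/local/nagios/libexec (this directory holds the plugins; refer to one as $USER1$/PLUGIN_NAME)
[root@localhost ~]# vim /usr/local/nagios/var/status.dat (every monitored host or service has a state at any given moment; this data file holds the state of all of them)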
[root@localhost ~]# cd /etc/nagios/objects
[root@localhost objects]# vim commands.cfg
define command{
command_name notify-host-by-email (must be globally unique; two commands must never share a command_name)
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5 (-w is the warning threshold: warn when the round-trip average reaches 3000 ms or packet loss reaches 80%; -c is the critical threshold; -p is the number of packets to send)
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$ ($ARGn$ lets different hosts pass different arguments)
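The threshold pairs passed to check_ping can be read as "round-trip time in ms, packet loss in percent". The sketch below shows how such a pair could be evaluated; it assumes that exceeding either half of a pair is enough to trigger that state, and ping_state is a hypothetical name, not part of nagios-plugins.

```shell
#!/bin/sh
# usage: ping_state RTA_MS LOSS_PCT WARN_PAIR CRIT_PAIR
# WARN_PAIR and CRIT_PAIR look like "3000.0,80%".
ping_state() {
    awk -v rta="$1" -v pl="$2" -v w="$3" -v c="$4" 'BEGIN {
        split(w, wa, ",")            # wa[1] = warning rta, wa[2] = warning loss
        split(c, ca, ",")            # ca[1] = critical rta, ca[2] = critical loss
        sub(/%/, "", wa[2])
        sub(/%/, "", ca[2])
        if (rta + 0 >= ca[1] + 0 || pl + 0 >= ca[2] + 0)
            print "CRITICAL"
        else if (rta + 0 >= wa[1] + 0 || pl + 0 >= wa[2] + 0)
            print "WARNING"
        else
            print "OK"
    }'
}

# Example call: ping_state 12.5 0 "3000.0,80%" "5000.0,100%"
```

The values used in check-host-alive are deliberately lenient, so a host is only reported as down when it is effectively unreachable.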
[root@localhost objects]# vim contacts.cfg
define contact{
contact_name nagiosadmin ; Short name of user (contact_name must be globally unique)
use generic-contact ; Inherit default values from generic-contact template (defined above) (use names the template whose attributes are inherited)
alias Nagios Admin ; Full name of user (a descriptive name for easier reading)
email nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
[root@localhost objects]# vim timeperiods.cfg
define timeperiod{
timeperiod_name 24x7 (timeperiod_name must be globally unique)
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
[root@localhost objects]# vim localhost.cfg
define host{
use linux-server ; Name of host template to use (use names the template to apply)
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name localhost (host_name must be globally unique)
alias localhost
address 127.0.0.1
define service{
use local-service ; Name of service template to use
host_name localhost (define the host first, then its services; a service always belongs to a host, and each service must be unique)
service_description PING
check_command check_ping!100.0,20%!500.0,60% (100.0,20% after the first ! is the first argument and 500.0,60% after the second ! is the second; check_ping must already be defined in commands.cfg)
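How the values after each ! end up in $ARG1$, $ARG2$, and so on can be sketched in shell. This only mimics the substitution for illustration; expand_check_command is a hypothetical name, and the check_ping template used in the example call is an assumption, since the text does not show its command_line. Arguments containing | or & would need extra escaping for sed.

```shell
#!/bin/sh
# usage: expand_check_command COMMAND_LINE_TEMPLATE CHECK_COMMAND
# Replaces $ARG1$, $ARG2$, ... in the template with the !-separated
# arguments that follow the command name.
expand_check_command() {
    template=$1
    check_command=$2
    case $check_command in
        *!*) args=${check_command#*!} ;;   # drop the command name
        *)   args="" ;;                    # no arguments given
    esac
    result=$template
    n=1
    old_ifs=$IFS
    IFS='!'
    for arg in $args; do
        result=$(printf '%s' "$result" | sed "s|\\\$ARG$n\\\$|$arg|g")
        n=$((n + 1))
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}

# Example call (assumed template):
# expand_check_command '$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5' 'check_ping!100.0,20%!500.0,60%'
```

Macros such as $USER1$ and $HOSTADDRESS$ are left untouched here; nagios resolves those from resource.cfg and from the host definition.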
1.
Monitoring a Windows host with check_nt
Windows side (monitored host):
Install NSClient++ on the Windows host (http://nsclient.org/)
Note that Allowed hosts must be the address of the nagios monitoring host
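On Windows, run netstat -an to confirm that port 12489 is listening (the default used to be 1248 and has been changed to 12489); this is the port check_nt uses to talk to NSClient++. Port 5666 is used by nrpe.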
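Edit the NSC configuration file on Windows and comment out password to simplify configuration on the monitoring host; otherwise every check on the monitoring host needs an extra argument to pass the password (do set a password in production)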
Restart the service from the Windows command line (>nsclient++.exe -stop, then >nsclient++.exe -start)
Nagios side (monitoring host):
[root@localhost objects]# ifconfig | grep "inet addr:"
inet addr:192.168.23.138 Bcast:192.168.23.255 Mask:255.255.255.0
inet addr:127.0.0.1 Mask:255.0.0.0
[root@localhost objects]# cd /usr/local/nagios/libexec/
[root@localhost libexec]# ll check_nt
-rwxr-xr-x. 1 nagios nagios 95456 Apr 1 15:59 check_nt
[root@localhost libexec]# ./check_nt -h
Usage: check_nt -H host -v variable [-p port] [-w warning] [-c critical] [-l params] [-d SHOWALL] [-u] [-t timeout]
Note: -H, --hostname=HOST
-v, --variable=STRING (valid variables: CLIENTVERSION, CPULOAD, UPTIME, USEDDISKSPACE, MEMUSE, SERVICESTATE, PROCSTATE, COUNTER, INSTANCES)
[root@localhost libexec]# ./check_nt -H 192.168.23.140 -v UPTIME -p 12489 -s nagios
System Uptime - 0 day(s) 0 hour(s) 40 minute(s)
[root@localhost libexec]# ./check_nt -H 192.168.23.140 -p 12489 -v CPULOAD -w 80 -c 90 -l 5,80,90 -s nagios (the output has a human-readable part and a performance-data part separated by a vertical bar |; a plugin you write yourself must separate the two the same way)
CPU Load 0% (5 min average) | '5 min avg Load'=0%;80;90;0;100
[root@localhost libexec]# ./check_nt -H 192.168.23.140 -p 12489 -v USEDDISKSPACE -w 80 -c 90 -l C -s nagios
C:\ - total: 40.00 Gb - used: 8.96 Gb (22%) - free 31.04 Gb (78%) | 'C:\ Used Space'=8.96Gb;32.00;36.00;0.00;40.00
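Splitting plugin output at the vertical bar, as described for the check_nt results above, can be sketched with two small helpers. The names plugin_text and plugin_perfdata are hypothetical; they only show where the human-readable text ends and the performance data begins.

```shell
#!/bin/sh
# Print the human-readable part: everything before the first "|".
plugin_text() {
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

# Print the performance data: everything after the first "|",
# or an empty line when the output has no performance data.
plugin_perfdata() {
    case $1 in
        *\|*) printf '%s\n' "${1#*|}" | sed 's/^[[:space:]]*//' ;;
        *)    printf '\n' ;;
    esac
}

# Example call:
# plugin_perfdata "CPU Load 0% (5 min average) | '5 min avg Load'=0%;80;90;0;100"
```

Each performance-data item has the form label=value;warn;crit;min;max, which is what graphing addons read.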
[root@localhost libexec]# cd /etc/nagios/objects
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
[root@localhost objects]# vim windows.cfg
use windows-server ; Inherit default values from a template
host_name winserver ; The name we're giving to this host
alias My Windows Server ; A longer name associated with the host
address 192.168.23.140 ; IP address of the host
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
service_description Uptime
check_command check_nt!UPTIME
use generic-service
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
[root@localhost objects]# vim ../nagios.cfg (add the following line)
cfg_file=/etc/nagios/objects/windows.cfg
[root@localhost objects]# /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@localhost objects]# service nagios restart
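2.
Monitoring Linux with the check_nrpe plugin
Nagios uses the check_nrpe plugin to talk to the nrpe daemon on the monitored host, which listens on port 5666 by default. The nrpe addon must also be installed on the nagios monitoring host, but its daemon does not need to run there.
Monitored host:
[root@localhost ~]# uname -a (CentOS 6.3)
Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux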
[root@localhost ~]# ifconfig | grep "inet addr:"
inet addr:192.168.23.132 Bcast:192.168.23.255 Mask:255.255.255.0
[root@localhost ~]# rpm -i nrpe-2.15-7.el6.src.rpm
[root@localhost ~]# cd rpmbuild
[root@localhost rpmbuild]# ls
SOURCES SPECS
[root@localhost SPECS]# yum -y install tcp_wrappers-devel
[root@localhost SPECS]# rpmbuild -bp nrpe.spec
[root@localhost SPECS]# cd ..
BUILD BUILDROOT RPMS SOURCES SPECS SRPMS
[root@localhost rpmbuild]# cd BUILD
[root@localhost BUILD]# ls
nrpe-2.15
[root@localhost BUILD]# cd nrpe-2.15/
[root@localhost nrpe-2.15]# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl --sysconfdir=/etc/nagios (--enable-command-args adds the ability to pass arguments to commands)
[root@localhost nrpe-2.15]# make all
[root@localhost nrpe-2.15]# make install-plugin
[root@localhost nrpe-2.15]# make install-daemon
[root@localhost nrpe-2.15]# make install-daemon-config
[root@localhost nrpe-2.15]# cd /etc/nagios
[root@localhost nagios]# vim nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe/nrpe.pid
server_port=5666
server_address=192.168.23.132 (the address the daemon listens on; defaults to 0.0.0.0 if not set)
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=192.168.23.138 (the host that is allowed to monitor this one)
debug=0
command_timeout=60
connection_timeout=300
# command[<command_name>]=<command_line> (the nagios monitoring host sends its check requests through nrpe, so the commands it may run must first be defined here on the monitored host)
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_sda2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda2
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
include_dir=/etc/nrpe.d/
[root@localhost nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /etc/nagios/nrpe.cfg -d (starts the nrpe daemon; an /etc/init.d/nrped script, given at the end of this article, makes it easier to manage)
[root@localhost nrpe-2.15]# netstat -tnlp |grep :5666
tcp 0 0 192.168.23.132:5666 0.0.0.0:* LISTEN 21662/nrpe
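When check_nrpe asks for a command by name, the daemon looks that name up among the command[...] lines and runs the associated command line. A rough sketch of that lookup, assuming a configuration file in the format shown above (nrpe_lookup is a hypothetical helper, not part of NRPE):

```shell
#!/bin/sh
# usage: nrpe_lookup CFG_FILE COMMAND_NAME
# Prints the command line defined for COMMAND_NAME, or nothing if undefined.
nrpe_lookup() {
    sed -n "s/^command\[$2\]=//p" "$1" | head -n 1
}

# Example call: nrpe_lookup /etc/nagios/nrpe.cfg check_load
```

An empty result corresponds to the case where the monitoring host requests a command that the monitored host has not defined.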
Monitoring host:
Install nrpe (as on the monitored host above, but only make all and make install-plugin are needed here)
[root@localhost nrpe-2.15]# ls /usr/local/nagios/libexec (check that check_nrpe is present)
[root@localhost nrpe-2.15]# cd !$
cd /usr/local/nagios/libexec
[root@localhost libexec]# ./check_nrpe -h
Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>][-c <command>] [-a <arglist...>]
[root@localhost libexec]# vim /etc/nagios/objects/commands.cfg
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
[root@localhost libexec]# cp /etc/nagios/objects/windows.cfg /etc/nagios/objects/linuxhost.cfg
[root@localhost libexec]# vim !$ (the service entries defined here must match the commands defined at the end of nrpe.cfg on the monitored host)
vim /etc/nagios/objects/linuxhost.cfg
use linux-server ; Inherit default values from a template
host_name linuxserver ; The name we're giving to this host
alias My Linux Server ; A longer name associated with the host
address 192.168.23.132 ; IP address of the host
host_name linuxserver
service_description CHECK_USERS
check_command check_nrpe!check_users
service_description LOAD
check_command check_nrpe!check_load
service_description SDA1
check_command check_nrpe!check_sda1
service_description SDA2
check_command check_nrpe!check_sda2
service_description Zombie
check_command check_nrpe!check_zombie_procs
use generic-service
service_description Totalprocs
check_command check_nrpe!check_total_procs
[root@localhost libexec]# vim /etc/nagios/nagios.cfg
cfg_file=/etc/nagios/objects/linuxhost.cfg
[root@localhost libexec]# /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
[root@localhost libexec]# service nagios restart
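3.
Monitoring Windows with check_nrpe
C:\Program Files\NSClient++\NSC ([modules] lists the modules to load; lines starting with a semicolon are comments; allow_arguments controls whether the nagios monitoring host may pass arguments, set it to 1 to allow this; allow_nasty_meta_chars controls whether arguments may contain special characters, set it to 1 to allow this; use_ssl, if enabled, forces SSL)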
[modules]
NRPEListener.dll
NSClientListener.dll
NSCAAgent.dll
CheckWMI.dll
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
CheckEventLog.dll
CheckHelpers.dll
[Settings]
use_file=1
allowed_hosts=192.168.23.138
[NSClient]
[NRPE]
port=5666
allow_arguments=1
allow_nasty_meta_chars=1
;use_ssl=1
bind_to_address=192.168.23.140
At the Windows command prompt:
>cd ../..
>cd "Program Files"
>cd "NSClient++"
>nsclient++ -stop
>nsclient++ -start
[root@localhost ~]# cd /usr/local/nagios/libexec
[root@localhost libexec]# ./check_nrpe -H 192.168.23.140 -c checkCPU -a warn=80 crit=90 time=20 time=10 time=5
OK CPU Load ok.|'20'=0%;80;90;'10'=0%;80;90; '5'=0%;80;90;
4.
Under /usr/local/nagios/libexec/, check_http checks a web service and check_mysql checks a MySQL service
[root@localhost libexec]# ./check_http -h
Usage: check_http -H <vhost> | -I<IP-address> [-u <uri>] [-p <port>]
[-w <warn time>] [-c <critical time>] [-t <timeout>][-L]
[-a auth] [-f <ok | warn | critcal | follow | sticky |stickyport>]
[-e <expect>] [-s string] [-l] [-r <regex> | -R<case-insensitive regex>]
[-P string] [-m <min_pg_size>:<max_pg_size>] [-4|-6] [-N][-M <age>]
[-A string] [-k string] [-S] [-C <age>] [-T <content-type>][-j method]
Examples:
CHECK CONTENT: check_http -w 5 -c 10 --ssl -H www.verisign.com
[root@localhost libexec]# ./check_mysql -h
Usage: check_mysql [-d database] [-H host][-P port] [-s socket]
[-u user] [-p password] [-S]
Add a check for the httpd service:
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
[root@localhost objects]# vim linuxhost.cfg
service_description Web Server
check_command check_http
[root@localhost objects]# service nagios restart
Add a check for MySQL:
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$
use generic-service
service_description MySQLServer
check_command check_mysql!root!magedu
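Note: the web service and MySQL already expose a network service, so they need no extra agent such as NRPE or NSClient++
[root@localhost objects]# vim templates.cfg (both the host and the service templates notify the admins contact group)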
name generic-contact ; The name of this contact template
service_notification_period 24x7 ; service notifications can be sent anytime
host_notification_period 24x7 ; host notifications can be sent anytime
service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
service_notification_commands notify-service-by-email ; send service notifications via email
host_notification_commands notify-host-by-email ; send host notifications via email
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
name linux-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each Linux host 10 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
name windows-server ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, Windows servers are monitored round the clock
check_interval 5 ; Actively check the server every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each server 10 times (max)
check_command check-host-alive ; Default command to check if servers are "alive"
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
hostgroups windows-servers ; Host groups that Windows servers should be a member of
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
Note: generic-contact used in contacts.cfg refers to generic-contact defined in templates.cfg
admins defined in contacts.cfg is the admins group referenced in templates.cfg
notify-host-by-email defined in commands.cfg is the notify-host-by-email referenced in templates.cfg
notify-service-by-email defined in commands.cfg is the notify-service-by-email referenced in templates.cfg
For the NSCA method, note when defining the host:
active_checks_enabled is 0
passive_checks_enabled is 1
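With passive checks the monitored host submits results itself. send_nsca reads one result per line, with tab-separated fields: host name, service description, return code, and plugin output. The helper below only formats such a line; nsca_line is a hypothetical name, and the send_nsca.cfg path in the commented example is an assumption about where that file lives.

```shell
#!/bin/sh
# usage: nsca_line HOST SERVICE RETURN_CODE OUTPUT
# Prints one tab-separated passive service check result.
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example (assumed config path), run on the monitored host:
# nsca_line linuxserver LOAD 0 "OK - load average: 0.10" | send_nsca -H 192.168.23.138 -c /etc/nagios/send_nsca.cfg
```

The host and service names must match objects that nagios already knows, with passive checks enabled as noted above.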
Appendix: the nrped script
#vim /etc/init.d/nrped
-----------------------script start-----------------
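#!/bin/sh
#
# chkconfig: - 86 14
# description: starts and stops the NRPE daemon

NRPE_BIN=/usr/local/nagios/bin/nrpe
# Matches --sysconfdir=/etc/nagios used when nrpe was configured above.
NRPE_CFG=/etc/nagios/nrpe.cfg

# Print the PIDs of running nrpe daemons (empty if none).
nrpe_pids() {
    ps aux | grep "$NRPE_BIN" | grep -v grep | awk '{print $2}'
}

start() {
    if [ -n "$(nrpe_pids)" ]; then
        echo "Error: nrpe is already running."
    else
        "$NRPE_BIN" -c "$NRPE_CFG" -d
        echo "nrpe started successfully."
    fi
}

stop() {
    pids=$(nrpe_pids)
    if [ -n "$pids" ]; then
        kill $pids
        echo "nrpe stopped successfully."
    else
        echo "Error: nrpe is not running."
    fi
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; sleep 1; start ;;
    *)       echo "Usage: $0 {start|stop|restart}"; exit 1 ;;
esac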
-------------------script end---------------------------
This article is reposted from chaijowin's 51CTO blog; original link: http://blog.51cto.com/jowin/1761024. To republish it, please contact the original author.