Linux Monitoring Server -nagios-

Long time no see ￣O￣)ノ

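It really has been a while since my last technical post.
I'm going to be working with Linux again for the first time in ages, so I wrote this article as part of my preparation (´▽｀)
This time the topic is the nagios server!!
I've worked with zabbix and cacti before, but never with nagios. I wish I had tried it at my previous job ；￣ロ￣）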

So, let's get started with the installation ☆

◆ Server side
First, install the required packages. Monitoring status is viewed in a web browser, so we install apache as the web server.

# yum install epel-release httpd php

Next, install nagios.
nagios ships with a monitoring configuration for localhost by default, so we also install the nagios plugins so that those checks can actually run.

# yum install nagios nagios-plugins-all

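The apache configuration for nagios is installed automatically, so for now just take a look at it!
That said, you really should set up access restrictions and other security measures as well （￣□￣；）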

# cat /etc/httpd/conf.d/nagios.conf

The nagios admin screen uses basic authentication, so set a user name and password.

# htpasswd /etc/nagios/passwd nagiosadmin
New password:
Re-type new password:
Updating password for user nagiosadmin

Start each service and enable it at boot as well.

# systemctl start httpd
# systemctl start nagios
# systemctl enable httpd
# systemctl enable nagios

Once you've gotten this far, you can open the admin screen in a web browser!!
http://<server IP address>/nagios/

Pretty easy, right? (＝⌒▽⌒＝)
You should be able to confirm that localhost is already being monitored.
Next, let's try adding our own monitoring settings!

◆ Client side
Install the nagios plugins on the client.

# yum install nagios-plugins-all

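Next, we add the settings for the client! These changes are made on the nagios server.
First, back up the templates file and add the following templates to it.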

# cp -p /etc/nagios/objects/templates.cfg /etc/nagios/objects/templates.cfg.org
# diff /etc/nagios/objects/templates.cfg.org /etc/nagios/objects/templates.cfg
197a198,238
>
> define host{
>         name                            test-server
>         use                             generic-host
>         check_period                    24x7
>         check_interval                  5
>         retry_interval                  1
>         max_check_attempts              3
>         check_command                   check-host-alive
>         notification_period             24x7
>         notification_interval           60
>         notification_options            d,u
>         contact_groups                  admins
>         register                        0
>         }
>
> define service{
>         name                            test-server             ; The 'name' of this service template
>         active_checks_enabled           1                       ; Active service checks are enabled
>         passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
>         parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
>         obsess_over_service             1                       ; We should obsess over this service (if necessary)
>         check_freshness                 0                       ; Default is to NOT check service 'freshness'
>         notifications_enabled           1                       ; Service notifications are enabled
>         event_handler_enabled           1                       ; Service event handler is enabled
>         flap_detection_enabled          1                       ; Flap detection is enabled
>         failure_prediction_enabled      1                       ; Failure prediction is enabled
>         process_perf_data               1                       ; Process performance data
>         retain_status_information       1                       ; Retain status information across program restarts
>         retain_nonstatus_information    1                       ; Retain non-status information across program restarts
>         is_volatile                     0                       ; The service is not volatile
>         check_period                    24x7                    ; The service can be checked at any time of the day
>         max_check_attempts              1                       ; A single failed check puts the service into its final (hard) state
>         normal_check_interval           5                       ; Check the service every 5 minutes under normal conditions
>         retry_check_interval            2                       ; Re-check the service every two minutes until a hard state can be determined
>         contact_groups                  admins                  ; Notifications get sent out to everyone in the 'admins' group
>         notification_options            w,c                     ; Send notifications about warning and critical events only
>         notification_interval           60                      ; Re-notify about service problems every hour
>         notification_period             24x7                    ; Notifications can be sent out at any time
>         register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
>         }

Next, add the definitions for the client. The file name can be anything, as long as it ends in .cfg.

# cat /etc/nagios/conf.d/test.cfg
define host{
        use                     test-server
        host_name               test-server
        alias                   Test client
        address                 <client IP address>
        }

define service{
        use                             test-server
        host_name                       test-server
        service_description             alive
        check_command                   check-host-alive
        }
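
If you have more than one client to add, the same pair of definitions can be generated instead of typed. This is just a sketch: make_host_cfg is a helper name I made up (it is not part of nagios), and it assumes the test-server templates shown above already exist.

```shell
#!/bin/sh
# make_host_cfg HOST_NAME ADDRESS
# Hypothetical helper: prints a host definition plus an "alive" service
# that both inherit from the test-server templates defined earlier.
make_host_cfg() {
    cat <<EOF
define host{
        use                     test-server
        host_name               $1
        address                 $2
        }

define service{
        use                             test-server
        host_name                       $1
        service_description             alive
        check_command                   check-host-alive
        }
EOF
}
```

For example, make_host_cfg web01 192.0.2.10 > /etc/nagios/conf.d/web01.cfg would write one file per client (web01 and 192.0.2.10 are placeholder values).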

Check that the configuration is valid.

# /usr/sbin/nagios -v /etc/nagios/nagios.cfg
:
:
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

If there are no errors, apply the changes ♪

# systemctl restart nagios
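
The check and the restart can also be combined so that a broken file never reaches the running service. The following is only a sketch: safe_restart is a name I made up, and the three variables are there just so the function can be tried without a real nagios install. It relies on nagios -v returning a non-zero exit status when the pre-flight check fails.

```shell
#!/bin/sh
# Sketch: restart nagios only when the pre-flight check passes.
# All three values can be overridden from the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/sbin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-systemctl restart nagios}"

safe_restart() {
    # Run the same verification as "nagios -v", discarding its report
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # Intentionally unquoted so the command and its arguments are split
        $RESTART_CMD
    else
        echo "config check failed; nagios was NOT restarted" >&2
        return 1
    fi
}
```

Run safe_restart as root after editing. If it reports a failure, run the nagios -v command by hand to see the actual error messages.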

In the web admin screen you can confirm that the number of monitored servers has increased （‐＾▽＾‐）

Finally, here are a few examples of checks.
Add them to /etc/nagios/conf.d/test.cfg and apply the change the same way as before ☆

■http
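The following monitors that the page http://<IP address>/test/index.html can be displayed (that is, it returns a 200 response). By the way, ! is the delimiter between the command name and its arguments ☆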

define service{
        use                             test-server
        host_name                       test-server
        service_description             httpd
        check_command                   check_http!-u /test/index.html
        }
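
To make the ! delimiter a bit more concrete, here is a small sketch of how a check_command value breaks down into the command name and the $ARG1$, $ARG2$, ... macros. split_check_command is a made-up helper for illustration only; nagios does this splitting internally.

```shell
#!/bin/sh
# split_check_command 'name!arg1!arg2...'
# Hypothetical helper: prints the command name and each "!"-separated argument.
split_check_command() {
    spec=$1
    name=${spec%%!*}                 # text before the first "!"
    printf 'command=%s\n' "$name"
    case $spec in
        *!*) rest=${spec#*!} ;;      # everything after the first "!"
        *)   rest= ;;                # no arguments at all
    esac
    i=1
    while [ -n "$rest" ]; do
        case $rest in
            *!*) arg=${rest%%!*}; rest=${rest#*!} ;;
            *)   arg=$rest; rest= ;;
        esac
        printf 'ARG%s=%s\n' "$i" "$arg"
        i=$((i + 1))
    done
}
```

For the definition above, the command is check_http and $ARG1$ is "-u /test/index.html". Assuming the stock check_http command definition (plugin path, -I $HOSTADDRESS$, then $ARG1$) and the usual plugin directory on 64-bit systems, the equivalent manual run would be roughly /usr/lib64/nagios/plugins/check_http -I <client IP address> -u /test/index.html, which is handy for testing a check before adding it.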

■snmp
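The following raises a CRITICAL error when free memory (UCD-SNMP-MIB::memAvailReal.0) drops below 100000 KB. Ranges can be specified as shown below.
By the way: A → error when the value exceeds A (or drops below 0), A:B → error when the value falls outside the range A to B, B: → error when the value is below B.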

# snmp
define service{
        use                             test-server
        host_name                       test-server
        service_description             snmp
        check_command                   check_snmp!-P 2c -C public -o UCD-SNMP-MIB::memAvailReal.0 -c 100000:
        }
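
The range rules are easier to see in action, so here is a minimal sketch that evaluates them. in_alert is a name I made up, it handles integers only, and it covers just the three forms mentioned above (A, A:B, B:), following the standard plugin range convention as I understand it. Other forms such as ~: or a leading @ are not handled.

```shell
#!/bin/sh
# in_alert RANGE VALUE
# Hypothetical helper: exit status 0 when VALUE should raise an alert.
in_alert() {
    range=$1
    value=$2
    case $range in
        *:*) lo=${range%%:*}; hi=${range#*:} ;;   # "A:B" or "B:"
        *)   lo=0; hi=$range ;;                   # bare "A" means 0..A
    esac
    # Below the lower bound?
    if [ -n "$lo" ] && [ "$value" -lt "$lo" ]; then return 0; fi
    # Above the upper bound? (an empty upper bound means no limit)
    if [ -n "$hi" ] && [ "$value" -gt "$hi" ]; then return 0; fi
    return 1
}
```

With the setting above, in_alert '100000:' 50000 succeeds (alert, because free memory is below the limit), while in_alert '100000:' 200000 does not.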

You can see that the checks have been added!!