bewleberkl

Deploy Nagios with Ansible. (basic howto)

May 28th, 2019
Ansible/Nagios install/configure/deploy

Purpose: A quick howto that combines steps from several other sources (see References).

Environment:
- NB master server: nb801d2c4u.lab.krt / 192.168.1.28, on RHEL7.x; will be our Ansible server and Nagios monitoring server.
- NB media server: nb801d2cmed / 192.168.1.26, on RHEL7.x; will be a remote server to be monitored.
- NB clients: nbclient1 / 192.168.1.33, on RHEL7.x; nbclient2 / 192.168.1.36, on Debian; will be remote servers to be monitored.


Apps:
- Ansible: all configuration lives in /etc/ansible after install.
- Nagios:
  - /etc/nagios holds the configuration on RHEL.
  - /usr/lib64/nagios is where all the executables and plugins live.



From the server that will run the Nagios server (from which we will monitor remote servers):

1. Install Ansible:
# yum --nogpgcheck install ansible

2. Configure passwordless SSH to the remote hosts to be monitored (run from the directory that holds your public key, normally ~/.ssh):

# cat id_rsa.pub | ssh root@192.168.1.26 'cat >> .ssh/authorized_keys'

....repeat for every remote host to be monitored
....or we could have used:
# ssh-copy-id root@192.168.1.26
# ssh-copy-id root@192.168.1.27
# ssh-copy-id root@192.168.1.29
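With more than a handful of hosts, the key distribution can be wrapped in a loop. This is a minimal sketch and not part of the original procedure: the function name push_keys and the DRY_RUN variable are hypothetical, and the addresses are the media server and clients from the host list at the top of this howto. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: copy root's public key to each host given as an argument.
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
push_keys() {
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id root@${host}"
        else
            ssh-copy-id "root@${host}"
        fi
    done
}

# Media server and clients from the host list at the top of this howto.
push_keys 192.168.1.26 192.168.1.33 192.168.1.36
```

Set DRY_RUN=0 in the environment to actually copy the key; ssh-copy-id then asks for each host's root password once.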


3. Install Nagios, the Nagios Remote Plugin Executor (NRPE), the Nagios plugins, the Nagios NRPE plugin, httpd (web server) and PHP:
# yum --nogpgcheck --enablerepo=epel -y install nagios nrpe nagios-plugins.x86_64 nagios-plugins-all.x86_64 nagios-plugins-nrpe httpd php


4. On the local Nagios server, edit /etc/nagios/nrpe.cfg: add the local IP to 'allowed_hosts=' and define the commands to be used (see the example nrpe.cfg later in this howto).
# vi /etc/nagios/nrpe.cfg


5. Test the local disk monitor command:
# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
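Nagios plugins such as check_disk report their result through the exit code as well as the printed text: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. As a small illustration (the function name nagios_state is made up for this sketch), the exit code of the test command can be translated like this:

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

PLUGIN=/usr/lib64/nagios/plugins/check_disk
if [ -x "$PLUGIN" ]; then
    # Capture the exit code without aborting when the plugin reports a problem.
    rc=0
    "$PLUGIN" -w 20% -c 10% -p / || rc=$?
    echo "state: $(nagios_state "$rc")"
else
    echo "check_disk not found at $PLUGIN"
fi
```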


6. Enable httpd, set the password for the nagiosadmin user, and start httpd and nagios:
# systemctl enable httpd

# htpasswd -c /etc/nagios/passwd nagiosadmin

# systemctl start httpd

# systemctl start nagios

...test login: http://192.168.1.28/nagios/



At this point our Nagios server is only monitoring itself. Next we'll use Ansible to push-install the Nagios Remote Plugin Executor (NRPE) and the nrpe.cfg config file to the remote Nagios clients to be monitored.

1. On the Ansible/Nagios server, populate an Ansible 'hosts' file with a list of remote hosts. In this example nb801d2c4u.lab.krt is our local Nagios server, the RHEL7 media servers are in a group called 'nbrhelsvrs', the RHEL7 client is in 'nbrhelclnts', and the Debian system is in the 'nbdebianhosts' group:

/etc/ansible/hosts contents:

all:
  hosts:
    nb801d2c4u.lab.krt:
  children:
    nbrhelsvrs:
      hosts:
        nb801d2cmed.lab.krt:
        nb811d2cmed.lab.krt:
        nb812d2cmed.lab.krt:
        nb812ad2cmed.lab.krt:
        nb812bd2cmed.lab.krt:
        nb812cd2cmed.lab.krt:
    nbrhelclnts:
      hosts:
        nbclient1.lab.krt:
    nbdebianhosts:
      hosts:
        nbclient2.lab.krt:

Note: add more clients to your own 'hosts' configuration as needed.


2. Test the Ansible 'hosts' file and connectivity:
# ansible all -m ping

# ansible nbrhelsvrs -m ping


3. Next we create an Ansible playbook YAML file named nrpe-deploy.yaml to install NRPE and the Nagios plugins. We'll 'accidentally' leave out the nrpe.cfg configuration file to demonstrate that we can update the same playbook later and re-run it from the Ansible server. On a re-run Ansible goes through every task again, but tasks whose result is already in place report 'ok' without changing anything, so only the 'new' steps (and steps that previously failed on certain remote hosts) actually do any work:

a. Create nrpe-deploy.yaml on the Ansible server with these contents:

---
- hosts: nbrhelsvrs
  remote_user: root

  tasks:
    - name: install epel
      yum:
        name: epel-release
        state: latest

    - name: install nrpe
      yum:
        name: nrpe
        state: latest

    - name: install nagios plugins
      yum:
        name: nagios-plugins-all
        state: latest
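Before running a playbook against every host, it can help to validate it and try it on a single machine first. The sketch below only prints the relevant commands unless RUN=1 is set. The options --syntax-check and --limit are standard ansible-playbook options; the function name preflight and the RUN variable are hypothetical.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) a cautious sequence of ansible-playbook calls:
# first a syntax check, then a real run restricted to one host.
preflight() {
    playbook=$1
    first_host=$2
    for args in "--syntax-check" "--limit $first_host"; do
        if [ "${RUN:-0}" = "1" ]; then
            # $args is intentionally unquoted so it splits into separate options.
            ansible-playbook $args "$playbook" || return 1
        else
            echo "ansible-playbook $args $playbook"
        fi
    done
}

preflight nrpe-deploy.yaml nb801d2cmed.lab.krt
```

Once the single-host run looks right, the playbook can be run without --limit as in the example execution.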


....example execution:

[root@nb801d2c4u ansible]# ansible-playbook nrpe-deploy.yaml

PLAY [nbrhelsvrs] *******************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************
ok: [nb812cd2cmed.lab.krt]
ok: [nb812bd2cmed.lab.krt]
ok: [nb812ad2cmed.lab.krt]
ok: [nb811d2cmed.lab.krt]
ok: [nb801d2cmed.lab.krt]

TASK [install epel] ******************************************************************************************************
fatal: [nb812cd2cmed.lab.krt]: FAILED! => {"changed": false, "msg": "No package matching 'epel-release' found available, installed or updated", "rc": 126, "results": ["No package matching 'epel-release' found available, installed or updated"]}
changed: [nb801d2cmed.lab.krt]

TASK [install nrpe] ******************************************************************************************************
changed: [nb812bd2cmed.lab.krt]
changed: [nb812ad2cmed.lab.krt]
changed: [nb811d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]

TASK [install nagios plugins] ********************************************************************************************
changed: [nb812bd2cmed.lab.krt]
changed: [nb812ad2cmed.lab.krt]
changed: [nb811d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]

PLAY RECAP ***************************************************************************************************************
nb801d2cmed.lab.krt : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb801d2cmed.lab.krt : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb811d2cmed.lab.krt : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb812ad2cmed.lab.krt : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb812cd2cmed.lab.krt : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

[root@nb801d2c4u ansible]#



Now, on the Ansible server, we will create an nrpe.cfg file for the remote Nagios clients, update the Ansible server-side nrpe-deploy.yaml file and re-deploy:

1. First, create the file nrpe.cfg on the Ansible server with the following contents:

-------->start nrpe.cfg file contents<----------
# bind to all interfaces
server_address=0.0.0.0

# allow access by localhost and the Nagios server:
allowed_hosts=127.0.0.1,192.168.1.28

# allow command args (none of the commands below take arguments,
# so 0 would also work here and is the safer setting)
dont_blame_nrpe=1

# example monitor commands
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
#command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_root]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/rhel-root
command[check_nbvol]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/nbu/nbvol
command[check_advdsk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/nbu-advdsk
command[check_msdp]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/nbu-msdpcache
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200

-------->end nrpe.cfg file contents<----------
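The command names in square brackets are what the Nagios server later passes to check_nrpe, so it is worth listing them before deploying. A minimal sketch, assuming the file is called nrpe.cfg in the current directory (the function name list_nrpe_commands and the CFG variable are hypothetical):

```shell
#!/bin/sh
# Print the names of all active (uncommented) command[...] entries in an nrpe.cfg.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

CFG=${CFG:-nrpe.cfg}
if [ -r "$CFG" ]; then
    list_nrpe_commands "$CFG"
else
    echo "no readable $CFG found"
fi
```

Commented-out entries such as check_hda1 are skipped because the pattern only matches lines that start with "command[".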


2. Next, append the following lines to the nrpe-deploy.yaml file:

    - name: deploy nrpe.cfg
      copy:
        src: nrpe.cfg
        dest: /etc/nagios/nrpe.cfg
        force: yes
        backup: yes
      register: deploy_nrpe
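The re-run output in step 3 lists a task called 'start/restart and enable nrpe', which is not among the playbook fragments shown in this howto. The following is a guess at what such a task could look like, not the author's actual task: it assumes the Ansible service module, a service named nrpe, and tasks indented by four spaces. The function name nrpe_restart_task is hypothetical.

```shell
#!/bin/sh
# Print an assumed "restart and enable nrpe" task in playbook YAML form.
# The task body is a guess; adjust the service name to match your hosts.
nrpe_restart_task() {
    cat <<'EOF'

    - name: start/restart and enable nrpe
      service:
        name: nrpe
        state: restarted
        enabled: yes
EOF
}

nrpe_restart_task
```

To use it, redirect the output onto the end of the playbook, for example nrpe_restart_task >> nrpe-deploy.yaml. With state: restarted the service restarts on every run; the registered deploy_nrpe result could instead be used in a when: condition to restart only when nrpe.cfg actually changed.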



3. From the Ansible server, redeploy. What's cool is that before running this I fixed a broken yum installer on nb812cd2cmed.lab.krt, so this re-execution completed the NRPE and Nagios plugin installs on that host and then deployed the nrpe.cfg file:

[root@nb801d2c4u ansible]# ansible-playbook nrpe-deploy.yaml

PLAY [nbrhelhosts] *******************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************************
ok: [nbclient1.lab.krt]
ok: [nb812bd2cmed.lab.krt]
ok: [nb812ad2cmed.lab.krt]
ok: [nb811d2cmed.lab.krt]
ok: [nb801d2cmed.lab.krt]

TASK [install epel] ******************************************************************************************************
ok: [nb801d2cmed.lab.krt]
ok: [nb812bd2cmed.lab.krt]
ok: [nb812ad2cmed.lab.krt]
ok: [nb811d2cmed.lab.krt]
changed: [nb812cd2cmed.lab.krt]

TASK [install nrpe] ******************************************************************************************************
ok: [nb801d2cmed.lab.krt]
ok: [nb812bd2cmed.lab.krt]
ok: [nb812ad2cmed.lab.krt]
ok: [nb811d2cmed.lab.krt]
changed: [nb812cd2cmed.lab.krt]

TASK [install nagios plugins] ********************************************************************************************
ok: [nb801d2cmed.lab.krt]
ok: [nb812bd2cmed.lab.krt]
ok: [nb812ad2cmed.lab.krt]
ok: [nb811d2cmed.lab.krt]
changed: [nb812cd2cmed.lab.krt]

TASK [deploy nrpe.cfg] ***************************************************************************************************
changed: [nb812cd2cmed.lab.krt]
changed: [nb812bd2cmed.lab.krt]
changed: [nb812ad2cmed.lab.krt]
changed: [nb811d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]

TASK [start/restart and enable nrpe] *************************************************************************************
changed: [nb812cd2cmed.lab.krt]
changed: [nb812bd2cmed.lab.krt]
changed: [nb812ad2cmed.lab.krt]
changed: [nb811d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]
changed: [nb801d2cmed.lab.krt]

PLAY RECAP ***************************************************************************************************************
nb801d2cmed.lab.krt : ok=6 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb801d2cmed.lab.krt : ok=6 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb801d2cmed.lab.krt : ok=6 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb811d2cmed.lab.krt : ok=6 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb812ad2cmed.lab.krt : ok=6 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb812cd2cmed.lab.krt : ok=6 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nb812cd2cmed.lab.krt : ok=6 changed=5 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

[root@nb801d2c4u ansible]#



4. On the Nagios server, append the following to the bottom of /etc/nagios/objects/commands.cfg:

# 'check_nrpe' command definition
define command{
    command_name    check_nrpe
    command_line    /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
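The same command line can be tried by hand from the Nagios server before any service definitions exist, which separates NRPE connectivity problems from Nagios configuration problems. A minimal sketch (the function name try_nrpe and the PLUGIN variable are hypothetical; the options mirror the command definition above):

```shell
#!/bin/sh
# Run one NRPE command against a host the same way the check_nrpe definition does.
# Falls back to printing the command line when the plugin is not installed.
PLUGIN=${PLUGIN:-/usr/lib64/nagios/plugins/check_nrpe}

try_nrpe() {
    host=$1
    cmd=$2
    if [ -x "$PLUGIN" ]; then
        "$PLUGIN" -H "$host" -t 30 -c "$cmd"
    else
        echo "would run: $PLUGIN -H $host -t 30 -c $cmd"
    fi
}

# Media server address and a command name from the deployed nrpe.cfg.
try_nrpe 192.168.1.26 check_root
```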


5. On the Nagios server, create a <hostname>.cfg file inside /etc/nagios/servers/ for each host to be monitored. In this example we created one for one of the remote NB media servers:

# vi /etc/nagios/servers/nb801d2cmed.cfg
.....add the following content:

###############################################################################
#
# HOST DEFINITION
#
###############################################################################

# Define a host for the remote machine

define host {

    use                     linux-server            ; Name of host template to use
                                                    ; This host definition will inherit all variables that are defined
                                                    ; in (or inherited by) the linux-server host template definition.
    host_name               nb801d2cmed
    alias                   nb801d2cmed
    address                 192.168.1.26
    register                1
}



###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################

# Define a service to "ping" the remote machine

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}



# Define services to check the disk space of the root partition and the
# NB partitions on the remote machine through NRPE. Thresholds are set in
# the deployed nrpe.cfg: warning if < 20% free, critical if < 10% free.

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Root Partition
    check_command           check_nrpe!check_root
}

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     NB Partition
    check_command           check_nrpe!check_nbvol
}

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Advanced Disk Partition
    check_command           check_nrpe!check_advdsk
}

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     MSDP
    check_command           check_nrpe!check_msdp
}




# Define a service to check the number of currently logged in
# users on the remote machine through NRPE. Thresholds are set in the
# deployed nrpe.cfg: warning if > 5 users, critical if > 10 users.
# (The check_local_* commands run on the Nagios server itself, so they
# would report the Nagios server's values under this host.)

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Current Users
    check_command           check_nrpe!check_users
}



# Define a service to check the number of currently running procs
# on the remote machine through NRPE. Thresholds are set in the deployed
# nrpe.cfg: warning if > 150 processes, critical if > 200 processes.

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Total Processes
    check_command           check_nrpe!check_total_procs
}



# Define a service to check the load on the remote machine through NRPE.

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Current Load
    check_command           check_nrpe!check_load
}



# Define a service to check the swap usage.
# Critical if less than 10% of swap is free, warning if less than 20% is free.
# NOTE: check_local_swap runs on the Nagios server itself. To measure the
# remote machine, add a swap command to nrpe.cfg and call it through check_nrpe.

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     Swap Usage
    check_command           check_local_swap!20%!10%
}



# Define a service to check SSH on the remote machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service {

    use                     generic-service         ; Name of service template to use
    host_name               nb801d2cmed
    service_description     SSH
    check_command           check_ssh
    notifications_enabled   0
}


.....save the /etc/nagios/servers/nb801d2cmed.cfg file and the others you create.
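Since every monitored host needs a file of the same shape, the skeleton can be generated rather than typed. This is a minimal sketch, assuming the linux-server and generic-service templates used above; the function name gen_host_cfg is hypothetical, and only the host definition and the PING service are generated.

```shell
#!/bin/sh
# Print a minimal host definition plus a PING service for one machine.
# Usage: gen_host_cfg <host_name> <ip_address>
gen_host_cfg() {
    cat <<EOF
define host {
    use        linux-server
    host_name  $1
    alias      $1
    address    $2
}

define service {
    use                  generic-service
    host_name            $1
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
EOF
}

# Example: one of the clients from the host list at the top of this howto.
gen_host_cfg nbclient1 192.168.1.33
```

Redirect the output to create the file, for example gen_host_cfg nbclient1 192.168.1.33 > /etc/nagios/servers/nbclient1.cfg, then add the NRPE-based services by hand.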

6. Make the entire servers directory and its contents owned by root:nagios:
# chown -R root:nagios /etc/nagios/servers
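Nagios only reads /etc/nagios/servers if nagios.cfg names that directory in a cfg_dir directive. In the sample nagios.cfg files I have seen, that line ships commented out, but treat this as an assumption and check your own file. A minimal sketch of such a check (the function name cfg_dir_enabled and the NAGIOS_CFG variable are hypothetical):

```shell
#!/bin/sh
# Succeed if the given nagios.cfg has an active (uncommented) cfg_dir line
# for the given directory.
# Usage: cfg_dir_enabled <nagios.cfg> <directory>
cfg_dir_enabled() {
    grep -q "^cfg_dir=$2[[:space:]]*\$" "$1"
}

NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios/nagios.cfg}
if [ -r "$NAGIOS_CFG" ]; then
    if cfg_dir_enabled "$NAGIOS_CFG" /etc/nagios/servers; then
        echo "servers directory is enabled"
    else
        echo "add this line to $NAGIOS_CFG: cfg_dir=/etc/nagios/servers"
    fi
else
    echo "cannot read $NAGIOS_CFG"
fi
```

If the host and service counts printed by the configuration check in the next step do not grow after adding files, a missing cfg_dir line is one likely cause.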


7. Confirm the Nagios config:

# /usr/sbin/nagios -v /etc/nagios/nagios.cfg

Nagios Core 4.4.3
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-01-15
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 8 services.
Checked 1 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 1 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check
[root@nb801d2c4u ~]#




Summary:
- We installed Ansible and Nagios on a central command server.
- We remotely push-installed NRPE and the Nagios plugins, plus the NRPE configuration (nrpe.cfg), to the remote hosts to be monitored.
- We can add new commands to nrpe.cfg, or new machines to /etc/ansible/hosts, and then execute ansible-playbook nrpe-deploy.yaml again to update all remote machines or set up new ones.


References:
https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html
- We actually installed from a yum repository, but this next link has some good info:
https://support.nagios.com/kb/article/nagios-core-installing-nagios-core-from-source-96.html#RHEL
https://www.ansible.com/overview/how-ansible-works
https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
https://docs.ansible.com/ansible/latest/user_guide/guide_rolling_upgrade.html
https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html#directory-layout
- Excellent howto on NRPE install:
https://www.neteye-blog.com/2018/04/how-to-deploy-nrpe-on-centos-7-with-ansible/
- Nagios manual (TOC):
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/toc.html
- Nagios QuickStart guide:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart.html
- NRPE install/config guide:
https://tecadmin.net/install-nrpe-on-centos-rhel/
- Yet another Nagios tutorial with some good info:
https://www.edureka.co/blog/nagios-tutorial/
- Nagios server-side remote host define/config:
https://www.scaleway.com/en/docs/deploy-nagios-on-scaleway/