Sunday, August 12, 2018

Error report mail notifications with errnotify

Working across many client accounts, I find it funny to see some of the convoluted scripts people have written to receive alerts from the AIX error log daemon. Early in my AIX career I did exactly the same thing, and it involved a whole bunch of SSH keys, some text manipulation, crontab, and sendmail. Wouldn’t it be nicer if AIX had some way of doing all of this for us? Well, you know I wouldn’t ask the question if the answer wasn’t yes!
AIX has an Error Notification object class (errnotify) in the Object Data Manager (ODM). By default, there are a number of predefined errnotify entries, and each time an error is logged, the error notification daemon checks whether that entry matches the criteria of any of the Error Notification objects. What we’re about to do is add another entry to the errnotify object class to be checked and acted upon.

The end result will be AIX sending an email upon any new entries into the error log. I also discuss how you can further refine alerts for particular error types and classes. The only prerequisite in this solution is that you’re able to send mail out from the server.
Before we get started, we need to understand all the errnotify object class descriptors that can be configured. Table 1 below was created from information available on the IBM Information Center [1].
Table 1
Descriptor | Value | Description
en_alertflg | TRUE, FALSE | Identifies whether the error can be alerted. This descriptor is provided for use by alert agents associated with network management applications using the SNA Alert Architecture.
en_class | H (Hardware Error Class), S (Software Error Class), O (Messages for the errlogger command), U (Undetermined) | Identifies the class of the error log entries to match.
en_crcid | Identifier | Specifies the error identifier associated with a particular error.
en_dup | TRUE, FALSE | If set, identifies whether duplicate errors as defined by the kernel should be matched.
en_err64 | TRUE, FALSE | If set, identifies whether errors from a 64-bit or 32-bit environment should be matched.
en_label | Identifier | Specifies the label associated with a particular error identifier as defined in the output of the errpt -t command.
en_method | Path to application | Specifies a user-programmable action, such as a shell script or command string, to be run when an error matching the selection criteria of this Error Notification object is logged. Additional arguments are shown in Table 2.
en_name | Text string | Uniquely identifies the object. This unique name is used when removing the object.
en_persistenceflg | 0 (non-persistent, removed at boot time), 1 (persistent, persists through boot) | Designates whether the Error Notification object should be automatically removed when the system is restarted.
en_pid | Numeric | Specifies a process ID (PID) for use in identifying the Error Notification object. Objects that have a PID specified should have the en_persistenceflg descriptor set to 0.
en_rclass | Device class | Identifies the class of the failing resource. For the hardware error class, the resource class is the device class. The resource error class is not applicable for the software error class.
en_resource | Text string | Identifies the name of the failing resource. For the hardware error class, a resource name is the device name.
en_rtype | Text string | Identifies the type of the failing resource. For the hardware error class, a resource type is the device type by which a resource is known in the devices object class.
en_symptom | TRUE | Enables notification of an error accompanied by a symptom string when set to TRUE.
en_type | INFO (Informational), PEND (Impending loss of availability), PERM (Permanent), PERF (Unacceptable performance degradation), TEMP (Temporary), UNKN (Unknown) | Identifies the severity of error log entries to match.
The en_method descriptor contains the command that is run when an entry in the error log matches our new errnotify object. The command string can also reference a number of arguments, which pass further details of the error log entry to the command line. Table 2 lists these arguments and what each one contains.
Table 2
Argument | Description
$1 | Sequence number from the error log entry
$2 | Error ID from the error log entry
$3 | Class from the error log entry
$4 | Type from the error log entry
$5 | Alert flags value from the error log entry
$6 | Resource name from the error log entry
$7 | Resource type from the error log entry
$8 | Resource class from the error log entry
$9 | Error label from the error log entry
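If you ever want more than a one-line command in en_method, the arguments in Table 2 can be handed to a small script. The sketch below is only an illustration: the function name errnotify_summary and the script path in the comment are hypothetical, and it assumes en_method passes all nine arguments explicitly, in Table 2 order.

```shell
#!/bin/sh
# Hypothetical notify helper, e.g. saved as /usr/local/bin/errnotify_mail.sh
# and referenced from the ODM stanza (assumption) as:
#   en_method = "/usr/local/bin/errnotify_mail.sh $1 $2 $3 $4 $5 $6 $7 $8 $9"

# Build a one-line summary from the nine arguments listed in Table 2.
errnotify_summary() {
    seqno=$1    # sequence number
    errid=$2    # error ID
    class=$3    # class (H, S, O, U)
    etype=$4    # type (PERM, TEMP, ...)
    alert=$5    # alert flags value
    rname=$6    # resource name
    rtype=$7    # resource type
    rclass=$8   # resource class
    label=$9    # error label
    printf 'seq=%s id=%s class=%s type=%s alert=%s resource=%s rtype=%s rclass=%s label=%s\n' \
        "$seqno" "$errid" "$class" "$etype" "$alert" "$rname" "$rtype" "$rclass" "$label"
}
```

Inside such a script you could call errnotify_summary "$@" to build a subject line, then pipe errpt -a -l "$1" to mail as in the examples in this post.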
Now that we have an understanding of the values that we can configure in the errnotify object, we can create a simple “catch-all” errnotify object which will trigger an email for every entry logged in the error log.

Step 1
Create a temporary text file (e.g. /tmp/errnotify) with the following text:
errnotify:
  en_name = "mail_all_errlog"
  en_persistenceflg = 1
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" user@mail.com"
Step 2
Add the new entry into the ODM.
# odmadd /tmp/errnotify
Step 3
Test that it’s working by adding an entry into the error log.
# errlogger 'This is a test entry'
If required, you can delete the ODM entry with the following command:
# odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
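
To check whether the object is actually present (after the odmadd, or after the odmdelete), you can query the ODM with odmget. The sketch below wraps that check in two helper functions. The names errnotify_exists and errnotify_install are hypothetical, and it assumes odmget prints nothing when no object matches the query. The install helper skips the odmadd if the object already exists, since running odmadd twice on the same file would, as far as I know, add a second copy.

```shell
#!/bin/sh
# Return success if an errnotify object with the given en_name exists.
# Assumption: odmget prints the matching stanzas, and nothing on no match.
errnotify_exists() {
    [ -n "$(odmget -q "en_name=$1" errnotify 2>/dev/null)" ]
}

# Add the stanza file only when the named object is not already present,
# so repeated runs do not create duplicate entries.
errnotify_install() {
    name=$1
    stanza_file=$2
    if errnotify_exists "$name"; then
        echo "$name already present, skipping"
    else
        odmadd "$stanza_file" && echo "$name added"
    fi
}
```

With the file from Step 1, the call would be errnotify_install mail_all_errlog /tmp/errnotify.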

That’s it! A nice and simple way of getting email alerts for new entries in the error log, without the use of scripts. The handy thing about this solution is that you can include it in your AIX golden image so it’s already configured for any new AIX installations. For environments supported by multiple systems administrators, I recommend creating a shared mailbox. This way, you can manage adding and removing recipients at the mail server, and not on the hosts.
The above “catch-all” solution is great, but there may be times when you only want to be notified of particular errors. For example, you might only want to be notified of hardware errors, or when the error type is permanent. To do this, you just need to set the relevant object class descriptors from Table 1.
The example below will only notify on permanent hardware errors in the error log.
errnotify:
  en_name = "mail_perm_hw"
  en_class = H
  en_persistenceflg = 1
  en_type = PERM
  en_method = "/usr/bin/errpt -a -l $1 | mail -s \"Permanent hardware errpt $9 on `hostname`\" user@mail.com"
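
If you need several of these filtered objects, the stanzas can be generated rather than typed by hand. The sketch below is one way to do that. The function name make_stanza and the subject line format are my own assumptions, and it only prints the stanza text, which you would still load with odmadd.

```shell
#!/bin/sh
# Hypothetical helper: print an errnotify stanza for one class/type filter.
# Usage: make_stanza NAME CLASS TYPE RECIPIENT
make_stanza() {
    name=$1
    class=$2
    etype=$3
    rcpt=$4
    printf 'errnotify:\n'
    printf '  en_name = "%s"\n' "$name"
    printf '  en_class = %s\n' "$class"
    printf '  en_persistenceflg = 1\n'
    printf '  en_type = %s\n' "$etype"
    # Single quotes keep $1, $9 and `hostname` literal, so they are
    # expanded when the notification runs, not when the stanza is built.
    printf '  en_method = "/usr/bin/errpt -a -l $1 | mail -s \\"errpt $9 (%s/%s) on `hostname`\\" %s"\n' \
        "$class" "$etype" "$rcpt"
}
```

For example, make_stanza mail_perm_hw H PERM user@mail.com > /tmp/errnotify produces a file equivalent in filter to the one above, ready for odmadd /tmp/errnotify.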

The errnotify object class has been around for quite some time, and probably isn’t new to veteran AIX system administrators. While some enterprise environments use third-party utilities to monitor errlog on AIX, this is a quick and easy alternative for receiving notifications without much effort.
[1] – http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.genprogc%2Fdoc%2Fgenprogc%2Ferror_notice.htm
