Working across many client accounts, I find it funny to see some of the
convoluted scripts people have written to receive alerts from the AIX
error log daemon. Early in my AIX career, I did the exact same thing,
and it involved a whole bunch of SSH keys, some text manipulation,
crontab, and sendmail. Wouldn’t it be nicer if AIX had some way of doing
all of this for us? Well, you know I wouldn’t ask the question if the
answer wasn’t yes!
AIX has an Error Notification object class (errnotify) in the Object Data Manager (ODM). By default, there are a number of predefined errnotify entries, and each time an error is logged via errlog, the error notification daemon checks whether that entry matches the criteria of any of the Error Notification objects. What we’re about to do is add another entry to the errnotify object class to be checked and acted upon.
The end result will be AIX sending an email for every new entry in the error log. I also discuss how you can further refine alerts for particular error types and classes. The only prerequisite for this solution is that you’re able to send mail out from the server.
Before we get started, we need to understand all the errnotify object class descriptors that can be configured. Table 1 below was created from information available on the IBM Information Center [1].
Table 1
Descriptor | Value | Description |
---|---|---|
en_alertflg | TRUE, FALSE | Identifies whether the error can be alerted. This descriptor is provided for use by alert agents associated with network management applications using the SNA Alert Architecture. |
en_class | H – (Hardware Error Class), S – (Software Error Class), O – (Messages for the errlogger command), U – (Undetermined) | Identifies the class of the error log entries to match. |
en_crcid | Identifier | Specifies the error identifier associated with a particular error. |
en_dup | TRUE, FALSE | If set, identifies whether duplicate errors as defined by the kernel should be matched. |
en_err64 | TRUE, FALSE | If set, identifies whether errors from a 64-bit or 32-bit environment should be matched. |
en_label | Identifier | Specifies the label associated with a particular error identifier as defined in the output of the errpt -t command. |
en_method | Path to application | Specifies a user-programmable action, such as a shell script or command string, to be run when an error matching the selection criteria of this Error Notification object is logged. Additional arguments are shown in Table 2. |
en_name | Text string | Uniquely identifies the object. This unique name is used when removing the object. |
en_persistenceflg | 0 – non-persistent (removed at boot time), 1 – persistent (persists through boot) | Designates whether the Error Notification object should be automatically removed when the system is restarted. |
en_pid | Numeric | Specifies a process ID (PID) for use in identifying the Error Notification object. Objects that have a PID specified should have the en_persistenceflg descriptor set to 0. |
en_rclass | Device class | Identifies the class of the failing resource. For the hardware error class, the resource class is the device class. The resource error class is not applicable for the software error class. |
en_resource | Text string | Identifies the name of the failing resource. For the hardware error class, a resource name is the device name. |
en_rtype | Text string | Identifies the type of the failing resource. For the hardware error class, a resource type is the device type by which a resource is known in the devices object class. |
en_symptom | TRUE | Enables notification of an error accompanied by a symptom string when set to TRUE. |
en_type | INFO – (Informational), PEND – (Impending loss of availability), PERM – (Permanent), PERF – (Unacceptable performance degradation), TEMP – (Temporary), UNKN – (Unknown) | Identifies the severity of error log entries to match. |
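The en_method descriptor contains the command that is run when an entry in the error log matches our new errnotify object. The command can take a number of additional arguments, which pass details of the error log entry on the command line. These arguments are listed in Table 2.
Table 2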
Argument | Description |
---|---|
$1 | Sequence number from the error log entry |
$2 | Error ID from the error log entry |
$3 | Class from the error log entry |
$4 | Type from the error log entry |
$5 | Alert flags value from the error log entry |
$6 | Resource name from the error log entry |
$7 | Resource type from the error log entry |
$8 | Resource class from the error log entry |
$9 | Error label from the error log entry |
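As a rough sketch of how these arguments can be used, en_method could point at a small wrapper script rather than an inline command. Everything named here is hypothetical (the function build_subject, the subject format, and the script path mentioned afterwards); only the argument positions come from Table 2.

```shell
#!/bin/sh
# Hypothetical wrapper for en_method. Only the argument order is taken
# from Table 2; the names and the subject format are illustrative.

# build_subject SEQ ID CLASS TYPE ALERT RNAME RTYPE RCLASS LABEL
# Prints a one-line mail subject built from the errnotify arguments.
build_subject() {
    seq=$1
    class=$3
    type=$4
    resource=$6
    label=$9
    printf 'errpt %s (class %s, type %s) on resource %s, sequence %s\n' \
        "$label" "$class" "$type" "$resource" "$seq"
}

# What the body of the wrapper might look like. It is left commented out
# because it needs errpt and mail on an AIX host:
#
#   /usr/bin/errpt -a -l "$1" | mail -s "$(build_subject "$@")" user@mail.com
```

If the script were saved as, say, /usr/local/bin/errnotify_mail.sh, the en_method value would need to pass the arguments through explicitly, along the lines of "/usr/local/bin/errnotify_mail.sh $1 $2 $3 $4 $5 $6 $7 $8 $9".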
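Now that we have an understanding of the values that we can configure in the errnotify object, we can create a simple “catch-all” errnotify object which will trigger an email for all entries logged in the error log.
Step 1
Create a temporary text file (e.g. /tmp/errnotify) with the following text:
errnotify:
        en_name = "mail_all_errlog"
        en_persistenceflg = 1
        en_method = "/usr/bin/errpt -a -l $1 | mail -s \"errpt $9 on `hostname`\" user@mail.com"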
Step 2
Add the new entry into the ODM.
# odmadd /tmp/errnotify
Step 3
Test that it’s working by adding an entry into the error log.
# errlogger 'This is a test entry'
If required, you can delete the ODM entry with the following command:
# odmdelete -q 'en_name=mail_all_errlog' -o errnotify
0518-307 odmdelete: 1 objects deleted.
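One thing to watch: as far as I’m aware, odmadd does not check whether an object with the same en_name already exists, so running Step 2 twice would leave two objects and two emails per error. Below is a minimal sketch of a guard. The helper name add_errnotify_once is hypothetical, and it assumes odmget prints nothing when the query matches no objects.

```shell
#!/bin/sh
# add_errnotify_once NAME STANZA_FILE
# Adds the stanza file to the ODM only if no errnotify object called NAME
# exists yet. Hypothetical helper, not part of AIX.
# Assumption: odmget prints nothing when the query matches no objects.
add_errnotify_once() {
    name=$1
    stanza_file=$2
    if [ -n "$(odmget -q "en_name=$name" errnotify)" ]; then
        echo "errnotify object '$name' already exists, skipping"
        return 0
    fi
    odmadd "$stanza_file" && echo "errnotify object '$name' added"
}
```

On an AIX host this would be called as add_errnotify_once mail_all_errlog /tmp/errnotify, which makes it safer to run from a build or golden image script more than once.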
That’s it! A nice and simple way of getting email alerts for new entries in the error log, without the use of scripts. The handy thing about this solution is that you can include it in your AIX golden image so it’s already configured for any new AIX installations. For environments which are supported by multiple systems administrators, I recommend that you create a shared mailbox. This way, you can manage the adding/removing of users to the reports at the mail server, and not on the hosts.
The above “catch-all” solution is great, but there may be times that you only want to be notified for particular errors. For example, you might only want to be notified for hardware errors, or when the error type is permanent. To do this, you just need to modify the particular object class descriptors.
The example below will only notify on permanent hardware messages in the error log.
errnotify:
        en_name = "mail_perm_hw"
        en_class = H
        en_persistenceflg = 1
        en_type = PERM
        en_method = "/usr/bin/errpt -a -l $1 | mail -s \"Permanent hardware errpt $9 on `hostname`\" user@mail.com"
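If you end up maintaining several filtered objects, it may be easier to generate the stanzas than to hand-edit them. The following is only a sketch: the function name make_stanza and its argument order are my own invention, and it only checks en_class and en_type against the values listed in Table 1.

```shell
#!/bin/sh
# make_stanza NAME CLASS TYPE RECIPIENT
# Prints an errnotify stanza to stdout. Returns 1 if CLASS or TYPE is not
# one of the values listed in Table 1. Hypothetical helper for illustration.
make_stanza() {
    name=$1
    class=$2
    type=$3
    recipient=$4
    case $class in
        H|S|O|U) ;;
        *) echo "invalid en_class: $class" >&2; return 1 ;;
    esac
    case $type in
        INFO|PEND|PERM|PERF|TEMP|UNKN) ;;
        *) echo "invalid en_type: $type" >&2; return 1 ;;
    esac
    printf 'errnotify:\n'
    printf '\ten_name = "%s"\n' "$name"
    printf '\ten_class = %s\n' "$class"
    printf '\ten_persistenceflg = 1\n'
    printf '\ten_type = %s\n' "$type"
    # $1, $9 and `hostname` must reach the ODM literally, so the format
    # string is single-quoted; only the recipient is substituted here.
    printf '\ten_method = "/usr/bin/errpt -a -l $1 | mail -s \\"errpt $9 on `hostname`\\" %s"\n' "$recipient"
}
```

For example, make_stanza mail_perm_hw H PERM user@mail.com > /tmp/errnotify would write a stanza equivalent to the one above (with a shorter subject line), ready to be loaded with odmadd.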
The errnotify object class has been around for quite some time, and probably isn’t anything new to veteran AIX system administrators. While some enterprise environments use third-party utilities to monitor errlog on AIX, this is a quick and easy alternative for receiving notifications without much effort.
[1] – http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.genprogc%2Fdoc%2Fgenprogc%2Ferror_notice.htm