Principles of Operation

General Initialization Phase:

Verify that the program is running as root (superuser). On the Windows release, the test instead verifies that you have administrative privileges or that the program was installed as a Windows service.
Read and parse the command-line options. If no list of devices is supplied at invocation, the program launches a discovery to identify all devices that are currently attached.


Device Discovery:

Once the program has confirmed that the user has sufficient privilege to run it, it parses the command options. If the operator supplies a list of devices to run against, the program builds that list and issues commands to verify that the devices exist and are not offline. If no list of devices is supplied, the software initiates a device discovery. Discovery can take anywhere from several seconds to more than a minute if you have a large UNIX configuration.


If your system's peripheral configuration is fairly static, you should bypass discovery by supplying a list of devices to the program, and modify any scripts you have created to use a fixed list of devices.


Device Initialization Phase (IBM AIX):

The program builds a list of device candidates by issuing "lsdev | grep Available | cut -f 1 -d ' ' | grep -e disk -e cd -e sas -e ses".
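The shell pipeline above can be mirrored step by step in Python. This is only an illustrative sketch, not the program's actual code: the function name aix_candidates and the sample lsdev text are made up for the example, and the function works on captured output rather than running lsdev itself.

```python
def aix_candidates(lsdev_output: str) -> list[str]:
    """Mimic: lsdev | grep Available | cut -f 1 -d ' ' | grep -e disk -e cd -e sas -e ses"""
    names = []
    for line in lsdev_output.splitlines():
        # grep Available: keep only lines that mention an Available device
        if "Available" not in line:
            continue
        # cut -f 1 -d ' ': take everything before the first space
        name = line.split(" ", 1)[0]
        # grep -e disk -e cd -e sas -e ses: keep names containing any of these
        if any(tag in name for tag in ("disk", "cd", "sas", "ses")):
            names.append(name)
    return names


# Hypothetical lsdev output, for illustration only.
SAMPLE = """hdisk0 Available 00-08-00 SAS Disk Drive
hdisk1 Defined   00-08-00 SAS Disk Drive
cd0    Available 00-09-00 SATA DVD-ROM Drive
ent0   Available 01-00    Ethernet Adapter
ses0   Available 00-08-00 SAS Enclosure Services Device"""

print(aix_candidates(SAMPLE))
```

Devices in the Defined state and devices whose names do not contain disk, cd, sas, or ses are dropped, just as the pipeline drops them.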


Device Initialization Phase (Apple OS X 10.2.3 and higher):

This software supports fibre channel devices using the AsteraTech fibre channel HBA only. The drivers must be dated after February 15th, 2003, which is when AsteraTech released drivers that communicate with our software. There is no support for SCSI peripherals.
ATA devices are scanned by enumerating the BSD /dev names. If the device is an IDE (SATA or ATA) disk drive, it will be added to the list for processing.
We build a numeric list of device candidates by performing direct pass-through calls to the AsteraTech driver and requesting that it return information for every fibre channel device it discovers on all controllers and ports. The list is numbered starting from 0.
As only fibre channel devices are supported, no scanning for parallel SCSI, FireWire, or ATA devices is performed.


Device Initialization Phase (HP-UX):

The program builds a list of device candidates by issuing the /sbin/ioscan -FknC disk and /sbin/ioscan -FknC tape commands, and by enumerating devices in the /dev/rscsi directory.


Device Initialization Phase (IRIX):

The program builds a list of device candidates by searching for /hw/scsi entries and parsing out the SCSI and fibre channel disk entries that are returned. It then appends tapes to the list using the wildcard /hw/tape/*nrs. The program then continues in the same way as the LINUX release, which is described later in this section.


Device Initialization Phase (LINUX):

The program builds a list of device candidates by issuing the /sbin/sfdisk command and parsing out entries beginning with /dev/s. Then it appends the first SCSI tape device, /dev/st0. IDE devices are detected by scanning /dev/hda through /dev/hdl. (This scan is not done if SMARTMon-ux is invoked with a list of specific disks to monitor.)
For each IDE disk device discovered (/dev/hda ... /dev/hdl):
Device information is read and stored.
If the disk has S.M.A.R.T. firmware capability, it is enabled. Otherwise the program reports that it cannot enable it for the specific device.
Initial S.M.A.R.T. values and thresholds are read to establish a baseline.
Drive information is displayed and placed into the log file in the format specified in command-line operations or defaults.
For each SCSI (or fibre channel or SSA) device found:
Two SCSI Inquiries are issued. The first is a standard inquiry.  The second is an inquiry on an optional vendor-specific page to determine the device's unique serial number. (The SCSI specification unfortunately does not require disks to report a serial number programmatically).
If the manufacturer is listed as "Promise", the card is an IDE-based Promise RAID controller. SMARTMon-ux issues the vendor-specific commands to extract make, model, and serial number information for the drives that make up the Promise RAID-0 or RAID-1 data set. (Promise RAID controllers do not support S.M.A.R.T. polling.)
If the disk has S.M.A.R.T. firmware capability, it is enabled. Otherwise the program reports that it cannot enable it for the selected device. Note also that SCSI devices support a performance bit which is a S.M.A.R.T. setting that lets the drive run internal S.M.A.R.T. diagnostics without interrupting data flow. If you are in a high-throughput environment such as video streaming, you should invoke this program with the -P option. Not all disk drives support the performance bit (also known as PERF bit). SMARTMon will let the user know if there is a problem setting this value.
The S.M.A.R.T. polling interval is the internal interval programmed into the disk drive.  This is set to 10 minutes, unless changed via the command line option -F.
The disk is checked to see if it supports optional SMART and temperature reporting log pages. If so, they are read to establish a baseline.
Device information is displayed and placed into the log file in the format specified in command-line operations or defaults. Since SCSI and fibre channel support devices other than disk drives, all devices discovered are reported. Of course, only disk drives with non-removable media are monitored.
If a disk supports SES (SCSI Enclosure Services), the program marks the drive as one that might be capable of communicating with an SES enclosure, provided the -E flag is set.
Note: The LINUX operating system has a hard limit of 4KB of data that can be sent to a /dev/sd* driver. This limitation only affects operations such as reading an extremely long log page (which would typically be vendor- or device-specific) or reading a long defect list (using the -Y option). If you prefer, as of release 1.21, you can also interact with a peripheral that uses the /dev/sg type driver. Our code allows transfers of up to 64KB, provided your LINUX kernel allows it. We did not design this software to use the sg class driver because LINUX has no reliable method to ensure a successful cross-reference to a physical device. Whenever your system boots, it may assign sg class drivers in any order. We suggest you do not use sg class drivers unless specifically told to use them because a particular command failed.
(Added in 1.23D) The program now ensures that I/O will be sent to any device specifically entered on the command line. This was done to facilitate discovery of devices behind zero-channel RAID cards from Intel and others, which generally report the back-end disks under /dev/sg type drivers. For example, if you enter ./smartmon-ux -I /dev/sda /dev/sg0 /dev/sg[3-5], then it will poll /dev/sda, /dev/sg0, /dev/sg3, /dev/sg4, and /dev/sg5. This may result in a duplicate entry, as /dev/sda would normally be mapped to /dev/sg0, but this is the only way to detect disks masked by a RAID engine.
Important: The LINUX operating system is in the process of phasing out support for pass-through SCSI commands to /dev/sd class drivers. Even though this software allows you to perform most actions on a particular device using the /dev/sd class driver, you should get in the habit of using the /dev/sg class driver.
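The way the LINUX candidate list is assembled can be sketched as follows. This is a hedged illustration rather than the program's real logic: the function name linux_candidates is hypothetical, and the sfdisk entries are passed in as already-extracted device names because the exact parsing of the sfdisk output is not described here.

```python
import string


def linux_candidates(sfdisk_names, explicit=None):
    """Build a candidate list the way the text describes (illustrative only)."""
    # An explicit device list from the command line bypasses all scanning.
    if explicit:
        return list(explicit)
    # Keep sfdisk entries beginning with /dev/s (SCSI-class disks).
    scsi = [name for name in sfdisk_names if name.startswith("/dev/s")]
    # Append the first SCSI tape device.
    tape = ["/dev/st0"]
    # IDE scan range: /dev/hda through /dev/hdl (letters a..l, 12 names).
    ide = ["/dev/hd" + letter for letter in string.ascii_lowercase[:12]]
    return scsi + tape + ide


print(linux_candidates(["/dev/sda", "/dev/sdb", "/dev/hda"]))
print(linux_candidates([], explicit=["/dev/sda", "/dev/sg0"]))
```

In the second call nothing is scanned; only the devices named by the operator are returned.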



Device Initialization Phase (SPARC and Intel Solaris):

The program builds a list of device candidates by searching the /dev/rdsk/*s0, /dev/es, /dev/osa/dev/rdsk/*s0, /dev/rmt/*mn, and /dev/scsi/*/* paths and parsing out the valid SCSI and fibre channel devices and enclosures. It also reports whether a disk is an IDE device and whether it will have to be skipped.


Device Initialization Phase (Tru64):

The program builds a list of device candidates by searching the wildcards /devices/disk/*disk*a, /devices/disk/cdrom?a, /devices/tape/tape?, and /devices/changer/?.
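Wildcard searches like these can be reproduced with Python's standard glob module. The sketch below is illustrative only: the function name tru64_candidates and its root parameter (which lets the patterns be tried against a scratch directory instead of a real Tru64 system) are assumptions made for the example.

```python
import glob
import os

# The four wildcards listed in the text.
TRU64_PATTERNS = [
    "/devices/disk/*disk*a",
    "/devices/disk/cdrom?a",
    "/devices/tape/tape?",
    "/devices/changer/?",
]


def tru64_candidates(root="/"):
    """Expand each wildcard under root and return the matching paths."""
    found = []
    for pattern in TRU64_PATTERNS:
        # Re-anchor the absolute pattern under the chosen root directory.
        full_pattern = os.path.join(root, pattern.lstrip("/"))
        found.extend(sorted(glob.glob(full_pattern)))
    return found


print(tru64_candidates())
```

On a machine without a /devices tree the call simply returns an empty list.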


Device Initialization Phase (VMS):

The program builds a list of device candidates by issuing the SHOW DEVICES command, then discarding any device that has a "$" character in its name. It then examines the remaining entries and ignores them unless they show an online or mounted state.
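The two filtering rules can be sketched as below. This is a minimal illustration under stated assumptions: the function name vms_candidates is hypothetical, and the input is assumed to be already split into (device name, status) pairs, since the parsing of the SHOW DEVICES output itself is not described here. The sample rows are invented.

```python
def vms_candidates(rows):
    """rows: iterable of (device_name, status) pairs (assumed pre-parsed)."""
    keep = []
    for name, status in rows:
        # Rule 1: discard any device with a "$" character in its name.
        if "$" in name:
            continue
        # Rule 2: keep only devices shown as online or mounted.
        if status.strip().lower() in ("online", "mounted"):
            keep.append(name)
    return keep


# Invented sample rows, for illustration only.
rows = [
    ("DKA0:", "Mounted"),
    ("DKA100:", "Online"),
    ("DKA200:", "Offline"),
    ("NODE$DKB0:", "Mounted"),
]
print(vms_candidates(rows))
```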


Device Initialization Phase (Microsoft Windows® family operating systems):

The program searches for assigned physical disks at \\.\PHYSICALDRIVE0 through \\.\PHYSICALDRIVE127. This discovers all disk drives that have been assigned a drive letter. It then searches for unconfigured devices at \\.\SCSI0 through \\.\SCSI16. Other devices are discovered at \\.\TAPE0 through \\.\TAPE15, \\.\SCANNER0 through \\.\SCANNER7, and then \\.\CDROM0 through \\.\CDROM15.
We addressed a serious bug that prevented some devices from being discovered if attached to Emulex LP9002 and some JNI HBAs, depending on the driver levels. The problem was that these controllers/drivers might map more than one device to a \\.\SCSI type driver. Because of this, we now also query the host adapters to discover devices under all ports, paths, IDs, and LUNs for a particular \\.\SCSI class driver. A device appearing on SCSI2 at port 2, path 0, target ID 18, and LUN 3 would be referenced as \\.\SCSI2Port2Path0Target18Lun3. Please see the device naming conventions topic for additional details.
If the O/S indicates there are LUNs, then they are added to the device list as well.
Finally, IDE disks and ATAPI devices (CD-ROMs) are discovered and added to the table if found.
UAC support and the appropriate manifest information were added in 1.35 to ensure native compatibility with Windows Vista and Windows Server 2008.
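The Windows name ranges and the extended SCSI naming form described in this section can be generated as follows. This is a sketch only: the function names windows_candidates and scsi_address_name are hypothetical, and the code only builds the name strings; it does not open any device.

```python
def windows_candidates():
    """Return the device names probed, in the order the text lists them."""
    ranges = [
        ("PHYSICALDRIVE", 128),  # PHYSICALDRIVE0 .. PHYSICALDRIVE127
        ("SCSI", 17),            # SCSI0 .. SCSI16
        ("TAPE", 16),            # TAPE0 .. TAPE15
        ("SCANNER", 8),          # SCANNER0 .. SCANNER7
        ("CDROM", 16),           # CDROM0 .. CDROM15
    ]
    names = []
    for prefix, count in ranges:
        names.extend(rf"\\.\{prefix}{i}" for i in range(count))
    return names


def scsi_address_name(adapter, port, path, target, lun):
    """Build the extended name for one device behind a \\.\SCSI class driver."""
    return rf"\\.\SCSI{adapter}Port{port}Path{path}Target{target}Lun{lun}"


print(len(windows_candidates()))
print(scsi_address_name(2, 2, 0, 18, 3))
```

The second call reproduces the example from the text: adapter SCSI2, port 2, path 0, target ID 18, LUN 3.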



Device Polling:

After all devices have been discovered, they will be polled at a configurable interval. If none is supplied, all disks will be polled every 10 minutes. This is the recommended value defined by the S.M.A.R.T. specification. IDE drives are polled first (if LINUX), then the SCSI disks. Tapes and devices with removable media are not polled. In the case of IDE disk drives, SMARTMon requests the status result of the internal S.M.A.R.T. diagnostic registers, which the disk drives themselves update constantly during idle times and I/Os. SMARTMon-UX does NOT instruct the disk to run a diagnostic test at the polling interval. It asks the IDE disk what its S.M.A.R.T. status is at the time of the poll.


If the device is not an IDE disk, SMARTMon-UX instructs the disk drive to read a block of data into the bit bucket to initiate a S.M.A.R.T. error notification. It also checks the SMART log page and temperature pages, if the disk is equipped with them.


If an error is found (which would indicate a degrading condition and impending drive failure), a message is logged in the system log file, /var/log/messages, using the standard UNIX syslog facility. In addition, if EMAIL is enabled and configured on your LINUX system, an email is sent to the address specified. If the operator invoked SMARTMon-UX with the -L option, these messages will be found in the file /var/log/smartmon-ux.


If no errors are found, a S.M.A.R.T. test passed message is logged to syslog as well. All messages contain a time-date stamp and reference smartmon-ux as the program creating the message.
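The selection and ordering rules for a polling pass described in this section (skip tapes and removable media, poll IDE drives before SCSI disks, default to a 10 minute interval) can be sketched as follows. The Device class, the kind labels, and the function name are all invented for this illustration and are not taken from the program.

```python
from dataclasses import dataclass

# Default polling interval when none is supplied: 10 minutes.
DEFAULT_POLL_SECONDS = 10 * 60


@dataclass
class Device:
    name: str
    kind: str              # "ide", "scsi", or "tape" (labels chosen for this sketch)
    removable: bool = False


def polling_order(devices):
    """Return the devices to poll, IDE drives first, then the rest."""
    # Tapes and devices with removable media are never polled.
    pollable = [d for d in devices if d.kind != "tape" and not d.removable]
    ide = [d for d in pollable if d.kind == "ide"]
    others = [d for d in pollable if d.kind != "ide"]
    return ide + others


devices = [
    Device("/dev/sda", "scsi"),
    Device("/dev/st0", "tape"),
    Device("/dev/hda", "ide"),
    Device("/dev/sr0", "scsi", removable=True),
]
print([d.name for d in polling_order(devices)])
```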


SES Enclosure Polling:

If the device is in an SES enclosure (applicable to fibre channel host-attached enclosures only), the program must first determine if it may be used to communicate with the SES electronics embedded in the intelligent enclosure. This must be done because not all disks may have this capability, as defined by the particular make and model of enclosure.


If SMARTMon determines that the selected device cannot communicate with the enclosure, it marks the drive accordingly and does not attempt to communicate with the enclosure through it again.


If the disk can access the SES status registers, the software retrieves them and parses status information. If the status shows there is a problem, the software reports the problem in the manner selected by the software's installer.


SES polling is performed only if the -E option is specified on the command line.


SAF-TE Enclosure Polling:

SAF-TE enclosures will always have a unique SCSI ID and LUN associated with them and appear as a SCSI processor-type device. If SMARTMon determines that a device is a processor-type device, it determines whether it is a SAF-TE enclosure by sending the appropriate commands and parsing the output.


If SMARTMon determines that the selected device is a SAF-TE enclosure, it will mark it as pollable and will poll it if the -E option is specified on the command line. Otherwise the device will not be polled.


SAF-TE polling is performed only if the -E option is specified on the command line.


Threshold Monitoring:

When the program is invoked with the -W option and a corresponding user-defined threshold file, it loads the thresholds into memory so that they do not have to be reloaded. As thresholds are loaded, the program determines the minimum common polling frequency at which to examine them. (See the Threshold Monitoring and Threshold Configuration sections for details.)


At the defined polling period, the program scans through the list of thresholds for a device that needs polling and is on-line.  It issues a Log Sense command to the device for the page holding the required information. The resulting value is compared against the user-defined threshold. If the value read is greater than or equal to the threshold, the appropriate action (email, event log, and/or user-defined script) is taken.


The process continues until all thresholds have been examined. The program sleeps until the next polling period.
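One pass of the threshold scan described above can be sketched like this. It is a minimal illustration, not the program's implementation: the Threshold fields and the read_log_value and is_online callables are hypothetical stand-ins for the real Log Sense request and on-line check, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    device: str
    log_page: int   # log page holding the monitored value
    limit: int      # user-defined threshold


def tripped_thresholds(thresholds, read_log_value, is_online):
    """Return (threshold, value) pairs whose value meets or exceeds the limit."""
    tripped = []
    for t in thresholds:
        # Only devices that are on-line are examined.
        if not is_online(t.device):
            continue
        # Stand-in for issuing Log Sense for the page with the required value.
        value = read_log_value(t.device, t.log_page)
        # Action is taken when the value is greater than or equal to the threshold.
        if value >= t.limit:
            tripped.append((t, value))
    return tripped


# Invented readings keyed by (device, log page), for illustration only.
readings = {("/dev/sda", 0x0D): 55, ("/dev/sdb", 0x0D): 40}
thresholds = [
    Threshold("/dev/sda", 0x0D, 55),
    Threshold("/dev/sdb", 0x0D, 50),
    Threshold("/dev/sdc", 0x0D, 10),
]
hits = tripped_thresholds(
    thresholds,
    read_log_value=lambda dev, page: readings[(dev, page)],
    is_online=lambda dev: dev != "/dev/sdc",
)
print([(t.device, value) for t, value in hits])
```

Note that a reading equal to the limit counts as tripped, and that the off-line device is skipped without being read.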


Windows Service Program Startup:

As of release 1.29, the Windows version of the software can be installed and run as a standard NT service program, which by default is configured to auto-launch at boot time.