Self-Test Diagnostics - ANSI
In release 1.21, we introduced the ability for the user to initiate self-tests. SANtools-specific self-test diagnostics were added in version 1.26. Both have strengths and weaknesses, and you should consider which one (or both) of these tests would be best for you to run in your environment.
This feature is supposed to be available on all SCSI devices, but it has been our experience that it is not supported on everything. Furthermore, the ANSI SCSI specification allows for several types of self tests, both foreground and background, as well as short and extended self tests. As the foreground self-tests suspend all application I/O, we chose not to allow you to enable this feature. We will probably remove this constraint in future releases, depending on customer requests.
Note: This section may be confusing.
Self-tests are complicated, and if you plan on running them on disks that you have mounted file systems on, or on your boot disk, then you need to understand all of this to determine whether or not you really want to run the test this way.
For the impatient: run -sta to abort a self-test, -str to report results, -stsb for the short test (under 2 minutes), and -steb for the long test (normally under 2 hours, but longer if there is a load on the disk). The -stfd command runs the built-in factory default self-test. This test is defined by the manufacturer and generally runs for a minute or so.
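The flag summary above can be captured in a small lookup table. This is a minimal sketch for scripting purposes only: the action names on the left, the build_command helper, and the default program path are our own illustrative choices, and only the five flags themselves come from this section.

```python
import shlex

# Flags as listed in this section; the action names are illustrative only.
SELF_TEST_FLAGS = {
    "abort": "-sta",
    "results": "-str",
    "short": "-stsb",
    "extended": "-steb",
    "default": "-stfd",
}


def build_command(action, device, program="./smartmon-ux"):
    """Return the argument list for one self-test action on one device."""
    if action not in SELF_TEST_FLAGS:
        raise ValueError(f"unknown action: {action!r}")
    return [program, SELF_TEST_FLAGS[action], device]


if __name__ == "__main__":
    # Print the command rather than running it, so nothing touches a disk.
    print(shlex.join(build_command("short", "/dev/sda")))
```

The list form can be handed directly to subprocess.run if you want to execute it from a script.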
The types of self-tests that smartmon-ux supports are non-destructive and operate in the background. Once smartmon-ux instructs your device to begin the test, our program continues processing any other commands you have given it. Your device runs the test independently of smartmon-ux, and the test ends only when it completes, terminates because an error is found, or is aborted by you (via the -sta command).
If you wish to run a self-test on your boot disk, it is best that you bring the system to single-user mode. This is not a requirement, and we have never crashed our O/S running a self-test on the booted device. As system I/O suspends the self-tests, and self-tests temporarily suspend system I/O, the tests will take significantly longer to complete.
We let you run standard, short and extended background self-tests. The SCSI spec. states that the standard self-test is mandatory, and the short and extended self-tests are optional. If your particular device does not support your selected test, the program will notify you after you attempt to initiate the test.
The next paragraphs are paraphrased from the SCSI specifications. They will help you understand what self tests are, what they perform, and how they interact with commands sent from the operating system.
The Short and Extended Self-Tests
The short self-test will run in less than two minutes, and it can be used as a sanity check to confirm whether or not a questionable disk is bad. A goal of the extended self-test routine is to simplify factory testing during integration by having devices perform more comprehensive testing without application client intervention. A second goal of the extended self-test is to provide a more comprehensive test to validate the results of a short self-test, if its results are judged by the application client to be inconclusive.
The criteria for the short self-test are that it has one or more segments and completes in two minutes or less. The criteria for the extended self-test are that it has one or more segments and that the completion time is vendor specific. Any tests performed in the segments are vendor specific.
The following are examples of segments:
The tests performed in the segments may be the same for the short and extended self-tests. The time required by a logical unit (i.e. a SCSI or Fibre Channel device) to complete its extended self-test is reported via a mode page. Our software will report the estimated time to complete the self-test after you initiate the test. Per the SCSI spec, the extended self-test must complete in two hours or less, and the short test must complete in under two minutes. If you do not have time for the device to finish the test, you may always abort the test. The test time is reported by the device and is not an estimate made by our software, so if the number is not accurate, chances are high that background I/O was interacting with the device while the test was running.
Foreground mode (not supported)
When the user sends a command specifying a self-test to be performed in the foreground mode, the device server shall return status for that command after the self-test has been completed. While performing a self-test in the foreground mode, the device server shall respond to all commands except INQUIRY, REPORT LUNS, and REQUEST SENSE with a CHECK CONDITION status, a sense key of NOT READY and an additional sense code of LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS.
If a device server is performing a self-test in the foreground mode and a test segment error occurs during the test, the device server shall update the Self-Test Results log page (reported by smartmon-ux -C) and report CHECK CONDITION status with a sense key of HARDWARE ERROR and an additional sense code of LOGICAL UNIT FAILED SELF-TEST. The application client may obtain additional information about the failure by reading the Self-Test Results log page. If the device server is unable to update the Self-Test Results log page, it shall return a CHECK CONDITION status with a sense key of HARDWARE ERROR and an additional sense code of LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG.
Note that very few disk drives support the foreground mode.
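The sense keys and additional sense codes named above appear on the wire as numbers. The sketch below maps the numeric values to the names used in this section. The numeric values are taken from the standard SCSI sense key and ASC/ASCQ assignments, not from smartmon-ux, and the describe_sense helper is hypothetical.

```python
# Numeric values per the standard SCSI sense key and ASC/ASCQ assignments.
# Only the codes discussed in this section are listed.
SENSE_KEYS = {
    0x2: "NOT READY",
    0x4: "HARDWARE ERROR",
}

ADDITIONAL_SENSE = {
    (0x04, 0x09): "LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS",
    (0x3E, 0x03): "LOGICAL UNIT FAILED SELF-TEST",
    (0x3E, 0x04): "LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG",
}


def describe_sense(sense_key, asc, ascq):
    """Turn a sense key / ASC / ASCQ triple into readable text.

    Codes not in the tables above are shown in hex instead of being guessed.
    """
    key = SENSE_KEYS.get(sense_key, f"sense key {sense_key:X}h")
    detail = ADDITIONAL_SENSE.get((asc, ascq), f"ASC={asc:02X}h ASCQ={ascq:02X}h")
    return f"{key}: {detail}"


if __name__ == "__main__":
    print(describe_sense(0x2, 0x04, 0x09))
    print(describe_sense(0x4, 0x3E, 0x03))
```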
Background mode (supported)
When the self-test runs in the background mode, the device server shall return status for that command as soon as the CDB has been validated. After returning status for the SEND DIAGNOSTIC command specifying a self-test to be performed in the background mode, the device server shall initialize the Self-Test Results log page. While the device server is performing a self-test in the background mode, it shall terminate with a CHECK CONDITION status any self-test command it receives.
When terminating the SEND DIAGNOSTIC command, the sense key shall be set to NOT READY and the additional sense code shall be set to LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS. While performing a self-test in the background mode, the device server shall suspend the self-test to service any other commands received, with the exceptions listed in table 29. Suspension of the self-test to service the command shall occur as soon as practical and shall not take longer than two seconds.
Table 29 — Exception commands for background self-tests [From ANSI Spec]
Device types not listed in this table do not have commands that are exceptions for background self-tests, other than those listed above for all device types.
If one of the exception commands listed in table 29 is received, the device server shall abort the self-test, update the self-test log, and service the command as soon as practical but not longer than two seconds after the CDB has been validated. An application client may terminate a self-test that is being performed in the background mode by issuing a SEND DIAGNOSTIC command with the SELF-TEST CODE field set to 100b (Abort background self-test function). This corresponds to sending the -sta option with smartmon-ux.
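To make the SELF-TEST CODE field concrete, the sketch below builds the six-byte SEND DIAGNOSTIC CDB for each self-test type. The layout (opcode 1Dh, SELF-TEST CODE in bits 7-5 of byte 1, SELFTEST in bit 2) and the code values other than 100b are taken from the SCSI Primary Commands standard rather than from this page. The helper and its names are hypothetical, and nothing here shows how smartmon-ux itself builds its commands.

```python
SEND_DIAGNOSTIC_OPCODE = 0x1D

# SELF-TEST CODE values per the SCSI Primary Commands standard.
SELF_TEST_CODES = {
    "background_short": 0b001,
    "background_extended": 0b010,
    "abort_background": 0b100,
    "foreground_short": 0b101,
    "foreground_extended": 0b110,
}

SELFTEST_BIT = 0x04  # bit 2 of byte 1: requests the default self-test


def build_send_diagnostic_cdb(test):
    """Return the 6-byte CDB for the named self-test (or "default")."""
    if test == "default":
        byte1 = SELFTEST_BIT            # SELF-TEST CODE stays 000b
    elif test in SELF_TEST_CODES:
        byte1 = SELF_TEST_CODES[test] << 5
    else:
        raise ValueError(f"unknown self-test type: {test!r}")
    # Byte 2 is reserved, bytes 3-4 are the parameter list length (zero,
    # since no parameter data is sent), byte 5 is the control byte.
    return bytes([SEND_DIAGNOSTIC_OPCODE, byte1, 0x00, 0x00, 0x00, 0x00])


if __name__ == "__main__":
    for name in ["default", *SELF_TEST_CODES]:
        print(f"{name:20s} {build_send_diagnostic_cdb(name).hex(' ')}")
```

The sketch only constructs the bytes. Delivering them to a device requires an operating-system pass-through interface, which is outside the scope of this example.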
Elements common to foreground and background self-test modes
Although devices record the results of the twenty most recently completed self-tests, smartmon-ux reports only the last three via the -C option, in human-readable text. If you require the results of all twenty tests, you must manually decode the log page hex dump (-A option).
The Self-Test Results log page is page 10 hex. smartmon-ux reports the results and status of the tests based on information from that page.
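If you do need to decode page 10 hex by hand, the following sketch shows one way to do it. It assumes the layout given in the SCSI Primary Commands standard: a 4-byte page header followed by twenty 20-byte parameters, with unused slots zero filled. It takes the raw page bytes as input because the exact format of the -A hex dump is not shown here, so converting that dump to bytes is left to you. The function name and dictionary keys are hypothetical.

```python
import struct

SELF_TEST_RESULTS_PAGE = 0x10
HEADER_SIZE = 4    # page code, reserved byte, 2-byte page length
ENTRY_SIZE = 20    # 4-byte parameter header + 16 bytes of data

RESULT_TEXT = {
    0x0: "completed without error",
    0x1: "aborted by SEND DIAGNOSTIC (self-test code 100b)",
    0x2: "aborted by other means",
    0x3: "unknown error, test could not complete",
    0x4: "failed, segment unknown",
    0x5: "failed in first segment",
    0x6: "failed in second segment",
    0x7: "failed in another segment (see segment number)",
    0xF: "in progress",
}


def parse_self_test_log(page):
    """Decode the recorded entries of a raw Self-Test Results log page."""
    if len(page) < HEADER_SIZE or (page[0] & 0x3F) != SELF_TEST_RESULTS_PAGE:
        raise ValueError("not a Self-Test Results log page")
    (page_length,) = struct.unpack_from(">H", page, 2)
    end = min(len(page), HEADER_SIZE + page_length)
    entries = []
    for offset in range(HEADER_SIZE, end - ENTRY_SIZE + 1, ENTRY_SIZE):
        body = page[offset + 4:offset + ENTRY_SIZE]
        if not any(body):
            continue  # zero-filled slot: no test recorded here
        (parameter,) = struct.unpack_from(">H", page, offset)
        (hours,) = struct.unpack_from(">H", page, offset + 6)
        (first_failure,) = struct.unpack_from(">Q", page, offset + 8)
        entries.append({
            "parameter": parameter,
            "self_test_code": body[0] >> 5,
            "result": RESULT_TEXT.get(body[0] & 0x0F, "reserved"),
            "segment": body[1],
            "powered_hours": hours,
            "first_failure_lba": first_failure,
            "sense_key": body[12] & 0x0F,
            "asc": body[13],
            "ascq": body[14],
            "vendor_specific": body[15],
        })
    return entries


if __name__ == "__main__":
    # Synthetic page for illustration: one short background test that
    # passed at 1769 powered hours, followed by 19 unused slots.
    entry = struct.pack(">HBBBBHQBBBB", 1, 0x03, 0x10, 0x20, 0, 1769,
                        0xFFFFFFFFFFFFFFFF, 0, 0, 0, 0)
    page = bytes([0x10, 0x00, 0x01, 0x90]) + entry + bytes(19 * ENTRY_SIZE)
    for item in parse_self_test_log(page):
        print(item["parameter"], item["result"], item["powered_hours"])
```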
Table 30 - Self-Test Mode Summary (From ANSI Spec)
Let's look at some program output:
Case 1: Initiate a short background self-test for the SCSI disk at /dev/sda
[root@rh90 smartmon]# ./smartmon-ux -stsb /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB) - Initiating short background self-test on SEAGATE ST373307LC at /dev/sda
Terminating program.
The test was launched and the program immediately returned to the command-line prompt. Remember, self-tests are performed by the device directly. Once the command is kicked off, control passes back to the operating system.
Case 2: See what is going on, a few seconds after initiating a self-test
[root@rh90 smartmon]# ./smartmon-ux -str /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB) - Results from last self-test: Short background test in progress
Terminating program.
The test is still running. Let's wait a few minutes and ask for the results again.
[root@rh90 smartmon]# ./smartmon-ux -str /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB) - Results from last self-test: Short background test completed w/o error
Terminating program.
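Rather than re-running -str by hand, a script can poll until the test is no longer in progress. The sketch below classifies the "Results from last self-test" text using only the three phrasings shown on this page; other releases or test types may word it differently. The function names and the state labels are hypothetical. The caller supplies run_str, a function that returns the text printed by one -str invocation (for example, via subprocess).

```python
import re
import time

RESULT_LINE = re.compile(r"Results from last self-test:\s*(?P<text>.+)")


def classify_result(output):
    """Map -str output to "running", "passed", "failed", or "unknown"."""
    match = RESULT_LINE.search(output)
    if match is None:
        return "unknown"
    text = match.group("text")
    if "in progress" in text:
        return "running"
    if "FAILED" in text:
        return "failed"
    if "completed w/o error" in text:
        return "passed"
    return "unknown"


def wait_for_result(run_str, interval_seconds=30, max_polls=10, sleep=time.sleep):
    """Call run_str() until the test stops running or max_polls is reached."""
    state = "unknown"
    for attempt in range(max_polls):
        state = classify_result(run_str())
        if state != "running":
            return state
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return state  # still "running" after the last poll


if __name__ == "__main__":
    # Canned outputs copied from the transcripts on this page.
    canned = iter([
        "Results from last self-test: Short background test in progress",
        "Results from last self-test: Short background test completed w/o error",
    ])
    print(wait_for_result(lambda: next(canned), sleep=lambda seconds: None))
```

Keep the polling interval generous. Every command sent to the drive suspends the background test, so frequent polling makes the test take longer.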
The test completed without any errors. What can be seen from the -C option which reports all log page results? We have truncated part of the output to focus on the part we care about.
[root@rh90 smartmon]# ./smartmon-ux -C /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (Not Enabling SMART)(70007 MB)
Statistical log pages dump below [# of bytes reserved for value in device]:
Logical blocks sent to initiators: 74497749 [4]
...
Self-test (short background): Completed w/o error @ 1769 powered hours
Self-test (short background): Completed w/o error @ 1765 powered hours
Self-test (extended background): Completed w/o error @ 1755 powered hours
The drive had been powered up for 1769 cumulative hours when the test was completed. The cumulative hours figure is reported by the Seagate disk and not some internal timer running on your operating system or our software. Below is what you would see if you initiated the extended test. The software will start the test and tell you how long the drive reports it will take.
[root@rh90 smartmon]# ./smartmon-ux -steb /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB) - Initiating extended (25 minutes) background self-test on SEAGATE ST373307LC at /dev/sda
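A script can pick the drive-reported estimate out of this message to decide when it is worth asking for results. This is a minimal sketch that assumes the "Initiating extended (N minutes)" wording shown above. The estimated_finish helper is hypothetical, and the real finish time will be later if the disk is servicing other I/O.

```python
import re
from datetime import datetime, timedelta

ESTIMATE = re.compile(
    r"Initiating extended \((?P<minutes>\d+) minutes\) background self-test"
)


def estimated_finish(output, started):
    """Return started + the drive-reported duration, or None if absent."""
    match = ESTIMATE.search(output)
    if match is None:
        return None
    return started + timedelta(minutes=int(match.group("minutes")))


if __name__ == "__main__":
    sample = ("(70007 MB) - Initiating extended (25 minutes) background "
              "self-test on SEAGATE ST373307LC at /dev/sda")
    print(estimated_finish(sample, datetime.now()))
```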
Finally, if the self-test failed, you might see something like below:
[root@rh90 smartmon]# ./smartmon-ux -str /dev/sda
SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com
Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (Not Enabling SMART)(70007 MB) - Results from last self-test: Short background test FAILED in segment #0 at Block #00000000 000238CFh @ 21 powered hours
[Drive media failed] Unrecovered read error ASC=11 ASCQ=00, SelfTestByte=00, VendorSpecificByte=E4
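The failure report packs several fields into one message. The sketch below pulls out the segment, the failing block, the powered hours, and the ASC/ASCQ pair so they can be logged or compared between runs. It assumes the wording shown in the example above; the parse_failure helper and its key names are hypothetical, and the pattern tolerates stray spaces inside the hex fields.

```python
import re

FAILURE = re.compile(
    r"FAILED in segment #(?P<segment>\d+) "
    r"at Block #(?P<block>[0-9A-Fa-f ]+)h "
    r"@ (?P<hours>\d+) powered hours"
)
SENSE = re.compile(
    r"ASC=(?P<asc>[0-9A-Fa-f ]+?)\s+ASCQ=(?P<ascq>[0-9A-Fa-f]+)"
)


def parse_failure(output):
    """Extract failure details from -str output, or return None."""
    failure = FAILURE.search(output)
    if failure is None:
        return None
    details = {
        "segment": int(failure.group("segment")),
        # The block is printed as two hex words; join them before converting.
        "block": int(failure.group("block").replace(" ", ""), 16),
        "powered_hours": int(failure.group("hours")),
    }
    sense = SENSE.search(output)
    if sense is not None:
        details["asc"] = int(sense.group("asc").replace(" ", ""), 16)
        details["ascq"] = int(sense.group("ascq"), 16)
    return details


if __name__ == "__main__":
    sample = (
        "Results from last self-test: Short background test FAILED in "
        "segment #0 at Block #00000000 000238CFh @ 21 powered hours "
        "[Drive media failed] Unrecovered read error ASC=11 ASCQ=00, "
        "SelfTestByte=00, VendorSpecificByte=E4"
    )
    print(parse_failure(sample))
```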