Self-Test Diagnostics - ANSI

In release 1.21, we introduced the ability for the user to initiate self-tests. SANtools-specific self-test diagnostics were added in version 1.26. Both have strengths and weaknesses, and you should consider which one (or both) of these tests would be best for you to run in your environment.

 

Before going further, it is important to understand that the various ANSI specifications for peripherals define several types of self-tests. One is mandatory (unless your peripheral is ancient); many are optional. If you send a certain type of self-test to a peripheral that does not support it, then the device is obligated to reject the command. Our software will not tell you ahead of time whether a particular device supports a certain self-test function. We will, however, report whether the command was rejected or accepted. The ANSI self-test specifications define foreground and background self-tests, as well as short and long self-tests that may run for a few seconds to a few hours.

 

Some self-tests, like a foreground test, will lock up your peripheral while they are running. Others will affect performance by only a few percentage points. Per the spec, self-tests can be aborted, and you can request ongoing status at any time. In real-life situations, we have found that some peripherals and firmware revisions do not correctly allow self-tests to be terminated, nor do all of them allow the user to request an update while a test is running. The SCSI spec states that the standard self-test is mandatory, and the short and extended self-tests are optional. If your particular device does not support your selected test, the program will notify you after you attempt to initiate the test.

 

Once smartmon-ux instructs your device to begin the test, our program continues processing any other commands you have given it. Your device runs the test independently of smartmon-ux, and the test ends only when it completes, terminates because an error is found, or is aborted by you (via the -sta command).

 

Self-Tests for Tapes, Autochangers, and everything but Disk Drives

SMARTMonUX will allow you to run the embedded self-tests that manufacturers include in their firmware. A great number of our customers buy our software for nothing more than testing peripherals and tapes on non-Windows operating systems.

 

Self-Tests for Disk and Random-access Devices

If you have SCSI, SAS, or Fibre Channel disks, then there are no constraints (except under Apple OS X, due to lack of pass-through support for SCSI peripherals). If, however, you have ATA or SATA disk drives, then there are limitations under several operating systems. At the time this revision of the manual was placed online, we provide full support for the native ATA/SATA self-tests under Windows only. If you need to perform self-tests of ATA/SATA disks on other operating systems, then please contact us for the status of extending this function to them.

 

SCSI vs. non-SCSI Protocols.

If the selected device is an ATA or SATA disk drive, then the self-test option ends with the letter 'a' (for example, -steba instead of -steb, and -stra instead of -str). For most self-tests, the concepts are the same whether you are testing a SATA disk drive or a SCSI tape, and the commands are nearly the same.

 

If you wish to run a background self-test (-steb, for example) on your boot disk, it is best to bring the system to single-user mode first. This is not a requirement, and we have never crashed our O/S running a background self-test on the booted device. However, because system I/O suspends the self-test, and the self-test temporarily suspends system I/O, the test will take significantly longer to complete on a busy system.

 

What do Self-tests Do?

The next paragraphs are paraphrased from the SCSI specifications. They will help you understand what self tests are, what they perform, and how they interact with commands sent from the operating system.

 

The Short and Extended Self-Tests

The short self-test will run in less than two minutes, and it can be used as a sanity check to confirm whether or not a questionable disk is bad. A goal of the extended self-test routine is to simplify factory testing during integration by having devices perform more comprehensive testing without application client intervention. A second goal of the extended self-test is to provide a more comprehensive test to validate the results of a short self-test, if its results are judged by the application client to be inconclusive.

 

The criteria for the short self-test are that it has one or more segments and completes in two minutes or less. The criteria for the extended self-test are that it has one or more segments and that the completion time is vendor specific. Any tests performed in the segments are vendor specific.

 

The following are examples of segments:

An electrical segment wherein the logical unit tests its own electronics. The tests in this segment are vendor specific, but some examples of tests that may be included are: a buffer RAM test, a read/write circuitry test, and/or a test of the read/write head elements.
A seek/servo segment wherein a device tests its capability to find and servo on data tracks.
A read/verify scan segment wherein a device performs read scanning of some or all of the medium surface.

 

The tests performed in the segments may be the same for the short and extended self-tests. The time required by a logical unit (i.e., a SCSI or Fibre Channel device) to complete its extended self-test is reported via a mode page. Our software will report the estimated time to complete the self-test after you initiate the test. Per the SCSI spec, the extended self-test must complete in two hours or less, and the short test must complete in under two minutes. If you do not have time for the device to finish the test, you may always abort it. The test time is reported by the device and is not an estimate made by our software, so if the number proves inaccurate, chances are high that background I/O was interacting with the device while the test was running.

 

Foreground mode

When the user sends a command specifying a self-test to be performed in the foreground mode, the device server shall return status for that command after the self-test has been completed. While performing a self-test in the foreground mode, the device server shall respond to all commands except INQUIRY, REPORT LUNS, and REQUEST SENSE with a CHECK CONDITION status, a sense key of NOT READY and an additional sense code of LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS.

 

If a device server is performing a self-test in the foreground mode and a test segment error occurs during the test, the device server shall update the Self-Test Results log page (reported by smartmon-ux -C) and report CHECK CONDITION status with a sense key of HARDWARE ERROR and an additional sense code of LOGICAL UNIT FAILED SELF-TEST. The application client may obtain additional information about the failure by reading the Self-Test Results log page. If the device server is unable to update the Self-Test Results log page, it shall return a CHECK CONDITION status with a sense key of HARDWARE ERROR and an additional sense code of LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG.
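The sense data named in the two paragraphs above can be recognized with a small lookup table. The following is a minimal sketch, not part of smartmon-ux: the numeric sense key and ASC/ASCQ values are the ones published in the SCSI Primary Commands (SPC) code tables, and the function name describe_self_test_sense is hypothetical.

```python
# Numeric values as published in the SPC sense key and ASC/ASCQ tables.
NOT_READY = 0x02
HARDWARE_ERROR = 0x04

# (sense key, ASC, ASCQ) -> meaning, for the self-test conditions only.
SELF_TEST_SENSE = {
    (NOT_READY, 0x04, 0x09): "LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS",
    (HARDWARE_ERROR, 0x3E, 0x03): "LOGICAL UNIT FAILED SELF-TEST",
    (HARDWARE_ERROR, 0x3E, 0x04): "LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG",
}


def describe_self_test_sense(sense_key, asc, ascq):
    """Return the self-test meaning of a sense triple, or None if unrelated.

    Hypothetical helper for illustration; it only covers the three
    conditions discussed in the foreground-mode description.
    """
    return SELF_TEST_SENSE.get((sense_key, asc, ascq))
```

A caller that already has decoded sense data could use this to tell "the drive is busy testing itself" apart from a genuine self-test failure.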

 

Note that very few disk drives support the foreground mode.

 

Background mode

When the self-test runs in the background mode, the device server shall return status for that command as soon as the CDB has been validated. After returning status for the SEND DIAGNOSTIC command specifying a self-test to be performed in the background mode, the device server shall initialize the Self-Test Results log page. While the device server is performing a self-test in the background mode, it shall terminate with a CHECK CONDITION status any self-test command it receives.

 

When terminating the SEND DIAGNOSTIC command, the sense key shall be set to NOT READY and the additional sense code shall be set to LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS. While performing a self-test in the background mode, the device server shall suspend the self-test to service any other commands received, with the exceptions listed in Table 29. Suspension of the self-test to service the command shall occur as soon as practical and shall not take longer than two seconds.

 

Table 29 - Exception commands for background self-tests (from ANSI spec)

Device Type                        Command Reference
All device types                   SEND DIAGNOSTIC (with SELF-TEST CODE field set to 100b)
                                   WRITE BUFFER (with the mode set to any download microcode option)
Direct access (i.e., disks)        FORMAT UNIT, START/STOP UNIT
Sequential access (i.e., tapes)    ERASE, FORMAT MEDIUM, LOAD UNLOAD, LOCATE, READ, READ POSITION, READ REVERSE, REWIND, SPACE, VERIFY, WRITE, WRITE BUFFER, WRITE FILEMARKS
Medium Changer                     EXCHANGE MEDIUM, INITIALIZE ELEMENT STATUS, MOVE MEDIUM, POSITION TO ELEMENT, READ ELEMENT STATUS, WRITE BUFFER
 

Device types not listed in this table do not have commands that are exceptions for background self-tests, other than those listed above for all device types.

 

If one of the exception commands listed in Table 29 is received, the device server shall abort the self-test, update the self-test log, and service the command as soon as practical but not longer than two seconds after the CDB has been validated. An application client may terminate a self-test that is being performed in the background mode by issuing a SEND DIAGNOSTIC command with the SELF-TEST CODE field set to 100b (Abort background self-test function). This corresponds to sending the -sta option with smartmon-ux.
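To make the SELF-TEST CODE field concrete, here is a minimal sketch of how a 6-byte SEND DIAGNOSTIC CDB carries it. This is an illustration based on the CDB layout in the SCSI Primary Commands specification (operation code 1Dh, SELF-TEST CODE in bits 7 to 5 of byte 1); it is not a description of how smartmon-ux builds its commands, and the names SELF_TEST_CODES and build_send_diagnostic_cdb are hypothetical.

```python
SEND_DIAGNOSTIC_OPCODE = 0x1D

# SELF-TEST CODE field values from the SPC SEND DIAGNOSTIC description.
SELF_TEST_CODES = {
    "background_short": 0b001,
    "background_extended": 0b010,
    "abort_background": 0b100,
    "foreground_short": 0b101,
    "foreground_extended": 0b110,
}


def build_send_diagnostic_cdb(test):
    """Build a 6-byte SEND DIAGNOSTIC CDB for the named self-test function.

    Hypothetical helper: the PF, SELFTEST, DEVOFFL and UNITOFFL bits and the
    parameter list length are all left at zero, which is what the self-test
    code values above require.
    """
    code = SELF_TEST_CODES[test]
    byte1 = code << 5  # SELF-TEST CODE occupies bits 7..5 of byte 1
    return bytes([SEND_DIAGNOSTIC_OPCODE, byte1, 0x00, 0x00, 0x00, 0x00])
```

For example, build_send_diagnostic_cdb("abort_background") yields the bytes 1D 80 00 00 00 00, which is the abort request described in the paragraph above.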

 

Elements common to foreground and background self-test modes

Although devices report the results of the twenty most recently completed self-tests, smartmon-ux reports only the last 3 self-tests via the -C option, where it presents the results in human-readable text. If you require the results of all 20 tests, you must manually decode the log page hex dump (-A option).

 

The Self-Test Results log page is page 10 hex. Smartmon-ux reports the results and status of the tests based on information from that page.
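If you do need to decode the hex dump by hand, the sketch below shows one way to unpack a single 20-byte self-test results parameter. It assumes the parameter layout given in the SCSI Primary Commands specification for log page 10h (20 such parameters follow a 4-byte page header); the function name, dictionary keys, and result wording are hypothetical, and the layout of the -A dump itself is not covered here.

```python
import struct

# SELF-TEST RESULTS field values, paraphrased from the SPC specification.
RESULT_TEXT = {
    0x0: "completed without error",
    0x1: "aborted by SEND DIAGNOSTIC",
    0x2: "aborted by other means",
    0x3: "unknown error, unable to complete",
    0x4: "failed, failing segment unknown",
    0x5: "first segment failed",
    0x6: "second segment failed",
    0x7: "another segment failed (see self-test number)",
    0xF: "in progress",
}


def parse_self_test_parameter(param):
    """Decode one 20-byte self-test results log parameter (big-endian)."""
    if len(param) != 20:
        raise ValueError("a self-test results parameter is 20 bytes long")
    (code, _control, _length, byte4, number, hours,
     lba, sense, asc, ascq, vendor) = struct.unpack(">HBBBBHQBBBB", param)
    result = byte4 & 0x0F
    return {
        "parameter_code": code,
        "self_test_code": byte4 >> 5,        # e.g. 1 = background short
        "result": result,
        "result_text": RESULT_TEXT.get(result, "reserved"),
        "self_test_number": number,
        "power_on_hours": hours,
        "first_failure_lba": lba,            # all FFh bytes if no failing address
        "sense_key": sense & 0x0F,
        "asc": asc,
        "ascq": ascq,
        "vendor_specific": vendor,
    }
```

Feeding it a synthetic record such as bytes.fromhex("000103102400001500000000000238CF031100E4") gives a background short test (code 1) that failed at LBA 238CFh after 21 powered hours.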

 

Table 30 - Self-Test Mode Summary (from ANSI spec)

Mode: Foreground (not supported with SMARTMon-UX)
  When status is returned: After the self-test is complete.
  How to abort the test: N/A - not supported with smartmon-ux.
  Processing of subsequent commands while self-test is executing: If the command is INQUIRY, REPORT LUNS, or REQUEST SENSE, process normally. Otherwise terminate with CHECK CONDITION status, NOT READY sense key, and LOGICAL UNIT NOT READY, SELF-TEST IN PROGRESS sense code.
  Self-test failure reporting: Terminate with CHECK CONDITION status, HARDWARE ERROR sense key, and LOGICAL UNIT FAILED SELF-TEST or LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG sense code.

Mode: Background (-stsb short test, -steb extended test, -stfd factory default test)
  When status is returned: After the CDB is validated (after -steb, -stfd, or -stsb is issued).
  How to abort the test: Send the -sta command.
  Processing of subsequent commands while self-test is executing: Process the command with up to a 2 second delay.
  Self-test failure reporting: Send the -str command to show just self-test results, or -C to show all log page results in ASCII, or -A to show all log page results in hex.

 

 

Note: See the SANtools scrub functions which also perform self tests. They may be more appropriate for your requirements.

 

 

 

Let's look at some program output:

 

Case 1: Initiate a short background self-test for the SCSI disk at /dev/sda

[root@rh90 smartmon]# ./smartmon-ux -stsb /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB)

- Initiating short background self-test on SEAGATE ST373307LC at /dev/sda

Terminating program.

 

The test was launched and the program immediately returned to the command-line prompt. Remember, self-tests are performed by the device directly. Once the command is kicked off,  control passes back to the operating system.

 

Case 2: See what is going on, a few seconds after initiating a self-test

 

[root@rh90 smartmon]# ./smartmon-ux -str /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB)

- Results from last self-test: Short background test in progress

Terminating program.

 

The test is still running. Let's wait a few minutes and ask for the results again.

 

[root@rh90 smartmon]# ./smartmon-ux -str /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB)

- Results from last self-test: Short background test completed w/o error

Terminating program.
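If you script this wait-and-ask-again cycle, the only part that needs interpreting is the "Results from last self-test" line. The sketch below classifies that line using the wording seen in the release 1.21 transcripts above. It is an illustration only: the function name classify_str_output is hypothetical, and other releases or device types may word the line differently.

```python
import re

# Matches the status line printed by the -str transcripts shown above.
RESULT_LINE = re.compile(r"^- Results from last self-test: (?P<text>.*)$")


def classify_str_output(output):
    """Return 'running', 'passed', 'failed' or 'unknown' for -str output text."""
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            continue
        text = match.group("text")
        if "in progress" in text:
            return "running"
        if "FAILED" in text:
            return "failed"
        if "completed w/o error" in text:
            return "passed"
        return "unknown"
    # No status line found at all.
    return "unknown"
```

A wrapper script could call this on the captured output every few minutes and stop polling once the answer is no longer "running".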

 

The test completed without any errors. What can be seen from the -C option which reports all log page results?  We have truncated part of the output to focus on the part we care about.

 

[root@rh90 smartmon]# ./smartmon-ux -C /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (Not Enabling SMART)(70007 MB)

Statistical log pages dump below [# of bytes reserved for value in device]:

  Logical blocks sent to initiators: 74497749 [4]

  ...

  Self-test (short background): Completed w/o error @ 1769 powered hours

  Self-test (short background): Completed w/o error @ 1765 powered hours

  Self-test (extended background): Completed w/o error @ 1755 powered hours

 

The drive had been powered up for 1769 cumulative hours when the test was completed. The cumulative hours figure is reported by the Seagate disk and not some internal timer running on your operating system or our software. Below is what you would see if you initiated the extended test. The software will start the test and tell you how long the drive reports it will take.

 

[root@rh90 smartmon]# ./smartmon-ux -steb /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

 

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (SMART enabled)(70007 MB)

- Initiating extended (25 minutes) background self-test on SEAGATE ST373307LC at /dev/sda

 

Finally, if the self-test failed, you might see something like below:

 

[root@rh90 smartmon]# ./smartmon-ux -str  /dev/sda

SMARTMon-ux [Release 1.21, Build 26-JUL-2003] - Copyright 2003 SANtools, Inc. http://www.SANtools.com

Discovered SEAGATE ST373307LC S/N "3HZ0381E" on /dev/sda (Not Enabling SMART)(70007 MB)

- Results from last self-test: Short background test FAILED in segment #0 at Block #00000000 000238CFh @ 21 powered hours [Drive media failed] Unrecovered read error ASC=11 ASCQ=00, SelfTestByte=00, VendorSpecificByte=E4
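The failing block is printed as two 8-digit hexadecimal halves. If you want it as a single decimal block number (for example, to compare against other tools), a small helper like the one below will do. This is a sketch that assumes the "Block #xxxxxxxx xxxxxxxx" format seen in this manual's sample output; the function name first_failed_block is hypothetical.

```python
import re

# Two 8-digit hex halves after "Block #", optionally followed by an 'h' suffix.
BLOCK_FIELD = re.compile(r"block #([0-9A-F]{8}) ([0-9A-F]{8})h?", re.IGNORECASE)


def first_failed_block(line):
    """Return the failing block number as an int, or None if the line has none."""
    match = BLOCK_FIELD.search(line)
    if match is None:
        return None
    # Join the high and low halves into one 64-bit hexadecimal value.
    return int(match.group(1) + match.group(2), 16)
```

For the failure shown above, Block #00000000 000238CFh corresponds to decimal block 145615.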

 

 

Self-Test Examples for a SATA Disk Drive

 

Case 1: Initiate an extended background self test, for a SATA disk running on a Windows XP-64 machine, then look at the results. We are using a disk that has 3 known bad blocks on it.

 

E:\Test1>smartmon-ux -steba \\.\PhysicalDrive1

SMARTMon-UX [Release 1.41, Build  1-NOV-2009] - Copyright 2001-2009 SANtools(R), Inc. http://www.SANtools.com

Discovered Maxtor 6L100P0 S/N "L23MTW0G" on \\.\PhysicalDrive1 (SMART Enabled)

The current device temperature is:  43C (109F) degrees

Initiating extended background self-test on Maxtor 6L100P0 S/N "L23MTW0G"

 

Program Ended.

 

Note that this returned immediately. We then queried the drive to see what happened.

 

E:\Test1>smartmon-ux -stra \\.\PhysicalDrive1

SMARTMon-UX [Release 1.41, Build  1-NOV-2009] - Copyright 2001-2009 SANtools(R), Inc. http://www.SANtools.com

Discovered Maxtor 6L100P0 S/N "L23MTW0G" on \\.\PhysicalDrive1 (SMART Enabled)

The current device temperature is:  43C (109F) degrees

 

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 13544 powered hours

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 13544 powered hours

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 12810 powered hours

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 12810 powered hours

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 12810 powered hours

  Self-test (Short offline) completed - FAILED with read error at block #00000000 00016C0F at 12810 powered hours

  Self-test (Extended offline) completed - FAILED with read error at block #00000000 00016C0F at 12809 powered hours

 

Program Ended.

 

This command also returned immediately. We can see that there is a bad block at hex address 00016C0F. We can also see that this same bad block consistently appears in all of the self-tests we ran while creating this section of the manual.

 

Now compare this with the results of running -verify on the same disk. The -verify took nearly 30 minutes, but it returned all 3 bad blocks.

smartmon-ux -verify \\.\PhysicalDrive1

SMARTMon-UX [Release 1.41, Build  1-NOV-2009] - Copyright 2001-2009 SANtools(R), Inc. http://www.SANtools.com

Discovered Maxtor 6L100P0 S/N "L23MTW0G" on \\.\PhysicalDrive1 (SMART Enabled)

 The current device temperature is:  39C (102F) degrees

 

Beginning SANtools read/verify test for Maxtor 6L100P0 at \\.\PhysicalDrive1 (195813072 blocks, blocksize=512)

 

Read/Verify error summary:

 Event#   PowerOnMins   HexBlockNumber   State   Reassignment Status             AdditionalInfo

      0             -            16c0f   ERR     reassign failed, data invalid   Block 93184 ERR/DEV/STAT: 

                                                                                 00/F0/51 Error: DRDY, DSC, ERR

      1             -            219a7   ERR     reassign failed, data invalid   Block 137472 ERR/DEV/STAT: 

                                                                                 00/F0/51 Error: DRDY, DSC, ERR

      2             -            21a19   ERR     reassign failed, data invalid   Block 137728 ERR/DEV/STAT: 

                                                                                 00/F0/51 Error: DRDY, DSC, ERR
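When reading the summary above, note that HexBlockNumber is hexadecimal while the block quoted under AdditionalInfo is decimal. In all three sample rows the decimal figure equals the hex block rounded down to a multiple of 256, which would be consistent with reads issued in 256-block chunks. That chunk size is an inference from these three rows only, not something stated by the program, and the helper name chunk_start is hypothetical.

```python
CHUNK_BLOCKS = 256  # assumption inferred from the sample rows, not documented


def chunk_start(hex_block, chunk_blocks=CHUNK_BLOCKS):
    """Round a hexadecimal block number down to the start of its chunk."""
    block = int(hex_block, 16)
    return block - (block % chunk_blocks)


# The three bad blocks reported in the read/verify error summary.
for hex_block in ("16c0f", "219a7", "21a19"):
    print(hex_block, int(hex_block, 16), chunk_start(hex_block))
```

The first row, 16c0f, is decimal block 93199, the same block the self-tests reported as 00016C0F, and its chunk starts at 93184 as shown in the AdditionalInfo column.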

 

Self-Tests FAQ

Q. What are the dangers of running a self-test?

A. Worst-case scenario, if you kick off a foreground self-test on the disk that your operating system is booted from, then you will crash your O/S, and your disk will be unresponsive until either the self-test completes or you power cycle the disk. Our software does not check for this or warn the operator when such a test is run on the boot disk. Sometimes this is the only thing you can do if you want to run tests on your boot disk. We will not second-guess you or stand in your way.

 

At the conclusion of a self-test, you may have to power cycle the peripheral, especially if you ran a foreground test. Sometimes the host senses that the peripheral went away, so it stops talking to it. Other times the person(s) who wrote the self-test did it in such a way that requires a power cycle.

 

Q. What if the self-test locks up and I have to reboot, how do I know if it completed and get results?

A. The results of self-tests are non-volatile.  Run smartmon-ux -stra or -str, depending on type of peripheral, and it will report the results of the last few self-tests that the device ran.

 

Q. I have a lot of disks that need testing, can I run multiple self-tests concurrently?

A. Absolutely. In fact, if you run the extended background tests, then you can easily test 100 disk drives at the same time with near zero host overhead. The self-tests run inside the selected peripheral's CPU and firmware. Note that some peripherals unfortunately lock up during a self-test, so if this affects your device, then run multiple instances of SMARTMonUX.

 

Q. Can I test tape drives?

A. Yes, absolutely.  We have examples in this section of running self-tests on a cartridge tape drive.  Remember, the self-test is a feature of the firmware.

 

Q. I am having problems running self-tests on USB-attached devices, or some SATA disks. What is wrong?

A. The most common problem with USB and SATA/ATA disks is that the command isn't getting properly translated to the disk. When you hook up an ATA/SATA device to a USB port, a bridge chip translates between the native ATA commands that the disk uses and the SCSI commands that the USB protocol uses. The low-level commands that run and report self-tests are incompatible. Unless the manufacturer of your USB enclosure took great care to properly integrate the necessary translation, it just won't work. The vast majority of external USB devices will NOT do the translation properly. Don't blame them, as they are more concerned with supporting reads and writes. The bottom line is that if you want to perform self-tests on USB-mounted peripherals, then you are going to have to hook them up via a native ATA or SATA controller.

 

There is a similar problem with many of the low-end RAID controllers on motherboards. If your ATA disks appear as SCSI devices, then the RAID controller is performing protocol translation, and its chip may have the same problem. Other RAID vendors get around the problem by providing a proprietary programming interface that allows a developer to encapsulate commands so that they work properly.

 

Q. How does the smartmon-ux -verify differ from a self-test?

A. The -verify command will provide you a full list of unreadable blocks. It will not test electronics, or even make sure that the disk can write anything at all. A self-test, unlike -verify, will terminate on the first bad block. Furthermore, a self-test is not guaranteed to scan all of the media, so it may never even discover that you have a bad block. If you need to determine if you have unreadable data, then use the -verify command. If you need to do full testing of a disk to make sure it is burned in and safe for use, then run both a -verify and a self-test, then follow up with the -dft family of commands to perform some destructive write tests.

 

Q. Can I run self-tests on mounted disk drives?  

A. Background tests, per the specification, are not supposed to prevent your host O/S from concurrently reading from and writing to the disks. We do this all the time on Windows laptops and have never had any problems (this does not mean that it is safe; we are just saying we have not had any problems). However, the safest thing to do before performing tests is to make sure the disks are not mounted. This also allows you to run the potentially more extensive foreground tests. If the disks do not have any data on them, then you can also run destructive tests that verify that the media is OK.

 

SANtools' official policy is to check with your storage vendor to see if it is 'safe' to run self-tests on systems with live data.