Data Integrity Test

Release 1.27 introduces two new destructive integrity tests, -scrubdi and -scrubdiv, which perform a write/read/compare test on every byte of the selected device. The tests apply to SCSI and FC random-access devices, including USB memory sticks and optical R/W media. They are not designed for ATA-family disk drives, and the command will be rejected if you attempt to run it on one.

 

These tests were designed in cooperation with RAID controller and subsystem manufacturers.  The idea was to create a whole-device data integrity test that would reveal any situation where the data read back did not match the data written, or where an I/O did not complete without incident the first time it was tried. The data alignment pattern puts a marker on every block, so you can discover a problem that shifts the data left or right by a few bits or bytes.

 

Typical O/S-assisted read/write tests (such as using dd if=/dev/zero of=/dev/dsk) write the same byte to every location on the target device.  If you are writing zeros to every block on a device, how do you know if anything was skipped, especially if the disk had mostly zeros written to it before you began the test?  That is why the test lets you supply a 4-byte pattern, and why it puts markers in the data that record which block number is supposed to be read or written.

 

Usage

smartmon-ux -scrubdi [-16 | -12] PATTERN SINGLEYN CHUNKSIZE DeviceName

smartmon-ux -scrubdiv [-16 | -12] PATTERN SINGLEYN CHUNKSIZE DeviceName

 

The PATTERN field must be a 4-byte hex value, such as E66EF0F0.  This pattern is repeated throughout the device. With this value, the disk or RW optical media would be written with E6 6E F0 F0 E6 6E ... through to the last byte of the device. The exception is that the last 8 bytes of every block (typically 512 bytes) hold a 64-bit value containing the current block number.  Other things to know about the PATTERN are:

Assuming you have a disk formatted to the standard 512-byte block size, bytes #504 - #511 of the first block of the disk would contain 00 00 00 00 00 00 00 00. The second block would end with 00 01, the next block with 00 02, and so on.
If your disk drive is formatted with 520-byte blocks, then this block number is written to bytes #512 - #519 of every block.
If you want every block of the disk to be zeroed, with the exception of the end-of-block sequence number, then set the PATTERN to 00000000.
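
The block layout described above can be sketched in a few lines of Python. This is an illustration only, not the program's own code: the function name build_block is hypothetical, and the big-endian byte order of the block number is an assumption inferred from the "00 01" and "00 02" endings described above.

```python
import struct

def build_block(pattern_hex: str, block_number: int, block_size: int = 512) -> bytes:
    """Fill one block with the repeating 4-byte pattern, then place the
    64-bit block number in the last 8 bytes (big-endian is an assumption)."""
    pattern = bytes.fromhex(pattern_hex)
    if len(pattern) != 4:
        raise ValueError("PATTERN must be exactly 4 bytes (8 hex digits)")
    payload_len = block_size - 8
    # Repeat the pattern and trim it to the space before the block number.
    payload = (pattern * (payload_len // 4 + 1))[:payload_len]
    return payload + struct.pack(">Q", block_number)

first = build_block("E66EF0F0", 0)
second = build_block("E66EF0F0", 1)
print(first[:6].hex(" "))         # e6 6e f0 f0 e6 6e
print(second[504:512].hex(" "))   # 00 00 00 00 00 00 00 01
```

Passing block_size=520 moves the block number to bytes #512 - #519, matching the 520-byte case above.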

 

The SINGLEYN field controls whether or not the test is done in a single pass.  Enter "Y" to instruct the software to write/read/compare one chunk of blocks, advance the block number, and continue until end-of-disk.  Enter "N" to instruct the software to first write the data to the entire disk sequentially, then do a sequential read/compare. Because of the performance benefits of caching, the single-pass version will generally complete faster. As some users might not want the data to be in cache during the read/compare part of the test, SINGLEYN is provided as an option.
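
The difference between the two modes is only the order of the loops. The sketch below shows that ordering against an in-memory stand-in for a device (io.BytesIO), so it destroys nothing. The names scrub and make_chunk are hypothetical, and the test data is simplified; this is not the program's implementation.

```python
import io

BLOCK_SIZE = 512

def make_chunk(first_block: int, count: int) -> bytes:
    """Stand-in test data: pattern bytes plus the block number in the last 8 bytes."""
    return b"".join(
        b"\xE6\x6E\xF0\xF0" * ((BLOCK_SIZE - 8) // 4) + n.to_bytes(8, "big")
        for n in range(first_block, first_block + count)
    )

def scrub(dev, total_blocks: int, chunk: int, single_pass: bool) -> int:
    """Return the number of chunks whose read-back data did not match."""
    starts = range(0, total_blocks, chunk)

    def write(start):
        n = min(chunk, total_blocks - start)
        dev.seek(start * BLOCK_SIZE)
        dev.write(make_chunk(start, n))

    def verify(start):
        n = min(chunk, total_blocks - start)
        dev.seek(start * BLOCK_SIZE)
        return dev.read(n * BLOCK_SIZE) == make_chunk(start, n)

    mismatches = 0
    if single_pass:              # SINGLEYN = Y: write, then verify, chunk by chunk
        for s in starts:
            write(s)
            mismatches += not verify(s)
    else:                        # SINGLEYN = N: write everything, then verify everything
        for s in starts:
            write(s)
        for s in starts:
            mismatches += not verify(s)
    return mismatches

dev = io.BytesIO(bytes(16 * BLOCK_SIZE))      # 16-block in-memory "device"
print(scrub(dev, 16, 4, single_pass=True))    # 0
print(scrub(dev, 16, 4, single_pass=False))   # 0
```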

 

CHUNKSIZE corresponds to the number of blocks that will be processed in each I/O.  The maximum CHUNKSIZE is 64, which corresponds to a 32KB I/O, assuming the standard 512-byte block size. A larger CHUNKSIZE makes the program run faster, but not every user wants large I/Os. As this is a diagnostic routine rather than a benchmark, the chunk size is left under your control.
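
The arithmetic behind CHUNKSIZE can be checked with a short Python sketch. The helper io_plan is hypothetical, and the lower bound of 1 is an assumption; only the maximum of 64 is stated above.

```python
def io_plan(total_blocks: int, chunksize: int, block_size: int = 512):
    """Return (bytes per I/O, number of I/Os needed for one pass over the device)."""
    if not 1 <= chunksize <= 64:      # the lower bound of 1 is an assumption
        raise ValueError("CHUNKSIZE must be between 1 and 64 blocks")
    bytes_per_io = chunksize * block_size
    ios_per_pass = -(-total_blocks // chunksize)   # ceiling division
    return bytes_per_io, ios_per_pass

# A 512000-block device with 512-byte blocks:
print(io_plan(512000, 64))   # (32768, 8000)
print(io_plan(512000, 1))    # (512, 512000)
```

The test writes the device once and reads it once, so the total number of commands sent is twice the per-pass figure.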

 

The DeviceName must be a single device name. No wild-cards are supported in this release. This is because the test is quite destructive. Future revisions of this software may allow wild-cards if customer requests warrant this flexibility.

 

You may optionally add -12 or -16 to force the test to attempt to use 12- or 16-byte CDBs.  This gives you a way to determine whether both your host machine and the target device support the 12- or 16-byte read and/or write commands.

 

Example

The test below was run on a 256 MB Sony memory stick plugged into a USB port under LINUX.

smartmon-ux -scrubdiv E5F5FF00 Y 1 /dev/sg3

SMARTMon-ux [Release 1.27, Build 21-JUN-2004] - Copyright 2001-2004 SANtools, Inc. http://www.SANtools.com

Discovered Sony Storage Media S/N " " on /dev/sg3 (SMART unsupported)(250 MB)

 

****************************************************************************************
* Warning:  You have instructed the operating system to perform a data integrity       *
*           check on the selected device. No checks will be made to verify that the    *
*           device isn't mounted or in use in any way.                                 *
*                                                                                      *
*           * * *    THIS WILL DESTROY ALL DATA ON THE SELECTED DEVICE    * * *        *
*                                                                                      *
*           The test will write your pattern on every byte of the media, with the      *
*           exception of end-of-block markers in order to perform a data alignment     *
*           test.                                                                      *
*                                                                                      *
*           Please make sure the disk is unmounted before proceeding. This will        *
*           insure that the operating system will not write to the device during       *
*           test which would cause the test to fail.                                   *
*                                                                                      *
****************************************************************************************

 

The selected device is:"Sony Storage Media at /dev/sg3":

Are you sure you want to do this? Answer "YES" to begin, anything else exits program: YES

 

Beginning SANtools data integrity test for Sony Storage Media at /dev/sg3 (512000 blocks, blocksize=512, chunksize=1)

00% (< --- This line is updated after every 1% completion)

Block 0000000Ah Sense: 1/10/00 [Recovered error] CRC or ECC error

Block 0000000Fh Sense: 1/10/00 [Recovered error] CRC or ECC error

100%

 

SANtools data integrity test (Write Phase) completed for Sony Storage Media at /dev/sg3 with 4 Sense Code Events: PASSED-WARNINGS

 Block 0000000A 1/10/00 Count=2 [Recovered error] CRC or ECC error

 Block 0000000F 1/10/00 Count=2 [Recovered error] CRC or ECC error

 

SANtools data integrity test (Verify Phase) completed for Sony Storage Media at /dev/sg3 with 0 Data Validation (Byte) Errors: PASSED

Data Validation Test: PASSED

 

In this case, the device returned several recovered errors during the write phase. The write phase still passed, with the result PASSED-WARNINGS, as all events were recoverable.  If there had been no events, the write phase would have returned the string PASSED.  If there had been any unrecovered errors, the write phase would have returned FAILED.  (Unrecovered errors are marked by returned sense key values of 3, 4, 5, 7, 8, and Bh.)
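
The three possible write-phase results can be summarized as a small decision function. This Python sketch is illustrative. The function name is hypothetical, and treating every non-zero sense key outside the unrecovered list as a warning is an assumption, since only sense key 1 appears in the example.

```python
# Sense keys the text above lists as unrecovered errors.
UNRECOVERED_KEYS = {0x3, 0x4, 0x5, 0x7, 0x8, 0xB}

def write_phase_result(sense_keys) -> str:
    """Map the sense keys seen during the write phase to a result string."""
    events = [k for k in sense_keys if k != 0]   # zero means no sense event
    if not events:
        return "PASSED"
    if any(k in UNRECOVERED_KEYS for k in events):
        return "FAILED"
    return "PASSED-WARNINGS"                     # assumption: all other keys are warnings

print(write_phase_result([]))             # PASSED
print(write_phase_result([1, 1, 1, 1]))   # PASSED-WARNINGS
print(write_phase_result([1, 3]))         # FAILED
```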

 

Frequently Asked Questions

What is this test good for?

The data integrity tests are most useful for storage professionals who want to qualify hardware, test RAID controllers, and ensure data is intact after stressing the storage, such as after a controller or HBA failover test.  System administrators should consider running this test when qualifying hardware. You would not ordinarily run this as part of any scheduled maintenance.

 

What about host overhead?

CPU overhead is generally very low, and I/O overhead is high for the device being tested. One read or write operation is sent per chunk of CHUNKSIZE blocks.

 

Is this a safe operation?

All data is destroyed on the selected device. Use this function wisely.

 

How long does this take?

This could run all night on a large disk drive. If you run the program in verbose mode, with the -scrubdiv flag, then the program will tell you percent complete and remaining time after every 1% of completion.

 

What do data integrity errors look like?

If the data read is not equal to the data written for any byte, then the software will return specifics of the offset, what was written, and what was read back from the device.
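
A byte-by-byte comparison of this kind can be sketched as follows. The function find_mismatches and the printed message format are hypothetical. The program's actual report layout is not shown in this document.

```python
def find_mismatches(written: bytes, read_back: bytes, base_offset: int = 0):
    """Yield (offset, written_byte, read_byte) for every differing byte.
    Buffers are assumed to be the same length; zip() stops at the shorter one."""
    for i, (w, r) in enumerate(zip(written, read_back)):
        if w != r:
            yield base_offset + i, w, r

written = bytes.fromhex("E66EF0F0E66EF0F0")
read_back = bytes.fromhex("E66EF0F0E66EF1F0")   # one corrupted byte
for offset, w, r in find_mismatches(written, read_back):
    print(f"offset {offset:#x}: wrote {w:02X}, read {r:02X}")
# offset 0x6: wrote F0, read F1
```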

 

Notes

These tests make no assumptions about 512-byte block sizes. If the device you wish to test is formatted for 520 or 528 bytes/block, and if your operating system and device drivers have no problems recognizing devices whose blocks are not 512 bytes in size, then the software will work as expected.
Like the -scrub family of commands, these tests are controlled by this software. That means the target device can be any SCSI-family random access device, such as a Read/write DVD, USB memory stick, or disk drive.
In the event of any non-zero sense key for the write phase, the program will record the error and block number, then retry. After two retries, the program continues. Full details about all errors and warnings are returned with -scrubdiv.  If you run the -scrubdi version of the test then you only get totals.
You may add the -16 option to force the test(s) to send 16-byte SCSI READ/WRITE commands rather than the 10-byte versions, or add the -12 option to send the READ(12) and WRITE(12) commands.
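
The retry behavior described in the notes above can be sketched in Python. This is one possible reading, not the program's code: it assumes one initial attempt followed by up to two retries, with every non-zero sense key recorded as an event. The names write_with_retries and fake_write are hypothetical.

```python
def write_with_retries(write_chunk, block, max_retries=2):
    """Call write_chunk(block) until it returns sense key 0 or retries run out.
    Returns the (block, sense_key) events recorded along the way."""
    events = []
    for _attempt in range(1 + max_retries):   # assumed: initial attempt plus two retries
        sense_key = write_chunk(block)
        if sense_key == 0:
            break
        events.append((block, sense_key))
    return events                             # the caller then continues with the next block

# Hypothetical device stub: block 0x0A reports a recovered error (sense key 1) twice.
replies = {0x0A: [1, 1, 0]}

def fake_write(block):
    queue = replies.get(block, [0])
    return queue.pop(0) if queue else 0

print(write_with_retries(fake_write, 0x0A))   # [(10, 1), (10, 1)]
print(write_with_retries(fake_write, 0x0B))   # []
```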