Data Integrity Test
Release 1.27 introduces two new destructive integrity tests, -scrubdi and -scrubdiv. They perform a write / read / compare test on every byte of the selected device. The tests apply to SCSI and FC random-access devices, which includes USB memory sticks and optical R/W media. They are not designed for ATA family disk drives, and the command will be rejected if you attempt to run it on one.
These tests were designed in cooperation with RAID controller and subsystem manufacturers. The idea was to create a whole-device data integrity test that would reveal any situation where the data read back did not match the data written, or where any I/O did not complete without incident the first time it was tried. The data alignment pattern puts a marker on every block, so you can discover a problem that shifts the data left or right by a few bits or bytes.
Typical O/S-assisted read/write tests (such as using dd if=/dev/zero of=/dev/dsk) write the same byte to every location on the target device. If you are writing zeros to every block on a device, how do you know whether anything was skipped, especially if the disk had mostly zeros written to it before you began the test? That is why we designed the test to let you supply a 4-byte pattern, and why we put markers in the data so we know which block number we are supposed to be reading and writing.
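The reasoning above can be illustrated with a small simulation. This is only a sketch, not the tool's implementation: the helper names (plain, marked, detects), the 512-byte block size, the big-endian marker and the two simulated faults are assumptions made for illustration. It models a disk that previously held zeros and checks whether a uniform fill, or a fill that stores the block number in the last 8 bytes of each block, notices a dropped write or two swapped writes.

```python
import struct

BLOCK = 512  # assumed block size


def plain(fill: bytes, n: int) -> bytes:
    """Same bytes in every block (what a dd-style fill produces)."""
    return (fill * BLOCK)[:BLOCK]


def marked(fill: bytes, n: int) -> bytes:
    """Fill bytes, with the block number in the last 8 bytes (assumed big-endian)."""
    return (fill * BLOCK)[:BLOCK - 8] + struct.pack(">Q", n)


def detects(make, fill: bytes, fault: str, blocks: int = 4) -> bool:
    """Write every block through a faulty path, then report whether verify notices."""
    dev = [bytes(BLOCK)] * blocks              # disk previously full of zeros
    for n in range(blocks):
        if fault == "skip" and n == 2:
            continue                            # this write is silently dropped
        # "swap" sends the writes for blocks 1 and 2 to each other's address
        target = {1: 2, 2: 1}.get(n, n) if fault == "swap" else n
        dev[target] = make(fill, n)
    return any(dev[n] != make(fill, n) for n in range(blocks))


pattern = bytes.fromhex("E66EF0F0")
print(detects(plain, b"\x00", "skip"))    # False: zeros over zeros hide the dropped write
print(detects(plain, pattern, "skip"))    # True: the pattern exposes the stale zeros
print(detects(plain, pattern, "swap"))    # False: identical blocks hide the misdirected write
print(detects(marked, pattern, "swap"))   # True: the block-number marker exposes it
```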
Usage
    smartmon-ux -scrubdi [-16 | -12] PATTERN SINGLEYN CHUNKSIZE DeviceName
    smartmon-ux -scrubdiv [-16 | -12] PATTERN SINGLEYN CHUNKSIZE DeviceName
The PATTERN field must be a 4-byte hex value, as in E66EF0F0. This pattern will be repeated throughout the device. If you supplied this value, the disk or RW optical media would be written with E6 6E F0 F0 E6 6E ... until the last byte of the device. The exception is that the last 8 bytes of every block (typically 512 bytes) hold a 64-bit value for the current block number. Other things to know about the PATTERN are:
The SINGLEYN field controls whether or not the test is done in a single pass. Enter "Y" to instruct the software to do the write/read/compare of X blocks, increment the block number, and continue until end-of-disk. Enter "N" to instruct the software to first write the data to the entire disk sequentially, then do a sequential read/compare. Because of the performance benefits of caching, the single-pass version will generally complete faster. Some users might not want the data to be in cache during the read/compare part of the test, so the SINGLEYN flag is provided as an option.
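To make the block layout and the two pass orders concrete, here is a minimal Python sketch that runs against an in-memory bytearray rather than a real device. It is not the tool's implementation. The names make_block and scrub are hypothetical, and several details are assumptions the text does not confirm: the marker is packed big-endian, the pattern restarts at the beginning of each block, and each step of the single pass covers one chunk of blocks.

```python
import struct

BLOCK_SIZE = 512   # assumed block size (the "typical" value)
MARKER_SIZE = 8    # last 8 bytes of each block hold the block number


def make_block(pattern_hex: str, block_no: int) -> bytes:
    """Build one block: repeated 4-byte pattern, then a 64-bit block number."""
    pattern = bytes.fromhex(pattern_hex)
    if len(pattern) != 4:
        raise ValueError("PATTERN must be exactly 4 bytes (8 hex digits)")
    fill_len = BLOCK_SIZE - MARKER_SIZE
    fill = (pattern * (fill_len // 4 + 1))[:fill_len]
    # Assumption: big-endian marker; the real byte order is not documented here.
    return fill + struct.pack(">Q", block_no)


def scrub(device: bytearray, pattern_hex: str, single_pass: bool, chunk: int = 1) -> list:
    """Write, read and compare every block; return the block numbers that mismatch."""
    total = len(device) // BLOCK_SIZE
    bad = []

    def write(start: int, count: int) -> None:
        for n in range(start, start + count):
            device[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = make_block(pattern_hex, n)

    def verify(start: int, count: int) -> None:
        for n in range(start, start + count):
            got = bytes(device[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])
            if got != make_block(pattern_hex, n):
                bad.append(n)

    chunks = [(s, min(chunk, total - s)) for s in range(0, total, chunk)]
    if single_pass:                       # SINGLEYN = Y: write then verify each chunk
        for start, count in chunks:
            write(start, count)
            verify(start, count)
    else:                                 # SINGLEYN = N: write everything, then verify
        for start, count in chunks:
            write(start, count)
        for start, count in chunks:
            verify(start, count)
    return bad


disk = bytearray(8 * BLOCK_SIZE)
print(scrub(disk, "E66EF0F0", single_pass=True, chunk=3))   # [] on a fault-free "device"
print(disk[:6].hex(), disk[BLOCK_SIZE - 8:BLOCK_SIZE].hex())
```

On a real device the difference between the two orders is whether the data just written may still be in cache when it is read back; the in-memory model cannot show that effect, only the ordering.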
CHUNKSIZE corresponds to the number of blocks that will be processed in each I/O. The maximum CHUNKSIZE is 64, which corresponds to a 32 KB I/O assuming the standard 512-byte block size. The larger the CHUNKSIZE, the faster the program runs, but speed is not always the goal: because this is a diagnostic routine rather than a benchmark, we offer the ability to control the chunk size.
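The arithmetic behind CHUNKSIZE can be checked with a few lines of Python. This is an illustrative sketch: io_plan is a hypothetical helper, and the lower bound of 1 is an assumption based on the example run rather than a documented limit.

```python
def io_plan(total_blocks: int, chunksize: int, block_size: int = 512):
    """Return (bytes per I/O, number of I/Os needed for one pass over the device)."""
    if not 1 <= chunksize <= 64:          # 64 is the documented maximum; 1 is assumed
        raise ValueError("CHUNKSIZE must be between 1 and 64")
    io_bytes = chunksize * block_size
    ios_per_pass = -(-total_blocks // chunksize)   # ceiling division
    return io_bytes, ios_per_pass


# A 512000-block device, as in the memory stick example on this page.
print(io_plan(512000, 64))   # (32768, 8000): 32 KB per I/O
print(io_plan(512000, 1))    # (512, 512000): one block per I/O
```

The write phase and the read/compare phase each need that many I/Os, which is why a small CHUNKSIZE makes the run noticeably longer.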
The DeviceName must be a single device name. No wild-cards are supported in this release. This is because the test is quite destructive. Future revisions of this software may allow wild-cards if customer requests warrant this flexibility.
You may optionally add -12 or -16 to force the test to attempt to use 12-byte or 16-byte CDBs. This gives you a way to determine whether both your host machine and the target device support the 12-byte or 16-byte read and write commands.
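If you drive the test from a script, the argument rules above can be checked before anything destructive is launched. The sketch below only builds and validates the argument list; it does not run the program. build_scrub_args and its parameters are hypothetical names, the flags come from the usage text, and the lower bound of 1 for CHUNKSIZE and the set of characters treated as wild-cards are assumptions.

```python
import re


def build_scrub_args(pattern, single_pass, chunksize, device, verbose=False, cdb=None):
    """Validate the fields described above and return the command as a list."""
    if not re.fullmatch(r"[0-9A-Fa-f]{8}", pattern):
        raise ValueError("PATTERN must be a 4-byte hex value such as E66EF0F0")
    if not 1 <= chunksize <= 64:
        raise ValueError("CHUNKSIZE must be between 1 and 64")
    if cdb not in (None, 12, 16):
        raise ValueError("CDB size must be 12 or 16 when given")
    if any(ch in device for ch in "*?["):
        raise ValueError("wild-cards are not supported; name a single device")

    args = ["smartmon-ux", "-scrubdiv" if verbose else "-scrubdi"]
    if cdb is not None:
        args.append(f"-{cdb}")            # optional -12 or -16
    args += [pattern, "Y" if single_pass else "N", str(chunksize), device]
    return args


print(" ".join(build_scrub_args("E5F5FF00", True, 1, "/dev/sg3", verbose=True)))
# smartmon-ux -scrubdiv E5F5FF00 Y 1 /dev/sg3
```

Keep in mind that the program still asks for an interactive "YES" confirmation before it starts, so a wrapper like this is a pre-flight check and not a way to automate the whole run.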
Example
The test below was run on a 256 MB Sony memory stick plugged into a USB port under LINUX.

    smartmon-ux -scrubdiv E5F5FF00 Y 1 /dev/sg3
    SMARTMon-ux [Release 1.27, Build 21-JUN-2004] - Copyright 2001-2004 SANtools, Inc. http://www.SANtools.com
    Discovered Sony Storage Media S/N " " on /dev/sg3 (SMART unsupported)(250 MB)

    ****************************************************************************************
    * Warning: You have instructed the operating system to perform a data integrity        *
    * check on the selected device. No checks will be made to verify that the              *
    * device isn't mounted or in use in any way.                                           *
    *                                                                                      *
    *            * * * THIS WILL DESTROY ALL DATA ON THE SELECTED DEVICE * * *             *
    *                                                                                      *
    * The test will write your pattern on every byte of the media, with the                *
    * exception of end-of-block markers in order to perform a data alignment               *
    * test.                                                                                *
    *                                                                                      *
    * Please make sure the disk is unmounted before proceeding. This will                  *
    * insure that the operating system will not write to the device during                 *
    * test which would cause the test to fail.                                             *
    *                                                                                      *
    ****************************************************************************************

    The selected device is:"Sony Storage Media at /dev/sg3": Are you sure you want to do this?
    Answer "YES" to begin, anything else exits program: YES

    Beginning SANtools data integrity test for Sony Storage Media at /dev/sg3 (512000 blocks, blocksize=512, chunksize=1)
    00% (< --- This line is updated after every 1% completion)
    Block 0000000Ah Sense: 1/10/00 [Recovered error] CRC or ECC error
    Block 0000000Fh Sense: 1/10/00 [Recovered error] CRC or ECC error
    100%

    SANtools data integrity test (Write Phase) completed for Sony Storage Media at /dev/sg3 with 4 Sense Code Events: PASSED-WARNINGS
    Block 0000000A 1/10/00 Count=2 [Recovered error] CRC or ECC error
    Block 0000000F 1/10/00 Count=2 [Recovered error] CRC or ECC error

    SANtools data integrity test (Verify Phase) completed for Sony Storage Media at /dev/sg3 with 0 Data Validation (Byte) Errors: PASSED
    Data Validation Test: PASSED
In this case, the device returned several recoverable errors during the write phase. The test still passed because all events were recoverable, which is why the write phase reported PASSED-WARNINGS. If there had been no events, the test would have returned the string PASSED. If there had been any unrecovered errors, the write phase would have returned FAILED. (Unrecovered errors are marked by returned sense key values of 3, 4, 5, 7, 8, and Bh.)
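The pass/fail rule just described can be written down as a short Python sketch. It is an illustration, not the program's code: write_phase_result is a hypothetical name, the input is assumed to be the key/ASC/ASCQ strings shown in the output above (for example 1/10/00, read as hexadecimal), and treating every sense key outside the unrecovered set as a warning is an assumption.

```python
# Sense keys the text lists as unrecovered errors: 3, 4, 5, 7, 8 and Bh.
UNRECOVERED_KEYS = {0x3, 0x4, 0x5, 0x7, 0x8, 0xB}


def write_phase_result(sense_events):
    """Map a list of 'key/ASC/ASCQ' strings to PASSED, PASSED-WARNINGS or FAILED."""
    keys = [int(event.split("/")[0], 16) for event in sense_events]
    if not keys:
        return "PASSED"                   # no sense code events at all
    if any(key in UNRECOVERED_KEYS for key in keys):
        return "FAILED"                   # at least one unrecovered error
    return "PASSED-WARNINGS"              # only recovered events (assumed for other keys)


print(write_phase_result(["1/10/00"] * 4))   # PASSED-WARNINGS, as in the example run
print(write_phase_result([]))                # PASSED
print(write_phase_result(["1/10/00", "3/11/00"]))   # FAILED
```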
Frequently Asked Questions
What is this test good for?
The data integrity tests are most useful for storage professionals who want to qualify hardware, test RAID controllers, and ensure data is intact after stressing the storage, such as after a controller or HBA failover test. System administrators should consider running this test when qualifying hardware. You would not ordinarily run this as part of any scheduled maintenance.
What about host overhead?
CPU overhead is generally very low, and I/O overhead is high for the device being tested. One read or write operation is sent per chunk of CHUNKSIZE blocks.
Is this a safe operation?
All data is destroyed on the selected device. Use this function wisely.
How long does this take?
This could run all night on a large disk drive. If you run the program in verbose mode with the -scrubdiv flag, the program reports the percent complete and remaining time after every 1% of completion.
What do data integrity errors look like?
If the data read is not equal to the data written for any byte, the software will return specifics of the offset, what was written, and what was read back from the device.
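As a rough illustration of the kind of information involved, the Python sketch below compares a written buffer with the data read back and collects the offset, the written byte and the byte read for each difference. find_mismatches is a hypothetical helper and the printed layout is invented for this example; the program's actual report format is not shown on this page.

```python
def find_mismatches(written: bytes, read_back: bytes, base_offset: int = 0, limit: int = 16):
    """Return up to `limit` tuples of (device offset, byte written, byte read)."""
    found = []
    for i, (w, r) in enumerate(zip(written, read_back)):
        if w != r:
            found.append((base_offset + i, w, r))
            if len(found) >= limit:
                break
    return found


written = bytes.fromhex("E66EF0F0E66EF0F0")
read_back = bytes.fromhex("E66E00F0E66EF0F0")      # third byte came back wrong
for offset, w, r in find_mismatches(written, read_back, base_offset=512):
    print(f"offset {offset:#010x}: wrote {w:02X}, read {r:02X}")
```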