What is S.M.A.R.T. and How Does it Work?


S.M.A.R.T. is an acronym for Self-Monitoring, Analysis and Reporting Technology, an open standard for developing disk drives and software systems that automatically monitor a disk drive's health and report potential problems. Ideally, if a problem is reported, you have enough time to take proactive actions to prevent impending disk crashes.

 

A S.M.A.R.T. drive monitors the internal performance of the motors, media, heads, and electronics of the drive, while our software monitors the overall reliability status of the drive. The reliability status is determined through the analysis of the drive's internal performance level and the comparison of internal performance levels to predetermined threshold limits.
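The threshold comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Attribute` structure, the attribute names, the sample numbers, and the "at or below the threshold means failing" convention are all hypothetical and are not taken from any particular drive or from our software.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One monitored drive attribute (hypothetical structure for illustration)."""
    name: str
    value: int       # current normalized performance level
    threshold: int   # predetermined limit; at or below it the attribute fails


def reliability_status(attributes):
    """Compare every attribute to its threshold and summarize the result."""
    failing = [a.name for a in attributes if a.value <= a.threshold]
    return ("FAILING" if failing else "OK", failing)


# Made-up sample readings: the second attribute has dropped below its limit.
sample = [
    Attribute("spin_up_time", value=97, threshold=21),
    Attribute("reallocated_sectors", value=5, threshold=36),
]
status, failed = reliability_status(sample)
print(status, failed)
```

A single attribute crossing its limit is enough to mark the whole drive as failing in this sketch; a real drive may weigh its measurements differently.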

 

How Does S.M.A.R.T. Work?

Part of what makes the S.M.A.R.T. system possible is that disk drive reliability has been intensely studied for many years. Manufacturers spend billions of dollars researching how vital areas of disk drives change over time and across operating environments. By analyzing this data, they can define performance thresholds that correlate with imminent failures.

 

SMART Disk Monitor turns on this capability, interacts with it, and reports these conditions to the system administrator.

 

Mode Page 1C Settings

All SCSI, Fibre Channel, SSA, and SAS disks allow an application to configure S.M.A.R.T. behavior by making changes in mode page 1C. Because these changes affect how the disk responds to I/O requests when it triggers a SMART condition, we describe the settings we use and our rationale for them.

 

| ANSI-Defined Field Name | Description | SMARTMon-UX Setting | Notes |
|---|---|---|---|
| PERF | Performance bit | 0 (configurable with the -P option) | Enable this for high-throughput systems, e.g., video streaming. The disk drive will prioritize application I/O over SMART diagnostics. |
| EWASC | Enable Warning bit | 1 | If the disk supports this bit, it is set to 1; otherwise it is 0. |
| DEXCPT | Disable Exception bit | 0 | 0 turns SMART on; 1 turns SMART off. Use the -p flag to turn SMART off. |
| MRIE | Method of Reporting Informational Exceptions | 6 (if 6 is not supported, it tries 4, then 3) | Setting MRIE to 6 is preferred, because a SMART alert is then sent only in response to a request for it. An MRIE of 4 means the disk unconditionally generates a CHECK CONDITION (recovered error) sense error on I/Os if the disk becomes degraded and SMART kicks in. An MRIE of 3 conditionally generates the same errors, depending on a mode page setting. MRIE values of 3 and 4 have higher overhead, because log pages must be updated once a SMART alert kicks in, but 6 is not supported on all disk drives. |
| Interval Timer | Period between subsequent SMART error messages | 10 minutes by default, unless the -F option is used to poll more frequently | The original ANSI spec draft that describes SMART suggested a 10-minute polling interval. The delay with PERF off is typically under 400 ms, and under 150 ms with PERF on. Consult your disk drive vendor's documentation for specific timing values. |
| Report Count | Number of times to report SMART status per interval | 0 | There is no limit to the number of times SMART is reported in response to a query. |
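To make the settings above concrete, the following sketch packs them into a 12-byte image of mode page 1C and picks an MRIE value in the 6, 4, 3 order of preference. It is not how SMARTMon-UX is implemented. The byte layout (PERF in bit 7, EWASC in bit 4, and DEXCPT in bit 3 of byte 2; MRIE in the low four bits of byte 3; a four-byte interval timer counted in 100 ms units; a four-byte report count) reflects our reading of the SCSI Primary Commands specification and should be checked against the spec and your drive's manual. The function names `choose_mrie` and `build_page_1c` are hypothetical.

```python
import struct

PAGE_CODE = 0x1C             # Informational Exceptions Control mode page
PAGE_LENGTH = 0x0A           # bytes that follow the 2-byte page header
MRIE_PREFERENCE = (6, 4, 3)  # order of preference from the settings above


def choose_mrie(supported):
    """Return the first preferred MRIE value the drive supports, else None."""
    for mrie in MRIE_PREFERENCE:
        if mrie in supported:
            return mrie
    return None


def build_page_1c(perf=False, ewasc=True, dexcpt=False, mrie=6,
                  interval_minutes=10, report_count=0):
    """Pack the fields into a 12-byte page image (assumed layout, see above)."""
    byte2 = (perf << 7) | (ewasc << 4) | (dexcpt << 3)
    byte3 = mrie & 0x0F
    interval_units = interval_minutes * 60 * 10   # assumed 100 ms timer units
    # Big-endian: four single bytes, then two 32-bit unsigned integers.
    return struct.pack(">BBBBII", PAGE_CODE, PAGE_LENGTH, byte2, byte3,
                       interval_units, report_count)


# Example: a drive that (hypothetically) supports all three MRIE methods.
page = build_page_1c(mrie=choose_mrie({3, 4, 6}))
print(page.hex())
```

The sketch only builds the page contents. Actually applying them requires sending the page to the drive with a MODE SELECT command, which is not shown here.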