The following section explains the Drive Protection Features in greater detail. You may wish to
skip this section and proceed to the Procedures for Synchronization and Data Scrubbing in
the next section.
Disk drives manufactured today can store over 10 times the data of drives
manufactured just 5 years ago. In order to achieve these capacities, the read/write heads must fly
lower, the media data rates must be much higher, and the data tracks must be located much
closer together on the platters than in older drives. All these changes reduce the margin for errors
and make the drives more sensitive to damage due to handling, particularly in the case of
hot-swappable drives which receive far more handling than those which are hard mounted inside
a system. Handling damage can occur if a drive is dropped, even from less than one inch onto
a surface. For these reasons, it is important for drives to recognize and recover from certain
types of drive problems.
Remapping Bad Sectors
When data is first written to the hard drive, the write process will check the drive to ensure the
media quality is good enough to safely store the data. Minor damage that shows up over time is
commonly called a sector media error. Sector media errors usually affect only a single 512-byte
block of data on the disk. This sector can be marked as bad and the location reassigned or
'remapped' to a spare sector of the drive. Most drives reserve one spare sector per track of data
and can perform this operation automatically.
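As a rough illustration of the remapping idea, the following Python sketch models one track with a single spare sector. The Track class, its method names, and the sector counts are hypothetical and chosen only for illustration; real drives perform this operation in firmware.

```python
class Track:
    """Toy model of one track: numbered data sectors plus reserved spares."""

    def __init__(self, data_sectors=8, spare_sectors=1):
        # Spare sectors sit after the data sectors on this hypothetical track.
        self.free_spares = list(range(data_sectors, data_sectors + spare_sectors))
        self.remap_table = {}  # logical sector -> spare physical sector

    def physical_sector(self, logical):
        """Return where a logical sector actually lives on the track."""
        return self.remap_table.get(logical, logical)

    def mark_bad(self, logical):
        """Reassign a bad sector to a spare; fail if no spare remains."""
        if not self.free_spares:
            raise RuntimeError("no spare sector left on this track")
        self.remap_table[logical] = self.free_spares.pop(0)
        return self.remap_table[logical]


track = Track()
track.mark_bad(3)
print(track.physical_sector(3))  # sector 3 now resolves to the spare
print(track.physical_sector(2))  # healthy sectors are unaffected
```

With only one spare per track in this model, a second bad sector on the same track raises an error, which is a simplification of what a drive would do once its spares are exhausted.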
Error Correction Code (ECC)
By remapping bad sectors, the drive avoids potential problems by using only 'reliable' sections
of the disk. What happens if a media problem develops after the data has been written? When
an area of the disk is being read, most drives can correct minor sector media errors automatically
by using error correction code (ECC) information stored along with the data; the corrected data
is then rewritten to the disk. If the sector is badly damaged and the data cannot be reliably
rewritten to the same spot, the drive will remap the data to a spare sector on the disk. If the sector
is very badly damaged, the drive may not be able to recreate the data automatically with the
ECC. If no other protection (such as RAID) is in place, the system will report a read failure and
the data will be lost. These lost data areas are typically reported to the user via operating system
messages.
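To make the idea of correcting data from stored check information concrete, here is a minimal Python sketch using a Hamming(7,4) code. This is an assumption made for illustration only: the text does not say which code drives use, and production drives rely on much stronger codes. The function names encode and decode are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data bits, corrected position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the flipped bit
    if position:
        c[position - 1] ^= 1  # repair the single flipped bit
    return [c[2], c[4], c[5], c[6]], position


stored = encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a minor media error on one bit
data, fixed_at = decode(stored)
print(data, fixed_at)
```

A toy code like this repairs only one flipped bit per codeword; two flipped bits would be miscorrected. That mirrors, in miniature, why a very badly damaged sector can exceed what the ECC is able to recover.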
Predictive Failure Analysis
As with any electrical/mechanical device, there are two basic failure types:
The first type of failure is the gradual performance degradation of components that can ultimately
lead to a catastrophic drive failure. Predictive Failure Analysis (PFA) has been developed to monitor
performance of drives, analyze data from periodic internal measurements, and recommend
replacement when specific thresholds are exceeded. The data from periodic internal
measurements is collected whenever data sectors are actually accessed. Data Scrubbing,
which forces all data sectors to be read, provides more data to improve the accuracy of PFA. The
thresholds have been determined by examining the history logs of drives that have failed in actual
customer operation. When PFA detects that a threshold has been exceeded, the system administrator
can be notified through Netfinity Manager 5.0. The design goal of PFA is to provide a minimum
of 24 hours warning before a drive experiences 'catastrophic' failure.
Second, there is the on/off type of failure. A cable breaking, a component burning out, and a solder
connection failing are all examples of unpredictable catastrophic failures. As assembly
and component processes have improved, these types of defects have been reduced but not
eliminated. PFA cannot always provide warning for on/off unpredictable failures.
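The threshold check described for the first, gradual type of failure can be sketched as follows. This is a minimal Python sketch and not the actual PFA implementation: the FailurePredictor class, the attribute names, and the threshold values are all hypothetical, since the text does not list the measurements or limits that PFA uses.

```python
from dataclasses import dataclass, field


@dataclass
class FailurePredictor:
    """Compare the latest internal measurements against fixed thresholds."""

    thresholds: dict  # hypothetical attribute name -> highest tolerated value
    latest: dict = field(default_factory=dict)

    def record(self, attribute, value):
        """Store the most recent measurement for one attribute."""
        self.latest[attribute] = value

    def exceeded(self):
        """Return the sorted names of attributes above their threshold."""
        return sorted(
            name
            for name, value in self.latest.items()
            if name in self.thresholds and value > self.thresholds[name]
        )

    def recommend_replacement(self):
        """Recommend replacement once any threshold is exceeded."""
        return bool(self.exceeded())


# Illustrative limits only; real thresholds come from field failure history.
monitor = FailurePredictor({"reassigned_sectors": 10, "read_retries": 100})
monitor.record("reassigned_sectors", 3)
monitor.record("read_retries", 150)
print(monitor.exceeded(), monitor.recommend_replacement())
```

A check of this kind can only react to values that drift over time, which is consistent with the point above that on/off failures may arrive without any warning.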