Preventive Measures to Help Obtain High Availability
IBM recommends the following precautions in order to help obtain high availability of the RAID
subsystem:
Define a Hot Spare
Defining a hot spare (HSP) drive minimizes the length of time a server operates with degraded
performance after a drive becomes defunct. The hot spare also allows the 'inconsistent' drive to be
easily recognized when multiple drives are marked defunct, so that recovery procedures
require much less technical expertise. The section below explains this advantage in greater
detail:
Hot Spare Advantages
When a drive in a system becomes defunct (DDD), data is no longer written to that drive, but data
is still written to the other drives in the array. The DDD drive therefore becomes 'inconsistent'
with the rest of the drives in the array. When multiple drives appear DDD, the first and most
critical task is identifying the 'inconsistent' drive correctly. The 'inconsistent' drive must be
the last drive replaced since it requires rebuilding (and, if truly defective, may need physical
replacement). If the 'inconsistent' drive is software replaced (see Software Replace vs. Physical
Replace) first when a multiple DDD failure occurs, the 'inconsistent' data will be used to rebuild
another drive. This eventually corrupts the other drives (and data) on the system.
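The effect of replacing drives in the wrong order can be illustrated with a small sketch. It
assumes a single three-drive stripe protected by XOR parity (RAID level 5 style); the text above
does not name a RAID level, and the helper names (xor_blocks, rebuild) and block contents are
hypothetical, chosen only for illustration.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together (assumed RAID 5 style parity)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild(*surviving_blocks):
    """Reconstruct the one missing block of a stripe from all the others."""
    return xor_blocks(*surviving_blocks)


# Healthy stripe: two data blocks plus parity.
drive0 = b"AAAA"
drive1 = b"BBBB"
parity = xor_blocks(drive0, drive1)

# Drive 0 becomes defunct. A later write changes the logical contents of
# block 0, but only the parity drive can record it; drive 0 keeps stale data.
current_block0 = b"ZZZZ"
parity = xor_blocks(current_block0, drive1)  # drive0 still holds b"AAAA"

# Drive 1 now also appears defunct, although its contents are current.

# Correct order: bring drive 1 back first, rebuild the 'inconsistent' drive last.
good_rebuild = rebuild(drive1, parity)

# Wrong order: bring the stale drive 0 back first, then rebuild drive 1 from it.
bad_rebuild = rebuild(drive0, parity)

print(good_rebuild == current_block0)  # rebuilt block 0 matches the current data
print(bad_rebuild == drive1)           # rebuilt drive 1 no longer matches its data
```

In this sketch the stale block on drive 0 feeds directly into the reconstruction of drive 1, which
is the corruption the paragraph above warns about.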
However, when an HSP is defined, you are protected from rebuilding another drive from an
'inconsistent' drive. This is because of the way the RAID Adapter marks the states of drives.
When a system has a defined HSP, as soon as the HSP takes over for the DDD drive, the RAID
Adapter marks the DDD drive in its configuration as the HSP drive. The adapter does not
change the displayed status of the drive to HSP. However, if you perform a software replace or
physical replace, the RAID Adapter starts the drive and changes its state from DDD to HSP. The
RAID Adapter does not allow this drive to be brought back to online (ONL) status.
When the HSP takes over for the DDD drive, the HSP is rebuilt to replace the DDD drive. While the
HSP drive is being rebuilt, it appears in the offline (OFL) state. The OFL state changes to ONL
once this drive is completely rebuilt and has fully taken over for the DDD drive. The DDD drive
remains DDD.
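The state changes described above can be summarized in a small model. This is only a sketch of the
behavior as described in this section, not adapter firmware or an IBM API; the class name
AdapterModel, its methods, and the bay names are hypothetical.

```python
from enum import Enum


class State(Enum):
    ONL = "online"
    OFL = "offline"
    DDD = "defunct"
    HSP = "hot spare"


class AdapterModel:
    """Hypothetical model of the drive states described in this section."""

    def __init__(self, members, hot_spare):
        # States as displayed to the user.
        self.displayed = {bay: State.ONL for bay in members}
        self.displayed[hot_spare] = State.HSP
        # Drives marked as HSP in the configuration only (not displayed).
        self.marked_hsp = set()

    def drive_fails(self, bay):
        """Mark a drive DDD; if a hot spare exists, it starts rebuilding."""
        self.displayed[bay] = State.DDD
        spare = next(
            (b for b, s in self.displayed.items() if s is State.HSP), None
        )
        if spare is not None:
            self.displayed[spare] = State.OFL  # OFL while being rebuilt
            self.marked_hsp.add(bay)           # displayed state stays DDD
        return spare

    def rebuild_complete(self, spare):
        """The rebuilt hot spare goes from OFL to ONL."""
        if self.displayed[spare] is State.OFL:
            self.displayed[spare] = State.ONL

    def replace(self, bay):
        """Software or physical replace of a DDD drive that an HSP took over for."""
        if bay not in self.marked_hsp:
            raise ValueError("this sketch only models drives an HSP took over for")
        self.marked_hsp.discard(bay)
        self.displayed[bay] = State.HSP  # becomes the new hot spare, never ONL


model = AdapterModel(["bay1", "bay2", "bay3"], hot_spare="bay4")
spare = model.drive_fails("bay2")
model.rebuild_complete(spare)
model.replace("bay2")
print({bay: state.name for bay, state in model.displayed.items()})
```

Because a replaced drive can only come back as HSP in this model, its stale data can never be used
as a source for rebuilding another drive.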
If an HSP is not defined, or multiple drives appear DDD before the HSP is completely rebuilt, this
protection does not apply. You must read the RAID log to determine the 'inconsistent' drive. Then,
for the IBM SCSI-2 F/W PCI-Bus RAID Adapter and the IBM F/W Streaming RAID Adapter/A,
you must ensure that the software replace option is selected on each drive bay in the correct order
such that the 'inconsistent' drive is brought online last and rebuilt.
If an HSP drive was defined but did not complete the rebuild, then it is much easier to identify
the 'inconsistent' drive. The 'inconsistent' drive will remain in OFL status.
When multiple drives appear defunct, as long as the logical drive is not in the OFL state, the user
may select the Replace option to change the state of any of the DDD drives. Order does not
matter for logical drives in the critical (CRT) state because the 'inconsistent' drive will appear
as OFL or DDD to the user. If the logical drive is in the OFL state, the user may attempt to
recover by identifying the 'inconsistent' drive, software replacing all drives except the
'inconsistent' drive, and then rebuilding the 'inconsistent' drive.
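The recovery order for a logical drive in the OFL state can be written down as a short planning
function. This is a sketch of the ordering rule only; the function name plan_offline_recovery, the
step labels, and the bay names are hypothetical, and the 'inconsistent' drive must already have
been identified (for example, from the RAID log).

```python
def plan_offline_recovery(ddd_drives, inconsistent):
    """Return recovery steps with the 'inconsistent' drive handled last.

    ddd_drives   -- bays of all drives currently shown as DDD
    inconsistent -- the bay identified as the 'inconsistent' drive
    """
    if inconsistent not in ddd_drives:
        raise ValueError("the 'inconsistent' drive must be one of the DDD drives")

    # Software replace every defunct drive except the 'inconsistent' one.
    steps = [
        ("software replace", bay) for bay in ddd_drives if bay != inconsistent
    ]
    # Only then rebuild the 'inconsistent' drive from the others.
    steps.append(("rebuild", inconsistent))
    return steps


for action, bay in plan_offline_recovery(["bay1", "bay3", "bay5"], "bay3"):
    print(action, bay)
```

Keeping the rebuild as the final step ensures the stale drive is only ever a rebuild target and is
never used as a source of data for another drive.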