IBM recommends the following precautions to help maintain high availability of the RAID
subsystem:
Define a Hot Spare
Defining a hot spare drive minimizes the length of time a server operates with degraded
performance after a drive becomes defunct. The hot spare also allows the 'inconsistent' drive to
be easily recognized if multiple drives are marked defunct, so that recovery procedures
require much less technical expertise. The section below explains this advantage in greater
detail:
Hot Spare Advantages
When a drive in a system becomes defunct (DDD), data is no longer written to that drive, but data
continues to be written to the other drives in the array. The DDD drive therefore becomes
'inconsistent' with the rest of the drives in the array. When multiple drives appear DDD, the first
and most critical task is identifying the 'inconsistent' drive correctly. The 'inconsistent' drive
must be the last drive replaced, since it requires rebuilding (and, if truly defective, may need
physical replacement). If the 'inconsistent' drive is software replaced (see Software Replace vs.
Physical Replace) first when a multiple DDD failure occurs, the 'inconsistent' data will be used to
rebuild another drive. This eventually corrupts the other drives (and data) on the system.
However, when a hot spare (HSP) is defined, you are protected from rebuilding another drive from an
'inconsistent' drive. This is because of the way the RAID adapter marks the states of drives.
When a system has a defined HSP, as soon as the HSP takes over for the DDD drive, the RAID
adapter marks the DDD drive as a defunct hot spare (DHS) drive in its configuration. If you
perform a software replace or physical replace of this DHS drive, the RAID adapter starts the
DHS drive and changes its state from DHS to HSP. The RAID adapter does not allow this drive
to be brought back to ONL (online) status.
When the HSP takes over for the DDD drive, the HSP is rebuilt to replace the DDD drive. While
the HSP drive is rebuilding, it appears in the RBL state. The RBL state changes to ONL once
this drive is completely rebuilt and has fully taken over for the drive that is now DHS.
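The state changes described above can be summarized as a small state model. The following Python sketch is illustrative only: the function names and the dictionary layout are assumptions made for this example and do not correspond to any ServeRAID tool or interface. Only the state abbreviations (ONL, DDD, HSP, DHS, RBL) come from the text.

```python
# Illustrative model only. Everything except the state abbreviations is hypothetical.
ONL, DDD, HSP, DHS, RBL = "ONL", "DDD", "HSP", "DHS", "RBL"


def hot_spare_takeover(states, failed, spare):
    """Return new states after the hot spare takes over for a defunct drive."""
    if states[failed] != DDD or states[spare] != HSP:
        raise ValueError("need a DDD drive and an HSP drive")
    updated = dict(states)
    updated[failed] = DHS  # the defunct drive is marked as a defunct hot spare
    updated[spare] = RBL   # the hot spare is rebuilt in its place
    return updated


def rebuild_complete(states, drive):
    """Return new states once a rebuilding drive is fully rebuilt (RBL to ONL)."""
    if states[drive] != RBL:
        raise ValueError("drive is not rebuilding")
    updated = dict(states)
    updated[drive] = ONL
    return updated


def replace_drive(states, drive):
    """Software or physical replace of a DHS drive: it becomes HSP, never ONL."""
    if states[drive] != DHS:
        raise ValueError("drive is not a defunct hot spare")
    updated = dict(states)
    updated[drive] = HSP
    return updated
```

Because `replace_drive` has no path from DHS to ONL, the model cannot rebuild another drive from the stale contents of the original defunct drive, which is the protection the hot spare provides.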
If an HSP is not defined and multiple drives appear DDD, then determining the
'inconsistent' drive is more difficult. You must read the RAID log, generated by IPSMON,
to determine the 'inconsistent' drive. The 'inconsistent' drive is the drive that went DDD first.
To examine this process in a little more depth, consider the following points. When the first drive
appears DDD, the operating system remains operational with the remaining drives. It writes to all
the other drives in the array except for the first DDD drive. When the second drive goes DDD, the
operating system is no longer functional and does not write to any drives. Because the RAID
log, generated by IPSMON, can only be written while the operating system is operational, the first
drive logged as DDD must by default be the 'inconsistent' drive. To rectify this situation, you must
change the 'consistent' drives from DDD to ONL by using the Set Device State option and ensure that
the 'inconsistent' drive is the one you rebuild.
If an HSP drive is defined but did not complete the rebuild, then it is much easier to identify the
'inconsistent' drive. The 'inconsistent' drive remains in RBL status. The DDD drive appears
with DHS status.
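The two identification rules above (an incomplete hot spare rebuild, or no hot spare at all) can be sketched in code. This is a hedged illustration, not an IBM procedure: the `events` list of `(sequence, drive, state)` tuples is a hypothetical stand-in for information transcribed by hand from the RAID log and does not reflect the actual IPSMON log format. The function names are likewise invented for this example.

```python
def find_inconsistent_drive(states, events=()):
    """Return the 'inconsistent' drive, or None if it cannot be determined.

    states: dict mapping a drive name to its current state abbreviation.
    events: hypothetical (sequence, drive, state) tuples taken from the RAID log,
            where a lower sequence number means an earlier entry.
    """
    # Hot spare defined but rebuild incomplete: the drive still in RBL is inconsistent.
    rebuilding = [drive for drive, state in states.items() if state == "RBL"]
    if rebuilding:
        return rebuilding[0]
    # No hot spare: the drive that went DDD first is inconsistent.
    defunct_events = [(seq, drive) for seq, drive, state in events if state == "DDD"]
    if defunct_events:
        return min(defunct_events)[1]
    return None


def drives_to_set_online(states, inconsistent):
    """Return the 'consistent' DDD drives, which are set back to ONL before rebuilding."""
    return sorted(
        drive for drive, state in states.items()
        if state == "DDD" and drive != inconsistent
    )
```

For example, with two DDD drives and log entries showing that `d1` failed before `d2`, the sketch reports `d1` as the drive to rebuild and `d2` as the drive to set back to ONL.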
Install and Use NetFinity Manager
You should install NetFinity Manager 5.0 or later in order to monitor the RAID array
remotely. Netfinity Manager can schedule data scrubbing to occur at any time of
day, so synchronization of the RAID array can be scheduled for off-peak hours and does not
require user input to start. With NetFinity services installed on the server and
NetFinity Manager installed on a workstation, the RAID array can be monitored, and even
synchronized, from a remote location. The system can also be configured to send alert messages
regarding the RAID subsystem over the network to the workstation. You can even set up
NetFinity Manager to page someone, e.g., the network administrator or a service technician, if a
certain alert condition is reached. NetFinity Manager can also perform many other functions, such
as monitoring processor utilization, monitoring critical files, and detecting installed software
across the network. Netfinity is also used to capture Predictive Failure Analysis (PFA) alerts
from hard files and then send system alerts to the appropriate parties.
To use Netfinity 5.0 to schedule data scrubbing, download NF50RAID.EXE (available at
http://www.us.pc.ibm.com/files.html). This file contains updated Netfinity program files that
are required for scheduling data scrubbing on controllers with the write policy set to
write-back cache. When installed with the NetFinity Manager code, the update affects the
following operating systems: OS/2, Windows NT, and Windows 95.
Data Scrub Drives Weekly
One of the best ways to recognize potential disk media problems in advance, and correct them
before a failure occurs, is Data Scrubbing (done in the background by the ServeRAID II
Adapter with firmware 2.30 or higher). Sector media errors can be identified and corrected
simply by forcing all data sectors in the array to be accessed through Data Scrubbing. Data
Scrubbing checks all data sectors in the array and should be performed weekly. With the IBM
ServeRAID and ServeRAID II Adapters, an easy way to accomplish Data Scrubbing is
synchronization. Data Scrubbing forces all sectors of the drives contained in the array to be
read in the background while allowing concurrent user disk activity. Netfinity Manager 5.0
allows you to automatically schedule synchronization from either the server or a remote
manager. Netfinity Manager 5.0 can be obtained at no additional charge by customers who have
purchased an IBM server that ships with ServerGuide. If the customer has another type of
scheduler, such as the AT scheduler built into Windows NT, then the IBM ServeRAID Adapter's
IPSSEND command line utility may be used to schedule Data Scrubbing
without Netfinity Manager installed. The IPSSEND utility is available on the ServeRAID
Supplemental Diskette.
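As a rough illustration of the weekly, off-peak schedule recommended above, the following Python sketch computes when the next scrub should start. It is an assumption-laden example: the function name, the choice of weekday, and the choice of hour are all hypothetical, and the sketch deliberately does not show an IPSSEND command line, because the required arguments are not given here.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday, hour):
    """Return the next time strictly after `now` that falls on `weekday` at `hour`:00.

    weekday follows Python's convention: Monday is 0 and Sunday is 6.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate
```

A scheduler would compare the returned time with the current time and, when it is reached, start synchronization by whatever means is installed (Netfinity Manager, or a scheduler that invokes IPSSEND).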
Apply All Updates
You should apply all updates regarding RAID. Check the IBM Server web site at
http://www.us.pc.ibm.com/server/server.html or call the HelpCenter for up-to-date information.