Tuesday, 10 April 2012

DB2 HADR Setup and Failure Scenarios


By Vamshidhar K


High Availability Disaster Recovery (HADR)

HADR is a data replication feature that provides a high availability solution for both partial and complete site failures. HADR protects against data loss by replicating data changes from a source database, called the primary, to a target database, called the standby. Its two primary goals are the fastest possible failover and ease of use.

Let us consider an example: IBM Tivoli Identity Manager (ITIM) with an HADR setup on a DB2 database. In this scenario, the ITIM database, named ITIMDB, resides on two servers, serverOne and serverTwo. HADR is set up between the two servers with serverOne as the primary and serverTwo as the standby. The two databases are in Peer state, in Synchronous mode, and the connection state is Connected.

The following configuration parameters would have been set during HADR setup:

Sl. #  Configuration Parameter  serverOne      serverTwo
1      HADR_LOCAL_HOST          serverOne      serverTwo
2      HADR_LOCAL_SVC           serverOnePort  serverTwoPort
3      HADR_REMOTE_HOST         serverTwo      serverOne
4      HADR_REMOTE_SVC          serverTwoPort  serverOnePort
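
As a minimal sketch, the values in the table could be applied with the UPDATE DB CFG command, run once on each server. The function name configure_hadr is hypothetical, ITIMDB is the database from the scenario, and serverOnePort and serverTwoPort stand for whatever service names or port numbers were chosen.

```shell
# Hypothetical helper: apply the four HADR parameters from the table
# to one side of the pair. Run it on the server it describes.
configure_hadr() {
  db=$1
  local_host=$2
  local_svc=$3
  remote_host=$4
  remote_svc=$5
  db2 update db cfg for "$db" using \
    HADR_LOCAL_HOST "$local_host" \
    HADR_LOCAL_SVC "$local_svc" \
    HADR_REMOTE_HOST "$remote_host" \
    HADR_REMOTE_SVC "$remote_svc"
}

# On serverOne:
#   configure_hadr ITIMDB serverOne serverOnePort serverTwo serverTwoPort
# On serverTwo:
#   configure_hadr ITIMDB serverTwo serverTwoPort serverOne serverOnePort
```

A complete setup needs further parameters that the table does not list, such as HADR_REMOTE_INST and HADR_SYNCMODE.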

Two failure scenarios are possible. They are listed below, and the actions to take for each are detailed in the sections that follow.

  • The standby server fails
  • The primary server fails


1)      The standby server fails
The primary server regularly sends log buffers to the standby and waits for a response for up to the number of seconds defined by the HADR_TIMEOUT configuration parameter (120 seconds by default). If the standby server fails and the primary receives no response for HADR_TIMEOUT seconds, the primary continues to process transactions and does not try to contact the standby again.

Impact:
Performance is affected once: while the primary server is waiting to determine that the standby is offline, transactions can be delayed for up to HADR_TIMEOUT seconds.

Solution:
No action is needed once the standby is back online: the standby contacts the primary to announce that it is alive and resynchronizes itself automatically. Once the pair is back in Peer state, the primary again ships log buffers to the standby.
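
To confirm that the pair really has returned to Peer state, the db2pd output can be polled. The sketch below assumes the "NAME = VALUE" layout that db2pd -hadr prints in DB2 10.1 and later; older releases print a tabular layout and would need a different parser. The function names hadr_field and wait_for_peer are hypothetical.

```shell
# Hypothetical helper: read `db2pd -db <db> -hadr` output on stdin and
# print the value of one field. Assumes "NAME = VALUE" lines (DB2 10.1+).
hadr_field() {
  awk -F'=' -v key="$1" '{
    k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k)
    if (k == key) {
      v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v)
      print v
      exit
    }
  }'
}

# Hypothetical helper: poll until the pair reports Peer state again,
# checking every 10 seconds for at most $2 attempts (default 30).
wait_for_peer() {
  db=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    state=$(db2pd -db "$db" -hadr | hadr_field HADR_STATE)
    [ "$state" = "PEER" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Example, on either server:
#   wait_for_peer ITIMDB && echo "ITIMDB is back in Peer state"
```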

2)      The primary server fails
When the primary server fails, the connections to that server are severed.

Impact:
Once the primary has failed, all transactions must be routed to the standby so that no transactions are lost and requests continue to receive responses.

Solution:
When the primary server fails, the standby server can be made the primary by a forced takeover: run the TAKEOVER HADR ON DATABASE database_name BY FORCE command on the standby server, which causes it to become the primary. The standby applies the last log buffers that it has (if any are left to apply), undoes any in-flight transactions, and then opens the database for connections. At this point automatic client reroute establishes a connection from the client application to the standby (now primary) server and informs the application that the most recent in-flight transaction has been rolled back.
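
Automatic client reroute only works if each server has been told in advance where its alternate is. A minimal sketch of that registration follows. It assumes both instances accept client connections on port 50000, which is a placeholder and is not the same as the HADR service ports in the table above. The function name register_alternate is hypothetical.

```shell
# Hypothetical helper: record the alternate server that clients should
# be rerouted to if the local server becomes unreachable.
register_alternate() {
  db=$1
  alt_host=$2
  alt_port=$3
  db2 update alternate server for database "$db" \
    using hostname "$alt_host" port "$alt_port"
}

# On serverOne (placeholder port):
#   register_alternate ITIMDB serverTwo 50000
# On serverTwo (placeholder port):
#   register_alternate ITIMDB serverOne 50000
```

Clients learn the alternate server when they connect to the database, so the registration has to be in place before the failure occurs.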



Important DB2 Commands:
a)      To stop applications from connecting to the databases and ensure that other applications are not requesting and holding agents:
DB2 QUIESCE INSTANCE instance_name IMMEDIATE FORCE CONNECTIONS
In quiesced mode, users cannot connect from outside of the database engine and the trace file is less cluttered.
b)      Alternatively, to start an instance in quiesced mode:
DB2START ADMIN MODE
c)       To restore user access to an instance that has been quiesced for maintenance or other reasons:
DB2 UNQUIESCE INSTANCE instance_name
d)      To check the status of HADR.
DB2PD -DB database_name -HADR
e)      To display detailed information about the current database manager configuration parameters.
DB2 GET DBM CFG
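
The HADR_* parameters are stored in the database configuration rather than in the database manager configuration, so a quick way to review them is to filter the output of GET DB CFG. A minimal sketch, with show_hadr_cfg as a hypothetical name:

```shell
# Hypothetical helper: list only the HADR-related lines of the
# database configuration for the given database.
show_hadr_cfg() {
  db2 get db cfg for "$1" | grep -i 'HADR'
}

# Example:
#   show_hadr_cfg ITIMDB
```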

DB2 Commands to establish HADR between serverOne and serverTwo:
1)       Execute the following command from serverOne
DB2 TERMINATE
2)       Execute the following command from serverTwo
DB2 TERMINATE
3)       Execute the following command from serverOne
DB2 DEACTIVATE DATABASE ITIMDB
4)       Execute the following command from serverTwo
DB2 DEACTIVATE DATABASE ITIMDB
5)       Execute the following command from serverTwo (the standby must be started before the primary)
DB2 START HADR ON DATABASE ITIMDB USER user_name USING password AS STANDBY
6)       Execute the following command from serverOne
DB2 START HADR ON DATABASE ITIMDB AS PRIMARY
7)       Execute the following command from serverOne or serverTwo
DB2PD -DB ITIMDB -HADR
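
The per-server part of this sequence could be wrapped in a small function, sketched here for the ITIMDB database from the scenario. The name start_hadr is hypothetical, and the sketch assumes the instance owner runs it locally, so no USER or USING clause is passed.

```shell
# Hypothetical helper: start HADR for one role on the local server.
# Run it on the standby (serverTwo) first, then on the primary (serverOne).
start_hadr() {
  db=$1
  role=$2
  case "$role" in
    standby|primary) ;;
    *) echo "usage: start_hadr <database> standby|primary" >&2; return 2 ;;
  esac
  db2 terminate
  db2 deactivate database "$db"
  # The function's exit status is that of START HADR.
  db2 start hadr on database "$db" as "$role"
}

# On serverTwo:  start_hadr ITIMDB standby
# On serverOne:  start_hadr ITIMDB primary
```

The order matters because a primary that is started without a reachable standby waits for one to connect and, by default, gives up if none does.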

HADR Takeover:

The following command performs failover:
TAKEOVER HADR ON DATABASE database_name [BY FORCE]

Modes of takeover:
1)       Without the BY FORCE option, the standby and primary servers switch roles (the current standby becomes the new primary).
The standby server sends a message telling the primary server to turn itself into a standby. The primary server obliges by stopping in-flight transactions, sending the last log buffer(s) to the standby, and turning itself into a standby. The initial standby server waits for the final log buffers to arrive, applies them, and then turns itself into a primary server.
The client applications that were disconnected from the old primary are automatically rerouted to the new primary and connected. Applications that had transactions in flight are sent an error message telling them that they are still connected but that their current transaction was rolled back. The proper response from an application (as with many error codes) is to retry the transaction, which will now succeed on the new primary server, and the end user will be none the wiser.
If you do not use the BY FORCE option and the standby cannot contact the primary for any reason, the takeover command fails.
2)       With the BY FORCE option, the standby server switches its own role to primary without coordinating the change of roles with the (current) primary server.
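
The difference between the two modes suggests a cautious pattern: attempt the coordinated switch first, and leave the forced takeover as an explicit operator decision. The sketch below assumes it is run on the standby server, and the function name takeover is hypothetical. It deliberately does not fall back to BY FORCE on its own, because forcing a takeover while the old primary is still running could leave two servers accepting writes for the same database.

```shell
# Hypothetical helper: run on the standby. Tries a coordinated role switch
# and, if that fails, leaves the decision to force to the operator.
takeover() {
  db=$1
  if db2 takeover hadr on database "$db"; then
    echo "Role switch completed for $db."
    return 0
  fi
  echo "Coordinated takeover failed for $db." >&2
  echo "If the primary is confirmed down, run:" >&2
  echo "  db2 takeover hadr on database $db by force" >&2
  return 1
}

# Example, on serverTwo:
#   takeover ITIMDB
```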
