DB2 HADR Setup and Failure Scenarios
By Vamshidhar K
High Availability Disaster Recovery (HADR)
HADR is a data replication feature that provides a high availability solution for both partial and complete site failures. HADR protects against data loss by replicating data changes from a source database, called the primary, to a target database, called the standby. The two main goals of HADR are the fastest possible failover and ease of use.
Consider an example: IBM Tivoli Identity Manager (ITIM) with an HADR setup on a DB2 database. In this scenario, the ITIM database, named ITIMDB, resides on two servers, serverOne and serverTwo. HADR is set up between the two servers with serverOne as the primary and serverTwo as the standby. The two databases are in Peer state and Synchronous mode, and the connection state is Connected.
The following configuration parameters would have been set during HADR setup:
Sl. # | Configuration Parameter | serverOne     | serverTwo
1     | HADR_LOCAL_HOST         | serverOne     | serverTwo
2     | HADR_LOCAL_SVC          | serverOnePort | serverTwoPort
3     | HADR_REMOTE_HOST        | serverTwo     | serverOne
4     | HADR_REMOTE_SVC         | serverTwoPort | serverOnePort
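The table is symmetric: each server's REMOTE values are the other server's LOCAL values. As a minimal sketch, the hypothetical Python helper below builds the DB2 CLP UPDATE DB CFG command for each server from one host/port pair per side. It assumes the database name ITIMDB and the placeholder port names from the table; HADR_SYNCMODE SYNC is included on the assumption that the pair runs in Synchronous mode as described above. The helper only builds and prints the commands; it does not run them.

```python
# Hypothetical helper: builds the DB2 CLP commands that set the HADR
# parameters from the table. It only returns strings; nothing is executed.


def hadr_cfg_command(db_name, local_host, local_svc, remote_host, remote_svc,
                     syncmode="SYNC"):
    """Return an UPDATE DB CFG command for one side of an HADR pair."""
    params = [
        ("HADR_LOCAL_HOST", local_host),
        ("HADR_LOCAL_SVC", local_svc),
        ("HADR_REMOTE_HOST", remote_host),
        ("HADR_REMOTE_SVC", remote_svc),
        ("HADR_SYNCMODE", syncmode),  # assumption: Synchronous mode, as in the scenario
    ]
    settings = " ".join(f"{name} {value}" for name, value in params)
    return f"db2 update db cfg for {db_name} using {settings}"


def hadr_pair_commands(db_name, one, two):
    """one and two are (host, service) tuples; returns {host: command}.

    Each server's REMOTE values are the other server's LOCAL values."""
    return {
        one[0]: hadr_cfg_command(db_name, one[0], one[1], two[0], two[1]),
        two[0]: hadr_cfg_command(db_name, two[0], two[1], one[0], one[1]),
    }


if __name__ == "__main__":
    cmds = hadr_pair_commands("ITIMDB", ("serverOne", "serverOnePort"),
                              ("serverTwo", "serverTwoPort"))
    for host, cmd in cmds.items():
        print(f"# run on {host}:\n{cmd}")
```

Generating both sides from the same two tuples makes it hard to end up with mismatched LOCAL/REMOTE values on the two servers.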
The two possible failure scenarios are listed below; the actions to take in each case are detailed further in the article.
- The standby server fails
- The primary server fails
1) The standby server fails
The primary server regularly sends log buffers to the standby and waits for a response for the number of seconds defined by the HADR_TIMEOUT configuration parameter (120 seconds by default). If the standby server fails, the primary receives no response; after HADR_TIMEOUT seconds it stops waiting, continues to process transactions on its own, and does not try to contact the standby again.
Impact:
Performance is impacted only once: while the primary waits up to HADR_TIMEOUT seconds before it determines that the standby server is offline.
Solution:
No action is needed once the standby is back online. The standby contacts the primary to say it is alive and resynchronizes itself automatically. Once it is back in Peer state, the primary again ships log buffers to the standby.
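The timeout behavior described above can be illustrated with a deliberately simplified Python model. It is not DB2 code: the class, its state names, and the simulated clock are hypothetical, and it captures only the rule stated here, namely that in Synchronous mode a commit waits for the standby's acknowledgement for at most HADR_TIMEOUT seconds, after which the primary stops waiting and commits on its own.

```python
# Simplified, hypothetical model of the primary's behavior when the
# standby stops responding. Not DB2 internals; time is simulated.

HADR_TIMEOUT = 120  # seconds; DB2's default value for HADR_TIMEOUT


class PrimaryModel:
    def __init__(self, timeout=HADR_TIMEOUT):
        self.timeout = timeout
        self.state = "PEER"

    def commit(self, standby_alive):
        """Return the simulated seconds this commit waited for the standby."""
        if self.state != "PEER":
            # Standby already declared offline: commit locally, no wait.
            return 0
        if standby_alive:
            # Standby acknowledges the shipped log buffer (delay ignored here).
            return 0
        # No acknowledgement: wait the full timeout, then drop out of Peer state.
        self.state = "DISCONNECTED"
        return self.timeout

    def standby_reconnected(self):
        # The standby contacts the primary and resynchronizes. Real DB2 passes
        # through catch-up states first; this model jumps straight to Peer.
        self.state = "PEER"
```

In this model only the first commit after the failure pays the HADR_TIMEOUT delay; later commits proceed immediately, which matches the one-time performance impact described above.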
2) The primary server fails
When the primary server fails, the connections to that server are severed.
Impact:
When the primary has failed, all transactions must be routed to the standby so that no transactions are lost and requests continue to be answered.
Solution:
When the primary server fails, the standby server can be made the primary by a forced takeover: run the TAKEOVER HADR ON DATABASE database_name BY FORCE command on the standby, which causes the standby server to become the primary server. The standby applies the last log buffers it has (if any are left to apply), undoes any in-flight transactions, and then opens the database for connections. At this point, provided automatic client reroute is configured, it establishes a connection from the client application to the standby (now primary) server and informs the application that the most recent in-flight transaction has been rolled back.
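On the application side, handling the reroute mostly means running the rolled-back transaction again. The sketch below is generic Python, not a DB2 driver API: ClientRerouteError is a hypothetical exception standing in for the reroute error a DB2 client reports (SQL30108N), and transaction is any callable that performs the complete unit of work.

```python
# Generic retry wrapper for a transaction interrupted by client reroute.
# ClientRerouteError is hypothetical; map your driver's reroute error to it.


class ClientRerouteError(Exception):
    """Stand-in for 'connection re-established, transaction rolled back'."""


def run_with_reroute_retry(transaction, max_attempts=3):
    """Run transaction(); if it was rolled back by a reroute, run it again.

    transaction must redo the whole unit of work, because the rolled-back
    work does not exist on the new primary."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except ClientRerouteError:
            if attempt == max_attempts:
                raise
```

Limiting the number of attempts keeps an application from looping forever if the new primary never comes up.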
Important DB2 Commands:
a) To stop applications from connecting to databases and ensure that other applications are not requesting and holding agents:
DB2 QUIESCE INSTANCE instance_name IMMEDIATE FORCE CONNECTIONS
In quiesced mode, users cannot connect from outside the database engine, and the trace file is less cluttered.
b) Another command, to start an instance in quiesced mode:
DB2START ADMIN MODE
c) To restore user access to instances or databases that have been quiesced for maintenance or other reasons:
DB2 UNQUIESCE INSTANCE instance_name
d) To check the status of HADR:
DB2PD -DB database_name -HADR
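e) To display detailed information about the current database manager configuration parameters:
DB2 GET DBM CFG
The HADR_* parameters in the table above are database (not database manager) configuration parameters; they are displayed with DB2 GET DB CFG FOR database_name.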
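To check the Peer / Synchronous / Connected condition from a script, one option is to parse the output of db2pd -db database_name -hadr. The sketch below assumes the KEY = VALUE layout that newer DB2 releases (10.1 and later) use for this output, with fields such as HADR_ROLE, HADR_STATE, HADR_SYNCMODE and HADR_CONNECT_STATUS; earlier releases print a column-based layout that this parser does not handle. The function names are hypothetical.

```python
import subprocess


def parse_db2pd_hadr(text):
    """Parse 'KEY = VALUE' lines from db2pd -hadr output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def hadr_is_healthy(fields, expected_syncmode="SYNC"):
    """True if the pair is in Peer state, in the expected mode, and connected."""
    return (fields.get("HADR_STATE") == "PEER"
            and fields.get("HADR_SYNCMODE") == expected_syncmode
            and fields.get("HADR_CONNECT_STATUS") == "CONNECTED")


def check_hadr(db_name):
    # db2pd reports only on local databases: run this on the server that
    # hosts db_name, as the DB2 instance owner.
    out = subprocess.run(["db2pd", "-db", db_name, "-hadr"],
                         capture_output=True, text=True, check=True).stdout
    return hadr_is_healthy(parse_db2pd_hadr(out))
```

For example, check_hadr("ITIMDB") run on serverOne should return True while the pair is in the state described at the start of the article.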
DB2 commands to establish HADR between serverOne and serverTwo (the database is ITIMDB on both servers):
1) Execute the following command on serverOne:
DB2 TERMINATE
2) Execute the following command on serverTwo:
DB2 TERMINATE
3) Execute the following command on serverOne:
DB2 DEACTIVATE DATABASE ITIMDB
4) Execute the following command on serverTwo:
DB2 DEACTIVATE DATABASE ITIMDB
5) Execute the following command on serverTwo to start it as the standby (the standby is started before the primary):
DB2 START HADR ON DATABASE ITIMDB USER user_name USING password AS STANDBY
6) Execute the following command on serverOne to start it as the primary:
DB2 START HADR ON DATABASE ITIMDB AS PRIMARY
7) Check the HADR status by executing the following command on serverOne and on serverTwo (db2pd reports only on the local database):
DB2PD -DB ITIMDB -HADR
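The order of these steps matters: a primary started without BY FORCE waits for the standby to connect, so START HADR AS STANDBY is issued on the standby before START HADR AS PRIMARY is issued on the primary. As a sketch, the steps can be kept as a hypothetical runbook of (server, command) pairs with a check that enforces that ordering. It assumes the database ITIMDB and that each START HADR runs on its own server; the script only prints the runbook and executes nothing.

```python
# Hypothetical runbook for the HADR setup steps: each entry is (server, command).
# Nothing is executed; the script only checks the ordering and prints it.

DB_NAME = "ITIMDB"

STEPS = [
    ("serverOne", "db2 terminate"),
    ("serverTwo", "db2 terminate"),
    ("serverOne", f"db2 deactivate database {DB_NAME}"),
    ("serverTwo", f"db2 deactivate database {DB_NAME}"),
    ("serverTwo", f"db2 start hadr on database {DB_NAME} as standby"),
    ("serverOne", f"db2 start hadr on database {DB_NAME} as primary"),
    ("serverOne", f"db2pd -db {DB_NAME} -hadr"),
    ("serverTwo", f"db2pd -db {DB_NAME} -hadr"),
]


def check_order(steps):
    """Raise ValueError unless the standby is started before the primary."""
    cmds = [cmd for _, cmd in steps]
    try:
        standby = next(i for i, c in enumerate(cmds) if c.endswith("as standby"))
        primary = next(i for i, c in enumerate(cmds) if c.endswith("as primary"))
    except StopIteration:
        raise ValueError("runbook must start HADR on both servers") from None
    if standby > primary:
        raise ValueError("start HADR on the standby before the primary")


def print_runbook(steps):
    check_order(steps)
    for number, (server, cmd) in enumerate(steps, start=1):
        print(f"{number}) on {server}: {cmd}")


if __name__ == "__main__":
    print_runbook(STEPS)
```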
HADR Takeover:
The following command, run on the standby, performs the failover:
TAKEOVER HADR ON DATABASE database_name [BY FORCE]
Modes of takeover:
1) Without the BY FORCE option, the standby and primary servers switch roles (the current standby becomes the new primary).
The standby server sends a message telling the primary server to turn itself into a standby. The primary server obliges by stopping in-flight transactions, sending the last log buffer(s) to the standby, and turning itself into a standby. The original standby server waits for the final log buffers to arrive, applies them, and then turns itself into the primary server.
Client applications that were disconnected from the old primary are automatically rerouted to the new primary and reconnected. Applications that had a transaction in flight receive an error message telling them that they are still connected but that their current transaction was rolled back. The proper response from an application (as with many error codes) is to retry the transaction, which will now succeed on the new primary server, and the end user will be none the wiser.
If you do not use the BY FORCE option and the standby cannot contact the primary for any reason, the takeover command fails.
2) With the BY FORCE option, the standby server changes its own role to primary without coordinating the change of roles with the (current) primary server.
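Which mode to use depends on whether the primary can still be reached. The hypothetical helper below only builds the command string to run on the standby; determining reachability is left to the caller. A forced takeover should only be issued once the old primary is known to be down, otherwise both servers may end up acting as primary.

```python
def takeover_command(db_name, primary_reachable):
    """Build the TAKEOVER HADR command to run on the standby (hypothetical helper)."""
    cmd = f"db2 takeover hadr on database {db_name}"
    if not primary_reachable:
        # Standby becomes primary without coordinating with the old primary.
        cmd += " by force"
    return cmd
```

For example, takeover_command("ITIMDB", primary_reachable=False) yields the forced form used in the primary-failure scenario, while a planned role switch uses the form without BY FORCE.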