Connect Time Failover & Transparent Application Failover for Data Guard
I was giving a 10g
Data Guard course this week in Düsseldorf, demonstrating amongst others the
possibility to configure Transparent Application Failover (TAF) for Data
Guard. I always try to keep things as simple as seriously possible, in order to
achieve an easy and good understanding of what I like to explain. Later on,
things are getting complex by themselves soon enough
In my simple
scenario, I have one Primary Database (prima) and one Physical Standby Database
(physt). On my downloads page, you may find an installation guide for that
scenario for 10g and for 11g. After a switchover or after a failover, the
primary is going to be physt. The challenge is now to get the connect from the
client side to the right (primary) database. That is called Connect Time
Failover and is achieved as follows:
First, we make sure
that the client uses a tnsnames.ora with a connect
descriptor that uses a SERVICE_NAME instead
of a SID
MYAPP =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = HostA)(PORT
= 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = HostB)(PORT
= 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = myapp)
)
)
HostA is the host on
which prima runs, HostB has physt running.
Second, we take care
that the service myapp is offered only at the right database – only on
the primary.Notice that the PMON background processes of both databases must be
able to communicate with the (local) listeners in order to register the service myapp. If you don’t use the listener port 1521, they
can’t. You have to point to that listener port then with the initialization
parameter LOCAL_LISTENER.
We create and start
now the service myapp manually on the primary:
begin
dbms_service.create_service('myapp','myapp');
end;
/
begin
DBMS_SERVICE.START_SERVICE('myapp');
end;
/
Then we create a
trigger, that ensures that this service is only offered, if the database is in
the primary role:
create trigger
myapptrigg after startup on database
declare
v_role varchar(30);
begin
select database_role into v_role from
v$database;
if v_role = 'PRIMARY' then
DBMS_SERVICE.START_SERVICE('myapp');
else
DBMS_SERVICE.STOP_SERVICE('myapp');
end if;
end;
/
The event after
startup is fired, if an instance changes from status MOUNT to OPEN. If you use
a logical standby, it is not fired, because the logical standby remains in
status OPEN. You may use the event after db_role_change in this case.
The creation of the trigger and of the service is accompanied with redo
protocol (the Data Dictionary has changed) and therefore also present at physt
without any additional work to do there for the DBA. With the present setup, we
have already achieved Connect Time Failover: Clients can use the same
connect descriptor (myapp) to get to the right
(primary) database now, regardless of switchover or failover.
But sessions that
are connected to prima are disconnected if a switchover or failover to physt
takes place. They have got to connect again then. We can change that, so that a
Runtime Failover is possible, under ideal circumstances, that failover
is even completely transparent to the client and proceeds without error
messages. To achieve that, you don’t have to touch the tnsnames.ora on the
client side. Instead, you do the following on the primary database:
begin
dbms_service.modify_service
('myapp',
FAILOVER_METHOD => 'BASIC',
FAILOVER_TYPE => 'SELECT',
FAILOVER_RETRIES =>
200,
FAILOVER_DELAY => 1);
end;
/
Connections to the
service myapp are now automatically failed over
together with the service to the new primary. Should they have done nothing
during the time of the failover/switchover, or even if they had run a select
statement, they will not receive any error but only notice a short interruption
(about 20 seconds, in a typical case). Only if sessions have open transactions
during the failover/switchover, they will receive error messages (“transaction
must roll back”) after they try commit then.
I use to demonstrate
that with a select on a table with 100000 rows that starts on the primary. Then
I kill the SMON of that primary and the select stops at row 30000 something,
waits a couple of seconds (maximal 200, with the
above settings) and then continues on the new primary after the failover,
fetching exactly the 100000 rows! That is always quite impressive and shows how
robust Oracle Databases – especially combined with Data Guard – are