OS: RedHat Enterprise Linux 6.x
One thing that is indispensable nowadays at any reputable company that works with IT is a copy of your primary database in a different datacenter or availability zone (cloud).
To get there, you have three options: write your own replication script (which can lead to an enormous number of problems); use uncertified third-party software (which can be dangerous too); or buy an Oracle Data Guard license (which can be expensive).
Any of these three strategies has a considerable drawback.
So, let's suppose you have an Oracle Data Guard license. Now we can use the broker to control all replication between our primary database and our new standby.
Before starting our tutorial, let me define one simple rule: you have to install the same Oracle software version on the new machine (which is going to host your standby database), and preferably keep the same storage solution (such as ASM), although that is just for consistency and is not mandatory.
Let's assume that we have a single-instance primary database with the SID "prod", that you want to create a single-instance standby database with the SID "dataguard", and that you have three disk groups: DATA, RECO and REDO. DATA will hold our datafiles, spfile and so on; RECO will hold our archived logs; and REDO will hold our redo log files.
1st: Create some standby logfile groups (depending on your archive generation, you may have to increase these numbers):
ALTER DATABASE ADD STANDBY LOGFILE GROUP 1 ('+REDO', '+REDO');
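Oracle's rule of thumb is (number of online redo log groups per thread) + 1 standby logfile groups, each the same size as the online redo logs, with group numbers that don't collide with the existing online groups. You can check what you already have with the standard V$ views:

```sql
-- Online redo log groups and their sizes
SELECT GROUP#, THREAD#, BYTES/1024/1024 AS SIZE_MB FROM V$LOG ORDER BY GROUP#;

-- Standby logfile groups created so far (empty before this step)
SELECT GROUP#, THREAD#, BYTES/1024/1024 AS SIZE_MB FROM V$STANDBY_LOG ORDER BY GROUP#;
```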
2nd: Back up your primary database using RMAN. There are many ways to do this, but for example:
$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Oct 7 14:46:41 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: BIOPROD (DBID=1276674215)
RMAN> backup as compressed backupset database format '$BACKUP_HOME/BKP_%D%M%Y_%I_%s_%U.bkp' include current controlfile;
After that, transfer your backup (and archived logs) to your standby database machine (into the same directory).
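For example, assuming the backup was written under /backup and the standby host answers as dataguard.com.br (both are assumptions, adjust them to your environment):

```shell
# Hypothetical paths and host name - adjust to your environment
scp -p /backup/BKP_*.bkp oracle@dataguard.com.br:/backup/
scp -rp /backup/archive oracle@dataguard.com.br:/backup/
```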
3rd: Configure your listener.ora by adding the following settings.
$ vi $ORACLE_HOME/network/admin/listener.ora
SID_LIST_LISTENER =
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=dataguard)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
(SID_NAME=dataguard))
(SID_DESC=
(GLOBAL_DBNAME=dataguard_DGMGRL)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
(SID_NAME=dataguard))
(SID_DESC=
(GLOBAL_DBNAME=dataguard_DGB)
(ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
(SID_NAME=dataguard)))
Change the ORACLE_HOME value to your own. The important thing here is to configure three services: dataguard, dataguard_DGMGRL and dataguard_DGB (they are essential for the broker).
Restart your listener.
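You can reload the configuration and confirm the three services are registered with lsnrctl:

```shell
lsnrctl reload   # re-reads listener.ora without stopping the listener
lsnrctl status   # the three dataguard services should appear in the services summary
```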
4th: Configure your tnsnames.ora file on both machines (production and standby environment) with your database settings.
$ vi $ORACLE_HOME/network/admin/tnsnames.ora
dataguard =
(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = dataguard.com.br)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = dataguard)))
prod =
(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = prod.com.br)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod)))
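You can verify that both aliases resolve from each machine with tnsping:

```shell
tnsping dataguard
tnsping prod
```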
5th: At your primary database, set the following parameters:
ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='dg_config=(prod,dataguard)';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='service="dataguard", LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=5 reopen=15 db_unique_name="dataguard", valid_for=(all_logfiles,primary_role)';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2='DEFER';
ALTER SYSTEM SET DB_FILE_NAME_CONVERT='+DATA/dataguard','+DATA/prod' scope=spfile;
ALTER SYSTEM SET LOG_FILE_NAME_CONVERT='+RECO/dataguard','+RECO/prod' scope=spfile;
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO;
ALTER SYSTEM SET LOG_ARCHIVE_MIN_SUCCEED_DEST=1;
A brief explanation about these parameters:
- The LOG_ARCHIVE_CONFIG parameter must contain the DB_UNIQUE_NAME of both the production and the standby database. In our case, SID and DB_UNIQUE_NAME are the same.
- The LOG_ARCHIVE_DEST_x parameter configures a series of replication settings, such as whether synchronization is asynchronous or synchronous (see the documentation for more information). The important thing to know is that the Data Guard broker synchronizes its configuration with this parameter: if we change any broker property later, the change is reflected here.
- The LOG_ARCHIVE_DEST_STATE_x parameter simply enables or disables the corresponding LOG_ARCHIVE_DEST_x configuration.
- DB_FILE_NAME_CONVERT is a very important parameter here. Be careful: a wrong conversion can lead to a bunch of problems. We specify the source datafile directory and its target: in this example, datafiles read from +DATA/dataguard are written to +DATA/prod. Configuring this parameter correctly is essential if we ever plan to switch over the environments (promote the standby to primary, and demote the primary to standby).
- LOG_FILE_NAME_CONVERT works like DB_FILE_NAME_CONVERT, except that it specifies the redo log file destinations. To be safer, you can also list the plain '+DATA' disk group here (in case some redo log files were left in the wrong directories, for example).
- STANDBY_FILE_MANAGEMENT controls automatic file management on the standby. Instead of you defining each datafile or redo file name, the database takes care of it: when we create a datafile in +DATA, we don't need to specify the full path (like '+DATA/prod/datafile/datafile_01.dbf'); we just say it should be created in '+DATA', and Oracle creates a file with a generated id in the right location (like '+DATA/datafile/tablespace_name.123.4567891011').
- LOG_ARCHIVE_MIN_SUCCEED_DEST defines the minimum number of destinations that must receive the archived log successfully before the current redo log file can be reused. This prevents data loss if something goes wrong. If you configure more than one standby (and consequently more LOG_ARCHIVE_DEST_x parameters), consider increasing this number.
As I said, to understand these parameters better, check the Oracle documentation.
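Once the destinations are set, you can sanity-check them with the stock V$ARCHIVE_DEST_STATUS view (destination 2 will show as deferred until we enable it):

```sql
-- Current state of the first two archive destinations
SELECT DEST_ID, DEST_NAME, STATUS, ERROR
FROM   V$ARCHIVE_DEST_STATUS
WHERE  DEST_ID IN (1, 2);
```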
6th: Create your standby database pfile from your primary database, or create a new pfile with the minimum parameters to start it. For example:
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Wed Oct 7 17:52:26 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> create pfile = '/tmp/dataguard_pfile.ora' from spfile;
...
$ vi /tmp/dataguard_pfile.ora
*.db_name='prod'
*.memory_target=5G
*.processes = 500
*.audit_file_dest='/u01/app/oracle/admin/dataguard/adump'
*.audit_trail ='db'
*.db_block_size=8192
*.db_domain=''
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=100G
*.diagnostic_dest='/u01/app/oracle'
*.open_cursors=300
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
*.control_files = '+DATA/dataguard/control01.ctl', '+REDO/dataguard/control02.ctl'
*.compatible ='11.2.0.4'
*.log_archive_config='dg_config=(prod,dataguard)'
*.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST valid_for=(ALL_LOGFILES,ALL_ROLES)'
*.log_archive_dest_2='service="prod", LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=5 reopen=15 db_unique_name="prod", valid_for=(all_logfiles,primary_role)'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'
*.db_file_name_convert='+DATA/prod','+DATA/dataguard'
*.log_file_name_convert='+DATA/prod','+DATA/dataguard','+RECO/prod','+RECO/dataguard','+REDO/prod','+REDO/dataguard'
*.standby_file_management=auto
*.log_archive_min_succeed_dest=1
*.db_unique_name='dataguard'
Remember two things: your db_name MUST be exactly the same as in the production environment; and you need to create the audit_file_dest directory (in this case, mkdir -p /u01/app/oracle/admin/dataguard/adump), otherwise you will get a warning at startup.
7th: Move your standby pfile to the standby machine, and then start the new instance in nomount mode.
$ export ORACLE_SID=dataguard
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Thu Oct 8 16:47:22 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount pfile='/home/oracle/dataguard_pfile.ora';
...
SQL> create spfile='+DATA/dataguard/spfiledataguard.ora' from pfile='/home/oracle/dataguard_pfile.ora';
File created.
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup nomount;
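At this point the instance should be started but not mounted; you can confirm it with:

```sql
-- In nomount mode, V$INSTANCE reports the STARTED status
SELECT STATUS FROM V$INSTANCE;
```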
8th: Create the Oracle password file (orapw) for the SYS user in both environments (you can use the same password as prod, just to keep it simple):
$ orapwd file=$ORACLE_HOME/dbs/orapwdataguard password=dataguard_sys_password
$ orapwd file=$ORACLE_HOME/dbs/orapwprod password=prod_sys_password
Do it in both environments (your prod machine probably already has an orapwprod file, but it doesn't matter whether you overwrite it or not).
9th: Duplicate (restore) your dataguard database! Use exactly the command below:
$ rman target sys/prod_sys_password@prod auxiliary sys/dataguard_sys_password@dataguard
Recovery Manager: Release 11.2.0.4.0 - Production on Thu Oct 8 17:14:34 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: PROD (DBID=12345678910)
connected to auxiliary database: PROD (not mounted)
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY NOFILENAMECHECK DORECOVER;
Starting Duplicate Db at 08-OCT-15
contents of Memory Script:
{
set until scn 54045879909;
restore clone standby controlfile;
}
executing Memory Script
executing command: SET until clause
Starting restore at 08-OCT-15
channel C1: restoring control file
channel C1: copied control file copy
input file name=/backup/controlfile_bkp.bkp
output file name=+DATA/dataguard/control01.ctl
output file name=+REDO/dataguard/control02.ctl
Finished restore at 08-OCT-15
contents of Memory Script:
{
sql clone 'alter database mount standby database';
}
executing Memory Script
sql statement: alter database mount standby database
RMAN-05529: WARNING: DB_FILE_NAME_CONVERT resulted in invalid ASM names; names changed to disk group only.
contents of Memory Script:
{
set until scn 54045879909;
set newname for tempfile 1 to
"+data";
...
restore
clone database
;
}
executing Memory Script
executing command: SET until clause
executing command: SET NEWNAME
...
renamed tempfile 1 to +data in control file
...
executing command: SET NEWNAME
...
channel C1: starting datafile backup set restore
channel C1: specifying datafile(s) to restore from backup set
channel C1: restoring datafile 00001 to +data
...
channel C1: reading from backup piece /backup/BKP_06102015_1276674215_98210_t2qj01fp_1_1.bkp
...
Finished restore at 08-OCT-15
contents of Memory Script:
{
switch clone datafile all;
}
executing Memory Script
...
released channel: C1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 10/14/2015 01:40:41
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-20506: no backup of archived log found
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 109545 and starting SCN of 54055457212 found to restore
10th: When you run the DUPLICATE DATABASE command with the DORECOVER clause, all archived logs are applied after the restore. But it can fail if something goes wrong, as it did above. So, just to ensure that at least all the archived logs generated alongside the backup are applied, let's do it manually.
$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Thu Oct 8 22:51:52 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: PROD (DBID=1276674215, not open)
RMAN> catalog start with '/backup/archive';
Starting implicit crosscheck backup at 14-OCT-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=3395 device type=DISK
Crosschecked 173 objects
Finished implicit crosscheck backup at 14-OCT-15
Starting implicit crosscheck copy at 14-OCT-15
using channel ORA_DISK_1
Crosschecked 12 objects
Finished implicit crosscheck copy at 14-OCT-15
searching for all files in the recovery area
cataloging files...
no files cataloged
searching for all files that match the pattern /backup/archive
List of Files Unknown to the Database
=====================================
File Name: /backup/archive/ARCH_PROD_1276674215_99307_1_vbqjbmqr.BKP
...
Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /backup/archive/ARCH_BIOPROD_1276674215_99307_1_vbqjbmqr.BKP
...
RMAN> recover database;
Starting recover at 14-OCT-15
using channel ORA_DISK_1
starting media recovery
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
channel ORA_DISK_1: reading from backup piece /backup/archive/ARCH_BIOPROD_1276674215_99508_1_5kqjdlul.BKP
channel ORA_DISK_1: piece handle=/backup/archive/ARCH_BIOPROD_1276674215_99508_1_5kqjdlul.BKP tag=BACKUP_ARCHIVELOG_BIONEXO
channel ORA_DISK_1: restored backup piece 1
...
unable to find archived log
archived log thread=1 sequence=109579
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/15/2015 21:20:06
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 109579 and starting SCN of 54055767606
11th: Now we need to set up the broker configuration. In your primary environment, run the dgmgrl command to access the broker and create a new configuration with the following commands:
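One prerequisite: the broker's DMON background process must be running on both databases, which requires the DG_BROKER_START parameter to be TRUE (set it if it is not already):

```sql
-- Run on both the primary and the standby
ALTER SYSTEM SET DG_BROKER_START=TRUE;
```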
$ dgmgrl /
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> create configuration MY_CONFIG as primary database is PROD connect identifier is PROD;
Configuration "MY_CONFIG" created with primary database "PROD"
DGMGRL> add database DATAGUARD as connect identifier is DATAGUARD maintained as physical;
Database "DATAGUARD" added
DGMGRL> show database DATAGUARD;
Database - DATAGUARD
Role: PHYSICAL STANDBY
Intended State: OFFLINE
Transport Lag: (unknown)
Apply Lag: (unknown)
Apply Rate: (unknown)
Real Time Query: OFF
Instance(s):
DATAGUARD
Database Status:
DISABLED
DGMGRL> enable database DATAGUARD;
Enabled.
DGMGRL> edit database PROD set state='transport-on';
Succeeded.
DGMGRL> edit database DATAGUARD set state='apply-on';
Succeeded.
DGMGRL> show database verbose DATAGUARD;
Database - DATAGUARD
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag:
5 days 16 hours 32 minutes 12 seconds (computed 1 second ago)
Apply Lag: 5 days 20 hours 12 minutes 3 seconds (computed 1 second ago)
Apply Rate: (unknown)
Real Time Query: ON
Instance(s):
DATAGUARD
Database Warning(s):
ORA-16855: transport lag has exceeded specified threshold
Properties:
DGConnectIdentifier = 'dataguard'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'OPTIONAL'
MaxFailure = '0'
MaxConnections = '5'
ReopenSecs = '15'
NetTimeout = '900'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'auto'
ArchiveLagTarget = '0'
LogArchiveMaxProcesses = '10'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = '+DATA/prodbr/, +DATA/dgprodbr/, +DATA/dataguard/, +DATA/dgprodbr/'
LogFileNameConvert = '+DATA/prodbr/, +DATA/dgprodbr/, +DATA/dataguard/, +DATA/dgprodbr/, +REDO/prodbr/, +REDO/dgprodbr/, +REDO/dataguard/, +REDO/dgprodbr/, +RECO/prodbr/,+RECO/dgprodbr/, +RECO/dataguard/, +RECO/dgprodbr/'
FastStartFailoverTarget = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
ApplyLagThreshold = '0'
TransportLagThreshold = '300'
TransportDisconnectedThreshold = '30'
SidName = 'dataguard'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dataguard.com.br)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dataguard_DGMGRL)(INSTANCE_NAME=dataguard)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'USE_DB_RECOVERY_FILE_DEST'
AlternateLocation = ''
LogArchiveTrace = '0'
LogArchiveFormat = 'arch_%t_%s_%r.dbf'
TopWaitEvents = '(monitor)'
Database Status:
WARNING
Warning! If you get the error "ORA-01033: ORACLE initialization or shutdown in progress" right after trying to add your database configuration, it means your standby database is not open. Start it up (open read only) and then retry.
If you get "ORA-01017: invalid username/password; logon denied" right after trying to add your database configuration, it means your orapw password file is wrong. You can recreate it (8th step), or simply copy the orapw file from the primary environment to the standby (scp or rsync).
In my case, the WARNING database status is normal: I have just configured the broker, and the gap between my applied backup (plus applied archives) and my production database is 5 days and 20 hours. Since my transport lag threshold is 300 seconds (5 minutes), it has to alarm.
There are some properties I like to configure manually, such as:
edit database DATAGUARD set property LogXptMode='ASYNC';
edit database DATAGUARD set property MaxConnections=5;
edit database DATAGUARD set property ReopenSecs=15;
edit database DATAGUARD set property TransportDisconnectedThreshold=30;
edit database DATAGUARD set property ArchiveLagTarget=120;
edit database DATAGUARD set property TransportLagThreshold=300;
edit database DATAGUARD set property NetTimeout=900;
edit database DATAGUARD set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dataguard.com.br)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dataguard_DGMGRL)(INSTANCE_NAME=dataguard)(SERVER=DEDICATED)))';
To understand more about these properties, check the Oracle documentation.
12th: If you don't have Oracle Grid Infrastructure configured, skip this step.
Now we are going to register the database with the Server Control Utility, running the following commands as the oracle user:
$ srvctl add database -d dataguard -o $ORACLE_HOME -r PHYSICAL_STANDBY -p "+DATA/dataguard/spfiledataguard.ora" -a "DATA","RECO","REDO"
$ srvctl modify database -d dataguard -s 'open','read only'
$ srvctl config database -d dataguard
Database unique name: dataguard
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/db_1
Oracle user: oracle
Spfile: +DATA/dataguard/spfiledataguard.ora
Domain:
Start options: open,read only
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Database instance: dataguard
Disk Groups: DATA,RECO,REDO
Services:
Now you can add some services if you want, but remember: services are tied to your database role (PRIMARY, PHYSICAL STANDBY, LOGICAL STANDBY or SNAPSHOT STANDBY), e.g.:
$ srvctl add service -d dataguard -s srv_dataguard -l PHYSICAL_STANDBY
$ srvctl config service -d dataguard -s srv_dataguard
Service name: srv_dataguard
Service is enabled
Cardinality: SINGLETON
Disconnect: false
Service role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
$ srvctl start service -d dataguard -s srv_dataguard
And it's done! You can use the broker to check whether your archived logs are being transported, like this:
dgmgrl /
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production
Copyright (c) 2000, 2009, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show database dataguard;
Database - dataguard
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 2 minutes 15 seconds (computed 1 second ago)
Apply Lag: 0 seconds (computed 1 second ago)
Apply Rate: 545.00 KByte/s
Real Time Query: ON
Instance(s):
dataguard
Database Status:
SUCCESS
Considerations:
* Always check your alert_$ORACLE_SID.log file; it always gives you a good picture of what is happening with the replication (on both nodes, i.e., standby and primary database).
* Whenever you change a database parameter that is shared with the broker (e.g., db_file_name_convert), you also need to change the equivalent broker property (in this case, DbFileNameConvert); otherwise your broker database status will be set to WARNING, and some replication can fail after, for example, a shutdown and startup.
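Besides the broker, you can watch the lag directly on the standby through the stock V$DATAGUARD_STATS view:

```sql
-- Run on the standby: transport and apply lag as seen by the database itself
SELECT NAME, VALUE, TIME_COMPUTED
FROM   V$DATAGUARD_STATS
WHERE  NAME IN ('transport lag', 'apply lag');
```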