sexta-feira, 16 de outubro de 2015

Create standby database using Active Data Guard

Database version: 11.2.0.x (11gR2) using ASM
OS: RedHat Enterprise Linux 6.x

One thing which is indispensable nowadays at any (reputable) company that works with IT is a copy of your primary database at a different datacenter/availability zone (cloud).

To do that, you have three options: create your own replication script (which can lead to an enormous amount of problems); using a non homologated software created by some company (which can be dangerous too), or buy an Oracle Data Guard license (which can be expensive).

Any one of those three strategies can present a considerable problem.
So, let's suppose that you have an Oracle Data Guard license. Now we can use a broker to control all replication between our primary and our new standby database.

Before starting our tutorial, let me define one simple rule: you have to install the same Oracle software version at your new machine (which is going to be your standby database), and preferably keep your storage solution (like ASM, but that is just to keep it beautiful - it is not mandatory).

Let's assume that we have a single instance primary database with a SID called "prod" and you want to create a single instance standby database with a SID called "dataguard", and you have 3 kinds of diskgroup: DATA, RECO and REDO. DATA will keep our datafiles, spfile, and these kind of things; RECO will keep our archives, and REDO will keep our redo log files.



1st: You have to do is to create some groups of standby logfile (depending on your archive generation, you may have to increase theses numbers):


ALTER DATABASE ADD STANDBY LOGFILE GROUP 1 ('+REDO', '+REDO');


2nd: Backup your primary database using rman. You have many ways to do that, but for example:

$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Oct 7 14:46:41 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: BIOPROD (DBID=1276674215)
RMAN> backup as compressed backupset database format '$BACKUP_HOME/BKP_%D%M%Y_%I_%s_%U.bkp' include current controlfile;

After that, transfer your backup (and archives) to your standby database machine (to the same directory).

3th: Configure your listener.ora adding the following settings.


$ vi $ORACLE_HOME/network/admin/listener.ora

SID_LIST_LISTENER =
  (SID_LIST=
    (SID_DESC=
      (GLOBAL_DBNAME=dataguard)
      (ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
      (SID_NAME=dataguard))
    (SID_DESC=
      (GLOBAL_DBNAME=dataguard_DGMGRL)
      (ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
      (SID_NAME=dataguard))
    (SID_DESC=
      (GLOBAL_DBNAME=dataguard_DGB)
      (ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1/)
      (SID_NAME=dataguard)))


Change your ORACLE_HOME variable to your own. The important here is to configure three kinds of service: dataguard, dataguard_DGMGRL and dataguard_DGB (they are essentials to broker).
Restart your listener.

4th: Configure your tnsnames.ora file on both machines (production and standby environment) with your database settings.


$ vi $ORACLE_HOME/network/admin/tnsnames.ora

dataguard =
(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = dataguard.com.br)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = dataguard)))

prod =
(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = prod.com.br)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = prod)))


5th: At your primary database, set the following parameters:

ALTER SYSTEM SET LOG_ARCHIVE_CONFIG='dg_config=(prod,dataguard)';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='service="dataguard", LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=5 reopen=15 db_unique_name="dataguard", valid_for=(all_logfiles,primary_role)';
ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2='DEFER';
ALTER SYSTEM SET DB_FILE_NAME_CONVERT='+DATA/dataguard','+DATA/prodbr' scope=spfile;
ALTER SYSTEM SET LOG_FILE_NAME_CONVERT='+RECO/dataguard','+RECO/prodbr' scope=spfile;
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO;
ALTER SYSTEM SET LOG_ARCHIVE_MIN_SUCCEED_DEST=1;


A brief explanation about these parameters:
  - The LOG_ARCHIVE_CONFIG parameter need to contain your DB_UNIQUE_NAME parameter from production and standby database. In this case, our SID and DB_UNIQUE_NAME are equal.
  -The LOG_ARCHIVE_DEST_x configures a series of parameters about our replication, like if our synchronization will be asynchronous or synchronous (for more information, see the documentation). The important here is to know that data guard broker will synchronize his configuration with this parameter: if we change any property later, it will reflect on this parameter.
  - The LOG_ARCHIVE_DEST_STATE_x just enable or disable your LOG_ARCHIVE_DEST_x configuration.
  - The DB_FILE_NAME_CONVERT is a very important parameter to configure here. Be careful: wrong conversions could lead to a bunch of problems. Here, we need to configure what is our datafile directory source and say what will be our target source: in this example, we are reading our datafiles from +DATA/dataguard and we going to save it now at +DATA/prodbr. It's very important to configure this parameter if we are thinking to switchover the environments (change primary to standby, and the standby to primary roles).
  - The LOG_FILE_NAME_CONVERT is like DB_FILE_NAME_CONVERT, except here we specify the red log files destinations. If you want to increase your security, you could configure this parameter with '+DATA' information (in case you have some redo log files lost in wrong directories, for example).
  - The STANDBY_FILE_MANAGEMENT sets your file name configuration. Instead you define your datafile name or redo file name, the database will take care of it. So, when we create a datafile at +DATA, we don't need to specify  the file path (like '+DATA/prod/datafile/datafile_01.dbf'); for example, just say it will be create at '+DATA' and oracle will create a file with a random id at the right location (like '+DATA/datafile/tablespace_name.123.4567891011').
  - The LOG_ARCHIVE_MIN_SUCCEED_DEST defines the minimum success destination to archive logs before set the current redo log file to be available to use again. This prevent you lost data if something going wrong. If you configure more than one standby, and consequently configure more LOG_ARCHIVE_DEST_x parameters, consider increase this number.

Like I said, to understand better about these parameters, check Oracle documentation.


6th: Create your standby database pfile from your primary database, or create a new pfile with the minimum parameters to start it. For example:


$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Wed Oct 7 17:52:26 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> create pfile = '/tmp/dataguard_pfile.ora' from spfile;

...

$ vi /tmp/dataguard_pfile.ora

*.db_name='
prod'
*.memory_target=5G
*.processes = 500
*.audit_file_dest='/u01/app/oracle/admin/dataguard/adump'
*.audit_trail ='db'
*.db_block_size=8192
*.db_domain=''
*.db_recovery_file_dest='+RECO'
*.db_recovery_file_dest_size=100G
*.diagnostic_dest='/u01/app/oracle'
*.open_cursors=300
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
*.control_files = '+DATA/dataguard/control01.ctl', '+REDO/dataguard/control02.ctl'
*.compatible ='11.2.0.4'
*.log_archive_config='dg_config=(prod,dataguard)'
*.log_archive_dest_1='location=USE_DB_RECOVERY_FILE_DEST','valid_for=(ALL_LOGFILES, ALL_ROLES)'
*.log_archive_dest_2='service="prod", LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=5 reopen=15 db_unique_name="prod", valid_for=(all_logfiles,primary_role)'
*.log_archive_dest_state_1='enable'
*.log_archive_dest_state_2='defer'
*.db_file_name_convert='+DATA/prod','+DATA/dataguard'
*.log_file_name_convert='+DATA/prod','+DATA/dataguard','+RECO/prod','+RECO/dataguard','+REDO/prod','+REDO/dataguard'
*.standby_file_management=auto
*.log_archive_min_succeed_dest=1

*.db_unique_name='dataguard'

Remember two things: your db_name MUST be exactly like your production environment; and you need to create the audit_file_dest directory (in this case, /u01/app/oracle/admin/dataguard/adump'), otherwise you will receive a warning.

7th: Move your standby pfile to your standby machine, and then startup the new instance on nomount mode.


$ export ORACLE_SID=dataguard
$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Oct 8 16:47:22 2015

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile='/home/oracle/dataguard_pfile.ora';
...
SQL> create spfile='+DATA/dataguard/spfiledataguard.ora' from pfile='/home/oracle/dataguard_pfile.ora';
File created.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup nomount;



8th: Create your oracle password file (orapw) for SYS user, for both environments (you can use the same password from prod, just to keep it simple):


$ orapwd file=$ORACLE_HOME/dbs/orapwdataguard_nv password=dataguard_sys_password
$ orapwd file=$ORACLE_HOME/dbs/orapwprod_nv password=prod_sys_password


Do it on both environments (probably your prod machine will already have orapwprod file, but there is no problem if you overwrite it or not).

9th: Duplicate (restore) your dataguard database! Use the exactly command below:


$ rman target sys/prod_sys_password@prod auxiliary sys/dataguard_sys_password@dataguard

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Oct 8 17:14:34 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=12345678910)
connected to auxiliary database: PROD (not mounted)

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY NOFILENAMECHECK DORECOVER;

Starting Duplicate Db at 08-OCT-15

contents of Memory Script:
{
   set until scn  54045879909;
   restore clone standby controlfile;
}
executing Memory Script

executing command: SET until clause

Starting restore at 08-OCT-15

channel C1: restoring control file
channel C1: copied control file copy
input file name=/backup/controlfile_bkp.bkp
output file name=+DATA/dataguard/control01.ctl
output file name=+REDO/dataguard/control02.ctl
Finished restore at 08-OCT-15

contents of Memory Script:
{
   sql clone 'alter database mount standby database';
}
executing Memory Script

sql statement: alter database mount standby database
RMAN-05529: WARNING: DB_FILE_NAME_CONVERT resulted in invalid ASM names; names changed to disk group only.

contents of Memory Script:
{
   set until scn  54045879909;
   set newname for tempfile  1 to
 "+data";

...

   restore
   clone database
   ;
}
executing Memory Script

executing command: SET until clause

executing command: SET NEWNAME

...

renamed tempfile 1 to +data in control file

...

executing command: SET NEWNAME

...

channel C1: starting datafile backup set restore
channel C1: specifying datafile(s) to restore from backup set
channel C1: restoring datafile 00001 to +data

...

channel C1: reading from backup piece /backup/BKP_06102015_1276674215_98210_t2qj01fp_1_1.bkp

...

Finished restore at 08-OCT-15

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

...

released channel: C1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 10/14/2015 01:40:41
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-20506: no backup of archived log found
RMAN-06053: unable to perform media recovery because of missing log

RMAN-06025: no backup of archived log for thread 1 with sequence 109545 and starting SCN of 54055457212 found to restore


Do not be afraid: those final RMAN errors are "normal" (unfortunately). It means that you dont have the necessary archive with SCN ### to restore, which will be impossible to get anyway (to understand about that, please read more about archives and how they work).

10th: When you run the command DUPLICATE DATABASE with DORECOVER clause, all archives will be applied after restore. But it can fail if you do something wrong. So, just to insure that at leat all archives which was genereted together with backup was applied, let's do it manually.


$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Oct 8 22:51:52 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=1276674215, not open)

RMAN> catalog start with '/backup/archive';

Starting implicit crosscheck backup at 14-OCT-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=3395 device type=DISK
Crosschecked 173 objects
Finished implicit crosscheck backup at 14-OCT-15

Starting implicit crosscheck copy at 14-OCT-15
using channel ORA_DISK_1
Crosschecked 12 objects
Finished implicit crosscheck copy at 14-OCT-15

searching for all files in the recovery area
cataloging files...
no files cataloged

searching for all files that match the pattern /backup/archive

List of Files Unknown to the Database
=====================================
File Name: /backup/archive/ARCH_PROD_1276674215_99307_1_vbqjbmqr.BKP

...

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files...

cataloging done

List of Cataloged Files
=======================
File Name: /backup/archive/ARCH_BIOPROD_1276674215_99307_1_vbqjbmqr.BKP

...

RMAN> recover database;

Starting recover at 14-OCT-15
using channel ORA_DISK_1

starting media recovery
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
channel ORA_DISK_1: reading from backup piece /backup/archive/ARCH_BIOPROD_1276674215_99508_1_5kqjdlul.BKP
channel ORA_DISK_1: piece handle=/backup/archive/ARCH_BIOPROD_1276674215_99508_1_5kqjdlul.BKP tag=BACKUP_ARCHIVELOG_BIONEXO

channel ORA_DISK_1: restored backup piece 1

...

unable to find archived log
archived log thread=1 sequence=109579
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 10/15/2015 21:20:06

RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 109579 and starting SCN of 54055767606


11th: Now we need to configure our broker configuration. At your primary environment, run the dgmgrl command to access the broker and create a new configuration with the following commands:

$ dgmgrl /
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> create configuration MY_CONFIG as primary database is PROD connect identifier is PROD;

Configuration "MY_CONFIG" created with primary database "PROD"

DGMGRL> add database DATAGUARD as connect identifier is DATAGUARD maintained as physical;
Database "DATAGUARD" added
DGMGRL> show database DATAGUARD;

Database - DATAGUARD

  Role:            PHYSICAL STANDBY
  Intended State:  OFFLINE
  Transport Lag:   (unknown)
  Apply Lag:       (unknown)
  Apply Rate:      (unknown)
  Real Time Query: OFF
  Instance(s):
    DATAGUARD

Database Status:
DISABLED

DGMGRL> enable database DATAGUARD;
Enabled.
DGMGRL> edit database prodbr set state='transport-on';
Succeeded.
DGMGRL> edit database dgprodbr set state='apply-on';
Succeeded.
DGMGRL> show database verbose DATAGUARD;

Database - dgprodbr

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:  
5 days 16 hours 32 minutes 12 seconds (computed 1 second ago)
  Apply Lag:       5 days 20 hours 12 minutes 3 seconds (computed 1 second ago)
  Apply Rate:      (unknown)
  Real Time Query: ON
  Instance(s):
    DATAGUARD

  Database Warning(s):
    ORA-16855: transport lag has exceeded specified threshold

  Properties:
    DGConnectIdentifier             = 'dataguard'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'OPTIONAL'
    MaxFailure                      = '0'
    MaxConnections                  = '5'
    ReopenSecs                      = '15'
    NetTimeout                      = '900'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'auto'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '10'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = '+DATA/prodbr/, +DATA/dgprodbr/, +DATA/dataguard/, +DATA/dgprodbr/'
    LogFileNameConvert              = '+DATA/prodbr/, +DATA/dgprodbr/, +DATA/dataguard/, +DATA/dgprodbr/, +REDO/prodbr/, +REDO/dgprodbr/, +REDO/dataguard/, +REDO/dgprodbr/, +RECO/prodbr/,+RECO/dgprodbr/, +RECO/dataguard/, +RECO/dgprodbr/'
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '300'
    TransportDisconnectedThreshold  = '30'
    SidName                         = 'dataguard'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dataguard.com.br)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dataguard_DGMGRL)(INSTANCE_NAME=dataguard)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = 'arch_%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
WARNING


Warning!: If you get error "ORA-01033: ORACLE initialization or shutdown in progress" right after you try to add your database configuration, means your standby database is not open. Startup (open read only) and then retry.

If you get error "Error: ORA-01017: invalid username/password; logon denied" right after you try to add your database configuration, means that your password orapw file is wrong. You can try to recreate it (8th step) or you can just copy your orapw file from primary environment to your standby environment (scp or rsync).

In my case, the database status showing WARNING is normal, because I just configured the broker and the difference between my applied backup date (and applied archives) and my production database, is 5 days and 20 hours, since my transport lag threshold is 300 seconds (5 minutes), it must alarm.


I have some properties I like to configure manually, like:


  edit database dgprodbr set property LogXptMode='ASYNC';
  edit database dgprodbr set property MaxConnections = 5;
  edit database dgprodbr set property ReopenSecs = 15;
  edit database dgprodbr set property TransportDisconnectedThreshold = 30;
  edit database dgprodbr set property ArchiveLagTarget=120;
  edit database dgprodbr set property TransportLagThreshold=300;
  edit database dgprodbr set property NetTimeout=900;
  edit database dgprodbr set property StaticConnectIdentifier='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=dataguard.com.br)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=dataguard_DGMGRL)(INSTANCE_NAME=dataguard)(SERVER=DEDICATED)))';


To understand more about these parameters, just check oracle documentation.

12th: If you dont have Oracle Grid configured, then jump this step.

Now we are going to configure our Server Control Utility, doing the following command on oracle user:

$ srvctl add database -d dataguard -o $ORACLE_HOME -r PHYSICAL_STANDBY -p "+DATA/dataguard/spfiledataguard.ora" -a "DATA","RECO","REDO"
$ srvctl modify database -d dgprodbr -s 'open','read only'
$ srvctl config database -d dataguard
Database unique name: dataguard
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/db_1
Oracle user: oracle
Spfile: +DATA/dataguard/spfiledataguard.ora
Domain:
Start options: open,read only
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Database instance: dgprodbr
Disk Groups: DATA,RECO,REDO
Services:


Now you can add some services if you want, but remember: services are related with your database role (PRIMARY, PHYSICAL STANDBY, LOGICAL STANDBY or SNAPSHOT STANDBY), e.g.:

$ srvctl add service -d dataguard -s srv_dataguard -l PHYSICAL_STANDBY
$ srvctl config service -d dataguard -s srv_dataguard
Service name: srv_dataguard
Service is enabled
Cardinality: SINGLETON
Disconnect: false
Service role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
$ srvctl start service -d dgprodbr -s srv_dataguard


In this case, this service will be only available when your database is in physical standby mode. If you open it to snapshot standby, this service will not be available anymore.

And it's done! you can use your broker to check if your archive log is been transported, like:


dgmgrl /
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show database dataguard;

Database - dataguard

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   2 minutes 15 seconds (computed 1 second ago)
  Apply Lag:       0 seconds (computed 1 second ago)
  Apply Rate:      545.00 KByte/s
  Real Time Query: ON
  Instance(s):
    dataguard

Database Status:
SUCCESS


Considerations:
* Always check your alert_$ORACLE_SID.log file, you can always get a good picture about what are happening with your replication (on both nodes, i.e., standby and primary database)
* Whenever you change your parameter which is shared with your broker, e.g., db_file_name_convert, you need to change the equivalent parameter in broker (in this case, DbFileNameConvert), or your broker database status will be set to WARNING, and some replications can fail if you do, for example, a shutdown and startup.