
Analyzing the recovery process involves determining the factors that influence the recovery process. Database size, system complexity, database structure, and application structure are the main factors that influence the mean time to recover (MTTR).
The MTTR can be very critical to the operation of systems that need high availability. You can reduce the MTTR in several ways:
Table 14.1 describes the techniques that can be used for high
availability. These strategies should be used to quickly recover
in the event of a database failure.
| Object-level recovery using Export, Import, and SQL*Loader | Uses Export/ Import to protect data | Fast object- level recovery | Difficult to scale; you must be aware of object associations |
| Failover systems using hardware redundancy | Failover provided by using another node | No data loss due to redundant system | No scalability; costly |
| Oracle standby databases | Primary database's redo log keeps another database updated, which can be used during recovery | Fast recovery; failover; disaster recovery possible | Data loss possible; complicated setup and maintenance; potential of replicating data-base corruption |
| Oracle Symmetric Replication | Uses Oracle's replication feature to provide high availability | No data loss; failover; disaster recovery possible; both databases can be used simultaneously | Slow recovery due to use of transactions; use of two-phase commit can lead to additional problems while maintaining the database's consistency |
| Oracle Parallel Server | Clustering solution that allows failover to another instance; recovery can proceed simultaneously and is done by the surviving instances | No data loss; fast failover; protects against node and cache failures; high scalability; load balancing | Tuning can be difficult; application design plays significant part in strategy's success |
| Triple mirroring | Uses a third hardware that is a mirror | Fast hot backups; fast recovery | Cost of triple writes and resilvering |
| EMC SRDF facility | Physical I/O- based replication | No data loss; failover; disaster recovery possible; faster than Oracle Symmetric Replication | Potential of replicating database corruption |
| Customized store-and- forward replication | Makes use of Oracle8 features such as advanced queuing or trigger-based asynchronous replication | No data loss; fast recovery | Complex; serializing of transactions |
General steps to recover a database system
| Location factors |
If the backup is on a disk, is the disk on-site or off-site? Is the disk local or network? Do you have mirrored copies? Are you recovering from a cold or a hot backup? If the backup is on tape, is the tape on-site or off-site? Do you need additional components to access the tape? |
You need to answer several questions to determine the type and extent of failure. Your answers dictate the steps you take to recover the system:
Are there any errors in the operating system log?
Database failures can be detected if you get one of various errors. The following list shows some of the common errors encountered and their solutions:
The procedure to recover from the loss of general data files depends on the type of tablespace from which the data file is lost-rollback tablespace, user tablespace, index tablespace, or read-only tablespace. Several symptoms can indicate this problem. You might get the following errors:
These errors can indicate a loss of a general data file. It's up to you to determine the type of tablespace involved.
If you determine that the tablespace contains user data, the steps you follow to recover the tablespace depend on whether you have a good backup and whether that backup was cold or hot.
In this case you're in the NOARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online redo logs.
Recover with a cold backup
SELECT X.GROUP#, MEMBER, SEQUENCE#, FIRST_CHANGE# FROM V$LOG X, V$LOGILE Y WHERE X.GROUP# = Y.GROUP#;
SELECT FILE#, CHANGE# FROM V$RECOVER_FILE;
RECOVER DATAFILE 'fullpath of the datafile'
ALTER DATABASE OPEN
In this case you're in the ARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online logs.
Recover with a hot backup
SELECT X.GROUP#, MEMBER, SEQUENCE#, FIRST_CHANGE# FROM V$LOG X, V$LOGILE Y WHERE X.GROUP# = Y.GROUP#;
RECOVER DATAFILE 'fullpath of the datafile'
| Use the online redo logs during recovery |
If while performing step 5 you're prompted for a non-existing archived log, you need to use the online redo logs to continue with the recovery. Compare the sequence number referenced in the ORA-280 message with the sequence numbers of your online redo logs. Supply the full path name of one of the members of the redo log group whose sequence number matches it. |
ALTER DATABASE OPEN
If redo data is missing, the recovery won't be complete as described in the preceding steps and you'll have to recreate the tablespace. To recreate the tablespace, you can either use a good export script that can easily load the data and recreate the objects in that tablespace or load the data through SQL*Loader.
Recover with missing redo data
Svrmgrl> Startup mount
Svrmgrl> ALTER DATABASE DATAFILE 'fullpath of datafile'
OFFLINE DROP;
Svrmgrl> ALTER DATABASE OPEN;
Svrmgrl> DROP TABLESPACE tablespace_name
INCLUDING CONTENTS;
Because read-only tablespaces are never modified, the recovery solution is very simple: Restore the data file from its last backup to its original location. No media recovery is required in this case. There are two exceptions to this procedure, however:
In either scenario, recovery can be performed by following the same steps as described earlier in the section "Recovering from a Lost Data File in a User Tablespace," based on the backup type you have.
If you determine that the tablespace contains user indexes, you should use the following steps, depending on whether you have a good backup and whether it's a cold or hot backup.
In this case you're in NOARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online redo logs. To recover from this situation, follow the same steps as described earlier in the "Recovering with a Cold Backup" section of "Recovering from a Lost Data File in a User Tablespace."
In this case you're in ARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online logs. To recover from this situation, follow the same steps as described earlier in the "Recovering with a Hot Backup" section of "Recovering from a Lost Data File in a User Tablespace."
If redo data is missing, the recovery won't be complete. You need to recreate the tablespace. To do so, you can either use a script that can be used to easily create the index or manually create the indexes by issuing CREATE INDEX statements.
Recover with missing redo data
Svrmgrl> Startup mount
Svrmgrl> ALTER DATABASE DATAFILE 'fullpath of datafile'
OFFLINE DROP;
Svrmgrl> ALTER DATABASE OPEN;
Svrmgrl> DROP TABLESPACE tablespace_name
INCLUDING CONTENTS;
This recovery scenario is the most critical one, and you should really work with Oracle Support while performing such a recovery. The main issue involved is that you must make sure the active transactions in the rollback segment aren't lost.
In this case, the procedure to follow depends on when the loss was detected.
You're trying to start the database and get ORA-1157, ORA-1110 and operating system errors, as well as determine that the tablespace contains rollback segments. One thing you have to determine is how the database was shut down.
You're certain that the database was shut down via shutdown normal or shutdown immediate. Check the alert log and look at the last shutdown entry. The following log entry indicates that the shutdown was clean:
'alter database dismount completed: alter database dismount"
This may be followed by an attempt you made to start, resulting in the ORA errors and a subsequent SHUTDOWN ABORT by Oracle.
Recover a database that has been shut down cleanly
Svrmgrl> STARTUP RESTRICT MOUNT
Svrmgrl> ALTER DATABASE DATAFILE 'fullpath of datafile' FFLINE DROP;
Svrmgrl> ALTER DATABASE OPEN
_Corrupted_rollback_segments = (rollback1,rollback2,...,rollbackN)
Svrmgrl> startup restrict mount
Svrmgrl> drop tablespace tablespace_name including contents;
Svrmgrl> alter system disable restricted session;
_Corrupted_rollback_segments = (rollback1,rollback2,...,rollbackN)
In this scenario, the database was shut down, aborted, or crashed. You can't offline or drop the lost data file because it's almost certain that the rollback segments with extents in the lost data file contain active transactions; you must restore the lost data file from backup and apply media recovery. If the database is in NOARCHIVELOG mode, a complete recovery is possible only if the redo to be applied is in the range of your online redo log files.
Recover a database that wasn't shut down cleanly
Svrmgrl> SELECT FILE#, NAME, STATUS FROM V$DATAFILE;
Svrmgrl> ALTER DATABASE DATAFILE 'full path of datafile'
ONLINE;
SELECT X.GROUP#, MEMBER, SEQUENCE#, FIRST_CHANGE# FROM V$LOG X, V$LOGILE Y WHERE X.GROUP# = Y.GROUP#;
Open the database in an inconsistent state and rebuild it
| Be careful here! |
These steps should be used with extreme caution after taking a full database export-there is a potential for database corruption. |
_allow_resetlogs_corruption = true
_corrupted_rollback_segments
= list of all rollback segments
Svrmgrl> RECOVER DATABASE UNTIL CANCEL;
Svrmgrl> ALTER DATABASE OPEN RESETLOGS;
| Rebuilding the database |
Rebuilding the database is an essential step in this procedure because forcefully opening the database can corrupt the data-base. |
RECOVER DATAFILE 'fullpath of the datafile'
ALTER DATABASE OPEN
Don't shut down the database if you've detected a loss of data file in the rollback tablespace while the database is up and running. It's simpler to resolve the situation with the database up than it is with the database down. You can use two approaches to recover.
| Create new rollback segments |
You may have to create additional rollback segments in a different tablespace to continue working with the database while the current problem is being addressed. |
Recover a live database in ARCHIVELOG mode
ALTER DATABASE DATAFILE 'fullpath of datafile' OFFLINE;
RECOVER DATAFILE 'fullpath of datafile';
ALTER DATABASE DATAFILE 'fullpath of datafile' ONLINE;
Recover a live database, no matter what mode (slower method)
ALTER ROLLBACK SEGMENT rollback_segment OFFLINE;
SELECT SEGMENT_NAME, STATUS FROM DBA_ROLLBACK_SEGS WHERE TABLESPACE_NAME = 'tablespace_name';
DROP ROLLBACK SEGMENT rollback_segment;
SELECT SEGMENT_NAME, XACTS ACTIVE_TX, V.STATUS FROM V$ROLLSTAT V, DBA_ROLLBACK_SEGS WHERE TABLESPACE_NAME = 'I' AND SEGMENT_ID = USN;
Execute the following query to identify users who have transactions assigned to the rollback segments:
SELECT S.SID, S.SERIAL#, S.USERNAME, R.NAME "ROLLBACK"
FROM V$SESSION S, V$TRANSACTION T, V$ROLLNAME R
WHERE R.NAME IN ('pending_rollback1','pending_rollback2',
... 'pending_rollbackN') AND
S.TADDR = T.ADDR AND
T.XIDUSN = R.USN;
After you determine which users have active transactions in the "pending offline" rollback segments, you can either ask them to commit or roll back their transaction or you can kill their session by executing the following:
ALTER SYSTEM KILL SESSION 'sid, serial#';
The following steps can be performed after you have taken care of the active transactions.
Clean up after active transactions
A good backup strategy can be a lifesaver in a situation where a data file from the system tablespace is lost or damaged. In the absence of a good backup, you may be faced with the undesirable alternative of rebuilding the database with possible data loss.
The symptoms of the scenario in which the system tablespace needs recovery are as follows:
You should use the following methods depending on whether you have a good backup and whether that backup is cold or hot.
In this case you're in NOARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online logs.
Recover with a cold backup
SELECT X.GROUP#, MEMBER, SEQUENCE#, FIRST_CHANGE# FROM V$LOG X, V$LOGILE Y WHERE X.GROUP# = Y.GROUP#;
SELECT FILE#, CHANGE# FROM V$RECOVER_FILE;
RECOVER DATAFILE 'fullpath of the datafile'
ALTER DATABASE OPEN
In this case you are in ARCHIVELOG mode. The data file recovery will be complete if the redo to be applied is within the range of your online redo logs.
Recover with a hot backup
SELECT X.GROUP#, MEMBER, SEQUENCE#, FIRST_CHANGE# FROM V$LOG X, V$LOGILE Y WHERE X.GROUP# = Y.GROUP#;
RECOVER DATAFILE 'fullpath of the datafile'
| Use online redo logs for recovery |
If you're prompted for a non-existing archived log, you'll need to use the online redo logs to continue with the recovery. Compare the sequence number referenced in the ORA-280 message with the sequence numbers of your online redo logs. Supply the full path name of one of the members of the redo log group whose sequence number matches. |
ALTER DATABASE OPEN
The recovery won't be complete if redo data is missing. Your option isn't attractive-you'll have to rebuild the database. For this purpose, you can either use the Export/Import utility to take a full database export, or you can do a user/table-level export. SQL*Loader can also be used for this purpose.
The control file is very important for the database. Usually a problem with the control file isn't detected while the database is up and running. If the control file is lost or damaged in such a way that Oracle can't recognize it, a subsequent database startup will result in ORA-205 (error in identifying control file '%s'), along with an operating system-level error.
The recovery procedure for this situation depends on whether you have lost one of the control files from a mirrored configuration or all copies of the current control file. The following methods should be used to recover the database based on the particular situation.
In this scenario, you can use the other copies of the control file for getting the database up and running.
Recover with a mirrored control file
In this case, the recovery steps are a bit more involved. First you have to analyze the situation and determine the answers to the following questions:
Recover without a mirrored control file
Svrmgrl> alter database backup controlfile to trace;
Svrmgrl> @create_control.sql
svrmgrl> Alter database open;
Listing 14.1 Example control file creation script
01: Dump file E:\ORANT\rdbms80\trace\ORA00167.TRC
02: Tue Mar 31 17:06:56 1998
03: ORACLE V8.0.3.0.0 - Production vsnsta=0
04: vsnsql=c vsnxtr=3
05: Windows NT V4.0, OS V5.101, CPU type 586
06: Oracle8 Enterprise Edition Release 8.0.3.0.0 - Production
07: With the Partitioning and Objects options
08: PL/SQL Release 8.0.3.0.0 - Production
09: Windows NT V4.0, OS V5.101, CPU type 586
10: Instance name: sjr
11:
12: Redo thread mounted by this instance: 1
13:
14: Oracle process number: 8
15:
16: pid: a7
17:
18: Tue Mar 31 17:06:56 1998
19: Tue Mar 31 17:06:56 1998
20:
21: *** SESSION ID:(7.1) 1998.03.31.17.06.56.062
22: # The following commands will create a new control file
23: # and use it to open the database.
24: # Data used by the recovery manager will be lost.
25: # Additional logs may be required for media recovery of
26: # offline data files. Use this only if the current
27: # version of all online logs are available.
28: STARTUP NOMOUNT
29: CREATE CONTROLFILE REUSE DATABASE "SJR" NORESETLOGS
NOARCHIVELOG
30: MAXLOGFILES 32
31: MAXLOGMEMBERS 2
32: MAXDATAFILES 254
33: MAXINSTANCES 1
34: MAXLOGHISTORY 899
35: LOGFILE
36: GROUP 1 'E:\ORANT\DATABASE\LOGSJR1.ORA' SIZE 200K,
37: GROUP 2 'E:\ORANT\DATABASE\LOGSJR2.ORA' SIZE 200K
38: DATAFILE
39: 'E:\ORANT\DATABASE\SYS1SJR.ORA',
40: 'E:\ORANT\DATABASE\RBS1SJR.ORA',
41: 'E:\ORANT\DATABASE\USR1SJR.ORA',
42: 'E:\ORANT\DATABASE\TMP1SJR.ORA',
43: 'E:\ORANT\DATABASE\INDX1SJR.ORA'
44: ;
45:
46: # Recovery is required if any of the datafiles are
47: # restored backups, or if the last shutdown was not
48: # normal or immediate.
49: RECOVER DATABASE
50:
51: # Database can now be opened normally.
52: ALTER DATABASE OPEN;
Recover without an accurate trace file
| Full syntax available |
For the CREATE CONTROLFILE statement's complete syntax, see Oracle's SQL reference manual. |
Create Controlfile reuse database "TEST"
noresetlogs noarchivelog
Maxlogfiles 50
Maxlogmembers 3
Maxdatafiles 500
Maxinstances 8
Maxloghistory 500
Logfile
Group 1 '/u01/oracle/8.0.4/dbs/log1test.dbf' size 1M,
Group 2 '/u01/oracle/8.0.4/dbs/log2test.dbf' size 1M,
Group 3 '/u01/oracle/8.0.4/dbs/log3test.dbf' size 1M,
Datafile
'/u01/oracle/8.0.4/dbs/systest.dbf' size 40M,
'/u01/oracle/8.0.4/dbs/data1test.dbf' size 10M,
'/u01/oracle/8.0.4/dbs/data2test.dbf' size 20M;
Svrmgrl> Recover database;
Svrmgrl> alter database open;
Redo logs are important for performing up-to-the-minute database recovery. Several symptoms and errors can indicate that the redo logs are lost or corrupted:
"RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL"
These errors indicate that the redo logs are misplaced, removed, corrupt, or from an old backup. The recovery technique that you need to use depends on whether you've mirrored your redo logs.
Recover with mirrored redo logs
Svrmgrl> select * from V$RECOVERY_FILE;
Recover without mirrored redo logs
| Caution: Database can become corrupt if you force it to open |
These steps use some very dangerous parameters that should be used only on understanding their consequences and with the help of an Oracle Support Services analyst. After the data-base is opened in this manner, you should rebuild the data-base at your earliest chance. |
_allow_resetlogs_corruption = true
_corrupted_rollback_segments =
list of all rollback segments
Svrmgrl> RECOVER DATABASE UNTIL CANCEL;
Svrmgrl> ALTER DATABASE OPEN RESETLOGS;
| Rebuild the database after forcing it to open |
Rebuilding the database is an essential step in this procedure because a forced database-open can corrupt the database. |
The following parameters can have a very harmful effect on the database and should be used carefully:
ALTER DATABASE OPEN RESETLOGS;
© Copyright, Macmillan Computer Publishing. All rights reserved.