Oracle Database Lost Write Detection & Recovery Options

  • What is a lost write?

lost write occurs when an I/O subsystem acknowledges the completion of the block write even though the writers did not occur in the first place. Another situation is when a former image of the block overwrites the current block image.

  • Error Symptoms:

The ORA-600 [3020] stuck recovery error could occur on the Standby database for several reasons including & ORA-752

IMPORTANT REMARK: Do NOT repair the Standby by restoring a backup taken from the Primary, as that will lead to corrupting the Standby also!

  • Possible ACTIONS:
  1. First, immediately Open a Case with Oracle Support !
  • An ORA-752 error definitively identifies a lost write on the Primary. Consider failing over to the Standby immediately if data integrity is critical and some data loss is acceptable.
  • Inspect the alert log file for more information (take backup of the alert log file to offsite place).
  • Inspect the trace files generated using the view:

SQL> select originating_timestamp,detailed_location,message_level,message_text,problem_key from  V$diag_alert_ext where message_level=1 and message_text like’%ORA-00600%’ order by originating_timestamp desc;

  • Inspect OS logs:

cd /var/log

ls -l message*

  • Perform multiple dumps so Oracle support can review:

Dump of the controlfiles

SQL> alter session set events ‘immediate trace name controlf level xx’;

Dump of the datafile headers:

SQL> alter session set events ‘immediate trace name file_hdrs level 10’;

Dump of the redo log headers:

SQL> alter session set events ‘immediate trace name redohdr level 10’;

  • The alert log message will provide the datafile number along with the corresponding block number (replace &file_number, &block_number values):

SQL> spool /exp/oraxx1/corrupted_file_info.txt

SQL> Select * from DBA_EXTENTS where FILE_ID=&file_number and  &block_number BETWEEN BLOCK_ID and BLOCK_ID+BLOCKS-1;

SQL> spool off;

SQL> spool /exp/oraxx1/object_number.txt

SQL> Select * from DBA_OBJECTS    where DATA_OBJECT_ID = &object_number;

SQL> spool off;

  • If feasible/possible, drop and recreate the affected objects on the PRIMARY database (perform good analysis):

Once the objects have been recreated, use the following procedure to skip corrupted block on the STANDBY DB:

Temporarily disable lost write protection on the standby:

SQL> ALTER SYSTEM SET DB_LOST_WRITE_PROTECT = NONE;

Allow recovery to proceed in spite of block corruptions by running the RECOVER command with the ALLOW n CORRUPTION clause, where n is the number of allowable corrupt blocks.

SQL> alter database recover automatic standby database

      allow 1 corruption;

Once the alert log indicates the blocks have been marked corrupt, restart managed recovery.

SQL> alter database recover cancel;

 SQL> alter database recover managed standby database using current logfile disconnect;

  • If the affected objects cannot be recreated then activate the standby database. By activating the standby database you will be experiencing data loss but data integrity will be ensured.

Issue the following SQL statement on the standby database to convert it to a primary:

SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;

Then, immediately backup the database (ALL standby databases Linux Host Names are included in NetBackup backup policy)

Backup the database against the standby recovery catalog

  • Detecting & Protecting Against Lost Writes

Set DB_LOST_WRITE_PROTECT to TYPICAL on primary and standby databases, if your database is 18c you can utilize the new feature “shadow tablespace”

*** shadow table-space setup:

ensure that compatible parameter is set to 18.0:

SQL> show parameter compatible

NAME                                 TYPE        VALUE

———————————— ———– ——————————

compatible                           string      18.0.0

SQL> create bigfile tablespace SHADOW datafile ‘/orad04/oradbp02/shadow.dbf’ size 120M AUTOEXTEND ON lost write protection;

SQL> select tablespace_name from dba_tablespaces where CONTENTS=’LOST WRITE PROTECTION’;

SQL> select tablespace_name,status,bigfile,contents,logging,allocation_type,encrypted,lost_write_protect,chunk_tablespace from dba_tablespaces where tablespace_name=’SHADOW’;

// to enable LOST WRITE PROTECTION:

SQL> alter database enable lost write protection;

// to disable LOST WRITE PROTECTION:

SQL> ALTER DATABASE DISABLE LOST WRITE PROTECTION;

SQL> select * from new_lost_write_datafiles$;

NOTE: you can enable shadow lost write protection for a tablespace or a data file levels also.