One beautiful day we found a lot of corruption messages in alert.log:
--------------------------------------------------------------
Hex dump of (file 951, block 3733504) in trace file /u01/app/oracle/diag/rdbms/spur/spur1/trace/spur1_ora_256147.trc
Corrupt block relative dba: 0xedf8f800 (file 951, block 3733504)
Completely zero block found during backing up datafile
Trying mirror side DATAC1_CD_03_MRCELADM05.
Reread of blocknum=3733504, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found same corrupt data
Reread of blocknum=3733504, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found valid data
--------------------------------------------------------------
Hex dump of (file 951, block 3733505) in trace file /u01/app/oracle/diag/rdbms/spur/spur1/trace/spur1_ora_256147.trc
Corrupt block relative dba: 0xedf8f801 (file 951, block 3733505)
Completely zero block found during backing up datafile
Trying mirror side DATAC1_CD_03_MRCELADM05.
Reread of blocknum=3733505, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found same corrupt data
Reread of blocknum=3733505, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found valid data
--------------------------------------------------------------
Hex dump of (file 951, block 3733506) in trace file /u01/app/oracle/diag/rdbms/spur/spur1/trace/spur1_ora_256147.trc
Corrupt block relative dba: 0xedf8f802 (file 951, block 3733506)
Completely zero block found during backing up datafile
Trying mirror side DATAC1_CD_03_MRCELADM05.
Reread of blocknum=3733506, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found same corrupt data
Reread of blocknum=3733506, file=+DATAC1/SPUR/DATAFILE/forts_java_data.984.962454999. found valid data
--------------------------------------------------------------
Reading the messages above raised many questions:
· If these messages are about block corruptions, then where is ORA-01578? Why is it absent?
· Is this a hardware error or not? Is it a bad disk, a bad flash card, or an offload server crashing? We did a cell check, but all disks and flash cards are OK. We rebooted the cell – the corruptions still take place.
· select * from v$database_block_corruption shows "no rows selected".
· We ran RMAN> validate database – and found 0 corruptions. We also tried RMAN> validate database check logical – 0 errors found by RMAN.
· We installed 18.1.9 (the new Exadata image) – the corruptions still take place.
alert.log analysis shows 16 affected files:
$ cat alert_spur1.log|grep "Corrupt block relative dba"|awk '{print $6,$7}'|sort|uniq
(file 303,
(file 360,
(file 374,
(file 375,
(file 377,
(file 379,
(file 799,
(file 800,
(file 826,
(file 863,
(file 937,
(file 938,
(file 951,
(file 952,
(file 953,
(file 962,
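To map these file numbers back to datafile names, a query like the following against dba_data_files can be used (this lookup step is my addition for illustration, not part of the original output):

SQL> select file_id, tablespace_name, file_name
       from dba_data_files
      where file_id in (303, 360, 374, 375, 377, 379, 799, 800,
                        826, 863, 937, 938, 951, 952, 953, 962);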
We opened an SR, but Oracle recommended checking the HW: "Physical Corrupted Blocks consisting of all Zeroes indicate a problem with OS, HW or Storage" (Doc ID 1545366.1).
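The hex dumps of the affected blocks are already written to the trace file named in the messages above. To re-dump a suspect block and confirm that it really is all zeroes, a generic block dump like the following can be used (not a step from the SR, just an illustration; the dump goes to a new trace file in the diag trace directory):

SQL> alter system dump datafile 951 block 3733504;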
For clarity: this is a virtual Exadata with 4 DB nodes, 7 X7-2 cells and 9 VMs. Each VM is spread over all cells and all 84 disks, and each disk is shared by all 9 VMs. Only one VM is affected by the corruption issue. Should we suspect a software issue?
The reason:
The zeroed blocks are caused by corruption in the ASM metadata. As I understand this issue, when the database requests a block from the datafile, ASM points it to an empty place on disk.
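As a side note, ASM metadata consistency can be cross-checked from the ASM instance with ALTER DISKGROUP ... CHECK (a generic sanity check; the diskgroup name DATAC1 is taken from the file paths above, and the results go to the ASM alert log):

SQL> -- run as SYSASM on the ASM instance
SQL> alter diskgroup DATAC1 check norepair;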
Solution:
Drop the affected datafiles and copy these datafiles back from the Data Guard standby.
After the datafiles had been dropped and recreated, the corruption disappeared.
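For reference, a minimal sketch of how such a per-datafile refresh from the standby can look in RMAN, assuming a 12c-style RESTORE ... FROM SERVICE and a hypothetical TNS alias spur_stby for the standby (the exact procedure may differ):

RMAN> # take the corrupt datafile offline, pull a clean copy over the network
RMAN> # from the standby, then recover it and bring it back online
RMAN> sql 'alter database datafile 951 offline';
RMAN> restore datafile 951 from service spur_stby;
RMAN> recover datafile 951;
RMAN> sql 'alter database datafile 951 online';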