Channel: Oracle Blog : database

Database Instance terminated due to ZFS data pool SUSPENDED

On one of our TEST servers, several databases are hosted on a single mount point backed by a single ZFS data pool. We received multiple reports from users that the databases running on this server were not accessible. When we investigated, none of the instances running on the server were responding, and after some time they terminated.

In the database alert log file, the following error messages were reported:

WARNING: aiowait timed out 1 times
Thu May 21 14:09:38 2015
WARNING: aiowait timed out 1 times
Thu May 21 14:09:38 2015
opiodr aborting process unknown

– Check the status of the data pool:

root@dbnode1:~# zpool status data_pool
pool: data_pool
state: SUSPENDED
status: One or more devices are unavailable in response to IO failures.
The pool is suspended.
action: Make sure the affected devices are connected, then run 'zpool clear' or 'fmadm repaired'.
Run 'zpool status -v' to see device specific details.
see: http://support.oracle.com/msg/ZFS-8000-HC
scan: none requested
config:

NAME STATE READ WRITE CKSUM
data_pool SUSPENDED 0 1 0
c1d2s0 ONLINE 0 0 0

To fix this issue, we need to clear the error state of the pool with 'zpool clear' and then record the repair with 'fmadm repaired' for this specific data pool:

root@dbnode1:~# zpool clear data_pool

– Check the faults recorded for the data pool:

root@dbnode1:~# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 20 22:02:50 202f0e1f-3cd8-4ce6-a495-8e22a94796ee ZFS-8000-8A Critical

Problem Status : solved
Diag Engine : zfs-diagnosis / 1.0
System
Manufacturer : unknown
Name : SPARC-T5-2
Part_Number : unknown
Serial_Number : unknown
Host_ID : 84fbc1b9

----------------------------------------
Suspect 1 of 1 :
Fault class : fault.fs.zfs.object.corrupt_data
Certainty : 100%
Affects : zfs://pool=b07605b211b6c1f3/pool_name=data_pool
Status : faulted but still in service

FRU
Name : "zfs://pool=b07605b211b6c1f3/pool_name=data_pool"
Status : faulty

Description : A file or directory in pool 'data_pool' could not be read due to
corrupt data.

Response : No automated response will occur.

Impact : The file or directory is unavailable.

Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Run 'zpool status -xv' and examine the list of damaged files to
determine what has been affected. Please refer to the associated
reference document at http://support.oracle.com/msg/ZFS-8000-8A
for the latest service procedures and policies regarding this
diagnosis.

--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 20 22:02:50 004d413b-9765-4e5f-8763-dd29281b0cad ZFS-8000-HC Major

Problem Status : solved
Diag Engine : zfs-diagnosis / 1.0
System
Manufacturer : unknown
Name : SPARC-T5-2
Part_Number : unknown
Serial_Number : unknown
Host_ID : 84fbc1b9

----------------------------------------
Suspect 1 of 1 :
Fault class : fault.fs.zfs.io_failure_wait
Certainty : 100%
Affects : zfs://pool=b07605b211b6c1f3/pool_name=data_pool
Status : faulted but still in service

FRU
Name : "zfs://pool=b07605b211b6c1f3/pool_name=data_pool"
Status : faulty

Description : ZFS pool 'data_pool' has experienced currently unrecoverable I/O
failures.

Response : No automated response will occur.

Impact : Read and write I/Os cannot be serviced.

Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Make sure the affected devices are connected, then run 'zpool
clear'. Please refer to the associated reference document at
http://support.oracle.com/msg/ZFS-8000-HC for the latest service
procedures and policies regarding this diagnosis.
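
Both fault reports point at the same resource on their "Affects" line, and that is the identifier the repair command takes. As a rough sketch, assuming the "Affects : <FMRI>" layout shown above, the identifiers could be pulled out of the report like this; the function name affected_fmris is hypothetical.

```shell
#!/bin/sh
# affected_fmris (hypothetical helper): read `fmadm faulty` output on
# stdin and print each distinct resource named on an "Affects :" line.
affected_fmris() {
    sed -n 's/^[[:space:]]*Affects[[:space:]]*:[[:space:]]*//p' | sort -u
}

# Possible usage (not executed here); review the list before repairing:
#   fmadm faulty | affected_fmris
#   fmadm faulty | affected_fmris | while read -r fmri; do
#       fmadm repaired "$fmri"
#   done
```

Printing the list first and repairing only after reviewing it avoids marking an unrelated fault as repaired.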

– Run the repair command:

root@dbnode1:~# fmadm repaired zfs://pool=b07605b211b6c1f3/pool_name=data_pool
fmadm: recorded repair to of zfs://pool=b07605b211b6c1f3/pool_name=data_pool
root@dbnode1:~#

– Check the status of data pool:

root@dbnode1:~# zpool status data_pool
pool: data_pool
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
data_pool ONLINE 0 0 0
c1d2s0 ONLINE 0 0 0

Conclusion:

The data pool status has now changed from “SUSPENDED” to “ONLINE”. It is recommended to restart all instances that were running on that mount point and, once all DB instances have started normally, to check the databases for block corruption.
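
One possible way to do the corruption check is with RMAN validation. This is an assumption about tooling, not necessarily what was used on this server. The sketch below only prints the commands so they can be reviewed first; the function names rman_validate_commands and corruption_query are hypothetical.

```shell
#!/bin/sh
# Print an RMAN command that reads every datafile block and checks it
# for physical and logical corruption without writing a backup.
rman_validate_commands() {
    cat <<'EOF'
BACKUP VALIDATE CHECK LOGICAL DATABASE;
EOF
}

# Print a query for the view where corrupt blocks found by the
# validation are recorded; no rows means none were found.
corruption_query() {
    cat <<'EOF'
SELECT file#, block#, blocks, corruption_type
  FROM v$database_block_corruption;
EOF
}

# Possible usage per instance, with ORACLE_SID set (not executed here):
#   rman_validate_commands | rman target /
#   corruption_query | sqlplus -s "/ as sysdba"
```

The validation reads every datafile, so it is best run per instance once the instance is open and stable.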

Thanks for reading :)

regards,

X A H E E R

