So, yeah, about our Oracle cluster going a lovely 0x0000FFFF shade of blue:
Subject: Why do we get a Blue Screen Caused By orafencedrv.sys
Doc ID: Note:337784
When running Oracle 10g RAC/CRS on Windows, the Oracle CSService is SUPPOSED to reboot the OS if it detects a problem in the clusterware. The result of a CSS daemon rebooting the node will be that a bluescreen will occur.
The failure is as per design. Anytime that the Oracle CSService process fails, it is designed to cause the machine to reboot. It does this by means of an IOCTL to the IOFENCE driver; this is a kernel driver which gets a fault. And for Windows this is an unhandled exception that will cause the blue screen.
Not “kill the service”, or anything sissy like that. Hard-stop the entire machine (after, I note, a brief timeout, for anyone going to make a related argument…) by segfaulting a driver that they’ve apparently written for the sole purpose of sitting in ring zero and misbehaving.
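For anyone who hasn't met this pattern before, the general shape of "fence the node when the cluster service goes quiet" is roughly the toy below. To be clear, this is my own sketch and not Oracle's code: the `FenceWatchdog` name, the heartbeat idea, and the timeout value are all made up for illustration, and the real thing is a kernel driver poked via an IOCTL, where here it is just a flag being flipped.

```python
from dataclasses import dataclass


@dataclass
class FenceWatchdog:
    """Toy model of a node-fencing watchdog (hypothetical, not Oracle's design)."""

    timeout_s: float             # grace period before fencing; value is an assumption
    last_heartbeat: float = 0.0  # timestamp of the last sign of life from the service
    fenced: bool = False         # stands in for "the driver faulted and the box is gone"

    def heartbeat(self, now: float) -> None:
        # The cluster service checks in; record when it did.
        self.last_heartbeat = now

    def tick(self, now: float) -> str:
        # Once fenced, the node stays fenced; a late heartbeat cannot undo it.
        if self.fenced:
            return "fenced"
        # Service has been silent longer than the grace period: take the node down.
        if now - self.last_heartbeat > self.timeout_s:
            self.fenced = True
            return "fenced"
        return "ok"


if __name__ == "__main__":
    dog = FenceWatchdog(timeout_s=5.0)
    dog.heartbeat(now=0.0)
    print(dog.tick(now=3.0))
    print(dog.tick(now=6.0))
```

Note what is missing from the sketch: any step where it checks what else is running on the box before pulling the plug.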
Anything else running on that machine? Any possible side effects to randomly hard-failing a server? Who cares? *klonk*
You guys suck so much.