Wednesday, September 9, 2015

Removing huge crfclust.bdb in 11.2.0.4 Grid home

Problem

Due to bug 20186278 in 11.2.0.4, the CHM (Cluster Health Monitor) repository files in the Grid home can grow very large. The fix is included in the 11.2.0.4.7 (July 2015) Grid Infrastructure Patch Set Update and in 12.2. As a quick workaround, you can stop the ora.crf resource and remove the oversized repository file. The steps are shown below.

Solution

First, find the CHM repository path:

[oracle@myserver bin]$ ./oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0.4/grid/crf/db/db01db01
Done
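If you want to reuse the repository path in the later steps, you can capture it in a shell variable. This is a minimal sketch that assumes the output line has exactly the `CHM Repository Path = <path>` form shown above. The `parse_reppath` helper and the `GRID_HOME` variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# parse_reppath (hypothetical helper): read oclumon output on stdin and
# print only the path from the "CHM Repository Path = <path>" line.
parse_reppath() {
  sed -n 's/^CHM Repository Path = //p'
}

# Runs only when GRID_HOME (an assumption: your 11.2.0.4 Grid home) is set.
if [ -n "${GRID_HOME:-}" ]; then
  REPPATH=$("$GRID_HOME/bin/oclumon" manage -get reppath | parse_reppath)
  echo "CHM repository: $REPPATH"
fi
```

The "Done" line does not match the pattern, so it is dropped and only the path is printed.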

List the files in that directory:

[oracle@myserver db01db01]$ ls -lhtr
total 19G
-rw-r--r-- 1 root root  89K May 29  2014 29-MAY-2014-17:48:14.txt
-rw-r--r-- 1 root root 1.3M May 29  2014 29-MAY-2014-20:02:00.txt
-rw-r--r-- 1 root root 2.1M May 31  2014 31-MAY-2014-11:52:44.txt
-rw-r--r-- 1 root root 1.2M May 31  2014 31-MAY-2014-11:58:47.txt
-rw-r--r-- 1 root root 2.1M Oct 14  2014 14-OCT-2014-16:15:09.txt
-rw-r--r-- 1 root root 1.3M Oct 14  2014 14-OCT-2014-16:29:55.txt
-rw-r--r-- 1 root root 1.9M Oct 23  2014 23-OCT-2014-15:49:22.txt
-rw-r--r-- 1 root root 1.3M Oct 23  2014 23-OCT-2014-16:26:58.txt
-rw-r--r-- 1 root root 1.8M Apr  7 16:30 07-APR-2015-16:30:33.txt
-rw-r--r-- 1 root root 1.2M Apr  7 16:38 07-APR-2015-16:38:42.txt
-rw-r----- 1 root root 8.0K May 29 16:58 repdhosts.bdb
-rw-r----- 1 root root  24K May 29 16:58 __db.001
-rw-r--r-- 1 root root 115M May 29 16:59 db01db01.ldb
-rw-r----- 1 root root 8.0K May 29 16:59 crfconn.bdb
-rw-r----- 1 root root  16M Sep  9 14:59 log.0000014662
-rw-r----- 1 root root  56K Sep  9 15:23 __db.006
-rw-r----- 1 root root 2.1M Sep  9 15:23 __db.004
-rw-r----- 1 root root  16M Sep  9 15:23 log.0000014663
-rw-r----- 1 root root 392K Sep  9 15:23 __db.002
-rw-r----- 1 root root 298M Sep  9 15:23 crfts.bdb
-rw-r----- 1 root root 460M Sep  9 15:23 crfloclts.bdb
-rw-r----- 1 root root 387M Sep  9 15:23 crfhosts.bdb
-rw-r----- 1 root root 424M Sep  9 15:23 crfcpu.bdb
-rw-r----- 1 root root  17G Sep  9 15:23 crfclust.bdb
-rw-r----- 1 root root 382M Sep  9 15:23 crfalert.bdb
-rw-r----- 1 root root 1.2M Sep  9 15:23 __db.005
-rw-r----- 1 root root 2.6M Sep  9 15:23 __db.003
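You do not have to scan the listing by eye. `find` can report only the `.bdb` files above a size threshold. This sketch assumes GNU find, which supports the `M`/`G` size suffixes and `-maxdepth`. The `big_bdb_files` name and the 1G default are illustrative choices, not values from Oracle.

```shell
#!/bin/sh
# big_bdb_files (hypothetical helper): print *.bdb files directly under DIR
# that are larger than SIZE (a GNU find size such as 500M or 1G).
big_bdb_files() {
  dir=$1
  size=${2:-1G}
  find "$dir" -maxdepth 1 -type f -name '*.bdb' -size "+$size" -print
}
```

Applied to the listing above with the default 1G threshold, only crfclust.bdb would be reported. To see the sizes as well, pipe the result into `xargs ls -lh`.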


As the listing shows, crfclust.bdb has grown to 17G because of the bug, which is most of the 19G total. The file can be removed once ora.crf has been stopped:

[oracle@myserver bin]$ ./crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'db01db01'
CRS-2677: Stop of 'ora.crf' on 'db01db01' succeeded
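Before deleting anything, it is worth confirming that ora.crf is really offline. This is a sketch that assumes the default `crsctl stat res ora.crf -init` output, which includes a `STATE=` line such as `STATE=OFFLINE`. The `crf_is_offline` helper and `GRID_HOME` are hypothetical names.

```shell
#!/bin/sh
# crf_is_offline (hypothetical helper): read "crsctl stat res ora.crf -init"
# output on stdin; succeed only if the STATE line reports OFFLINE.
crf_is_offline() {
  state=$(sed -n 's/^STATE=//p')
  case $state in
    OFFLINE*) return 0 ;;
    *) echo "ora.crf state is '$state', not OFFLINE; not deleting" >&2
       return 1 ;;
  esac
}

# Runs only when GRID_HOME (an assumption: your Grid home) is set.
if [ -n "${GRID_HOME:-}" ]; then
  "$GRID_HOME/bin/crsctl" stat res ora.crf -init | crf_is_offline && echo "ora.crf is stopped"
fi
```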

Now remove crfclust.bdb. The repository files are owned by root, so this must be done as the root user:

[root@myserver db01db01]# rm -f crfclust.bdb 
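If you want a record of how much space was reclaimed, you can log the file size before deleting the file. This is a minimal sketch. The `remove_bdb` helper is a hypothetical name, and like the `rm` above it has to be run as root.

```shell
#!/bin/sh
# remove_bdb (hypothetical helper): report a file's size in bytes, then delete it.
# Run as root, because the CHM repository files are owned by root.
remove_bdb() {
  f=$1
  if [ ! -f "$f" ]; then
    echo "no such file: $f" >&2
    return 1
  fi
  bytes=$(wc -c < "$f" | tr -d ' ')   # tr strips padding some wc versions add
  rm -f -- "$f" && echo "removed $f ($bytes bytes)"
}
```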

Restart ora.crf:

[oracle@myserver bin]$ ./crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'db01db01'
CRS-2676: Start of 'ora.crf' on 'db01db01' succeeded
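The stop, remove and start steps can also be wrapped in one script. This is a hypothetical sketch, not an Oracle-supplied tool. `GRID_HOME` defaults to the Grid home from the example above, the repository path is passed as an argument, and because of the `rm` the script must be run as root. With `DRY_RUN=1` the commands are printed instead of executed, so you can review them first.

```shell
#!/bin/sh
# Assumption: 11.2.0.4 Grid home taken from the example above; override as needed.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0.4/grid}

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# cleanup_crfclust REPPATH (hypothetical): stop ora.crf, delete crfclust.bdb,
# start ora.crf again. Stops early if ora.crf cannot be stopped.
cleanup_crfclust() {
  reppath=$1
  [ -n "$reppath" ] || { echo "usage: cleanup_crfclust REPPATH" >&2; return 1; }
  run "$GRID_HOME/bin/crsctl" stop res ora.crf -init || return 1
  run rm -f "$reppath/crfclust.bdb"
  run "$GRID_HOME/bin/crsctl" start res ora.crf -init
}
```

For example, `DRY_RUN=1 cleanup_crfclust /u01/app/11.2.0.4/grid/crf/db/db01db01` lists the three commands. Run it again without `DRY_RUN` to execute them.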


If you get file-system-full alerts on Grid Infrastructure servers running 11.2.0.4 binaries, check the size of this file first.
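To catch the growth before the file system fills, you could run a small size check from cron. This is a sketch under stated assumptions. The `check_bdb_size` name is hypothetical, and the megabyte limit is your own choice rather than an Oracle recommendation.

```shell
#!/bin/sh
# check_bdb_size FILE LIMIT_MB (hypothetical helper): print a warning and
# return 1 when FILE is larger than LIMIT_MB megabytes; return 0 otherwise.
check_bdb_size() {
  f=$1
  limit_mb=$2
  [ -f "$f" ] || return 0   # nothing to check
  bytes=$(wc -c < "$f" | tr -d ' ')
  if [ "$bytes" -gt $((limit_mb * 1024 * 1024)) ]; then
    echo "WARNING: $f is $bytes bytes (limit ${limit_mb} MB)"
    return 1
  fi
  return 0
}
```

A cron entry could call, for example, `check_bdb_size /u01/app/11.2.0.4/grid/crf/db/db01db01/crfclust.bdb 2048` and mail any output to the DBA team.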