FreeBSD: Kernel panic caused by a UFS2 inconsistency

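I haven't run into this one much, but once I thought about the cause it made sense.
The trigger was probably the RAID array dropping offline.
The array itself was recovered afterwards.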

mode = 0100744, inum = 12118762, fs = /data
panic: ffs_valloc: dup alloc
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff808e7e70 at kdb_backtrace+0x60
#1 0xffffffff808af955 at panic+0x155
#2 0xffffffff80ac1a0a at ffs_valloc+0x86a
#3 0xffffffff80afecb4 at ufs_makeinode+0x84
#4 0xffffffff80d97dd2 at VOP_CREATE_APV+0x92
#5 0xffffffff80957f49 at vn_open_cred+0x2c9
#6 0xffffffff80951651 at kern_openat+0x261
#7 0xffffffff80c8f027 at amd64_syscall+0x357
#8 0xffffffff80c7571b at Xfast_syscall+0xfb
Uptime: 1m17s
Dumping 381 out of 6119 MB:..5%..13%..21%..34%..42%..51%..63%..72%..84%..93%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
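
For reference, the line printed just before the panic already says which inode and which filesystem are involved. Here is a rough sketch for pulling those fields apart in sh. parse_field is a name I made up for this example, and the input is just the line from the log above.

```sh
#!/bin/sh
# parse_field is a made-up helper, not part of any FreeBSD tool.
# $1 = field name (mode, inum or fs), $2 = the message line.
parse_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1 = \([^,]*\).*/\1/p"
}

# The line the kernel printed just before the panic, copied from the log.
line='mode = 0100744, inum = 12118762, fs = /data'

mode=$(parse_field mode "$line")
inum=$(parse_field inum "$line")
fs=$(parse_field fs "$line")

# The leading 0 makes shell arithmetic read the value as octal.
ftype=$(printf '%o' $(( $mode & 0170000 )))   # file type bits
perms=$(printf '%04o' $(( $mode & 07777 )))   # permission bits

echo "inode $inum on $fs: type bits $ftype, permissions $perms"
```

Type bits 100000 are S_IFREG, so the inode the allocator tripped over still looked like a regular file with mode 0744. Once the filesystem is repaired and mounted again, something like find /data -xdev -inum 12118762 could show whether any path still uses that inode.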

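So something is duplicated in the filesystem?
Apparently there is an inconsistency, and the panic hits when the affected inode is touched. Judging from the backtrace (kern_openat -> VOP_CREATE -> ufs_makeinode -> ffs_valloc), that happens while a new file is being created.
It's UFS2, so I figured fsck would take care of it... but running it didn't fix anything. Why?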

# tunefs -p /dev/da1p1
tunefs: POSIX.1e ACLs: (-a)                                enabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             5%
tunefs: space to hold for metadata blocks: (-k)            6408
tunefs: optimization preference: (-o)                      time
tunefs: should optimize for space with minfree < 8%
tunefs: volume label: (-L)
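
The two lines that matter here are soft updates (-n) and soft update journaling (-j). A small sketch for picking a single flag out of that output follows. flag_state is a name invented for this example, and the sample text is copied from the output above.

```sh
#!/bin/sh
# flag_state is a made-up helper. It reads `tunefs -p` output on stdin
# and prints the state of one flag, selected by its option letter.
# $1 = option as shown in parentheses, e.g. -n or -j
flag_state() {
    awk -v opt="($1)" 'index($0, opt) { print $NF }'
}

# Sample lines copied from the tunefs -p output above.
sample='tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled'

su=$(printf '%s\n' "$sample" | flag_state -n)
suj=$(printf '%s\n' "$sample" | flag_state -j)
echo "soft updates: $su, journaling: $suj"
```

Against a real device this would be used as tunefs -p /dev/da1p1 2>&1 | flag_state -j. The 2>&1 is there on the assumption that tunefs writes this listing to stderr (the "tunefs:" prefix suggests so), and it does no harm if that turns out not to be the case.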

# fsck -y /dev/da1p1

** /dev/da1p1

USE JOURNAL? yes

** SU+J Recovering /dev/da1p1
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /data
** Phase 1 - Check Blocks and Sizes
24275080 DUP I=12118750
UNEXPECTED SOFT UPDATE INCONSISTENCY

(snip)

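Googling around, it seems there is something duplicated between the journal and the filesystem.
Temporarily disabling soft updates and then running fsck is supposed to fix it.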

# tunefs -n disable /dev/da1p1
tunefs: soft updates cleared
 
# fsck -y /dev/da1p1

** /dev/da1p1

USE JOURNAL? yes

** SU+J Recovering /dev/da1p1
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /data
** Phase 1 - Check Blocks and Sizes
24275080 DUP I=12118750
24275081 DUP I=12118750

(snip)

904133 files, 2047901555 used, 789805708 free (1650660 frags, 98519381 blocks, 0.1% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
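
In Phase 1 each duplicate shows up as "block DUP I=inode", so with a long log it can help to count the DUP blocks per inode. A minimal sketch follows. dup_summary is a made-up name, and the sample is the fragment of output shown above.

```sh
#!/bin/sh
# dup_summary is a made-up helper. It reads fsck output on stdin and
# prints "<inode> <count>" for every inode with DUP blocks reported.
dup_summary() {
    awk '$2 == "DUP" && $3 ~ /^I=/ {
             sub(/^I=/, "", $3)   # strip the "I=" prefix
             n[$3]++
         }
         END { for (i in n) print i, n[i] }' | sort -n
}

# Sample lines copied from the fsck output above.
sample='** Phase 1 - Check Blocks and Sizes
24275080 DUP I=12118750
24275081 DUP I=12118750'

printf '%s\n' "$sample" | dup_summary
```

For a real run the idea would be to save the fsck output to a file first and feed that file to dup_summary.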

Turn soft updates back on.

# tunefs -n enable /dev/da1p1
tunefs: soft updates set

Just to be safe, I ran fsck once more before remounting and confirmed that no errors came up.
It's a full fsck, so it takes quite a while.

Remounted it and put it back into service.
The problem seems to be solved, for now at least.
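
The whole sequence, put together as a sketch. The tunefs and fsck lines are the ones from the log above. The umount and mount lines are my assumption, since they are not in the log. DRY_RUN, run and repair are names made up for this example. By default it only prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch only. DEV and MNT come from the log above; the umount/mount
# steps are assumed, they do not appear in the original log.
DEV=${DEV:-/dev/da1p1}
MNT=${MNT:-/data}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# Print the command, and execute it only when DRY_RUN is not 1.
run() {
    printf '+ %s\n' "$*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

repair() {
    run umount "$MNT" || return 1   # never work on a mounted filesystem
    run tunefs -n disable "$DEV"    # temporarily turn soft updates off
    run fsck -y "$DEV"              # full fsck that fixes the DUP blocks
    run tunefs -n enable "$DEV"     # turn soft updates back on
    run fsck -y "$DEV"              # second pass, should come back clean
    run mount "$DEV" "$MNT"
}

repair
```

Running it with DRY_RUN=0 would execute the commands for real, so check the device name carefully first. The second fsck is the "just to be safe" pass and takes as long as the first one.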

The RAID setup on the new NAS server has been flaky, which is a pain.
I suspect the culprit is the RAID card, the backplane, or the SAS cables, but...