
To: FreeBSD-gnats-submit@freebsd.org
From: swell.k@gmail.com
X-send-pr-version: 3.113
X-GNATS-Notify:


>Submitter-Id:	current-users
>Originator:	<swell.k@gmail.com>
>Organization:	n/a
>Confidential:	no
>Synopsis:	[zfs] panic on concurrent writing & rollback
>Severity:	non-critical
>Priority:	low
>Category:	kern
>Class:		sw-bug
>Release:	FreeBSD 8.0-CURRENT amd64
>Environment:
System:	FreeBSD  8.0-CURRENT FreeBSD 8.0-CURRENT #2 r185244: Mon Nov 24 16:29:02 UTC 2008     luser@qemu:/usr/obj/usr/src/sys/TEST  amd64

qemu-devel cmdline:
qemu-system-x86_64 -no-kqemu -m 512 -net nic,model=rtl8139 \
-net tap,ifname=tap0 -nographic -s -echr 0x03 scrap/freebsd-generic-amd64.qcow2

zpool upgrade shows version 13
zfs upgrade shows version 3

The box boots from gptzfsboot. There are no UFS partitions on it.

kernel config:
include GENERIC
options BREAK_TO_DEBUGGER
options	DIAGNOSTIC
options	DEBUG_LOCKS
options DEBUG_VFS_LOCKS
nooption WITNESS_SKIPSPIN

loader.conf:
autoboot_delay=0
beastie_disable=YES
zfs_load=YES
vfs.root.mountfrom="zfs:q"
kern.hz=100
hint.uart.0.flags=0x90

No kmem_size or prefetch_disable tuning is applied here.

boot.config: -h -S115200

The entire system was built with __MAKE_CONF=/dev/null on the host machine.
No local patches are applied to it.

The host runs 8-CURRENT r185232M amd64; `M' stands for a slightly updated ZFS.
The host and another box on i386 experience a similar problem.

>Description:

	When rolling back to a snapshot repeatedly while files are being
	written to the same dataset, there is a chance of hitting a panic.
%%%
# sh crash.sh
lock order reversal:
 1st 0xffffff0002888638 vnode interlock (vnode interlock) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
 2nd 0xffffff0002429710 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1050
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_mtx_lock_flags() at _mtx_lock_flags+0x78
zfs_znode_free() at zfs_znode_free+0x84
zfs_freebsd_inactive() at zfs_freebsd_inactive+0x1a
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xb5
vinactive() at vinactive+0x90
vput() at vput+0x25c
vn_close() at vn_close+0xb9
vn_closefile() at vn_closefile+0x7d
_fdrop() at _fdrop+0x23
closef() at closef+0x4d
do_dup() at do_dup+0x351
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
vfs_badlock() at vfs_badlock+0x95
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xc8
vinactive() at vinactive+0x90
vput() at vput+0x25c
vn_close() at vn_close+0xb9
vn_closefile() at vn_closefile+0x7d
_fdrop() at _fdrop+0x23
closef() at closef+0x4d
do_dup() at do_dup+0x351
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
VOP_INACTIVE: 0xffffff00028884e0 interlock is locked but should not be
KDB: enter: lock violation
[thread pid 85 tid 100056 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x65c598(%rip)

db> show all locks
Process 85 (sh) thread 0xffffff0002427390 (100056)
exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0xffffff0002888638) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002888578) locked @ /usr/src/sys/kern/vfs_vnops.c:293

db> show lockedvnods
Locked vnodes

0xffffff00028884e0: tag zfs, type VREG
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOINGINACT)
 VI_LOCKed    v_object 0xffffff0002886960 ref 0 pages 0
     lock type zfs: EXCL by thread 0xffffff0002427390 (pid 85)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff8058a58e at vn_close+0x6e
#5 0xffffffff8058a6bd at vn_closefile+0x7d
#6 0xffffffff804c7443 at _fdrop+0x23
#7 0xffffffff804c8a6d at closef+0x4d
#8 0xffffffff804c9ed1 at do_dup+0x351
#9 0xffffffff807c9d27 at syscall+0x1e7
#10 0xffffffff807ac85b at Xfast_syscall+0xab

db> show all pcpu
Current CPU: 0

cpuid        = 0
curthread    = 0xffffff0002427390: pid 85 "sh"
curpcb       = 0xfffffffe40180d50
fpcurthread  = none
idlethread   = 0xffffff00021cc720: pid 11 "idle: cpu0"
spin locks held:
%%%

The complete msgbuf, with ps and alltrace output, is here:
http://pastebin.com/f44ad88b3

The panic can also show up with a different message:

%%%
# sh crash.sh
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x70
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff804fb57a
stack pointer           = 0x10:0xfffffffe401997a0
frame pointer           = 0x10:0xfffffffe401997e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 179 (zfs)
[thread pid 179 tid 100061 ]
Stopped at      _sx_xlock+0x3a: movq    0x18(%rdi),%rax

db> bt
Tracing pid 179 tid 100061 td 0xffffff000245dab0
_sx_xlock() at _sx_xlock+0x3a
dmu_buf_update_user() at dmu_buf_update_user+0x47
zfs_znode_dmu_fini() at zfs_znode_dmu_fini+0x38
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xbe
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
vgonel() at vgonel+0x119
vflush() at vflush+0x284
zfs_umount() at zfs_umount+0x105
dounmount() at dounmount+0x2ed
unmount() at unmount+0x24b
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f401cc, rsp = 0x7fffffffe478, rbp = 0x801202300 ---

db> show all locks
Process 179 (zfs) thread 0xffffff000245dab0 (100061)
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693098) locked @ /usr/src/sys/kern/vfs_subr.c:2358
exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff80b5eea0) locked @ /usr/src/sys/kern/vfs_mount.c:1139
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693a58) locked @ /usr/src/sys/kern/vfs_mount.c:1207

db> show lockedvnods
Locked vnodes

0xffffff00026939c0: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 1 mountedhere 0xffffff0002432710
    flags ()
     lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff80577303 at dounmount+0x93
#5 0xffffffff80577adb at unmount+0x24b
#6 0xffffffff807c9d27 at syscall+0x1e7
#7 0xffffffff807ac85b at Xfast_syscall+0xab


0xffffff0002693000: tag zfs, type VREG
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOOMED)
     lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff8057fecf at vflush+0x20f
#5 0xffffffff80f5f175 at zfs_umount+0x105
#6 0xffffffff8057755d at dounmount+0x2ed
#7 0xffffffff80577adb at unmount+0x24b
#8 0xffffffff807c9d27 at syscall+0x1e7
#9 0xffffffff807ac85b at Xfast_syscall+0xab

db> show all pcpu
Current CPU: 0

cpuid        = 0
curthread    = 0xffffff000245dab0: pid 179 "zfs"
curpcb       = 0xfffffffe40199d50
fpcurthread  = none
idlethread   = 0xffffff00021cc720: pid 11 "idle: cpu0"
spin locks held:
%%%

Again, the full session, with ps and alltrace output included, is here:
http://pastebin.com/f21e46723

BTW, here is a backup of this message in case it's mangled:
you're already looking at it ;)

>How-To-Repeat:

The panic is not fully reproducible, but the following script triggers
it quite often. If the panic doesn't occur within a minute, there is a
chance it will occur after you interrupt and restart the script.

%%%
#! /bin/sh
# crash.sh

PATH=/sbin:/bin

pool=q
dataset=test
snapshot=last
prefix=foo_
cycles=999999999

zfs destroy -r $pool/$dataset
zfs create $pool/$dataset
zfs snapshot $pool/$dataset@$snapshot

mountpoint=$(zfs get -Ho value mountpoint $pool/$dataset)

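# Run the given command line up to $cycles times in a background
# subshell; eval re-expands \${i} on every iteration.
loop() {
    local i=0
    while [ $((i+=1)) -lt $cycles ]; do
	eval "$@"
    done &
    pids="$pids $!"
}
# Kill both background loops when the script is interrupted or exits.
trap 'kill $pids' int term exit

# juggle these
# Writer: create (or truncate) an empty file foo_<i> in the dataset.
loop : \>$mountpoint/$prefix\${i}
# Concurrently roll the dataset back to the snapshot.
loop zfs rollback $pool/$dataset@$snapshot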

wait
%%%
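
Since the panic sometimes shows up only after the script is interrupted
and restarted, the restart could be automated. The following is only a
sketch and was not part of the original test runs: the run_for helper,
the restart.sh name and the 60 second period are assumptions made for
illustration. It sends SIGTERM rather than SIGINT because a
non-interactive sh starts background commands with SIGINT ignored;
crash.sh traps both signals.

```shell
#! /bin/sh
# restart.sh (hypothetical helper, not part of the original report)

# run_for SECONDS COMMAND [ARGS...]
# Start COMMAND in the background, send it SIGTERM after SECONDS unless
# it has already exited, and return COMMAND's exit status.
run_for() {
    local secs=$1 pid watcher status
    shift
    "$@" &
    pid=$!
    # Watchdog: terminate the command once the time is up.
    ( sleep "$secs"; kill -TERM "$pid" 2>/dev/null ) &
    watcher=$!
    status=0
    wait "$pid" || status=$?
    # Stop the watchdog if the command finished on its own.
    kill "$watcher" 2>/dev/null || :
    return $status
}
```

Assumed usage, with crash.sh in the current directory:
while :; do run_for 60 sh crash.sh; done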

>Fix: