Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- To: FreeBSD-gnats-submit@freebsd.org
- From: swell.k@gmail.com
- X-send-pr-version: 3.113
- X-GNATS-Notify:
- >Submitter-Id: current-users
- >Originator: <swell.k@gmail.com>
- >Organization: n/a
- >Confidential: no
- >Synopsis: [zfs] panic on concurrent writing & rollback
- >Severity: non-critical
- >Priority: low
- >Category: kern
- >Class: sw-bug
- >Release: FreeBSD 8.0-CURRENT amd64
- >Environment:
- System: FreeBSD 8.0-CURRENT FreeBSD 8.0-CURRENT #2 r185244: Mon Nov 24 16:29:02 UTC 2008 luser@qemu:/usr/obj/usr/src/sys/TEST amd64
- qemu-devel cmdline:
- qemu-system-x86_64 -no-kqemu -m 512 -net nic,model=rtl8139 \
- -net tap,ifname=tap0 -nographic -s -echr 0x03 scrap/freebsd-generic-amd64.qcow2
- zpool upgrade shows version 13
- zfs upgrade shows version 3
- The box boots from gptzfsboot. There are no UFS partitions on it.
- kernel config:
- include GENERIC
- options BREAK_TO_DEBUGGER
- options DIAGNOSTIC
- options DEBUG_LOCKS
- options DEBUG_VFS_LOCKS
- nooption WITNESS_SKIPSPIN
- loader.conf:
- autoboot_delay=0
- beastie_disable=YES
- zfs_load=YES
- vfs.root.mountfrom="zfs:q"
- kern.hz=100
- hint.uart.0.flags=0x90
- no kmem_size and prefetch_disable tunings here.
- boot.config: -h -S115200
- entire system was built with __MAKE_CONF=/dev/null on host machine.
- No local patches applied on it.
- The host is on 8-CURRENT r185232M amd64. `M' stands for slightly updated ZFS.
- It experiences similar problem along with another box on i386.
- >Description:
- When doing rollbacks on snapshot multiple times there is
- chance to encounter a panic.
- %%%
- # sh crash.sh
- lock order reversal:
- 1st 0xffffff0002888638 vnode interlock (vnode interlock) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
- 2nd 0xffffff0002429710 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1050
- KDB: stack backtrace:
- db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
- _witness_debugger() at _witness_debugger+0x2e
- witness_checkorder() at witness_checkorder+0x81e
- _mtx_lock_flags() at _mtx_lock_flags+0x78
- zfs_znode_free() at zfs_znode_free+0x84
- zfs_freebsd_inactive() at zfs_freebsd_inactive+0x1a
- VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xb5
- vinactive() at vinactive+0x90
- vput() at vput+0x25c
- vn_close() at vn_close+0xb9
- vn_closefile() at vn_closefile+0x7d
- _fdrop() at _fdrop+0x23
- closef() at closef+0x4d
- do_dup() at do_dup+0x351
- syscall() at syscall+0x1e7
- Xfast_syscall() at Xfast_syscall+0xab
- --- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
- KDB: stack backtrace:
- db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
- vfs_badlock() at vfs_badlock+0x95
- VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xc8
- vinactive() at vinactive+0x90
- vput() at vput+0x25c
- vn_close() at vn_close+0xb9
- vn_closefile() at vn_closefile+0x7d
- _fdrop() at _fdrop+0x23
- closef() at closef+0x4d
- do_dup() at do_dup+0x351
- syscall() at syscall+0x1e7
- Xfast_syscall() at Xfast_syscall+0xab
- --- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
- VOP_INACTIVE: 0xffffff00028884e0 interlock is locked but should not be
- KDB: enter: lock violation
- [thread pid 85 tid 100056 ]
- Stopped at kdb_enter+0x3d: movq $0,0x65c598(%rip)
- db> show all locks
- Process 85 (sh) thread 0xffffff0002427390 (100056)
- exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0xffffff0002888638) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
- exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002888578) locked @ /usr/src/sys/kern/vfs_vnops.c:293
- db> show lockedvnods
- Locked vnodes
- 0xffffff00028884e0: tag zfs, type VREG
- usecount 0, writecount 0, refcount 1 mountedhere 0
- flags (VI_DOINGINACT)
- VI_LOCKed v_object 0xffffff0002886960 ref 0 pages 0
- lock type zfs: EXCL by thread 0xffffff0002427390 (pid 85)
- #0 0xffffffff804dfc78 at __lockmgr_args+0x758
- #1 0xffffffff8056de19 at vop_stdlock+0x39
- #2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
- #3 0xffffffff805894a7 at _vn_lock+0x57
- #4 0xffffffff8058a58e at vn_close+0x6e
- #5 0xffffffff8058a6bd at vn_closefile+0x7d
- #6 0xffffffff804c7443 at _fdrop+0x23
- #7 0xffffffff804c8a6d at closef+0x4d
- #8 0xffffffff804c9ed1 at do_dup+0x351
- #9 0xffffffff807c9d27 at syscall+0x1e7
- #10 0xffffffff807ac85b at Xfast_syscall+0xab
- db> show all pcpu
- Current CPU: 0
- cpuid = 0
- curthread = 0xffffff0002427390: pid 85 "sh"
- curpcb = 0xfffffffe40180d50
- fpcurthread = none
- idlethread = 0xffffff00021cc720: pid 11 "idle: cpu0"
- spin locks held:
- %%%
- Complete msgbuf with ps and alltrace is here:
- http://pastebin.com/f44ad88b3
- It can occur with a slightly different message:
- %%%
- # sh crash.sh
- Fatal trap 12: page fault while in kernel mode
- cpuid = 0; apic id = 00
- fault virtual address = 0x70
- fault code = supervisor read data, page not present
- instruction pointer = 0x8:0xffffffff804fb57a
- stack pointer = 0x10:0xfffffffe401997a0
- frame pointer = 0x10:0xfffffffe401997e0
- code segment = base 0x0, limit 0xfffff, type 0x1b
- = DPL 0, pres 1, long 1, def32 0, gran 1
- processor eflags = interrupt enabled, IOPL = 0
- current process = 179 (zfs)
- [thread pid 179 tid 100061 ]
- Stopped at _sx_xlock+0x3a: movq 0x18(%rdi),%rax
- db> bt
- Tracing pid 179 tid 100061 td 0xffffff000245dab0
- _sx_xlock() at _sx_xlock+0x3a
- dmu_buf_update_user() at dmu_buf_update_user+0x47
- zfs_znode_dmu_fini() at zfs_znode_dmu_fini+0x38
- zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xbe
- VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
- vgonel() at vgonel+0x119
- vflush() at vflush+0x284
- zfs_umount() at zfs_umount+0x105
- dounmount() at dounmount+0x2ed
- unmount() at unmount+0x24b
- syscall() at syscall+0x1e7
- Xfast_syscall() at Xfast_syscall+0xab
- --- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f401cc, rsp = 0x7fffffffe478, rbp = 0x801202300 ---
- db> show all locks
- Process 179 (zfs) thread 0xffffff000245dab0 (100061)
- exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693098) locked @ /usr/src/sys/kern/vfs_subr.c:2358
- exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff80b5eea0) locked @ /usr/src/sys/kern/vfs_mount.c:1139
- exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693a58) locked @ /usr/src/sys/kern/vfs_mount.c:1207
- db> show lockedvnods
- Locked vnodes
- 0xffffff00026939c0: tag zfs, type VDIR
- usecount 1, writecount 0, refcount 1 mountedhere 0xffffff0002432710
- flags ()
- lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
- #0 0xffffffff804dfc78 at __lockmgr_args+0x758
- #1 0xffffffff8056de19 at vop_stdlock+0x39
- #2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
- #3 0xffffffff805894a7 at _vn_lock+0x57
- #4 0xffffffff80577303 at dounmount+0x93
- #5 0xffffffff80577adb at unmount+0x24b
- #6 0xffffffff807c9d27 at syscall+0x1e7
- #7 0xffffffff807ac85b at Xfast_syscall+0xab
- 0xffffff0002693000: tag zfs, type VREG
- usecount 0, writecount 0, refcount 1 mountedhere 0
- flags (VI_DOOMED)
- lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
- #0 0xffffffff804dfc78 at __lockmgr_args+0x758
- #1 0xffffffff8056de19 at vop_stdlock+0x39
- #2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
- #3 0xffffffff805894a7 at _vn_lock+0x57
- #4 0xffffffff8057fecf at vflush+0x20f
- #5 0xffffffff80f5f175 at zfs_umount+0x105
- #6 0xffffffff8057755d at dounmount+0x2ed
- #7 0xffffffff80577adb at unmount+0x24b
- #8 0xffffffff807c9d27 at syscall+0x1e7
- #9 0xffffffff807ac85b at Xfast_syscall+0xab
- db> show all pcpu
- Current CPU: 0
- cpuid = 0
- curthread = 0xffffff000245dab0: pid 179 "zfs"
- curpcb = 0xfffffffe40199d50
- fpcurthread = none
- idlethread = 0xffffff00021cc720: pid 11 "idle: cpu0"
- spin locks held:
- %%%
- Again, full session with ps and alltrace include is here:
- http://pastebin.com/f21e46723
- BTW, here is a backup of this message in case it's mangled:
- you're already looking at it ;)
- >How-To-Repeat:
- It's not very reliable but the following script triggers it very
- often. If the panic don't occur within a minute then there is a chance
- it will occur after you interrupt and restart the script.
- %%%
- #! /bin/sh
- # crash.sh
- PATH=/sbin:/bin
- pool=q
- dataset=test
- snapshot=last
- prefix=foo_
- cycles=999999999
- zfs destroy -r $pool/$dataset
- zfs create $pool/$dataset
- zfs snapshot $pool/$dataset@$snapshot
- mountpoint=$(zfs get -Ho value mountpoint $pool/$dataset)
- loop() {
- local i=0
- while [ $((i+=1)) -lt $cycles ]; do
- eval $@
- done &
- pids="$pids $!"
- }
- trap 'kill $pids' int term exit
- # juggle these
- loop : \>$mountpoint/$prefix\${i}
- loop zfs rollback $pool/$dataset@$snapshot
- wait
- %%%
- >Fix:
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement