
zfs's Introduction

OpenZFS on OS X (O3X) brings OpenZFS features to Apple's OS X.

zfs.kext depends upon spl.kext, so start with that repository: https://github.com/openzfsonosx/spl.git

It is tested primarily on macOS Mojave.

See http://openzfsonosx.org/ for more information.

Open Issues:

https://github.com/openzfsonosx/zfs/issues?state=open

Detailed compiling instructions can be found in the wiki:

https://openzfsonosx.org/wiki/Install

If you want to load it directly:

# ./load.sh

To use the commands directly:

# ./cmd.sh zpool status

To load unsigned kexts you need to disable SIP for kext signing, or sign them with your own keys.
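
A quick, hedged example of the SIP side of this (the exact flags depend on the OS X release, and csrutil must be run from the Recovery OS terminal):

# csrutil status

and, from the Recovery OS terminal only:

# csrutil enable --without kext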

For kernel log messages use:

  • Pre-Sierra:
# tail -f /var/log/system.log
  • Sierra and higher:
# log stream --source --predicate 'senderImagePath CONTAINS "zfs" OR senderImagePath CONTAINS "spl"'

For example:

: ZFS: Loading module ...
: ZFS: ARC limit set to (arc_c_max): 1073741824
: ZFS: Loaded module v0.6.2-rc1_2_g691a603, ZFS pool version 5000, ZFS filesystem version 5
: ZFS filesystem version: 5
: ZFS: hostid set to 9e5e1b35 from UUID 'C039E802-1F44-5F62-B3A2-5E252F3EFF2A'
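
To look back at messages instead of streaming live, log show accepts the same predicate; for example (the time window is just an illustration):

# log show --last 10m --predicate 'senderImagePath CONTAINS "zfs" OR senderImagePath CONTAINS "spl"'
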
  • The OpenZFSonOSX team

zfs's People

Contributors

ahrens, avg-i, behlendorf, bjokash, brendonhumphrey, cbreak-black, chrisrd, dajhorn, dechamps, dun, dweeezil, evansus, fransurbo, grwilson, gunnarbeutner, ilovezfs, imp, loli10k, lundman, maxximino, mmatuska, nedbass, pcd1193182, pendor, pzakha, rlaager, rottegift, ryao, tuxoko, wca


zfs's Issues

panic: a freed zone element has been modified in zone:

(gdb) paniclog 
panic(cpu 0 caller 0xffffff8000243d2b): "a freed zone element has been modified in zone: kalloc.1024"@/SourceCache/xnu/xnu-2050.7.9/osfmk/kern/zalloc.c:214
Backtrace (CPU 0), Frame : Return Address
0xffffff804122b630 : 0xffffff800021d5f6 mach_kernel : _panic + 0xc6
0xffffff804122b6a0 : 0xffffff8000243d2b mach_kernel : _zalloc_canblock + 0x87b
0xffffff804122b6e0 : 0xffffff80002435a2 mach_kernel : _zalloc_canblock + 0xf2
0xffffff804122b7c0 : 0xffffff80002245bd mach_kernel : _kalloc_canblock + 0x7d
0xffffff804122b7f0 : 0xffffff800055ad20 mach_kernel : __MALLOC + 0x90
0xffffff804122b820 : 0xffffff80004e9c5e mach_kernel : _cat_lookup + 0x31e
0xffffff804122b9b0 : 0xffffff80004e99d8 mach_kernel : _cat_lookup + 0x98
0xffffff804122ba40 : 0xffffff80004f9708 mach_kernel : _hfs_vnop_lookup + 0x478
0xffffff804122bc60 : 0xffffff8000311634 mach_kernel : _VNOP_LOOKUP + 0x34
0xffffff804122bca0 : 0xffffff80002eb75c mach_kernel : _lookup + 0x22c
0xffffff804122bd20 : 0xffffff80002eb14e mach_kernel : _namei + 0x5ae
0xffffff804122bde0 : 0xffffff800030089f mach_kernel : _rename + 0x73f
0xffffff804122bf50 : 0xffffff80005e17da mach_kernel : _unix_syscall64 + 0x20a
0xffffff804122bfb0 : 0xffffff80002cecf3 mach_kernel : _hndl_unix_scall64 + 0x13

BSD process name corresponding to current thread: installd

There is a high chance it is related to reclaim; these are the last lines in the system log:

vnop_reclaim
zfs_zinactive
znode_free zp 0xffffff800b099000 vp 0xffffff800b30fb20
+vnop_fsync
-vnop_fsync
vnop_reclaim
zfs_zinactive
znode_free zp 0xffffff800b08dc00 vp 0xffffff800b30fa28
+vnop_fsync
-vnop_fsync
vnop_reclaim
zfs_zinactive
znode_free zp 0xffffff800b092000 vp 0xffffff800b30f930

Missing update_pages() for zfs_write.

Occasionally, zfs_write appears to succeed but does not actually store any data on disk. An export/import will sometimes fix the issue.

I am wondering whether this is related to the fact that we do not have update_pages(), as used here:

            ASSERT(tx_bytes <= uio_resid(uio));
            uioskip(uio, tx_bytes);
        }
        if (tx_bytes && vn_has_cached_data(vp)) {
            printf("we should add update_pages()\n");
            //update_pages(vp, woff, tx_bytes, zfsvfs->z_os,
            //      zp->z_id, uio->uio_segflg, tx);
        }

The FreeBSD version looks like this:
http://fxr.watson.org/fxr/source/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#L441

The Solaris version looks like this:
http://fxr.watson.org/fxr/source/common/fs/zfs/zfs_vnops.c?v=OPENSOLARIS#L355

I am unsure what the OSX kernel version would look like.
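
Purely for discussion, a minimal sketch of what an OS X update_pages() might look like if built on the XNU UBC/cluster KPIs; the function shape, and the assumption that the uio has been rewound to the start of the write, are mine and untested:

    /* Hypothetical sketch only: copy the bytes just written by zfs_write
     * into any resident UBC pages so that mmapped readers see the new
     * data. Assumes cluster_copy_ubc_data() from <sys/ubc.h> is usable
     * here, and that uio has been rewound to the write offset (woff). */
    static void
    update_pages(struct vnode *vp, int64_t nbytes, uio_t uio, dmu_tx_t *tx)
    {
        int io_resid = (int)nbytes;

        /* Copies from the uio into cached pages and marks them dirty. */
        (void) cluster_copy_ubc_data(vp, uio, &io_resid, 1);
    }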

(20130712) error -36 in response to Finder > Open with:

With a 2013-07-12 build of zfs-osx

When the Open with: menu is used to select an alternative application (in the following example, TextWrangler instead of TextEdit):

2013-07-15 22-44-21 screenshot.png

2013-07-15 22-44-45 screenshot.png

You can’t change the item “example.txt” to always open in the selected application.
The item is either locked or damaged, or in a folder you don’t have permission to modify (error code -36).
OK

2013-07-15 22-44-30 screenshot.png

The operation can’t be completed.
An unexpected error occurred (error code -36).
OK

Side note

Core Storage is used to encrypt the home directory where the test file is stored.

umount fails to return, and fails to unmount.

Currently unmounting fails: either nothing happens, or it gets stuck in the vnode release code.

I am not entirely convinced that zfs_umount is called.

zfs_vfs_unmount(struct mount *mp, int mntflags, vfs_context_t context)
{
    printf("+zfs_umount\n");
# ./cmd.sh zfs umount FROMSOLARIS
Apr  8 16:56:13 lundmans-Mac-Pro kernel[0]: +vnop_fsync
Apr  8 16:56:13 lundmans-Mac-Pro kernel[0]: -vnop_fsync

It is entirely possible we have not released a vnode somewhere; every use of a vnode needs to be balanced with vnode_put().
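
As a reminder, the pattern that has to hold everywhere is roughly this (sketch only):

    /* Every iocount taken on a vnode must be dropped again; otherwise
     * vflush() during unmount waits for it forever and umount hangs. */
    if (vnode_get(vp) == 0) {       /* takes an iocount */
            /* ... use the vnode ... */
            vnode_put(vp);          /* always balance it */
    }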

(20130712) cannot give just part of a disk to the MacZFS prototype

With a 2013-07-12 build of zfs-osx

A 4 GB USB flash drive at disk5

GPES3E-gjp4-1:~ bbsadmin-l$ diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *750.2 GB   disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:                  Apple_HFS swap                    32.0 GB    disk0s2
   3:                  Apple_HFS disk0s3                 536.9 MB   disk0s3
   4:                  Apple_HFS spare                   671.1 MB   disk0s4
   5:          Apple_CoreStorage                         99.5 GB    disk0s5
   6:                 Apple_Boot Boot OS X               650.0 MB   disk0s6
   7:          Apple_CoreStorage                         616.3 GB   disk0s7
   8:                 Apple_Boot Boot OS X               134.2 MB   disk0s8
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS OS                     *99.2 GB    disk1
/dev/disk2
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     Apple_partition_scheme                        *37.8 MB    disk2
   1:        Apple_partition_map                         32.3 KB    disk2s1
   2:                  Apple_HFS osx.zfs-20130712        37.8 MB    disk2s2
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *7.7 GB     disk3
   1:                        EFI EFI                     209.7 MB   disk3s1
   2:          Apple_CoreStorage                         7.4 GB     disk3s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk3s3
/dev/disk4
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS blooper                *7.1 GB     disk4
/dev/disk5
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *4.0 GB     disk5
   1:                        EFI EFI                     209.7 MB   disk5s1
   2:                  Apple_HFS slice 2                 2.0 GB     disk5s2
   3:                  Apple_HFS slice 3                 1.5 GB     disk5s3
GPES3E-gjp4-1:~ bbsadmin-l$ diskutil unmountDisk /dev/disk5
Unmount of all volumes on disk5 was successful
GPES3E-gjp4-1:~ bbsadmin-l$ clear

Intention

Give just part of disk5 to MacZFS prototype. Give slice 2.

Preserve the HFS Plus content at slice 3.

Result

GPES3E-gjp4-1:~ bbsadmin-l$ sudo zpool create -o version=28 justpartofadisk /dev/disk5s2
invalid vdev specification
use '-f' to override the following errors:
/dev/disk5s2 does not contain an EFI label but it may contain partition
information in the MBR.
GPES3E-gjp4-1:~ bbsadmin-l$ clear





GPES3E-gjp4-1:~ bbsadmin-l$ sudo gpt -r show /dev/disk5
    start     size  index  contents
        0        1         PMBR
        1        1         Pri GPT header
        2       32         Pri GPT table
       34        6         
       40   409600      1  GPT part - C12A7328-F81F-11D2-BA4B-00A0C93EC93B
   409640  3929088      2  GPT part - 48465300-0000-11AA-AA11-00306543ECAC
  4338728   262144         
  4600872  2995120      3  GPT part - 48465300-0000-11AA-AA11-00306543ECAC
  7595992   262151         
  7858143       32         Sec GPT table
  7858175        1         Sec GPT header
GPES3E-gjp4-1:~ bbsadmin-l$ clear





GPES3E-gjp4-1:~ bbsadmin-l$ sudo gpt -r show -l /dev/disk5
    start     size  index  contents
        0        1         PMBR
        1        1         Pri GPT header
        2       32         Pri GPT table
       34        6         
       40   409600      1  GPT part - "EFI System Partition"
   409640  3929088      2  GPT part - "slice 2"
  4338728   262144         
  4600872  2995120      3  GPT part - "slice 3"
  7595992   262151         
  7858143       32         Sec GPT table
  7858175        1         Sec GPT header
GPES3E-gjp4-1:~ bbsadmin-l$ clear

Result, with force

GPES3E-gjp4-1:~ bbsadmin-l$ date
Tue 16 Jul 2013 21:46:43 BST
GPES3E-gjp4-1:~ bbsadmin-l$ sudo zpool create -f -o version=28 justpartofadisk /dev/disk5s2
efi_write mate3
efi_write mate
cannot label 'disk5s2': failed to detect device partitions on '/dev/disk5s2s1': 2
GPES3E-gjp4-1:~ bbsadmin-l$ clear





GPES3E-gjp4-1:~ bbsadmin-l$ diskutil list /dev/disk5
/dev/disk5
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *4.0 GB     disk5
   1:                        EFI EFI                     209.7 MB   disk5s1
   2:                  Apple_HFS slice 2                 2.0 GB     disk5s2
   3:                  Apple_HFS slice 3                 1.5 GB     disk5s3
GPES3E-gjp4-1:~ bbsadmin-l$ 

Panic on zpool create #2

# ./zpool.sh create -f BOOM pool-image.bin
[zfs] ioctl done 2
[zfs] Yay, got ioctl 0
[zfs] vdev_alloc_common top 0
[vdev] vdev_alloc vd top 0 parent top
[vdev] alloc parent top 0
[zfs] vdev_alloc_common top 0
[vdev] vdev_alloc vd top 0 parent top

(gdb) bt
#0  0xffffff80003105de in current_rootdir () at /SourceCache/xnu/xnu-2050.18.24/bsd/vfs/kpi_vfs.c:2195
#1  0xffffff7f80e8d50a in VOP_GETATTR ()
#2  0xffffff7f80f7e786 in vdev_file_open (vd=0xffffff80079f4800, psize=0xffffff8046b4beb8, max_psize=0xffffff8046b4beb0, ashift=0xffffff8046b4be90) at vdev_file.c:113
#3  0xffffff7f80f74964 in vdev_open (vd=0xffffff80079f4800) at vdev.c:1178
#4  0xffffff7f80f74280 in vdev_open_child (arg=0xffffff80079f4800) at vdev.c:1080
#5  0xffffff7f80e8b0d3 in taskq_thread ()

(gdb) p vf->vf_vnode
$1 = (struct vnode *) 0x40

Support sharenfs property

NFS sharing currently works when exported manually. We should add support for the sharenfs dataset property.
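
For reference, manual sharing today looks roughly like this (the path and network below are made-up examples); sharenfs would essentially have to automate writing and reloading these entries:

# cat /etc/exports
/BOOM -maproot=root -network 192.168.1.0 -mask 255.255.255.0
# nfsd checkexports
# nfsd enable        (use nfsd update if nfsd is already running)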

async operations stall

cp and tar work fairly well when writing to a pool, but rsync does not. The first two issue SYNC and the latter does not, so there appears to be some interplay with creating files and directories in quick succession. (Using --inplace so as not to also trigger renames.)

The report looks as follows:

# rsync  -arv --inplace ./ /BOOM/
.gitignore
AUTHORS
COPYING
COPYRIGHT
ChangeLog
DISCLAIMER
META
Makefile



vnop_create: 'DISCLAIMER'
mknode vtype 1 use_sa 1: obj_type 44
zfs_znode_alloc blksize is ZERO; fixed 512
zfs_vnops attach 1
Attaching vnode 0xffffff800a7130f8 type 1: zmode 0x8180
vnop_write
zfs_range_lock: 0xffffff800a76cc00 off 0 len 1376 type 1
 vnop_write: locked, write 1376, endsz 1376
 vnop_write hold_sa 131072
 vnop_write nbytes 1376 abuf 0
 vnop_write uio_dbuf before 1376 after 0
zfs_log_write len 1376
zfs_write: tx_bytes 1376
+vnop_setattr
 vnop_setattr top
 vnop_setattr 5
zfs_range_lock: 0xffffff800a76cc00 off 0 len -1 type 1
 vnop_setattr 1
 vnop_setattr out
-vnop_setattr
+vnop_setattr
 vnop_setattr top
 vnop_setattr 2
 vnop_setattr 1
 vnop_setattr 3
 vnop_setattr 2
 vnop_setattr 4
 vnop_setattr 3
 vnop_setattr 5
 vnop_setattr 4
 vnop_setattr out
-vnop_setattr
 vnop_setattr 5
vnop_inactive
+vnop_setattr
 vnop_setattr top
 vnop_setattr 1
 vnop_setattr out
-vnop_setattr
vnop_mkdir: type 2
mknode vtype 2 use_sa 1: obj_type 44
zfs_znode_alloc blksize is ZERO; fixed 512
zfs_vnops attach 2
Attaching vnode 0xffffff800a713000 type 2: zmode 0x41ed
+vnop_setattr
 vnop_setattr top
 vnop_setattr 2
 vnop_setattr 1
 vnop_setattr 3
 vnop_setattr 2
 vnop_setattr 4
 vnop_setattr 5
 vnop_setattr 3
 vnop_setattr out
-vnop_setattr
+vnop_setattr
 vnop_setattr top
 vnop_setattr 4
 vnop_setattr 1
 vnop_setattr 5
 vnop_setattr 2
 vnop_setattr out
-vnop_setattr
+vnop_setattr
 vnop_setattr top
 vnop_setattr 3
 vnop_setattr 1
 vnop_setattr 4
 vnop_setattr 2
 vnop_setattr 5
 vnop_setattr 3
 vnop_setattr out
-vnop_setattr
vnop_create: 'META'
mknode vtype 1 use_sa 1: obj_type 44
zfs_znode_alloc blksize is ZERO; fixed 512
zfs_vnops attach 1
Attaching vnode 0xffffff800a7a8f00 type 1: zmode 0x8180
vnop_write
zfs_range_lock: 0xffffff800a76f800 off 0 len 209 type 1
 vnop_write: locked, write 209, endsz 209
 vnop_write hold_sa 131072
 vnop_write nbytes 209 abuf 0
 vnop_write uio_dbuf before 209 after 0
zfs_log_write len 209
zfs_write: tx_bytes 209
+vnop_setattr
 vnop_setattr top
 vnop_setattr 4
zfs_range_lock: 0xffffff800a76f800 off 0 len -1 type 1
 vnop_setattr 1
 vnop_setattr 5
 vnop_setattr 2
 vnop_setattr out
-vnop_setattr
vnop_mkdir: type 2
mknode vtype 2 use_sa 1: obj_type 44
zfs_znode_alloc blksize is ZERO; fixed 512
zfs_vnops attach 2
Attaching vnode 0xffffff800a7a8e08 type 2: zmode 0x41ed
+vnop_setattr
 vnop_setattr top
 vnop_setattr 3
 vnop_setattr 1
 vnop_setattr 4
 vnop_setattr 5
 vnop_setattr 2
 vnop_setattr out
-vnop_setattr
vnop_inactive
+vnop_setattr
 vnop_setattr top
 vnop_setattr 3
 vnop_setattr 1
 vnop_setattr 4
 vnop_setattr 2
 vnop_setattr 5
 vnop_setattr 3
 vnop_setattr out
-vnop_setattr
+vnop_setattr
 vnop_setattr top
 vnop_setattr 4
 vnop_setattr 1
 vnop_setattr 5
 vnop_setattr 2
 vnop_setattr out
-vnop_setattr
+vnop_setattr
 vnop_setattr top
 vnop_setattr 3
 vnop_setattr 1
 vnop_setattr 4
 vnop_setattr 2
 vnop_setattr 5
 vnop_setattr 3
 vnop_setattr out
-vnop_setattr
vnop_mkdir: type 2
dmu_tx fail -1
txg=8 quiesce_txg=8 sync_txg=4
txg=7 quiesce_txg=8 sync_txg=4
 vnop_setattr 4
 vnop_setattr 5
 vnop_setattr out
-vnop_setattr
vnop_create: 'Makefile'
waiting; tx_synced=6 waiting=4 dp=0xffffff800656bc00
+txg_wait_synced
txg=3 quiesce_txg=9 sync_txg=4
-txg_wait_synced
+txg_wait_synced
txg=8 quiesce_txg=9 sync_txg=8
broadcasting sync more tx_synced=6 waiting=8 dp=0xffffff800656bc00

I induce a panic here and look at the threads; those not idle are:

0xffffff80081c1000 zfs_vnop_mkdir 
dmu_tx_wait cv_wait(&tx->tx_quiesce_done_cv, &tx->tx_sync_lock);

0xffffff8009551550 zfs_vnop_create
dmu_tx_wait cv_wait(&dn->dn_notxholds, &dn->dn_mtx);

0xffffff80071fe550 zfs_vfs_sync
zil_commit cv_wait(&tx->tx_sync_done_cv, &tx->tx_sync_lock);

0xffffff8009700000 taskq_thread
cv_wait(&tq->tq_dispatch_cv, &tq->tq_lock);

0xffffff8007b51550 txg_quiesce
txg_quiesce_thread cv_wait(&tc->tc_cv[g], &tc->tc_lock);
g = 3
(gdb) p tc->tc_cv[g]                         
$6 = {
  cv_waiters = 1, 
  pad = 0
}

0xffffff80094f5aa0 txg_sync_thread
txg_thread_wait cv_wait_interruptible(cv, &tx->tx_sync_lock);

Both zfs_vnop_mkdir and zfs_vnop_create are in dmu_tx_wait, so they are most likely the symptoms.

zfs_vfs_sync is waiting for zil_commit to complete. So it most likely comes down to the txg_quiesce_thread and txg_sync_thread.

File IO in kernel for zfs send/recv/diff.

ZFS passes an "int fd" to the kernel to do the file IO; it is used in zfs send, recv and diff. This part is completely missing on OS X, as the API to use is unknown.

We need to either find an API for it, or change the way userland and the kernel talk to each other.
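
One possible direction, sketched purely as an assumption: XNU exposes file_vnode()/file_drop() KPIs that can turn the userland fd into a vnode inside the kext, which might let us keep the existing "int fd" interface:

    /* Hypothetical sketch: resolve the fd from the ioctl to a vnode and
     * do the stream IO against it with VNOP_READ()/VNOP_WRITE(). */
    vnode_t vp = NULLVP;
    int error = file_vnode(fd, &vp);        /* references the fd */
    if (error == 0) {
            /* ... write the send stream / read the recv stream ... */
            file_drop(fd);                  /* release the fd reference */
    }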

Check libshare and libzfs interaction: fix double initialization

Some strange crashes today revealed a problem in the initialization sequence in libzfs:
All programs using libzfs (i.e. zfs and zpool) call libzfs_init() early in the code to get a handle. However, libshare_init was declared as a static initializer and called by the loader before main(); it then tried to initialize libzfs itself and enumerate all datasets (and do other work). Since libzfs_init() is not reentrant, this causes the state from the first initialization to be lost.

This particular early-initialization problem is fixed in 175a5d6, but there are more problems: libzfs_mount.c contains some more calls into libshare, which will again call libzfs_init().

The whole interaction of libshare and libzfs needs to be reworked. I wonder how and why it works on ZoL ...
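
One possible shape for the rework, sketched with made-up names: drop the static constructor entirely and have libshare initialise itself lazily, exactly once, from the libzfs entry points that need it:

    /* Hypothetical sketch: lazy one-time init instead of a constructor. */
    #include <pthread.h>

    static pthread_once_t libshare_once = PTHREAD_ONCE_INIT;

    static void
    libshare_init_once(void)
    {
            /* Safe here: main() has run and the program has already
             * called libzfs_init(), so only look up the existing state. */
    }

    void
    libshare_ensure_init(void)      /* call from the libzfs_mount.c callers */
    {
            (void) pthread_once(&libshare_once, libshare_init_once);
    }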

send recv broken: oracle solaris 11 28/5 --> openzfs 28

send (openzfs 28) ---> recv (oracle solaris 11 28/5) works.
send (oracle solaris 11 28/5) ---> recv (openzfs 28) fails with error

"cannot receive: stream has unsupported feature, feature flags = 24"

The error message is emanating from the OpenZFS (receiving) side.

These files show the openzfs-->solaris11 stream and the solaris11-->openzfs stream:
https://www.dropbox.com/s/rzcfczai16z1gxf/orig.zip

http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/48661

http://echelog.com/logs/browse/smartos/1363820400

http://www.listbox.com/member/archive/182179/2013/03/sort/time_rev/page/3/entry/21:150/20130315182631:619F9A92-8DBF-11E2-ACEA-A643D5BAC4CE/

https://updates.oracle.com/readme/147440-15

Incorrect mmap behavior

The mmap, pagein and pageout functionality was ported over from MacZFS, but appears to be a little incorrect in the current version.

The problem can be demonstrated with:

# ./fsx -R -W /BOOM/without_mmap
mapped writes DISABLED
Using file /BOOM/without_mmap
command line: ./fsx -R -W /BOOM/without_mmap
Seed set to 1372654525
truncating to largest ever: 0x1d088
truncating to largest ever: 0x3b730
truncating to largest ever: 0x3f430
truncating to largest ever: 0x3f634
truncating to largest ever: 0x3fcdc
truncating to largest ever: 0x3fcfc
truncating to largest ever: 0x3ffe0
truncating to largest ever: 0x3fffc
^Csignal 2
testcalls = 150232
# ./fsx /BOOM/with_mmap
Using file /BOOM/with_mmap
command line: ./fsx /BOOM/with_mmap
Seed set to 1372654555
truncating to largest ever: 0x38730
LOG DUMP (16 total operations):
1: SKIPPED (no operation)
2: SKIPPED (no operation)
3: SKIPPED (no operation)
4: SKIPPED (no operation)
5: SKIPPED (no operation)
6: TRUNCATE UP     from 0x0 (0) to 0x38730 (231216)
7: MAPWRITE        0x2a1f8 (172536) thru 0x3872f (231215)       (0xe538 (58680) bytes)
8: MAPWRITE        0x26140 (155968) thru 0x290c7 (168135)       (0x2f88 (12168) bytes)
9: READ            0x151d8 (86488) thru 0x211ef (135663)        (0xc018 (49176) bytes)
10: READ            0x1ab44 (109380) thru 0x1f2f3 (127731)      (0x47b0 (18352) bytes)
11: MAPREAD         0x34080 (213120) thru 0x3872f (231215)      (0x46b0 (18096) bytes)
12: TRUNCATE DOWN   from 0x38730 (231216) to 0xde28 (56872)
13: TRUNCATE UP     from 0xde28 (56872) to 0x21b6c (138092)
14: MAPREAD         0x20688 (132744) thru 0x21b6b (138091)      (0x14e4 (5348) bytes)
15: TRUNCATE UP     from 0x21b6c (138092) to 0x278dc (162012)
16: MAPWRITE        0x22860 (141408) thru 0x278db (162011)      (0x507c (20604) bytes)
Mapped Write: non-zero data past EOF (0x278db) page offset 0x8dc is 0x0008
Correct content saved for comparison
(maybe hexdump "/BOOM/with_mmap" vs "/BOOM/with_mmap.fsxgood")
Seed was set to 1372654555

Or alternatively:

# ./fsx /BOOM/with_mmap2
Using file /BOOM/with_mmap2
command line: ./fsx /BOOM/with_mmap2
Seed set to 1372654921
truncating to largest ever: 0xefdc
truncating to largest ever: 0x2813c
data miscompare @ 91956
LOG DUMP (20 total operations):
1: TRUNCATE UP     from 0x0 (0) to 0xefdc (61404)
2: WRITE           0x29348 (168776) thru 0x37f73 (229235)       (0xec2c (60460) bytes) HOLE     ***WWWW
3: WRITE           0x131a0 (78240) thru 0x1a34b (107339)        (0x71ac (29100) bytes)  ***WWWW
4: WRITE           0x94 (148) thru 0xd57b (54651)       (0xd4e8 (54504) bytes)
5: MAPREAD         0x349ec (215532) thru 0x37f73 (229235)       (0x3588 (13704) bytes)
6: WRITE           0x11ff8 (73720) thru 0x1a20b (107019)        (0x8214 (33300) bytes)  ***WWWW
7: MAPWRITE        0x15df0 (89584) thru 0x244c3 (148675)        (0xe6d4 (59092) bytes)  ******WWWW
8: MAPWRITE        0x16df0 (93680) thru 0x18bfb (101371)        (0x1e0c (7692) bytes)
9: MAPREAD         0x1b0e8 (110824) thru 0x228af (141487)       (0x77c8 (30664) bytes)
10: READ            0xdca8 (56488) thru 0x114f3 (70899) (0x384c (14412) bytes)
11: READ            0x2a9ac (174508) thru 0x33ff3 (212979)      (0x9648 (38472) bytes)
12: MAPREAD         0xfee4 (65252) thru 0x1977b (104315)        (0x9898 (39064) bytes)  ***RRRR***
13: MAPREAD         0x1c96c (117100) thru 0x1ffb3 (130995)      (0x3648 (13896) bytes)
14: READ            0x33460 (210016) thru 0x37f73 (229235)      (0x4b14 (19220) bytes)
15: MAPREAD         0xbaf0 (47856) thru 0x1a913 (108819)        (0xee24 (60964) bytes)  ***RRRR***
16: WRITE           0x4330 (17200) thru 0xfaff (64255)  (0xb7d0 (47056) bytes)
17: MAPWRITE        0x114d8 (70872) thru 0x14547 (83271)        (0x3070 (12400) bytes)
18: TRUNCATE DOWN   from 0x37f74 (229236) to 0x2813c (164156)
19: WRITE           0xe270 (57968) thru 0x1dbf7 (121847)        (0xf988 (63880) bytes)  ***WWWW
20: READ            0x16734 (91956) thru 0x17ba7 (97191)        (0x1474 (5236) bytes)   ***RRRR***
data miscompare @ 91956
OFFSET     GOOD       BAD        LENGTH     BADOP#   Last: WRITE    TRUNC-   TRUNC+  
0x00016734 0x00000013 0x00000007 0x00001474 7              19       -1       -1 

That is the main problem: the data returned is inconsistent. The problem will show even if only one of -R and -W is used.

There is a secondary problem with mmap as well, the panic:

"zfs: accessing past end of object 5e/71005 
    (size=110592 access=110496+4096)"@spl-err.c:48

For non-mmap operations, zfs_write calls zfs_range_lock() for the write, and if this sets rl->r_len == UINT64_MAX we need to grow the buffer by calling zfs_grow_blocksize(). Failing to grow the buffer can cause the panic above. Currently, pageout does no range locking at all.
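
For reference, the pattern that zfs_write already follows (paraphrased from the generic ZFS code of this era; pageout would need an equivalent before touching the buffers):

    rl = zfs_range_lock(zp, woff, n, RL_WRITER);
    if (rl->r_len == UINT64_MAX) {
            /* The whole range was locked because the file is growing;
             * the block size may need to grow with it. */
            uint64_t new_blksz;

            if (zp->z_blksz > max_blksz)
                    new_blksz = MIN(end_size, SPA_MAXBLOCKSIZE);
            else
                    new_blksz = MIN(end_size, max_blksz);
            zfs_grow_blocksize(zp, new_blksz, tx);
            zfs_range_reduce(rl, woff, n);
    }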

Userland iteration of datasets fails to iterate everything.

After creating multiple datasets, listing them appears to be incorrect:

# zfs create BOOM
# zfs create BOOM/hello
# zfs send BOOM@send | zfs recv BOOM@backup
# zfs create -V 1G BOOM/vol
# zfs list -t all

NAME               USED  AVAIL  REFER  MOUNTPOINT
BOOM              28.3G   200G  16.8G  /BOOM
BOOM/backup       10.5G   200G  10.5G  /BOOM/backup
BOOM/backup@send      0      -  10.5G  -

(Where are 'vol' and 'hello'?)

# df -h
Filesystem      Size   Used  Avail Capacity iused     ifree %iused  Mounted on
BOOM           217Gi   17Gi  200Gi     8%  460549 419523040    0%   /BOOM
BOOM/backup    211Gi   11Gi  200Gi     6%  460533 419523040    0%   /BOOM/backup
BOOM/hello     200Gi  278Ki  200Gi     1%      12 419523040    0%   /BOOM/hello

# ./cmd.sh zfs list BOOM/hello
NAME         USED  AVAIL  REFER  MOUNTPOINT
BOOM/hello   278K   200G   278K  /BOOM/hello

Which is possibly the real problem with #20
and maybe even touching on #16

Possibly we are not iterating/reading pools correctly in userland.

Large file IO will hang.

Create a pool and copy any file of "some size" (most likely larger than the recordsize), and sync will hang.

bash-3.2# ./zpool.sh create -f BOOM ~/pool-image.bin 
zfs_mount: unused options: "defaults,atime,dev,exec,rw,suid,xattr,nomand"
bash-3.2# echo "HELLO WORLD" > /BOOM/file1
bash-3.2# sync
bash-3.2# sync
bash-3.2# ls -l configure
-rwxr-xr-x  1 lundman  staff  497984 Apr  5 10:15 configure
bash-3.2# cp configure /BOOM/file2
bash-3.2# sync

.. hung ..

Compilation fails on Snow Leopard / MacPorts 2.1.3 / gcc 4.2.1

Compilation fails in dsl_scan.c and zio.c when trying to inline functions marked as "always_inline":

zio.c: In function ‘zio_ready’:
zio.c:1250: error: ‘always_inline’ function could not be inlined in call to ‘__zio_execute’: function not considered for inlining
zio.c:537: error: called from here

dsl_scan.c: In function ‘dsl_scan_visitbp’:
dsl_scan.c:712: error: ‘always_inline’ function could not be inlined in call to ‘dsl_scan_visitdnode’: function not considered for inlining
dsl_scan.c:668: error: called from here
dsl_scan.c:712: error: ‘always_inline’ function could not be inlined in call to ‘dsl_scan_visitdnode’: function not considered for inlining
dsl_scan.c:686: error: called from here
dsl_scan.c:712: error: ‘always_inline’ function could not be inlined in call to ‘dsl_scan_visitdnode’: function not considered for inlining
dsl_scan.c:696: error: called from here
dsl_scan.c:712: error: ‘always_inline’ function could not be inlined in call to ‘dsl_scan_visitdnode’: function not considered for inlining
dsl_scan.c:699: error: called from here

A workaround is to remove the always_inline qualifier, but the question is whether that is wise to do.

diff --git a/module/zfs/dsl_scan.c b/module/zfs/dsl_scan.c
index 297caa0..fa344ea 100644
--- a/module/zfs/dsl_scan.c
+++ b/module/zfs/dsl_scan.c
@@ -599,7 +599,7 @@ dsl_scan_check_resume(dsl_scan_t *scn, const dnode_phys_t *dnp,
  * Return nonzero on i/o error.
  * Return new buf to write out in *bufp.
  */
-inline __attribute__((always_inline)) static int
+inline /* __attribute__((always_inline)) */ static int
 dsl_scan_recurse(dsl_scan_t *scn, dsl_dataset_t *ds, dmu_objset_type_t ostype,
     dnode_phys_t *dnp, const blkptr_t *bp,
     const zbookmark_t *zb, dmu_tx_t *tx, arc_buf_t **bufp)
@@ -705,7 +705,7 @@ dsl_scan_recurse(dsl_scan_t *scn, dsl_dataset_t *ds, dmu_objset_type_t ostype,
        return (0);
 }
 
-inline __attribute__((always_inline)) static void
+inline /* __attribute__((always_inline)) */ static void
 dsl_scan_visitdnode(dsl_scan_t *scn, dsl_dataset_t *ds,
     dmu_objset_type_t ostype, dnode_phys_t *dnp, arc_buf_t *buf,
     uint64_t object, dmu_tx_t *tx)
diff --git a/module/zfs/zio.c b/module/zfs/zio.c
index bc7b759..594406e 100644
--- a/module/zfs/zio.c
+++ b/module/zfs/zio.c
@@ -1244,7 +1244,7 @@ zio_execute(zio_t *zio)
        __zio_execute(zio);
 }
 
-__attribute__((always_inline))
+/* __attribute__((always_inline)) */
 static inline void
 __zio_execute(zio_t *zio)
 {
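
A less invasive alternative (untested sketch; the macro name and the exact gcc version threshold are guesses) would be to gate the attribute on the compiler version instead of deleting it for everyone:

    /* Hypothetical sketch: only request forced inlining on compilers
     * known to cope with it; gcc 4.2.1 errors out instead of falling
     * back to a plain inline. */
    #if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
    #define ZFS_ALWAYS_INLINE __attribute__((always_inline))
    #else
    #define ZFS_ALWAYS_INLINE /* plain inline only */
    #endif

    ZFS_ALWAYS_INLINE static inline void
    __zio_execute(zio_t *zio);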

panic(cpu 1 caller 0xffffff80002438d8): "zalloc: \"kalloc.1024\" (100535 elements) retry fail 3, kfree_nop_count: 0"@/SourceCache/xnu/xnu-2050.7.9/osfmk/kern/zalloc.c:1826

Running a large iozone (~200GB) on an HDD results in the following panic:

Backtrace (CPU 1), Frame : Return Address
0xffffff8040eb32b0 : 0xffffff800021d5f6 mach_kernel : _panic + 0xc6
0xffffff8040eb3320 : 0xffffff80002438d8 mach_kernel : _zalloc_canblock + 0x428
0xffffff8040eb3400 : 0xffffff80002245bd mach_kernel : _kalloc_canblock + 0x7d
0xffffff8040eb3430 : 0xffffff8000224c39 mach_kernel : _OSMalloc + 0x89
0xffffff8040eb3460 : 0xffffff7f80e8e7df net.lundman.spl : _kmem_cache_alloc + 0x2f
0xffffff8040eb34a0 : 0xffffff7f80f79673 net.lundman.zfs : _zio_create + 0xc3
0xffffff8040eb35a0 : 0xffffff7f80f79fe9 net.lundman.zfs : _zio_write + 0x139
0xffffff8040eb36b0 : 0xffffff7f80e9ff30 net.lundman.zfs : _arc_write + 0x180
0xffffff8040eb37a0 : 0xffffff7f80eac5c8 net.lundman.zfs : _dbuf_write + 0x628
0xffffff8040eb3960 : 0xffffff7f80eabc6b net.lundman.zfs : _dbuf_sync_leaf + 0x3cb
0xffffff8040eb39d0 : 0xffffff7f80eab737 net.lundman.zfs : _dbuf_sync_list + 0x87
0xffffff8040eb3a00 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3a40 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3a70 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3ab0 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3ae0 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3b20 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3b50 : 0xffffff7f80ecea6e net.lundman.zfs : _dnode_sync + 0x60e
0xffffff8040eb3bb0 : 0xffffff7f80eb96a6 net.lundman.zfs : _dmu_objset_sync_dnodes + 0x96
0xffffff8040eb3bf0 : 0xffffff7f80eb93ed net.lundman.zfs : _dmu_objset_sync + 0x44d
0xffffff8040eb3d30 : 0xffffff7f80ed4ef1 net.lundman.zfs : _dsl_dataset_sync + 0x51
0xffffff8040eb3d60 : 0xffffff7f80ee326c net.lundman.zfs : _dsl_pool_sync + 0xfc
0xffffff8040eb3e50 : 0xffffff7f80f09970 net.lundman.zfs : _spa_sync + 0x4e0
0xffffff8040eb3f20 : 0xffffff7f80f16362 net.lundman.zfs : _txg_sync_thread + 0x332
0xffffff8040eb3fb0 : 0xffffff80002b2677 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
System uptime in nanoseconds: 2309433338440
vm objects:4441920
pv_list:2420736
vm pages:36918072
kalloc.16:25956352
kalloc.32:3534848
kalloc.64:8560640
kalloc.128:78135296
kalloc.256:303702016
kalloc.512:222441472
kalloc.1024:102961152
kalloc.8192:1392640
vnodes:4689432
namecache:2040000
HFS node:5917776
HFS fork:1773568
buf.8192:24805376
Kernel Stacks:3833856
PageTables:13852672
Kalloc.Large:31064098

Backtrace suspected of leaking: (outstanding bytes: 94208)
0xffffff8000243589
0xffffff80002245bd
0xffffff8000224c39
0xffffff7f80e8e7df
0xffffff7f80f79673
0xffffff7f80f79fe9
0xffffff7f80e9ff30
0xffffff7f80eac5c8
0xffffff7f80eabc6b
0xffffff7f80eab737
0xffffff7f80eab87e
0xffffff7f80eab725
0xffffff7f80eab87e
0xffffff7f80eab725
0xffffff7f80eab87e
(gdb) zprint
ZONE                   COUNT   TOT_SZ   MAX_SZ   ELT_SZ ALLOC_SZ         TOT_ALLOC         TOT_FREE NAME
0xffffff8002a89000       178    1a290    1c800      592     5000               178                0 zones X$
0xffffff8002a8a970     18669   43c740   5b2000      224     4000           1134656          1115987 vm objects CX$
0xffffff8002a8a720     13248    817e0    c0000       40     1000             14655             1407 vm object hash entries CX$
0xffffff8002a8a4d0        91     6e90     a000      232     2000              2681             2590 maps X$
0xffffff8002a8a280      8135    c0f30   100000       80     5000           1532690          1524555 VM map entries CX$
0xffffff8002a8a030        33    cb3e0   280000       80     1000             79812            79779 Reserved VM map entries $
0xffffff8002a89de0         0      ff0     4000       80     1000             11503            11503 VM map copies CX$
0xffffff8002a89b90        81     6000    19000      256     1000               904              823 pmap CX$
0xffffff8002a89940        81    51000    81bf1     4096     1000               904              823 pagetable anchors CX$
0xffffff8002a896f0     50200   24f000   255600       48     3000             50200                0 pv_list CX$
0xffffff8002a894a0    512442  2335338        0       72     2000           2299521          2291262 vm pages HC
0xffffff8002a89250   1620133  18c1000  22a3599       16     1000         125203838        123583705 kalloc.16 CX
0xffffff8006306c50    110335   35f000   4ce300       32     1000          13634985         13524650 kalloc.32 CX
0xffffff8006306a00    133584   82a000   e6a900       64     1000          26510120         26376536 kalloc.64 CX
0xffffff80063067b0    610090  4a84000  614f4c0      128     1000          50524515         49914425 kalloc.128 CX
0xffffff8006306560   1070398 121a2000 1b5e4d60      256     1000          72534632         71464234 kalloc.256 CX
0xffffff8006306310    399302  d423000  daf26b0      512     1000          39231204         38831902 kalloc.512 CX
0xffffff80063060c0    100404  6231000  c29e980     1024     1000          22949693         22849289 kalloc.1024 CX
0xffffff8006305e70       292    9a000   200000     2048     1000          77633725         77633433 kalloc.2048 CX
0xffffff8006305c20       131    83000   900000     4096     1000             47564            47433 kalloc.4096 CX
0xffffff80063059d0       170   154000  2000000     8192     2000             29553            29383 kalloc.8192 CX
0xffffff8006305780     13248    34000    48000       16     1000             14655             1407 mem_obj_control CX$

This makes kalloc.256 a likely candidate for the memory leak.

I suspect that the panic at kalloc.1024, from zio_create(), just happens to be the call that runs into trouble first, and is maybe not the place where we leak.

zfs clone makes invisible dataset.

Cloning a snapshot to a new filesystem with zfs clone appears to work, except that the new dataset is not visible:

# ./cmd.sh zfs clone BOOM@now BOOM/hello

NAME         USED  AVAIL  REFER  MOUNTPOINT
BOOM        1.31M  62.2M   938K  /BOOM
BOOM@now      22K      -   372K  -
BOOM/other    31K  62.2M    31K  /BOOM/other

# ls -l /BOOM/
total 1816
drwx------  2 root     wheel       3 Apr  8 17:01 .fseventsd
-rwxr-xr-x  1 lundman  wheel  490131 Apr  8 17:19 configure
drwxr-xr-x  3 root     wheel       4 Apr  8 17:01 hello
-rwxr-xr-x  1 root     wheel  293274 Apr  8 17:01 libtool
drwxr-xr-x  2 root     wheel       2 Apr  8 17:19 other

# ./cmd.sh zfs mount BOOM/hello
Filesystem    512-blocks     Used Available Capacity iused   ifree %iused  Mounted on
BOOM/hello        128426      744    127682     1%      10  127682    0%   /BOOM/hello

# ./cmd.sh zfs list -tall
NAME         USED  AVAIL  REFER  MOUNTPOINT
BOOM        1.16M  62.3M   938K  /BOOM
BOOM@now    23.5K      -   372K  -
BOOM/other    31K  62.3M    31K  /BOOM/other
# df

Although I like the idea of invisible datasets, this might not be all that useful.

zfs_setacl() passing arg 2 from incompatible pointer

The call to zfs_setacl() in zfs_setsecattr(), starting at line 5471 of zfs_vnops.c, passes arg 2 as a vsecattr_t instead of a kauth_acl. We need to check what really should go in here, and whether we should globally replace vsecattr_t with kauth_acl or the other way around.

Unloading kext should check for busy

Currently we do not check for busy filesystems on unload and attempt to unload anyway, which results in a panic. We should put the is-busy checks back into the kext unload function.
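
A rough sketch of the idea (the counter name is made up; as far as I know, xnu aborts the unload when the kext's stop routine returns an error):

    /* Hypothetical sketch: refuse to unload while filesystems are mounted. */
    kern_return_t
    zfs_module_stop(kmod_info_t *ki, void *data)
    {
            if (zfs_active_fs_count != 0)   /* hypothetical busy counter */
                    return (KERN_FAILURE);  /* keeps the kext loaded */

            /* ... normal teardown ... */
            return (KERN_SUCCESS);
    }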

Saving Tinderbox document does not work

Replicated with Tinderbox 5.12.2, Mac OS X 10.8.4:
http://www.eastgate.com/Tinderbox/
Saving a Tinderbox document to a MacZFS volume fails. One is left with a "Tinderbox temp" file that does not contain any of the original data.

Based on information from Tinderbox support, Tinderbox relies on FSExchangeObjects and/or FSRefExchangeFiles.

Apple developer documentation:
The FSExchangeObjects function allows programs to implement a “safe save” operation by creating and writing a complete new file and swapping the contents. An alias, FSSpec, or FSRef that refers to the old file will now access the new data. The corresponding information in in-memory data structures are also exchanged.

FSExchangeObjects is deprecated in 10.8

The workaround for now is to create an HFS+ formatted volume and save to that.
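
As far as I understand it, FSExchangeObjects on a local volume ends up as the exchangedata(2) syscall, i.e. VNOP_EXCHANGE, which the port does not implement. A vnop table entry along these lines (the handler name is hypothetical) would be the starting point:

    /* Hypothetical: advertise an exchange handler so exchangedata(2),
     * and therefore FSExchangeObjects-style safe saves, can work. */
    { &vnop_exchange_desc, (VOPFUNC)zfs_vnop_exchange },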

VNOP_FSYNC: mutex_enter: locking against myself!"@spl-mutex.c:108

Large rsync (Copying "/" to ZFS) produced:

panic(cpu 0 caller 0xffffff7f86bdf4a8): "mutex_enter: locking against myself!"@spl-mutex.c:108
Backtrace (CPU 0), Frame : Return Address
0xffffff804ab13360 : 0xffffff800541d626 mach_kernel : _panic + 0xc6
0xffffff804ab133d0 : 0xffffff7f86bdf4a8 net.lundman.spl : _spl_mutex_enter + 0x78
0xffffff804ab133f0 : 0xffffff7f86cc4614 net.lundman.zfs : _zfs_zget + 0x74
0xffffff804ab13490 : 0xffffff7f86caf1ed net.lundman.zfs : _zfs_get_data + 0x7d
0xffffff804ab13540 : 0xffffff7f86ccc0a6 net.lundman.zfs : _zil_lwb_commit + 0x356
0xffffff804ab135d0 : 0xffffff7f86cca128 net.lundman.zfs : _zil_commit_writer + 0x128
0xffffff804ab13620 : 0xffffff7f86cc9f87 net.lundman.zfs : _zil_commit + 0x137
0xffffff804ab13650 : 0xffffff7f86cb2ae0 net.lundman.zfs : _zfs_fsync + 0x110
0xffffff804ab136a0 : 0xffffff7f86cb9dfe net.lundman.zfs : _zfs_vnop_fsync + 0x6e
0xffffff804ab136e0 : 0xffffff8005511f3f mach_kernel : _VNOP_FSYNC + 0x2f
0xffffff804ab13710 : 0xffffff80054f1583 mach_kernel : _vflush + 0x673
0xffffff804ab13760 : 0xffffff80054f0cd1 mach_kernel : _vnode_rele_ext + 0x351
0xffffff804ab137a0 : 0xffffff80054f7ac6 mach_kernel : _vfs_addtrigger + 0x2a6
0xffffff804ab137d0 : 0xffffff80054efa5e mach_kernel : _vnode_create + 0x15e
0xffffff804ab13890 : 0xffffff7f86cbce9c net.lundman.zfs : _zfs_znode_getvnode + 0x20c
0xffffff804ab13930 : 0xffffff7f86cc3260 net.lundman.zfs : _zfs_znode_alloc + 0x760
0xffffff804ab13b20 : 0xffffff7f86cc4849 net.lundman.zfs : _zfs_zget + 0x2a9
0xffffff804ab13bc0 : 0xffffff7f86cbb780 net.lundman.zfs : _zfs_vnop_readdirattr + 0x5f0
0xffffff804ab13e00 : 0xffffff8005513246 mach_kernel : _VNOP_READDIRATTR + 0x56
0xffffff804ab13e60 : 0xffffff8005501f1a mach_kernel : _getdirentriesattr + 0x2da
0xffffff804ab13f50 : 0xffffff80057e16aa mach_kernel : _unix_syscall64 + 0x20a
0xffffff804ab13fb0 : 0xffffff80054ce9c3 mach_kernel : _hndl_unix_scall64 + 0x13

It is pretty clear that we call zget, which allocates a new vnode, which triggers a vflush, which triggers a zil_commit, which has to call zget.

This is quite similar to the reclaim issue and can be solved the same way. It finally explains why MacZFS has logic to test for this case.

Panic creating pool

# ./zpool.sh create -f BOOM pool-image.bin
[zfs] Yay, got ioctl 4
[zfs] ioctl done 0
[zfs] Yay, got ioctl 5
In  stats
spa_get_stats: 2
  spa_get_stats 2
Out stats 
[zfs] ioctl done 2
[zfs] Yay, got ioctl 0

(gdb) where
#0  0xffffff7f80f2b8b5 in vdev_add_child (pvd=0xffffff8009851000, cvd=0xffffff8009851800) at vdev.c:208
#1  0xffffff7f80f2cdfe in vdev_alloc (spa=0xffffff8006d9f000, vdp=0xffffff804621b940, nv=0xffffff8007da9e38, parent=0xffffff8009851000, id=0, alloctype=1) at vdev.c:564
#2  0xffffff7f80f0554e in spa_config_parse (spa=0xffffff8006d9f000, vdp=0xffffff804621b940, nv=0xffffff8007da9e38, parent=0xffffff8009851000, id=0, atype=1) at spa.c:1084
#3  0xffffff7f80f05633 in spa_config_parse (spa=0xffffff8006d9f000, vdp=0xffffff804621bba0, nv=0xffffff8007db7080, parent=0x0, id=0, atype=1) at spa.c:1104
#4  0xffffff7f80f0e17f in spa_create (pool=0xffffff800683d000 "BOOM", nvroot=0xffffff8007db7080, props=0xffffff80066cc520, history_str=0x0, zplprops=0xffffff8007dba8e0) at spa.c:3362

(gdb) 
#0  0xffffff7f80f2b8b5 in vdev_add_child (pvd=0xffffff8009851000, cvd=0xffffff8009851800) at vdev.c:208
208             ASSERT(cvd->vdev_top->vdev_parent->vdev_parent == NULL);


(gdb) p *cvd
$10 = {
  vdev_id = 0, 
  vdev_guid = 4055797430771032621, 
  vdev_guid_sum = 4055797430771032621, 
  vdev_orig_guid = 7236549231704963631, 
  vdev_asize = 3327997875516897839, 
  vdev_min_asize = 8030052479819475308, 
  vdev_max_asize = 140734804028527, 
  vdev_ashift = 140734787661824, 
  vdev_state = 1, 
  vdev_prevstate = 140734812175232, 
  vdev_ops = 0xffffff7f81045b40, 
  vdev_spa = 0xffffff8006d9f000, 
  vdev_tsd = 0x0, 
  vdev_name_vp = 0x0, 
  vdev_devid_vp = 0x0, 
  vdev_top = 0x11000000000000, 
  vdev_parent = 0xffffff8009851000, 
  vdev_child = 0xffffff8006d9f644, 
  vdev_children = 17179869186, 
  vdev_dtl = {{
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff8009851ee0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff8009851ee0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff8009851ee0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff8009851ee0
    }}, 
  vdev_stat = {
    vs_timestamp = 14781527861, 
    vs_state = 0, 
    vs_aux = 0, 
    vs_alloc = 0, 
    vs_space = 0, 
    vs_dspace = 0, 
    vs_rsize = 0, 
    vs_esize = 0, 
    vs_ops = {0, 0, 0, 0, 0, 0}, 
    vs_bytes = {0, 0, 0, 0, 0, 0}, 
    vs_read_errors = 0, 
    vs_write_errors = 0, 
    vs_checksum_errors = 0, 
    vs_self_healed = 0, 
    vs_scan_removing = 0, 
    vs_scan_processed = 0
  }, 
  vdev_expanding = 0, 
  vdev_reopening = 0, 
  vdev_open_error = 0, 
  vdev_open_thread = 0x0, 
  vdev_crtxg = 0, 
  vdev_ms_array = 0, 
  vdev_ms_shift = 0, 
  vdev_ms_count = 0, 
  vdev_mg = 0xffffff8007cc9600, 
  vdev_ms = 0x0, 
  vdev_pending_fastwrite = 0, 
  vdev_ms_list = {
    tl_lock = {
      m_owner = 0x0, 
      initialized = 0, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 0}
        }}
    }, 
    tl_offset = 1368, 
    tl_head = {0x0, 0x0, 0x0, 0x0}
  }, 
  vdev_dtl_list = {
    tl_lock = {
      m_owner = 0x0, 
      initialized = 0, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 0}
        }}
    }, 
    tl_offset = 1192, 
    tl_head = {0x0, 0x0, 0x0, 0x0}
  }, 
  vdev_txg_node = {
    tn_next = {0x0, 0x0, 0x0, 0x0}, 
    tn_member = "\000\000\000"
  }, 
  vdev_remove_wanted = 0, 
  vdev_probe_wanted = 0, 
  vdev_removing = 0, 
  vdev_config_dirty_node = {
    list_next = 0x0, 
    list_prev = 0x0
  }, 
  vdev_state_dirty_node = {
    list_next = 0x0, 
    list_prev = 0x0
  }, 
  vdev_deflate_ratio = 0, 
  vdev_islog = 0, 
  vdev_ishole = 0, 
  vdev_psize = 0, 
  vdev_dtl_smo = {
    smo_object = 0, 
    smo_objsize = 0, 
    smo_alloc = 0
  }, 
  vdev_dtl_node = {
    tn_next = {0x0, 0x0, 0x0, 0x0}, 
    tn_member = "\000\000\000"
  }, 
  vdev_wholedisk = 18446744073709551615, 
  vdev_offline = 0, 
  vdev_faulted = 0, 
  vdev_degraded = 0, 
  vdev_removed = 0, 
  vdev_resilvering = 0, 
  vdev_nparity = 0, 
  vdev_path = 0xffffff8007dba680 "/Users/lundman/pool-image.bin", 
  vdev_devid = 0x0, 
  vdev_physpath = 0x0, 
  vdev_fru = 0x0, 
  vdev_not_present = 0, 
  vdev_unspare = 0, 
  vdev_last_try = 0, 
  vdev_nowritecache = 0, 
  vdev_checkremove = 0, 
  vdev_forcefault = 0, 
  vdev_splitting = 0, 
  vdev_delayed_close = 0, 
  vdev_tmpoffline = 0 '\0', 
  vdev_detached = 0 '\0', 
  vdev_cant_read = 0 '\0', 
  vdev_cant_write = 0 '\0', 
  vdev_isspare = 0, 
  vdev_isl2cache = 0, 
  vdev_queue = {
    vq_deadline_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f407b0 <vdev_queue_deadline_compare>, 
      avl_offset = 568, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_read_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_write_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_pending_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_io_list = {
      list_size = 131088, 
      list_offset = 131072, 
      list_head = {
        list_next = 0xffffff80367ae000, 
        list_prev = 0xffffff80368d7000
      }
    }, 
    vq_lock = {
      m_owner = 0x0, 
      initialized = -559038737, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 16045690984833335023}
        }}
    }
  }, 
  vdev_cache = {
    vc_offset_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f36da0 <vdev_cache_offset_compare>, 
      avl_offset = 24, 
      avl_numnodes = 0, 
      avl_size = 88
    }, 
    vc_lastused_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f36e20 <vdev_cache_lastused_compare>, 
      avl_offset = 48, 
      avl_numnodes = 0, 
      avl_size = 88
    }, 
    vc_lock = {
      m_owner = 0x0, 
      initialized = -559038737, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 16045690984833335023}
        }}
    }
  }, 
  vdev_aux = 0xdeadbeefdeadbeef, 
  vdev_probe_zio = 0xdeadbeefdeadbeef, 
  vdev_label_aux = 3735928559, 
  vdev_dtl_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }, 
  vdev_stat_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }, 
  vdev_probe_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }
}

(gdb) p *pvd
$11 = {
  vdev_id = 0, 
  vdev_guid = 6654298998853802424, 
  vdev_guid_sum = 6654298998853802424, 
  vdev_orig_guid = 0, 
  vdev_asize = 18446743524074649904, 
  vdev_min_asize = 0, 
  vdev_max_asize = 18446743524113570720, 
  vdev_ashift = 18446743524113571216, 
  vdev_state = 1, 
  vdev_prevstate = 18446743524075957279, 
  vdev_ops = 0xffffff7f81045e20, 
  vdev_spa = 0xffffff8006d9f000, 
  vdev_tsd = 0x686c536b444b2e62, 
  vdev_name_vp = 0x110000, 
  vdev_devid_vp = 0x11000000000000, 
  vdev_top = 0x11000000000000, 
  vdev_parent = 0x0, 
  vdev_child = 0xffffff8007db0610, 
  vdev_children = 1, 
  vdev_dtl = {{
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff80098516e0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff80098516e0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff80098516e0
    }, {
      sm_root = {
        avl_root = 0x0, 
        avl_compar = 0xffffff7f80f24aa0 <space_map_seg_compare>, 
        avl_offset = 0, 
        avl_numnodes = 0, 
        avl_size = 64
      }, 
      sm_space = 0, 
      sm_start = 0, 
      sm_size = 18446744073709551615, 
      sm_shift = 0 '\0', 
      sm_pad = "\000\000", 
      sm_loaded = 0 '\0', 
      sm_loading = 0 '\0', 
      sm_load_cv = {
        cv_waiters = 0
      }, 
      sm_ops = 0x0, 
      sm_pp_root = 0x0, 
      sm_ppd = 0x0, 
      sm_lock = 0xffffff80098516e0
    }}, 
  vdev_stat = {
    vs_timestamp = 14781017576, 
    vs_state = 0, 
    vs_aux = 1114112, 
    vs_alloc = 4785074604081152, 
    vs_space = 0, 
    vs_dspace = 1114112, 
    vs_rsize = 4785074605195264, 
    vs_esize = 0, 
    vs_ops = {1114112, 4785074604081152, 0, 1114112, 4785074605195264, 0}, 
    vs_bytes = {1114112, 4785074604081152, 0, 1114112, 1, 2315061383420444675}, 
    vs_read_errors = 18446743524089413808, 
    vs_write_errors = 18446743524113453472, 
    vs_checksum_errors = 0, 
    vs_self_healed = 1024, 
    vs_scan_removing = 18446743524075956240, 
    vs_scan_processed = 16346022891462066184
  }, 
  vdev_expanding = 0, 
  vdev_reopening = 0, 
  vdev_open_error = 256, 
  vdev_open_thread = 0x0, 
  vdev_crtxg = 1114112, 
  vdev_ms_array = 4785074605195264, 
  vdev_ms_shift = 0, 
  vdev_ms_count = 1114112, 
  vdev_mg = 0x11000000000000, 
  vdev_ms = 0x0, 
  vdev_pending_fastwrite = 1114112, 
  vdev_ms_list = {
    tl_lock = {
      m_owner = 0x0, 
      initialized = 0, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 34359738368}
        }}
    }, 
    tl_offset = 1368, 
    tl_head = {0x0, 0x0, 0x0, 0x0}
  }, 
  vdev_dtl_list = {
    tl_lock = {
      m_owner = 0x0, 
      initialized = 0, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 0}
        }}
    }, 
    tl_offset = 1192, 
    tl_head = {0x0, 0x0, 0x0, 0x0}
  }, 
  vdev_txg_node = {
    tn_next = {0x0, 0x0, 0x0, 0x2000000000000}, 
    tn_member = "�\000\000"
  }, 
  vdev_remove_wanted = -1840513017, 
  vdev_probe_wanted = -1840460457, 
  vdev_removing = 10542026603767844183, 
  vdev_config_dirty_node = {
    list_next = 0x0, 
    list_prev = 0x0
  }, 
  vdev_state_dirty_node = {
    list_next = 0x0, 
    list_prev = 0x0
  }, 
  vdev_deflate_ratio = 0, 
  vdev_islog = 0, 
  vdev_ishole = 0, 
  vdev_psize = 0, 
  vdev_dtl_smo = {
    smo_object = 0, 
    smo_objsize = 0, 
    smo_alloc = 0
  }, 
  vdev_dtl_node = {
    tn_next = {0x0, 0x0, 0x0, 0x0}, 
    tn_member = "\000\000\000"
  }, 
  vdev_wholedisk = 18446744073709551615, 
  vdev_offline = 0, 
  vdev_faulted = 0, 
  vdev_degraded = 0, 
  vdev_removed = 0, 
  vdev_resilvering = 0, 
  vdev_nparity = 0, 
  vdev_path = 0x0, 
  vdev_devid = 0x0, 
  vdev_physpath = 0x0, 
  vdev_fru = 0x0, 
  vdev_not_present = 0, 
  vdev_unspare = 0, 
  vdev_last_try = 0, 
  vdev_nowritecache = 0, 
  vdev_checkremove = 0, 
  vdev_forcefault = 0, 
  vdev_splitting = 0, 
  vdev_delayed_close = 0, 
  vdev_tmpoffline = 0 '\0', 
  vdev_detached = 0 '\0', 
  vdev_cant_read = 0 '\0', 
  vdev_cant_write = 0 '\0', 
  vdev_isspare = 0, 
  vdev_isl2cache = 0, 
  vdev_queue = {
    vq_deadline_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f407b0 <vdev_queue_deadline_compare>, 
      avl_offset = 568, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_read_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_write_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_pending_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f408b0 <vdev_queue_offset_compare>, 
      avl_offset = 544, 
      avl_numnodes = 0, 
      avl_size = 944
    }, 
    vq_io_list = {
      list_size = 131088, 
      list_offset = 131072, 
      list_head = {
        list_next = 0xffffff8036664000, 
        list_prev = 0xffffff803678d000
      }
    }, 
    vq_lock = {
      m_owner = 0x0, 
      initialized = -559038737, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 16045690984833335023}
        }}
    }
  }, 
  vdev_cache = {
    vc_offset_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f36da0 <vdev_cache_offset_compare>, 
      avl_offset = 24, 
      avl_numnodes = 0, 
      avl_size = 88
    }, 
    vc_lastused_tree = {
      avl_root = 0x0, 
      avl_compar = 0xffffff7f80f36e20 <vdev_cache_lastused_compare>, 
      avl_offset = 48, 
      avl_numnodes = 0, 
      avl_size = 88
    }, 
    vc_lock = {
      m_owner = 0x0, 
      initialized = -559038737, 
      m_lock = {{
          opaque = {0, 18446744069414584320, 16045690984833335023}
        }}
    }
  }, 
  vdev_aux = 0xdeadbeefdeadbeef, 
  vdev_probe_zio = 0xdeadbeefdeadbeef, 
  vdev_label_aux = 3735928559, 
  vdev_dtl_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }, 
  vdev_stat_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }, 
  vdev_probe_lock = {
    m_owner = 0x0, 
    initialized = -559038737, 
    m_lock = {{
        opaque = {0, 18446744069414584320, 16045690984833335023}
      }}
  }
}

(gdb) p cvd->vdev_top 
$12 = (vdev_t *) 0x11000000000000

(gdb) p pvd->vdev_top
$13 = (vdev_t *) 0x11000000000000

The parent's vdev_top should probably be NULL; instead it is an invalid pointer.

Files appear as Directories in new SA code

We are not setting the vnode's vtype correctly somewhere in the new SA code, or possibly the z_mode or vap_type.

-rwxr-xr-x  1 lundman  staff  490131 Apr 16  2013 configure
# cp configure /BOOM/
# ls -la /BOOM
drwxr-xr-x  30610 root  wheel   512 Apr  8 16:53 configure
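
A minimal sketch of the mapping that has to happen somewhere when the vnode is created (whether IFTOVT is visible to the kext is an assumption; an equivalent switch on the S_IFMT bits would do):

    /* Derive the vnode type from the znode's mode bits so that regular
     * files come out as VREG instead of defaulting to VDIR. */
    enum vtype vtype;

    vtype = IFTOVT(zp->z_mode);     /* S_IFREG -> VREG, S_IFDIR -> VDIR, ... */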

ZVOLs do not show in diskutil or Disk Utility arbitration

Currently, if you create ZVOLs, the kernel/BSD/Unix side of things works great, but the Apple OS X side does not recognise (or possibly is not told about) the volume.

# ./zpool.sh create -f BOOM ~/pool-image.bin 
# ./cmd.sh zfs create -V 50M -o volblocksize=4096 BOOM/vol
# ls -l /dev/*disk*
brw-r-----  1 root      operator    1,   3 Apr  9 01:48 disk0s2
brw-r-----  1 root      operator    1,   2 Apr  9 01:48 disk0s1
brw-r-----  1 root      operator    1,   0 Apr  9 01:48 disk0
crw-r-----  1 root      operator    1,   0 Apr  9 01:48 rdisk0
crw-------  1 root      operator    33,   1 May 27 08:53 rdisk_BOOM_vol
brw-------  1 root      operator    3,   1 May 27 08:53 disk_BOOM_vol
# newfs_msdos  /dev/rdisk_BOOM_vol
/dev/rdisk_BOOM_vol: 12781 sectors in 12781 FAT16 clusters (4096 bytes/cluster)
bps=4096 spc=1 res=1 nft=2 rde=512 sec=12800 mid=0xf0 spf=7 spt=32 hds=16 hid=0 drv=0x00
# mkdir /Volumes/pc
# mount_msdos /dev/disk_BOOM_vol /Volumes/pc
# df -h
/dev/disk_BOOM_vol   102248    560  101688     1%     512   0  100%   /Volumes/pc
# mkdir /Volumes/pc/HELLO
# ls -l /Volumes/pc
drwxrwxrwx  1 _unknown  _unknown  4096 May 27 08:53 HELLO
# diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *21.5 GB    disk0
   1:                        EFI                         209.7 MB   disk0s1
   2:                  Apple_HFS Dev                     21.1 GB    disk0s2
# diskutil list /dev/rdisk_BOOM_vol
Could not find whole disk for disk: /dev/rdisk_BOOM_vol

Naturally, I have attempted to name the device nodes 'disk3' and 'rdisk3' (the next ones along), and it does not make any difference. Possibly we need to tell IOKit about the new device/disk.

I hope that the differing major numbers of the character and block devices are not relevant.

I can successfully newfs the volume as msdos, udf and hfs, although hfs fails to mount. I can also use gpt to write a partition table, although we do not create diskXsY nodes yet.

doo (document app) does not index OpenZFS on OS X volumes/datasets

https://doo.net/en/

Note that doo does not actually support indexing volumes that are not formatted as HFS+:
https://support.doo.net/entries/24319352-How-to-index-documents-on-external-storage-devices

However, it worked just fine with ZEVO, so maybe OpenZFS on OS X can be taught to work with it as well.

Problem: as soon as one tries to add a storage location for doo to index that is a folder on a MacZFS dataset, doo refuses to work with the folder. Console entry:
Error Domain=net.doo.DKDocumentLocationManager Code=6 "The volume you are trying to connect this folder from is currently not supported by doo." UserInfo=0x7f95e1e810e0 {NSLocalizedDescription=The volume you are trying to connect this folder from is currently not supported by doo.}

Discuss: accessing past end of object panic

I did a temporary fix for the panic with commit 698317b

This relies on making sure that we only grow buffers by powers of 2, as the code in dmu_buf_hold_array_by_dnode() splits based on this criterion. This fix will probably be removed in deference to fixing the non-power-of-2 z_blksize (dn->dn_datablksz) path.
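
For illustration, "grow buffers by powers of 2" means the block size handed to the dbuf code is always rounded up to the next power of two, so the split logic in dmu_buf_hold_array_by_dnode() never sees an odd dn_datablksz. A minimal sketch of that rounding (the helper name is illustrative, not the actual commit):

#include <sys/types.h>

/*
 * Round a requested block size up to the next power of two, e.g.
 * next_pow2(100352) == 131072.  Illustrative only.
 */
static uint64_t
next_pow2(uint64_t size)
{
    uint64_t p = 1;

    while (p < size)
        p <<= 1;
    return (p);
}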

But at the same time it is interesting to note that non-power-of-2 buffer sizes do not panic on ZOL. With that in mind, I investigated what actually goes wrong for us.

It trickles down to this call;

#ifdef __APPLE__
            update_pages(vp, tx_bytes, uio, tx);
#else
            update_pages(vp, woff, tx_bytes, zfsvfs->z_os, zp->z_id, uio->uio_segflg, tx);
#endif

In this case it is only called for mmapped writes, which is what I am currently tracking.

https://github.com/zfs-osx/zfs/blob/master/module/zfs/zfs_vnops.c#L363

Inside update_pages() we get the offset by calling uio_offset(uio). It is interesting to note that it differs from woff.

I added the output

            printf("Updatepage call %llu vs %llu\n", woff, uio_offset(uio));

which yields

Jul  3 10:30:44 Lundmans-Mac kernel[0]: Updatepage call 102056 vs 125096

right before the panic. This is, in fact, the reason we go "outside" the memory buffer and induce a panic.

What would happen if we adjust this to match that of ZOL;

            uio_setoffset(uio, woff);
            update_pages(vp, tx_bytes, uio, tx);
Jul  3 10:28:47 Lundmans-Mac kernel[0]: Updatepage call 102056 vs 125096
Jul  3 10:28:47 Lundmans-Mac kernel[0]: update_pages 102056 - 23040 (adjusted 98304 - 28672)
Jul  3 10:28:47 Lundmans-Mac kernel[0]: accessing size=125440 access=102056+344

No more panics, and in fact the output matches what ZOL gives with the same debug prints. Possibly the mmap path has always been incorrect.

But fsx now fails with a new error

Size error: expected 0x1e8a8 stat 0x1c23c seek 0x1c23c

The file size is now incorrect; it would seem we need to update the uio struct as well.

Ie,

#ifdef __APPLE__
            printf("Updatepage call %llu vs %llu\n", woff, uio_offset(uio));
            uio_setoffset(uio, woff);
            update_pages(vp, tx_bytes, uio, tx);
            uio_setoffset(uio, woff+tx_bytes);
            printf("New location %llu\n", uio_offset(uio));
#else
            update_pages(vp, woff, tx_bytes, zfsvfs->z_os, zp->z_id, uio->uio_segflg, tx);
#endif

which results in;

Jul  3 10:30:44 Lundmans-Mac kernel[0]: Updatepage call 102056 vs 125096
Jul  3 10:30:44 Lundmans-Mac kernel[0]: update_pages 102056 - 23040 (adjusted 98304 - 28672)
Jul  3 10:30:44 Lundmans-Mac kernel[0]: accessing size=125440 access=102056+344
Jul  3 10:30:44 Lundmans-Mac kernel[0]: accessing size=125440 access=102056+4096
Jul  3 10:30:44 --- last message repeated 4 times ---
Jul  3 10:30:44 Lundmans-Mac kernel[0]: accessing size=125440 access=102056+2216
Jul  3 10:30:44 Lundmans-Mac kernel[0]: New location 125096

No more incorrect file sizes either. fsx carries on much further until

data miscompare @ 70044
OFFSET     GOOD       BAD        LENGTH     BADOP#   Last: WRITE    TRUNC-   TRUNC+  
0x0001119c 0x0000001c 0x0000000c 0x00000754 12             28       27       -1 

which brings us back to the regular mmap inconsistencies.

I would think it is more correct to use the woff offset than uio_offset(), but I am wondering if the update by tx_bytes should instead be the PAGE_SIZE-adjusted size. Although I suppose fsx would then complain about the wrong size.

It would possibly be cleaner to put the uio_offset adjustments inside update_pages(), especially since we don't know whether it will use the uio_move() path or not (which will also adjust the offset); a sketch follows.
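
A hedged sketch of that idea: fold the offset save/restore into a small wrapper (or into update_pages() itself), so callers never have to touch the uio and the offset ends up past the write regardless of which internal path ran. The wrapper name is illustrative, and the real update_pages() body is elided.

#ifdef __APPLE__
/*
 * Illustrative wrapper around the Apple-style update_pages() call shown
 * above: rewind the uio to the start of the write for the page update,
 * then leave it pointing at the end of the write.
 */
static void
update_pages_at(vnode_t *vp, uint64_t woff, int tx_bytes,
    uio_t uio, dmu_tx_t *tx)
{
    uint64_t end = woff + tx_bytes;

    uio_setoffset(uio, woff);
    update_pages(vp, tx_bytes, uio, tx);
    uio_setoffset(uio, end);
}
#endif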

The update_pages() function does not differ much between ZOL and OS X.

Does anyone have any opinions on this?

SA_LOOKUP called with NULL bulk[i].sa_addr

When importing ZEVO volumes (in particular), if we attempt to chown a file we receive an error (even though technically the chown worked, it is not committed due to the failure, and we end up in an inconsistent state).

Note that setattr receiving an error leaves the dmu_tx confused. Possibly we don't clean up properly after failures; this should also be looked at.

However, that we call SA_LOOKUP with a NULL sa_addr feels incorrect.

zfs_vnops.c:3618

        err = zfs_acl_chown_setattr(zp);
        ASSERT(err == 0);
        if (attrzp) {
            err = zfs_acl_chown_setattr(attrzp);
            ASSERT(err == 0);
        }

        /*
         * When importing ZEVO volumes, and 'chown' is used, we end up calling
         * SA_LOOKUP with 'sa_addr' == NULL. Unsure why this happens, for
         * now, we shall stick a plaster over this open-fracture
         */
        if (err == 2) {
            printf("setattr: triggered SA_LOOKUP == NULL problem\n");
            err = 0;
        }

This has been added as a temporary fix. We ignore the error=2 from SA_LOOKUP/sa_addr==NULL, and chown works as expected, even when using ZEVO pools.

The error is returned from;

sa.c:392

        switch (data_op) {
        case SA_LOOKUP:
            if (bulk[i].sa_addr == NULL)
                return (ENOENT);

Writes will sometimes be shorter than expected.

Possibly the write retry logic was disabled during all the large-IO issue hunting. Currently some writes are shorter than expected;

# dd if=/dev/zero of=/BOOM/big bs=131072 count=512
-rw-r--r--  30820 root  wheel  65929216 Apr  8 16:57 big

The expected size is 67108864 (512 × 131072).

Date/Time stamps are not updated correctly.

Reading pools displays the correct timestamps (atime, ctime, mtime), but the code that updates timestamps has bugs.

# touch -t 199901011234 /BOOM/file
 -rw-r--r--  30820 root  wheel  0 -7283474 16:57 file
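
On disk the times are a pair of uint64s (seconds, then nanoseconds), so the likely culprit is the conversion to and from the OS X timespec / vnode_attr fields; the garbage month above looks like a seconds/nanoseconds mix-up or a truncation in exactly that conversion. A hedged sketch of the round trip (the tree uses the ZFS_TIME_ENCODE / ZFS_TIME_DECODE macros for this; the lowercase helpers are just for illustration):

#include <sys/time.h>
#include <sys/types.h>

/* On-disk layout: stmp[0] = seconds, stmp[1] = nanoseconds. */
static void
zfs_time_encode(const struct timespec *tp, uint64_t stmp[2])
{
    stmp[0] = (uint64_t)tp->tv_sec;
    stmp[1] = (uint64_t)tp->tv_nsec;
}

static void
zfs_time_decode(struct timespec *tp, const uint64_t stmp[2])
{
    tp->tv_sec = (time_t)stmp[0];
    tp->tv_nsec = (long)stmp[1];
}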

fstorture + extend trigger "zfs: accessing past end of object"

 ./fstorture /BOOM/one /BOOM/two 1
Capabilities of /BOOM/one (on /BOOM): softlinks?Y hardlinks?Y ACLs?N
Capabilities of /BOOM/two (on /BOOM): softlinks?Y hardlinks?Y ACLs?N
Test started   [fstorture 2.1-pfh]: Mon Jun 17 09:44:03 JST 2013
panic(cpu 1 caller 0xffffff7f82bde579): "zfs: accessing past end of object 15/1d (size=28160 access=24136+21111)"@spl-err.c:48
Backtrace (CPU 1), Frame : Return Address
0xffffff80471cb4c0 : 0xffffff800141d626 mach_kernel : _panic + 0xc6
0xffffff80471cb530 : 0xffffff7f82bde579 net.lundman.spl : _vcmn_err + 0x59
0xffffff80471cb760 : 0xffffff7f82c8f4fd net.lundman.zfs : _zfs_panic_recover + 0x18d
0xffffff80471cb910 : 0xffffff7f82c01e23 net.lundman.zfs : _dmu_buf_hold_array_by_dnode + 0x173
0xffffff80471cb9d0 : 0xffffff7f82c03539 net.lundman.zfs : _dmu_write_uio_dnode + 0x79
0xffffff80471cba60 : 0xffffff7f82c0349c net.lundman.zfs : _dmu_write_uio_dbuf + 0x6c
0xffffff80471cbab0 : 0xffffff7f82cae4ce net.lundman.zfs : _zfs_write + 0xbae
0xffffff80471cbd30 : 0xffffff7f82cba368 net.lundman.zfs : _zfs_vnop_write + 0x58
0xffffff80471cbd70 : 0xffffff8001511c32 mach_kernel : _VNOP_WRITE + 0x52
0xffffff80471cbdd0 : 0xffffff8001507e99 mach_kernel : _utf8_normalizestr + 0x6e9
0xffffff80471cbe40 : 0xffffff8001776f69 mach_kernel : _write_nocancel + 0x1b9
0xffffff80471cbef0 : 0xffffff8001776e77 mach_kernel : _write_nocancel + 0xc7
0xffffff80471cbf50 : 0xffffff80017e16aa mach_kernel : _unix_syscall64 + 0x20a

The cleanest run that produces said panic appears to be:

vnop_create: '00-000-08Td.rtf'
getvnode zp 0xffffff800bb98200 with vpp 0xffffff80471cb3c0 zfsvfs 0xffffff8036f26000 vfs 0xffffff800738b3d0
Assigned zp 0xffffff800bb98200 with vp 0xffffff800baf93e0
+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
+setattr: zp 0xffffff800bb98200, vp 0xffffff800baf93e0
zfs_extend: 27812
-setattr: zp 0xffffff800bb98200 size 27812
vnop_setattr: called on vp 0xffffff800baf93e0 with mask 0010, err=0

  #define VNODE_ATTR_va_data_size         (1LL<< 4)       /* 00000010 */

getxattr vp 0xffffff800baf93e0 : ENOTSUP
+vnop_lookup '._00-000-08Td.rtf'
+setattr: zp 0xffffff800bb98200, vp 0xffffff800baf93e0
-setattr: zp 0xffffff800bb98200 size 27812
vnop_setattr: called on vp 0xffffff800baf93e0 with mask 0200, err=0

   #define VNODE_ATTR_va_mode              (1LL<< 9)       /* 00000200 */

+vnop_getattr zp 0xffffff800bb98200 vp 0xffffff800baf93e0
zfs_write: resid/n 21111

This appears to be: create the file, extend it to 27812 bytes, then write 21111 bytes
(size=28160 access=24136+21111). Note that 28160 is 27812 rounded up to a 512-byte boundary, and 24136 + 21111 = 45247 runs well past it. It is unclear who/what seeks to 24136.

Mavericks loading kext - certificate warnings

osx.zfs appears to work well with 10.9, or at least as well as with 10.8. However, one change is a kextd warning in the logs;

Requesting load of /tmp/spl.kext.
Jun 13 13:38:44  com.apple.kextd[12]: kext net.lundman.spl  
   100009000 is in exception list, allowing to load 
/tmp/spl.kext loaded successfully (or already loaded).

Requesting load of /tmp/zfs.kext.
Jun 13 13:38:45  com.apple.kextd[12]: WARNING - Invalid signature
 -67062 0xFFFFFFFFFFFEFA0A for kext <OSKext 0x7f88b2d57590 [0x7fff74cc7a10]>
  { URL = "file:///tmp/zfs.kext/", ID = "net.lundman.zfs" }
Jun 13 13:38:45  kernel[0]: ZFS: Loading module ...


Investigate what needs to be done for officially signed kexts. Why spl is OK while zfs is not is also curious.

wrapper: funky function call problem on import

The wrapper branch is working rather well, but import is currently broken. We get this strange call happening;

 error = VOP_GETATTR(vf->vf_vnode, &vattr, 0, kcred, NULL);
 (gdb) p vf->vf_vnode
 $5 = (struct vnode *) 0xffffff800994cd90
 (gdb) down
 #3  0xffffff7f80e8a231 in VOP_GETATTR (vp=0x1, vap=0xed2, flags=46,
          x3=0x30, x4=0x7) at spl-vnode.c:199

Note that in this case I placed a call to panic() at the very start of VOP_GETATTR, so it is not that the stack was corrupted after running something. Replacing the VOP_GETATTR call with a direct call to vnode_getattr() produces the same problem.

We are currently some 20 frames into the stack.

zio_wait() hangs in multiple cases.

The current state of the OS X port of ZoL is that it generally works, but it has a nasty habit of hanging in many cases.

The easiest hang to trigger is importing a pool from vdev_file images.

For example:

 # ./zpool.sh import -d images/

During the spa_tryimport() phase, we call zap_lookup($MOS), which issues IO to read the MOS, and then calls zio_wait() for it to complete. This call never returns.

If zio_wait() is forced to return, nvlist parsing fails, which implies the data is not read correctly. This might need to be verified.

With extra printing, import trial looks like;

# ./zpool.sh import -d ~/image/
Apr  8 12:16:33 jind0806 kernel[0]: [zfs] got ioctl 4
Apr  8 12:16:33 jind0806 kernel[0]: [zfs] ioctl done 0
Apr  8 12:16:33 jind0806 kernel[0]: [zfs] got ioctl 5
Apr  8 12:16:33 jind0806 kernel[0]: [zfs] ioctl done 2
Apr  8 12:16:33 jind0806 kernel[0]: [zfs] got ioctl 6
Apr  8 12:16:33 jind0806 kernel[0]: +spa_tryimport
Apr  8 12:16:33 jind0806 kernel[0]: +spa_load
Apr  8 12:16:33 jind0806 kernel[0]: spa_load: guid_exists?
Apr  8 12:16:33 jind0806 kernel[0]: +spa_load_impl
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 1
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 2
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 3
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 4
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file: reopen
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 8192
Apr  8 12:16:33 --- last message repeated 2 times ---
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 5
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 6
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 1024
Apr  8 12:16:33 --- last message repeated 3 times ---
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 35840
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 4096
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 18432
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 1024
Apr  8 12:16:33 --- last message repeated 3 times ---
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 64512
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 54272
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 76800
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 131072
Apr  8 12:16:33 --- last message repeated 1 time ---
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff80098a9550
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 114688
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8008cffaa0
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 7
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 8
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 9
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 10
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 512
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff80098a9550
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 1
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 2560
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff80098a9550
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: vdev_file io err 0, size 512
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff80098a9550
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 2
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 5
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 6
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 11
Apr  8 12:16:33 jind0806 kernel[0]: spa_load_impl 12
Apr  8 12:16:33 jind0806 kernel[0]: +dsl_pool_open
Apr  8 12:16:33 jind0806 kernel[0]: dsl_pool_open 1
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 1
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 2
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 5
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 6
Apr  8 12:16:33 jind0806 kernel[0]: dsl_pool_open 2
Apr  8 12:16:33 jind0806 kernel[0]: +dsl_dir_open_obj
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8009991000
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 1
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 2
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 3
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 5
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 6
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0
Apr  8 12:16:33 jind0806 kernel[0]: zio done, cleared 0xffffff8009991000
Apr  8 12:16:33 jind0806 kernel[0]: -zio_wait
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 7
Apr  8 12:16:33 jind0806 kernel[0]: dsl_dir_open_obj 8
Apr  8 12:16:33 jind0806 kernel[0]: -dsl_dir_open_obj
Apr  8 12:16:33 jind0806 kernel[0]: dsl_pool_open 3
Apr  8 12:16:33 jind0806 kernel[0]: +dsl_pool_open_special_dir: '$MOS'
Apr  8 12:16:33 jind0806 kernel[0]: dp 0xffffff80080dac00
Apr  8 12:16:33 jind0806 kernel[0]: dp->meta 0xffffff800acd5800
Apr  8 12:16:33 jind0806 kernel[0]: dp->root 0xffffff80096a4e00
Apr  8 12:16:33 jind0806 kernel[0]: dp->root->phys 0xffffff80096a4c00
Apr  8 12:16:33 jind0806 kernel[0]: dp->root->phys->zap 0x4
Apr  8 12:16:33 jind0806 kernel[0]: zap_lookup 1
Apr  8 12:16:33 jind0806 kernel[0]: +zio_wait: 0

Note that quite a few ZIOs happen and complete correctly, but at the end we end up stuck looking up $MOS.

It is also interesting to note that (at least one) zio keeps being re-executed at fairly high frequency. This is presumably the issue.

symlinks are non-working

# cd /BOOM/
# mkdir Hello
# ln -s Hello World
# ls -l 
total 7
drwx------  65535 root  wheel  3 Apr  8 17:04 .fseventsd
drwxr-xr-x  65535 root  wheel  2 Apr  8 17:31 Hello

ls: ./World: Unknown error: -128
lrwxr-xr-x  32966 root  wheel  5 Apr  8 17:33 World

userland: avl is broken (was: libzfs export fails to umount all filesystems)

During zpool export, userland libzfs iterates over the mounted filesystems to issue zfs_umount. However, the mnttab logic comes up empty.

libzfs_dataset.c:

libzfs_mnttab_add(libzfs_handle_t *hdl, const char *special,
    const char *mountp, const char *mntopts)
{   
    mnttab_node_t *mtn;

    printf("mnttab ADDING '%s'\n", mountp);
# ./zpool.sh create BOOM pool-image.bin
mnttab ADDING 'BOOM'

on export;

libzfs_mount.c:

    printf("disable datasets\n");

    namelen = strlen(zhp->zpool_name);

    used = alloc = 0;
    for (mntn = libzfs_mnttab_first(hdl); mntn != NULL;
         mntn = libzfs_mnttab_next(hdl, mntn)) {
        struct mnttab *mt = &mntn->mtn_mt;

        printf("disable_datasets: '%s'\n",mt->mnt_mountp);
#./zpool.sh export BOOM
disable datasets

Alas, even though we add BOOM to the mnttab, the call to libzfs_mnttab_first() returns NULL; see the sketch below.
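
Since the issue title points at avl, the suspicion is that the userland AVL tree backing the mnttab cache either never retains the node or cannot walk it. A hedged sketch of what the add/iterate pair is expected to boil down to, assuming the cache is an avl_tree_t of mnttab_node_t hanging off the libzfs handle (these bodies are illustrative, not the port's exact code):

#include <sys/avl.h>

/*
 * Illustrative only: if avl_add()/avl_first()/AVL_NEXT() misbehave in
 * the userland libavl build, libzfs_mnttab_first() returns NULL even
 * though libzfs_mnttab_add() was called.
 */
static void
mnttab_cache_add(avl_tree_t *cache, mnttab_node_t *mtn)
{
    avl_add(cache, mtn);
}

static mnttab_node_t *
mnttab_cache_first(avl_tree_t *cache)
{
    return (avl_first(cache));
}

static mnttab_node_t *
mnttab_cache_next(avl_tree_t *cache, mnttab_node_t *mtn)
{
    return (AVL_NEXT(cache, mtn));
}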

Deep mkdir fails

What I have called "the rsync" problem appears to be related to directory lookup.

I am unsure whether these are two separate problems.

One output is

# rsync -arv include/ /BOOM/
rsync: mkstemp "/BOOM/sys/fm/fs/.Makefile.XxV1RO" failed: No such file or directory (2)

Interestingly, rsync works fine with "--inplace".

This seems to stem from this issue:

# stat /BOOM/sys/fm
stat: /BOOM/sys/fm: stat: No such file or directory

# cd /BOOM/sys/
# stat fs
771751939 14 drwxr-xr-x 2 lundman staff 0 6 "Apr  8 17:15:45 2013" "

A full-path stat fails, but if we cd there and use a relative path, it is found.

The second issue is easy to trigger;

# mkdir -p /BOOM/testA/testB/testC
mkdir: /BOOM/testA/testB/testC/: No such file or directory

bash-3.2# May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA/testB/testC/'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA/testB/testC'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA/testB'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup -2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: vnop_mkdir 'testA'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +acl_ids_create
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: getvnode zp 0xffffff800961ca00 with vpp 0xffffff8040ed34e0 zfsvfs 0xffffff80362b7000 vfs 0xffffff8006fe89e8
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: Assigned zp 0xffffff800961ca00 with vp 0xffffff800d3af648
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: mkdir checking cache vp 0
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +zget 133
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 0
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testB'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup -2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: vnop_mkdir 'testB'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +acl_ids_create
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: getvnode zp 0xffffff8009625a00 with vpp 0xffffff8040ed34e0 zfsvfs 0xffffff80362b7000 vfs 0xffffff8006fe89e8
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: Assigned zp 0xffffff8009625a00 with vp 0xffffff800d3af550
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: mkdir checking cache vp 0
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testB/testC'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 2
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +zget 4
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testA'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +zget 133
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 0
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +zget 4
May 24 15:24:30 --- last message repeated 4 times ---
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: vfs_getattr
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +vnop_lookup 'testB'
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: +zget 134
May 24 15:24:30 lundmans-Mac-Pro kernel[0]: -vnop_lookup 0

It creates testA and testA/testB just fine, but then fails to create testA/testB/testC.

Add xattr support

Currently setxattr and getxattr return ENOTSUP. Pull in the needed files from FreeBSD/Solaris/MacZFS to add support; a stub sketch follows.
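
For reference, the OS X entry points are ordinary vnops; a hedged sketch of the getxattr stub as things stand (returning ENOTSUP), with the intended approach noted in a comment. The vnop argument structure is the standard XNU one; the function name just follows the port's zfs_vnop_* convention and is not the actual code.

#include <sys/vnode.h>
#include <sys/errno.h>

/*
 * Current state: no xattr support.  The eventual implementation would
 * open the file's hidden extended-attribute directory (one file per
 * attribute, as on Solaris/FreeBSD/MacZFS) and copy the named
 * attribute's contents out through ap->a_uio.
 */
static int
zfs_vnop_getxattr(struct vnop_getxattr_args *ap)
{
    /* ap->a_vp, ap->a_name, ap->a_uio and ap->a_size come from the VFS. */
    return (ENOTSUP);
}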

Missing snapshot/.zfs mounts

The entire ZFSCTL section of the sources is commented out. The work required is almost a complete copy of zfs_vnops.c / zfs_vnops_osx.c, except read-only (so only half the vnops are required).

Fairly straightforward, if mind-numbing, porting work.

Reclaim design: mutex_enter: locking against myself

With heavy testing today, we do have one issue to address.

We have the situation like this:

_zfs_mknode
  ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
  _zfs_znode_alloc
    _zfs_znode_getvnode
      _vnode_create
        _zfs_vnop_reclaim
          _zfs_zinactive
            ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);

So basically we hold zfsvfs, then call vnode_create(), which decides now is the time for a reclaim, so we end up in zfs_zinactive() trying to hold zfsvfs again.

The FreeBSD approach is to call getnewvnode_reserve(1); before it holds zfsvfs. This means the vnode is already allocated (and any reclaim has already happened) before we hold zfsvfs. This will not work on OS X, as you need to have the vtype when calling vnode_create and there is no way to change it afterward.

The ZEVO approach is a reclaim thread: i.e., release the vnode immediately, but keep the zp around until a better time to release it (once zfsvfs is no longer held).

I tried to simply call vnode_create and vnode_recycle in our getnewvnode_reserve but it appears not to work. I think the recycled node isn't reclaimed immediately, so vnode_create calls reclaim anyway.

The ZEVO method would work, but I feel it is a little over-engineered. We only call vnode_create (or rather zfs_znode_alloc) in one place. Any reclaims that happen inside that section could be placed on a list, and zfs_zinactive could then be called for all those zp nodes after we release zfsvfs.

There is maybe a third option; the comment in zinactive seems to imply:

    /*                                                                          
     * Don't allow a zfs_zget() while were trying to release this znode         
     */
    ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);

and if that is all it needs to protect against, we can simply check ZFS_OBJ_HELD(zfsvfs, z_id) to see whether we are already holding it (taken in zfs_znode_alloc) and skip the zfsvfs mutex calls, since it is already held; see the sketch below.
This is currently being tested by my continuous bonnie runs.
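
A minimal sketch of that third option, assuming ZFS_OBJ_HELD() reports whether the current thread already owns the object-hash mutex taken in zfs_znode_alloc() (everything else in zfs_zinactive() is elided):

void
zfs_zinactive(znode_t *zp)
{
    zfsvfs_t *zfsvfs = zp->z_zfsvfs;
    uint64_t z_id = zp->z_id;
    /* Only take the hold if the vnode_create() caller does not already own it. */
    boolean_t need_hold = !ZFS_OBJ_HELD(zfsvfs, z_id);

    if (need_hold)
        ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);

    /* ... existing zinactive body: drop the SA handle, free the znode ... */

    if (need_hold)
        ZFS_OBJ_HOLD_EXIT(zfsvfs, z_id);
}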

Any comments?

recordsize-irregular write()s causes panic

Default recordsize of 131072
dd using bs=10000
panic:

#6  0xffffff7f80eacaa8 in dmu_buf_hold_array_by_dnode (dn=0xffffff8007ebac00, offset=100000, length=31072, read=0, tag=0xffffff7f80fa1f68, numbufsp=0xffffff80469dba7c, dbpp=0xffffff80469dba80, flags=0) at dmu.c:395
395                             zfs_panic_recover("zfs: accessing past end of object 15/9 (size=100352, access 100000+31072) 

Which appears to be from

        if (offset + length > dn->dn_datablksz) {
            zfs_panic_recover("zfs: accessing past end of object "
                "%llx/%llx (size=%u access=%llu+%llu)",

because for some reason dn->dn_datablksz is 100352 when one would expect 131072. (100352 is 100000 rounded up to a 512-byte boundary, and the attempted access 100000 + 31072 reaches exactly 131072, past the end of that block.)

Linking of kext fails on Snow Leopard / MacPorts 2.1.3 / gcc 4.2.1 with superfluous linking of libz from MacPorts

Link fails with this error:

/bin/sh ../../libtool --tag=CC --silent --mode=link gcc -Wall -Wstrict-prototypes -fno-strict-aliasing -D_GNU_SOURCE -D__EXTENSIONS__ -D_REENTRANT -D_POSIX_PTHREAD_SEMANTICS -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -DTEXT_DOMAIN="zfs-user" -g -Xlinker -kext -nostdlib -lkmodc++ -lkmod -lcc_kext -o zfs zfs-arc.o zfs-bplist.o zfs-bpobj.o zfs-bptree.o zfs-dbuf.o zfs-ddt.o zfs-ddt_zap.o zfs-dmu.o zfs-dmu_diff.o zfs-dmu_object.o zfs-dmu_objset.o zfs-dmu_send.o zfs-dmu_traverse.o zfs-dmu_tx.o zfs-dmu_zfetch.o zfs-dnode.o zfs-dnode_sync.o zfs-dsl_dataset.o zfs-dsl_deadlist.o zfs-dsl_deleg.o zfs-dsl_dir.o zfs-dsl_pool.o zfs-dsl_prop.o zfs-dsl_scan.o zfs-dsl_synctask.o zfs-fm.o zfs-gzip.o zfs-lzjb.o zfs-metaslab.o zfs-refcount.o zfs-rrwlock.o zfs-sa.o zfs-sha256.o zfs-spa.o zfs-spa_boot.o zfs-spa_config.o zfs-spa_errlog.o zfs-spa_history.o zfs-spa_misc.o zfs-space_map.o zfs-txg.o zfs-uberblock.o zfs-unique.o zfs-vdev.o zfs-vdev_cache.o zfs-vdev_disk.o zfs-vdev_file.o zfs-vdev_label.o zfs-vdev_mirror.o zfs-vdev_missing.o zfs-vdev_queue.o zfs-vdev_raidz.o zfs-vdev_root.o zfs-zap.o zfs-zap_leaf.o zfs-zap_micro.o zfs-zfeature.o zfs-zfeature_common.o zfs-zfs_acl.o zfs-zfs_byteswap.o zfs-zfs_debug.o zfs-zfs_dir.o zfs-zfs_fm.o zfs-zfs_fuid.o zfs-zfs_ioctl.o zfs-zfs_log.o zfs-zfs_onexit.o zfs-zfs_osx.o zfs-zfs_rlock.o zfs-zfs_vfsops.o zfs-zfs_vnops.o zfs-zfs_znode.o zfs-zil.o zfs-zio.o zfs-zio_checksum.o zfs-zio_compress.o zfs-zio_inject.o zfs-zle.o zfs-zrlock.o zfs-zvol.o zfs-avl.o zfs-fnvpair.o zfs-nvpair.o zfs-nvpair_alloc_fixed.o zfs-nvpair_alloc_spl.o zfs-u8_textprep.o zfs-uconv.o zfs-zfs_comutil.o zfs-zfs_deleg.o zfs-zfs_fletcher.o zfs-zfs_namecheck.o zfs-zfs_prop.o zfs-zpool_prop.o zfs-zprop_common.o -lz -lz -lz
ld: warning: unexpected dylib (/usr/lib/libz.dylib) on link line
ld: can't find ordinal for imported symbol _deflate from /usr/lib/libz.dylib
collect2: ld returned 1 exit status

As seen, it added "-lz" three times. Executing the line manually without any reference to "-lz", the link succeeds. No idea why or where it pulls in libz.

Foreign pool import is peculiar

Creating a pool on Solaris (version=28, VERSION=5), we cannot quite import it;

bash-3.2# ./zpool.sh import -d ~/image/
   pool: FROMSOLARIS
     id: 17449499049191331784
  state: ONLINE
 status: The pool is formatted using a legacy on-disk version.
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        FROMSOLARIS                            ONLINE
          /Users/lundman/image/pool-image.bin  ONLINE
bash-3.2# ./zpool.sh import -d ~/image/ FROMSOLARIS
zfs_mount: unused options: "defaults,atime,dev,exec,rw,suid,xattr,nomand"
cannot mount 'FROMSOLARIS': Unknown error: -1

bash-3.2# ./cmd.sh zfs mount FROMSOLARIS

zfs_mount: unused options: "defaults,atime,dev,exec,rw,suid,xattr,nomand"

cannot mount 'FROMSOLARIS': Unknown error: -1
Apr  8 16:52:17 jind0806 kernel[0]: zfs_vfs_mount: error 22

What is quite interesting: if you do the opposite, creating a pool on OS X and mounting it on Solaris, the same damned thing happens;

# zpool import -d /export/home/lundman FROMOSX
/FROMOSX mount; invalid argument
FROMOSX pool imported but some filesystems failed to mount.

Interestingly, both platforms can use "zfs create" to make a NEW filesystem, which DOES mount, but only on the native platform.

Occasional unmount failures

ZFS can at times fail to unmount the filesystem, usually after a little usage.

It gets stuck in this part of dnode.c

dnode_special_close(dnode_handle_t *dnh)
{
    /*
     * Wait for final references to the dnode to clear.  This can
     * only happen if the arc is asyncronously evicting state that
     * has a hold on this dnode while we are trying to evict this
     * dnode.
     */
    while (refcount_count(&dn->dn_holds) > 0) {
        delay(hz);
        if (count++ > 9) {
            printf("dnode: ARC release bug triggered: %p (%lld)-- sorry\n", dn,
                   refcount_count(&dn->dn_holds));

For some reason the refcount is always 9.

ZFS pool on Core Storage, created with a 20130712 build of zfs-osx, can not be used by ZEVO

With a 20130712 build of zfs-osx

Preparation

Destruction of a previous test pool

GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zfs unmount maczfsprototype
unmountall
zfs_unmount
located '/Users/maczfsprototype'
Unmount successful for /Users/maczfsprototype
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool destroy maczfsprototype
GPES3E-gjp4-1:bin bbsadmin-l$ clear

Creation of a version 28 pool

GPES3E-gjp4-1:bin bbsadmin-l$ diskutil list
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *750.2 GB   disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:                  Apple_HFS swap                    32.0 GB    disk0s2
   3:                  Apple_HFS disk0s3                 536.9 MB   disk0s3
   4:                  Apple_HFS spare                   671.1 MB   disk0s4
   5:          Apple_CoreStorage                         99.5 GB    disk0s5
   6:                 Apple_Boot Boot OS X               650.0 MB   disk0s6
   7:          Apple_CoreStorage                         616.3 GB   disk0s7
   8:                 Apple_Boot Boot OS X               134.2 MB   disk0s8
/dev/disk1
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                  Apple_HFS OS                     *99.2 GB    disk1
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *100.0 GB   disk3
   1:                        EFI EFI                     209.7 MB   disk3s1
   2:                  Apple_HFS OS 100                  99.2 GB    disk3s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk3s3
/dev/disk4
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *8.0 GB     disk4
   1:                        EFI EFI                     209.7 MB   disk4s1
   2:                        ZFS                         7.7 GB     disk4s2
/dev/disk5
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *7.7 GB     disk5
   1:                        EFI EFI                     209.7 MB   disk5s1
   2:          Apple_CoreStorage                         7.4 GB     disk5s2
   3:                 Apple_Boot Boot OS X               134.2 MB   disk5s3
/dev/disk6
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *640.1 GB   disk6
   1:                        EFI EFI                     209.7 MB   disk6s1
   2:                        ZFS                         639.8 GB   disk6s2
/dev/disk8
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *7.1 GB     disk8
   1:                        ZFS                         7.1 GB     disk8s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk8s9
/dev/disk9
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     Apple_partition_scheme                        *37.8 MB    disk9
   1:        Apple_partition_map                         32.3 KB    disk9s1
   2:                  Apple_HFS osx.zfs-20130712        37.8 MB    disk9s2
GPES3E-gjp4-1:bin bbsadmin-l$ diskutil unmountDisk /dev/disk8
Unmount of all volumes on disk8 was successful
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool create -o version=28 -O casesensitivity=insensitive -O normalization=formD -O compression=on -O snapdir=visible -O mountpoint=/Users/maczfsprototype maczfsprototype /dev/disk8
efi_write mate3
efi_write mate
checking path '/dev/disk8'
zfs_mount: unused options: "defaults,atime,dev,exec,rw,suid,xattr,nomand"
GPES3E-gjp4-1:bin bbsadmin-l$ 

A later review of properties, and an export

GPES3E-gjp4-1:~ bbsadmin-l$ sudo /Volumes/osx.zfs-20130712/load_zfs.sh 
Password:
ZFS loaded... Please add the bin to PATH, ie;
export PATH="$PATH:/Volumes/osx.zfs-20130712/64/bin"
GPES3E-gjp4-1:~ bbsadmin-l$ cd /Volumes/osx.zfs-20130712/64/bin 
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool import
   pool: maczfsprototype
     id: 4802247225081302216
  state: ONLINE
 status: The pool is formatted using a legacy on-disk version.
 action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
 config:

    maczfsprototype  ONLINE
      disk3s1   ONLINE
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool import maczfsprototype
zfs_mount: unused options: "defaults,atime,dev,exec,rw,suid,xattr,nomand"
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool get all maczfsprototype
NAME             PROPERTY               VALUE                  SOURCE
maczfsprototype  size                   6.56G                  -
maczfsprototype  capacity               0%                     -
maczfsprototype  altroot                -                      default
maczfsprototype  health                 ONLINE                 -
maczfsprototype  guid                   4802247225081302216    local
maczfsprototype  version                28                     local
maczfsprototype  bootfs                 -                      default
maczfsprototype  delegation             on                     default
maczfsprototype  autoreplace            off                    default
maczfsprototype  cachefile              -                      default
maczfsprototype  failmode               wait                   default
maczfsprototype  listsnapshots          off                    default
maczfsprototype  autoexpand             off                    default
maczfsprototype  dedupditto             0                      default
maczfsprototype  dedupratio             1.00x                  -
maczfsprototype  free                   6.56G                  -
maczfsprototype  allocated              280K                   -
maczfsprototype  readonly               off                    -
maczfsprototype  ashift                 0                      default
maczfsprototype  comment                -                      default
maczfsprototype  expandsize             16.0E                  -
maczfsprototype  freeing                0                      local
maczfsprototype  feature@async_destroy  disabled               local
maczfsprototype  feature@empty_bpobj    disabled               local
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zfs get all maczfsprototype
NAME             PROPERTY              VALUE                   SOURCE
maczfsprototype  type                  filesystem              -
maczfsprototype  creation              Tue Jul 16  0:01 2013   -
maczfsprototype  used                  184K                    -
maczfsprototype  available             6.46G                   -
maczfsprototype  referenced            113K                    -
maczfsprototype  compressratio         7.39x                   -
maczfsprototype  quota                 none                    default
maczfsprototype  reservation           none                    default
maczfsprototype  recordsize            128K                    default
maczfsprototype  mountpoint            /Users/maczfsprototype  local
maczfsprototype  sharenfs              off                     default
maczfsprototype  checksum              on                      default
maczfsprototype  compression           on                      local
maczfsprototype  zoned                 off                     default
maczfsprototype  snapdir               visible                 local
maczfsprototype  aclinherit            restricted              default
maczfsprototype  canmount              on                      default
maczfsprototype  copies                1                       default
maczfsprototype  version               5                       -
maczfsprototype  utf8only              on                      -
maczfsprototype  normalization         formD                   -
maczfsprototype  casesensitivity       insensitive             -
maczfsprototype  vscan                 off                     default
maczfsprototype  sharesmb              off                     default
maczfsprototype  refquota              none                    default
maczfsprototype  refreservation        none                    default
maczfsprototype  primarycache          all                     default
maczfsprototype  secondarycache        all                     default
maczfsprototype  usedbysnapshots       0                       -
maczfsprototype  usedbydataset         113K                    -
maczfsprototype  usedbychildren        70.5K                   -
maczfsprototype  usedbyrefreservation  0                       -
maczfsprototype  logbias               latency                 default
maczfsprototype  dedup                 off                     default
maczfsprototype  mlslabel              none                    default
maczfsprototype  sync                  standard                default
maczfsprototype  refcompressratio      9.23x                   -
maczfsprototype  written               113K                    -
GPES3E-gjp4-1:bin bbsadmin-l$ sudo ./zpool export maczfsprototype
Unmount successful for /Users/maczfsprototype
GPES3E-gjp4-1:bin bbsadmin-l$ exit
logout

[Process completed]

With ZEVO Community Edition 1.1.1 on Mountain Lion

Unlocking the Core Storage logical volume:

gpes3e-gjp4:~ gjp22$ diskutil cs unlockVolume D0699260-3F5A-47A6-BFA4-B1CE79ADB4ED
Passphrase:
Started CoreStorage operation
Logical Volume successfully unlocked
Logical Volume successfully attached as disk6
Error: -69842: Couldn't mount disk
gpes3e-gjp4:~ gjp22$ diskutil list /dev/disk6
/dev/disk6
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *7.1 GB     disk6
   1:                        ZFS                         7.1 GB     disk6s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk6s9

The issue

gpes3e-gjp4:~ gjp22$ sudo zpool import
Password:
  pool: maczfsprototype
    id: 4802247225081302216
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
config:

    maczfsprototype                              UNAVAIL  insufficient replicas
      GPTE_AB543C80-915E-A44D-92BE-5CEFB3C45457  UNAVAIL  corrupted data
gpes3e-gjp4:~ gjp22$ 

Comparison

For a Core Storage logical volume with which there's no difficulty

gpes3e-gjp4:~ gjp22$ diskutil list /dev/disk3
/dev/disk3
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *616.0 GB   disk3
   1:                        EFI                         209.7 MB   disk3s1
   2:                        ZFS                         615.7 GB   disk3s2
gpes3e-gjp4:~ gjp22$ 

libzfs_mount uses plain mount(2) call

At the moment we cannot pass any mount arguments along. BSD has nmount(2), and Linux has an options argument to mount(2); OS X appears to have neither. Possibly we can change to use __macmount()? One alternative is sketched below.
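
One hedged possibility: OS X's mount(2) does accept a filesystem-specific void *data pointer, so the userland side could marshal the dataset name and option string into a small private structure and zfs_vfs_mount() could parse it from there. The structure and function names below are purely illustrative, not the port's code.

#include <sys/mount.h>
#include <string.h>

/* Illustrative private mount-argument block shared with zfs_vfs_mount(). */
struct zfs_mnt_args {
    const char *dataset;    /* e.g. "BOOM/fs" */
    const char *options;    /* comma-separated option string */
    int         flags;      /* MNT_RDONLY and friends */
};

static int
zmount_sketch(const char *dataset, const char *mountpoint,
    int mntflags, const char *options)
{
    struct zfs_mnt_args zma;

    memset(&zma, 0, sizeof (zma));
    zma.dataset = dataset;
    zma.options = options;
    zma.flags = mntflags;

    /* "zfs" must match the vfs name the kext registers. */
    return (mount("zfs", mountpoint, mntflags, &zma));
}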
