Tuesday, October 2, 2012

ZFS file system on Raspberry Pi

FISH


I do a good bit of hardware integration: with the web, with manufacturing equipment, with embedded systems, and with systems that handle big data sets or that can sustain multiple failures. Not necessarily all at once, but typically, people expect FISH from me :)

FISH is Fully Integrated Software and Hardware (as a side note, the internal project at Sun to create appliances based on ZFS was known as FISHWorks). The Raspberry Pi is a cool piece of hardware, but I typically need stuff that is only (or mostly) found on Solaris and derived OSes, such as ZFS. I've been using ZFS for many years now, since the first public release on Solaris Nevada. ZFS scales and gives you data integrity. And it can run on the largest systems known to man.

It scales


For example, I'm listening right now to ZFS Day's live video stream and hearing a talk about ZFS on the Sequoia supercomputer, which is the fastest supercomputer out there. They are using it as a native port, not using FUSE.

What is ZFS? 


Wikipedia: "ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity verification against data corruption modes, support for high storage capacities, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs. ZFS is implemented as open-source software, licensed under the Common Development and Distribution License (CDDL)."

From Supercomputers to $35 computers


So, ZFS obviously scales at the highest level. Well, it also scales down: I've been using ZFS a bit on the Raspberry Pi through FUSE, until I can get a Solaris-derived OS (such as illumos, SmartOS, OpenIndiana, OpenSolaris, etc.) on the Raspberry Pi. That way, at least I have ZFS. I'm still missing zones, SMF and DTrace, but it is a start.

Now just a reminder: the Pi only has 256MB of RAM in total, and a Broadcom ARM processor. So first things first, we need to give as much RAM to the OS as possible and reduce the video buffer size.



I'm using a 240MB split on that Raspberry Pi since it is running only in text mode at the console, and I remote to it using ssh -X.


If you use the composite out you might want to use the 224MB split, and definitely 192MB or 128MB if you use HDMI, but at that point you are choking ZFS. With the 128MB split, that's 128MB for the OS, ZFS and whatever apps you are running...
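
As a quick sanity check on the splits mentioned above, here is a small shell sketch of the arithmetic. It assumes the 256MB board described in this post; `TOTAL_MB` and `gpu_mb` are names I made up for illustration and are not part of any Raspberry Pi tool.

```shell
#!/bin/sh
# Total RAM on this Pi, in MB (assumption: the 256MB model).
TOTAL_MB=256

# gpu_mb ARM_MB: print how many MB are left to the GPU for a given ARM split.
gpu_mb() {
    echo $((TOTAL_MB - $1))
}

# Walk through the splits discussed in the text.
for split in 240 224 192 128; do
    echo "ARM ${split}MB -> GPU $(gpu_mb "$split")MB"
done
```

With the 240MB split only 16MB is left for video, which is fine for a text console but not for much else.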

Fully loaded


Although Raspbian comes with a good amount of stuff preloaded, it was not intended to be used with FUSE out of the box, and ZFS was probably never on anybody's radar. So let's start by adding the FUSE stuff and the libraries and tools we will need to build ZFS. This is the shortlist:


fdion@raspberrypi ~/zfs $ sudo apt-get install fuse-utils libfuse-dev libfuse2
fdion@raspberrypi ~/zfs $ sudo apt-get install libaio-dev libattr1-dev attr
fdion@raspberrypi ~/zfs $ sudo apt-get install git scons
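
Before moving on to the build, it can be worth confirming that the command-line tools actually landed on the PATH. This is only a sketch: `missing_tools` is a helper name I invented, and it checks commands only, not the `-dev` libraries installed above.

```shell
#!/bin/sh
# missing_tools TOOL...: print each tool that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The two commands the build steps rely on.
missing=$(missing_tools git scons)
if [ -n "$missing" ]; then
    echo "still missing:" $missing
else
    echo "git and scons are both available"
fi
```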



If you build it...


So we have the prerequisites. Let's get the code, compile it and install the tools:


fdion@raspberrypi ~ $ mkdir zfs
fdion@raspberrypi ~ $ cd zfs
fdion@raspberrypi ~/zfs $ git clone https://bitbucket.org/cli/zfs-fuse-arm.git
fdion@raspberrypi ~/zfs $ cd zfs-fuse-arm/
fdion@raspberrypi ~/zfs/zfs-fuse-arm $ cd src
fdion@raspberrypi ~/zfs/zfs-fuse-arm/src $ scons
[a lot of stuff will scroll by]
fdion@raspberrypi ~/zfs/zfs-fuse-arm/src $ sudo scons install
[again, more stuff will scroll by]

Wow, it compiled (scons). And installed (sudo scons install). It's a good thing we are using the zfs-fuse-arm version, because the mainline won't get very far on the compile.

A demonstration, if you please? 


Well of course! Let's start the zfs-fuse daemon and create two virtual disks. I'm creating two 100MB disks here using dd (this is on a slow SD card, rated at 10MB/s). You could also use an actual device under /dev (like a pair of USB keys):


fdion@raspberrypi ~/zfs/zfs-fuse-arm/src/zfs-fuse $ sudo sh run.sh &

fdion@raspberrypi ~/zfs/zfs-fuse-arm/src/zfs-fuse $ cd
fdion@raspberrypi ~ $ cd zfs
fdion@raspberrypi ~/zfs $ mkdir test
fdion@raspberrypi ~/zfs $ cd test
fdion@raspberrypi ~/zfs/test $ dd if=/dev/zero of=fakedisk1 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.2747 s, 10.2 MB/s
fdion@raspberrypi ~/zfs/test $ dd if=/dev/zero of=fakedisk2 bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 10.7517 s, 9.8 MB/s
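
Since dd takes about ten seconds per file on this SD card, a possible shortcut is to create the backing files as sparse files with `truncate` from GNU coreutils. This is a sketch under assumptions: `make_fakedisk` and `file_bytes` are hypothetical helper names, and I did not try sparse backing files with zfs-fuse for this post.

```shell
#!/bin/sh
# make_fakedisk PATH SIZE_MIB: create a sparse file of SIZE_MIB mebibytes.
make_fakedisk() {
    truncate -s "$(( $2 * 1024 * 1024 ))" "$1"
}

# file_bytes PATH: print the apparent size of a file in bytes.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Example: two 100MiB backing files in a scratch directory.
workdir=$(mktemp -d)
make_fakedisk "$workdir/fakedisk1" 100
make_fakedisk "$workdir/fakedisk2" 100
echo "fakedisk1: $(file_bytes "$workdir/fakedisk1") bytes"
rm -r "$workdir"
```

The apparent size is the same 104857600 bytes that dd reports above, but no blocks are written until something actually stores data in the file.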

Up to now we haven't done anything with ZFS per se. To mirror two drives in ZFS and create a new storage pool out of them, all we have to do is:


fdion@raspberrypi ~/zfs/test $ sudo zpool create mymirror mirror /home/fdion/zfs/test/fakedisk1 /home/fdion/zfs/test/fakedisk2
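
ZFS wants file-backed devices to be given as full paths, which is why the command above spells out /home/fdion/zfs/test/. The sketch below expands relative names against the current directory and only prints the resulting command instead of running it; `abs_path` and `mirror_cmd` are names I made up for illustration.

```shell
#!/bin/sh
# abs_path PATH: print PATH unchanged if absolute, else prefix it with $PWD.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$PWD/$1" ;;
    esac
}

# mirror_cmd POOL DEVICE...: print (do not run) the zpool command for a mirror.
mirror_cmd() {
    pool=$1
    shift
    cmd="zpool create $pool mirror"
    for dev in "$@"; do
        cmd="$cmd $(abs_path "$dev")"
    done
    printf '%s\n' "$cmd"
}

# Dry run: show what would be executed for the two fake disks.
mirror_cmd mymirror fakedisk1 fakedisk2
```

Run from ~/zfs/test, this prints the same command as above (minus the sudo), so you can review it before actually creating the pool.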


Now let's create a filesystem on that new zpool device, and mount it to a local folder in my home directory, change permissions so I can write to it and finally copy some files from /etc to my new filesystem:


fdion@raspberrypi ~/zfs/test $ cd
fdion@raspberrypi ~ $ mkdir myfilesystem
fdion@raspberrypi ~ $ sudo zfs create -o mountpoint=/home/fdion/myfilesystem mymirror/myfilesystem
fdion@raspberrypi ~ $ sudo chown fdion:pi myfilesystem/
fdion@raspberrypi ~ $ cd myfilesystem
fdion@raspberrypi ~/myfilesystem $ cp /etc/*.conf .
cp: cannot open `/etc/fuse.conf' for reading: Permission denied
fdion@raspberrypi ~/myfilesystem $ ls
adduser.conf          gssapi_mech.conf  libaudit.conf   pnm2ppa.conf
asound.conf           hdparm.conf       logrotate.conf  resolv.conf
ca-certificates.conf  host.conf         mke2fs.conf     rsyslog.conf
colord.conf           idmapd.conf       mtools.conf     sensors3.conf
debconf.conf          insserv.conf      nsswitch.conf   sysctl.conf
deluser.conf          ld.so.conf        ntp.conf        ts.conf
gai.conf              libao.conf        pam.conf        ucf.conf
fdion@raspberrypi ~/myfilesystem $ sudo zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
mymirror                191K  63.3M    22K  /mymirror
mymirror/myfilesystem  89.5K  63.3M  89.5K  /home/fdion/myfilesystem
fdion@raspberrypi ~/myfilesystem $ sudo zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
mymirror  95.5M   196K  95.3M     0%  1.00x  ONLINE  -
fdion@raspberrypi ~/myfilesystem $ 




How cool is that? I now have a mirrored backup of my .conf files. Well, not quite. We are using fake disks that both live on the same SD card, so if the SD card dies I lose everything.

So next time we'll demo with actual USB drives.

15 comments:

  1. Mirroring works fine, and if a device fails, it is handled properly, however zpool status doesn't reflect the reality. I hadn't tested that part yet, so I'll have to dig in the code.

    No failure:
    pi@raspberrypi ~ $ sudo zpool status -v
    pool: mymirror
    state: ONLINE
    scrub: none requested
    config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda1    ONLINE       0     0     0
        sdb1    ONLINE       0     0     0

    errors: No known data errors

    I then pulled the second usb device:
    pi@raspberrypi ~ $ ls /dev/sd*
    /dev/sda /dev/sda1

    sdb and sdb1 are gone, but:
    pi@raspberrypi ~ $ sudo zpool status -v
    pool: mymirror
    state: ONLINE
    scrub: none requested
    config:

    NAME        STATE     READ WRITE CKSUM
    mymirror    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda1    ONLINE       0     0     0
        sdb1    ONLINE       0     0     0

    errors: No known data errors
    pi@raspberrypi ~ $ sudo zfs list
    NAME       USED  AVAIL  REFER  MOUNTPOINT
    mymirror  55.4M  7.27G  55.3M  /mymirror


    Before doing this I was accessing a file in a loop, and it is still looping. So the zfs side of things is working, just not the notification to the status.

  2. What's the chance that the file data was in the ARC cache, so zfs didn't notice the disk vanish? Try a write; that will force IO to the disk, which should cause the failure notification.

  3. Writing didn't do it, but something else did:

    fdion@raspberrypi ~ $ sudo zpool status
    pool: mymirror
    state: ONLINE
    scrub: none requested
    config:

    NAME                                                                   STATE     READ WRITE CKSUM
    mymirror                                                               ONLINE       0     0     0
      mirror-0                                                             ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20054054620F3CC11EC2-0:0-part1  ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20052845410F3CC16219-0:0-part1  ONLINE       0     0     0

    errors: No known data errors

    I then pulled the plug on one device (the reason they are not sda1 and sdb1 is that I shut down the Pi to bring it to CHS last night, and I had to do a zpool export mymirror and zpool import mymirror for it to mount again):


    fdion@raspberrypi ~ $ sudo zpool status
    pool: mymirror
    state: ONLINE
    scrub: none requested
    config:

    NAME                                                                   STATE     READ WRITE CKSUM
    mymirror                                                               ONLINE       0     0     0
      mirror-0                                                             ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20054054620F3CC11EC2-0:0-part1  ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20052845410F3CC16219-0:0-part1  ONLINE       0     0     0

    errors: No known data errors


    Still showing online, although clearly it is not since I pulled it. Let's try some writes:


    fdion@raspberrypi ~ $ sudo cp -r hardware_projects /mymirror/
    fdion@raspberrypi ~ $ sudo zpool status
    pool: mymirror
    state: ONLINE
    scrub: none requested
    config:

    NAME                                                                   STATE     READ WRITE CKSUM
    mymirror                                                               ONLINE       0     0     0
      mirror-0                                                             ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20054054620F3CC11EC2-0:0-part1  ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20052845410F3CC16219-0:0-part1  ONLINE       0     0     0

    errors: No known data errors


    Alright, we've got to get this to trigger. Let's do a scrub.

    fdion@raspberrypi ~ $ sudo zpool scrub mymirror
    fdion@raspberrypi ~ $ sudo zpool status
    pool: mymirror
    state: DEGRADED
    status: One or more devices could not be opened. Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
    see: http://www.sun.com/msg/ZFS-8000-2Q
    scrub: scrub completed after 0h0m with 0 errors on Thu Oct 4 10:45:16 2012
    config:

    NAME                                                                   STATE     READ WRITE CKSUM
    mymirror                                                               DEGRADED     0     0     0
      mirror-0                                                             DEGRADED     0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20054054620F3CC11EC2-0:0-part1  ONLINE       0     0     0
        disk/by-id/usb-SanDisk_Cruzer_Edge_20052845410F3CC16219-0:0-part1  UNAVAIL      0    76     0  cannot open

    errors: No known data errors



    So that works, but normally on native zfs the failure shows up right away, without a scrub. I don't want to have to schedule scrubs every 5 minutes :)

    I'm thinking there might be an unimplemented message. I'll have to look at the code when I have a few minutes.

  4. It takes a while for zfs to give up on IO to devices - it can be 5 minutes or more before zfs offlines the device. If you generate some IO to the device then wait it should eventually notice the device is gone. Shortening device timeouts will help here.

  5. I am far from expert in this field and I had some issues however I managed to resolve them - building from scratch I had some additional steps:

    install Raspbian
    http://www.raspbian.org/RaspbianInstaller

    install gcc:
    sudo apt-get install git gcc build-essential libsdl1.2-dev

    install openssl library and headers:
    sudo apt-get install libssl-dev

    then follow this post.

    Thanks for the great info!

  6. Have you tried running freebsd on your pi? Freebsd has native zfs support so it may be easier than having to install fuse and zfs separately

  7. Yes, I do have a freebsd Pi too. My long term goal though is to run IllumOS on the Pi.

    See: http://solarisdesktop.blogspot.com/2013/02/illumos-on-raspberrypi.html

    Replies
    1. I was hoping someone had done this when i found this blog... def gonna check that out!

  8. Hello. I have an external hard drive that is formatted in NTFS file system and I would like to have it in FAT32 so I can use it on my ps3 system. Thing is that the drive is half full with data and my main drive isn't big enough to copy, reformat and copy back. Is there any software that could change the file system without deleting the data from the drive? Thank you for your answer.


  9. When running scons, is this something to worry about?
    "scons: warning: BuildDir() and the build_dir keyword have been deprecated;
    use VariantDir() and the variant_dir keyword instead.
    File "/root/zfs/zfs-fuse-arm/src/lib/libzpool/SConscript", line 4, in <module>"

  10. Also saw this:
    "lib/libzpool/vdev.c: In function 'vdev_open_children':
    lib/libzpool/vdev.c:1085:3: warning: comparison between pointer and integer [enabled by default]"

  11. zfs-fuse/zfs_ioctl.c: In function 'zfs_ioc_set_prop':
    zfs-fuse/zfs_ioctl.c:2292:24: warning: comparison between pointer and integer [enabled by default]

  12. Hi, I believe we have native ZFS support in Linux now. Did you try it?

  13. Just seen this blog because I was google-ing Raspberry Pi and ZFS.

    I'm going to give native ZFS a try on a raspberry pi this weekend using Gentoo.
    My laptop already runs root on ZFS with Gentoo, and the ability to jump back to snapshots instantly is a godsend when an emerge goes bad.

    Just got my first R-PI today and can't wait to try it out. I will use my laptop to cross compile over distcc though, because I think I could wait a long time for the R-PI to compile just the kernel, let alone everything else.
