Building a simple RIAK ORM

I've been interested in RIAK for a while, and ORMs are nothing short of fascinating. I decided to try writing an ORM for RIAK, and the results are here:

https://github.com/teverett/cbean

Of course, my ORM is not really an ORM, because RIAK is not relational. However, it is ORM-like: I can store POJOs and retrieve them.

The features I wanted for my ORM were those that I am accustomed to with Hibernate, or eBean.

  • Ability to store POJOs and retrieve them
  • Ability to store object trees of POJOs which contain POJOs
  • Support for Lists of POJOs composed inside a POJO
  • Lazy Loading

In the end, I ended up with an ORM-like layer that can store data in any Key-Value store. There is a plugin which supports RIAK, and there is an emulated Key-Value store built on a filesystem, which is useful for development purposes. Theoretically cBean could work on other stores, such as MongoDB, but I haven't built that support yet. Adding support for a Key-Value store is as simple as implementing the interface KVService.
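
To give a feel for how small such a plugin contract can be, here is a minimal sketch of a key-value service with an in-memory backend. The names SketchKVService and InMemoryKVService, and the three methods, are assumptions made for this example; the real KVService interface in cBean may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for cBean's KVService; the method names are assumed.
interface SketchKVService {
    void save(String key, String value);   // insert, or overwrite an existing key
    String load(String key);               // returns null if the key is absent
    void delete(String key);
}

// In-memory backend, comparable in spirit to a development-only emulator.
final class InMemoryKVService implements SketchKVService {
    private final Map<String, String> store = new HashMap<>();

    @Override
    public void save(String key, String value) {
        store.put(key, value);
    }

    @Override
    public String load(String key) {
        return store.get(key);
    }

    @Override
    public void delete(String key) {
        store.remove(key);
    }
}
```

A backend for a real store would implement the same three operations against its client library; saving twice under the same key simply replaces the stored value.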

Supported Types

  • All Java simple types (int, String, long, etc)
  • All Java wrapper types (Integer, Long, etc)
  • Contained POJOs
  • java.util.List of POJOs
  • java.util.UUID
  • java.util.Date

Annotations

Similar to Hibernate or eBean, cBean POJOs must be annotated. I didn't choose to use the JPA annotations, but instead defined my own. They are here.  There are numerous cBean annotated POJOs in the tests, here.

@Entity

Similar to JPA, the @Entity annotation simply marks the POJO as one which is of interest to cBean.

@Id

Every POJO must have an id field and it must be a String or UUID.  The @Id field is a little different in cBean than in JPA.  Inserting a new object with the same JPA @Id as one that already exists in the RDBMS is an error.  In cBean, you simply overwrite the existing object.

@Property

The @Property annotation indicates that a specific field (simple type, wrapper type, object type or list type) will be persisted.  POJO fields without the @Property annotation are ignored.  There are a number of properties which are valid on an @Property annotation.

  • cascadeSave
  • cascadeDelete
  • cascadeLoad
  • ignore
  • nullable

cascadeSave, cascadeDelete, and cascadeLoad are only relevant for POJO fields which are themselves POJOs, or lists of POJOs.  Setting "cascadeLoad=false", naturally, indicates that the field is lazy-loaded.
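
As an illustration of how these annotations fit together, here is a self-contained sketch. The annotations are defined locally so that the example compiles on its own; they only mirror the names used in this post, and the default values shown for each property are assumptions rather than cBean's actual defaults. The helper methods persistedFields and isLazy are likewise hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class AnnotationSketch {
    // Hypothetical annotations, modelled on the names used in this post.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Entity {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Id {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Property {
        boolean cascadeSave() default true;     // defaults are assumed
        boolean cascadeDelete() default true;
        boolean cascadeLoad() default true;     // false means lazy-loaded
        boolean ignore() default false;
        boolean nullable() default true;
    }

    @Entity
    static class Address {
        @Id UUID id = UUID.randomUUID();
        @Property String city;
    }

    @Entity
    static class Person {
        @Id UUID id = UUID.randomUUID();
        @Property String name;
        @Property(cascadeLoad = false) Address address;   // lazy-loaded child
        @Property(cascadeDelete = false) List<Address> previous = new ArrayList<>();
        String scratch;                                    // no @Property: not persisted
    }

    // Names of the fields a persistence layer would store for the given class.
    static List<String> persistedFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            Property p = f.getAnnotation(Property.class);
            if (p != null && !p.ignore()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // True when the named field is marked with cascadeLoad=false.
    static boolean isLazy(Class<?> type, String fieldName) {
        try {
            Property p = type.getDeclaredField(fieldName).getAnnotation(Property.class);
            return p != null && !p.cascadeLoad();
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

In this sketch, persistedFields(Person.class) reports name, address and previous, while scratch is skipped because it carries no @Property.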

@Version

A POJO can define an Integer field and annotate it with @Version.  cBean will increment the annotated field by 1 each time the POJO is saved.
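
The behaviour can be sketched with a few lines of reflection. This is not cBean's code: the locally defined Version annotation, the Note class and the bumpVersion helper are all hypothetical, and treating a null version as 0 is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class VersionSketch {
    // Hypothetical annotation mirroring the @Version described above.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Version {}

    static class Note {
        String text;
        @Version Integer version = 0;
    }

    // Increment every @Version Integer field by 1, as a save would.
    static void bumpVersion(Object pojo) {
        for (Field f : pojo.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(Version.class) && f.getType() == Integer.class) {
                try {
                    f.setAccessible(true);
                    Integer current = (Integer) f.get(pojo);
                    // Assumption: a version that was never set counts as 0.
                    f.set(pojo, current == null ? 1 : current + 1);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}
```

Calling bumpVersion twice on a fresh Note leaves its version at 2.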

Referential Integrity

Frameworks like Hibernate or eBean provide referential integrity because the underlying RDBMSs provide it.  Key-Value stores such as RIAK do not, and therefore neither does cBean.  It is therefore entirely possible to persist a POJO which contains a POJO, and have the contained POJO be deleted underneath the parent.  In an RDBMS this can be prevented with a foreign key; there is no such protection in cBean.  Application code using cBean must therefore be aware that POJOs in lists, for example, may not be resolvable when the list is reloaded.

The strategy that cBean uses for handling broken "foreign keys" is two-fold:

  • If a POJO contains a child POJO and the child is deleted, that field will be set to null on reload
  • If a POJO contains a list of POJOs and one of the elements is deleted, that list element will be set to null.  The List size() will remain the same as when it was persisted.
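
In practice this means application code should expect null entries in a reloaded list. A minimal sketch of how a caller might cope, using plain Java collections; the class and method names are made up for this example, and strings stand in for POJOs in the usage note.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class DanglingReferenceSketch {
    // Drop the null placeholders left behind by deleted children.
    static <T> List<T> resolvable(List<T> reloaded) {
        return reloaded.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    // Count how many list elements could not be resolved on reload.
    static long missing(List<?> reloaded) {
        return reloaded.stream().filter(Objects::isNull).count();
    }
}
```

For a reloaded list such as ["a", null, "c"], size() is still 3, resolvable returns the two surviving elements, and missing reports 1.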

Example Code

There is a working example at https://github.com/teverett/cbean/tree/master/example.

Building QEMU

In general, I install QEMU on my Macbook using MacPorts.  However I recently had a need to get the tip of the QEMU development tree.

Getting the QEMU source tree is trivial:

git clone git://git.qemu-project.org/qemu.git

I needed an updated version of dtc:

git submodule update --init dtc

The build instructions from the README are:

 mkdir build
 cd build
 ../configure
 make

However, in my case I only need ARM emulation, so:

../configure --target-list=arm-softmmu
make
make install

The binary qemu-system-arm will be at /usr/local/bin:

oscar:build tom$ /usr/local/bin/qemu-system-arm --version
QEMU emulator version 2.4.94, Copyright (c) 2003-2008 Fabrice Bellard

 

 

MusicBrainzTagger

I've tried a couple of different mp3 taggers on my mp3 library; however, most seem to have trouble with large libraries.  So, after doing some reading about AcoustID and MusicBrainz, I decided to quickly code up my own tagger, MusicBrainzTagger.

MusicBrainzTagger is a command-line application which recurses a directory of mp3 files and tags each one, one by one.  This approach allows it to handle very large libraries, because it only processes one file at a time.  File processing consists of reading any ID3 tags in the input mp3, and then calculating the AcoustID fingerprint.  The fingerprint is then resolved to a MusicBrainz ID, which is used to look up the recording.

MusicBrainzTagger then tags the file, renames it, and moves it to a new directory.
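
The one-file-at-a-time traversal can be sketched as follows. This is not MusicBrainzTagger's source: the class name, the isMp3 filter and the callback shape are assumptions for the example, and the fingerprinting, lookup and tagging steps are left to the callback.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class Mp3WalkSketch {
    // True for file names ending in .mp3, ignoring case.
    static boolean isMp3(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().toLowerCase(Locale.ROOT).endsWith(".mp3");
    }

    // Visit mp3 files under root one at a time; no list of files is built up.
    static void forEachMp3(Path root, Consumer<Path> processOne) {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(Mp3WalkSketch::isMp3)
                 .forEach(processOne);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because Files.walk produces its stream lazily, memory use stays roughly constant no matter how many files the library contains.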

Bare Metal coding on FreeBSD

I recently got interested in the technical details of how ARM OSes work, so I decided to try my hand at writing a simple one.  This blog post is not about the OS itself, but about setting up the development environment.

In my case, I'm developing in a terminal session on FreeBSD 10 on an amd64 host, so I'll need to cross-compile all my code.  Luckily, the ports tree includes gcc-arm-embedded, a port of the Launchpad ARM cross tools.  It's easy to install:

pkg install gcc-arm-embedded

This package includes all the tools which are needed:

-rwxr-xr-x 1 root wheel 711488 Oct 3 11:17 arm-none-eabi-addr2line
-rwxr-xr-x 2 root wheel 740040 Oct 3 11:17 arm-none-eabi-ar
-rwxr-xr-x 2 root wheel 1298680 Oct 3 11:17 arm-none-eabi-as
-rwxr-xr-x 2 root wheel 620816 Oct 3 11:17 arm-none-eabi-c++
-rwxr-xr-x 1 root wheel 710528 Oct 3 11:17 arm-none-eabi-c++filt
-rwxr-xr-x 1 root wheel 620608 Oct 3 11:17 arm-none-eabi-cpp
-rwxr-xr-x 1 root wheel 29416 Oct 3 11:17 arm-none-eabi-elfedit
-rwxr-xr-x 2 root wheel 620816 Oct 3 11:17 arm-none-eabi-g++
-rwxr-xr-x 2 root wheel 620608 Oct 3 11:17 arm-none-eabi-gcc
-rwxr-xr-x 2 root wheel 620608 Oct 3 11:17 arm-none-eabi-gcc-4.8.4
-rwxr-xr-x 1 root wheel 24480 Oct 3 11:17 arm-none-eabi-gcc-ar
-rwxr-xr-x 1 root wheel 24448 Oct 3 11:17 arm-none-eabi-gcc-nm
-rwxr-xr-x 1 root wheel 24448 Oct 3 11:17 arm-none-eabi-gcc-ranlib
-rwxr-xr-x 1 root wheel 271072 Oct 3 11:17 arm-none-eabi-gcov
-rwxr-xr-x 1 root wheel 3992568 Oct 3 11:17 arm-none-eabi-gdb
-rwxr-xr-x 1 root wheel 776672 Oct 3 11:17 arm-none-eabi-gprof
-rwxr-xr-x 4 root wheel 1025912 Oct 3 11:17 arm-none-eabi-ld
-rwxr-xr-x 4 root wheel 1025912 Oct 3 11:17 arm-none-eabi-ld.bfd
-rwxr-xr-x 2 root wheel 722928 Oct 3 11:17 arm-none-eabi-nm
-rwxr-xr-x 2 root wheel 906848 Oct 3 11:17 arm-none-eabi-objcopy
-rwxr-xr-x 2 root wheel 1123424 Oct 3 11:17 arm-none-eabi-objdump
-rwxr-xr-x 2 root wheel 740056 Oct 3 11:17 arm-none-eabi-ranlib
-rwxr-xr-x 1 root wheel 365208 Oct 3 11:17 arm-none-eabi-readelf
-rwxr-xr-x 1 root wheel 712976 Oct 3 11:17 arm-none-eabi-size
-rwxr-xr-x 1 root wheel 712080 Oct 3 11:17 arm-none-eabi-strings
-rwxr-xr-x 2 root wheel 906864 Oct 3 11:17 arm-none-eabi-strip

Additionally, an ARM emulator such as QEMU will be needed.  FreeBSD also includes that port:

pkg install qemu-devel

I could easily use BSD Make; however, I prefer GNU Make, so I've installed that too:

pkg install gmake

With these tools installed, I have enough to cross-compile ARM assembler and C code, link it, and run it in QEMU and debug with GDB.

FemtoOS; a simple bootloader and protected mode kernel

I've recently become interested in how the i386 boot loader works.  There is an excellent example of a boot loader here, and another here.  Some simple protected-mode code, which implements a kernel capable of writing a line of text, is here.  FemtoOS is the result of combining code from all of those into a simple boot loader plus a protected-mode kernel that outputs text.  In my case, I'm compiling the kernel on FreeBSD.

The bootloader is in boot.asm.  Like any i386 boot loader, it's loaded by the BIOS at address 0x7C00, in 16-bit real mode.

It starts by clearing the screen, outputting a message, and then loading kernel.bin from the disk.  kernel.bin is loaded at address 0x1000.  The bootloader then sets up a GDT, enters protected mode, and passes control to kernel.bin at address 0x1000.  The documentation for BIOS interrupt 13h here shows that a single sector holds 512 bytes.  boot.bin is exactly 512 bytes long, so kernel.bin starts at the second sector on the disk.  The floppy image is created by this line from the Makefile:

cat boot.bin kernel.bin /dev/zero | dd bs=512 count=2880 of=floppy.img

This code from boot.asm reads 18 sectors, starting at sector 2, into RAM at 0x1000, so kernel.bin can be at most 512*18 = 9216 bytes (9 KB):

 mov ax, 0                              
 mov es, ax                              
 mov bx, 0x1000          ; Destination address = 0000:1000
 mov ah, 02h             ; READ SECTOR-command
 mov al, 12h             ; Number of sectors to read (0x12 = 18 sectors)
 mov dl, [drive]         ; Load boot disk
 mov ch, 0               ; Cylinder = 0
 mov cl, 2               ; Starting Sector = 2 (sectors are numbered from 1)
 mov dh, 0               ; Head = 0
 int 13h                 ; Call interrupt 13h
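
The arithmetic behind those register values can be checked with a short calculation. The sketch below assumes the standard 1.44 MB floppy geometry (2 heads, 18 sectors per track), which matches the 2880-sector image built by the Makefile; the class and method names are made up for the example.

```java
final class FloppyLayoutSketch {
    static final int SECTOR_BYTES = 512;
    static final int SECTORS_PER_TRACK = 18;   // 1.44 MB floppy geometry (assumed)
    static final int HEADS = 2;

    // Convert a cylinder/head/sector address (sector is 1-based) to a 0-based LBA.
    static int chsToLba(int cylinder, int head, int sector) {
        return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1);
    }

    // Byte offset of a CHS address inside floppy.img.
    static int byteOffset(int cylinder, int head, int sector) {
        return chsToLba(cylinder, head, sector) * SECTOR_BYTES;
    }

    // Largest kernel, in bytes, that a read of the given sector count can load.
    static int maxKernelBytes(int sectorsRead) {
        return sectorsRead * SECTOR_BYTES;
    }
}
```

With these definitions, cylinder 0, head 0, sector 2 maps to byte offset 512 in the image, which is exactly where kernel.bin begins after the 512-byte boot.bin, and 18 sectors give the 9216-byte limit.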

kernel.bin is linked from the object files created from loader.asm, main.c and video.c.  When the bootloader passes control to kernel.bin, it starts at loader.asm which in turn passes control to main() from main.c.  Note that while boot.asm contains both 16-bit and 32-bit code, loader.asm contains only 32-bit code; it is called after boot.asm has put the host in 32-bit protected mode.

This code sets up the GDT

 xor ax, ax              ; Clear AX register
 mov ds, ax              ; Set DS-register to 0 - used by lgdt
 lgdt [gdt_desc]         ; Load the GDT descriptor

and this code puts the machine into protected mode and then passes control to loader.asm.  Note that immediately after setting the protected-mode bit, a far jmp is issued to the label "kernel_segments", which marks the first 32-bit instruction executed on boot.  The far jump reloads CS with the code segment selector 08h; from this point on, we are in 32-bit protected mode.

 mov eax, cr0            ; Copy the contents of CR0 into EAX
 or eax, 1               ; Set bit 0 (PE) to enable protected mode
 mov cr0, eax            ; Copy the contents of EAX into CR0       
 jmp 08h:kernel_segments ; Jump to code segment, offset kernel_segments
        
[BITS 32]                ; We now need 32-bit instructions
kernel_segments:
 mov ax, 10h             ; Load the data segment selector
 mov ds, ax              ; Move a valid data segment into the data segment register
 mov ss, ax              ; Move a valid data segment into the stack segment register
 mov esp, 090000h        ; Move the stack pointer to 090000h       
 jmp 08h:0x1000          ; Jump to section 08h (code), offset 01000h
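
The values 08h and 10h are segment selectors, not addresses. A selector packs a descriptor index, a table indicator and a privilege level into 16 bits, and the sketch below decodes them. It assumes the conventional GDT layout (null descriptor at index 0, code at 1, data at 2), which is consistent with the comments in the code above but is not confirmed from boot.asm itself.

```java
final class SelectorSketch {
    // Index of the descriptor in the table (bits 3..15 of the selector).
    static int index(int selector) {
        return selector >>> 3;
    }

    // Table indicator: 0 = GDT, 1 = LDT (bit 2).
    static int tableIndicator(int selector) {
        return (selector >>> 2) & 1;
    }

    // Requested privilege level (bits 0..1).
    static int rpl(int selector) {
        return selector & 3;
    }

    // Byte offset of the descriptor from the start of the table (8 bytes each).
    static int descriptorOffset(int selector) {
        return index(selector) * 8;
    }
}
```

Decoding 0x08 gives index 1 in the GDT at privilege level 0, and 0x10 gives index 2, so the two selectors simply name the second and third 8-byte entries of the table.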

The code in loader.asm that calls the C code main() is pretty simple:

start:
  call main  ; Call our kernel's main() function

There is good documentation for the protected mode text console here.  The simple implementation of this is in video.c.
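
For reference, the layout of the text console is simple enough to express as arithmetic. The sketch below assumes the standard 80x25 colour text mode with its buffer at 0xB8000; it models the address and cell calculations only and is not taken from video.c.

```java
final class VgaTextSketch {
    static final int BASE = 0xB8000;   // start of colour text-mode memory
    static final int COLUMNS = 80;
    static final int ROWS = 25;

    // Physical address of the cell at (row, col); each cell is two bytes.
    static int cellAddress(int row, int col) {
        return BASE + (row * COLUMNS + col) * 2;
    }

    // 16-bit cell value: character in the low byte, attribute in the high byte.
    // The top attribute bit may select blink rather than a bright background.
    static int cell(char ch, int foreground, int background) {
        int attribute = ((background & 0x0F) << 4) | (foreground & 0x0F);
        return (attribute << 8) | (ch & 0xFF);
    }
}
```

Writing light grey on black (foreground 7, background 0) for the letter A therefore stores 0x0741 at the cell's address, and the second row starts 160 bytes after the first.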

There is a floppy disk image here, which boots in both qemu and VMWare.

Time Machine backups using FreeBSD ZFS

In this blog article, I described a way to use Netatalk3 to do Time Machine backups on FreeBSD.  This approach worked well, but there were a few things I wanted to improve:

  • The Time Machine backups are in every user's home dir.  That's messy and there is the potential that they'll accidentally delete the backup.
  • If I put the backups on a ZFS disk, I can compress them
  • I would like the potential to use ZFS snapshots down the road
  • I would prefer to have all the backups in one directory, rather than scattered across user profiles

So, here is a new, better recipe.  Note that this recipe will only work with Netatalk 3.1.2 or later.  The current FreeBSD port is version 3.1.3, so that helps.  As in the other recipe, the first step is to install netatalk3 and nss_mdns:

pkg install netatalk3
pkg install nss_mdns

avahi needs mDNS, so that needs to be configured in /etc/nsswitch.conf.  Ensure that this line exists:

hosts: files dns mdns

Next, create a ZFS dataset for the backups.  My zpool is named storage and is mounted at /storage, so the Time Machine backups will be in /storage/timemachine.  Notice that I've turned on compression.

zfs create storage/timemachine
zfs set compression=gzip storage/timemachine
zfs get all storage/timemachine
zfs list

I want to grant the ability to do time machine backups to a FreeBSD group, so I'll make that, and add the users.  This group will be referred to in afp.conf.

pw groupadd timemachine
pw groupmod timemachine -m tom
pw groupshow timemachine

After that, we need to create user directories, one for each time machine user.

mkdir /storage/timemachine/tom
chown tom:timemachine /storage/timemachine/tom
chmod 700 /storage/timemachine/tom
chmod 777 /storage/timemachine

So, now that I have a compressed ZFS dataset to store the backups on, backup directories created, and a FreeBSD group created, I can create afp.conf.  I've chosen to limit Time Machine to 500 GB of space for each user.

;
; Netatalk 3.x configuration file
;

[Global]
vol preset = default_for_all_vol
log file = /var/log/netatalk.log
log level = default:warn 
hosts allow = 192.168.77.0/24
mimic model = TimeCapsule6,116
disconnect time = 1 
vol dbpath = /var/netatalk/CNID/$u/$v/ 

[default_for_all_vol]
file perm = 0640
directory perm = 0750
cnid scheme = dbd

[Homes]
basedir regex = /storage/home
#500 GB (units of MB)
vol size limit = 512000

[TimeMachine]
time machine = yes
path=/storage/timemachine/$u
valid users = @timemachine
#500 GB (units of MB)
vol size limit = 512000

You should notice that the path of the [TimeMachine] share is set to

path=/storage/timemachine/$u

This means that for each logged-in user, $u is substituted with the name of the user. So, when Time Machine logs in as "tom", the data is stored at /storage/timemachine/tom.

Only members of the group "timemachine" will be able to use Time Machine.

This line

vol dbpath = /var/netatalk/CNID/$u/$v/

ensures that each user has a CNID database for each volume.  Without this line, there is a single CNID database which is shared by all users who are using Time Machine.  That generally results in corrupting the CNID database.  By specifying /$u/$v, we get a CNID database for each user for each volume, which is much more reliable.
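
To make the effect of the two variables concrete, here is a tiny sketch of the substitution. It is only an illustration of the resulting paths: Netatalk performs this expansion internally and supports more variables than the two shown, and the class and method names are made up for the example.

```java
final class AfpVariableSketch {
    // Expand the $u (user name) and $v (volume name) placeholders in a path.
    static String expand(String template, String user, String volume) {
        return template.replace("$u", user).replace("$v", volume);
    }
}
```

For the user tom and the TimeMachine volume, the share path expands to /storage/timemachine/tom and the CNID path to /var/netatalk/CNID/tom/TimeMachine/, so no two users ever share a database directory.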

By specifying a disconnect time of one hour:

disconnect time = 1

we can disconnect orphaned sessions and hopefully avoid the dreaded "volume in use" error in Time Machine.

The rest of the steps are exactly as in the previous blog entry.

You'll need to start dbus, avahi, and netatalk, like this:

/usr/local/etc/rc.d/dbus onestart
/usr/local/etc/rc.d/avahi-daemon onestart
/usr/local/etc/rc.d/netatalk onestart

The next step takes place on your OS X client machine.  On each host that will perform backups, enable Time Machine to see non-TM volumes:

defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

Then, mount your user's share using afp://.  This will make the share visible to TimeMachine.

After this, you should be able to see your Netatalk shares in Time Machine, and perform backups.