Outlook for Mac Archives

I recently needed a large data set to parse for another project, and decided that my saved Outlook mail archives would be ideal.  Sadly, Outlook for Mac doesn't output PST files; it outputs OLM archives, which are essentially giant zip files full of XML.  I was coding all of this in Java, so I needed a Java library to parse OLM files.

The resulting source code is here.
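
Since an OLM archive is essentially a zip file full of XML, the standard java.util.zip classes are enough to peek inside one.  The sketch below is not taken from the library; the class name OlmEntryLister and its method are hypothetical, and it only lists the XML entries rather than parsing them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the OLM library mentioned above.
class OlmEntryLister {

    // Returns the names of all non-directory entries ending in ".xml"
    // inside the zip-format archive at archivePath.
    static List<String> listXmlEntries(String archivePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().toLowerCase().endsWith(".xml")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to an .olm file.
        for (String name : listXmlEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Each listed entry could then be opened with ZipFile.getInputStream and handed to an XML parser, one entry at a time, so the whole archive never has to be in memory.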

 

 

AGC Grammar

Every IT geek is, to some degree, fascinated with the Apollo program which put a human on the moon for the first time.  Naturally, there is also curiosity about the computers on the Apollo moon lander, and the software that ran on them.  The source code that went to the moon is available now, and you can take a look at it here.

I'm interested in the Apollo program, but I'm also interested in formal grammars, and I'm a committer to the Antlr project.  So I spent some time building an Antlr4 grammar for the Apollo source code.  You can take a look at it here.  The grammar can parse a number of files from the Solarium55 source code, which is the code that flew on Apollo 4.  If you're keen, you could try it on the Apollo 13 source code, called Artemis072, but you'd have to key in the source from JPG images of the form-feed printouts (here).

It's natural to ask why an Antlr4 grammar for AGC source code would be useful.  In addition to the obvious "because that goal will serve to organize and measure the best of our energies and skills", it's the first step in building a simulator.  There is already an excellent C simulator here, and there are numerous JS ones on the web, but I thought it might be helpful to have an Antlr4 grammar that can generate lexers and parsers for new simulators in other languages.  Also, it was very interesting to learn about the AGC and to see how software development has progressed since the 1960s.

 

 

A simple modelling language

Recently I had reason to get interested in process modelling.  Ultimately I ended up writing an Antlr4 grammar for Modelica (here), but in the meantime I came up with SML (Simple Modeling Language).  The Antlr4 grammar is sml.g4.

The characteristics I wanted in my own modeling language were:

  • Ability to define models as text files
  • Models should be as Object Oriented as possible
  • Ability to compose models; that is, the ability to have models that include models
  • Ability to define variables that are internal to models and variables that are exposed by models (i.e. "ports")
  • Ability to put models in namespaces
  • Ability to define equations in models that relate the variables.  The equations should be expressed in standard form.
  • Support for differential equations is essential

SML accomplishes these goals.  An example SML model is a standard spring from 1st year Engineering, here.  The model file is:

model tge.spring;
#
# vars
# 
variables:
    # spring constant
    public k;
    # force difference
    public df;
    # distance difference
    public x;
#
# Equations
#
equations:
	df:= k*x;

This model is in the namespace "tge.spring".  It exposes three public variables: "k", "df" and "x".  The relationship between the variables is "df=k*x".  There is also a simple example of a standard pendulum here.

More complex examples are here.  One such example is a classic RC circuit.  This model defines the structure of the circuit itself, and references the resistor, capacitor and source by importing those models from their own SML model files:

model tge.rc1;
#
# A simple RC model 
#
#
#      ------- R1 ---- C1 --
#      |                   |
#      V                   |
#      |-------------------|
#
#
imports:
	tge.resistor;
	tge.capacitor;
	tge.vsource;
variables:
	# declare a resistor instance called "R1"
	component tge.resistor R1;
	# declare a capacitor instance called "C1"
	component tge.capacitor C1;
	# declare a vsource instance called "V"
	component tge.vsource V;
equations:
	# set dV of tge.vsource to 5V
	voltage:=V.dV-5;
	# set R of tge.resistor to 10 ohms
	resistance:=R1.R-10;
	# set C of tge.capacitor to 100 F
	capacitance:=C1.C-100;
	# connect the +ve end of V to R1
	positiveConnection:= V.v1-R1.v1;
	# connect the resistor to the capacitor
	resistors:=R1.v2-C1.v1;
	# connect the -ve end of V
	groundConnection:= V.v2-C1.v2;

Ultimately, with Antlr4 it should be possible to generate model parsers in Java, C# and potentially C++, that can consume SML files, ensure that the model composition is reasonable, and generate input files for mathematical solvers.  The work of producing solver input files from SML models is essentially the work of collapsing an object tree to a flat model.
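
To make "collapsing an object tree to a flat model" concrete, here is a minimal Java sketch.  It is not generated from sml.g4 and is not part of any SML tooling; ModelNode and ModelFlattener are hypothetical names, and the variable lists for the resistor, capacitor and source are assumed from the references in the RC example above (C1.v2 in particular is an assumption).  It walks the component tree and emits one fully qualified variable name per leaf variable, which is the kind of flat list a solver input file would need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of a parsed SML model.
class ModelNode {
    final String type;                                   // e.g. "tge.resistor"
    final List<String> variables = new ArrayList<>();    // variables declared directly in this model
    final Map<String, ModelNode> components = new LinkedHashMap<>(); // instance name -> model

    ModelNode(String type, String... variables) {
        this.type = type;
        for (String v : variables) {
            this.variables.add(v);
        }
    }

    // Adds a named component instance and returns this node for chaining.
    ModelNode with(String instanceName, ModelNode component) {
        components.put(instanceName, component);
        return this;
    }
}

class ModelFlattener {

    // Collapses the tree rooted at root into qualified names such as "R1.v1".
    static List<String> flatten(ModelNode root) {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    private static void collect(ModelNode node, String prefix, List<String> out) {
        for (String v : node.variables) {
            out.add(prefix + v);
        }
        for (Map.Entry<String, ModelNode> e : node.components.entrySet()) {
            collect(e.getValue(), prefix + e.getKey() + ".", out);
        }
    }

    public static void main(String[] args) {
        ModelNode rc = new ModelNode("tge.rc1")
            .with("R1", new ModelNode("tge.resistor", "R", "v1", "v2"))
            .with("C1", new ModelNode("tge.capacitor", "C", "v1", "v2"))
            .with("V", new ModelNode("tge.vsource", "dV", "v1", "v2"));
        // prints [R1.R, R1.v1, R1.v2, C1.C, C1.v1, C1.v2, V.dV, V.v1, V.v2]
        System.out.println(flatten(rc));
    }
}
```

The same recursive walk would also be the natural place to rewrite each component's equations in terms of the qualified names.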

 

PDP-7 Unix

Unix version 0 was written in 1969 by Ken Thompson, on a PDP-7.  Recently, the source code for Unix V0 was discovered, and you can read it here, as PDF scans of printouts.  You can read about the discovery, and the effort to boot Unix V0 on a real PDP-7, here.  The project home page is here.

I got interested in PDP-7 Unix, and then in PDP-7 assembler.  Eventually, I wrote an Antlr4 grammar to parse PDP-7 assembler files in the original "as" format in which Thompson wrote them (here).  The resulting grammar is here.

 

Building a simple RIAK ORM

I've been interested in RIAK for a while, and ORMs are nothing short of fascinating. I decided to try writing an ORM for RIAK, and the results are here:

https://github.com/teverett/cbean

My ORM is not really an ORM, of course, because RIAK is not relational. However, it is ORM-like: I can store POJOs and retrieve them.

The features I wanted for my ORM were those that I am accustomed to with Hibernate, or eBean.

  • Ability to store POJOs and retrieve them
  • Ability to store object trees of POJOs which contain POJOs
  • Support for Lists of POJOs composed inside a POJO
  • Lazy Loading

In the end, I ended up with an ORM-like layer that can store data into any Key-Value store. There is a plugin which supports RIAK, and there is an emulated Key-Value store built on a filesystem, which is useful for development purposes. Theoretically cBean could work on any Key-Value store, such as MongoDB, but I haven't built that support yet. Adding support for a Key-Value store is as simple as implementing the interface KVService.
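
To give a feel for how small a Key-Value backend can be, here is a sketch of an in-memory store.  The interface SimpleKeyValueStore below is a hypothetical stand-in that I am assuming for illustration; the real KVService in cBean may declare different methods and types, so check the source before implementing it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a key-value backend contract.
// This is NOT cBean's KVService; the real interface may differ.
interface SimpleKeyValueStore {
    void save(String key, String value);   // overwrites any existing value for key
    String load(String key);               // returns null when the key is absent
    void delete(String key);
}

// Minimal in-memory implementation, useful only for tests and experiments.
class InMemoryKeyValueStore implements SimpleKeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void save(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String load(String key) {
        return data.get(key);
    }

    @Override
    public void delete(String key) {
        data.remove(key);
    }
}
```

A filesystem-backed or RIAK-backed store would have the same shape, with the map operations replaced by file or network calls.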

Supported Types

  • All java simple types (int, String, long, etc)
  • All java wrapper types (Integer, Long, etc)
  • Contained POJOs
  • java.util.List of POJOs
  • java.util.UUID
  • java.util.Date

Annotations

Similar to Hibernate or eBean, cBean POJOs must be annotated. I didn't choose to use the JPA annotations, but instead defined my own. They are here.  There are numerous cBean-annotated POJOs in the tests, here.

@Entity

Similar to JPA, the @Entity annotation simply marks the POJO as one which is of interest to cBean.

@Id

Every POJO must have an id field and it must be a String or UUID.  The @Id field is a little different in cBean than in JPA.  Inserting a new object with the same JPA @Id as one that already exists in the RDBMS is an error.  In cBean, you simply overwrite the existing object.

@Property

The @Property annotation is used to indicate that a specific field (simple type, wrapper type, object type or list type) will be persisted.  POJO fields without the @Property annotation are ignored.  A number of attributes are valid on an @Property annotation:

  • cascadeSave
  • cascadeDelete
  • cascadeLoad
  • ignore
  • nullable

cascadeSave, cascadeDelete, and cascadeLoad are only relevant for POJO fields which are themselves POJOs, or lists of POJOs.  Setting "cascadeLoad=false", naturally, indicates that the field is lazy-loaded.

@Version

A POJO can define an Integer field and annotate it with @Version.  cBean will increment the annotated field by 1 each time the POJO is saved.
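
Putting the annotations together, an annotated POJO might look something like the sketch below.  To keep the example self-contained, it declares its own stand-in annotations with the names described above; these are not cBean's real annotation classes, and the boolean attribute types and default values are my assumptions.  Album and Track are hypothetical POJOs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;
import java.util.UUID;

// Stand-in annotations for illustration only; in real code, import cBean's own.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface Entity {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Id {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Version {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Property {
    // Attribute types and defaults below are assumptions.
    boolean cascadeSave() default false;
    boolean cascadeDelete() default false;
    boolean cascadeLoad() default false;
    boolean ignore() default false;
    boolean nullable() default true;
}

@Entity
class Track {
    @Id UUID id;               // the id must be a String or UUID
    @Property String title;
}

@Entity
class Album {
    @Id UUID id;
    @Version Integer version;  // incremented on each save
    @Property String name;

    // Saved and deleted along with the album, but lazy-loaded (cascadeLoad = false).
    @Property(cascadeSave = true, cascadeDelete = true, cascadeLoad = false)
    List<Track> tracks;

    String scratch;            // no @Property, so not persisted
}
```

The stand-ins use runtime retention because a persistence layer of this kind would typically discover the annotated fields by reflection.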

Referential Integrity

Frameworks like Hibernate or eBean provide referential integrity because RDBMSs provide referential integrity.  Key-Value stores such as RIAK do not provide referential integrity, and therefore neither does cBean.  It is therefore entirely possible to persist a POJO which contains a POJO, and have the contained POJO be deleted underneath the parent.  In an RDBMS this can be prevented with a foreign key; there is no such protection in cBean.  Application code using cBean must therefore be aware that POJOs in lists, for example, may not be resolvable when the list is reloaded.

The strategy that cBean uses for handling broken "foreign keys" is two-fold:

  • If a POJO contains a child POJO and the child is deleted, that Object will be set to null on reload
  • If a POJO contains a list of POJOs and one of the elements is deleted, the element Object will be set to null.  The List size() will remain the same as when it was persisted.
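
The practical consequence is that calling code has to expect null entries in reloaded lists.  The sketch below imitates that behaviour with a plain Map standing in for the Key-Value store; it is not cBean code, and BrokenReferenceDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical illustration of the "broken foreign key" strategy described above.
class BrokenReferenceDemo {

    // Resolves each stored id against the store.  An id whose object has been
    // deleted resolves to null, so the list keeps its persisted size.
    static List<String> reload(List<String> ids, Map<String, String> store) {
        List<String> out = new ArrayList<>();
        for (String id : ids) {
            out.add(store.get(id));
        }
        return out;
    }

    // What application code might do before using a reloaded list.
    static List<String> withoutDangling(List<String> reloaded) {
        return reloaded.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("t1", "Track 1");
        store.put("t3", "Track 3");   // "t2" was deleted underneath the parent

        List<String> reloaded = reload(Arrays.asList("t1", "t2", "t3"), store);
        System.out.println(reloaded);                  // prints [Track 1, null, Track 3]
        System.out.println(withoutDangling(reloaded)); // prints [Track 1, Track 3]
    }
}
```

Whether to drop the null entries, as here, or to treat them as an error is a decision for the application.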

Example Code

There is a working example at https://github.com/teverett/cbean/tree/master/example.

Building QEMU

In general, I install QEMU on my MacBook using MacPorts.  However, I recently needed the tip of the QEMU development tree.

Getting the QEMU source tree is trivial:

git clone git://git.qemu-project.org/qemu.git

I needed an updated version of dtc:

git submodule update --init dtc

The build instructions from the README are:

 mkdir build
 cd build
 ../configure
 make

However, in my case I only need ARM emulation, so:

../configure --target-list=arm-softmmu
make
make install

The binary qemu-system-arm will be in /usr/local/bin:

oscar:build tom$ /usr/local/bin/qemu-system-arm --version
QEMU emulator version 2.4.94, Copyright (c) 2003-2008 Fabrice Bellard

 

 

MusicBrainzTagger

I've tried a couple of different mp3 taggers to tag my mp3 library, but most seem to have trouble with large libraries.  So, after doing some reading about AcoustID and MusicBrainz, I decided to quickly code up my own tagger, MusicBrainzTagger.

MusicBrainzTagger is a command-line application which recurses through a directory of mp3 files and tags them one by one.  This approach allows it to handle very large libraries, because it only processes one file at a time.  File processing consists of reading any ID3 tags in the input mp3, and then calculating the AcoustID fingerprint.  The fingerprint is then resolved to a MusicBrainz ID, which is used to look up the recording.

MusicBrainzTagger then tags the file, renames it, and moves it to a new directory.
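
The one-file-at-a-time traversal can be sketched in Java with Files.walk, which produces paths lazily rather than building the whole file list up front.  This is not the MusicBrainzTagger source; Mp3Walker and its handler parameter are hypothetical, and the fingerprinting and tagging steps are left to whatever the handler does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical sketch of the directory recursion, not the actual tagger code.
class Mp3Walker {

    // Visits every regular file under root whose name ends in ".mp3"
    // (case-insensitive), passing each to handler in turn.
    // Returns the number of files visited.
    static int forEachMp3(Path root, Consumer<Path> handler) throws IOException {
        int[] count = {0};
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".mp3"))
                 .forEach(p -> {
                     handler.accept(p);   // e.g. read tags, fingerprint, look up, retag
                     count[0]++;
                 });
        }
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the root of the mp3 library.
        int n = forEachMp3(Paths.get(args[0]), p -> System.out.println("would tag " + p));
        System.out.println(n + " files visited");
    }
}
```

If the handler moves files into a directory that is itself under the root being walked, the moved files may be visited again, so the output directory is best kept outside the input tree.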