I recently needed to parse a large data set for another project, and I decided that my saved Outlook mail archives would make an ideal one. Sadly, Outlook for Mac doesn't export PST files; it exports OLM archives, which are essentially giant zip files full of XML. I was writing everything in Java, so I needed a Java library to parse OLM files.
The resulting source code is here. Schema for OLM XML is here.
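Since an OLM archive is essentially a zip file full of XML, the basic idea can be sketched with nothing but the JDK. The snippet below is only a minimal illustration, not the library linked above: the class name `OlmSketch` and its two methods are hypothetical, and it assumes nothing about the OLM schema beyond "zip entries ending in `.xml` that parse as XML".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; not part of the actual OLM library.
final class OlmSketch {

    /** Returns the names of all non-directory entries ending in ".xml". */
    public static List<String> listXmlEntries(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        // An OLM file is treated here as an ordinary zip archive.
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".xml")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    /** Parses one entry as XML and returns the name of its root element. */
    public static String rootElementName(Path archive, String entryName) throws Exception {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                DocumentBuilder builder =
                        DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = builder.parse(in);
                return doc.getDocumentElement().getNodeName();
            }
        }
    }
}
```

Calling `OlmSketch.listXmlEntries(path)` and then `OlmSketch.rootElementName(path, name)` for each result is a quick way to see what kinds of documents an archive contains. For a really large archive, a streaming parser (SAX or StAX) would probably be a better fit than building a DOM for every entry.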