I’ve been interested in OS development for a while, and now have a prototype ARM OS in my private source tree. For that I used gcc-arm-embedded, which worked quite well. However, as time went on, I became interested in building my own toolchains. I started with this list of requirements:

- An up-to-date C/C++ compiler, such as gcc or clang
- I prefer gmake
- On i386 I like nasm, but I’ll use gas if necessary
- I like ld, objcopy, etc. from binutils
- The hosts I prefer to cross-compile from are OS X and FreeBSD
- I prefer to cross-compile to i386, arm, sparc and mips targets

I ended up with two shell scripts, one for GCC/Binutils/Make and one for Clang/Binutils/Make, which download and install the latest GCC/Clang/Binutils and create a fully operational toolchain for the target of my choice. You can find them here:

https://github.com/teverett/build_clang_crosscompiler.git
https://github.com/teverett/build_gcc_crosscompiler.git
Every IT geek is, to some degree, fascinated with the Apollo program, which put a human on the moon for the first time. Naturally, there is also curiosity about the computers on the Apollo moon lander, and the software that ran on them. The source code that went to the moon is available now, and you can take a look at it here.

I’m interested in the Apollo program, but I’m also interested in formal grammars, and I’m a committer to the Antlr project. So, I spent some time building an Antlr4 grammar for the Apollo source code. You can take a look at it here. The grammar can parse a number of files from the Solarium55 source code, which is the source code that flew on Apollo 4. If you’re keen you could try it on the Apollo 13 source code, called Artemis072, but you’d have to key in the source from jpg images of the form-feed printouts (here).

It’s natural to ask why an Antlr4 grammar for AGC source code would be useful. In addition to the obvious “because that goal will serve to organize and measure the best of our energies and skills”, it’s the first step in building a simulator. There is already an excellent C simulator here, and there are numerous JS ones on the web, but I thought it might be helpful to have an Antlr4 grammar that can generate parsers and lexers for new simulators in other languages. Also, it was very interesting to learn about the AGC computer and to see how software development has progressed since the 1960s.
Recently I had reason to get interested in process modelling. Ultimately I ended up writing an Antlr4 grammar for Modelica (here), but in the meantime I came up with SML (Simple Modeling Language). The Antlr4 grammar is sml.g4. The characteristics I wanted in my own modeling language were:

- Ability to define models as text files
- Models should be as Object Oriented as possible
- Ability to compose models. That is, the ability to have models that include models.
- Ability to define variables that are internal to models and variables that are exposed by models (i.e. “ports”)
- Ability to put models in namespaces
- Ability to define equations in models that relate the variables. The equations should be expressed in standard form.
- Support for differential equations is essential

SML accomplishes these goals. An example SML model is a standard spring from 1st year Engineering, here. The model file is:

    model tge.spring;
    #
    # vars
    #
    variables:
        # spring constant
        public k;
        # force difference
        public df;
        # distance difference
        public x;
    #
    # Equations
    #
    equations:
        df:= k*x;

This model is in the namespace “tge.spring”. It exposes three public variables “k”, “df” and “x”. The relationship between the variables is “df=k*x”. There is a simple example of a standard pendulum here. More complex examples are here. One such example is a classic RC circuit.
This model defines the structure of the circuit itself, and references the resistor, capacitor and source via imports of those models from their own SML model files:

    model tge.rc1;
    #
    # A simple RC model
    #
    #
    # ---- R1 ---- C1 ----
    # |                  |
    # V                  |
    # |------------------|
    #
    #
    imports:
        tge.resistor;
        tge.capacitor;
        tge.vsource;
    variables:
        # declare a resistor instance called “R1”
        component tge.resistor R1;
        # declare a capacitor instance called “C1”
        component tge.capacitor C1;
        # declare a vsource instance called “V”
        component tge.vsource V;
    equations:
        # set dV of tge.vsource to 5V
        voltage:=V.dV-5;
        # set R of tge.resistor to 10 ohms
        resistance:=R1.R-10;
        # set C of tge.capacitor to 100 F
        capacitance:=C1.C-100;
        # connect the +ve end of V to R1
        positiveConnection:= V.v1-R1.v1;
        # connect the resistor to the capacitor
        resistors:=R1.v2-C1.v1;
        # connect the -ve end of V
        groundConnection:= V.v2-C1.v2;

Ultimately, with Antlr4 it should be possible to generate model parsers in Java, C# and potentially C++ that can consume SML files, check that the model composition is reasonable, and generate input files for mathematical solvers. The work of producing solver input files from SML models is essentially the work of collapsing an object tree to a flat model.
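The "collapsing an object tree to a flat model" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Model` class, the `flatten` function and the `library` dictionary are hypothetical names, not part of SML or of any generated parser, and equations are carried as plain text rather than parsed.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical in-memory form of one SML model file."""
    name: str
    variables: list = field(default_factory=list)    # variable names
    components: dict = field(default_factory=dict)   # instance name -> model name
    equations: list = field(default_factory=list)    # equation text, unparsed


def flatten(model_name, library, prefix=""):
    """Expand every component instance, returning flat variables and equations.

    Each variable is qualified with the instance path that leads to it, and
    each equation is paired with the prefix of the model that declared it.
    """
    model = library[model_name]
    variables = [prefix + v for v in model.variables]
    equations = [(prefix, eq) for eq in model.equations]
    for instance, child in model.components.items():
        child_vars, child_eqs = flatten(child, library, prefix + instance + ".")
        variables.extend(child_vars)
        equations.extend(child_eqs)
    return variables, equations


# A cut-down version of the RC example (the voltage source is omitted).
library = {
    "tge.resistor": Model("tge.resistor", ["R", "v1", "v2"]),
    "tge.capacitor": Model("tge.capacitor", ["C", "v1", "v2"]),
    "tge.rc1": Model(
        "tge.rc1",
        components={"R1": "tge.resistor", "C1": "tge.capacitor"},
        equations=["R1.v2-C1.v1"],
    ),
}

flat_vars, flat_eqs = flatten("tge.rc1", library)
```

After flattening, `flat_vars` holds names such as `R1.R` and `C1.v1`, which is roughly the shape a solver input file would need.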
Unix version 0 was written in 1969 by Ken Thompson, on a PDP-7. Recently, the source code for Unix V0 has been discovered, and you can read it here, as pdf scans of printouts. You can read about the discovery, and the effort to boot Unix V0 on a real PDP-7, here. The project home page is here. I got interested in PDP-7 Unix, and then in PDP-7 assembler. Eventually, I wrote an Antlr4 grammar to parse PDP-7 assembler files in the original “as” format in which Thompson wrote them, here. The resulting grammar is here.
In general, I install QEMU on my Macbook using MacPorts. However, I recently had a need to get the tip of the QEMU development tree. Getting the QEMU source tree is trivial:

    git clone git://git.qemu-project.org/qemu.git

I needed an updated version of dtc:

    git submodule update --init dtc

The build instructions from the README are:

    mkdir build
    cd build
    ../configure
    make

However, in my case I only need ARM emulation, so:

    ../configure --target-list=arm-softmmu
    make
    make install

The binary qemu-system-arm will be at /usr/local/bin:

    oscar:build tom$ /usr/local/bin/qemu-system-arm --version
    QEMU emulator version 2.4.94, Copyright (c) 2003-2008 Fabrice Bellard
I’ve recently become interested in how the i386 boot loader works. There is an excellent example of a boot loader here, and another here. Some simple protected-mode code which implements a kernel capable of writing a line of text is here. FemtoOS is the culmination of combining code from all those, and building a simple boot loader plus a protected-mode kernel that outputs text. In my case, I’m compiling the kernel on FreeBSD.

The bootloader is in boot.asm. Like any i386 boot loader, it’s loaded by the BIOS at address 0x7C00, in 16-bit real mode. It starts by clearing the screen, outputting a message, and then loading kernel.bin from the disk. kernel.bin is loaded at address 0x1000. The bootloader then sets up a GDT, enters protected mode and passes control to kernel.bin at address 0x1000.

Looking at the documentation for BIOS interrupt 13h here, it’s clear that a single sector has 512 bytes. boot.bin is exactly 512 bytes long and the floppy image was created using this code from Makefile:

    cat boot.bin kernel.bin /dev/zero | dd bs=512 count=2880 of=floppy.img

Therefore kernel.bin starts at the second sector on the disk. This code from boot.asm reads 18 sectors, starting at sector 2, into RAM at 0x1000. The largest kernel.bin can be, therefore, is 512*18 bytes = 9KB.

    mov ax, 0
    mov es, ax
    mov bx, 0x1000   ; Destination address = 0000:1000
    mov ah, 02h      ; READ SECTOR-command
    mov al, 12h      ; Number of sectors to read (0x12 = 18 sectors)
    mov dl, [drive]  ; Load boot disk
    mov ch, 0        ; Cylinder = 0
    mov cl, 2        ; Starting Sector = 2
    mov dh, 0        ; Head = 0
    int 13h          ; Call interrupt 13h

kernel.bin is linked from the object files created from loader.asm, main.c and video.c. When the bootloader passes control to kernel.bin, it starts at loader.asm, which in turn passes control to main() from main.c. Note that while boot.asm contains both 16-bit and 32-bit code, loader.asm contains only 32-bit code; it is called after boot.asm has put the host in 32-bit protected mode.
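The floppy layout described above (one 512-byte boot sector, then the kernel, then zero padding up to 2880 sectors) can be sketched in Python as a cross-check on the arithmetic. This is a minimal sketch, not the FemtoOS build: the function name `build_floppy_image` and the constants are illustrative assumptions, and it does the same job as the `cat ... | dd` line only for inputs that fit.

```python
SECTOR_SIZE = 512       # bytes per sector, per the INT 13h documentation
FLOPPY_SECTORS = 2880   # sectors in the image, matching dd count=2880
KERNEL_SECTORS = 18     # sectors that boot.asm reads (al = 12h)


def build_floppy_image(boot: bytes, kernel: bytes) -> bytes:
    """Lay out boot sector + kernel + zero padding as one floppy image."""
    if len(boot) != SECTOR_SIZE:
        raise ValueError("boot sector must be exactly 512 bytes")
    if len(kernel) > KERNEL_SECTORS * SECTOR_SIZE:
        raise ValueError("kernel is larger than the 18 sectors the loader reads")
    image = boot + kernel
    padding = FLOPPY_SECTORS * SECTOR_SIZE - len(image)
    return image + bytes(padding)


# The kernel begins right after the boot sector, i.e. at byte offset 512,
# which is sector 2 in the 1-based numbering that INT 13h uses.
image = build_floppy_image(bytes(SECTOR_SIZE), b"KERNEL")
```

The size check is the useful part: a kernel.bin that grows past 18 * 512 bytes would still be written to the image by `dd`, but the loader would only read the first 18 sectors of it.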
This code sets up the GDT:

    xor ax, ax       ; Clear AX register
    mov ds, ax       ; Set DS-register to 0 - used by lgdt
    lgdt [gdt_desc]  ; Load the GDT descriptor

and this code puts the machine into protected mode, followed by passing control to loader.asm. Note that immediately after putting the machine into protected mode, a jmp instruction is issued to the label “kernel_segments”, which is the first 32-bit instruction executed on boot. From this point on, we are in 32-bit protected mode.

    mov eax, cr0     ; Copy the contents of CR0 into EAX
    or eax, 1        ; Set bit 0 (PE) to enable protected mode
    mov cr0, eax     ; Copy the contents of EAX into CR0
    jmp 08h:kernel_segments ; Jump to code segment, offset kernel_segments

    [BITS 32]        ; We now need 32-bit instructions
    kernel_segments:
    mov ax, 10h      ; Save data segment identifier
    mov ds, ax       ; Move a valid data segment into the data segment register
    mov ss, ax       ; Move a valid data segment into the stack segment register
    mov esp, 090000h ; Move the stack pointer to 090000h
    jmp 08h:0x1000   ; Jump to section 08h (code), offset 01000h

The code in loader.asm that calls the C code main() is pretty simple:

    start:
    call main        ; Call our kernel’s main() function

There is good documentation for the protected mode text console here. The simple implementation of this is in video.c. There is a floppy disk image here, which boots in both qemu and VMware.
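The values 08h and 10h used in the jumps and segment loads above are x86 segment selectors. A small Python sketch shows how a selector splits into a descriptor table index, a table indicator bit and a requested privilege level; the function name `decode_selector` is an illustrative assumption, while the bit layout is the standard x86 one.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,                       # entry number in the table
        "table": "LDT" if selector & 0x4 else "GDT",  # bit 2: table indicator
        "rpl": selector & 0x3,                        # bits 0-1: privilege level
    }


code_selector = decode_selector(0x08)  # used by jmp 08h:...
data_selector = decode_selector(0x10)  # loaded into ds and ss
```

Read this way, 08h refers to GDT entry 1 and 10h to GDT entry 2, both at privilege level 0, which matches a GDT laid out as null descriptor, code segment, data segment.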
Back in the dark ages (the 1980s), people like myself coded on Apple][ computers. If you were good you coded in Applesoft BASIC or Integer BASIC. If you were really geeky you coded in assembly language on the 6502 processor. The Apple][ OS was coded in assembly language, so if you really wanted to understand what was going on inside your computer, you needed to learn assembly language and you needed to learn about the 6502.

Obviously, the first thing you would do was write your own code to read and write disks. It was a big deal in those days to be able to copy 5.25″ floppies that had games on them. For the serious geeks, it was much less interesting to play the games, and much more interesting to figure out how to copy them. One technique game manufacturers used was to write data between the tracks, by moving the drive arm in 1/2-track and 1/4-track steps. Another technique was to modify the sector header bytes from the usual $D5 $AA $96 to something else. This change prevented standard Apple DOS from finding the sectors, and therefore made them unreadable to anyone other than the manufacturer of the game.

In order to be able to inspect disks, read sectors, read between tracks and so on, I wrote a small program that would enable me to inspect disks easily. I was young, so I simply called the program “M”. You can see the source here. It’s most likely that I used the LISA assembler, judging by the syntax of the source code. “M” allowed me to put a disk in the floppy drive and then, using keyboard commands, navigate through the disk and look at the contents at a byte level.

There are lots of other 6502 assembler programs around the internet, including a nice archive at 6502.org. I had a lazy Saturday afternoon, so I decided to write an ANTLR4 grammar for LISA assembler. You can see it here. This grammar produces Java or C# (if you have the C# ANTLR Target) code which can parse LISA assembler. It’s the first step to writing a Java or C# assembler for 6502 assembly code.
ANTLR has a useful feature where it can parse input and produce LISP-like output showing the fully parsed program. This feature is primarily useful for debugging; it’s an easy way to look at the AST in text format. Here is the LISP-like output from running my ANTLR grammar on “M”. Of course, the parser and lexer that ANTLR produces in Java or C# do not output this LISP-like string; they produce an AST. A proper assembler would then walk that AST and output binary opcodes.

Here is an example of Bubble sort, coded in assembler. This was cut-pasted from 6502.org.

    ;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING
    ;ORDER. THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND
    ;$31. THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST. LOCATION
    ;$32 IS USED TO HOLD AN EXCHANGE FLAG.

    SORT8    LDY #$00      ;TURN EXCHANGE FLAG OFF (= 0)
             STY $32
             LDA ($30),Y   ;FETCH ELEMENT COUNT
             TAX           ; AND PUT IT INTO X
             INY           ;POINT TO FIRST ELEMENT IN LIST
             DEX           ;DECREMENT ELEMENT COUNT
    NXTEL    LDA ($30),Y   ;FETCH ELEMENT
             INY
             CMP ($30),Y   ;IS IT LARGER THAN THE NEXT ELEMENT?
             BCC CHKEND
             BEQ CHKEND
                           ;YES. EXCHANGE ELEMENTS IN MEMORY
             PHA           ; BY SAVING LOW BYTE ON STACK.
             LDA ($30),Y   ; THEN GET HIGH BYTE AND
             DEY           ; STORE IT AT LOW ADDRESS
             STA ($30),Y
             PLA           ;PULL LOW BYTE FROM STACK
             INY           ; AND STORE IT AT HIGH ADDRESS
             STA ($30),Y
             LDA #$FF      ;TURN EXCHANGE FLAG ON (= -1)
             STA $32
    CHKEND   DEX           ;END OF LIST?
             BNE NXTEL     ;NO. FETCH NEXT ELEMENT
             BIT $32       ;YES. EXCHANGE FLAG STILL OFF?
             BMI SORT8     ;NO. GO THROUGH LIST AGAIN
             RTS           ;YES. LIST IS NOW ORDERED

The LISP-ish output produced by parsing this with my grammar looks like:

(prog (line (comment ;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING)) \n (line (comment ;ORDER. THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND)) \n (line (comment ;$31. THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST.
LOCATION)) \n (line (comment ;$32 IS USED TO HOLD AN EXCHANGE FLAG.)) \n \n (line (instruction (label (name SORT8)) (opcode LDY) (argumentlist (argument (prefix #) (number $00))) (comment ;TURN EXCHANGE FLAG OFF (= 0)))) \n (line (instruction (opcode STY) (argumentlist (argument (number $32))))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT COUNT))) \n (line (instruction (opcode TAX) (comment ; AND PUT IT INTO X))) \n (line (instruction (opcode INY) (comment ;POINT TO FIRST ELEMENT IN LIST))) \n (line (instruction (opcode DEX) (comment ;DECREMENT ELEMENT COUNT))) \n (line (instruction (label (name NXTEL)) (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT))) \n (line (instruction (opcode INY))) \n (line (instruction (opcode CMP) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;IS IT LARGER THAN THE NEXT ELEMENT?))) \n (line (instruction (opcode BCC) (argumentlist (argument (name CHKEND))))) \n (line (instruction (opcode BEQ) (argumentlist (argument (name CHKEND))))) \n (line (comment ;YES. 
EXCHANGE ELEMENTS IN MEMORY)) \n (line (instruction (opcode PHA) (comment ; BY SAVING LOW BYTE ON STACK.))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ; THEN GET HIGH BYTE AND))) \n (line (instruction (opcode DEY) (comment ; STORE IT AT LOW ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode PLA) (comment ;PULL LOW BYTE FROM STACK))) \n (line (instruction (opcode INY) (comment ; AND STORE IT AT HIGH ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode LDA) (argumentlist (argument (prefix #) (number $FF))) (comment ;TURN EXCHANGE FLAG ON (= -1)))) \n (line (instruction (opcode STA) (argumentlist (argument (number $32))))) \n (line (instruction (label (name CHKEND)) (opcode DEX) (comment ;END OF LIST?))) \n (line (instruction (opcode BNE) (argumentlist (argument (name NXTEL))) (comment ;NO. FETCH NEXT ELEMENT))) \n (line (instruction (opcode BIT) (argumentlist (argument (number $32))) (comment ;YES. EXCHANGE FLAG STILL OFF?))) \n (line (instruction (opcode BMI) (argumentlist (argument (name SORT8))) (comment ;NO. GO THROUGH LIST AGAIN))) \n (line (instruction (opcode RTS) (comment ;YES. LIST IS NOW ORDERED))) \n) If you have Apple][ code on floppies of your own and you wish to retrieve it, I used a device from here. It worked very well.
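For readers who do recover raw disk data, the sector header check mentioned earlier ($D5 $AA $96 versus a modified sequence) is easy to sketch in Python. This is an illustration only: it assumes the track has already been captured as a plain sequence of disk bytes, and the names `ADDRESS_PROLOGUE` and `find_prologues` are hypothetical. It is not how “M” worked, and it ignores wrap-around at the end of the track.

```python
ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])  # the usual header bytes


def find_prologues(track: bytes, prologue: bytes = ADDRESS_PROLOGUE) -> list:
    """Return every offset in the track where the prologue bytes occur."""
    offsets = []
    start = track.find(prologue)
    while start != -1:
        offsets.append(start)
        start = track.find(prologue, start + 1)
    return offsets


# A made-up fragment: two standard headers and one altered header.
sample = bytes([0xFF, 0xFF, 0xD5, 0xAA, 0x96, 0x00,
                0xD5, 0xAA, 0xB5, 0xD5, 0xAA, 0x96])
standard = find_prologues(sample)
altered = find_prologues(sample, bytes([0xD5, 0xAA, 0xB5]))
```

A track where the standard search finds nothing but a search for some other three-byte sequence finds regularly spaced hits is a hint that the header bytes were changed.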
After FASTA, I got interested in gff3. The link to the grammar for this file is here: antlr/grammars-v4. You can read about gff3 here.
I’ve been hacking away trying to set up Crochet-BSD to build boot images for Wandboard. Wandboard uses a Cortex-A9 processor, so it’s ARM. The Linux distros for it use U-Boot, so that seemed like a likely place to start for FreeBSD. Looking at the Wandboard Wiki, there is an article on U-Boot which explains how to build U-Boot for Wandboard.

Firstly, download the October 2013 release of U-Boot here. If you are building on FreeBSD, you’ll need this patch to the Makefile to include libc. Then, build U-Boot for Wandboard:

    make wandboard_quad_config
    make

and copy the imx file to the SD card:

    sudo dd if=u-boot.imx of=/dev/mmcblk0 bs=1 seek=1024

This writes the imx file 1K into the disk, which is where the i.MX6 processor expects to find the U-Boot image. The processor manual is here. It specifies offset 0x400 as the Program Image start on the SD card. The U-Boot startup looks like this:

    CPU: Freescale i.MX6Q rev1.2 at 792 MHz
    Reset cause: POR
    Board: Wandboard
    DRAM: 2 GiB
    MMC: FSL_SDHC: 0, FSL_SDHC: 1
    *** Warning - bad CRC, using default environment
    In: serial
    Out: serial
    Err: serial
    Net: FEC [PRIME]
    Hit any key to stop autoboot: 1 0
    mmc0 is current device
    SD/MMC found on device 0

Keep in mind that the default serial settings for Wandboard are 8n1 at 115200 baud. Once U-Boot is started, you will need to boot the FreeBSD kernel. My disk image is partitioned with a FAT partition on which I have kernel.bin, and a UFS partition which is the FreeBSD root file system. kernel.bin is the raw kernel. The partition table looks like this:

    Disk: /dev/rdisk1 geometry: 3880/255/63 [62333952 sectors]
    Signature: 0xAA55
             Starting       Ending
     #: id  cyl  hd sec -  cyl  hd sec [     start -       size]
    ------------------------------------------------------------------------
    *1: 0C    8   5   1 -   58  29  63 [     16443 -     102375] Win95 FAT32L
     2: A5   58  30   1 -  992   1  63 [    118818 -    1881180] FreeBSD
     3: 00    0   0   0 -    0   0   0 [         0 -          0] unused
     4: 00    0   0   0 -    0   0   0 [         0 -          0] unused

You should notice that the FAT partition starts at 16443.
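The relationship between the `dd` offset and the partition start can be checked with a short Python sketch. It assumes 512-byte sectors and works on an in-memory copy of the disk; the name `place_uboot` and its parameters are illustrative, not part of U-Boot or Crochet-BSD.

```python
SECTOR_SIZE = 512     # assumed sector size for the SD card
UBOOT_OFFSET = 0x400  # 1024 bytes, the same as dd bs=1 seek=1024


def place_uboot(disk: bytearray, uboot: bytes, first_partition_lba: int) -> None:
    """Copy the U-Boot image into the disk at offset 0x400.

    Refuses to write if the image would run into the first partition,
    or past the end of the disk buffer.
    """
    end = UBOOT_OFFSET + len(uboot)
    if end > first_partition_lba * SECTOR_SIZE:
        raise ValueError("U-Boot image would overlap the first partition")
    if end > len(disk):
        raise ValueError("disk image is too small")
    disk[UBOOT_OFFSET:end] = uboot


# Space available to U-Boot when the first partition starts at sector 16443.
available = 16443 * SECTOR_SIZE - UBOOT_OFFSET
```

With the partition table shown above, `available` works out to a little over 8 MB, so there is plenty of room between offset 0x400 and the start of the FAT partition.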
I’ve moved the start of the FAT partition further into the disk to accommodate U-Boot at disk offset 0x400. Once you’ve got an SD card with U-Boot and kernel.bin on it, and you’ve booted to U-Boot, you’ll want to start the kernel. At the U-Boot console, load kernel.bin into memory at address 0x12000000:

    fatload mmc 0:1 12000000 kernel.bin

The address 0x12000000 is specified in the FreeBSD ARM configuration for Freescale. Check here. Once the kernel is loaded, go ahead and start it:

    go 12000000

This will start FreeBSD:

    KDB: debugger backends: ddb
    KDB: current backend: ddb
    Copyright (c) 1992-2013 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 11.0-CURRENT #0 r259276: Thu Dec 12 17:20:23 MST 2013
    tom@bernice:/storage/home/tom/crochet/crochet-wandboard/crochet-freebsd/work/obj/arm.arm/storage/home/tom/crochet/src/FreeBSDHead/head/sys/WANDBOARD-QUAD arm
    FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
    Preloaded elf kernel “kernel” at 0xc24afdac.
    CPU: Cortex A9-r2 rev 10 (Cortex-A core)
    Supported features: ARM_ISA THUMB2 JAZELLE THUMBEE ARMv4 Security_Ext
    WB disabled EABT branch prediction enabled

In my case, I’ve run into a kernel crash starting the kernel, so I don’t have a fully working system yet, but I do have a working boot loader. The Crochet-BSD code to build an image file containing U-Boot, a partitioned disk image, kernel.bin and the userland is here:
I’ve been reading lots of interesting information about iTunes and iOS, so I thought I would investigate just what is in an iTunes backup. Typically, on OS X, you can find your iTunes backup here, under the appropriate OS X user profile:

    ~/Library/Application Support/MobileSync/Backup

When you look at the backup, it’s a giant list of 40-character hexadecimal file names. According to some quick reading on theiphonewiki.com, those file names are SHA-1 hashes. Each file is a backed-up file from the iPhone. The problem of restoring the file system, then, is that we need to find the original file names from the hashes. Luckily, Apple provides an index. There is a file called “Manifest.mbdb” which is a binary index of all the SHA-1 files. There is a pretty good description of the format of that file here.

After reading Manifest.mbdb into memory, we have enough information to generate all the SHA-1 hashes. From there, we can match the generated hashes to the file names on the file system, and we have enough information to regenerate the backed-up file system. Once we have the file system, it’s interesting to look around and find out what information was actually backed up. Here are some highlights:

- SMS messages: “Library/SMS/sms.db”. This is a sqlite database.
- Address Book: “Library/AddressBook/AddressBook.sqlitedb”. sqlite database.
- Notes: “Library/notes/notes.sqlite”. sqlite database.
- Call History: “Library/CallHistory/call_history.db”. sqlite database.
- Photos: “Media/DCIM/”. File system of JPG files.
- SMS photos: “Library/SMS/Attachments”. File system of JPG files.
- Safari bookmarks: “Library/Safari/Bookmarks.db”. sqlite database.

I have working proof-of-concept code; however, in the interest of being a good guy, I’m keeping it private.
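The hash-matching step on its own is simple enough to sketch without any of the Manifest.mbdb parsing. The Python below is a minimal sketch under an assumption: that each index record yields a domain and a relative path, and that the backup file name is the SHA-1 of the two joined with a hyphen, which is how I understand the format described on theiphonewiki.com. The function names and the “HomeDomain” example value are illustrative.

```python
import hashlib


def backup_file_name(domain: str, relative_path: str) -> str:
    """Return the 40-character hex name assumed for one backed-up file."""
    key = f"{domain}-{relative_path}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


def match_backup_files(records, names_on_disk):
    """Map hashed names found in the backup folder back to original paths.

    records is an iterable of (domain, relative_path) pairs taken from the
    index; names_on_disk is the list of file names in the backup folder.
    """
    on_disk = set(names_on_disk)
    restored = {}
    for domain, path in records:
        name = backup_file_name(domain, path)
        if name in on_disk:
            restored[name] = path
    return restored


# Example with an assumed domain name; only the first record is "on disk".
records = [("HomeDomain", "Library/SMS/sms.db"),
           ("HomeDomain", "Library/notes/notes.sqlite")]
sms_name = backup_file_name(*records[0])
restored = match_backup_files(records, [sms_name, "not-a-real-hash"])
```

Using a set for the names on disk keeps the matching step linear in the number of index records, which matters little for one phone but keeps the sketch honest for large backups.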