Cross-compiling NetBSD on FreeBSD-10

I was curious to see how difficult it would be to cross-compile NetBSD on FreeBSD 10.  Once you have the NetBSD source, there is a build.sh script which is very helpful.  The build is as simple as:

./build.sh -m i386 tools
./build.sh -m i386 kernel=GENERIC release

In my case I have both clang and gcc installed.  I believe the cross-compile tools were built with clang.  The resulting gcc cross-compiler "i486--netbsdelf-gcc" was used to build the NetBSD source.

The generated binaries appear in obj/releasedir/i386/binary/.

Enabling i2c on FreeBSD Wandboard

FreeBSD-11 has support for i2c on Wandboard.

The working boot image is here, and the boot log is here.

In order to enable it, there were three steps:

  • Enable the kernel options
  • Update the files.imx6 file
  • Create the DTS mappings

The kernel config for the Wandboard Quad is here; however, most of the kernel options for the Wandboard Quad are in the IMX6 config. The kernel options to enable are:

device fsliic # Freescale i2c/iic
device iic # iic protocol
device iicbus # iic bus

The file files.imx6 contains the mappings between the devices in the conf file and the files to compile for each device.  In order to compile in the fsliic device for Freescale, I need to enable the mapping for it, so that /sys/arm/freescale/imx/i2c.c is compiled in:

arm/freescale/imx/i2c.c optional fsliic

Finally, I need to update the DTS files.  The file wandboard-quad.dts contains the device mappings for the Wandboard Quad; however, similar to the kernel configuration, all the real mappings are in imx6.dtsi.  There are three i2c controllers on the Wandboard Quad, so I added these mappings:

i2c@021a0000 {
    #address-cells = <1>;
    #size-cells = <0>;
    compatible = "fsl,imx-i2c";
    reg = <0x021a0000 0x4000>;
    interrupt-parent = <&gic>;
    interrupts = <68>;
};

i2c@021a4000 {
    #address-cells = <1>;
    #size-cells = <0>;
    compatible = "fsl,imx-i2c";
    reg = <0x021a4000 0x4000>;
    interrupt-parent = <&gic>;
    interrupts = <69>;
};

i2c@021a8000 {
    #address-cells = <1>;
    #size-cells = <0>;
    compatible = "fsl,imx-i2c";
    reg = <0x021a8000 0x4000>;
    interrupt-parent = <&gic>;
    interrupts = <70>;
};

The memory locations (0x021a0000, 0x021a4000, 0x021a8000) as well as the interrupts (68, 69, 70) come from the IMX6 documentation.  This line

compatible = "fsl,imx-i2c";

is the line that the i2c.c driver uses in probe() to determine that this hardware is a compatible Freescale IMX i2c device.

Once the kernel is compiled and booted, it will detect the 3 iic devices.  The relevant lines from the boot log are:

iichb0: <I2C bus controller> mem 0x21a0000-0x21a3fff irq 68 on simplebus2
iicbus0: <OFW I2C bus> on iichb0
iic0: <I2C generic I/O> on iicbus0
iichb1: <I2C bus controller> mem 0x21a4000-0x21a7fff irq 69 on simplebus2
iicbus1: <OFW I2C bus> on iichb1
iic1: <I2C generic I/O> on iicbus1
iichb2: <I2C bus controller> mem 0x21a8000-0x21abfff irq 70 on simplebus2
iicbus2: <OFW I2C bus> on iichb2
iic2: <I2C generic I/O> on iicbus2

You should have 3 iic devices in the /dev directory:

crw------- 1 root wheel 0x20 Jul 6 01:23 iic0
crw------- 1 root wheel 0x21 Jul 6 01:23 iic1
crw------- 1 root wheel 0x22 Jul 6 01:23 iic2

You can check which devices are specified in the DTS file using ofwdump:

root@wandboard:/dev # ofwdump -a
Node 0x38:
Node 0xa8: cpus
Node 0xd4: cpu@0
Node 0x190: aliases
Node 0x1bc: soc@00000000
Node 0x230: generic-interrupt-controller@00a00100
Node 0x2cc: mp_tmr0@00a00200
Node 0x348: l2-cache@00a02000
Node 0x3d0: aips@02000000
Node 0x458: ccm@020c4000
Node 0x4b4: anatop@020c8000
Node 0x520: timer@02098000
Node 0x594: gpio@0209c000
Node 0x668: gpio@020a0000
Node 0x71c: gpio@020a4000
Node 0x7f0: gpio@020a8000
Node 0x8a4: gpio@020ac000
Node 0x958: gpio@020b0000
Node 0xa0c: gpio@020b4000
Node 0xac0: serial@02020000
Node 0xb4c: serial@021e8000
Node 0xbdc: serial@021ec000
Node 0xc6c: serial@021f0000
Node 0xcfc: serial@021f4000
Node 0xd8c: usbphy@020c9000
Node 0xe2c: usbphy@020ca000
Node 0xed0: aips@02100000
Node 0xf58: ethernet@02188000
Node 0xfec: usb@02184000
Node 0x1088: usb@02184200
Node 0x1124: usb@02184400
Node 0x11b4: usb@02184600
Node 0x1244: usbmisc@02184800
Node 0x12c4: usdhc@02190000
Node 0x1368: usdhc@02194000
Node 0x1404: usdhc@02198000
Node 0x14a8: usdhc@0219c000
Node 0x1538: i2c@021a0000
Node 0x15d0: i2c@021a4000
Node 0x1668: i2c@021a8000
Node 0x1700: ocotp@021bc000
Node 0x1750: memory
Node 0x1774: chosen

You can also check that the iic modules are compiled into the kernel using kldstat:

root@wandboard:/usr/home/tom # kldstat -v | grep iic
131 i2c/iicbus
13 iichb/iicbus
12 iicbus/iic
55 iichb/ofw_iicbus
54 iicbb/ofw_iicbus

Once you have a device connected to the Wandboard, you can use the i2c command to access the devices.
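
For example, i2c(8) can scan a bus for devices and read bytes from one of them.  The commands below are just a sketch of that: the slave address 0x50 and the 16-byte read are placeholders for whatever part you actually have wired up.

i2c -f /dev/iic0 -s
i2c -f /dev/iic0 -a 0x50 -d r -c 16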


6502 Assembler

Back in the dark ages (the 1980s), people like myself coded on Apple][ computers.  If you were good, you coded in Applesoft BASIC or Integer BASIC. If you were really geeky, you coded in assembly language on the 6502 processor.  The Apple][ OS was coded in assembly language, so if you really wanted to understand what was going on inside your computer, you needed to learn assembly language and you needed to learn about the 6502.

Obviously, the first thing you would do was write your own code to read and write disks.  It was a big deal in those days to be able to copy 5.25" floppies that had games on them.  For the serious geeks, it was much less interesting to play the games, and much more interesting to figure out how to copy them.  One technique game manufacturers used was to write data between the tracks, by moving the drive arm in half-track and quarter-track steps.  Another technique was to modify the sector header bytes from the usual $D5 $AA $96 to something else.  This change prevented standard Apple DOS from finding the sectors, and therefore made them unreadable to anyone other than the manufacturer of the game.

In order to be able to inspect disks, read sectors, read between tracks and so on, I wrote a small program that made this easy.  I was young, so I simply called the program "M".  You can see the source here; judging by the syntax of the source code, it's most likely that I used the LISA assembler.  "M" allowed me to put a disk in the floppy drive and then, using keyboard commands, navigate through the disk and look at its contents at the byte level.

There are lots of other 6502 assembler programs around the internet, including a nice archive at 6502.org.  I had a lazy Saturday afternoon, so I decided to write an ANTLR4 grammar for LISA assembler.  You can see it here.  This grammar produces Java or C# (if you have the C# ANTLR Target) code which can parse LISA assembler.  It's the first step to writing a Java or C# assembler for 6502 assembly code.

ANTLR has a useful feature where it can parse input and produce LISP-like output showing the fully parsed program.  This feature is primarily useful for debugging; it's an easy way to look at the AST in text format.  Here is the LISP-like output from running my ANTLR grammar on "M".  Of course, the parser and lexer that ANTLR produces in Java or C# do not output this LISP-like string; they produce an AST. A proper assembler would then walk that AST and output binary opcodes.
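
To give an idea of how little code it takes to produce that dump, here is a sketch of driving the generated Java parser.  The class names LISALexer and LISAParser are assumptions (they depend on the name of the grammar file); prog is the start rule, as you can see from the "(prog ..." output.

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;

public class DumpTree {
    public static void main(String[] args) throws Exception {
        // Lexer/parser class names are whatever ANTLR generates from the grammar file.
        LISALexer lexer = new LISALexer(CharStreams.fromFileName(args[0]));
        LISAParser parser = new LISAParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.prog();                 // parse, starting at the prog rule
        System.out.println(tree.toStringTree(parser));  // the LISP-like dump shown below
    }
}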

Here is an example of bubble sort, coded in 6502 assembler.  It was cut and pasted from 6502.org.

;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING
;ORDER. THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND
;$31. THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST. LOCATION
;$32 IS USED TO HOLD AN EXCHANGE FLAG.

SORT8  LDY #$00     ;TURN EXCHANGE FLAG OFF (= 0)
       STY $32
       LDA ($30),Y  ;FETCH ELEMENT COUNT
       TAX          ; AND PUT IT INTO X
       INY          ;POINT TO FIRST ELEMENT IN LIST
       DEX          ;DECREMENT ELEMENT COUNT
NXTEL  LDA ($30),Y  ;FETCH ELEMENT
       INY
       CMP ($30),Y  ;IS IT LARGER THAN THE NEXT ELEMENT?
       BCC CHKEND
       BEQ CHKEND
;YES. EXCHANGE ELEMENTS IN MEMORY
       PHA          ; BY SAVING LOW BYTE ON STACK.
       LDA ($30),Y  ; THEN GET HIGH BYTE AND
       DEY          ; STORE IT AT LOW ADDRESS
       STA ($30),Y
       PLA          ;PULL LOW BYTE FROM STACK
       INY          ; AND STORE IT AT HIGH ADDRESS
       STA ($30),Y
       LDA #$FF     ;TURN EXCHANGE FLAG ON (= -1)
       STA $32
CHKEND DEX          ;END OF LIST?
       BNE NXTEL    ;NO. FETCH NEXT ELEMENT
       BIT $32      ;YES. EXCHANGE FLAG STILL OFF?
       BMI SORT8    ;NO. GO THROUGH LIST AGAIN
       RTS          ;YES. LIST IS NOW ORDERED

The LISP-ish output produced by parsing this with my grammar looks like:

(prog (line (comment ;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING)) \n (line (comment ;ORDER.  THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND)) \n (line (comment ;$31.  THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST.  LOCATION)) \n (line (comment ;$32 IS USED TO HOLD AN EXCHANGE FLAG.)) \n \n (line (instruction (label (name SORT8)) (opcode LDY) (argumentlist (argument (prefix #) (number $00))) (comment ;TURN EXCHANGE FLAG OFF (= 0)))) \n (line (instruction (opcode STY) (argumentlist (argument (number $32))))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT COUNT))) \n (line (instruction (opcode TAX) (comment ; AND PUT IT INTO X))) \n (line (instruction (opcode INY) (comment ;POINT TO FIRST ELEMENT IN LIST))) \n (line (instruction (opcode DEX) (comment ;DECREMENT ELEMENT COUNT))) \n (line (instruction (label (name NXTEL)) (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT))) \n (line (instruction (opcode INY))) \n (line (instruction (opcode CMP) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;IS IT LARGER THAN THE NEXT ELEMENT?))) \n (line (instruction (opcode BCC) (argumentlist (argument (name CHKEND))))) \n (line (instruction (opcode BEQ) (argumentlist (argument (name CHKEND))))) \n (line (comment ;YES. EXCHANGE ELEMENTS IN MEMORY)) \n (line (instruction (opcode PHA) (comment ; BY SAVING LOW BYTE ON STACK.))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ; THEN GET HIGH BYTE AND))) \n (line (instruction (opcode DEY) (comment ; STORE IT AT LOW ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode PLA) (comment ;PULL LOW BYTE FROM STACK))) \n (line (instruction (opcode INY) (comment ; AND STORE IT AT HIGH ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode LDA) (argumentlist (argument (prefix #) (number $FF))) (comment ;TURN EXCHANGE FLAG ON (= -1)))) \n (line (instruction (opcode STA) (argumentlist (argument (number $32))))) \n (line (instruction (label (name CHKEND)) (opcode DEX) (comment ;END OF LIST?))) \n (line (instruction (opcode BNE) (argumentlist (argument (name NXTEL))) (comment ;NO. FETCH NEXT ELEMENT))) \n (line (instruction (opcode BIT) (argumentlist (argument (number $32))) (comment ;YES. EXCHANGE FLAG STILL OFF?))) \n (line (instruction (opcode BMI) (argumentlist (argument (name SORT8))) (comment ;NO. GO THROUGH LIST AGAIN))) \n (line (instruction (opcode RTS) (comment ;YES. LIST IS NOW ORDERED))) \n)

If you have Apple][ code on floppies of your own and you wish to retrieve it, I used a device from here. It worked very well.

TNT

If you haven't read this book, I highly recommend it.  I discovered it in high school and finally purchased my first copy at the now-gone Duthie Books in Kitsilano.  Without going into the details of the book, the author uses a simple formal version of Peano arithmetic called Typographical Number Theory (TNT) to illustrate some of his points.

An example expression in TNT could look like this (from Wikipedia):

∀a:∀b:(a + b) = (b + a)

This means "for every number a and every number b, a plus b equals b plus a".

I decided to write a simple ANTLR grammar for TNT, which you can find here.

Parsing HTML

I've been interested in HTML parsing for a while now.  There are a number of reasons to do this, such as:

  • Validating that what claims to be HTML, is HTML
  • Finding every style sheet and script in an HTML file
  • Pretty-printing
  • Syntax highlighting
  • Linting
  • Translating between markup languages, for example generating JSPs from PHP, or perhaps generating JSPs from ASPs.

One of the most difficult aspects of modern web programming is that any given server-side markup file likely contains 4 programming languages:

  • HTML for the markup itself
  • CSS in the style sections
  • JavaScript in the script sections
  • The server-side language (PHP, JSP, ASP, and so on) in the scriptlets

So, if you're going to write an HTML parser, you need to be able to not only parse the HTML, but also to find the style and script sections, and pull them out.  You also need to be able to find the scriptlets where the markup is generated.

Additionally, there is the fact that modern HTML is messy.  It's perfectly valid to have missing end-tags or attribute values that aren't quoted.  These edge cases just add to the difficulty of writing the parser.

If the end goal is to read .php source and emit similar .jsp source, then one needs an HTML parser that can do all of the above.  The .php source will have to be pulled out of each scriptlet, and fed to another parser, which can parse the PHP.  Strange as it may sound, this is not actually as difficult as it seems.  It's not hard to imagine doing something similar with legacy .asp pages.

There are perfectly legitimate reasons to convert source from one language to another.  For example, an organization may have a significant investment in an application that works but is written in an outdated language such as ASP.  Re-writing the application is an option; however, it's usually an expensive one.  Conversion from one language to another might be cheaper, and approaches of that sort have been used before.

The tree of ANTLR4 grammars didn't have an HTML parser, and I like ANTLR, so I wrote an HTML grammar for ANTLR4 which, I believe, does all of the above.  You can take a look here.

In order to show the parser working, I wrote a quick Java program that reads an HTML input file and dumps all scripts and styles to the console.  It's here.
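
If you'd rather roll your own, the heart of such a program is just a listener that fires on the script and style rules.  Here is a minimal sketch; the HTMLLexer, HTMLParser and HTMLParserBaseListener class names and the script/style/htmlDocument rule names are my reading of the grammar, so check the grammar itself for the exact spellings.

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class DumpScriptsAndStyles {
    public static void main(String[] args) throws Exception {
        // Class and rule names below are assumptions taken from the HTML grammar.
        HTMLLexer lexer = new HTMLLexer(CharStreams.fromFileName(args[0]));
        HTMLParser parser = new HTMLParser(new CommonTokenStream(lexer));

        // Walk the parse tree and print the text of every script and style element.
        ParseTreeWalker.DEFAULT.walk(new HTMLParserBaseListener() {
            @Override public void enterScript(HTMLParser.ScriptContext ctx) {
                System.out.println("SCRIPT: " + ctx.getText());
            }
            @Override public void enterStyle(HTMLParser.StyleContext ctx) {
                System.out.println("STYLE: " + ctx.getText());
            }
        }, parser.htmlDocument());
    }
}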

If you're interested in seeing what the generated AST looks like for an HTML page, here's the front page of reddit this morning, as an AST.

Bioinformatics data

I recently had a chance to learn a little about bioinformatics, and ended up browsing the NIH's database of genomes here.  Inside the genome data for any particular strain of a species, you'll find various files with file extensions like "faa", "fna", "ffn" and "frn".  These are FASTA files.

If you'd like an example, here's the genomic data for a certain strain of E. coli.

The file format of FASTA files is described pretty well on the Wikipedia link.  I immediately wondered how difficult it would be to read these files in their entirety and import them into a relational database.  The difficult part of this work is, of course, parsing the FASTA files.  In order to support that, I wrote an ANTLR4 grammar for FASTA files.  The result is here.  Once the parser is built, it's trivial to walk the AST and insert appropriate rows.
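
To give a flavour of that last step, here is a sketch of walking the tree and inserting one row per sequence using plain JDBC.  Everything grammar-specific in it is an assumption: the FastaLexer/FastaParser/FastaBaseListener class names, the fasta, sequence, description and sequencedata rule names, the JDBC URL and the table layout are all placeholders to be checked against the real grammar and your database.

import java.sql.*;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class FastaToDb {
    public static void main(String[] args) throws Exception {
        // Lexer/parser class names and rule names are assumptions about the grammar.
        FastaLexer lexer = new FastaLexer(CharStreams.fromFileName(args[0]));
        FastaParser parser = new FastaParser(new CommonTokenStream(lexer));

        try (Connection db = DriverManager.getConnection("jdbc:postgresql:genomes", "user", "pass");
             PreparedStatement insert = db.prepareStatement(
                     "INSERT INTO sequences (description, data) VALUES (?, ?)")) {
            // One row per sequence: the '>' description line plus the sequence data.
            ParseTreeWalker.DEFAULT.walk(new FastaBaseListener() {
                @Override public void enterSequence(FastaParser.SequenceContext ctx) {
                    try {
                        insert.setString(1, ctx.description().getText());
                        insert.setString(2, ctx.sequencedata().getText());
                        insert.executeUpdate();
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                }
            }, parser.fasta());
        }
    }
}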

If you're interested, the human genome is here, listed by chromosome.  However, those files are in GenBank format, which is a grammar for another day.

Update: the link to the source in the ANTLR4 grammars repository: antlr/grammars-v4