6502 Assembler

Back in the dark ages (the 1980's), people like myself coded on Apple][ computers.  If you were good you coded in Applesoft BASIC or Integer BASIC. If you were really geeky you coded in Assembly language on the 6502 processor.  The Apple][ OS was coded in assembly language, so if you really wanted to understand what was going on inside your computer, you needed to learn assembly language and you needed to learn about the 6502.

Obviously, the first thing you would do was write your own code to read and write disks.  It was a big deal in those days to be able to copy 5.25" floppies that had games on them.  For the serious geeks, it was much less interesting to play the games, and much more interesting to figure out how to copy them.  One technique game manufacturers used was to write data between the tracks, by moving the drive arm 1/2 tracks and 1/4 tracks.  Another technique was to modify the sector header bytes from the usual $D5 $AA $96 to something else.  This change disabled standard Apple DOS from finding the sectors, and therefore made them unreadable to anyone other than the manufacturer of the game.

In order to be able to inspect disks, read sectors, read between tracks and so on, I wrote a small program that would enable me to inspect disks easily.  I was young, so I simply called the program "M".  You can see the source here.  It's most likely that I used the LISA assembler, judging by the syntax of the source code.  "M" allowed me to put a disk in the floppy drive and then using keyboard commands navigate through the disk and look at the contents, at a byte level.

There are lots of other 6502 assembler programs around the internet, including a nice archive at 6502.org.  I had a lazy Saturday afternoon, so I decided to write an ANTLR4 grammar for LISA assembler.  You can see it here.  This grammar produces Java or C# (if you have the C# ANTLR Target) code which can parse LISA assembler.  It's the first step to writing a Java or C# assembler for 6502 assembly code.

ANTLR has a useful feature where it can parse input and produce LISP-like output showing the fully parsed program.  This feature is primary useful for debugging; it's an easy way to look at the AST in text format.  Here is the LISP-like output from running my ANTLR grammar on "M".  Of course, the parser and lexer that ANTLR produces in Java or C# does not output this LISP-like string, it produces an AST. A proper assembler would then walk that AST and output binary opcodes.

Here is an example of Bubble sort, coded in assembler.  This was cut-pasted from 6502.org.

;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING
;ORDER. THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND
;$31. THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST. LOCATION
;$32 IS USED TO HOLD AN EXCHANGE FLAG.

SORT8 LDY #$00 ;TURN EXCHANGE FLAG OFF (= 0)
STY $32
LDA ($30),Y ;FETCH ELEMENT COUNT
TAX ; AND PUT IT INTO X
INY ;POINT TO FIRST ELEMENT IN LIST
DEX ;DECREMENT ELEMENT COUNT
NXTEL LDA ($30),Y ;FETCH ELEMENT
INY
CMP ($30),Y ;IS IT LARGER THAN THE NEXT ELEMENT?
BCC CHKEND
BEQ CHKEND
;YES. EXCHANGE ELEMENTS IN MEMORY
PHA ; BY SAVING LOW BYTE ON STACK.
LDA ($30),Y ; THEN GET HIGH BYTE AND
DEY ; STORE IT AT LOW ADDRESS
STA ($30),Y
PLA ;PULL LOW BYTE FROM STACK
INY ; AND STORE IT AT HIGH ADDRESS
STA ($30),Y
LDA #$FF ;TURN EXCHANGE FLAG ON (= -1)
STA $32
CHKEND DEX ;END OF LIST?
BNE NXTEL ;NO. FETCH NEXT ELEMENT
BIT $32 ;YES. EXCHANGE FLAG STILL OFF?
BMI SORT8 ;NO. GO THROUGH LIST AGAIN
RTS ;YES. LIST IS NOW ORDERED

The LISP-ish output produced by parsing this with my grammar looks like:

(prog (line (comment ;THIS SUBROUTINE ARRANGES THE 8-BIT ELEMENTS OF A LIST IN ASCENDING)) \n (line (comment ;ORDER.  THE STARTING ADDRESS OF THE LIST IS IN LOCATIONS $30 AND)) \n (line (comment ;$31.  THE LENGTH OF THE LIST IS IN THE FIRST BYTE OF THE LIST.  LOCATION)) \n (line (comment ;$32 IS USED TO HOLD AN EXCHANGE FLAG.)) \n \n (line (instruction (label (name SORT8)) (opcode LDY) (argumentlist (argument (prefix #) (number $00))) (comment ;TURN EXCHANGE FLAG OFF (= 0)))) \n (line (instruction (opcode STY) (argumentlist (argument (number $32))))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT COUNT))) \n (line (instruction (opcode TAX) (comment ; AND PUT IT INTO X))) \n (line (instruction (opcode INY) (comment ;POINT TO FIRST ELEMENT IN LIST))) \n (line (instruction (opcode DEX) (comment ;DECREMENT ELEMENT COUNT))) \n (line (instruction (label (name NXTEL)) (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;FETCH ELEMENT))) \n (line (instruction (opcode INY))) \n (line (instruction (opcode CMP) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ;IS IT LARGER THAN THE NEXT ELEMENT?))) \n (line (instruction (opcode BCC) (argumentlist (argument (name CHKEND))))) \n (line (instruction (opcode BEQ) (argumentlist (argument (name CHKEND))))) \n (line (comment ;YES. EXCHANGE ELEMENTS IN MEMORY)) \n (line (instruction (opcode PHA) (comment ; BY SAVING LOW BYTE ON STACK.))) \n (line (instruction (opcode LDA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))) (comment ; THEN GET HIGH BYTE AND))) \n (line (instruction (opcode DEY) (comment ; STORE IT AT LOW ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode PLA) (comment ;PULL LOW BYTE FROM STACK))) \n (line (instruction (opcode INY) (comment ; AND STORE IT AT HIGH ADDRESS))) \n (line (instruction (opcode STA) (argumentlist (argument ( (argument (number $30)) )) , (argumentlist (argument (name Y)))))) \n (line (instruction (opcode LDA) (argumentlist (argument (prefix #) (number $FF))) (comment ;TURN EXCHANGE FLAG ON (= -1)))) \n (line (instruction (opcode STA) (argumentlist (argument (number $32))))) \n (line (instruction (label (name CHKEND)) (opcode DEX) (comment ;END OF LIST?))) \n (line (instruction (opcode BNE) (argumentlist (argument (name NXTEL))) (comment ;NO. FETCH NEXT ELEMENT))) \n (line (instruction (opcode BIT) (argumentlist (argument (number $32))) (comment ;YES. EXCHANGE FLAG STILL OFF?))) \n (line (instruction (opcode BMI) (argumentlist (argument (name SORT8))) (comment ;NO. GO THROUGH LIST AGAIN))) \n (line (instruction (opcode RTS) (comment ;YES. LIST IS NOW ORDERED))) \n)

If you have Apple][ code on floppies of your own and you wish to retrieve it, I used a device from here. It worked very well.

TNT

If you haven't read this book, I highly recommend it.  I discovered it in high school and finally purchased my first copy at the now-gone Duthie Books in Kitsilano.  Without going into the details of the book, the author uses a simple Peano arithmetic called Typographical Number Theory  (TNT) to illustrate some of his points.

An example expression in TNT could look like this (from Wikipedia):

∀a:∀b:(a + b) = (b + a)

Which means "for every number a and every number b, a plus b equals b plus a"

I decided to write a simple ANTLR grammar for TNT, which you can find here.

Parsing HTML

I've been interested in HTML parsing for a while now.  There are a number of reasons to do this, such as:

  • Validating that what claims to be HTML, is HTML
  • Finding every style sheet and script in an HTML file
  • Pretty-printing
  • Syntax highlighting
  • Linting
  • Translating between markup languages, for example generating JSPs from PHP, or perhaps generating JSPs from ASPs.

One of the most difficult aspects of modern web programming is that any example server-side markup file likely contains 4 programming languages:

So, if you're going to write an HTML parser, you need to be able to not only parse the HTML, but also to find the style and script sections, and pull them out.  You also need to be able to find the scriptlets where the markup is generated.

Additionally, there is the fact that modern HTML is messy.  It's perfectly valid to have missing end-tags, or attribute values that aren't quoted.  These edge cases just add to the difficultly in writing the parser.

If the end goal is to read .php source and emit similar .jsp source, then one needs an HTML parser that can do all of the above.  The .php source will have to be pulled out of each scriptlet, and fed to another parser, which can parse the PHP.  Strange as it may sound, this is not actually as difficult as it seems.  It's not hard to imagine doing something similar with legacy .asp pages.

There are perfectly legitimate reasons to convert source from one language to another.  For example, an organization may have  significant investment in an application that works, but is in an outdated language such as ASP.  Re-writing the application is an option, however it's usually an expensive option.  Conversion from one language to another might be cheaper, and approaches of that sort have been used before.

The tree of ANTLR4 grammars didn't have a HTML parser, and I like ANTLR, so I wrote an HTML grammar for ANTLR4 which, I believe, does all of the above.  You can take a look here.

In order to show the parser working, I wrote a quick java program that reads an HTML input file and dumps all scripts and styles to the console.  It's here.

If you're interested to see what the generated AST looks like for an HTML page, here's the front page of reddit this morning, as an AST.

Bioinformatics data

I recently had a chance to learn a little about Bioinformatics, and ended up browsing the NIH's database of genomes here.  Inside the genome data for any particular strain of a species, you'll find various files with file extensions like "ffa", "fna", "ffn" and "frn".  These are FASTA files.

If you'd like an example, here's the genomic data for a certain strain of E-coli.

The file format of FASTA files is described pretty well on the Wikipedia link.  I immediately wondered how difficult it would be to read the entire files and import them into a relational database.  The difficult part of this work is, of course, parsing the FASTA files.  In order to support that, I wrote an ANLTR4 grammar for FASTA files.  The result is here.  Once the parser is built, it's trivial to walk the AST and insert appropriate rows.

If you're interested, the human genome is here, listed by chromosome.  However, those files are in GenBank format, which is a grammar for another day.

Update: the link to the source on the Antlr4 git: antlr/grammars-v4

Reading Paradox Files

Back in the day, Paradox was a pretty amazing database.  I recently had a reason to read some Paradox files for a friend, who had a client with a Paradox database.  They needed their data out of the database to insert into something more modern.

Googling for the file format of a Paradox DB didn't turn up much, other than this excellent document written in 1996.  It was enough to give me a good start.  I also found some sample DB files, interestingly from the Paradox documentation.

The end result was paradoxReader, which you can download here.  It handles most of the data types other than BLOBs, so far.  The documentation on how to use it is here.  It uses the visitor pattern, which means all you need to do is pass it an InputStream to a .db file, and implement a single interface which is called once per row, with the row data.

The default implementation currently outputs CSV for each table, however it wouldn't be difficult to have it output SQL or even just connect to a JDBC database and insert the data into a table.

Bootstrapping pkgng on FreeBSD ARM

There is no official pkgng repo for FreeBSD-arm, yet.  There is an unofficial one, here, however in order to use it, you have to have pkgng installed.  As far as I can tell, the only way to install pkgng on ARM is to builds and install from source.
On the platform I'm using, Wandboard, the mmc device isn't working 100% yet, so I decided to compile on a RAM disk.  I did this to create the RAM disk

mkdir /mnt/tmpdisk
mount -t tmpfs tmpfs /mnt/tmpdisk
cd /mnt/tmpdisk

The second step is:

ftp ftp://ftp.freebsd.org/pub/FreeBSD/distfiles/pkg-1.1.4.tar.xz

Once you have the source, untar it, build and install

tar zxvf pkg-1.1.tar.xz
cd pkg-1.1
make
make install

In my case, "make install" on FreeBSD-Current failed due to lack of certain directories.  This may help:

mkdir /usr/local/lib
mkdir /usr/local/man
mkdir /usr/local/libdata
mkdir /usr/local/sbin
mkdir /usr/local/man/man8

Once you've installed pkgng, you should be able to verify that it's available

root@wandboard:/mnt/tmpfs/pkg-1.1.4 # /usr/local/sbin/pkg -v
1.1.4

From here, there are a couple of options.

  • You can use the unofficial repo provided here.
  • You can download the packages you need from here, and install them.

Building my own wireless point

I got interested in building my own wireless point after seeing some of the wireless firmware issues like this.  Besides, I've always been interested in embedded devices and FreeBSD.

So, the first step was a device.  I chose to use a Wandboard.  I'm a committer to Crochet-FreeBSD, so I built out the device support for Crochet-FreeBSD.  You can take a look here.

For the wireless interface I used an Cisco AE1000 wireless interface. The AE1000 uses the run driver.

Starting the wireless interface and scanning for wireless points looks like this

ifconfig wlan0 create wlandev run0
ifconfig wlan0 up scan

On this board I have two interfaces:

  • ffec0.  The wired interface
  • run0.  The Cisco USB wireless

ffec0 is configured to get an IP, gateway and DNS via DHCP, in /etc/rc.conf

ifconfig_ffec0="DHCP"

I had these design criteria.

  • I already have a DHCP server, so I didn't want to assign IP leases on the wireless point; I want to delegate to my existing DHCP server
  • I prefer to use WPA Personal for authentication
  • I'd like to install as little software as possible; this doesn't need to be complicated
  • It would be great to automatically firewall any IPs that fail to log in more than a couple times
  • A simple web administration interface would be very helpful

Of course, I'm not interesting in connecting to an existing wireless point, instead I want to be the wireless point.   I need only one piece of software installed to function as a wireless point; hostapd.  Fortunately hostapd is part of the base FreeBSD install.

There are a couple kernel features I needed, so I loaded them at boot time.  My /boot/loader.conf looks like:

console="comconsole"

#pf
pf_load="YES"
pflog_load="YES"
pfsync_load="YES"

#altq
alq_load="YES"

#wlan
wlan_wep_load="YES"
wlan_ccmp_load="YES"
wlan_tkip_load="YES"
wlan_acl_load="YES"
wlan_xauth_load="YES"

# run driver
if_run_load="YES"
runfw_load="YES"

# bridge
if_bridge_load="YES"
if_bridgestp_load="YES"

# set wandboard to use 1 cpu
hw.ncpu=1

These options give me various wlan capabilties, the pf devices, the bridge device, and altq.  I've also loaded the kernel module for the run driver.

The strategy I want to use for leveraging my existing DHCP server and existing network is to configure my wireless point as a transparent proxy. The bridge device provides me exactly what I want, by enabling me to bridge the ffec0 and run0 interfaces.

My /etc/rc.conf includes:

# hostname
hostname="wandboard"

# services
ntpdate_enable="YES"
sshd_enable="YES"
hostapd_enable="YES"

# pf
pf_enable="YES"
pf_rules="/etc/pf.conf"
pflog_enable="YES"
pflog_logfile="/var/log/pflog"

# lan
ifconfig_ffec0="DHCP"

# turn off sendmail
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"

# wireless
wlans_run0="wlan0"
create_args_wlan0="wlanmode hostap mode 11g"
ifconfig_wlan0="ssid snagglepuss11 channel 11"

# bridge
cloned_interfaces="bridge0"
ifconfig_bridge0="addm ffec0 addm wlan0 up"

This configuration sets up the lan interface on DHCP, the wifi interface as an 11g access point on channel 11, and then bridges the interface.  At this point, we have a working wifi interface.  However, it's not secured yet.

My /etc/hostapd.conf file, the configuration file for hostapd looks like this

interface=wlan0
logger_syslog=-1
logger_syslog_level=2
debug=1
ctrl_interface=/var/run/hostapd
ctrl_interface_group=wheel
ssid=snagglepuss1
wpa=1
wpa_passphrase=xxxx
wpa_key_mgmt=WPA-PSK
wpa_pairwise=CCMP TKIP

It's pretty simple; I have WEP authentication on the interface wlan0, with the ssid khublacom1.

Finally, I decided to implement some simple packet filtering.  /etc/pf.conf looks like this:

# interfaces
lan_if="ffec0"
wifi_if="wlan0"

# options
set block-policy return
set optimization conservative

# normalization
scrub in all
scrub out all

# anti-spoof
antispoof for $lan_if inet

# pass on lo
set skip on lo

# default, deny everything
block in log all

# out is ok
pass out quick

# pass inet4 and inet6 traffic in on wifi and lan
pass in on $wifi_if inet
pass in on $wifi_if inet6
pass in on $lan_if inet
pass in on $lan_if inet6

# icmp all good
pass out inet proto icmp from any to any keep state
pass in quick inet proto icmp from any to any keep state

I allow all IP4 and IP6 traffic in on the wifi interface.

I don't have a web interface yet; I've had some trouble reliably compiling on the current builds of FreeBSD ARM.  However, I'm sure that'll be worked out shortly.