Content tagged lisp

Back to Clones

posted on 2022-03-20 20:34:00

Ancient History

When I last wrote about clones, I was 32 and still working at Showcase IDX. I never got around to finishing clones and in fact worked on rawbones with my dear friend James Dabbs for a spell while teaching at the Flatiron School. By my count I have something like 4 half-finished NES emulators now.

  • famiclom - My original half-hearted attempt in CL when I had no idea what I was doing.
  • nescavation - A NES emulator I hacked up while learning to teach Javascript.
  • clones - A better thought out attempt in CL.
  • rawbones - The ReasonML version I wrote with James.

I seem to write one whenever I get bored and with any luck I'll wind up finishing one of them sooner or later. Nescavation and Famiclom really never got close to running games, clones and rawbones both got much closer to playable territory but I never got background scrolling right. I still find it a bit funny that famiclom gets more attention than my later, improved efforts like clones or rawbones. (Probably because cl-6502 mentions it and achieved a little notoriety.)

Getting to a playable state has never been the point though. These projects have been part learning exercise, part avenue for exploring literate programming, and often just a fun project to noodle with for my own entertainment. I still like the idea that a fast and reasonably accurate emulator can be written in a concise, clear way with a garbage-collected language.

Present Day

Recently, I got the itch again and so I decided to start fresh with clones. There are a few interesting changes this time around. When I made cl-6502, creating a readable document from the program was a primary goal and resulted in a literate book. This ethos never quite made the transition from the CPU stage to the full system emulators. This time I'll be leaning heavily into that spirit using mgl-pax. I'll also be testing with try and relying as heavily as I can on CPU and PPU test roms.

This is all happening in the "once-more-with-feeling" branch on sourcehut. So far there isn't a lot there though I'm on vacation starting in 6 days so I'm hoping to get ROM parsing and a basic structure for stepping the CPU in place to crank through NEStest. I do have some nice automation set up though. Every push runs the test suite and deploys the docs. I also have a very basic twitch stream working in case I want to indulge in the silliness of coding on camera.

For now, here's a look at the .build.yml file that powers the CI on sourcehut. It really isn't harder to set up an automatation pipeline for a CL app than anything else. Here's to working on fun projects again. More soon. 👋

image: alpine/latest
oauth: pages.sr.ht/PAGES:RW
environment:
  site: clones.kingcons.io
packages:
  - sbcl
sources:
  - https://git.sr.ht/~kingcons/clones
tasks:
  - install-quicklisp: |
      curl -O https://beta.quicklisp.org/quicklisp.lisp
      sbcl --non-interactive \
           --eval "(load \"~/quicklisp.lisp\")" \
           --eval "(quicklisp-quickstart:install)" \
      mkdir -p ~/quicklisp/local-projects/
  - test: |
      ln -sf ~/clones ~/quicklisp/local-projects/clones
      sbcl --non-interactive \
           --eval "(load (merge-pathnames \"quicklisp/setup.lisp\" (user-homedir-pathname)))" \
           --eval "(ql:quickload '(clones clones/test))" \
           --eval "(unless (try:passedp (try:try 'clones.test:test-all)) (uiop:quit 1))"
  - build-site: |
      cd clones
      echo 'Building site'
      sbcl --non-interactive \
           --eval "(load (merge-pathnames \"quicklisp/setup.lisp\" (user-homedir-pathname)))" \
           --eval "(ql:quickload '(clones mgl-pax/document))" \
           --eval "(clones.docs::build-site)"
      mv ~/clones/site/clones.html ~/clones/site/index.html
      tar -C site -cvz . > site.tar.gz
      acurl -f https://pages.sr.ht/publish/$site -Fcontent=@site.tar.gz
      rm site.tar.gz

Things I learned writing Clones

posted on 2018-07-29 12:08:00

wip

I'm turning 32 in a week so thank goodness I'm finally making progress on clones. After my last post, I didn't work on clones for 7 months. Then in May, I just sat down and started hacking. Despite some gaps, there has been steady progress.

There's still a lot I want to do and audio isn't implemented so that's next, but for now I'm going to try to summarize the current status and some of the lessons I've learned thus far.

Current Status

Clones, At Present

The Clones CPU emulation is finished and tested and there is support for input handling and basic graphics support (backgrounds and sprites, scrolling is next). A lot of what determines compatibility for a NES emulator comes down to mapper (cartridge) support and the accuracy of the PPU support. In that regard, clones supports NROM, UNROM, and MMC1 though UNROM and MMC1 have some issues that need ironing out once scrolling is finished.

The circuit board used in NES cartridges actually varied and added additional capabilities to the console, primarily a paging system for switching banks in and out of memory to allow for larger levels, more artwork, more game code, etc. The different cartridge types were called mappers. Thankfully, 6 different mappers accounted for something like 80% of all games commercially available in the US. As a result, mapper support is a big deal since you can't play a game without the matching cartridge support.

Clones, In the Future

The first priority is fixing some sprite glitches and getting scrolling implemented. Once that's done, fixing up the lingering issues with UNROM and MMC1 will take precedence. Once Mega Man 2 is booting, then I'll start work on the audio.

After that's done the real fun begins. I have all sorts of ideas and ambitions for how to build a Control Flow Graph of the game dynamically while it executes and then let the player annotate the structure and save it for later revision. I want to be able to reverse engineer old games interactively and am wondering how much the computer can help in the process with the use of Constraint Logic Programming tools like screamer. In general, I'm interested in how we can examine shipped binaries at runtime as a teaching tool for how the software and hardware work.

More on this soon, I hope. 🙏

Lessons Learned

Test ROMs == Joy

There are many test roms for ensuring that various components in your NES behave accurately. I found it particularly useful to write the memory interface, addressing modes, disassembler, and a stepper for the CPU with no instructions implemented. Then I had a unit test which looped over a verified correct log for a ROM called "nestest" which exhaustively checks the operation of all legal CPU instructions. After I could run the test until it failed, implement a single instruction, and re-run. I had all 56 instructions with their various opcodes written in a day. Super pleasant!

It helped that I'd written a CPU emulator before, of course. This process required having a good idea up front about how I wanted to interact with memory, represent addressing modes, and execute instructions. If you don't understand those pieces though, you'll run headlong into them while trying to implement an emulator anyhow so start there. Spend some time reading nesdev wiki or asking questions online if you need to. 🤘

A Good Macro-Friendly Decomposition for Instructions

There is a bit of a Gordian Knot in the CPU in how the addressing modes and different opcodes interact if you want to define each instruction exactly one time without a mess of switch statements for the different variations. In short:

  1. Addressing modes should only access CPU and Memory to compute an address. Any cycle counting (e.g. for crossing pages) can be done at the call site with macros!

  2. Your opcode-defining macro should set up address and argument variables, or an update function as needed based on the access pattern of the instruction.
    This has bitten me on previous attempts as I assumed the access pattern came from the addressing mode rather than the instruction itself. Instructions can be implied and use no argument, or only use the address and jump to it, or read an argument from the address, or write a value to an address, or read a value, modify it and write it back. It was very worthwhile to split these cases out and handle them independently. It meant a little extra work while writing up the metadata but kept concerns separated later.

  3. Separating the opcode metadata from the actual instruction definition. This is more arbitrary than the earlier recommendations but it felt very clean while hacking the opcode definitions and I think I only found myself going back to edit the instruction metadata one time from making a typo.

PPUs == Pain

PPU stands for Picture Processing Unit and it was the graphics card in the original NES. The central innovation of the PPU was that it supported pixel-level scrolling of levels.

I have no experience in graphics or game programming so this was a big challenge for me. Four other factors contributed to the difficulty of writing the PPU:

  1. There isn't really a suggested ordering of tasks like there is for writing a CPU emulator and subtasks are hard to test in isolation.
  2. The data layout of the PPU is complex: OAM, nametables, pattern tables, attribute tables, etc.
  3. The test ROMs that exist generally assume you have the basics working.
  4. While many working open source emulators exist, the control flow was difficult to follow without getting lost in the weeds and it was unclear why different calculations were used.

I'll try to tackle these briefly and write up more details at a later date.

A Suggested PPU Ordering

First, you need an object to represent the hardware state. It'll need to access the currently loaded game ROM for graphics data so remember to give it a slot for storing the cartridge object.

Second, you'll need to implement the PPU memory map. There's no operating system on the NES so there are no video card drivers and you'll do everything yourself via Memory Mapped I/O. If you've never heard of memory mapped I/O, the idea is that reading and writing to specific addresses in memory directly manipulates the PPU so write those methods and wire it up!

Third, you'll want to get the timing synchronized between the CPU and PPU. You'll want to do this before trying to render graphics probably as many games wait for an interrupt called vblank from the PPU that the graphics card is ready before even reaching the title. Many games will infinite loop until the PPU wakes them up with this interrupt, then do the work needed to render the next frame and return to the infinite loop. This is part of why it's so important to get the timing right.

Fourth, you'll want to make sure the address computations are right. This was the single hardest bit of code for me to get right in the PPU. It's also the code I'm happiest with and hoping to figure out how to test in an automated way for next time.

Fifth, try to just render the backgrounds using the addressing logic you arrived at ealier. If you can get backgrounds rendering correctly, you should be well on your way to getting sprites and scrolling working. With any luck, the PPU operation should start becoming clearer.

PPU Data Flow

Internally the graphics are represented as 8x8 tiles that are either sprites or backgrounds. Crucially, the information needed to render those tiles is divided up into the different areas inside the PPU: nametables, attribute tables, the palette, and pattern tables (in the ROM).

Nametables represent the background and are 960 byte long arrays where each byte is an index into the pattern table for an 8x8 tile. Why 960 bytes you ask? Because the NES resolution is 256 by 240 and if you divide that by 8 (pixels in a tile) you get 32 x 30. 32 * 30 = 960.

So nametables point to the "pattern" or texture that will be used for a given tile but for space reasons that pattern doesn't actually store all the information about what color it should be. The pattern table is 4kb and holds 256 tiles with each 8x8 tile taking 16 bytes to store. Those 16 bytes are enough for each pixel to get two bits to represent a color ... so 4 options.

The PPU has a 64 byte palette table to select 32 colors for the background and 32 colors for the sprites. But why bother when each pixel in a pattern can only count from 0-3? Well, did it seem a little odd that the Nametable was 960 bytes? That's because the last 64 bytes in that kilobyte are used to store something called an attribute table. Every 16 tiles share a single attribute byte which determines the top 2 bits of the palette index for tiles in groups of 4. There are implications from this about how many colors can be represented in a 16x16 pixel area of the screen, on a scanline, etc.

It's pretty confusing until you sit down and draw it all out. A lot of the calculations for the PPU are exactly this sort of thing. You can just imagine the hardware designers saying: "But how do we do it with less RAM to bring the price down?"

This has been written up well elsewhere, Scott Ferguson's blog comes to mind. But I still never found a high level description of how the PPU renders that wasn't based on perfectly emulating the state of a bunch of internal shift registers and latches and running the PPU cycle by cycle. And, pardon my french, but that's fucking gross. Not because it's inaccurate or slow or anything like that but just because it's hard to see the forest for the trees.

Here's something like how I think of background rendering now. Sprites are more complicated but follow the same basic framework:

  1. For each scanline, you have to draw 32 tiles.
  2. For a (background) tile, you have to get the nametable byte and attribute byte.
  3. The pattern table has 16 bytes for a tile but we only need two for the 8 pixels on a scanline.
  4. Go ahead and extract the two high bits of the palette index from the attribute byte.
  5. Now loop for pixels 0 to 7 and combine those bits with the bits in the pattern to get a color.
  6. Show the backdrop color if it's zero (transparent), otherwise show the background color.

I know this is inaccurate, but it's clear to follow at a high level and if you then pointed out the various address computations in the substeps it ought to be pretty straightforward.

A lot of my remaining questions concern how to support scrolling at least kinda correctly without basing everything off internal registers and how to render things tile by tile instead of pixel by pixel. But I may abandon that because it was mostly to avoid repeated fetches of the same data and I recently made a RENDER-CONTEXT object that can help with that. Maybe down the road at some point I'll make a cycle-accurate PPU. :)

Test ROMs

There aren't really good test ROMs because the ROMs that exist mostly assume you have the basics working and are testing tricky details. While I don't think a test ROM could be written since a lot of what needs testing are internal details of the PPU that weren't exposed to NES programmers, I do think a ROM coupled with some JSON dumps of what internal data should be visible after rendering for a frame or two would be incredibly useful. Because at some point I spent 3 days on a single bug because I wasn't incrementing a counter appropriately.

I'd like to think on this some more but I have some basic ideas. A lot of the difficulty is that in the 90s there were working emulators that did more abstract high level emulation both because PCs were less powerful and because less was known about the underlying hardware. That required workarounds of various sorts for accuracy and so they're frowned upon now. But as a result, I haven't found much high-level documentation of the rendering algorithm in the PPU. Everyone seems to point back to a frame timing diagram on the nesdev wiki. Which is great but I was hoping to write a 2500 lines of code readable NES implementation that doesn't require a solid understanding of latches and assembly to get a basic idea of how the thing worked.

But I believe the address calculations, among other things, can be expressed clearly (and tested!) in terms of the X,Y coordinates to be rendered as opposed to internal registers. More soon...

Hard to Follow Implementations

I'm still unsatisfied with my PPU implementation but it also isn't completely finished. I hope to have more to show here after scrolling is working and some refactoring is done.

A Word On Performance

The output resolution of the Nintendo was 256x240. At a high level, all the PPU is doing is looping from left to right (0-255), top to bottom (0-240) and deciding on a color for the current pixel, then outputting it once per frame. Of course, it has to do that 60 times a second and 256 * 240 * 60 is 3.6 million so pixel rendering needs to be pretty fast. I didn't have to do any optimizing to hit 60 frames per second but I was careful to write code that didn't allocate as I went and we're still using 50% CPU which is definitely more than I'd like.

Til Next Time

Wish me luck, lispers. Cheers. <3

Going Faster with Lisp

posted on 2017-09-17 16:10:00

For the first time in 3+ years, I'm working in earnest on a hobby project.

It feels like coming home, to be writing lisp and blogging again. Once again I'm playing with Nintendo emulation, don't ask why it's captured my imagination so. I'd like to briefly discuss what's brought me back and then we'll talk about what I learned writing lisp today.

My Absence

I haven't really worked on hobby projects since mid 2014. Even then my output was reduced substantially from 2012 when I lived alone and cl-6502/coleslaw had my full attention. I never stopped wanting to learn more, or loving lisp specifically, I just lost the energy to pursue it. That was due to (in rough order): Work, burnout, my relationship, and buying a house. Where burnout == a curiously strong blend of exhaustion, impostor syndrome, and unclear goals. It was scary at times when I wondered if I had lost my passion or commitment but ultimately good for me.

A lot of why I stalled out had to do with my old Nintendo emulator. I had made some bad assumptions, especially about how memory worked, due to not knowing much about systems programming or hardware when I started and didn't want to throw away everything I had to start fresh. cl-6502 had also felt very public so when progress had stalled before even being able to play a game that was quite embarrassing. I also didn't really know about test ROMs until way too late in the going.

But time heals all wounds and I have plenty of ideas. So here we are.

Design Changes

With cl-6502, I just focused on the CPU since that was something I had an inkling of how to approach. My biggest mistake was treating RAM as a 64k element array. The actual Nintendo used Memory Mapped I/O to talk to the graphics and sound cards. The only way to support that in famiclom was to overwrite the routines that read and wrote to RAM in cl-6502. It was unacceptable to me from both a design and performance perspective.

This time around, I'm using a separate object to represent the Memory Map so that when an CPU instruction reads or writes to an address, it'll actually get handled by the right part of the system: the RAM, Video Card, Sound, or cartridge. I'm also going to be focused on using test ROMs through as much of the process as I can. I'll write more about that in a future article but, long story short, TDD is hard to do when writing an emulator.

Lisp, Fast, OO: Pick 3

I managed to get cl-6502 running fast enough last time around but it was still 100x slower than Ian Piumarta's lib6502 written in C. There's no reason that has to be the case, I simply didn't know how to approach optimizing Lisp. I would use SBCL's statistical profiler, sprinkle compiler declarations in my code, re-profile, and pray. Today I'd like to focus on a few tricks for figuring out if declarations are helping or not and getting friendly with your disassembler. I'll also talk a little about why I wound up going with DEFSTRUCT over DEFCLASS.

lol, dis

Profilers are great for helping you figure out what parts of your code you spend the most time in. Once you've identified a function that needs to go fast, the next step is usually to add an optimize declaration. Something like:

(declare (optimize (speed 3) (safety 1))) ; or even (safety 0)

Recompiling the function afterward will result in the compiler printing out notes about what tripped it up while compiling the code. One thing I didn't realize back when I was working on cl-6502 (but seems obvious in retrospect) is that you can include optimize and type declarations in methods too! That said, it can be a pain to constantly write out different optimize and type declarations, recompile, and call disassemble on the code to see differences in the output. Additionally, there is not a portable way to disassemble methods, only their generic functions which is really just the dispatch machinery and not the work that you're interested in.

While Doug Hoyte's book Let Over Lambda is a bit controversial among lispers, he offers some good advice and good tools for remedying these points in Chapter 7. In particular, he supplies a read macro to quickly enable maximum optimization in a method or function and a regular macro to allow testing out type declarations effect on an anonymous function quickly at the REPL. I've taken the liberty of adding both to my .sbclrc file so I have easy access to them when I'm trying things out.

(defun enable-sharpf-read-macro ()
  (set-dispatch-macro-character #\# #\f
    (lambda (stream sub-char numarg)
      (declare (ignore stream sub-char))
      (setf numarg (or numarg 3))
      (unless (<= numarg 3)
        (error "Invalid value for optimize declaration: ~a" numarg))
      `(declare (optimize (speed ,numarg)
                          (safety ,(- 3 numarg)))))))

(defmacro dis (args &rest body)
  (flet ((arg-name (arg)
           (if (consp arg)
               (cadr arg)
               arg))
         (arg-decl (arg)
           (if (consp arg)
               `(type ,(car arg) ,(cadr arg))
               nil)))
    (let ((arglist (mapcar #'arg-name args))
          (declarations (mapcar #'arg-decl args)))
      `(disassemble
         (lambda ,arglist
           (declare ,@(remove nil declarations))
           ,@body)))))

I also dug around to see if there was a way to get disassembly for a single method and found a helpful thread on Google Groups from which I built a little function for disassembling the "fast-function" commonly invoked for a method.

(defun disasm-method (name specializers)
  "E.g. (disasm-method 'package:generic-fun '(class t))"
  (let* ((method (find-method name nil specializers))
         (function (sb-mop:method-function method))
         (fast-function (sb-pcl::%method-function-fast-function function)))
    (disassemble fast-function)))

A "Practical" Example

All code for this section is on the ground-floor branch on Github

Today I was working on memory mappers / cartridges for the NES emulator. Let's look at how I used these tools to optimize a method on the simplest mapper NROM. (Used in titles like Donkey Kong and the original Super Mario Brothers.) The method we'll be looking at is called load-prg. Put simply, it takes an address and loads a byte from the PRG section of the game cartridge.

Since any game will load data from the cartridge a lot we really want this to be a fast operation. And since it's loading from a static array, we would hope we can get this down to a handful of assembly instructions. Here's my initial implementation:

(defmethod load-prg ((mapper nrom) address)
  (let ((rom (mapper-rom mapper)))
    (if (= 1 (rom-prg-count rom))
        (aref (rom-prg rom) (logand address #x3fff))
        (aref (rom-prg rom) (logand address #x7fff)))))

You can see it takes an NROM mapper and an address and, based on the number of PRG banks in the cartridge, does a little math on the address and accesses the PRG with AREF. Let your eyes skim over the unoptimized disassembly:

CL-USER> (disasm-method #'clones.mappers::load-prg '(clones.mappers::nrom t))
; disassembly for (SB-PCL::FAST-METHOD CLONES.MAPPERS:LOAD-PRG
                   (CLONES.MAPPERS::NROM T))
; Size: 280 bytes. Origin: #x2290E8F5
; 8F5:       498B4C2460       MOV RCX, [R12+96]               ; no-arg-parsing entry point
                                                              ; thread.binding-stack-pointer
; 8FA:       48894DF8         MOV [RBP-8], RCX
; 8FE:       498B5805         MOV RBX, [R8+5]
; 902:       48895DE0         MOV [RBP-32], RBX
; 906:       8D43FD           LEA EAX, [RBX-3]
; 909:       A80F             TEST AL, 15
; 90B:       0F85F3000000     JNE L11
; 911:       8B4B01           MOV ECX, [RBX+1]
; 914:       4881F903FD5020   CMP RCX, #x2050FD03             ; #<SB-KERNEL:LAYOUT for CLONES.ROM:ROM {2050FD03}>
; 91B:       0F85C8000000     JNE L10
; 921: L0:   488B531D         MOV RDX, [RBX+29]
; 925:       BF02000000       MOV EDI, 2
; 92A:       E8411C1FFF       CALL #x21B00570                 ; GENERIC-=
; 92F:       488B5DE0         MOV RBX, [RBP-32]
; 933:       488B75E8         MOV RSI, [RBP-24]
; 937:       4C8B45F0         MOV R8, [RBP-16]
; 93B:       7456             JEQ L5
; 93D:       488B4B0D         MOV RCX, [RBX+13]
; 941:       8D46F1           LEA EAX, [RSI-15]
; 944:       A801             TEST AL, 1
; 946:       750A             JNE L1
; 948:       A80F             TEST AL, 15
; 94A:       7542             JNE L4
; 94C:       807EF111         CMP BYTE PTR [RSI-15], 17
; 950:       753C             JNE L4
; 952: L1:   488BFE           MOV RDI, RSI
; 955:       40F6C701         TEST DIL, 1
; 959:       7407             JEQ L2
; 95B:       488B7FF9         MOV RDI, [RDI-7]
; 95F:       48D1E7           SHL RDI, 1
; 962: L2:   4881E7FEFF0000   AND RDI, 65534
; 969:       8D41F1           LEA EAX, [RCX-15]
; 96C:       A80F             TEST AL, 15
; 96E:       7519             JNE L3
; 970:       8B41F1           MOV EAX, [RCX-15]
; 973:       2C85             SUB AL, -123
; 975:       3C74             CMP AL, 116
; 977:       7710             JNBE L3
; 979:       488BD1           MOV RDX, RCX
; 97C:       B904000000       MOV ECX, 4
; 981:       FF7508           PUSH QWORD PTR [RBP+8]
; 984:       E9AFEDA1FD       JMP #x2032D738                  ; #<FDEFN SB-KERNEL:HAIRY-DATA-VECTOR-REF/CHECK-BOUNDS>
; 989: L3:   0F0B0A           BREAK 10                        ; error trap
; 98C:       36               BYTE #X36                       ; OBJECT-NOT-VECTOR-ERROR
; 98D:       08               BYTE #X08                       ; RCX
; 98E: L4:   0F0B0A           BREAK 10                        ; error trap
; 991:       41               BYTE #X41                       ; OBJECT-NOT-INTEGER-ERROR
; 992:       30               BYTE #X30                       ; RSI
; 993: L5:   488B4B0D         MOV RCX, [RBX+13]
; 997:       8D46F1           LEA EAX, [RSI-15]
; 99A:       A801             TEST AL, 1
; 99C:       750A             JNE L6
; 99E:       A80F             TEST AL, 15
; 9A0:       7542             JNE L9
; 9A2:       807EF111         CMP BYTE PTR [RSI-15], 17
; 9A6:       753C             JNE L9
; 9A8: L6:   488BFE           MOV RDI, RSI
; 9AB:       40F6C701         TEST DIL, 1
; 9AF:       7407             JEQ L7
; 9B1:       488B7FF9         MOV RDI, [RDI-7]
; 9B5:       48D1E7           SHL RDI, 1
; 9B8: L7:   4881E7FE7F0000   AND RDI, 32766
; 9BF:       8D41F1           LEA EAX, [RCX-15]
; 9C2:       A80F             TEST AL, 15
; 9C4:       7519             JNE L8
; 9C6:       8B41F1           MOV EAX, [RCX-15]
; 9C9:       2C85             SUB AL, -123
; 9CB:       3C74             CMP AL, 116
; 9CD:       7710             JNBE L8
; 9CF:       488BD1           MOV RDX, RCX
; 9D2:       B904000000       MOV ECX, 4
; 9D7:       FF7508           PUSH QWORD PTR [RBP+8]
; 9DA:       E959EDA1FD       JMP #x2032D738                  ; #<FDEFN SB-KERNEL:HAIRY-DATA-VECTOR-REF/CHECK-BOUNDS>
; 9DF: L8:   0F0B0A           BREAK 10                        ; error trap
; 9E2:       36               BYTE #X36                       ; OBJECT-NOT-VECTOR-ERROR
; 9E3:       08               BYTE #X08                       ; RCX
; 9E4: L9:   0F0B0A           BREAK 10                        ; error trap
; 9E7:       41               BYTE #X41                       ; OBJECT-NOT-INTEGER-ERROR
; 9E8:       30               BYTE #X30                       ; RSI
; 9E9: L10:  488B512D         MOV RDX, [RCX+45]
; 9ED:       4883FA04         CMP RDX, 4
; 9F1:       7E11             JLE L11
; 9F3:       488B4125         MOV RAX, [RCX+37]
; 9F7:       81781103FD5020   CMP DWORD PTR [RAX+17], #x2050FD03  ; #<SB-KERNEL:LAYOUT for CLONES.ROM:ROM {2050FD03}>
; 9FE:       0F841DFFFFFF     JEQ L0
; A04: L11:  0F0B0A           BREAK 10                        ; error trap
; A07:       0A               BYTE #X0A                       ; OBJECT-NOT-TYPE-ERROR
; A08:       18               BYTE #X18                       ; RBX
; A09:       23               BYTE #X23                       ; 'CLONES.ROM:ROM
; A0A:       0F0B10           BREAK 16                        ; Invalid argument count trap

WOOF! 280 bytes of assembly, including a full CALL to a generic equality test, and two JMP instructions to other functions. Even without knowing any assembly, this seems like an awful lot of junk just for a measly array lookup! I think one valuable insight I got from Chapter 7 of Let Over Lambda was to disregard what I thought I know or didn't about assembly and just use my damn eyes. Doesn't this seem like a silly amount of code? Let's crank the optimization up:

(defmethod load-prg ((mapper nrom) address)
  #f
  (let ((rom (mapper-rom mapper)))
    (if (= 1 (rom-prg-count rom))
        (aref (rom-prg rom) (logand address #x3fff))
        (aref (rom-prg rom) (logand address #x7fff)))))

As soon as I recompiled this code, I got 6 notes from the compiler stating that it wasn't confident about the return value of (rom-prg-count rom) hence the generic equality test. It also wasn't confident what kind of array (rom-prg rom) was or if all the elements even shared a type! That will cause AREF to be slow. Even so, the generated assembly drops to 116 bytes since the #f read macro expands to a declaration with maximum speed (3) and minimum safety (0). It should go without saying that you only want to do this in code that A) really needs to be fast and for which, B) you're very confident about who will call it and how. Here's the disassembly:

CL-USER> (disasm-method #'clones.mappers::load-prg '(clones.mappers::nrom t))
; disassembly for (SB-PCL::FAST-METHOD CLONES.MAPPERS:LOAD-PRG
                   (CLONES.MAPPERS::NROM T))
; Size: 116 bytes. Origin: #x2290F6CB
; 6CB:       48895DF0         MOV [RBP-16], RBX               ; no-arg-parsing entry point
; 6CF:       488B4605         MOV RAX, [RSI+5]
; 6D3:       488945F8         MOV [RBP-8], RAX
; 6D7:       488B501D         MOV RDX, [RAX+29]
; 6DB:       BF02000000       MOV EDI, 2
; 6E0:       E88B0E1FFF       CALL #x21B00570                 ; GENERIC-=
; 6E5:       488B5DF0         MOV RBX, [RBP-16]
; 6E9:       488B45F8         MOV RAX, [RBP-8]
; 6ED:       7528             JNE L1
; 6EF:       488B500D         MOV RDX, [RAX+13]
; 6F3:       488BFB           MOV RDI, RBX
; 6F6:       40F6C701         TEST DIL, 1
; 6FA:       7407             JEQ L0
; 6FC:       488B7FF9         MOV RDI, [RDI-7]
; 700:       48D1E7           SHL RDI, 1
; 703: L0:   4881E7FE7F0000   AND RDI, 32766
; 70A:       B904000000       MOV ECX, 4
; 70F:       FF7508           PUSH QWORD PTR [RBP+8]
; 712:       E9E166A2FD       JMP #x20335DF8                  ; #<FDEFN SB-KERNEL:HAIRY-DATA-VECTOR-REF>
; 717: L1:   488B500D         MOV RDX, [RAX+13]
; 71B:       488BFB           MOV RDI, RBX
; 71E:       40F6C701         TEST DIL, 1
; 722:       7407             JEQ L2
; 724:       488B7FF9         MOV RDI, [RDI-7]
; 728:       48D1E7           SHL RDI, 1
; 72B: L2:   4881E7FEFF0000   AND RDI, 65534
; 732:       B904000000       MOV ECX, 4
; 737:       FF7508           PUSH QWORD PTR [RBP+8]
; 73A:       E9B966A2FD       JMP #x20335DF8                  ; #<FDEFN SB-KERNEL:HAIRY-DATA-VECTOR-REF>

A Fork in the Road

Those two JMP instructions and the generic equality CALL are still in the assembly though as you can see from the comments on the right hand side. Why? Because we didn't actually resolve any of the compiler's uncertainties about the code. We have to help it know what type of values it will be working with. The question is how to best do that. One way would be to just add a bunch of local type declarations in the method:

(defmethod load-prg ((mapper nrom) address)
  #f
  (let* ((rom (mapper-rom mapper))
         (prg (rom-prg rom))
         (prg-count (rom-prg-count rom)))
    (declare (type byte-vector prg)
             (type fixnum prg-count))
    (if (= 1 prg-count)
        (aref prg (logand address #x3fff))
        (aref prg (logand address #x7fff)))))

That will work and does generately substantially nicer code (82 bytes and no CALLs or JMPs). But boy, it forced us to completely restructure the method and, well, the new version feels a bit disjointed. The declarations stick out and distract from the underlying ideas. The alternative is to try and teach the compiler what types are returned by the accessor functions we're using to pull data out of the ROM. And this is where we come to the important difference about DEFCLASS and DEFSTRUCT from where I'm sitting as an emulator author.

Optimizing with Structs is Easier

(Ed. note 09/19/2017: Rainer Joswig left a very informative comment about Structs vs Classes and Optimizing with CLOS on reddit.)

Getting struct-related code to go fast is easier for a very specific reason. Both DEFCLASS and DEFSTRUCT allow you to optionally specify the types of their slots. Unfortunately, DEFCLASS does absolutely no optimization with this information, while DEFSTRUCT treats it as a guarantee and propagates it through the auto-generated slot accessors and, therefore, the rest of your code.

Now there's a good reason for this and I am certainly not advocating for using DEFSTRUCT by default. The reason is that DEFSTRUCT is not designed to be interactively redefined or changed at runtime unlike most of the language. DEFCLASS could have the types of its slots (or even the slots themselves) change at any time including runtime and so it isn't reasonable for it to treat the type declaration as a fact.

DEFSTRUCT has other downsides as well, including auto-generating a bunch of symbols in the current package among other things. It's clunkier to work with in several ways than DEFCLASS but for truly performance intensive stuff, the type declaration behavior makes it worth it from where I'm sitting. Just don't default to DEFSTRUCT in general. This message from the Rob Warnock Archive may also prove interesting.

This is something I always had questions about though and it was compounded a bit due to the fact that DEFSTRUCT is barely mentioned by Practical Common Lisp or Common Lisp Recipes. Practical Common Lisp is still the best way to learn the language in my opinion. I also honestly enjoy the things that are in the Common Lisp standard due to history but I'd never quite found an answer to "When should I use structs vs classes?" that I liked. Hopefully future lispers will be able to stumble on these notes (or parse the spec better than I did).

Modifying the ROM definition

Here's what our ROM struct looks like with the added type declarations:

(defstruct rom
  (pathname    nil :read-only t)
  (prg         #() :read-only t :type byte-vector)
  (chr         #() :read-only t :type byte-vector)
  (prg-count     0 :read-only t :type ub8)
  (chr-count     0 :read-only t :type ub8)
  (mirroring   nil :read-only t)
  (mapper-name nil :read-only t))

The previous version had no :type options and the default values were all nil. After changing the struct and recompiling, we can write the same version of load-prg as before but get much better generated assembly since the compiler knows the types returned by the struct accessors (and thus the array element type):

(defmethod load-prg ((mapper nrom) address)
  #f
  (let ((rom (mapper-rom mapper)))
    (if (= 1 (rom-prg-count rom))
        (aref (rom-prg rom) (logand address #x3fff))
        (aref (rom-prg rom) (logand address #x7fff)))))

; disassembly for (SB-PCL::FAST-METHOD CLONES.MAPPERS:LOAD-PRG (CLONES.MAPPERS::NROM T))
; Size: 90 bytes. Origin: #x22910BDE
; BDE:       488B4005         MOV RAX, [RAX+5]                ; no-arg-parsing entry point
; BE2:       488B501D         MOV RDX, [RAX+29]
; BE6:       4883FA02         CMP RDX, 2
; BEA:       7528             JNE L2
; BEC:       488B400D         MOV RAX, [RAX+13]
; BF0:       F6C101           TEST CL, 1
; BF3:       7407             JEQ L0
; BF5:       488B49F9         MOV RCX, [RCX-7]
; BF9:       48D1E1           SHL RCX, 1
; BFC: L0:   4881E1FE7F0000   AND RCX, 32766
; C03:       48D1F9           SAR RCX, 1
; C06:       0FB6540801       MOVZX EDX, BYTE PTR [RAX+RCX+1]
; C0B:       48D1E2           SHL RDX, 1
; C0E: L1:   488BE5           MOV RSP, RBP
; C11:       F8               CLC
; C12:       5D               POP RBP
; C13:       C3               RET
; C14: L2:   488B400D         MOV RAX, [RAX+13]
; C18:       F6C101           TEST CL, 1
; C1B:       7407             JEQ L3
; C1D:       488B49F9         MOV RCX, [RCX-7]
; C21:       48D1E1           SHL RCX, 1
; C24: L3:   4881E1FEFF0000   AND RCX, 65534
; C2B:       48D1F9           SAR RCX, 1
; C2E:       0FB6540801       MOVZX EDX, BYTE PTR [RAX+RCX+1]
; C33:       48D1E2           SHL RDX, 1
; C36:       EBD6             JMP L1

Finally, we can improve things just a bit by promising that the address we call the load-prg method with will be an unsigned 16-bit value since the 6502 only has a 64k address space:

(defmethod load-prg ((mapper nrom) address)
  #f
  (declare (type ub16 address))
  (let ((rom (mapper-rom mapper)))
    (if (= 1 (rom-prg-count rom))
        (aref (rom-prg rom) (logand address #x3fff))
        (aref (rom-prg rom) (logand address #x7fff)))))

; disassembly for (SB-PCL::FAST-METHOD CLONES.MAPPERS:LOAD-PRG (CLONES.MAPPERS::NROM T))
; Size: 66 bytes. Origin: #x22910DDE
; DDE:       488B4005         MOV RAX, [RAX+5]                ; no-arg-parsing entry point
; DE2:       488B501D         MOV RDX, [RAX+29]
; DE6:       4883FA02         CMP RDX, 2
; DEA:       751C             JNE L1
; DEC:       488B400D         MOV RAX, [RAX+13]
; DF0:       4881E1FE7F0000   AND RCX, 32766
; DF7:       48D1F9           SAR RCX, 1
; DFA:       0FB6540801       MOVZX EDX, BYTE PTR [RAX+RCX+1]
; DFF:       48D1E2           SHL RDX, 1
; E02: L0:   488BE5           MOV RSP, RBP
; E05:       F8               CLC
; E06:       5D               POP RBP
; E07:       C3               RET
; E08: L1:   488B400D         MOV RAX, [RAX+13]
; E0C:       4881E1FEFF0000   AND RCX, 65534
; E13:       48D1F9           SAR RCX, 1
; E16:       0FB6540801       MOVZX EDX, BYTE PTR [RAX+RCX+1]
; E1B:       48D1E2           SHL RDX, 1
; E1E:       EBE2             JMP L0

Updates

(Ed. note 09/19/2017: Some additional speedups have been made since this article was published.)

Paul Khuong was kind enough to note that SBCL was unable to hoist the (logand address xxx) computation out of the conditional. This duplication can be seen in the disassembly from the two MOV .. AND .. SAR .. MOVZX blocks. Doing so improved the assembly a bit further to 51 bytes. Reflecting on it further, I realized there's no need for a conditional expression at all!

In NROM cartridges, they can either have 1 or 2 PRG banks each of which are 16k. Because the 6502 has a 64k address space and the cartridge data begins at 32k, an NROM cartridge with only 1 PRG bank doesn't actually fill the address space. In our load-prg method, we just want to make sure that if we're given a higher address like 54321 that we wrap that around to not run off the end of our 16k worth of PRG. To do that, we can just logical AND the address with (1- (length array)).

Doing that eliminates the branch and results in a nice, lean 40 bytes for our final disassembly:

(defmethod load-prg ((mapper nrom) address)
  #f
  (declare (type ub16 address))
  (let* ((rom (mapper-rom mapper))
         (end-of-rom (1- (length (rom-prg rom))))
         (wrapped-address (logand address end-of-rom)))
    (aref (rom-prg rom) wrapped-address)))

; disassembly for (SB-PCL::FAST-METHOD CLONES.MAPPERS:LOAD-PRG (CLONES.MAPPERS::NROM T))
; Size: 40 bytes. Origin: #x22844CCE
; CE:       488B4005         MOV RAX, [RAX+5]                 ; no-arg-parsing entry point
; D2:       488B500D         MOV RDX, [RAX+13]
; D6:       488B52F9         MOV RDX, [RDX-7]
; DA:       4883EA02         SUB RDX, 2
; DE:       4821D1           AND RCX, RDX
; E1:       488B400D         MOV RAX, [RAX+13]
; E5:       48D1F9           SAR RCX, 1
; E8:       0FB6540801       MOVZX EDX, BYTE PTR [RAX+RCX+1]
; ED:       48D1E2           SHL RDX, 1
; F0:       488BE5           MOV RSP, RBP
; F3:       F8               CLC
; F4:       5D               POP RBP
; F5:       C3               RET

Wrap Up

There's a lot of work left to do on the (new) emulator but I'm writing code again, having fun, learning, and using lisp and that's the most important part to me. If you made it this far, thanks for reading. Let me know what you think and happy hacking!

Labor Day Link Fixup

posted on 2017-09-04 11:18:00

Ancient History

For whatever reason, yesterday seemed a good day to decommission Linode 18032 that I bought way back in February of 2009 and had been using to run redlinernotes.com ever since. I bought redlinernotes.com in June 2007 in the wake of my first serious breakup (with Sonya Z) but ran it out of my parents basement up until getting the linode.

Ever since, though my knowledge increased, I didn't bother revamping the linode other than one reinstall with Ubuntu 12.04 when I first migrated my blog away from Wordpress to Coleslaw. I've been gradually moving my online presence towards kingcons.io and making plans to get back to blogging in earnest the past few weeks. But even though kingcons.io was made with (semi-workable) ansible roles, redlinernotes.com was still a big hand rolled minefield with years of digital detritus to boot. Nothing like a 3 day weekend to clean out your digital woodshed.

A Fresh Start

After deleting a lot of crap, I moved the statically hosted documents over to kingcons.io, added an nginx vhost, swapped the DNS over, and nuked the old linode. There was one last thing to do though. When I moved over to coleslaw from Wordpress years ago, I didn't bother fixing up the old links. So I have a bunch of ".post" documents in my blog's git repo that reference expired wordpress links to other posts (like "blog/?p=1234") instead of the slug coleslaw would assign the post.

I guess I didn't care enough at the time or was too focused on coleslaw itself to worry about "legacy content". But I found a wordpress XML backup and figured I might as well fix up the dead links today while I was at it. All while rolling my eyes at dealing with XML.

Link Fixup

Since coleslaw is designed to be backed by git as a content store, I started by grepping through the blog posts to get a list of all the old wordpress post IDs I linked to.

grep -Eo "redlinernotes.com/blog/\?p=(\d+)" *.post | cut -d "?" -f 2

Armed with that, I could tackle digging into the wordpress XML backup to map the post IDs to coleslaw generated titles. Shinmera has been writing stupid amounts of good lisp code over the past few years including Plump, an XML parser. I never completely gelled with CXML back in the day so I figured I'd give plump a go. The following assumes you have a decent lisp installed (I heartily recommend SBCL) and quicklisp.

(ql:quickload 'coleslaw)
(ql:quickload 'plump)

;; Make sure to use a pathname (the #p) for this, plump will treat a plain string as XML to be parsed, not a file.
(defvar *wp-xml* #p"/Users/brit/projects/linode-retirement/wordpress.2010-09-14.xml")

(defvar *doc* (plump:parse *wp-xml*))

I was actually stumped here for a bit because I used a string instead of a pathname as the argument to PARSE and it took me a few minutes querying and getting no results before I looked in the slime inspector and realized the doc hadn't parsed as expected. Once I had this though, it was pretty straightforward to build a hash of IDs to titles...

(defvar *posts* (plump-dom:get-elements-by-tag-name *doc* "item"))

;; For those wondering, :test #'equal is there so we can use string keys. Read Practical Common Lisp or google to learn more.
(defvar *post-id-map* (make-hash-table :test #'equal))

(defun extract (key node)
  (let ((value (first (plump-dom:get-elements-by-tag-name node key))))
    (plump:text value)))

;; Yes, I'm using a private coleslaw function here but I wrote coleslaw so ... uh ... do what I want! 
;; And in case you were wondering, handler-case is lisp's try/catch equivalent and I'm pretty much doing "rescue nil" here.
(defun fixed-title (title)
  (handler-case (coleslaw::slugify title)
    (simple-error () :junk)))
               
(loop for post in *posts*
      do (let ((title (extract "title" post))
               (id (extract "wp:post_id" post)))
           (setf (gethash id *post-id-map*) (fixed-title title))))

And now we're good to update the existing posts to have proper relative links to the content we actually want.

(ql:quickload 'cl-ppcre)

;; Can you tell I did all this in one repl session and just copy pasted it into this blog post?
(coleslaw::load-config "/Users/brit/projects/improvedmeans/")
(coleslaw::load-content)

;; Cute sidenote, since ppcre turns this regex into to a tree of closures which SBCL compiles, this can be #'disassemble-d.
;; Also, it's pretty fast.
(defvar *wp-url-regex*
  (cl-ppcre:create-scanner "w{0,3}?\.?redlinernotes\.com/blog/\\?p=(\\d+)"))

(defstruct counts
  (invalid-slug   0 :type fixnum)
  (post-not-found 0 :type fixnum)
  (updated-url    0 :type fixnum))

(defvar *results* (make-counts))

(defun invalid-slug-p (slug)
  (or (null slug)
      (every #'digit-char-p slug)))

(defun mismatch-p (slug)
  (let ((key (make-pathname :directory '(:relative "posts")
                            :name slug :type "html")))
    (null (gethash key coleslaw::*site*))))

(defun slug-for-match (text start end match-start match-end reg-start reg-end)
  (declare (ignore start end))
  (let* ((id (subseq text (aref reg-start 0) (aref reg-end 0)))
         (match (subseq text match-start match-end))
         (slug (gethash id *post-id-map*))
         (new-url (concatenate 'string "blog.kingcons.io/posts/" slug ".html")))
    (cond ((invalid-slug-p slug)
           (incf (counts-invalid-slug *results*))
           (format t "Couldn't find valid slug for post id: ~d~%" id)
           "/&")
          ((mismatch-p slug)
           (incf (counts-post-not-found *results*))
           (format t "Not found in site content: ~A~%" slug)
           "/&")
          (t
           (incf (counts-updated-url *results*))
           (format t "Replacing ~A with ~A~%" match new-url)
           new-url))))

(coleslaw::do-files (path "/Users/brit/projects/improvedmeans/" "post")
  (let* ((text (alexandria:read-file-into-string path))
         (updated-post (cl-ppcre:regex-replace-all *wp-url-regex* text #'slug-for-match)))
    (with-open-file (out path :direction :output :if-exists :supersede)
      (format out "~A~%" updated-post))))

(format t "~%~%===RESULTS===~%")
(dolist (type '(invalid-slug post-not-found updated-url))
  (let ((symb (alexandria:symbolicate 'counts- type)))
    (format t "~A: ~D Posts~%" type (funcall symb *results*))))

And there you go. A few links are still broken but things are generally improved, I'm down to 1 linode instead of 2, and I had a bit of fun on a lazy Sunday.

===RESULTS===

INVALID-SLUG: 18 Posts

POST-NOT-FOUND: 7 Posts

UPDATED-URL: 58 Posts

Big Changes for Coleslaw

posted on 2014-09-22 10:36:00

I'm working towards 1.0 and Coleslaw's basic architecture seems to have settled down. The areas of focus for 1.0 will be better error handling, command-line conveniences, more content types, and possibly some new ways to ingest data.

Coleslaw 0.9.6 will be released this Saturday and, not long after, make it into the next quicklisp release. Seeing as it contains big changes, some of them breaking, I thought I'd put out an announcement.

Coleslaw 0.9.6

Coleslaw 0.9.6 unifies how we handles URLs throughout the application and simplifies the deploy strategy. The good news is, this makes the install process easier for new users. The bad news is, if you've got an existing install, you'll need to add a new plugin (versioned) to your config file for the old deploy behavior.

That's not so rough, right? In addition, custom themes and plugins that haven't been upstreamed may need some minor tweaks. The NEWS has more details.

Feel free to grab the basic-deploy branch from my repo and try it out. There are some new docs and the README has been cleaned up. There's also a plugin for Twitter Summary Card support and the usual smattering of bugfixes.

Going Forward

While I'm happy to maintain Coleslaw if no one else steps up to work on it, I'm going to try and shift my focus towards emulation work and weird lisp noodling. If you're interested in taking on a co-maintainer role or working with me on the project please get in touch. I've been very appreciative of the help and interest thus far. If there's anything I can do to make the project more approachable or help people get started, do let me know.

Breaking Radio Silence

posted on 2014-05-05 13:12:11

Long time, no blog.

I've been offline for a while. I burned out last July and only really started hacking on my lisp projects again in March. So what's changed in the last two months? Actually, kind of a lot.

Coleslaw 0.9.4

Coleslaw 0.9.4 is hereby released. I apologize that 0.9.3 which went out in the last quicklisp release had an embarrassing escaping bug.

The most fun part of Coleslaw is trying my hand at API design. Lisp is a great tool for writing extensible software and Coleslaw has been a good proving ground for that since everyone has a slightly different set of requirements for their blogware.

I've been reading Sonya Keene's Object Oriented Programming in CL lately which led to a large refactoring around the new Document Protocol. I'm not prepared to say anything intelligent about protocols yet, but thankfully plenty of people have done so elsewhere. This blog post by sykopomp isn't a bad place to start.

In addition to the document protocol and the usual litany of bugfixes, Coleslaw now has a new theme based on bootswatch readable, user-defined routing, support for static pages, and greatly expanded docs.

The main things to tackle before 1.0 are a plugin to support incremental compilation for very large sites and a twitter/tumblr cross-posting plugin.

cl-6502 0.9.7

Additionally, someone actually found a use for my Readable CPU emulator! Dustin Long was working on a homebrew Nintendo game and wanted a way to unit test his code, so he's been using cl-6502 to get cycle counts and otherwise check behavior. Naturally, the very basic assembler got on his nerves so he sent me a nice pull request adding support for labels, compile-time expressions, and decimal, hex, and binary literals. Thanks, Dustin!

I also rewrote the addressing modes again, reduced consing, and made debugging easier by using Alexandria's named-lambda for all the opcodes. The cl-6502 book has been updated, of course.

Upcoming

With any luck, I'll get back to work on famiclom or tools for analyzing old NES games like Super Mario Bros and Mega Man 2. It's good to be back.

Lessons from cl-6502

posted on 2013-07-05 11:44:00

This will be the last post about emulation that doesn't involve graphics or disassembly of old NES games, I promise. cl-6502 0.9.5 is out and, in my testing with SBCL, pretty snappy. The book has received updates and is also available on lulu. Below is the 'Lessons Learned - Common Lisp' chapter:

Structures can be preferable to classes

Structures are much more static than classes. They also enforce their slot types. When you have a solid idea of the layout of your data and really need speed, they're ideal.

CLOS is fast enough

CLOS, for single-dispatch at least, is really quite fast. When I redesigned the emulator to avoid a method call for every memory read/write, my benchmark only ran ~10% faster. I eventually chose to stick with the new scheme for several reasons, performance was only a minor factor.

Destructuring is more expensive than you think

My second big speedup came, indirectly, from changing the arguments to the opcode lambdas. By having the opcode only take a single argument, the CPU, I avoided the need to destructure the opcode metadata in step-cpu. You don't want to destructure a list in your inner loop, no matter how readable it is!

Eval-when is about data more than code

That is, the times I found myself using it always involved computing data at compile-time that would be stored or accessed in a later phase. E.g. I used it to ensure that the status-bit enum was created for use by set-flags-if and the *mode-bodies* variable was bound in time for defaddress. Regardless, try to go without it if possible.

Use DECLAIM (and DECLARE) wisely

DECLAIM is for global declarations and DECLARE is for local ones. Once you've eked out as many algorithmic gains as possible and figured out your hotspots with the profiler, recompile your code with (declaim (optimize speed)) to see what is keeping the compiler from generating fast code. Letting the compiler know the FTYPE of your most called functions and inlining a few things can make a big difference.

My Lisp Summer Project

posted on 2013-06-21 14:58:00

cl-6502 0.9.4

I haven't been doing any hacking on coleslaw or famiclom for the last month. I've been focused almost entirely on my 6502 CPU emulator. In particular, I've been optimizing it and turning it into a "readable program".

The optimizations have gone swimmingly, taking cl-6502 from 3.8 emulated mhz circa May 1st (commit eecfe7) to 29.3 emulated mhz today (commit b729e8).(0) A factor of 8 speedup feels pretty good though it would be fun to coax more speed out later.

(0): All figures obtained with SBCL 1.1.4 on Debian 64-bit on an old Thinkpad X200. See this 6502 forum post.

I feel that the readability of the program has remained, maybe even improved, through all that optimization. The same overall design is in place; most refactorings were, approximately, tweaking macros and their callsites. The readability is especially improved when the code is broken into chapters, each with an introduction for context, and typeset with LaTeX. The latter is done thanks to a simple Makefile and some very nifty code swiped from Luke Gorrie's snabbswitch. If you've been curious about how cl-6502 is implemented or just wanted to dive in, there's never been a better time. Grab the book!

NES + Lisp + Static Analysis = ?

I'm still planning to make famiclom a full NES emulator. I won't consider it done until I can play Mega Man 2 with it. Hopefully using a USB controller. It doesn't make much sense for Lisp in Summer Projects though. I've already started the project, the scope is ill-defined, and I want to work on something fresh and new to me. So I've come up with a project that I can't possibly complete instead. It'll be great!

In short, I intend to rewrite Super Mario Bros 1. ... in a not-yet-existing lisp-like macro-assembler/incredibly simple compiler. I already have a 6502 assembler/disassembler in cl-6502 and a tool to parse NES roms in romreader. There's also a very thorough annotated disassembly of Super Mario Bros floating around. I've got a good start on a static analyzer that will take the SMB binary and an entry point and try to build a CFG of the game. The current scheme won't work with memory mapped titles but it's good enough for Mario.

Wild Speculation

Once I have a graph representation of Mario Bros, I'll try both manual analysis of the annotated disassembly with a pen and pad, and automated analysis on the graph using Lisp. I'll try to find as many idioms and patterns as possible to condense and improve readability of the code. Then, somewhere in early August, I'll start trying to rewrite the source and see how far I can get.

This approach is probably completely: insane, unworkable, unviable, inadvisable, and just all around wrong. But I think I'll have fun and learn something, so it's good enough for me. And hell, who knows, maybe I'll get lucky and be able to attend ECLM next year. :)

Coleslaw 0.9.2 and other lispiness

posted on 2013-05-11 20:41:00

Coleslaw 0.9.2

It still amuses me that my most successful project to date is a blog engine. Not that I'm complaining about having contributors. When I last mentioned it, version 0.8 had just been released. Since then there have been 2 new contributors and a bunch of new features. I think the code has mostly improved in cleanliness.

The biggest changes are new shiny docs, a new tags implementation, cleanups to theming, and plugins for Google Analytics, Github Pages, and Sitemap Generation. For the full details, see the changelog.

My plans for 1.0 are primarily to take advantage of the extensible content types added in 0.8 and add some sort of tumblr-like auto-embedding support. But I probably won't get around to working on that for a spell. Why?

Other Lispiness

Because my lisp emulation experiment/art project is ongoing. Nyef was kind enough to share some code he'd hacked up for NES emulation years ago and it helped give me the motivation to rewrite famiclom's PPU (Graphics Card). The former code was largely cribbed from Patrick Walton's sprocketnes and I didn't understand it very well. I've hit the nesdev wiki again and am getting more comfortable with the PPU's actual workings. The code is on github in the new-ppu branch and I'm hoping to spend more time on it this coming week.

I also spent the last week porting cl-6502 to clojurescript for giggles. Heresy, I know. ;)
cljs-6502 is in a basic working state but there are bugs aplenty and I haven't implemented the assembler or disassembler. The must frustrating part was dealing with A) differences in macro hygiene and B) poor macro debugging facilities.

The browser is a fun target though. I'll have to try parenscript or ... jscl! JSCL is a full CL->JS compiler in early development, which I contributed a tiny patch to for fboundp. It's a great project and if you have any interest in helping implement a lisp, I'd encourage you to get involved. The maintainers are very approachable and there's plenty of fun hacking to be had.

All for now. It's time to play around trying static analysis of Nintendo ROMs with Lisp. I'm bound to learn something...hopefully.

So Close and Yet So Far

posted on 2013-03-16 12:22:00

Famiclom Progress


"Low-level programming is good for the programmer's soul." - John Carmack, via ahefner

"What I like about Lisp is that you can feel the bits between your toes." - Drew McDermott, via Michael Weber


Previously...

I never did enough systems programming. In college, I actually convinced my Operating Systems professor to let me do the course project in lisp. So when I decided I wanted to get closer to the metal a year ago, I thought I'd look into Nintendo emulation with Common Lisp rather than systems hacking with C. Besides, my needs for a web server or other daemons were filled. So I embarked on that weird journey and came out with a shiny, readable, reasonably fast 6502 CPU emulator in under 800 lines of code. It even has an assembler and disassembler!

Announcing Famiclom version least-positive-fixnum

But a CPU emulator isn't much fun. No graphics, no sound, no I/O! After a break from September through January I got back to work in earnest a month ago. It started with getting Klaus Dorfmann's exhaustive correctness tests for the CPU added to my testsuite and a lot of bugfixing. Then I used pcwalton's lovely Rust code as inspiration and started getting the memory mappers and PPU (graphics) working with lispbuilder-sdl. So far we only support NROM mapped games though MMC1 should be coming soon(tm). As you can see at the top of the post, there are still some rendering bugs to work out. All testing so far has been done on CCL and SBCL on Linux.

Current Status

The good news is that the CPU, PPU, and .nes file reading are all done in ~1570 lines of Lisp code! The CPU in particular I think makes for quite nice reading at this point. The main NES code still needs work. The bad news is that while the CPU runs at 2-3x the speed of the NES the graphics are about 15-20x slower so I'm going to have to spend some time optimizing. I'm in #lisp on freenode regularly and would love advice or patches from any low-level SBCL or lispbuilder-sdl hackers. :)

Where to next?

  1. Preliminary input handling is written, hook it into the main event loop.
  2. Finish MMC1 loader. Get Mega Man 2 title screen loading at least.
  3. Fix rendering bugs and try to play a few games.
  4. Optimize and/or add audio support!

Lessons from Coleslaw

posted on 2013-01-06 14:40:00

Is there anything more pointless than a new blog engine? Probably not. 4 months ago, I wouldn't have thought that I would be distracted from my Lisp 6502 emulator so long or that I'd have this much fun writing blogware. It is amazing, however, just how much you can do with a bit of time and ~600 lines of lisp.

Lately I've come to realize my favorite part of hobby programming is that I essentially treat it as creative writing. One of the reasons I love Lisp and find myself using it so much for hobby code is how easily it enables me to experiment with new coding styles. In Coleslaw's case, this has meant a stronger focus on CLOS and API design.

I like to think there's a clear stylistic shift in my projects, from the earlier and messy imperative of Paktahn, through the neat but overly macro-heavy cl-scrobbler, to the more balanced style of my present day code. It's no surprise that some of my favorite lisp luminaries, Peter Seibel and Luke Gorrie, talk a lot about code as literature and readable programs. Hopefully, I will continue to progress in that tradition.

Coleslaw 0.8 is hereby released. The biggest features are multi-site publishing and support for new content types. Here is an example bookmark or tweet-like content type that may ship in a future release, Shouts. See the NEWS for further details. It's time to get back to Memory Mappers for a bit and see if I can't get actual NES emulation going in pure Common Lisp. See you next time, Planet Lisp.

Now with even more Lisp!

posted on 2012-09-19 20:47:00

I haven't finished my headless NES emulator in time for Strange Loop. On the other hand, I have done some cool things that I didn't anticipate. Here's what's been going on in hackland lately:

Coleslaw: Heroku & Plugins

The majority of my efforts have been related to my new blogging engine, Coleslaw. I've substantially cleaned up the rendering pass, added optional RSS feeds for specific tags, revamped the plugin architecture, and added a plugin for Disqus support. Jose Pereira also wrote a heroku buildpack for running Coleslaw so if you feel like having a simple managed install, problem solved!

While docs are still needed, here is a rough overview of the plugin architecture. I'll add a simple, hello-world-ish example to the README shortly.

  • Plugins are lisp files placed in coleslaw's plugins directory.

  • Each plugin should define a package :coleslaw-$filename where $filename is the name, excluding extension, of the plugin.

  • The package should export a function named enable that can be called to activate the plugin.

  • enable is mostly useful for adding Javascript to the page via add-injection or passing initialization args elsewhere.
    • add-injection takes a cons like (js-string predicate) and a location (:head or :tail) to insert it.
    • The predicate should take a single content object as argument. A content object can be either a POST or an INDEX.
    • The predicate will be called at render time and a non-nil return value will cause the injection to be included in the page.
  • Plugins can also extend render-content to support new post formats such as ReStructuredText or modify deploy with :before, :after, or :around methods to support deployment to S3, serving via Hunchentoot, etc.

  • End users take advantage of plugins by enabling them in their .coleslawrc's :plugins list.
    • A plugin given as a symbol will call enable with no args whereas a plugin given as a list will call enable with the args after the plugin's name.

Colorize Reborn

I am pleased to announce I've taken over as maintainer of colorize. It provides the syntax highlighting in Coleslaw's 3bmd markdown mode. I've backported patches from lisppaste to colorize for Haskell, Erlang, Python, and a number of other languages. I also added very rudimentary support for Clojure highlighting. While I'm interested in seeing further language support (particularly Clojure and Rust) I lack the time to work on further coloring modes myself. If you have any inclination to hack on colorize to add support for new languages or otherwise improve it please feel free to contact me. I'm more than happy to review and merge pull requests. :)

Wanting Types, Demanding Mirrors

I delivered a final Lunch and Learn at CMGdigital on Dynamic Systems. I have a screencast of it like my last talk but I haven't uploaded it yet. I'm a bit more out of my depth this time and am still considering tweaking the content and rerecording. The slides are linked above though and I would welcome comments. I'm also working on a Magic the Gathering tutorial/unsession for Strange Loop and an Emacs Crash Course for my new coworkers at Primedia. Finally, I'm giving a talk on the internals of cl-6502 to the Atlanta Lisp User Group on October 1st.

Vacietis and trivial-dump-core

Vladimir Sedach has been working on a C to Common Lisp compiler called Vacietis for a while now. It's become mature enough to generate Lisp executables for small C programs. However, dumping executables isn't a portable CL feature so I contributed a minor pull request to use trivial-dump-core to improve portability from ccl to clisp, sbcl, and ccl. I hope trivial-dump-core sees more use and gains support for more implementations as portable executable creation would be a nice thing to have.

Swanky Developments

posted on 2012-09-04 13:22:05

Now that the i's have been dotted and the t's crossed I'm pleased to announce I've accepted a new job. Starting September 17th, I'll be a Senior Developer working for Primedia. I'll be helping them migrate from ruby to clojure. I've been meaning to spend more time hacking Clojure as it is. I'm particularly delighted that I'll be in something of a teaching role and able to share my knowledge and experience with lisp with interested hackers.

CMGdigital has been a phenomenal place to work for the last year and I'll miss everyone there dearly. I wasn't looking for a new job but Primedia found me and this was in many ways the right opportunity at the right time.

I'm also very excited for the arrival of Leiningen 2.0 and happily running the latest preview. After using quicklisp, I disliked having to setup a mock project to experiment with arbitrary libraries in Clojure. Leiningen 2 uses a library called pomegranate under the covers which allows modifying the REPL classpath. Thus, dependencies can be easily added to a running REPL and experimented with!

In other lispy news, the dream of endless swank backends and SLIME on everything has died. Previously, I had coerced SLIME into running Clojure, Scheme, and Common Lisp simultaneously and knowing which filetypes to associate with which repls. It took a lot of fiddling. I actually had a rant against the proliferation of swank backends for other languages like Chicken Scheme and Clojure happening outside the main SLIME tree. Anyway, between Emacs 24 shipping package.el, marmalade, nrepl.el for Clojure, and Geiser for scheme, the situation has resolved itself even if the infinite SLIME dream is dead. And ultimately, that's better for hackers everywhere...so who am I to complain? :)

Coleslaw Lives!

posted on 2012-08-26 12:42:00

New Beginnings

So, Coleslaw is alive (you're looking at it) and I've done a clean reinstall on my server for the first time since 2008 or so. Thank GOD.

With any luck, I'll get back to hacking emulators now. :) But first... a test of some features! I should really overhaul the README for coleslaw too.

;; God do I love being able to write this post in emacs. And commit and push to publish.
(defun slug-char-p (char)
  "Determine if CHAR is a valid slug (i.e. URL) character."
  (or (char<= #\0 char #\9)
      (char<= #\a char #\z)
      (char<= #\A char #\Z)
      (member char '(#\_ #\- #\.))))

(defun slugify (string)
  "Return a version of STRING suitable for use as a URL."
  (remove-if-not #'slug-char-p (substitute #\- #\Space string)))

A LaTeX/mathjax test...

$$ \lambda \scriptstyle{f}. (\lambda x. (\scriptstyle{f} (x x)) \lambda x. (\scriptstyle{f} (x x))) $$


Unless otherwise credited all material Creative Commons License by Brit Butler