Content tagged clones

Clones FAQ

posted on 2022-06-21 09:30:00

An Update on Clones

Even though I've been finding less time to work on it, I've enjoyed the process of hacking on clones and things are going pretty well. Input handling and background rendering are finished and with any luck it will be another weekend until sprites are working. I do expect to need some tweaks for fine scrolling to be implemented correctly so it may be another week or two still until more advanced NROM titles like Super Mario Bros run. It should be a hop, skip, and a jump from there to MMC1 titles like Mega Man 2 though. 🙏

clones-progress

Commonly asked questions

There are 3 questions that have come up pretty regularly though, either on stream chat or elsewhere, and I'd like to jot down some thoughts about them while it's fresh in my mind.

Why are you building clones?

Because it brings me joy. I feel compelled to work on it as a vehicle to try to create something that I find aesthetically appealing and that indulges parts of my curiosity.

Why are you building clones in lisp?

Because it's my favorite language to work in. I don't think the feature set of Common Lisp is critical. Macros, CLOS, and conditions and restarts are all great but the motivating factor for me remains the interactive development workflow with Emacs and SLIME/Sly. When I was young and not yet a programmer, I imagined that at some point working on software or digging into the internals would be more like having a conversation than working out a math problem. Our tooling remains far from those (naive) visions, but Common Lisp is closer to what I imagined and so it feels more comfortable to me. Mikel Evins has written some great posts about this.

What are your goals in building clones?

This is the toughest part to answer. Initially, I'd just like to be able to have a functioning emulator and play childhood games that I loved, like Mega Man 2, tolerably well. Once that piece is completed though, I'm very interested in trying to add tools for debugging and reverse engineering games.

This isn't so much about the games themselves as it is about having better tools for investigating software without access to source code. An emulator is a great place to experiment with approaches to that. Now I admit I have not spent a lot of time doing reverse engineering or security work and am not familiar with the state of the art around static analysis or disassembly tools like IDA Pro.

I'm limited in my ability to express what I imagine. So since I can't tell you exactly what it should be, here's a sketch:

I would love it if I could build a graph of the control flow of the game as I play it. I would love it if I could later annotate the graph, name segments of assembly, and receive hints around what specific parts might be interacting with graphics data, or the APU, or handling player inputs.

The code is an artifact, the leftover cocoon of the program being written. The interesting pieces are in the constraints of the level design, the physics, the musical score, the artwork. I would like, as much as possible, to have tools for exploring the shape of a process as it lives, exploring the data it operates on, and understanding the constraints of the problem, rather than relying on code to understand one specific approach to solving that problem.

In the abstract, this isn't a solvable problem and I will never have a proof of correctness or confidence in completion. But it's worth striving to see how software in general could leave breadcrumbs behind it, given how much of our ideas and culture are being poured into it and fossilized in amber.

Back to Clones

posted on 2022-03-20 20:34:00

Ancient History

When I last wrote about clones, I was 32 and still working at Showcase IDX. I never got around to finishing clones and in fact worked on rawbones with my dear friend James Dabbs for a spell while teaching at the Flatiron School. By my count I have something like 4 half-finished NES emulators now.

  • famiclom - My original half-hearted attempt in CL when I had no idea what I was doing.
  • nescavation - A NES emulator I hacked up while learning to teach Javascript.
  • clones - A better thought out attempt in CL.
  • rawbones - The ReasonML version I wrote with James.

I seem to write one whenever I get bored and with any luck I'll wind up finishing one of them sooner or later. Nescavation and Famiclom really never got close to running games, clones and rawbones both got much closer to playable territory but I never got background scrolling right. I still find it a bit funny that famiclom gets more attention than my later, improved efforts like clones or rawbones. (Probably because cl-6502 mentions it and achieved a little notoriety.)

Getting to a playable state has never been the point though. These projects have been part learning exercise, part avenue for exploring literate programming, and often just a fun project to noodle with for my own entertainment. I still like the idea that a fast and reasonably accurate emulator can be written in a concise, clear way with a garbage-collected language.

Present Day

Recently, I got the itch again and so I decided to start fresh with clones. There are a few interesting changes this time around. When I made cl-6502, creating a readable document from the program was a primary goal and resulted in a literate book. This ethos never quite made the transition from the CPU stage to the full system emulators. This time I'll be leaning heavily into that spirit using mgl-pax. I'll also be testing with try and relying as heavily as I can on CPU and PPU test roms.

This is all happening in the "once-more-with-feeling" branch on sourcehut. So far there isn't a lot there though I'm on vacation starting in 6 days so I'm hoping to get ROM parsing and a basic structure for stepping the CPU in place to crank through NEStest. I do have some nice automation set up though. Every push runs the test suite and deploys the docs. I also have a very basic twitch stream working in case I want to indulge in the silliness of coding on camera.

For now, here's a look at the .build.yml file that powers the CI on sourcehut. It really isn't harder to set up an automatation pipeline for a CL app than anything else. Here's to working on fun projects again. More soon. 👋

image: alpine/latest
oauth: pages.sr.ht/PAGES:RW
environment:
  site: clones.kingcons.io
packages:
  - sbcl
sources:
  - https://git.sr.ht/~kingcons/clones
tasks:
  - install-quicklisp: |
      curl -O https://beta.quicklisp.org/quicklisp.lisp
      sbcl --non-interactive \
           --eval "(load \"~/quicklisp.lisp\")" \
           --eval "(quicklisp-quickstart:install)" \
      mkdir -p ~/quicklisp/local-projects/
  - test: |
      ln -sf ~/clones ~/quicklisp/local-projects/clones
      sbcl --non-interactive \
           --eval "(load (merge-pathnames \"quicklisp/setup.lisp\" (user-homedir-pathname)))" \
           --eval "(ql:quickload '(clones clones/test))" \
           --eval "(unless (try:passedp (try:try 'clones.test:test-all)) (uiop:quit 1))"
  - build-site: |
      cd clones
      echo 'Building site'
      sbcl --non-interactive \
           --eval "(load (merge-pathnames \"quicklisp/setup.lisp\" (user-homedir-pathname)))" \
           --eval "(ql:quickload '(clones mgl-pax/document))" \
           --eval "(clones.docs::build-site)"
      mv ~/clones/site/clones.html ~/clones/site/index.html
      tar -C site -cvz . > site.tar.gz
      acurl -f https://pages.sr.ht/publish/$site -Fcontent=@site.tar.gz
      rm site.tar.gz

Things I learned writing Clones

posted on 2018-07-29 12:08:00

wip

I'm turning 32 in a week so thank goodness I'm finally making progress on clones. After my last post, I didn't work on clones for 7 months. Then in May, I just sat down and started hacking. Despite some gaps, there has been steady progress.

There's still a lot I want to do and audio isn't implemented so that's next, but for now I'm going to try to summarize the current status and some of the lessons I've learned thus far.

Current Status

Clones, At Present

The Clones CPU emulation is finished and tested and there is support for input handling and basic graphics support (backgrounds and sprites, scrolling is next). A lot of what determines compatibility for a NES emulator comes down to mapper (cartridge) support and the accuracy of the PPU support. In that regard, clones supports NROM, UNROM, and MMC1 though UNROM and MMC1 have some issues that need ironing out once scrolling is finished.

The circuit board used in NES cartridges actually varied and added additional capabilities to the console, primarily a paging system for switching banks in and out of memory to allow for larger levels, more artwork, more game code, etc. The different cartridge types were called mappers. Thankfully, 6 different mappers accounted for something like 80% of all games commercially available in the US. As a result, mapper support is a big deal since you can't play a game without the matching cartridge support.

Clones, In the Future

The first priority is fixing some sprite glitches and getting scrolling implemented. Once that's done, fixing up the lingering issues with UNROM and MMC1 will take precedence. Once Mega Man 2 is booting, then I'll start work on the audio.

After that's done the real fun begins. I have all sorts of ideas and ambitions for how to build a Control Flow Graph of the game dynamically while it executes and then let the player annotate the structure and save it for later revision. I want to be able to reverse engineer old games interactively and am wondering how much the computer can help in the process with the use of Constraint Logic Programming tools like screamer. In general, I'm interested in how we can examine shipped binaries at runtime as a teaching tool for how the software and hardware work.

More on this soon, I hope. 🙏

Lessons Learned

Test ROMs == Joy

There are many test roms for ensuring that various components in your NES behave accurately. I found it particularly useful to write the memory interface, addressing modes, disassembler, and a stepper for the CPU with no instructions implemented. Then I had a unit test which looped over a verified correct log for a ROM called "nestest" which exhaustively checks the operation of all legal CPU instructions. After I could run the test until it failed, implement a single instruction, and re-run. I had all 56 instructions with their various opcodes written in a day. Super pleasant!

It helped that I'd written a CPU emulator before, of course. This process required having a good idea up front about how I wanted to interact with memory, represent addressing modes, and execute instructions. If you don't understand those pieces though, you'll run headlong into them while trying to implement an emulator anyhow so start there. Spend some time reading nesdev wiki or asking questions online if you need to. 🤘

A Good Macro-Friendly Decomposition for Instructions

There is a bit of a Gordian Knot in the CPU in how the addressing modes and different opcodes interact if you want to define each instruction exactly one time without a mess of switch statements for the different variations. In short:

  1. Addressing modes should only access CPU and Memory to compute an address. Any cycle counting (e.g. for crossing pages) can be done at the call site with macros!

  2. Your opcode-defining macro should set up address and argument variables, or an update function as needed based on the access pattern of the instruction.
    This has bitten me on previous attempts as I assumed the access pattern came from the addressing mode rather than the instruction itself. Instructions can be implied and use no argument, or only use the address and jump to it, or read an argument from the address, or write a value to an address, or read a value, modify it and write it back. It was very worthwhile to split these cases out and handle them independently. It meant a little extra work while writing up the metadata but kept concerns separated later.

  3. Separating the opcode metadata from the actual instruction definition. This is more arbitrary than the earlier recommendations but it felt very clean while hacking the opcode definitions and I think I only found myself going back to edit the instruction metadata one time from making a typo.

PPUs == Pain

PPU stands for Picture Processing Unit and it was the graphics card in the original NES. The central innovation of the PPU was that it supported pixel-level scrolling of levels.

I have no experience in graphics or game programming so this was a big challenge for me. Four other factors contributed to the difficulty of writing the PPU:

  1. There isn't really a suggested ordering of tasks like there is for writing a CPU emulator and subtasks are hard to test in isolation.
  2. The data layout of the PPU is complex: OAM, nametables, pattern tables, attribute tables, etc.
  3. The test ROMs that exist generally assume you have the basics working.
  4. While many working open source emulators exist, the control flow was difficult to follow without getting lost in the weeds and it was unclear why different calculations were used.

I'll try to tackle these briefly and write up more details at a later date.

A Suggested PPU Ordering

First, you need an object to represent the hardware state. It'll need to access the currently loaded game ROM for graphics data so remember to give it a slot for storing the cartridge object.

Second, you'll need to implement the PPU memory map. There's no operating system on the NES so there are no video card drivers and you'll do everything yourself via Memory Mapped I/O. If you've never heard of memory mapped I/O, the idea is that reading and writing to specific addresses in memory directly manipulates the PPU so write those methods and wire it up!

Third, you'll want to get the timing synchronized between the CPU and PPU. You'll want to do this before trying to render graphics probably as many games wait for an interrupt called vblank from the PPU that the graphics card is ready before even reaching the title. Many games will infinite loop until the PPU wakes them up with this interrupt, then do the work needed to render the next frame and return to the infinite loop. This is part of why it's so important to get the timing right.

Fourth, you'll want to make sure the address computations are right. This was the single hardest bit of code for me to get right in the PPU. It's also the code I'm happiest with and hoping to figure out how to test in an automated way for next time.

Fifth, try to just render the backgrounds using the addressing logic you arrived at ealier. If you can get backgrounds rendering correctly, you should be well on your way to getting sprites and scrolling working. With any luck, the PPU operation should start becoming clearer.

PPU Data Flow

Internally the graphics are represented as 8x8 tiles that are either sprites or backgrounds. Crucially, the information needed to render those tiles is divided up into the different areas inside the PPU: nametables, attribute tables, the palette, and pattern tables (in the ROM).

Nametables represent the background and are 960 byte long arrays where each byte is an index into the pattern table for an 8x8 tile. Why 960 bytes you ask? Because the NES resolution is 256 by 240 and if you divide that by 8 (pixels in a tile) you get 32 x 30. 32 * 30 = 960.

So nametables point to the "pattern" or texture that will be used for a given tile but for space reasons that pattern doesn't actually store all the information about what color it should be. The pattern table is 4kb and holds 256 tiles with each 8x8 tile taking 16 bytes to store. Those 16 bytes are enough for each pixel to get two bits to represent a color ... so 4 options.

The PPU has a 64 byte palette table to select 32 colors for the background and 32 colors for the sprites. But why bother when each pixel in a pattern can only count from 0-3? Well, did it seem a little odd that the Nametable was 960 bytes? That's because the last 64 bytes in that kilobyte are used to store something called an attribute table. Every 16 tiles share a single attribute byte which determines the top 2 bits of the palette index for tiles in groups of 4. There are implications from this about how many colors can be represented in a 16x16 pixel area of the screen, on a scanline, etc.

It's pretty confusing until you sit down and draw it all out. A lot of the calculations for the PPU are exactly this sort of thing. You can just imagine the hardware designers saying: "But how do we do it with less RAM to bring the price down?"

This has been written up well elsewhere, Scott Ferguson's blog comes to mind. But I still never found a high level description of how the PPU renders that wasn't based on perfectly emulating the state of a bunch of internal shift registers and latches and running the PPU cycle by cycle. And, pardon my french, but that's fucking gross. Not because it's inaccurate or slow or anything like that but just because it's hard to see the forest for the trees.

Here's something like how I think of background rendering now. Sprites are more complicated but follow the same basic framework:

  1. For each scanline, you have to draw 32 tiles.
  2. For a (background) tile, you have to get the nametable byte and attribute byte.
  3. The pattern table has 16 bytes for a tile but we only need two for the 8 pixels on a scanline.
  4. Go ahead and extract the two high bits of the palette index from the attribute byte.
  5. Now loop for pixels 0 to 7 and combine those bits with the bits in the pattern to get a color.
  6. Show the backdrop color if it's zero (transparent), otherwise show the background color.

I know this is inaccurate, but it's clear to follow at a high level and if you then pointed out the various address computations in the substeps it ought to be pretty straightforward.

A lot of my remaining questions concern how to support scrolling at least kinda correctly without basing everything off internal registers and how to render things tile by tile instead of pixel by pixel. But I may abandon that because it was mostly to avoid repeated fetches of the same data and I recently made a RENDER-CONTEXT object that can help with that. Maybe down the road at some point I'll make a cycle-accurate PPU. :)

Test ROMs

There aren't really good test ROMs because the ROMs that exist mostly assume you have the basics working and are testing tricky details. While I don't think a test ROM could be written since a lot of what needs testing are internal details of the PPU that weren't exposed to NES programmers, I do think a ROM coupled with some JSON dumps of what internal data should be visible after rendering for a frame or two would be incredibly useful. Because at some point I spent 3 days on a single bug because I wasn't incrementing a counter appropriately.

I'd like to think on this some more but I have some basic ideas. A lot of the difficulty is that in the 90s there were working emulators that did more abstract high level emulation both because PCs were less powerful and because less was known about the underlying hardware. That required workarounds of various sorts for accuracy and so they're frowned upon now. But as a result, I haven't found much high-level documentation of the rendering algorithm in the PPU. Everyone seems to point back to a frame timing diagram on the nesdev wiki. Which is great but I was hoping to write a 2500 lines of code readable NES implementation that doesn't require a solid understanding of latches and assembly to get a basic idea of how the thing worked.

But I believe the address calculations, among other things, can be expressed clearly (and tested!) in terms of the X,Y coordinates to be rendered as opposed to internal registers. More soon...

Hard to Follow Implementations

I'm still unsatisfied with my PPU implementation but it also isn't completely finished. I hope to have more to show here after scrolling is working and some refactoring is done.

A Word On Performance

The output resolution of the Nintendo was 256x240. At a high level, all the PPU is doing is looping from left to right (0-255), top to bottom (0-240) and deciding on a color for the current pixel, then outputting it once per frame. Of course, it has to do that 60 times a second and 256 * 240 * 60 is 3.6 million so pixel rendering needs to be pretty fast. I didn't have to do any optimizing to hit 60 frames per second but I was careful to write code that didn't allocate as I went and we're still using 50% CPU which is definitely more than I'd like.

Til Next Time

Wish me luck, lispers. Cheers. <3


Unless otherwise credited all material Creative Commons License by Brit Butler