After finishing Hellgate last year, I tried re-creating a scene from the classic SNES game Secret of Mana.
I wanted to do something different than FPU arithmetic this time, and I was curious to see if I could include ‘real’ sprites in a 256 bytes intro, so I gave it a try.
Foolishly I thought a scene like this would be almost possible :
Some quotes beforehand :
- “there are a lot of symmetries, you can probably save a lot of bytes !”
- “that grass doesn’t seem too hard to generate”
- “there are a lot of repetitions and rotations, it doesn’t look like there are that many different sprites”
Of course I had no idea what I was talking about, it was way harder than anticipated, and I failed miserably :)
~
The idea floated in my mind for the whole year after Revision 2024, wondering which scene I should try and which one would be “simple enough” ; I settled on Pure Lands.
I started coding ~30 days prior to Revision 2025, and finished less than 24h before the deadline (!), like the previous year. It was a hectic month, trying to salvage the intro’s concept when it became apparent I would not be able to realize the original vision.
The final intro and its source code is available here and on pouët.net, it needs DOSBox with “cycles=35000” configured to run it properly.
Tiling system
I started with a tiling system, dividing the 320x200 pixels screen into 16x16 pixels tiles, then calculating a pixel index from 0 to 255 for each tile :
- the first pixel (top-left corner) being 0
- the last pixel (bottom-right corner) being 255
pixel:
; get X/Y coordinates from DI (0 - 63 999)
mov ax, di
mov bx, 320
xor dx, dx
div bx
; DX : X screen coordinate (0 - 320)
; AX : Y screen coordinate (0 - 200)
mov [coordy], ax ; store Y screen coordinate for later
mov ax, dx
mov bx, 16
xor dx, dx
div bx
; AX : tile Y : 0 - 15
; DX : tile X : 0 - 15
(...)
mov ax, [coordy] ; get back Y (0 - 200) into AX
shl ax, 4 ; multiply screen Y by 16 by shifting left four times
add ax, dx ; add tile X
; AX : pixel index (0 - 255)
stosb ; draw pixel color stored in AL (1 byte), and increment DI
loop pixel ; loop as long as cx > 0
The equivalent in C# :
for (ushort screenPos = 0; screenPos < 64000; ++screenPos)
{
// screenPos corresponds to the pixel position on screen (DI register in ASM)
int screenX = screenPos % 320;
int screenY = screenPos / 320;
int tileX = screenX % 16;
int tileY = screenX / 16;
// ...
byte color = screenY << 4; // shift by 4 / multiply by 16
color = color + screenY % 16; // 0 - 255
}
Visualized like this with the default VGA palette :
Then I computed a “tile ID” for each tile, from 0 to 249 (320/16 * 200/16) :
This would help later to differentiate which tile we were on.
I figured it was enough to get started, I “just” needed to store an actual sprite in the code now, then figure out how to compute a layout later.
Storing sprite data
I extracted a 16x16 pixels tile from the scene above, starting with a leaves tile, trying to compress it as much as I could :
There are 256 pixels, each needing its color value.
I could simply store a VGA palette value (0-255) for each pixel : it would take one byte (8 bits) per pixel, that is, the whole intro size. Not good enough :)
Since there are not that many different colors, I can compress the color value on less than 8 bits :
- 1 bit per pixel means 2 different colors (e.g. black/white) for the whole sprite, not really usable
- 2 bits per pixel allows 4 colors, which could work for simpler sprites, like ground or grass
- 3 bits per pixel allows 8 colors, which was enough for most sprites I tested
- 4 bits per pixel allows 16 colors, but uses too much space.
This sprite could be reduced to 8 different colors with a bit of editing ; with 3 bits per pixels this means I can store it on a continuous list of 3x256/8 = 96 bytes.
Compressing BMP
After some testing, I realized it would be much simpler to draw and test multiple sprites with a real image editor, instead of trying to compute the compressed byte data by hand.
I used Aseprite as my image editor, since it’s good with pixel art.
The BMP format seemed like the ideal candidate :
- it’s simple enough to be parsed with custom code
- no compression
- it supports a variety of color encodings : 1, 4, 8, 16 or 24 bits per pixel
- when 8 bits per pixels or less are used, it comes with a palette (“color table”) binding these values to RGB colors stored on 3 bytes / 24 bits.
When editing an image in grayscale mode, Aseprite is able to create BMP files with 8 bits per pixel, although it’s not possible to control the exact BMP palette exported : a palette of 255 values (0 = black, 255 = white) is always exported instead.
~
So I converted the sprite to grayscale, and exported it as a BMP file :
Then I wrote a C# command line tool that :
- reads a BMP file,
- map the colors to 2-or 3-bits per pixel palette
- ensure the color mapping keep the darkest colors to the lowest values to preserve the same gradient as the VGA palette
- export a compressed byte sequence to be included into the intro’s assembly code :
This would give a list of values from 0 to 3 or 0 to 7 for each pixel, which would then be used as a color palette index.
The default VGA palette has a nice grayscale pattern starting from color 16 to 31 (16 is solid black, and 31 is solid white) :
I already figured at this point I would probably not have the luxury of using different colors.
With that done, it meant I could easily extract multiple sprites, adjust them in Aseprite, and copy their data in the intro to see the result.
Reading compressed sprite data
With the sprite data generated, I needed a way to address it from the assembly code, that is : starting from a pixel coordinate of 0-255, reading the corresponding color value in the compressed data.
The exact color storage and reading would depend on the color palette chosen : I tried 2- and 3-bits per pixel compression, since the other options weren’t very compelling.
2-bits color per pixel
Storing colors on 2 bits is relatively straightforward : since we need to read at least a byte / 8 bits of data into our register, this means the first byte will store the color for the four first pixels :
- bits 0-1 holds the color for the first pixel
- bits 2-3 holds the color for the second pixel
- bits 4-5 holds the color for the third pixel
- bits 6-7 holds the color for the fourth pixel
The second byte will store the color for the next four pixels, and so on.
For any given pixel in a tile, we need to figure out the address of this single byte to read, which is the [0-255] coordinate divided by 4.
Then we need to isolate the 2 bits we’re interested in : this is done by shifting the bits right to align the 2 bits for this color to bit 0 and bit 1, then apply a logical AND to set all the other bits to 0.
The resulting code looks like this 1 :
; AL starts here with the pixel coordinate : 0-255
; divide AL by 4 since its 2 bits per pixel
mov bx, 4
xor dx, dx
div bx
; AX: 0-63 DX: 0-3
; read the selected byte into AL
mov bx, ax
mov al, byte [sprite + bx]
; shift right by 2 * DX
cmp dx, word 0
jz mask
mov cx, dx
shift:
shr al, 2
loop shift
mask:
and al, 3 ; AND 0x03 : keep only the first two bits of AL
shl al, 1 ; multiply by 2 and add 16 to match the VGA grayscale palette
add al, 16
The equivalent C# code would be :
// pixel is a byte value between 0 and 255 in each tile
int byteIndex = pixel / 4; // 0-63
int colorIndex = pixel % 4; // 0-3
ushort readData = *(ushort*)(spriteData + byteIndex);
for (int i = 0; i < colorIndex; ++i) readData = readData >> 2;
readData = readData & 0x03;
readData = readData * 2;
readData = readData + 16;
The above code worked, unfortunately the chosen sprite used more than 4 colors, so I needed to use a 3-bits per pixel scheme instead.
3-bits color per pixel
Using 3 bits per pixel creates another issue, since 8 bits are not evenly divisible by 3.
As a single byte can hold only 2 colors, we will lose 2 “padding” bits if we still want to read a single byte every time, i.e. bit 6 and bit 7 won’t store any color value.
No CPU register (8 / 16 / 32-bits) can hold a sequence of 3-bits values without padding, because none of these is exactly divisible by 3.
Storing the entire 3-bits sequence without any padding would mean some colors would be stored accross two bytes, so at the time I thought padding was mandatory (it’s not 2).
I decided to store five 3-bits colors every 2 bytes, losing one bit to padding every 5 colors (16 - 3*5). The compressed sprite size was 816 bits / 102 bytes, with 6 bytes lost to padding.
Drawing the sprite
Putting all this together, I could finally display a sprite !
I tried some layout schemes, inserting black borders, generating a background from parts of the sprite data.
Things were looking great :
- my intro was already exceeding the 200 bytes mark,
- I was not anywhere close to display a real scene,
- I finally realized the original idea was indeed not possible with such constraints,
- Revision was in less than two weeks :)
While meditating once again on the 2 megabytes of the full SoM SNES cartridge, I searched for a sprite that would “work” for a simpler animation, without trying to design a whole scene.
Switching to Mana Fortress
I stumbled upon this crystal from the Mana Fortress, towards the end of the game :
Extracted it and converted it to grayscale :
The sprite was better, but I still didn’t know what to do with it.
At the same time I was considering dropping the intro altogether, and remembered another experiment during Revision 2024 where I read back previous frame data to create interesting patterns :
This is simply done by reading the screen pointer register DI + an offset, generally 1 (for the pixel immediately after the one currently drawn), 320 (the pixel under), or 321 (one pixel to the right on the next line) :
mov al, [es:di + 321]
I updated the computation for the tile ID to assign a random value between a smaller interval, with the idea that :
- a tile ID of 0 would mean a “background” tile (instead of drawing the sprite).
- a non-zero tile ID would mean the sprite should be drawn, with the ID used to add some variations.
I used the buffer reuse trick above for the background tiles to draw patterns as the sprites moved.
I also updated the sprite drawing, so that the lightest colors would be mapped on a time-varying full VGA palette instead of gray/white, which looked nice with a solid, darker background for the base of the crystal.
Combining these two techniques provided the variations I needed :
Exploiting symmetry
A friend remarked the crystal was nearly symmetrical, so in principle only half of the sprite could be stored, and the other half reconstructed with code.
So I switched to a half-crystal :
And updated the sprite reading code to this :
; BL starts here with the pixel coordinate divided by 2 :
; 0-127 instead of the original 0-255
; use symmetry to construct the right half
cmp bl, 8 ; if BL >= 8
jl after_symmetry
; pixel = 15 - pixel
mov dl, 15
sub dl, bl
mov bl, dl
after_symmetry:
This would change the pixel indices from this [0-255] :
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
(...)
To this [0-127] :
0 1 2 3 4 5 6 7 7 6 5 4 3 2 1 0
8 9 10 11 12 13 14 15 15 14 13 12 11 10 9 8
(...)
This reduced the sprite data in half (54 bytes), plus a small code size penalty with the instructions above.
The intro was around 200 bytes, the visuals were good enough.
I quickly tried alternative color palettes but nothing sticked, so I figured I would instead add a simple Secret of Mana-inspired music to go along with it.
Adding music
Listening to the OST, I found that despite the relative simplicity of the MIDI tracks, the ~50 bytes left were largely insufficient to reproduce any kind of recognizable melody.
Time was running short, so I settled on a simple drum pattern reminiscing of the Mana Fortress song, Leave Time for Love.
Like my previous intro, I stored all needed MIDI commands at the end of the code (the “music_init” label), and erased the selected note instruction every time before playing the command sequence :
test bp, word 1 ; play sound every 2 frames
jnz loop_end
and bp, 15 ; AND 4 bits : 8 notes [ 0, 2, 4, 6, 8, 10, 12, 14 ]
mov al, 0 ; default note is silence
jz play ; skip first note
cmp bp, 8 ; skip 4th note
je play
mov al, 44 ; hi hat ; all non-silence note are hi-hats
cmp bp, 14 ; .. except 8th note is a snare
jne play
mov al, 38 ; snare
play:
mov [note], al ; erase byte at address with selected note in AL
mov dx, 0x331 ; MIDI Control Port
mov si, music_init
outsb ; output byte at SI to port, and increment SI by 1
dec dx ; MIDI Data Port (0x330)
; send MIDI note commands
outsb
outsb
outsb
(...)
; data section
music_init:
db 0x3F ; UART mode
db 0x99 ; NOTE ON on channel 0 (drums by default)
note:
db 0 ; MIDI note, updated every 2 frames
db 100 ; volume (0-100)
I wasn’t particularly convinced by this music, and considered removing it. While vaguely reminiscing of the corresponding area of the game, I found it too repetitive and a bit disconnected from the visuals.
I didn’t have time nor bytes left, so I left it as is.
And with that, the intro was done.
Conclusion
I think I’m both proud and a bit frustrated by this intro, as it required a lot of work for a relatively simple outcome ; however I feel like I needed to try the straightforward algorithm to get a sense of what’s really possible with these size constraints.
I found some days after Revision that the whole tiling system and color adressing could be vastly simplified, and replaced by a function that directly compute a color index from the screen pointer without unnecessary tile coordinates computations.
Maybe this will free up enough space to do something more complex in the future.
New tricks
I found some new compression tricks while trying to optimize this intro, after applying most of the tricks I learned last year while developing Hellgate.
Two highlights :
Read/write after the end
Turns out the bytes after the end of the file are initialized at 0, so I removed the last byte of the sprite data, luckily storing two black pixels (0x00) :
sprite:
dw 0x0247, 0x7110, 0x4291, 0x2486, 0x0FA1,
0x425A, 0x163E, 0x694B, 0x14C8, 0x0D2E,
0x6E63, 0x5855, 0x2DE1, 0x53A1, 0x4374,
0x78BA, 0x28C8, 0x1254, 0x551A, 0x2306,
0x04A3, 0x3461, 0x1E24, 0x124B, 0x0047
; db 0x00 - not needed :)
Then I reused this technique by assigning a label aptly named “wtf” and reading/writing to this “unallocated” memory, removing all db/dw declarations that were not constants :
mov [wtf], dl ; save DL
mov [wtf+1], al ; we can also use pointer arithmetics
(...)
mov al, [wtf] ; get back DL into AL
wtf: ; just a label at the last line of the file
AAM instead of DIV
AAM is an instruction which divides AX by an immediate value and stores the quotient and remainder in AH and AL.
It’s a nice alternative to DIV when the the expected results are less than 255 (since AL/AH are one byte each) :
; assuming AX contains the value to divide
aam 3
; quotient in AH, remainder in AL, uses only 2 bytes
; can be equivalent to :
xor dx,dx
mov bx, 3
div bx
; quotient in AX, remainder in DX, uses 7 bytes
This was used to compress the whole color computation code :
; AL (texel) : 0 - 128
; 3bits per pixel : a 2-bytes register holds 5 texels
; compute color offset in 16-bits word : texel / 5
aam 5 ; divide AL by 5 and store quotient into AH, remainder into AL
; AH: 0-25 (word index), AL: 0-4 (color offset)
mov cl, al ; save AL in CL
shl ah, 1 ; multiply by 2 : 2 bytes (16 bits) read each time
; we will read and address between 0 and 50
movsx bx, ah ; expand AH (0-25, 8 bits) into a 16 bits register, BX
mov ax, word [sprite + bx] ; read 16bits at computed address
; AX contains 16 bits, we need to keep only the 3 bits for this pixel's palette index
; texel >> (3 * DX), since its 3 bits per pixel
; instead of using a loop, we shift right three times by the number of bits wanted (0-3)
shr ax, cl
shr ax, cl
shr ax, cl
; AND 0x07 : keep first three bits
and al, 7