VERA Programmer's Reference Guide
This is preliminary documentation and the specification can still change at any point.
This document describes the Versatile Embedded Retro Adapter or VERA. The VERA consists of:
16-channel Programmable Sound Generator with multiple waveforms (Pulse, Sawtooth, Triangle, Noise)
High quality PCM audio playback from a 4kB FIFO buffer featuring up to 48kHz 16-bit stereo sound.
SPI controller for SecureDigital storage.
Registers
Addr   Name                    Bit fields
$9F20  ADDRx_L (x=ADDRSEL)     [7:0] VRAM Address (7:0)
$9F21  ADDRx_M (x=ADDRSEL)     [7:0] VRAM Address (15:8)
$9F22  ADDRx_H (x=ADDRSEL)     [7:4] Address Increment | [3] DECR | [2] Nibble Increment | [1] Nibble Address | [0] VRAM Address (16)
$9F28  IRQLINE_L (Write only)  [7:0] IRQ line (7:0)
$9F28  SCANLINE_L (Read only)  [7:0] Scan line (7:0)
$9F29  DC_VIDEO (DCSEL=0)      [7] Current Field | [6] Sprites Enable | [5] Layer1 Enable | [4] Layer0 Enable | [3] NTSC/RGB: 240P | [2] NTSC: Chroma Disable / RGB: HV Sync | [1:0] Output Mode
$9F2A  DC_HSCALE (DCSEL=0)     [7:0] Active Display H-Scale
$9F2B  DC_VSCALE (DCSEL=0)     [7:0] Active Display V-Scale
$9F2C  DC_BORDER (DCSEL=0)     [7:0] Border Color
Commander X16 Programmer's Reference Guide
$9F29  DC_HSTART (DCSEL=1)                 [7:0] Active Display H-Start (9:2)
$9F2A  DC_HSTOP (DCSEL=1)                  [7:0] Active Display H-Stop (9:2)
$9F2B  DC_VSTART (DCSEL=1)                 [7:0] Active Display V-Start (8:1)
$9F2C  DC_VSTOP (DCSEL=1)                  [7:0] Active Display V-Stop (8:1)
$9F2A  FX_TILEBASE (DCSEL=2) (Write only)  [7:2] FX Tile Base Address (16:11) | [1] Affine Clip Enable | [0] 2-bit Polygon
$9F2B  FX_MAPBASE (DCSEL=2) (Write only)   [7:2] FX Map Base Address (16:11) | [1:0] Map Size
$9F2C  FX_MULT (DCSEL=2) (Write only)      [7] Reset Accum. | [6] Accumulate | [5] Subtract Enable | [4] Multiplier Enable | [3:2] Cache Byte Index | [1] Cache Nibble Index | [0] Two-byte Cache Incr. Mode
$9F29  FX_X_INCR_L (DCSEL=3) (Write only)  [7:0] X Increment (-2:-9) (signed)
$9F2A  FX_X_INCR_H (DCSEL=3) (Write only)  [7] X Incr. 32x | [6:0] X Increment (5:-1) (signed)
$9F2B  FX_Y_INCR_L (DCSEL=3) (Write only)  [7:0] Y/X2 Increment (-2:-9) (signed)
$9F2C  FX_Y_INCR_H (DCSEL=3) (Write only)  [7] Y/X2 Incr. 32x | [6:0] Y/X2 Increment (5:-1) (signed)
$9F29  FX_X_POS_L (DCSEL=4) (Write only)   [7:0] X Position (7:0)
$9F2A  FX_X_POS_H (DCSEL=4) (Write only)   [7] X Pos. (-9) | [6:3] - | [2:0] X Position (10:8)
$9F2B  FX_Y_POS_L (DCSEL=4) (Write only)   [7:0] Y/X2 Position (7:0)
$9F2C  FX_Y_POS_H (DCSEL=4) (Write only)   [7] Y/X2 Pos. (-9) | [6:3] - | [2:0] Y/X2 Position (10:8)
$9F29  FX_X_POS_S (DCSEL=5) (Write only)   [7:0] X Position (-1:-8)
$9F2B  FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=0) (Read only)                   [7] Fill Len >= 16 | [6:5] X Position (1:0) | [4:1] Fill Len (3:0) | [0] 0
$9F2B  FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=1, 2-bit Polygon=0) (Read only)  [7] Fill Len >= 8 | [6:5] X Position (1:0) | [4] X Pos. (2) | [3:1] Fill Len (2:0) | [0] 0
$9F2B  FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=1, 2-bit Polygon=1) (Read only)  [7] X2 Pos. (-1) | [6:5] X Position (1:0) | [4] X Pos. (2) | [3:1] Fill Len (2:0) | [0] X Pos. (-1)
$9F2C  FX_POLY_FILL_H (DCSEL=5) (Read only)                                 [7:1] Fill Len (9:3) | [0] 0
$9F29  FX_CACHE_L (DCSEL=6) (Write only)                                    [7:0] Cache (7:0) / Multiplicand (7:0) (signed)
$9F29  FX_ACCUM_RESET (DCSEL=6) (Read only)                                 [7:0] Reset Accumulator
$9F2A  FX_CACHE_M (DCSEL=6) (Write only)                                    [7:0] Cache (15:8) / Multiplicand (15:8) (signed)
$9F2A  FX_ACCUM (DCSEL=6) (Read only)                                       [7:0] Accumulate
$9F2B  FX_CACHE_H (DCSEL=6) (Write only)                                    [7:0] Cache (23:16) / Multiplier (7:0) (signed)
$9F2C  FX_CACHE_U (DCSEL=6) (Write only)                                    [7:0] Cache (31:24) / Multiplier (15:8) (signed)
$9F29  DC_VER0 (DCSEL=63) (Read only)                                       [7:0] The ASCII character "V"
$9F2A  DC_VER1 (DCSEL=63) (Read only)                                       [7:0] Major release
$9F2B  DC_VER2 (DCSEL=63) (Read only)                                       [7:0] Minor release
$9F2C  DC_VER3 (DCSEL=63) (Read only)                                       [7:0] Minor build number
$9F2D  L0_CONFIG                                                            [7:6] Map Height | [5:4] Map Width | [3] T256C | [2] Bitmap Mode | [1:0] Color Depth
$9F2F  L0_TILEBASE  [7:2] Tile Base Address (16:11) | [1] Tile Height | [0] Tile Width
$9F34  L1_CONFIG    [7:6] Map Height | [5:4] Map Width | [3] T256C | [2] Bitmap Mode | [1:0] Color Depth
$9F36  L1_TILEBASE  [7:2] Tile Base Address (16:11) | [1] Tile Height | [0] Tile Width
$9F3B  AUDIO_CTRL   [7] FIFO Full / FIFO Reset | [6] FIFO Empty (read-only) | [5] 16-Bit | [4] Stereo | [3:0] PCM Volume
$9F3F  SPI_CTRL     [7] Busy | [6:2] - | [1] Slow clock | [0] Select
Important note: Video RAM locations $1F9C0-$1FFFF contain registers for the PSG/Palette/Sprite attributes. Reading anywhere
in VRAM will always read back the 128kB VRAM itself (not the contents of the write-only PSG/Palette/Sprite attribute
registers). Writing to a location in the register area will write to the registers and also write the value to VRAM.
Since the VRAM contains random values at startup, the values read back in the register area will not correspond to the actual
values in the write-only registers until they are written to once. Because of this it is highly recommended to initialize the
area from $1F9C0-$1FFFF at startup.
To access VRAM, the address first needs to be set (ADDRx_L/ADDRx_M/ADDRx_H); the data at that VRAM address can then be read from or written to via the DATA0/1 register. To
make accessing the VRAM more efficient, an auto-increment mechanism is present.
There are 2 data ports to the VRAM, which can be accessed using DATA0 and DATA1. The address and increment associated with each data
port are specified in ADDRx_L/ADDRx_M/ADDRx_H. These 3 registers are multiplexed using ADDR_SEL in the CTRL register. When
ADDR_SEL = 0, ADDRx_L/ADDRx_M/ADDRx_H become ADDR0_L/ADDR0_M/ADDR0_H.
When ADDR_SEL = 1, ADDRx_L/ADDRx_M/ADDRx_H become ADDR1_L/ADDR1_M/ADDR1_H.
By setting the 'Address Increment' field in ADDRx_H, the address will be incremented after each access to the data register. The increment
register values and corresponding increment amounts are shown in the following table:
Register value  Increment amount
0               0
1               1
2               2
3               4
4               8
5               16
6               32
7               64
8               128
9               256
10              512
11              40
12              80
13              160
14              320
15              640
Setting the DECR bit will decrement instead of increment by the value set by the 'Address Increment' field.
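As a worked illustration of the addressing scheme, here is a minimal host-side Python sketch that turns a 17-bit VRAM address and a signed step into the three bytes for ADDRx_L/ADDRx_M/ADDRx_H. The function name `encode_addr` is hypothetical, and the bit positions (increment field in bits 7:4, DECR in bit 3, address bit 16 in bit 0) are assumptions read from the register summary, not a definitive implementation.

```python
# Increment amounts indexed by the 4-bit 'Address Increment' field value,
# copied from the table above.
INCREMENTS = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 40, 80, 160, 320, 640]


def encode_addr(vram_addr, step=0):
    """Return (ADDRx_L, ADDRx_M, ADDRx_H) for a 17-bit VRAM address.

    step is the signed auto-increment amount; a negative step sets DECR.
    Assumed layout of ADDRx_H: [7:4] increment field, [3] DECR, [0] address bit 16.
    """
    if not 0 <= vram_addr <= 0x1FFFF:
        raise ValueError("VRAM address must fit in 17 bits")
    if abs(step) not in INCREMENTS:
        raise ValueError("unsupported increment amount")
    incr_field = INCREMENTS.index(abs(step))
    decr = 1 if step < 0 else 0
    addr_h = (incr_field << 4) | (decr << 3) | (vram_addr >> 16)
    return vram_addr & 0xFF, (vram_addr >> 8) & 0xFF, addr_h
```

For example, `encode_addr(0x1F9C0, 1)` yields the bytes to point a data port at the start of the register area with an increment of 1.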
Reset
When RESET in CTRL is set to 1, the FPGA will reconfigure itself. All registers will be reset. The palette RAM will be set to its default values.
Interrupts
Interrupts will be generated for the interrupt sources set in the lower 4 bits of IEN. ISR will indicate the interrupts that have occurred. Writing
a 1 to one of the lower 3 bits in ISR will clear that interrupt status. AFLOW can only be cleared by filling the audio FIFO to at least 1/4 full.
IRQ_LINE (write-only) specifies at which line the LINE interrupt will be generated. Note that bit 8 of this value is present in the IEN register.
For interlaced modes the interrupt will be generated each field and bit 0 of IRQ_LINE is ignored.
SCANLINE (read-only) indicates the current scanline being sent to the screen. Bit 8 of this value is present in the IEN register. The value is 0
during the first visible line and 479 during the last. This value continues to count beyond the last visible line, but returns $1FF for lines 512-
524 that are beyond its 9-bit resolution. SCANLINE is not affected by interlaced modes and will return either all even or all odd values during
an even or odd field, respectively. Note that VERA renders lines ahead of scanout such that line 1 is being rendered while line 0 is being
scanned out. Visible changes may be delayed one scanline because of this.
The upper 4 (read-only) bits of the ISR register contain the sprite collisions as determined by the sprite renderer.
Display composer
The display composer is responsible for combining the output of the 2 layer renderers and the sprite renderer into the image that is sent to
the video output.
OUT_MODE Description
0 Video disabled
1 VGA output
2 NTSC (composite/S-Video)
Setting 'Chroma Disable' disables output of chroma in NTSC composite mode and will give a better picture on a monochrome display.
(Setting this bit will also disable the chroma output on the S-video output.)
Setting 'HV Sync' enables separate HSync/VSync signals in RGB output mode. Clearing the bit will enable the default of composite sync over
RGB.
Setting '240P' enables 240P progressive mode over NTSC or RGB. It has no effect if the VGA output mode is active. Instead of 262.5
scanlines per field, this mode outputs 263 scanlines per field. On CRT displays, the scanlines from both the even and odd fields will be
displayed on even scanlines.
'Current Field' is a read-only bit which reflects the active interlaced field in composite and RGB modes. In non-interlaced modes, this
reflects if the current line is even or odd. (0: even, 1: odd)
Setting 'Layer0 Enable' / 'Layer1 Enable' / 'Sprites Enable' will respectively enable output from layer0 / layer1 and the sprites renderer.
DC_HSCALE and DC_VSCALE will set the fractional scaling factor of the active part of the display. Setting this value to 128 will output 1
output pixel for every input pixel. Setting this to 64 will output 2 output pixels for every input pixel.
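The scaling rule can be sketched as a small host-side calculation. The helper names below are hypothetical, and the sketch assumes the scale register is a plain 8-bit value where 128 means 1:1, as the paragraph above describes.

```python
def scale_value(zoom):
    """Register value for DC_HSCALE/DC_VSCALE given output pixels per input pixel."""
    value = round(128 / zoom)
    if not 1 <= value <= 255:
        raise ValueError("zoom factor does not fit the 8-bit scale register")
    return value


def visible_source_pixels(value, native=640):
    """Number of input pixels that cover `native` output pixels at this scale value."""
    return native * value // 128
```

With these helpers, a zoom of 2 gives the value 64, which shows 320 source pixels across the native 640-pixel width.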
DC_BORDER determines the palette index which is used for the non-active area of the screen.
DC_HSTART/DC_HSTOP and DC_VSTART/DC_VSTOP determine the active part of the screen. The values here are specified in the native
640x480 display space. HSTART=0, HSTOP=640, VSTART=0, VSTOP=480 will set the active area to the full resolution. Note that the lower 2
bits of DC_HSTART/DC_HSTOP and the lowest bit of DC_VSTART/DC_VSTOP aren't available. This means that horizontally the start and
stop values can be set at a multiple of 4 pixels, vertically at a multiple of 2 pixels.
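A short sketch of how the window coordinates map onto the four 8-bit registers, under the assumption that each register simply holds the coordinate with its unavailable low bits dropped. The name `encode_window` and the 640x480 bounds check are illustrative choices, not part of the specification.

```python
def encode_window(hstart, hstop, vstart, vstop):
    """Return (DC_HSTART, DC_HSTOP, DC_VSTART, DC_VSTOP) register bytes."""
    if hstart % 4 or hstop % 4:
        raise ValueError("horizontal values must be multiples of 4")
    if vstart % 2 or vstop % 2:
        raise ValueError("vertical values must be multiples of 2")
    if not (0 <= hstart <= hstop <= 640 and 0 <= vstart <= vstop <= 480):
        raise ValueError("window must lie inside the 640x480 display space")
    # Horizontal registers hold bits 9:2, vertical registers hold bits 8:1.
    return hstart >> 2, hstop >> 2, vstart >> 1, vstop >> 1
```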
DC_VER0, DC_VER1, DC_VER2, and DC_VER3 can be queried for the version number of the VERA bitstream. If reading DC_VER0 returns
$56, the remaining registers return values forming the major, minor, and build numbers respectively. If DC_VER0 returns a value other
than $56, the VERA bitstream version number is undefined.
'Tile Base Address' specifies the base address of the tile data. Note that the register only specifies bits 16:11 of the address, so the
address is always aligned to a multiple of 2048 bytes.
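The alignment rule can be made concrete with a small sketch that builds an Lx_TILEBASE byte. The function name is hypothetical, and the placement of the address in bits 7:2 with Tile Height in bit 1 and Tile Width in bit 0 is an assumption taken from the register summary.

```python
def encode_tilebase(tile_data_addr, tile_height_16=False, tile_width_16=False):
    """Return the Lx_TILEBASE byte for a 2048-byte-aligned tile data address."""
    if not 0 <= tile_data_addr <= 0x1FFFF:
        raise ValueError("address must fit in 17 bits")
    if tile_data_addr % 2048:
        raise ValueError("tile data must be aligned to 2048 bytes")
    # Address bits 16:11 land in register bits 7:2 (assumed layout).
    return ((tile_data_addr >> 11) << 2) | (int(tile_height_16) << 1) | int(tile_width_16)
```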
'H-Scroll' specifies the horizontal scroll offset. A value between 0 and 4095 can be used. Increasing the value will cause the picture to move
left, decreasing will cause the picture to move right.
'V-Scroll' specifies the vertical scroll offset. A value between 0 and 4095 can be used. Increasing the value will cause the picture to move up,
decreasing will cause the picture to move down.
'Map Width', 'Map Height' specify the dimensions of the tile map:
Value  Map width / height
0      32 tiles
1      64 tiles
2      128 tiles
3      256 tiles
'Tile Width', 'Tile Height' specify the dimensions of a single tile:
Value  Tile width / height
0      8 pixels
1      16 pixels
In bitmap modes, the 'H-Scroll (11:8)' register is used to specify the palette offset for the bitmap.
'Color Depth' specifies the number of bits used per pixel to encode color information:
0 1 bpp
1 2 bpp
2 4 bpp
3 8 bpp
The layer can either operate in tile mode or bitmap mode. This is selected using the 'Bitmap Mode' bit; 0 selects tile mode, 1 selects bitmap
mode.
The handling of 1 bpp tile mode is different from the other tile modes. Depending on the T256C bit the tiles use either a 16-color foreground
and background color or a 256-color foreground color. Other modes ignore the T256C bit.
In 1 bpp tile mode with T256C=0 (16-color foreground and background), MAP_BASE points to a tile map containing tile map entries, which are 2 bytes each:
0 Character index
Each bit in the tile data specifies one pixel. If the bit is set, the foreground color as specified in the map data is used; otherwise the
background color as specified in the map data is used.
In 1 bpp tile mode with T256C=1 (256-color foreground), MAP_BASE points to a tile map containing tile map entries, which are 2 bytes each:
0 Character index
1 Foreground color
Each bit in the tile data specifies one pixel. If the bit is set, the foreground color as specified in the map data is used; otherwise color 0 is used
(transparent).
Each pixel in the tile data gives a color index of either 0-3 (2bpp), 0-15 (4bpp), 0-255 (8bpp). This color index is modified by the palette offset
in the tile map data using the following logic:
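The rule itself is not reproduced in the text above. The sketch below assumes the behavior commonly described for VERA: color index 0 (transparent) and indexes 16-255 pass through unchanged, while indexes 1-15 have 16 times the palette offset added. Treat both the rule and the function name as assumptions to verify against the hardware.

```python
def apply_palette_offset(color_index, palette_offset):
    """Map a tile pixel's color index through a 4-bit palette offset (assumed rule)."""
    if not 0 <= color_index <= 255:
        raise ValueError("color index must be 0-255")
    if not 0 <= palette_offset <= 15:
        raise ValueError("palette offset must be 0-15")
    if color_index == 0 or color_index >= 16:
        return color_index          # transparent and high indexes are left alone
    return color_index + 16 * palette_offset
```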
TILEW specifies the bitmap width. TILEW=0 results in 320 pixels width and TILEW=1 results in 640 pixels width.
The palette offset (in 'H-Scroll (11:8)') modifies the color indexes of the bitmap in the same way as in the tile modes.
SPI controller
The SPI controller is connected to the SD card connector. The speed of the clock output of the SPI controller can be controlled by the 'Slow
Clock' bit. When this bit is 0 the clock is 12.5MHz, when 1 the clock is about 390kHz. The slow clock speed is to be used during the
initialization phase of the SD card. Some SD cards require a clock less than 400kHz during part of the initialization.
A transfer can be started by writing to SPI_DATA. While the transfer is in progress the BUSY bit will be set. After the transfer is done, the
result can be read from the SPI_DATA register.
The chip select can be controlled by writing the SELECT bit. Writing 1 will assert the chip-select (logic-0) and writing 0 will release the chip-
select (logic-1).
Palette
The palette translates 8-bit color indexes into 12-bit output colors. The palette has 256 entries, each with the following format:
Offset  Bits 7:4  Bits 3:0
0       Green     Blue
1       -         Red
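As an illustration of the entry format, this sketch packs 4-bit red, green and blue components into the two bytes of a palette entry. The function name is hypothetical, and the nibble placement (green high, blue low in byte 0; red in the low nibble of byte 1) is read from the format table and should be treated as an assumption.

```python
def palette_entry(red, green, blue):
    """Return the 2 bytes of a palette entry from 4-bit colour components."""
    for component in (red, green, blue):
        if not 0 <= component <= 15:
            raise ValueError("each component must be 0-15")
    return bytes([(green << 4) | blue, red])
```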
Color indexes 0-15 contain a palette somewhat similar to the C64 color palette.
Color indexes 16-31 contain a grayscale ramp.
Color indexes 32-255 contain various hues, saturation levels, brightness levels.
Sprite attributes
128 entries of the following format:
0 Address (12:5)
2 X (7:0)
3 - X (9:8)
4 Y (7:0)
5 - Y (9:8)
Mode Description
0 4 bpp
1 8 bpp
Z-depth Description
0 Sprite disabled
0 8 pixels
1 16 pixels
2 32 pixels
3 64 pixels
Rendering priority: the location of a sprite's attributes in memory dictates the order in which it is rendered. The sprite whose attributes are at the lowest
location is rendered in front of all other sprites; the sprite at the highest location is rendered behind all other sprites.
Sprite collisions
At the start of the vertical blank, the Collisions field in ISR is updated. This field indicates which groups of sprites have collided. If the field is non-zero,
the SPRCOL interrupt will be set. The interrupt is generated once per field / frame and can be cleared by making sure the sprites no longer
collide.
Note that collisions are only detected on lines that are actually rendered. This can result in subtle differences between non-interlaced and
interlaced video modes.
VERA FX
The FX feature set is available in VERA firmware version v0.3.1 or later. The Commander X16 emulators also have this feature officially as of
R44.
FX is mainly a set of addressing mode changes. VERA FX does not accelerate rendering by itself; it assists the CPU with some of the slower
tasks and, when used cleverly, allows the programmer to perform some limited perspective transforms or basic 3D effects.
FX features are controlled mainly by registers $9F29-$9F2C with DCSEL set to 2 through 6. FX_CTRL ($9F29 w/ DCSEL=2) is the master
switch for enabling or disabling FX behaviors. When writing an application that uses FX, it is important that the FX mode be preserved and
disabled in interrupt handlers in cases where the handler accesses VERA registers or VRAM, including the PSG sound registers. Reading from
FX_CTRL returns the current state, and writing 0 to FX_CTRL suspends the FX behaviors so that the VERA can be accessed normally without
mutating other FX state.
Preliminary documentation for the feature can be found here, but as this is a brand new feature, examples and documentation still need to
be written.
Frequency word sets the frequency of the sound. The formula for calculating the output frequency is:
output_frequency = 48828.125 / (2^17) * frequency_word
Thus the output frequency can be set in steps of about 0.373 Hz.
Example: to output a frequency of 440 Hz (note A4) the Frequency word should be set to 440 / (48828.125 / (2^17)) = 1181
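The arithmetic in the example can be checked with a few lines of Python. The helper names are hypothetical, and the 16-bit range check assumes the Frequency word is a 16-bit field, which this section does not state explicitly.

```python
PSG_SAMPLE_RATE = 48828.125      # Hz, from the formula above
STEP_HZ = PSG_SAMPLE_RATE / 2**17


def frequency_word(hz):
    """Nearest Frequency word for the requested output frequency."""
    word = round(hz / STEP_HZ)
    if not 0 <= word <= 0xFFFF:   # assumed 16-bit field
        raise ValueError("frequency out of range")
    return word


def output_frequency(word):
    """Output frequency in Hz produced by a given Frequency word."""
    return word * STEP_HZ
```

Because the word is rounded, the frequency actually produced differs slightly from the request; `output_frequency(frequency_word(440))` lands within one step of 440 Hz.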
Volume controls the volume of the sound with a logarithmic curve; 0 is silent, 63 is the loudest. The Left and Right bits control to which
output channels the sound should be output.
Waveform Description
0 Pulse
1 Sawtooth
2 Triangle
3 Noise
Pulse width controls the duty cycle of the pulse waveform. A value of 63 will give a 50% duty cycle or square wave, 0 will give a very
narrow pulse.
Just like the other waveform types, the frequency of the noise waveform can be controlled using frequency. In this case a higher frequency
will give brighter noise and a lower value will give darker noise.
PCM audio
For PCM playback, VERA contains a 4kB FIFO buffer. This buffer needs to be filled in a timely fashion by the CPU. To facilitate this an AFLOW
(Audio FIFO low) interrupt can be generated when the FIFO is less than 1/4 filled.
Audio registers
AUDIO_CTRL ($9F3B)
FIFO Full (bit 7) is a read-only flag that indicates whether the FIFO is full. Any writes to the FIFO while this flag is 1 will be ignored. Writing a
1 to this register (FIFO Reset) will perform a FIFO reset, which will clear the contents of the FIFO buffer, except when written in combination
with a 1 in bit 6.
FIFO Loop (bit 6+7): If a 1 is written to both bit 6 and 7 (at the same time), the FIFO will loop when played. Any other write to AUDIO_CTRL
clears this loop flag. Note: this feature is currently only available in x16-emulator and is not in any released VERA firmware.
FIFO Empty (bit 6) is a read-only flag that indicates whether the FIFO is empty.
16-bit (bit 5) sets the data format to 16-bit. If this bit is 0, 8-bit data is expected.
Stereo (bit 4) sets the data format to stereo. If this bit is 0 (mono), the same audio data is sent to both channels.
PCM Volume (bits 0..3) controls the volume of the PCM playback on a logarithmic curve. A value of 0 is silence, 15 is the loudest.
AUDIO_RATE ($9F3C)
PCM sample rate controls the speed at which samples are read from the FIFO. A few example values:
0 stop playback
>128 invalid
Using a value of 128 will give the best quality (lowest distortion); at this value for every output sample, an input sample from the FIFO is
read. Lower values will output the same sample multiple times to the audio DAC. Input samples are always read as a complete set (being
1/2/4 bytes).
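To get a feel for the numbers, the sketch below converts between a sample rate in Hz and the AUDIO_RATE value, and estimates how long a full FIFO lasts. It assumes the output rate at value 128 is the same 48828.125 Hz used in the PSG formula and that the FIFO holds 4096 bytes; both figures and all helper names are assumptions for illustration.

```python
BASE_RATE_HZ = 48828.125   # assumed output sample rate at AUDIO_RATE = 128


def pcm_rate_value(sample_rate_hz):
    """Nearest AUDIO_RATE value for a desired input sample rate."""
    value = round(sample_rate_hz * 128 / BASE_RATE_HZ)
    if not 0 <= value <= 128:
        raise ValueError("values above 128 are invalid")
    return value


def bytes_per_sample_set(sixteen_bit, stereo):
    """Size of one complete set of input samples: 1, 2 or 4 bytes."""
    return (2 if sixteen_bit else 1) * (2 if stereo else 1)


def fifo_duration_ms(rate_value, sixteen_bit, stereo, fifo_bytes=4096):
    """Milliseconds of audio held by a full FIFO at the given settings."""
    if not 1 <= rate_value <= 128:
        raise ValueError("rate value must be 1-128 for playback")
    sets_per_second = BASE_RATE_HZ * rate_value / 128
    return 1000 * fifo_bytes / (sets_per_second * bytes_per_sample_set(sixteen_bit, stereo))
```

Under these assumptions a full FIFO of 16-bit stereo data at the maximum rate lasts only about 21 ms, which is why refilling it promptly matters.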
AUDIO_DATA ($9F3D)
Audio FIFO data: writes to this register add one byte to the PCM FIFO. If the FIFO is full, the write will be ignored.
NOTE: When setting up for PCM playback it is advised to first set the sample rate to 0 to stop playback, then fill the FIFO buffer with some
initial data, and only then set the desired sample rate. This can prevent undesired FIFO underruns.
16-bit stereo <left sample (7:0)> <left sample (15:8)> <right sample (7:0)> <right sample (15:8)>
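The byte order shown for 16-bit stereo (low byte first, left before right) can be produced on the host with Python's `struct` module. This sketch assumes samples are signed 16-bit values; the function name is hypothetical.

```python
import struct


def pack_16bit_stereo(frames):
    """Pack (left, right) signed 16-bit sample pairs into FIFO byte order."""
    # "<hh" = little-endian, two signed 16-bit values: left low/high, right low/high.
    return b"".join(struct.pack("<hh", left, right) for left, right in frames)
```

The resulting bytes would then be written one at a time to AUDIO_DATA.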
VERA FX Reference
Author: MooingLemur, based on documentation written by JeffreyH
This is preliminary documentation and the specification can still change at any point.
Introduction
This is a reference for the VERA FX features. It is meant to be a complement to the tutorial, currently found here.
The FX Update mainly adds "helpers" inside of VERA that can be used by the CPU. There is no "magic button" that allows you to do 3D
graphics, for example. It mainly helps with certain CPU time-consuming tasks, most notably the ones that are present in the (deep) inner loop of
a game/graphics engine. The FX Update therefore does not fundamentally change the architecture or nature of VERA; it extends and
improves it.
In other words: the CPU is still the orchestrator of all that is done, but it is relieved of certain operations that it is not (very) good at or
does not have direct access to.
Usage
DCSEL
VERA is mapped as 32 8-bit registers in the memory space of the Commander X16, starting at address $9F20 and ending at $9F3F. Many of
these are (fully) used, but some bits remain unused. The DCSEL field in register $9F25 (also called CTRL) has been extended to 6 bits to allow
the 4 registers $9F29-$9F2C to have additional meanings.
Addr   Name  Bit fields
$9F25  CTRL  [7] Reset | [6:1] DCSEL | [0] ADDRSEL
The FX features use DCSEL values 2, 3, 4, 5, and 6. This effectively gives FX 20 8-bit registers. Note that 15 of these registers are write-only,
2 of them are read-only and 3 are both readable and writable.
Important: unless DCSEL values of 2-6 are used, the behavior of VERA is exactly the same as it was before the FX update. This ensures that
the FX update is backwards compatible with traditional non-FX uses of VERA.
Addr1 Mode
When DCSEL=2, the main FX configuration register becomes available (FX_CTRL/$9F29), which is both readable and writable. The 2 lower
bits are the addr1 mode bits, which will change the behavior of how and when ADDR1 is updated. This puts the FX helpers in a certain "role".
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
Addr1 Mode  Description
0           Traditional VERA behavior
1           Line draw helper
2           Polygon filler helper
3           Affine helper
By default, Addr1 Mode is set to 0 (=00b), which is the normal and already-known behavior of ADDR1 .
Set ADDR0 increment in the direction you will sometimes increment. Even though this is the increment for ADDR0 , we are
using it in line draw mode as an incrementer for ADDR1 .
For 8-bit mode: (+1, -1, -320, or +320).
For 4-bit mode: (-0.5, +0.5, -160, or +160)
For 4-bit mode, the half increments are set via the Nibble Increment bit and optionally the DECR bit in ADDRx_H . For the
Nibble Increment bit to have effect, the main Address Increment must be set to 0, and the 4-bit Mode bit must be set in
FX_CTRL ($9F29, DCSEL=2).
Addr   Name                 Bit fields
$9F22  ADDRx_H (x=ADDRSEL)  [7:4] Address Increment | [3] DECR | [2] Nibble Increment | [1] Nibble Address | [0] VRAM Address (16)
Octant 8-bit ADDR1 increment 8-bit ADDR0 increment 4-bit ADDR1 increment 4-bit ADDR0 increment
Set your slope into the two "X Increment" registers (DCSEL=3, see below). Note that increment registers are 15-bit signed fixed-point
numbers, and for this mode, the range should be 0.0 to 1.0 inclusive, so you'll either want to store the value of 1, or you'll want to set
only the fractional part.
Note: Of the two incrementers, the line draw helper uses only the X incrementer. However depending on the octant you are drawing in, this
incrementer will be used to depict either x or y pixel increments. So the "X" should not be taken literally here, it just means the first of the
two incrementers.
As a side effect of being in line draw mode, setting FX_X_INCR_H ($9F2A, DCSEL=3) automatically sets the fractional part (the lower 9 bits) of the X Position
to half a pixel. Furthermore, the lowest bit of the pixel position (which acts as an overflow bit) is set to 0 as well.
This effectively sets the starting X-position to 0.5 (the center) of a pixel.
Note: There is no need to set the higher bits of the X position, since the FX X position (accumulator) is only used to track the fractional
(subpixel) part of the line draw.
Set ADDR0 to the address of the y-position of the top point of the triangle and x=0 (so on the left of the screen). Set its increment to
+320 (for 8-bit mode) or +160 (for 4-bit mode).
Note: ADDR0 is used as "base address" for calculating ADDR1 for each horizontal line of the triangle. ADDR0 should
therefore start at the top of the triangle and increment exactly one line each time.
There is no need to set ADDR1 . This is done by VERA.
Calculate your slopes (dx/dy) for both the left and right point. Unlike the line draw helper, these slopes can be negative and can
exceed 1.0. They are not dependent on octant, but cover the whole 180 degrees downwards.
Set your left slope into the two "X increment" registers and your right slope into the two "Y increment" registers (DCSEL=3, see
below).
Important: They should be set to half the increment (or decrement) per horizontal line! This is because the polygon filler
increments in two steps per line.
Addr   Name                                Bit fields
$9F29  FX_X_INCR_L (DCSEL=3) (Write only)  [7:0] X Increment (-2:-9) (signed)
$9F2A  FX_X_INCR_H (DCSEL=3) (Write only)  [7] X Incr. 32x | [6:1] X Increment (5:0) (signed) | [0] X Incr. (-1)
$9F2B  FX_Y_INCR_L (DCSEL=3) (Write only)  [7:0] Y/X2 Increment (-2:-9) (signed)
$9F2C  FX_Y_INCR_H (DCSEL=3) (Write only)  [7] Y/X2 Incr. 32x | [6:1] Y/X2 Increment (5:0) (signed) | [0] Y/X2 Incr. (-1)
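The increment registers hold a 15-bit signed fixed-point number with 9 fractional bits, split across a low and a high byte. The following sketch shows one way to compute both bytes on the host; the function name is hypothetical, and two's complement encoding of negative values is an assumption consistent with the "(signed)" annotation.

```python
def encode_increment(pixels_per_step, times_32=False):
    """Return (INCR_L, INCR_H) bytes for an X or Y/X2 increment."""
    raw = round(pixels_per_step * 512)            # 9 fractional bits
    if not -(1 << 14) <= raw < (1 << 14):
        raise ValueError("increment does not fit in 15 signed bits")
    raw &= 0x7FFF                                 # 15-bit two's complement (assumed)
    low = raw & 0xFF                              # bits (-2:-9)
    high = (raw >> 8) | (0x80 if times_32 else 0) # bits (5:-1) plus the 32x flag
    return low, high
```

For the polygon filler, the value passed in would be half the per-line slope, as the note above requires; for example a slope of 1.0 per line is encoded with `encode_increment(0.5)`.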
Because we are in "polygon fill" mode, setting the high bits of the "X increment" ($9F2A, DCSEL=3) automatically sets the fractional part of the "X position"
(the lower 9 bits of the position in DCSEL=4 and DCSEL=5) to half a pixel. The same goes for the high bits of
the "Y/X2 increment" ($9F2C, DCSEL=3) and the "Y/X2 position".
Steps that are needed for filling a triangle part with lines:
This will not return any useful data but will do two things in the background:
Increment/decrement the X1 and X2 positions by their corresponding increment values.
Set ADDR1 to ADDR0 + X1
Then read the “Fill length (low)”-register. Its output depends on whether you're in 4 or 8-bit mode.
Addr   Name                                                                 Bit fields
$9F2B  FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=0) (Read only)                   [7] Fill Len >= 16 | [6:5] X Position (1:0) | [4:1] Fill Len (3:0) | [0] 0
$9F2B  FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=1, 2-bit Polygon=0) (Read only)  [7] Fill Len >= 8 | [6:5] X Position (1:0) | [4] X Pos. (2) | [3:1] Fill Len (2:0) | [0] 0
If fill_len >= 16 (or >= 8 in 4-bit mode) then also read the “Fill length (high)”-register:
Addr   Name                                  Bit fields
$9F2C  FX_POLY_FILL_H (DCSEL=5) (Read only)  [7:1] Fill Len (9:3) | [0] 0
Important: when the two highest bits of Fill Len (bits 8 and 9) are both 1, it means there is a negative fill length. The line should
not be drawn!
Together they give you 10-bits of fill length (ignore the other bits for now). Since ADDR1 is already set properly you can immediately
start drawing this number of pixels (given by Fill Len).
Check if all lines of this triangle part have been drawn, if not go to the first step.
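The way the two fill length registers combine can be sketched as follows for 8-bit mode (4-bit Mode=0). The function is a host-side model with a hypothetical name; the bit positions are taken from the register tables in this section and should be treated as assumptions.

```python
def decode_fill_length_8bit(fill_l, fill_h=None):
    """Decode FX_POLY_FILL_L/H in 8-bit mode.

    Returns (fill_length, x_position_low_bits, is_negative).
    fill_h is only needed when bit 7 of fill_l (Fill Len >= 16) is set.
    """
    x_low_bits = (fill_l >> 5) & 0x03        # X Position (1:0)
    length = (fill_l >> 1) & 0x0F            # Fill Len (3:0)
    if fill_l & 0x80:                        # Fill Len >= 16: high register required
        if fill_h is None:
            raise ValueError("FX_POLY_FILL_H must be read as well")
        length = ((fill_h >> 1) << 3) | (length & 0x07)   # Fill Len (9:3) + (2:0)
    is_negative = (length & 0x300) == 0x300  # bits 8 and 9 both set: do not draw
    return length, x_low_bits, is_negative
```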
There is also a 2-bit polygon mode, which is better explained in the tutorial.
Affine helper
When Addr1 Mode is set to 3 (=11b) the affine (transformation) helper is enabled.
When reading from ADDR1 in this mode, the affine helper reads tile data from a special tile area defined by two new FX registers:
FX_TILEBASE points to a set of 8x8 tiles in either 4-bit or 8-bit depth. FX can support up to 256 tile definitions, and the set can overlap
the traditional layer tile bases.
FX_MAPBASE points to a square-shaped tile map, one byte per tile. This tile map has no attribute bytes, unlike the traditional layer
0/1 tile maps.
Affine Clip Enable changes the behavior when the X/Y positions are outside of the tile map such that it always reads data from tile
0. The default behavior is to wrap the X/Y position to the opposite side of the map.
Map Size is a 2 bit value that affects both the width and height of the tile map.
0 2×2
1 8×8
2 32×32
3 128×128
The Transparent Writes toggle in FX_CTRL is especially useful in Affine helper mode. Setting this toggle causes a write of zero to
leave the byte (or the nibble) at the target address intact. This toggle is not limited to affine helper mode, and it affects writes to
both DATA0 and DATA1.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
When using the affine helper, the X and Y position registers (DCSEL=4) are used to set ADDR1 to the source pixel indirectly in the
aforementioned tile map, while the X and Y increments determine the step after each read of ADDR1.
The affine helper supports the full range of X and Y increment values, including negative values.
Addr   Name                                Bit fields
$9F29  FX_X_INCR_L (DCSEL=3) (Write only)  [7:0] X Increment (-2:-9) (signed)
$9F2A  FX_X_INCR_H (DCSEL=3) (Write only)  [7] X Incr. 32x | [6:1] X Increment (5:0) (signed) | [0] X Incr. (-1)
$9F2B  FX_Y_INCR_L (DCSEL=3) (Write only)  [7:0] Y/X2 Increment (-2:-9) (signed)
$9F2C  FX_Y_INCR_H (DCSEL=3) (Write only)  [7] Y/X2 Incr. 32x | [6:1] Y/X2 Increment (5:0) (signed) | [0] Y/X2 Incr. (-1)
32-bit cache
When the CPU reads a byte via DATA0 or DATA1, and "cache fill enable" is set, the value read will be copied into an indexed location inside
the 32-bit cache.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
In 8-bit mode, a byte is cached, but in 4-bit mode, a nibble is cached instead. Afterwards, by default, the index into the cache is incremented,
and loops back around to 0 after the last index. The index can be set explicitly via the FX_MULT register. 8-bit mode uses bits 3:2 and ranges
from 0-3. 4-bit mode uses bits 3:1 and ranges from 0-7.
Addr   Name                            Bit fields
$9F2C  FX_MULT (DCSEL=2) (Write only)  [7] Reset Accum. | [6] Accumulate | [5] Subtract Enable | [4] Multiplier Enable | [3:2] Cache Byte Index | [1] Cache Nibble Index | [0] Two-byte Cache Incr. Mode
Alternatively, the cache index can cycle between two adjacent bytes: 0, 1, and back to 0; or 2, 3, and back to 2. This option only has effect in
8-bit mode.
Addr   Name                            Bit fields
$9F2C  FX_MULT (DCSEL=2) (Write only)  [7] Reset Accum. | [6] Accumulate | [5] Subtract Enable | [4] Multiplier Enable | [3:2] Cache Byte Index | [1] Cache Nibble Index | [0] Two-byte Cache Incr. Mode
Addr   Name                               Bit fields
$9F29  FX_CACHE_L (DCSEL=6) (Write only)  [7:0] Cache (7:0) / Multiplicand (7:0) (signed)
$9F2A  FX_CACHE_M (DCSEL=6) (Write only)  [7:0] Cache (15:8) / Multiplicand (15:8) (signed)
$9F2B  FX_CACHE_H (DCSEL=6) (Write only)  [7:0] Cache (23:16) / Multiplier (7:0) (signed)
$9F2C  FX_CACHE_U (DCSEL=6) (Write only)  [7:0] Cache (31:24) / Multiplier (15:8) (signed)
Which parts are written is controlled by the value written to DATA0 or DATA1. The value written is treated as a nibble mask
where a 0-bit writes the data and a 1-bit masks the data from being written. In other words, writing a 0 will flush the entire 32-bit cache.
Writing #%00001111 will write the second and third byte in the cache to VRAM in the second and third memory locations in the 4-byte-aligned
region.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
Transparency writes
Transparent writes, when enabled, also apply to cache writes. If enabled, zero bytes (or zero nibbles in 4-bit mode) in the cache, which are
treated as transparent pixels, are not written.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
When "one-byte cache cycling" is turned on and DATA0 or DATA1 is written to, the byte at the current cache index is written to VRAM. When
"Cache write enable" is set as well, the byte is duplicated 4 times when writing to VRAM.
Usually the incrementing of the cache index is only triggered by reading from DATA0 or DATA1 when cache filling is enabled. However it can
also be triggered by reading from DATA0 in polygon mode when cache filling is not enabled and "one-byte cache cycling" is enabled.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
Addr   Name                            Bit fields
$9F2C  FX_MULT (DCSEL=2) (Write only)  [7] Reset Accum. | [6] Accumulate | [5] Subtract Enable | [4] Multiplier Enable | [3:2] Cache Byte Index | [1] Cache Nibble Index | [0] Two-byte Cache Incr. Mode
To do a single multiplication, put the two 16-bit inputs into the two halves of the 32-bit cache.
The accumulator can be used to accumulate the sum of several multiplications. Before doing a single multiplication, ensure that the accumulator is reset
to zero; otherwise the output will be added to the value of the accumulator before being written. There are two methods to do this. The
first is to write a 1 into bit 7 of FX_MULT ($9F2C, DCSEL=2). The other, more convenient, method is to read FX_ACCUM_RESET (the same register
location as VERA_FX_CACHE_L).
To obtain the result of the multiplication, it must first be written to VRAM. This is done via the cache write mechanism. Usually the cache itself is written
to VRAM if "Cache Write Enable" is set. However, if the "Multiplier Enable" bit is also set, the multiplier result is written to VRAM instead.
; Set the ADDR0 pointer to $00000 and write our multiplication result there.
; This assumes Multiplier Enable is already set in FX_MULT and that the
; 32-bit cache already holds both 16-bit inputs.
lda #(2 << 1)      ; DCSEL=2, ADDRSEL=0
sta VERA_CTRL      ; $9F25
lda #%01000000     ; Cache Write Enable
sta VERA_FX_CTRL   ; $9F29
stz VERA_ADDRx_L   ; $9F20 (ADDR0)
stz VERA_ADDRx_M   ; $9F21
stz VERA_ADDRx_H   ; $9F22 ; no increment
stz VERA_DATA0     ; $9F23 ; multiply and write out result
lda #%00010000     ; Increment 1
sta VERA_ADDRx_H   ; $9F22 ; so we can read out the result
lda VERA_DATA0     ; first result byte
sta $0400
lda VERA_DATA0     ; second result byte
sta $0401
lda VERA_DATA0     ; third result byte
sta $0402
lda VERA_DATA0     ; fourth result byte
sta $0403
Note: the VERA works by pre-fetching the contents from VRAM whenever the address pointer is changed or incremented. This happens even
when the address increment is 0. Due to this behavior, it is possible to have stale data latched in one of the two data ports if the underlying
VRAM is changed via the other data port. This example avoids this scenario by only using ADDR0/DATA0. This potential gotcha was not
introduced by the FX update, but rather has always been how VERA behaves.
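To check expected results on the host, the multiplication described above can be modelled in a few lines of Python. This is only a model under stated assumptions: the multiplicand occupies the low half of the cache and the multiplier the high half (as in the FX_CACHE register list), and the 32-bit result is assumed to be stored as little-endian two's complement. The function names are hypothetical.

```python
import struct


def load_cache(multiplicand, multiplier):
    """Bytes for FX_CACHE_L, _M, _H, _U holding two signed 16-bit inputs."""
    return struct.pack("<hh", multiplicand, multiplier)


def multiply_from_cache(cache):
    """Model the signed 16x16 multiply; returns the 4 bytes written to VRAM."""
    multiplicand, multiplier = struct.unpack("<hh", cache)
    product = multiplicand * multiplier      # always fits in a signed 32-bit value
    return struct.pack("<i", product)        # assumed little-endian result layout
```

Comparing these bytes with what the assembly example stores at $0400-$0403 is one way to validate a routine in an emulator.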
Accumulation
One can also trigger the multiplication and add it to (or subtract it from) the multiplication accumulator by triggering "accumulate" in one of two
different ways. We could write a 1 into bit 6 of FX_MULT ($9F2C, DCSEL=2), but more conveniently, we can read FX_ACCUM (the same
register location as VERA_FX_CACHE_M).
Once the accumulation is triggered, the result of the operation is stored back into the accumulator.
The default accumulation operation is (multiply then) add. This can be switched to subtraction by setting the Subtract Enable bit in FX_MULT.
Addr   Name                            Bit fields
$9F2C  FX_MULT (DCSEL=2) (Write only)  [7] Reset Accum. | [6] Accumulate | [5] Subtract Enable | [4] Multiplier Enable | [3:2] Cache Byte Index | [1] Cache Nibble Index | [0] Two-byte Cache Incr. Mode
If the multiplication accumulator has a nonzero value, any multiplications carried out via a VRAM Cache write will be offset by the value of
the accumulator (either added to or subtracted from the accumulator), but they will not change the value of the accumulator.
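The interplay between accumulation and cache writes can be modelled with a small class. This is a behavioral sketch of the description above, not a hardware-accurate model: the class and method names are hypothetical, and wrapping to a signed 32-bit range is an assumption.

```python
def _wrap32(value):
    """Wrap an integer into the signed 32-bit range (assumed accumulator width)."""
    return ((value + 2**31) % 2**32) - 2**31


class MultAccumulator:
    def __init__(self):
        self.accumulator = 0
        self.subtract = False            # models the Subtract Enable bit

    def reset(self):
        """Models writing Reset Accum. or reading FX_ACCUM_RESET."""
        self.accumulator = 0

    def _combine(self, product):
        if self.subtract:
            return _wrap32(self.accumulator - product)
        return _wrap32(self.accumulator + product)

    def accumulate(self, multiplicand, multiplier):
        """Models the Accumulate trigger: the result is stored in the accumulator."""
        self.accumulator = self._combine(multiplicand * multiplier)

    def cache_write_result(self, multiplicand, multiplier):
        """Models a cache write with Multiplier Enable: accumulator is unchanged."""
        return self._combine(multiplicand * multiplier)
```

A dot product, for instance, would call `accumulate` for all but the last pair of terms and take the final sum from `cache_write_result`.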
16-bit hop
There is a special address increment mode that can be used to read pairs of bytes via ADDR1.
Addr   Name               Bit fields
$9F29  FX_CTRL (DCSEL=2)  [7] Transp. Writes | [6] Cache Write Enable | [5] Cache Fill Enable | [4] One-byte Cache Cycling | [3] 16-bit Hop | [2] 4-bit Mode | [1:0] Addr1 Mode
In this mode, setting ADDR1's increment to +4 will result in alternating increments of +1 and +3. Setting it to +320 will result in alternating
increments of +1 and +319. All other increment values, including negative increments, lack this special hop property.
After this bit is set, writing to ADDRx_L resets the hop alignment such that the first increment is +1.
This mode is useful for reading out a series of 16-bit values after a series of multiplications.
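The alternating increments can be visualized by listing the addresses ADDR1 would visit. The sketch below is a host-side model with a hypothetical function name; it assumes the hop alignment has just been reset so that the first increment is +1.

```python
def hop_addresses(start, increment, count):
    """List the first `count` ADDR1 values in 16-bit hop mode."""
    if increment == 4:
        steps = (1, 3)
    elif increment == 320:
        steps = (1, 319)
    else:
        steps = (increment, increment)   # other increments have no hop behavior
    addresses = []
    addr = start
    for i in range(count):
        addresses.append(addr)
        addr += steps[i % 2]
    return addresses
```

With an increment of 4 this visits the low two bytes of each 4-byte result, which matches the use case of reading 16-bit values after a series of multiplications.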
For a more detailed explanation of chained math operations, see the tutorial.