# Pixel-by-pixel screen fills in Wolfenstein 3D

In the code id Software sometimes there are unmatched pearls. The most famous is, of course, 0x5f3759df , which even got comics on xkcd . Here we are talking about filling the screen: pixels are colored one at a time in random order, without repetition. How is this done?

A frontal solution would be to use a random number generator, and before filling in each pixel, check whether it is still painted. This would be extremely inefficient: by the time the last unallocated pixel remains on the screen, it will take about 320 × 200 RNG calls to get into it. (Recall that Wolfenstein 3D worked at 286 with a frequency of 12MHz!) Another frontal solution would be to compile a list of all 320 × 200 possible coordinates, shuffle it (you can even advance it, and paste it into the code already shuffled), and paint the pixels in the list; but this would require at least 320 × 200 × 2 = 125KB of memory - one-fifth of the total computer memory! (Remember that 640KB should be enough for any ?)

Here's how actually implemented this per-pixel fill: (the code is slightly simplified by compared with the original)

`boolean FizzleFade(unsigned width, unsigned height)`

{

unsigned x, y;

long rndval = 1;

do // while (1)

{

// seperate random value into x/y pair

asm mov ax, [WORD PTR rndval]

asm mov dx, [WORD PTR rndval+2]

asm mov bx, ax

asm dec bl

asm mov [BYTE PTR y], bl // low 8 bits - 1 = y xoordinate

asm mov bx, ax

asm mov cx, dx

asm mov [BYTE PTR x], ah // next 9 bits = x xoordinate

asm mov [BYTE PTR x+1], dl

// advance to next random element

asm shr dx, 1

asm rcr ax, 1

asm jnc noxor

asm xor dx, 0x0001

asm xor ax, 0x2000

noxor:

asm mov [WORD PTR rndval], ax

asm mov [WORD PTR rndval+2], dx

if (x>width || y>height)

continue;

// copy one pixel into (x, y)

if (rndval == 1) // entire sequence has been completed

return false ;

} while (1);

}

What the hell is going on here?For those who find it difficult to understand the assembler code for 8086 - that's the equivalent code on pure C:

`void FizzleFade(unsigned width, unsigned height)`

{

unsigned x, y;

long rndval = 1;

do {

y = (rndval & 0x000FF) - 1; // low 8 bits - 1 = coordinate y

x = (rndval & 0x1FF00) >> 8; // next 9 bits = coordinate x

unsigned lsb = rndval & 1; // the least significant bit is lost when shifted

rndval >>= 1;

if (lsb) // if the extended bit = 0, then do not xor

rndval ^= 0x00012000;

if (x<=width && y<=height)

copy_pixel_into(x, y);

} while (rndval != 1);

}

Simply put:` rndval `

starts with 1;Each time we divide the value of rndval by the x and y coordinates, and paint the pixel with these coordinates;

each time we shift and xor-imndndval with an incomprehensible "magic constant";

whenrndval somehow again takes the value 1 - ready, the whole screen is flooded.

This "magic" is called linear shift register : to generate each next value, we extend one bit to the right, and move in the left a new bit that results as xor . For example, a four-bit RSLOS with "taps" on the zero and second bits, xor of which is set by the bit to be inserted on the left, looks like this:

If we take the initial value of 1, then this RSLA generates five other values, and then it becomes looped: 0

__0__0

__1__→ 1

__0__0

__0 <tgsru > → 0 <u> 1__0

__0__→ 1

__0__1

__0__→ 0

__1__0

__1 < tgsru> → 0 <u> 0__1

__0__→ 0001 (the bits used for xor are underlined). If we take an initial value that does not participate in this cycle, we get another cycle: for example, 0

__0__1

__1__→ 1

__0__0

__1 <tgsru > 1 <u> 1__0

__0__→ 1

__1__1

__0__→ 1

__1__1

__1 < tgsru> → 0 <u> 1__1

__1__→ 0011. It is easy to verify that three four-bit values that do not hit either of these two cycles form the third cycle. The zero value always goes into itself, so it is not considered among the possible ones.

And is it possible to select the RLOS taps so that all possible values form one large cycle? Yes, if in the field modulo 2 factorize the polynomial xN + 1, where N = 2m-1 is the length of the resulting cycle. Omitting hardcore mathematics , we show how a four-bit RSLOS with taps on the zero and first bits will alternately take all 15 possible values:

Wolfenstein 3D uses a 17-bit RSLOS with taps on zero and third bits. This RSLOS could be implemented "in the forehead" in seven logical operations:

` unsigned new_bit = (rndval & 1) ^ (rndval>>3 & 1);`

rndval >>= 1;

rndval |= (new_bit << 16);

Such an implementation is called the Fibonacci configuration. Equivalent to her "Galois configuration" allows you to perform all the same for three logical operations:
a → b → c → d → e → f → g → h → i → j → k → l → m → n → o → p → qshift the value to the right;

↑ Fibonacci ↓ ↓

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

o → p → q → → a → b → c → d → e → f → g → h → i → j → k → l → m → n

↑ ↑ Galois ↓

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

copy the extended bit (n) to the highest position;

xor-im this bit with the 13th (q).

If you notice that the most significant bit after the shift is guaranteed to be zero, copying and xor can be combined into one operation:

` unsigned lsb = rndval & 1;`

rndval >>= 1;

if (lsb)

rndval ^= 0x00012000;

- which we see in the Wolfenstein 3D code.The value in the "Galois configuration" compared to the "Fibonacci configuration" is cyclically shifted by three bits: the initial value of 1 in the Galois configuration (used in the Wolfenstein 3D code) corresponds to the initial value of 8 in the Fibonacci configuration. The second value of the RSML will be 0x12000, corresponding to 0x10004 in the Fibonacci configuration, and so on. If one of the sequences takes all possible (non-zero) values, then the second also takes all these values, albeit in a different order. Due to the fact that the zero value for RSLOS is unattainable, one is subtracted from the value of the y coordinate in the code - otherwise the pixel (0,0) would never have been painted.

In conclusion, the author of the original article mentions that of the 217-1 = 131071 values generated by this RSLOS, only 320 × 200 = 64,000 are used, i.e. slightly less than half; every second generated value is discarded, i.e. The generator operates at half the available speed. For Wolfenstein 3D, 16-bit RSLOS would suffice, and then you would not have to bother with processing 32-bitndndval on a 16-bit processor. The author suggests that programmers id Software simply could not find a suitable combination of "taps", which would give the maximum 16-bit cycle. It seems to me that he greatly underestimates them :-) The fact is that to divide the 16-bit value by the x and y coordinates it would have to be divided with a remainder of 320 or 200, and the cost of such a division operation would be more than compensated all acceleration from the transition to 16-bit RSLOS.

Vote for this post
Bring it to the Main Page |
||

## Comments

## Leave a Reply