Game Development
Raiting:
12

Pixel-by-pixel screen fills in Wolfenstein 3D


In the code id Software sometimes there are unmatched pearls. The most famous is, of course, 0x5f3759df , which even got comics on xkcd . Here we are talking about filling the screen: pixels are colored one at a time in random order, without repetition. How is this done?

image

A frontal solution would be to use a random number generator, and before filling in each pixel, check whether it is still painted. This would be extremely inefficient: by the time the last unallocated pixel remains on the screen, it will take about 320 × 200 RNG calls to get into it. (Recall that Wolfenstein 3D worked at 286 with a frequency of 12MHz!) Another frontal solution would be to compile a list of all 320 × 200 possible coordinates, shuffle it (you can even advance it, and paste it into the code already shuffled), and paint the pixels in the list; but this would require at least 320 × 200 × 2 = 125KB of memory - one-fifth of the total computer memory! (Remember that 640KB should be enough for any ?)

Here's how actually implemented this per-pixel fill: (the code is slightly simplified by compared with the original)

boolean FizzleFade(unsigned width, unsigned height)
{
unsigned x, y;
long rndval = 1;
do // while (1)
{
// seperate random value into x/y pair
asm mov ax, [WORD PTR rndval]
asm mov dx, [WORD PTR rndval+2]
asm mov bx, ax
asm dec bl
asm mov [BYTE PTR y], bl // low 8 bits - 1 = y xoordinate
asm mov bx, ax
asm mov cx, dx
asm mov [BYTE PTR x], ah // next 9 bits = x xoordinate
asm mov [BYTE PTR x+1], dl

// advance to next random element
asm shr dx, 1
asm rcr ax, 1
asm jnc noxor
asm xor dx, 0x0001
asm xor ax, 0x2000

noxor:
asm mov [WORD PTR rndval], ax
asm mov [WORD PTR rndval+2], dx

if (x>width || y>height)
continue;

// copy one pixel into (x, y)

if (rndval == 1) // entire sequence has been completed
return false ;
} while (1);
}
What the hell is going on here?

For those who find it difficult to understand the assembler code for 8086 - that's the equivalent code on pure C:

void FizzleFade(unsigned width, unsigned height)
{
unsigned x, y;
long rndval = 1;
do {
    y = (rndval & 0x000FF) - 1; // low 8 bits - 1 = coordinate y
    x = (rndval & 0x1FF00) >> 8; // next 9 bits = coordinate x
    unsigned lsb = rndval & 1; // the least significant bit is lost when shifted
rndval >>= 1;
    if (lsb) // if the extended bit = 0, then do not xor
rndval ^= 0x00012000;

if (x<=width && y<=height)
copy_pixel_into(x, y);
} while (rndval != 1);
}
Simply put:
rndval starts with 1;
Each time we divide the value of rndval by the x and y coordinates, and paint the pixel with these coordinates;
each time we shift and xor-imndndval with an incomprehensible "magic constant";
whenrndval somehow again takes the value 1 - ready, the whole screen is flooded.

This "magic" is called linear shift register : to generate each next value, we extend one bit to the right, and move in the left a new bit that results as xor . For example, a four-bit RSLOS with "taps" on the zero and second bits, xor of which is set by the bit to be inserted on the left, looks like this:

image

If we take the initial value of 1, then this RSLA generates five other values, and then it becomes looped: 0 0 0 1 → 1 0 0 0 <tgsru > → 0 <u> 1 0 0 → 1 0 1 0 → 0 1 0 1 < tgsru> → 0 <u> 0 1 0 → 0001 (the bits used for xor are underlined). If we take an initial value that does not participate in this cycle, we get another cycle: for example, 0 0 1 1 → 1 0 0 1 <tgsru > 1 <u> 1 0 0 → 1 1 1 0 → 1 1 1 1 < tgsru> → 0 <u> 1 1 1 → 0011. It is easy to verify that three four-bit values ​​that do not hit either of these two cycles form the third cycle. The zero value always goes into itself, so it is not considered among the possible ones.

And is it possible to select the RLOS taps so that all possible values ​​form one large cycle? Yes, if in the field modulo 2 factorize the polynomial xN + 1, where N = 2m-1 is the length of the resulting cycle. Omitting hardcore mathematics , we show how a four-bit RSLOS with taps on the zero and first bits will alternately take all 15 possible values:

image

Wolfenstein 3D uses a 17-bit RSLOS with taps on zero and third bits. This RSLOS could be implemented "in the forehead" in seven logical operations:

unsigned new_bit = (rndval & 1) ^ (rndval>>3 & 1);
rndval >>= 1;
rndval |= (new_bit << 16);
Such an implementation is called the Fibonacci configuration. Equivalent to her "Galois configuration" allows you to perform all the same for three logical operations:
a → b → c → d → e → f → g → h → i → j → k → l → m → n → o → p → q
↑ Fibonacci ↓ ↓
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·


o → p → q → → a → b → c → d → e → f → g → h → i → j → k → l → m → n
↑ ↑ Galois ↓
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
shift the value to the right;
copy the extended bit (n) to the highest position;
xor-im this bit with the 13th (q).

If you notice that the most significant bit after the shift is guaranteed to be zero, copying and xor can be combined into one operation: unsigned lsb = rndval & 1;
rndval >>= 1;
if (lsb)
rndval ^= 0x00012000;
- which we see in the Wolfenstein 3D code.

The value in the "Galois configuration" compared to the "Fibonacci configuration" is cyclically shifted by three bits: the initial value of 1 in the Galois configuration (used in the Wolfenstein 3D code) corresponds to the initial value of 8 in the Fibonacci configuration. The second value of the RSML will be 0x12000, corresponding to 0x10004 in the Fibonacci configuration, and so on. If one of the sequences takes all possible (non-zero) values, then the second also takes all these values, albeit in a different order. Due to the fact that the zero value for RSLOS is unattainable, one is subtracted from the value of the y coordinate in the code - otherwise the pixel (0,0) would never have been painted.

In conclusion, the author of the original article mentions that of the 217-1 = 131071 values ​​generated by this RSLOS, only 320 × 200 = 64,000 are used, i.e. slightly less than half; every second generated value is discarded, i.e. The generator operates at half the available speed. For Wolfenstein 3D, 16-bit RSLOS would suffice, and then you would not have to bother with processing 32-bitndndval on a 16-bit processor. The author suggests that programmers id Software simply could not find a suitable combination of "taps", which would give the maximum 16-bit cycle. It seems to me that he greatly underestimates them :-) The fact is that to divide the 16-bit value by the x and y coordinates it would have to be divided with a remainder of 320 or 200, and the cost of such a division operation would be more than compensated all acceleration from the transition to 16-bit RSLOS.
xially 5 october 2017, 14:56
Vote for this post
Bring it to the Main Page
 

Comments

Leave a Reply

B
I
U
S
Help
Avaible tags
  • <b>...</b>highlighting important text on the page in bold
  • <i>..</i>highlighting important text on the page in italic
  • <u>...</u>allocated with tag <u> text shownas underlined
  • <s>...</s>allocated with tag <s> text shown as strikethrough
  • <sup>...</sup>, <sub>...</sub>text in the tag <sup> appears as a superscript, <sub> - subscript
  • <blockquote>...</blockquote>For  highlight citation, use the tag <blockquote>
  • <code lang="lang">...</code>highlighting the program code (supported by bash, cpp, cs, css, xml, html, java, javascript, lisp, lua, php, perl, python, ruby, sql, scala, text)
  • <a href="http://...">...</a>link, specify the desired Internet address in the href attribute
  • <img src="http://..." alt="text" />specify the full path of image in the src attribute