Back in the good old days when a 8Mhz 286 was top of the line, the PCs had so little bandwidth to the video memory that it was almost, if not completely, impossible to fill the whole screen (all the 64000 bytes of it) fast enough to achieve anything resembling smooth movement. The most commonly used graphics mode had a 256 color palette - the 64000 bytes consisted of a 320x200 byte array, and each byte referred to a color in the palette.
When the video card was displaying the image, it read the pixel's palette index, and then read what color the pixel should be by "looking it up" from the palette.
Palette lookup is just one example of lookup tables. Back then, lookup tables were everywhere. If you wanted to do some interesting stuff fast enough, you had better figure ways to precalculate values into lookup tables. Sine and cosine functions are a good example. CPUs also did not have floating point units (although you could buy external ones), and thus acquiring a simple sine value was a true "sin" against a decent framerate.
One of the positive sides of the pre-486 architecture was the lack of cache. This "feature" was not good in general, but it made lookup tables very efficient. With the introduction of CPU cache, some operations which were relatively expensive to do on the CPU (even with the new integrated floating point unit) became cheaper than precalculating the same values in lookup tables, since cache misses cost more than just doing the calculation real-time.
Anyway, one of the most impressive effects which relied on lookup tables were 2d bitmap tunnels.
Since we need an image for this, let's start with a couple of downloads. stb_image.h and tunneltexture.png (right-click, save as). Make a copy of the previous chapter as usual, and place the files in the new directory. You may also want to consider grabbing the latest version of stb_image from GitHub. And maybe a bunch of other libraries too.
First we'll define the lookup table. Add the following in the globals definitions near top of the file:
// Look-up table
unsigned short *gLut;
Next, let's start with this init()
function:
void init()
{
gLut = new unsigned short[WINDOW_WIDTH * WINDOW_HEIGHT];
int i,j;
for (i = 0; i < WINDOW_HEIGHT; i++)
{
for (j = 0; j < WINDOW_WIDTH; j++)
{
int xdist = j - (WINDOW_WIDTH / 2);
int ydist = i - (WINDOW_HEIGHT / 2);
int distance = (int)sqrt((float)(xdist * xdist + ydist * ydist));
if (distance > 0)
distance = (64 * 256 / distance) & 0xff;
int angle = (int)(((atan2((float)xdist, (float)ydist) / PI) + 1.0f) * 128);
gLut[i * WINDOW_WIDTH + j] = (distance << 8) + angle;
}
}
}
First we're allocating a screen-sized array of shorts (that is, 16 bit integers). Then we loop through every pixel.
For each pixel, we first calculate the horizontal and vertical distance from the center. Then, using the pythagoras' theorem (a*a + b*b = c*c
), we calculate the linear distance to the center. We also guard against division by zero.
Next, we're calculating p * 1 / z
from the distance, creating perspective. You'll see how this works once we render the whole thing.
In the p * 1 / z
formula, the p
(64 in our source) can be used to control the perspective. The 1
in our source is 256, which happens to be our texture size. If we were doing these calculations in floating point, the 1
might as well be 1
, and we'd be multiplying the value with texture size later. The z
stands for the distance.
We're also using the binary and
operation to only use the bottom 8 bits, in practice limiting the values to [0..255]
.
Then we calculate the angle of the current pixel relative to the screen's center. The atan2()
returns values in [-PI..PI]
range, so by dividing by PI
and adding one, we're at [0..2]
. Further multiplying by 128 we're converting the result into [0..256]
range (which will overflow a bit, but it won't matter).
Finally, we're storing the distance and angle in the lookup table, combining them into one 16-bit value.
Let's look at what the lookup data looks like. Here's a new render function:
void render(Uint64 aTicks)
{
int i, j;
for (i = 0; i < WINDOW_HEIGHT; i++)
{
for (j = 0; j < WINDOW_WIDTH; j++)
{
int lut = gLut[(i) * WINDOW_WIDTH + j];
gFrameBuffer[(j) + (i) * WINDOW_WIDTH] = lut | 0xff000000;
}
}
}
Compile and run. The resulting image should be red-greenish circley-thing, with green values depending on the distance from the center, and red values depending on the angle around the center:
It might not look as much yet, but trust me, it gets better. Let's add the texture.
Add the texture array as a global variable (near the top again)..
// Texture
int *gTexture;
Add the stb_image
to the project by adding the following lines after the other includes:
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
stb_image
is a header-only library, meaning that's all you need to include it. The define is only meant to be used in one source file; if your project several source files, the others can include the header without the macro. stb_image
can handle just about any image format you might want to use.
Next, add the following to the beginning of the init()
function to load the image:
int x,y,n;
gTexture = (int*)stbi_load("tunneltexture.png", &x, &y, &n, 4);
We know our image is 256 by 256, so we don't bother storing the returned width and heigth values.
Next, we'll be completely replacing the rendering code we just pasted. Here's the new version:
void render(Uint64 aTicks)
{
for (int i = 0; i < WINDOW_HEIGHT; i++)
{
for (int j = 0; j < WINDOW_WIDTH; j++)
{
int lut = gLut[i * WINDOW_WIDTH + j];
gFrameBuffer[j + i * WINDOW_WIDTH] =
gTexture[(lut & 0xff) +
((lut >> 8) & 0xff) * 256];
}
}
}
The actual changes aren't huge, but it's easier to just paste over the old loop. Instead of drawing the value of lut
, we're using it to index the texture.
I'll go deeper into how we're reading the texture in a bit. Compile and run. Now it starts to look like something.
Let's animate it. Replace the texture reading line with this:
gTexture[((lut + aTicks / 32) & 0xff) +
(((lut >> 8) + aTicks / 8) & 0xff) * 256];
Compile and run.
Okay, so how are we reading the texture? First, the bottom 8 bits of the lookup value stand for the angle, and the higher 8 bits contain the distance value. These are mapped to the texture's x and y coordinates (or U and V coordinates, if you wish).
Our tunnel looks nice on the edges, but the center looks bad since the texture repeats there so often that it turns into noise. Let's mask the center smoothly out. You should still have the blending functions, but if not, copy them over from blending a bit.
Next, add this after the gLut variable definition:
// Distance mask
unsigned int *gMask;
Then, go to init()
, and allocate memory for the mask (after the gLut allocation):
gMask = new unsigned int[WINDOW_WIDTH * WINDOW_HEIGHT];
Then, still inside init()
, after the int distance
line, add the following:
int d = distance;
if (d > 255) d = 255;
gMask[i * WINDOW_WIDTH + j] = d * 0x010101;
Here we're first clamping the maximum value to 255, and then we're storing the value to the mask, after multiplying the value with 0x010101, in effect filling R, G and B values with the same 8-bit value.
Next, in render()
, change the inner loop to look like this:
int lut = gLut[i * WINDOW_WIDTH + j];
int mask = gMask[i * WINDOW_WIDTH + j];
gFrameBuffer[j + i * WINDOW_WIDTH] =
blend_mul(
gTexture[((lut + aTicks / 32) & 0xff) +
(((lut >> 8) + aTicks / 8) & 0xff) * 256],
mask);
First we're fetching the mask the same way as we're fetching the lookup value, and then we're using the blend_mul()
function to blend the texture with the mask. Compile and run.
The tunnel now looks much nicer. Unfortunately the blending also slows things down considerably, especially if the rendered area is large. You probably don't notice, running on that supercomputer of yours, but it's an issue that is good to keep in mind.
Next we'll make the lookup table larger than the screen. Displaying only part of the tunnel creates an illusion of looking around, although we're only looking at the very same 2d bitmap tunnel.
Go to init()
, and multiply the size of both of the tables by 4. Change the for
loops to double the size (WINDOW_HEIGHT * 2
and WINDOW_WIDTH * 2
), and remove all the divisions-by-two where we're calculating distances from the center.
Finally, when writing to the gMask[]
and gLut[]
tables, multiply the WINDOW_WIDTH
by two.
Now the lookup tables are four times as large as the screen. If you compile and run, you will find that things aren't quite like they should be yet, although the effect is pretty neat regardless.
To return to normality, go to the render()
function, and like when we were writing to the tables, multiply WINDOW_WIDTH
by two when reading from gLut
and gMask
. Compile and run.
You should be seeing a tunnel so that the center of the tunnel is in the bottom right corner. You may notice slight stuttering as well; this is normal. Remember that the tunnel consists of only 256 "pie slices", and thus on the edges the texels start getting quite large.
Let's add a bit of movement. In render()
, before the for-loops, add the following variables:
int posx = (int)((sin(aTicks * 0.000645234) + 1) * WINDOW_WIDTH / 2);
int posy = (int)((sin(aTicks * 0.000445234) + 1) * WINDOW_HEIGHT / 2);
Like before, the funny values (0.000645234) were just hammered in at a more or less random. The exact values don't really matter. Different values will give you different orbits.
The gLut
and gMask
access lines change to this:
int lut = gLut[(i + posy) * WINDOW_WIDTH * 2 + j + posx];
int mask = gMask[(i + posy) * WINDOW_WIDTH * 2 + j + posx];
Now we're looking around, in a manner of speaking.
What else can we do with this effect? There are plenty of things to try. Let's start with a square tunnel. It's pretty easy to achieve. Go to init()
, comment out the int distance...
line and paste this as a replacement:
int distance = (abs(xdist) > abs(ydist)) ? abs(xdist) : abs(ydist);
Compile and run. As you can see, the rotation of the tunnel doesn't work all too well, but other than that, it works.
For a diamond-shaped tunnel, try this:
int distance = (abs(xdist) + abs(ydist)) / 2;
Sum of the square and diamond distances results in a octagonal shape.
To distort the tunnel into "flowerlike" shapes, add the following after the distance calculation:
distance += (int)(sin(atan2((float)xdist, (float)ydist) * 5) * 8);
Further things you could try include scaling the xdist
and ydist
values to make the tunnel wider or narrower. You can also interpolate between any of the above to make "round square" tunnels and such.
Although we've had a lot of fun so far, there's still one more trick to go. Return the tunnel to its original, boring, round shape.
In addition to altering the way the lookup data is generated, we can also change the way we're reading the lookup data. Go to render()
, and add a new pair of position values after the old ones:
int posx2 = (int)((sin(0 - aTicks * 0.000645234) + 1) * WINDOW_WIDTH / 2);
int posy2 = (int)((sin(0 - aTicks * 0.000445234) + 1) * WINDOW_HEIGHT / 2);
The coordinates don't really matter, as long as they're different from the first set. You may notice we're using negative time, though.
Next, change the lookup table reading code:
int lut = gLut[(i + posy) * WINDOW_WIDTH * 2 + j + posx] -
gLut[(i + posy2) * WINDOW_WIDTH * 2 + j + posx2];
int mask = gMask[(i + posy) * WINDOW_WIDTH * 2 + j + posx];
int mask2 = gMask[(i + posy2) * WINDOW_WIDTH * 2 + j + posx2];
Here we're fetching two values from the lookup table and subtracting the second from the first. It may be a bit complicated to figure out how it will look like, so the easiest way is to try it out.
One thing that we do know is that there will be two center points (at posx,posy
and posx2,posy2
), so we need to mask things out twice. Change the rendering code to:
gFrameBuffer[j + i * WINDOW_WIDTH] =
blend_mul(
blend_mul(
gTexture[((lut + aTicks / 32) & 0xff) +
(((lut >> 8) + aTicks / 8) & 0xff) * 256],
mask),
mask2);
Here we first fetch a pixel from the texture, and multiply it with mask. Then we multiply the result with mask2. The result is then written to the screen buffer.
Compile and run.
You may start to notice - especially if you make the window considerably larger - that the effect eats a lot of CPU power. The reason for the slowdown is the blending. If you remove the blend_mul() function calls, and just plot the texture as is, the framerate will go way up.
So, now to fix that? I won't show you a fix here, but I'll give a couple of suggestions.
The first is to merge the masks and just blend once. The merging is much cheaper between two greyscale masks than doing full scale blend operations.
The second is to optimize the blend function. Since we're blending twice, it might be worthwhile to inline the blending calculations and see if there are things that are done several times needlessly.
The third is to precalculate the darker textures, and pick from a different map when needed. Doing so in a brute-force way will take a lot of memory, though (256*256*256*4 = 64MB
). Instead, we could use mipmapping, where every further away copy of the texture is 1/4 size of the previous one (which will then only take 256*256*4 + 128*128*4 + 64*64*4 + 32*32*4 + 16*16*4 + 8*8*4 + 4*4*4 + 2*2*4 + 1*1*4 = 349524
bytes, i.e. 33% more than just a simple texture). Unfortunately, preparing the mipmaps and especially calculating which one we should use at which point may turn out to be rather complicated.
Fourth is to consider the fact that the source textures we're using might not actually be truecolor images, but have 256 unique colors. If we were to use the palette, and just precalculated the 256 different palettes, we would be using 256*256 + 256*256*4 = 327680
bytes, or 64kB more than we are now.
p
value (64).xdist
as the distance? What's the difference to abs(xdist)
?Next up: 08 - Bumping Along..
Any comments etc. can be emailed to me.