This project was more like a proof of concept than actually making a fully functional 3D engine on the ATmega328, I was curious to see if the AVR could handle something like this and display lowpoly 3D models.

After meeting and talking with Xark (https://github.com/XarkLabs) on IRC, I discovered his fork of the Adafruit GFX library and was surprised with how great it was performing, it was another good reason to start this project.

From the very beginning, Xark told me that the bottleneck of this project would be the SPI communication and even with his fork performing between 3.5x and 12x faster than the Adafruit libraries it would still not perform fast enough to get a smooth colored mesh without flickering. He was absolutely right but I have to admit I had a lot of fun with this project and wireframe models are still looking nice.

## The cube

I started with a really basic 3D rotating cube, using floats and cos/sin math functions. Clearing the screen completely was causing flicker so I decided to make another array to store a copy of the projected nodes (vertices). Before drawing a new frame with the updated cube transformation I would draw the old nodes position with the background color giving the illusion I was completely clearing the screen. The flickering was completely gone. Another tweak for better performance was to actually make a memory comparison of both old and new projected nodes to decide if I had to update the frame or not. The main loop at this stage:

``````// ----------------------------------------------
// main loop
// ----------------------------------------------
void loop() {
loops = 0;
while( millis() > next_tick && loops < MAX_FRAMESKIP) {
rotate_z(0);
rotate_y(4);
rotate_x(1);
// ...
next_tick += SKIP_TICKS;
loops++;
}

// ===============
// draw
// ===============
// only redraw if nodes position have changed (less redraw - less flicker)
if (memcmp(old_nodes, nodes, sizeof(nodes))) {
// erase old projected nodes
draw_wireframe(proj_nodes, ST7735_WHITE);
// make new projection with new nodes position
project(nodes);
// draw new projection
draw_wireframe(proj_nodes, ST7735_BLACK);
// copy new nodes location to erase cube
memcpy(old_nodes, nodes, sizeof(nodes));
}
}``````

While not apparent yet because of the small number of nodes, it was a problem that I was using floats and sin/cos for the rotation. There would be issues later with complicated meshes containing more vertices. Storing a couple values for sin and cos in an array was a good start to avoid using these functions:

``````static const float sin_lut[LUTSIZE] = {    // to avoid calling multiple times sin() store a couple values in a lut
0.010000,
0.019999,
0.029995,
0.039989,
0.049979,
};

static const float cos_lut[LUTSIZE] = {    // to avoid calling multiple times cos() store a couple values in a lut
0.999950,
0.999800,
0.999550,
0.999200,
0.998750,
};

// ----------------------------------------------
// rotation on axis functions (0 - 4)
// ----------------------------------------------
void rotate_z(char theta) {
i = NODECOUNT-1;
do {
tmp_axis = nodes[i];
nodes[i] = (tmp_axis    * cos_lut[theta]) - (nodes[i] * sin_lut[theta]);
nodes[i] = (nodes[i] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
} while(i--);
}

void rotate_y(char theta) {
i = NODECOUNT-1;
do {
tmp_axis = nodes[i];
nodes[i] = (tmp_axis    * cos_lut[theta]) - (nodes[i] * sin_lut[theta]);
nodes[i] = (nodes[i] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
} while(i--);
}

void rotate_x(char theta) {
i = NODECOUNT-1;
do {
tmp_axis = nodes[i];
nodes[i] = (tmp_axis    * cos_lut[theta]) - (nodes[i] * sin_lut[theta]);
nodes[i] = (nodes[i] * cos_lut[theta]) + (tmp_axis    * sin_lut[theta]);
} while(i--);
}
``````

It was of course temporary since the rotation was based on the earlier transformation of the mesh and wouldn't cover 360 degrees. But it was the start of the 'optimization' process of the project.

## Can we go faster?

Another attempt at making things going faster was to actually store the screen in a buffer. It's not possible on the ATmega328 to store the full screen resolution (128x160) since there is not enough memory, but storing 1/4 of it was. The idea was to store a 32x40 array of char where I would specify which color index the 'render' function should draw and each represented pixel would be a 4x4 tile. This array by itself was already taking 1280 bytes which represented about 64% of the available memory on the ATmega328 and the compiler started to warn me that I might encounter stability issue because of the limited amount of memory left.

Beside the memory limitation, I was also losing a lot of details by drawing with this new resolution. But now I was able to completely clear the screen without flickering:

Cube using a 32x40 buffer and drawing 4x4 tiles for each 'pixel'

Switching to 3x3 tiles made the 'rendering' much faster, but the resolution was pretty rudimentary and would be an issue for complex meshes later. I was also limited to 256 colors because of the indexing in the buffer array since I was using chars to limit memory usage, and the render function would turn into a huge switch/case to return the appropriate color depending on said index:

``````// ----------------------------------------------
// render buffer to screen
// ----------------------------------------------
void render() {
uint16_t col;
TFT.spi_begin();
for (char y = 0; y < BUFFH; y++) {
for (char x = 0; x < BUFFW; x++) {
switch(buff[y][x]) {
case 0: col = COLOR0;
break;
case 1: col = COLOR1;
break;
case 2: col = COLOR2;
break;
case 3: col = COLOR3;
break;
case 4: col = COLOR4;
break;
case 5: col = COLOR5;
break;
}
//TFT.drawPixel(x*4, y*4, col);
// 4x to fill screen since buffer is 1/4 of resolution
i = PIXELOFFSET-1;
do {
TFT.spiWrite16(col, PIXELOFFSET);
} while(i--);
}
}
TFT.spi_end();
}

...

// ----------------------------------------------
// main loop
// ----------------------------------------------
void loop() {
loops = 0;
while( millis() > next_tick && loops < MAX_FRAMESKIP) {
rotate_z(0);
rotate_y(4);
rotate_x(1);
// ...
next_tick += SKIP_TICKS;
loops++;
}

// ===============
// draw
// ===============
// only redraw if nodes position have changed (less redraw - less flicker)
if (memcmp(old_nodes, nodes, sizeof(nodes))) {
// make new projection with new nodes position
project();
// clear buffer
memset(buff, 0, sizeof(buff));
// draw new projection with COLOR1
draw_wireframe(1);
render();
// copy new nodes location to erase cube
memcpy(old_nodes, nodes, sizeof(nodes));
}
}
``````

At this stage the 'engine' was still using floats which are extremly slow on the AVR and after implementing transformation matrix (ref: codinglabs.net) it was time to switch to fixed point. Depending on the project you are working on this task can be really frustrating, you have to make sure you are between valid ranges to avoid overflowing and ending with values much bigger than what your variable type can handle.

There is a nice thread on TIGForums by J-Snake explains the process: Fixed Point Arithmetic - A Comprehensive Introduction

32x40 buffer, 3x3 tiles and tranformation matrix

Because of the limited amount of memory, you really need to store as much as possible in PROGMEM as long as you do not plan to change these variables during runtime. But 32KB is still somehow limited if you are not careful, and trying to limit and optimize memory and storage usage is a good idea if you know that you might end up using much more than the current state of your project.

I had enough storage memory to store a full look up table for my my sin/cos calls, but if you observe a graph with sine and cosine you would notice that you actually don't need 360 degrees for each one of these. (you can also check values here: mathwarehouse.com)

You can actually just use 90 degrees values of either sine or cosine, and 'mirror' the returned value depending on the angle requested, here is a snippet of the fixed point look up table I ended up making:

``````#define LUT(a) (long)(pgm_read_word(&lut[a]))

...

const unsigned int lut[] PROGMEM = {         // 0 to 90 degrees fixed point COSINE look up table
16384, 16381, 16374, 16361, 16344, 16321, 16294, 16261, 16224, 16182, 16135, 16082, 16025, 15964,
15897, 15825, 15749, 15668, 15582, 15491, 15395, 15295, 15190, 15081, 14967, 14848, 14725, 14598,
14466, 14329, 14188, 14043, 13894, 13740, 13582, 13420, 13254, 13084, 12910, 12732, 12550, 12365,
12175, 11982, 11785, 11585, 11381, 11173, 10963, 10748, 10531, 10310, 10086, 9860, 9630, 9397, 9161,
8923, 8682, 8438, 8191, 7943, 7691, 7438, 7182, 6924, 6663, 6401, 6137, 5871, 5603, 5334, 5062,
4790, 4516, 4240, 3963, 3685, 3406, 3126, 2845, 2563, 2280, 1996, 1712, 1427, 1142, 857, 571, 285, 0
};

...

// ----------------------------------------------
// SIN/COS from 90 degrees LUT
// ----------------------------------------------
long SIN(unsigned int angle) {
angle += 90;
if (angle > 450) return LUT(0);
if (angle > 360 && angle < 451) return -LUT(angle-360);
if (angle > 270 && angle < 361) return -LUT(360-angle);
if (angle > 180 && angle < 271) return  LUT(angle-180);
return LUT(180-angle);
}

long COS(unsigned int angle) {
if (angle > 360) return LUT(0);
if (angle > 270 && angle < 361) return  LUT(360-angle);
if (angle > 180 && angle < 271) return -LUT(angle-180);
if (angle > 90  && angle < 181) return -LUT(180-angle);
return LUT(angle);
}
``````

At this point I was completely avoiding the use of floats, cos/sin functions and was able to clear the screen without any apparent flickering:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut

It was time to implement a basic backface culling, it would save us from calculating and drawing edges that shouldn't be visible. Using the shoelace algorithm you can determine the surface of a triangle in your mesh, if the return value is negative it means the triangle is facing away from us. (ref: mathopenref.com)

Along with the nodes (vertices) position of the cube, I also had a multidimensional array of triangles specifying which nodes represent each one of them. In order to see if my current triangle was hidden or not I would just send its index and depending on the return value would decide if it needs to be drawn on screen. The following snippet might be a bit confusing without the full source, EDGE is a macro that returns a node from a triangle stored in the triangles array mentioned earlier:

``````// ----------------------------------------------
// Shoelace algorithm to get the surface
// ----------------------------------------------
int shoelace(unsigned char (*n), unsigned char index) {
unsigned char t = 0;
int surface = 0;
for (; t<3; t++) {
// (x1y2 - y1x2) + (x2y3 - y2x3) ...
surface += (n[EDGE(index,t)]           * n[EDGE(index,(t<2?t+1:0))]) -
(n[EDGE(index,(t<2?t+1:0))] * n[EDGE(index,t)]);
}
return surface * 0.5;
}

// ----------------------------------------------
// Shoelace algorithm for triangle visibility
// ----------------------------------------------
bool is_hidden(unsigned char index) {
// (x1y2 - y1x2) + (x2y3 - y2x3) ...
return ( ( (proj_nodes[EDGE(index,0)] * proj_nodes[EDGE(index,1)]) -
(proj_nodes[EDGE(index,1)] * proj_nodes[EDGE(index,0)])   ) +
( (proj_nodes[EDGE(index,1)] * proj_nodes[EDGE(index,2)]) -
(proj_nodes[EDGE(index,2)] * proj_nodes[EDGE(index,1)])   ) +
( (proj_nodes[EDGE(index,2)] * proj_nodes[EDGE(index,0)]) -
(proj_nodes[EDGE(index,0)] * proj_nodes[EDGE(index,2)])   ) ) < 0 ? true : false;
}
``````

Triangles facing away from us were successfully hidden now:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling

At this point I knew I was able to color the cube faces, my initial plan (which I eventually aborted) was to render a flat shaded version of a rotating mesh. I made various test with triangle filling and it seemed possible until I realized something:

32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling, flat colors

32x40 buffer, 3x3 tiles, matrices, fixed point, lut, backface culling, flat colors

## It's just an ugly cube..

Now let's be honest, it works fine, I can put colors, rotate the mesh and I don't have any flickering going on. But wow it is ugly..

It was now certain that if I had a complex mesh to render I would be witnessing the battle of a bunch of pixels moving around in the middle of the screen, I had to weight pros and cons of the current state and eventually decide how I should eventually proceed and which goal (flat shading) I would need to forget about.

Pros of the buffered version:

• fast
• clear screen without flickering
• flat shading is possible (initial goal)

Cons:

• use a lot of memory
• aesthetically unpleasant (it's amazingly ugly!)
• resolution too small to display complex meshes
• 256 colors maximum
• huge switch/case as more colors are defined

In short, I'm stuck with an ugly cube and I thought it would be a shame to stay at this state and start working on the flat shading.

## Back to full res

Switching back to full resolution wasn't a problem, I just had to remove the buffer eating most of the AVR memory and use XarkLabs and Adafruit drawing functions to directly draw on screen. It was also really nice to see how fast the wireframe version of the cube was rendering, and I could compare my first experiments before implementing fixed point and lut with the current stage of the project.

Out of curiosity I tried a colored version of the cube but clearing the screen, or even just the area using a bounding box dirt mask was causing a lot of flicker:

Full resolution (128x160), matrices, fixed point, lut, backface culling, render types

This is the moment I decided to forget about flat shading (and colors) to focus on the wireframe rendering. The way the engine was loading meshes allowed me to try something different than a cube for a change, it was time to find how to add models following the proper format I was using.

## More 3D models!

At first I was planning on making a script to convert OBJ files, but after having a closer look at Blender's different exporting format options I noticed that STL (ASCII) files are much simpler to use. It's already using a set of vertices to declare each triangle, the STL (ASCII) file for the cube mesh is the following:

``````solid Exported from Blender-2.74 (sub 0)
facet normal -0.000000 0.000000 -1.000000
outer loop
vertex 1.000000 1.000000 -1.000000
vertex 1.000000 -1.000000 -1.000000
vertex -1.000000 -1.000000 -1.000000
endloop
endfacet
... more facets ...
facet normal 0.000000 1.000000 0.000000
outer loop
vertex -1.000000 1.000000 -1.000000
vertex -1.000000 1.000000 1.000000
vertex 1.000000 0.999999 1.000000
endloop
endfacet
endsolid Exported from Blender-2.74 (sub 0)
``````

Each facet (triangle) is first defined by its normal value and after that, each node (vertex) representing the facet is declared between outer loop and endloop as a set of three vertices. You just need to gather all vertices, keep only unique vertex by position, and then gather all facets pointing to these vertices into an array by looking for these values.

The following python script is doing this exact procedure and output a header file with the proper data format:

``````#!/usr/bin/env python
# -----------------------------------------------------------------------------
# STL2H by Themistokle "mrt-prodz" Benetatos
# -------------------------------------------
# Convert STL 3D models (ASCII) to header for Tiny 3D Engine
#
# ------------------------
# http://www.mrt-prodz.com
# http://github.com/mrt-prodz/ATmega328-Tiny-3D-Engine
# -----------------------------------------------------------------------------
import sys, getopt, os

# global parameters
param_verbose = False
param_normals = False
param_yes     = False
param_scale   = 1.0

def checkFile(outfile):
# keep asking user until overwrite is chosen or a non-existing file name is entered
while (os.path.isfile(outfile) is True):
overwrite = raw_input('[!] Output data file "%s" already exists, overwrite? [y/n] ' % outfile)
if overwrite in ('y', 'Y'):
return outfile
elif overwrite in ('n', 'N'):
outfile = raw_input('[?] Enter new output data file name: ')
if (outfile == ''):
outfile = 'temp.h'
return outfile

def printVerbose(str):
global param_verbose
if param_verbose is True:
print str,

def saveDAT(nodes, triangles, outfile, normals = None):
print '[+] Saving output file:', outfile
data  = '// exported with stl2h\n'
data += '// '
data += ' '.join(sys.argv[:]) + '\n'
data += '#ifndef MESH_H\n'
data += '#define MESH_H\n'
data += '\n'
data += '#define NODECOUNT ' + str(len(nodes)) + '\n'
data += '#define TRICOUNT ' + str(len(triangles)) + '\n'
data += '\n'
data += '#define NODE(a, b) (long)(pgm_read_dword(&nodes[a][b]))\n'
data += '#define EDGE(a, b) pgm_read_byte(&faces[a][b])\n'
data += '#define NORMAL(a, b) (long)(pgm_read_dword(&normals[a][b]))\n'
data += '\n'
data += 'const long nodes[NODECOUNT] PROGMEM = {\n'
for index, node in enumerate(nodes):
data += '  {(long)(' + str(round(float(node), 5)*param_scale) + '*PRES), '\
+ '(long)(' + str(round(float(node), 5)*param_scale) + '*PRES), '\
+ '(long)(' + str(round(float(node), 5)*param_scale) + '*PRES)},\n'
data += '};\n\n'
data += 'const unsigned char faces[TRICOUNT] PROGMEM = {\n'
for index, face in enumerate(triangles):
data += '  {' + str(face) + ', ' + str(face) + ', ' + str(face) + '},\n'
data += '};\n\n'
data += 'const long normals[TRICOUNT] PROGMEM = {\n'
for index, normal in enumerate(normals):
data += '  {(long)(' + str(round(float(normal), 5)) + '*PRES), '\
+ '(long)(' + str(round(float(normal), 5)) + '*PRES), '\
+ '(long)(' + str(round(float(normal), 5)) + '*PRES)},\n'
data += '};\n\n'
data += '#endif // MESH_H\n'
dat = open(outfile, 'w')
dat.write(data)
dat.close()
printVerbose(data)

def parseSTL(infile, outfile):
global param_verbose
if param_verbose is True:
print '[+] Parsing STL file with verbose output'
# if -y is not set check for file being overwritten
global param_yes
if param_yes is False:
outfile = checkFile(outfile)
print '[+] Input STL file:', infile
print '[+] Output header file:', outfile
stl = open(infile, 'r')
nodes = []
unique_nodes = []
triangles = [[]]
normals = []
global param_normals
if param_normals is True:
print '[+] Saving facet normals information'
# store vertex into list first
print('[+] Gathering vertices')
for index, line in enumerate(stl):
# split line into tokens and keep only vertices
token = line.split()
if (len(token) == 4) and (token == 'vertex'):
# store x y z into list
nodes.append([token, token, token])
printVerbose('    ' + line)
# quickly check that we have sets of 3 vertices for each triangle
check = len(nodes) % 3
print '[+] STL file check:', ('good' if check == 0 else 'bad')
if check != 0:
print '[!] Error with STL file, each triangle should be made of 3 vertices (missing %d vertex)' % (3 - check)
sys.exit(2)
# keep only unique nodes
print('[+] Keeping unique vertices')
[unique_nodes.append(item) for item in nodes if item not in unique_nodes]
# output stored vertices
if param_verbose is True:
for node in unique_nodes:
# print previous triangle created
printVerbose('    ' + str(node) + ', '
+ str(node) + ', '
+ str(node) + '\n')

# seek start of file to get triangles
stl.seek(0, 0)
# assign vertex index to triangle list
print('[+] Gathering triangles')
index = 0
for line in stl:
# split line into tokens
token = line.split()
if (len(token) == 4) and (token == 'vertex'):
# compare current array to unique_nodes and set index in triangle
for i, node in enumerate(unique_nodes):
# if xyz match
if [token, token, token] == node:
# add to current triangle list if array at index has less than 3 items
if len(triangles[index]) < 3:
triangles[index].append(i)
if len(triangles[index]) == 3:
# print previous triangle created
printVerbose('    ' + str(triangles[index]) + ', '
+ str(triangles[index]) + ', '
+ str(triangles[index]) + '\n')
break
# else create new array
else:
# increase index
index += 1
# append array with new item
triangles.append([i])
break

# gather normals if parameter is true
if param_normals is True:
print('[+] Gathering normals')
# seek start of file to get normals
stl.seek(0, 0)
# gather normals
for line in stl:
# split line into tokens
token = line.split()
if (len(token) == 5) and (token == 'facet') and (token == 'normal'):
# store vector normal
normal = [token, token, token]
# print normal if verbose mode
printVerbose('    ' + str(normal) + ', '
+ str(normal) + ', '
+ str(normal) + '\n')
normals.append(normal)

# print stats
print '[+] Vertices: ', len(unique_nodes)
print '[+] Triangles:', len(triangles)
if param_normals is True:
print '[+] Normals:', len(normals)
saveDAT(unique_nodes, triangles, outfile, normals)
else:
saveDAT(unique_nodes, triangles, outfile, normals)
print '[+] Done\n'

def usage():
print 'Usage: %s -i  -o ' % sys.argv
print 'Convert a 3D mesh saved as STL format (ASCII) to header for Tiny 3D Engine.\n'
print '  -i, --inputfile \t3D mesh in STL file format'
print '  -o, --outputfile\toutput filename of converted data'
print '  -s, --scale     \tscale ratio (default 1.0)'
print '  -n, --normals   \tsave face normals'
print '  -y, --yes       \tanswer yes to all requests'
print '  -v, --verbose   \tverbose output'

def main(argv):
infile = ''
outfile = ''
# parse command line arguments
try:
opts, args = getopt.getopt(sys.argv[1:],'i:o:s:nhyv',['help', 'yes', 'normals', 'scale=', 'input=', 'output=', 'verbose='])
except getopt.GetoptError as error:
print 'Error:', str(error)
usage()
sys.exit(2)
if len(opts) < 2:
usage()
sys.exit(2)
for opt, arg in opts:
if opt in ('-h', '--help'):
usage()
sys.exit()
elif opt in ('-i', '--input'):
infile = arg
elif opt in ('-o', '--output'):
outfile = arg
elif opt in ('-s', '--scale'):
global param_scale
try:
param_scale = float(arg)
except Exception as converror:
print 'Error:', str(converror)
sys.exit(2)
elif opt in ('-v', '--verbose'):
global param_verbose
param_verbose = True
elif opt in ('-y', '--yes'):
global param_yes
param_yes = True
elif opt in ('-n', '--normals'):
global param_normals
param_normals = True
if (not infile) or (not outfile):
print 'Error: You need to specify both input and output files'
usage()
sys.exit(2)
if infile == outfile:
print 'Error: Input and output files are the same'
usage()
sys.exit(2)
# parse STL file and convert to data
parseSTL(infile, outfile)

if __name__ == '__main__':
main(sys.argv[1:])
``````

Running the script with cube.stl as input and cube.h as output with a scale of 14 would save the following header file:

``````#ifndef MESH_H
#define MESH_H

#define NODECOUNT 8
#define TRICOUNT 12

const long nodes[NODECOUNT] PROGMEM = {
{(long)(14.0*PRES), (long)(14.0*PRES), (long)(-14.0*PRES)},
{(long)(14.0*PRES), (long)(-14.0*PRES), (long)(-14.0*PRES)},
{(long)(-14.0*PRES), (long)(-14.0*PRES), (long)(-14.0*PRES)},
{(long)(-14.0*PRES), (long)(14.0*PRES), (long)(-14.0*PRES)},
{(long)(14.0*PRES), (long)(14.0*PRES), (long)(14.0*PRES)},
{(long)(-14.0*PRES), (long)(14.0*PRES), (long)(14.0*PRES)},
{(long)(-14.0*PRES), (long)(-14.0*PRES), (long)(14.0*PRES)},
{(long)(14.0*PRES), (long)(-14.0*PRES), (long)(14.0*PRES)},
};

const unsigned char faces[TRICOUNT] PROGMEM = {
{0, 1, 2},
{2, 3, 0},
{4, 5, 6},
{6, 7, 4},
{0, 4, 7},
{7, 1, 0},
{1, 7, 6},
{6, 2, 1},
{2, 6, 5},
{5, 3, 2},
{4, 0, 3},
{3, 5, 4},
};

#endif // MESH_H
``````

Including this file in the source would make the engine automatically load and render the mesh after compiling. Converting and loading new meshes after that was really easy:

## Interaction with the mesh

I implemented a really basic controller for the mesh rotation using a 3 axis accelerometer (ADXL335) and a joystick thumb. The way the mesh behaves based on movement of the controllers could be much better but it gives a basic idea of what is possible and I found the idea pretty amusing. You can see these in action in the video preview below. ## Source and video preview

Here is a small video showcasing the engine running on the ATmega328 and ST7735.
(Arduino UNO and Sainsmart 1.8" TFT screen)

You can find the project source on github: github.com/mrt-prodz/ATmega328-Tiny-3D-Engine