Gaussian Blur Evaluation

Approximate Gaussian Filter Evaluation

The Gaussian Blur filter algorithm is used in image processing to smooth over noisy images. They generally generate a new color value for each pixel by incorporating the color values of neighboring pixels, weighted depending on the distance between pixel and neighbor.

Precise implementations of the Gaussian blur algorithms calculate this by moving a sliding window over the image, adding up neighboring pixels (multiplied by weights).

We compare a precise implementation of the Gaussian blur algorithm with a 3x3 window (greyscale) to implementations using

the Lookup Table
the approximate ALU

Evaluation: Lookup Table

Implementation

The Gaussian LUT implementation passes over the image twice: once with a 3x1 window in the horizonal direction to generate an intermediate image, then with a 1x3 window in the vertical to generate the final pixel value. The lookup table is fed with the 3 most significant bits from each pixel value in the window.

Gaussian window

Only the 3 most significant bits of each pixel in the window are fed into the lookup table to first look-up an intermediate result for the horizontal traveling window. Another lookup from the intermediate image returns the final value of each pixel.

Code for our Lookup Table implementation of the Gaussian filter algorithm

Approach

The application was run on a ml605 FPGA using the Makefile.

64x64, 128x128, 192x192 and 256x256 (below) images were processed with the LUT Gaussian application, each 20 times. The speedup was measured by cycle-counting in comparison to the precise Gaussian filter implementation. Standard deviation of the runtime was calculated.

The resulting approximate images were compared pixel by pixel to the output of the precise Gaussian filter implementation. Mean absolute and relative deviation were calculated from the differences and distributions of the absolute deviations were compiled for each image size.

Results

For a 256 by 256 pixel image, the original:

Lenna original

a precisely Gaussian-blurred version:

Lenna native

and an approximated Gaussian blur version, created with a Lookup Table with 9 inputs and 128 possible entries:

Lenna lut

Gaussian LUT Speedup graph

The Gaussian LUT speedup compared to the precise version was approximately 3, with slight improvements as image size rose. One standard deviation is marked in the graph but barely discernible.

Image size	LUT Speedup
64x64	2.934
128x128	3.003
192x192	3.032
256x256	3.044

In comparison of the resulting images we found the following mean absolute and relative errors of grey values (0-255):

Image size	Mean absolute error	Mean relative error
64x64*	-	0.04074
128x128	6.427	0.06617
192x192	6.587	0.06992
256x256	6.537	0.07129

* Star image, so not comparable with Lenna images.

We were suspicious of the rise in mean relative error with image size, so we decided to conclude with a histogram of error magnitudes:

Histogram error magnitude

Evaluation: Approximate ALU

Implementation

This implementation approximates the multiplication of one pixel with the filter kernel using mul.approx and the additions of the resulting values from each multiplication using add.approx. Other than that, it corresponds to the native implementation.

Linked: The complete code including a Makefile.

Approach

The application was run on a ml605 FPGA using the script run_eval.py. This script iterates over a Lenna image in three different sizes: 128x128, 192x192, and 256x256. For each iteration it applies the gaussian filter once using the native implementation and afterwards it applies the filter for each possible neglect amount combination the two instructions can have. An image is saved once for the native run and for each possible combination run using the following naming convention:

Native -> native.png
mul.approx with 2 bits neglected, add.approx not used -> MUL63.png
add.approx with 4 bits neglected, mul.approx not used -> ADD62.png
add.approx with 7 bits neglected, mul.approx with 4 bits neglected -> ADD60MUL62.png

Note: The values after MUL and ADD in the image name correspond to the decimal value of the binary neglect masks defined in the Design Document.

Results

The Table below shows the results for an image of the size 256x256 pixel. The first column of the table describes how many bits were neglected on a mul.approx instruction, the second column is likewise for add.approx instructions. A “ - “ means that a precise instruction was used. The third column is a calculation of the approximation error between the natively calculated image and the respective approximately computed image. The last column show the resulting image of the approximate calculation.

MUL neclect amount	ADD neglect amount	Approximation error
-	-	-
-	2	0
-	4	0.0004360977
-	7	0.0025993256
-	10	0.0199205464
-	15	0.4975493702
-	>15	1
2	-	0.0161600577
2	2	0.0161600577
2	4	0.0161600577
2	7	0.0185167734
2	10	0.0367156413
2	15	0.5161841232
2	>15	1
4	-	0.1064377457
4	2	0.1064377457
4	4	0.1064377457
4	7	0.1064377457
4	10	0.1143191514
4	15	0.6022564094
4	>15	1
7	-	0.7206358056
7	2	0.7206358056
7	4	0.7206358056
7	7	0.7206358056
7	10	0.7206358056
7	15	0.7403286548
>7	*	1