Embedded Fixed Point FFT

Student project for ECE438 Fall 2014


Motivation and Background

Performing an FFT analysis on incoming signals can be used in many applications. Frequency analysis of signals can be used in fields as diverse as compression to sound visualization. Additionally, real time FFT analysis can be useful when debugging RF and audio systems as well as analyzing EMI compliance. In Week 2 of the DFT lab, a recursive FFT was implemented in MATLAB. It was found that this FFT was significantly faster than performing a DFT; however, the system performing the math was significantly more powerful than an embedded microprocessor. Performing frequency analysis on a microcontroller adds many additional layers of complexity, stemming from quantization issues of the analog to digital converters, memory limitations, and perhaps most importantly- fixed point arithmetic. Applications such as MATLAB utilize floating point numbers. These numbers have a fixed number of significant figures but can have the decimal point moved in the number. Many microcontrollers and embedded FFT algorithms utilize binary fractions. Binary numbers have each bit correspond to a 2n, where n is the bit number. For binary fractions, the most significant bit corresponds to n = − 1, the next bit to n = − 2, and so on. This allows for storing a number between 0 and 1 − 2N, where N is the number of bits in the stored number. This has some obvious limitations including but not limited to the fact that numbers cannot be stored that are greater than one and resolution is limited to powers of 2. This adds some issues to FFT calculation in the embedded world, but it can all be worked around. In the example application, an FFT audio visualizer will be created to demonstrate the fixed point FFT.

Application Overview

The example application will apply a 16 point FFT to an incoming signal using the 9S12C Microcontroller. There is an external analog LPF that bandlimits the signal to 20kHz. The audio is sampled at 44100 Hz, the typical audio sampling frequency. Each number is quantized to a signed 8 bit character. The The "twiddle factors" $ W_{16}^{k} $ are precalculated and stored in memory as #define statements. This allows the compiler to optimize math operations for the microcontroller. Because of the symmetric nature of the twiddle factors, 8 numbers are required (rather than all 16). The display needs to update at least every 1/60th of a second. This allows for persistance of vision to fill in the gaps between samples on the display. The 9S12C is operating at a clock speed of 24MHz, so in 1/60th of a second, the processor can use 400,000 clock cycles per calculation (ideally). There is overhead code required for the display and other operations, so it is important to avoid pushing the limit. Additionally, since the 9S12C is an 8 bit microcontroller, all 16 bit operations cannot be performed only by the ALU and the compiler must create assembly code that performs the desired operations. Another aspect of the calculation is finding the magnitudes of the numbers. Performing a square root is very computation intensitive, so the norm squared is often used if the actual magnitude isn't needed. In this application, the square root was calculated iteratively.

C Code

The C code for the calculation is included below. Unlike with MATLAB or C on a computer, recursion cannot be used on a microcontroller because of the risks of stack overflow. This required iterating through the FFT algorithm and writing each stage independently. The fft function computes the fft of a 16 point array and the findRoot4 function finds the magnitude of the complex number passed into it.


/* FFT CONSTANTS*/
#define SIN2PI 49//sin(2*pi/16) * 2^7
#define SIN4PI 90
#define SIN6PI 118
#define COS2PI_P_SIN2PI 167//(cos(2*pi/16) + sin(2*pi/16)) * 2^7
#define COS2PI_M_SIN2PI 56//cos(2*pi/16) - sin(2*pi/16)) * 2^7
#define COS6PI_P_SIN6PI 167
#define COS6PI_M_SIN6PI -56
#define OneTwentyEight 128
void fft(int inarr[16],int mags[8] ) {
  int atemp, temp, temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8;
  int temp9,temp10,temp11,temp12,temp13,temp14,temp15;
  int outarr[16]; temp0=inarr[0]+inarr[8];
  temp1=inarr[1]+inarr[9];
  temp2=inarr[2]+inarr[10];
  temp3=inarr[3]+inarr[11];
  temp4=inarr[4]+inarr[12];
  temp5=inarr[5]+inarr[13];
  temp6=inarr[6]+inarr[14];
  temp7=inarr[7]+inarr[15];
  temp8=inarr[0]-inarr[8];
  temp9=inarr[1]-inarr[9];
  temp10=inarr[2]-inarr[10];
  temp11=inarr[3]-inarr[11];
  temp12=inarr[12]-inarr[4];
  temp13=inarr[13]-inarr[5];
  temp14=inarr[14]-inarr[6];
  temp15=inarr[15]-inarr[7]; temp =(temp13-temp9)*(SIN2PI)/OneTwentyEight;
  temp9=temp9*(COS2PI_P_SIN2PI)/OneTwentyEight+temp;
  temp13=temp13*(COS2PI_M_SIN2PI)/OneTwentyEight+temp; temp10 = temp10 *   (SIN4PI)/OneTwentyEight;
   temp14 = temp14 * (SIN4PI)/OneTwentyEight;
  atemp = temp14;
  temp14=temp14-temp10;
  temp10+=atemp; temp = (temp15-temp11)*(SIN6PI)/OneTwentyEight;
  temp15=temp15*(COS6PI_P_SIN6PI)/OneTwentyEight+temp;
  temp11=temp11*(COS6PI_M_SIN6PI)/OneTwentyEight+temp;

  atemp = temp8;
  temp8+=temp10;
  temp10=atemp-temp10;
  atemp = temp9;
  temp9+=temp11;
  temp11=atemp-temp11;
  atemp = temp12;
  temp12+=temp14;
  temp14=atemp-temp14;
  atemp = temp13;
  temp13+=temp15;
  temp15=atemp-temp15; outarr[1]=temp8+temp9;
  outarr[3]=temp10-temp15;
  outarr[5]=temp10+temp15;
  outarr[7]=temp8-temp9;
  outarr[9]=temp12+temp13;
  outarr[11]=-temp14-temp11;
  outarr[13]=temp14-temp11;
  outarr[15]=temp13-temp12;
  atemp = temp0;
  temp0=temp0+temp4;
  temp4=atemp-temp4;
  atemp = temp1;
  temp1=temp1+temp5;
  temp5=atemp-temp5;
  atemp = temp2;
  temp2+=temp6;
  temp6=atemp-temp6;
  atemp = temp3;
  temp3+=temp7;
  temp7=atemp-temp7; outarr[0]=temp0+temp2;
  outarr[4]=temp0-temp2;
  temp1+=temp3;
  outarr[12]=(temp3<<1)-temp1;
  outarr[0]+=temp1;
  outarr[8]=outarr[0]-temp1-temp1; atemp = temp5 * (SIN4PI)/ OneTwentyEight;
  temp7 = temp7 * (SIN4PI)/ OneTwentyEight;
  temp5=atemp-temp7;
  temp7+=atemp; outarr[14]=temp6-temp7;
  outarr[2]=temp5+temp4;
  outarr[6]=temp4-temp5;
  outarr[10]=-temp7-temp6; //Magnitude calculations
  if(outarr[8] > 0) {mags[0] = outarr[8];}
  else {mags[0] = -outarr[8];}
  mags[1] = findRoot4(outarr[7], outarr[15]);
  mags[2] = findRoot4(outarr[6], outarr[14]);
  mags[3] = findRoot4(outarr[5], outarr[13]);
  mags[4] = findRoot4(outarr[4], outarr[12]);
  mags[5] = findRoot4(outarr[3], outarr[11]);
  mags[6] = findRoot4(outarr[2], outarr[10]);
  mags[7] = findRoot4(outarr[1], outarr[9]); }
int findRoot4(int x, int y)
{
  if(x < 0)x = -x;
  if(y < 0)y = -y;
  if (x > y) {
    if (x > 2180) return ((15 * (long)x) / 16) + ((15 * (long)y) / 32); //2180 is about 2^15 / 15
    return ((15 * x) / 16) + ((15 * y) / 32);
  } else{
     if (y > 2180) return (int)(((15 * (long)y) / 16) + ((15 * (long)x) / 32));
     return ((15 * y) / 16) + ((15 * x) / 32);
  }
}

 Results

When used with music, the FFT mode creates an interesting light show, which can be seen in this video. When a pure sine wave was passed into the device, the effects of windowing were very apparent. The sidelobes proved to be rather large which caused additional lights to turn on using the visualizer. In order to reduce the effect of the sidelobes, it would be necessary to use a better window than a rectangular window  or to use a larger FFT in order to be able to see more detail in the windowing.


Back to ECE438, Fall 2014

Alumni Liaison

Ph.D. 2007, working on developing cool imaging technologies for digital cameras, camera phones, and video surveillance cameras.

Buyue Zhang