Tips for further optimizing this Perlin noise function

I'm trying to write a faster version of the Perlin noise algorithm to use in a light installation.
I would like to know whether anyone sees further ways to improve the speed.

It's already pretty fast (compared with the versions I've tried so far) and I'm happy with how it looks (although the other versions are certainly more sophisticated).

It takes about 7 milliseconds for 20 lights. However, that is just one octave of noise; two octaves already take 12 milliseconds.
With other interpolations (cosine instead of lerp) the calculation time will grow quickly.
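
On the interpolation cost: one commonly used cheaper stand-in for cosine interpolation is a cubic fade curve (3t^2 - 2t^3, often called smoothstep). It gives a similar S-shaped blend with a few multiplications and no call to cos(). The following is only a minimal sketch of that idea; the names smoothFade and smoothLerp are made up for illustration and are not part of the code below, and whether it is fast enough would have to be measured.

```cpp
// Hypothetical helpers, not part of the original sketch.

// Cubic fade: maps t in 0..1 onto an S-curve that starts and ends flat,
// roughly like (1 - cos(t * PI)) / 2 but without calling cos().
float smoothFade(float t)
{
  return t * t * (3.0f - 2.0f * t);
}

// Same as a plain linear interpolation, but with the blend factor
// passed through the fade curve first.
float smoothLerp(float a, float b, float t)
{
  float f = smoothFade(t);
  return a + f * (b - a);
}
```

In perlin_function() this would replace the three lerp() calls, for example smoothLerp(s, t, x - fx).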

Just to give you an idea of my progress:

I've already tested different implementations of the algorithm. They all took more than about 20 milliseconds.
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1191768812 (Two implementations).
Error (Simplex noise)

I've also read some explanations of the algorithm:
http://freespace.virgin.net/hugo.elias/models/m_perlin.html
http://www.gamedev.net/blog/73/entry-1382657-fast-perlin-noise/

I finally ended up with the version below, which is mainly based on this source (it is more or less the same as the Mike Edwards implementations from the forum post). It was easy for me to strip this code down, because it uses clear functions and naming:
http://code.google.com/p/britable/source/browse/trunk/britable/britable.pde

I only partly understand the maths behind the algorithm, which makes it difficult to come up with good solutions myself (I tried but failed).

One improvement I can think of is to use a lookup table with random values instead of the random generator used now (perlin_noise_2d). The lookup table can't be massive either, due to memory constraints.

But is it worth it, and how would I implement it in the functions below?
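
To make the question concrete, here is a rough sketch of what the lookup-table idea could look like. Everything in it is an assumption for illustration: the names NOISE_TABLE_SIZE, noiseTable, initNoiseTable and tableNoise2d do not exist in the code below, the table size of 64 is arbitrary, and the small xorshift generator is just one way to fill the table with repeatable values. Whether it is actually faster than perlin_noise_2d would have to be measured.

```cpp
#include <stdint.h>

// Hypothetical names, not part of the original sketch.
const uint8_t NOISE_TABLE_SIZE = 64;         // must be a power of two
static int8_t noiseTable[NOISE_TABLE_SIZE];  // costs 64 bytes of RAM

// Fill the table once, e.g. from setup(). A 16-bit xorshift generator
// is used so the table is identical on every run for the same seed.
void initNoiseTable(uint16_t seed)
{
  uint16_t state = seed ? seed : 1;          // xorshift must not start at 0
  for (uint8_t i = 0; i < NOISE_TABLE_SIZE; i++)
  {
    state ^= state << 7;
    state ^= state >> 9;
    state ^= state << 8;
    noiseTable[i] = (int8_t)((state & 0xFF) - 128);  // -128 .. 127
  }
}

// Possible stand-in for perlin_noise_2d(): hash the lattice point into
// the table instead of computing the polynomial each time.
float tableNoise2d(int x, int y)
{
  // unsigned arithmetic so large or negative coordinates wrap cleanly
  unsigned int hash = (unsigned int)x + (unsigned int)y * 57u;
  uint8_t index = (uint8_t)(hash & (NOISE_TABLE_SIZE - 1));
  return noiseTable[index] * (1.0f / 128.0f);  // -1.0 up to just under +1.0
}
```

The trade-off of such a small table is that the pattern repeats every 64 lattice steps and neighbouring rows are shifted copies of each other, so a larger table or a better hash may be needed if the repetition becomes visible. A fixed table could also be stored in flash with PROGMEM instead of being generated in RAM.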

There are examples that use a lookup table (like the second example in this topic: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1191768812), but that code is kind of messy (or at least not clear from a programming perspective; I think a mathematician wouldn't have trouble reading it).

Below is a sketch in which I measure the timing and print some values to the Serial Monitor to check that it works.

I suppose it's already useful for people who want to do something like this with some LEDs, so feel free to use it.

I've also attached the code as an .ino file.

/* Code based on perlin function in this source
   http://code.google.com/p/britable/source/browse/trunk/britable/britable.pde#
   http://code.google.com/p/britable/

   kasperkamperman.com 16-09-2012
*/

const byte amountOfLights = 20;

byte lightArray[amountOfLights];  // one brightness value per light, in the range 0 - 255

unsigned long currentTime;     // for time measuring purposes
unsigned long passedTime;     // for time measuring purposes
unsigned long longestTime = 0; // for time measuring purposes

float perlinTimeInc      = 0.07;
float perlinXInc         = 0.05;  
float perlinTimePosition = 0;     // current position on the time (y) axis of the noise

void setup()
{   Serial.begin(57600);
}

void loop()
{   
    currentTime = micros(); // store current time
    
    for(byte i=0;i<amountOfLights;i++)
    { 
      float x = float(i)*perlinXInc; // input for x value in the renderNoise function
      
      byte val = renderNoise(x, perlinTimePosition); 
      lightArray[i] = val;
    }
    
    // go a step further in time (the y input of the perlin noise function)
    perlinTimePosition = perlinTimePosition + perlinTimeInc;  
    
    // calculate the time the whole calculation took
    passedTime = micros()-currentTime;
    
    // because times will vary, remember the maximum time it took
    if(passedTime>longestTime) longestTime = passedTime;
    
    // print the time it took for the current calculation and the maximum time
    Serial.print("time:  ");
    Serial.print(passedTime);
    Serial.print(" max: ");
    Serial.println(longestTime);
    Serial.print("array: ");
    
    // print the array to see the result
    // I convert back to float just for printing purposes;
    // this way you can actually see the perlin effect
    for(byte i=0;i<amountOfLights;i++)
    { float printFloat = (lightArray[i]/255.0) * 9; 
      Serial.print(printFloat,0);
      Serial.print(", ");
    }
    Serial.println();
    Serial.println();
    
    delay(50);
}

// returns a value between 0 - 255 for lights
byte renderNoise(float x, float y)
{	
  float noise;
  
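  // 2 octaves (the sum ranges from -2 to +2, so halve it to stay within -1 to +1)
  //noise = (perlin_function(x,y) + perlin_function(x*2,y*2)) * 0.5;
  
  noise = perlin_function(x,y);  // gives noise in the range of -1 to +1
  noise = noise * 127.5 + 127.5; // scale to a number between 0 - 255
  	 
  // clamp before casting, so a value of exactly -1 or +1 cannot wrap around
  return (byte) constrain(noise, 0, 255);  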
}

float perlin_function(float x, float y)
{
  int fx = floor(x);
  int fy = floor(y);
  
  float s,t,u,v;
  s=perlin_noise_2d(fx,fy);
  t=perlin_noise_2d(fx+1,fy);
  u=perlin_noise_2d(fx,fy+1);
  v=perlin_noise_2d(fx+1,fy+1);
  
  float inter1 = lerp(s,t,x-fx);
  float inter2 = lerp(u,v,x-fx);  

  return lerp(inter1,inter2,y-fy);
}

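// pseudo-random value in the range -1 to +1 for the integer lattice point (x, y)
float perlin_noise_2d(int x, int y) {
  // unsigned arithmetic so the intended 32-bit wrap-around is well defined
  unsigned long n = (unsigned long)((long)x + (long)y * 57);
  n = (n << 13) ^ n;
  return 1.0 - (((n * (n * n * 15731UL + 789221UL) + 1376312589UL) & 0x7fffffffUL) / 1073741824.0);
}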

float lerp(float a, float b, float x)
{ return a + x * (b - a);
}
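
For more than one octave, the sum has to be scaled back into the range of -1 to +1 before it is converted to a byte, otherwise the result overflows. The following is a minimal sketch of one way to do that, assuming each octave gets half the weight and twice the frequency of the previous one. The names octaveNoise and NoiseFn are hypothetical and not part of the code above; NoiseFn simply stands in for perlin_function.

```cpp
#include <stdint.h>

// Hypothetical helper, not part of the original sketch. NoiseFn stands in
// for perlin_function(x, y), which should return values from -1 to +1.
typedef float (*NoiseFn)(float, float);

float octaveNoise(NoiseFn noiseFn, float x, float y, uint8_t octaves)
{
  float sum = 0.0f;        // weighted sum of all octaves
  float amplitude = 1.0f;  // weight of the current octave
  float frequency = 1.0f;  // coordinate scale of the current octave
  float total = 0.0f;      // sum of the weights, used to normalise

  for (uint8_t i = 0; i < octaves; i++)
  {
    sum += amplitude * noiseFn(x * frequency, y * frequency);
    total += amplitude;
    amplitude *= 0.5f;     // each octave contributes half as much
    frequency *= 2.0f;     // and varies twice as fast
  }

  // dividing by the total weight keeps the result within -1 to +1
  return (total > 0.0f) ? sum / total : 0.0f;
}
```

In renderNoise() this would be called as octaveNoise(perlin_function, x, y, 2). The calculation time still grows with every octave, since each one is a full call to the noise function.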

perlinOptimizeQuestion.ino (2.77 KB)