Topic: Preliminary investigation of the uM-FPU V3.1 device

I was alerted to the existence of this in another thread:

http://arduino.cc/forum/index.php/topic,98133

I ordered one of the 32-bit chips (Floating Point Co-Processor uM-FPU v3.1) from Sparkfun for $US 19.65.

Manufacturer's page:

http://www.micromegacorp.com/index.html

It's an 18-pin DIP chip with straightforward wiring requirements. For SPI you need just MOSI, MISO, SCK, +5V and Gnd, plus a few of the other pins tied to either +5V or Gnd.

In its default configuration it doesn't need slave select (SS). However, you can configure it to use SS if you need to share the bus with other SPI devices.

The supplied Arduino library and demo sketch worked as advertised, displaying things like a simple graph of a sine wave.

The big questions are: how fast is it? is it easy to use?

The default speed of the processor in the chip is 29.48 MHz, which is almost twice the clock speed of the normal Arduinos.

However offset against that is the need to communicate with the chip, although SPI is pretty fast. You basically send (via SPI) simple "commands" like "load register 1 with literal 10", "add register 1 to register 2", "retrieve register 2".
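To make that command model concrete, here is a toy host-side sketch in plain C++. It is not the chip's real protocol: the `ToyFpu` type, its method names and the register count are all invented for illustration. Only the "load / add / retrieve" idea comes from the description above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: names and behaviour are illustrative assumptions,
// not the uM-FPU instruction set.
struct ToyFpu {
  std::array<float, 8> reg{};  // a handful of 32-bit float registers

  // "load register r with literal v"
  void load(std::size_t r, float v) {
    assert(r < reg.size());
    reg[r] = v;
  }

  // "add register src to register dst"
  void add(std::size_t src, std::size_t dst) {
    assert(src < reg.size() && dst < reg.size());
    reg[dst] += reg[src];
  }

  // "retrieve register r"
  float retrieve(std::size_t r) const {
    assert(r < reg.size());
    return reg[r];
  }
};

// Each call below stands for one command that would be sent over SPI.
inline float exampleSequence() {
  ToyFpu fpu;
  fpu.load(1, 10.0f);      // load register 1 with literal 10
  fpu.load(2, 32.0f);      // load register 2 with literal 32
  fpu.add(1, 2);           // add register 1 to register 2
  return fpu.retrieve(2);  // retrieve register 2
}
```

On the real chip every one of those calls costs SPI traffic, which is the overhead the timings below are trying to measure.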

Test of calculating sines:

Code:
#include <SPI.h>
#include <Fpu.h>
#include <FpuSerial.h>


void setup()
{
  Serial.begin(115200);
  Serial.println("Nick FPU test");

  SPI.begin();
  Fpu.begin();

  if (Fpu.sync() == SYNC_CHAR)
    FpuSerial.printVersionln();
  else
  {
    Serial.println("FPU not detected");
    while(1) ; // stop if FPU not detected
  }  
}

unsigned long start1, end1, start2, end2;

volatile  float result;

void loop()
{
#define RESULT  1                       // uM-FPU register
  float rad;                          

  start1 = micros ();

  for (rad = 0.0; rad < 3.0; rad += 0.1)
  {

    Fpu.write(SELECTA, RESULT);
    Fpu.write(FWRITE0);
    Fpu.writeFloat(rad);
    Fpu.write(FSET0, SIN);
  }

  end1 = micros ();

  start2 = micros ();
  for (rad = 0.0; rad < 3.0; rad += 0.1)
    result = sin (rad);
  end2 = micros ();

  Serial.print (end1 - start1);
  Serial.println (" uS for maths chip.");
  Serial.println ();

  Serial.print (end2 - start2);
  Serial.println (" uS for inbuilt.");
  Serial.println ();

  Serial.println("Done.");
  while(true)
  {
  }
}  // end of loop

Results:

Code:
Nick FPU test
uM-FPU V3.1.2
2572 uS for maths chip.

3572 uS for inbuilt.

Done.

In this case the FPU chip gave a reasonable speed increase: it took 72% of the execution time of the inbuilt sin() (2572 uS against 3572 uS).

Trying division however:

Code:
// same setup as above ...

unsigned long start1, end1, start2, end2;

volatile  float result;
volatile float foo;                          
volatile float bar = 424242;                              

void loop()
{
#define RESULT  1                       // uM-FPU register

  start1 = micros ();

  for (foo = 1.0; foo < 1000.0; foo++)
  {

    Fpu.write(SELECTA, RESULT);
    Fpu.write(FWRITE0);
    Fpu.writeFloat(foo);
    Fpu.write(FSET0, FWRITE0);
    Fpu.writeFloat(bar);
    Fpu.write(FDIV0);
  }

  end1 = micros ();

  start2 = micros ();
  for (foo = 1.0; foo < 1000.0; foo++)
    result = foo / bar;
  end2 = micros ();

  Serial.print (end1 - start1);
  Serial.println (" uS for maths chip.");
  Serial.println ();

  Serial.print (end2 - start2);
  Serial.println (" uS for inbuilt.");
  Serial.println ();

  Serial.println("Done.");
  while(true)
  {
  }
}  // end of loop

Results:

Code:
124228 uS for maths chip.

42628 uS for inbuilt.

Exponentiation doesn't seem faster either:

Code:
 for (foo = 1.0; foo < 1000.0; foo++)
  {

    Fpu.write(SELECTA, RESULT);
    Fpu.write(FWRITE0);
    Fpu.writeFloat(foo);
    Fpu.write(FSET0, EXP);

  }

// ...

  for (foo = 1.0; foo < 1000.0; foo++)
    result = exp (foo);

Code:
85360 uS for maths chip.

36528 uS for inbuilt.

Mind you, 85360 uS for the chip is probably mostly the 1000 lots of SPI sending the command and getting the result. In fact I'm a tiny bit skeptical about the results for the inbuilt maths; they seem a bit fast.

(to be continued)
« Last Edit: April 14, 2012, 12:28:45 am by Nick Gammon » Logged



Precision

This particular chip has 32-bit registers internally, so it will be able to handle longs and normal floats (not doubles). So it is not the chip to get if you need 8-byte double precision. However the manufacturers do have a 64-bit version (not sure of the cost of that).

Ease of use

To be honest, using the thing isn't the simplest thing in the world. It is more-or-less like Assembler programming ... "load register", "add", "store".

However they do supply an IDE environment (Windows only) that takes things like this:

Code:
RESULT EQU F1
deg VAR FLOAT

RESULT = SIN ( DEGREES (deg))

And turns it into:

Code:
//-------------------- uM-FPU Register Definitions -----------------------------
#define RESULT  1                       // uM-FPU register

//-------------------- Variable Definitions ------------------------------------
float deg;                              // float variable

//-------------------- Generated Code ------------------------------------------
    // RESULT EQU F1
    // deg VAR FLOAT
    //
    //
    // RESULT = SIN ( DEGREES (deg))
    Fpu.write(SELECTA, RESULT);
    Fpu.write(FWRITE0);
    Fpu.writeFloat(deg);
    Fpu.write(FSET0, DEGREES, SIN);
    //

So that simplifies using it a bit.

Probably the big time saving would come from parallel processing, particularly for something time-consuming like sines and cosines, because the chip can be doing that in the background while you are doing something else, like taking readings.
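A rough way to see when that background processing pays off is a simple timing model. This is only a sketch under assumptions: the function names and the four time parameters are made up for illustration, and nothing here has been measured on the chip.

```cpp
#include <algorithm>
#include <cassert>

// Simple timing model (an assumption, not a measurement). All times in microseconds.
//   sendUs  : time the host spends sending the commands
//   fpuUs   : time the coprocessor needs to finish the calculation
//   otherUs : unrelated host work (e.g. taking readings)
//   readUs  : time to read the result back

// Host sends, waits for the FPU, reads the result, then does its other work.
inline long serialTime(long sendUs, long fpuUs, long otherUs, long readUs) {
  assert(sendUs >= 0 && fpuUs >= 0 && otherUs >= 0 && readUs >= 0);
  return sendUs + fpuUs + otherUs + readUs;
}

// Host does its other work while the FPU is busy, then reads the result.
inline long overlappedTime(long sendUs, long fpuUs, long otherUs, long readUs) {
  assert(sendUs >= 0 && fpuUs >= 0 && otherUs >= 0 && readUs >= 0);
  return sendUs + std::max(fpuUs, otherUs) + readUs;
}
```

In this model the overlap can hide at most the shorter of the FPU time and the other work. The send and read-back costs are never hidden, so they need to be small for the chip to be worthwhile.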
Logged



Quote
SPI compatible interface up to 15 MHz

Try...
SPI.setClockDivider( SPI_CLOCK_DIV2 );
Logged


Good idea.

Before:

Code:
85360 uS for maths chip.

36528 uS for inbuilt.

Add that line:

Code:
  SPI.begin();
  SPI.setClockDivider( SPI_CLOCK_DIV2 );  // faster SPI
  Fpu.begin();

After:

Code:
85360 uS for maths chip.

36528 uS for inbuilt.


Hmmm.

Let's investigate. We did Fpu.begin ():

Code:
void UMFPU::begin(void)
{
  begin(SS_PIN);
}

And begin() does:

Code:
void UMFPU::begin(byte pin)
{
  // initialize the chip select
  _cs = pin;
  digitalWrite(_cs, HIGH);
  pinMode(_cs, OUTPUT);
  reset();
}

Which is kind of strange because I'm not using chip select.

Code:
#define SS_PIN    10

Looks like I'm using pin 10 whether I like it or not.

And reset() does:

Code:
void UMFPU::reset()
{
  digitalWrite(_cs, LOW);

  // disable SPI.Master
  SPI.end();

  // reset the FPU
  digitalWrite(MOSI_PIN, HIGH);
  for (byte i = 0; i < 80; i++)
  {
    digitalWrite(SCK_PIN, HIGH);
    digitalWrite(SCK_PIN, LOW);
  }
  digitalWrite(MOSI_PIN, LOW);

  delay(10);

  // enable SPI.Master
  SPI.setDataMode(SPI_MODE0);
  SPI.setBitOrder(MSBFIRST);
  SPI.setClockDivider(SPI_CLOCK_DIV4);
  SPI.begin();

  digitalWrite(_cs, HIGH);
}

Aha! It sets the clock divider. And calls SPI.begin().

So why do their examples call SPI.begin()?

Well, we'll fix that:

Code:
  SPI.begin();
  Fpu.begin();
  SPI.setClockDivider( SPI_CLOCK_DIV2 );

Fooled them! Now to test it ...

Code:
Nick FPU test
FPU not detected

Looks like that clock speed isn't supported.
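For reference, the divider arithmetic can be checked on its own. This is a minimal sketch assuming a 16 MHz board clock and taking the 15 MHz figure from the quote earlier in the thread; the helper names are invented for illustration.

```cpp
#include <cassert>

// SPI clock produced by a hardware divider: CPU clock / divider.
inline unsigned long spiClockHz(unsigned long cpuHz, unsigned divider) {
  assert(divider != 0);
  return cpuHz / divider;
}

// True if the SPI clock does not exceed the device's stated maximum.
inline bool withinLimit(unsigned long spiHz, unsigned long limitHz) {
  return spiHz <= limitHz;
}

// Assumed values: 16 MHz CPU clock, 15 MHz limit quoted for the uM-FPU.
const unsigned long kCpuHz = 16000000UL;
const unsigned long kFpuLimitHz = 15000000UL;
```

With those assumptions DIV2 gives 8 MHz and DIV4 gives 4 MHz, both under the quoted limit, so the arithmetic alone does not explain the "FPU not detected" result.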
Logged




Does this work...

Code:
  SPI.begin();
  Fpu.begin();
  SPI.setClockDivider( SPI_CLOCK_DIV4 );

Logged


I presume it will because the code above had:

Code:
  SPI.setClockDivider(SPI_CLOCK_DIV4);

in it.
Logged



So far, then, it looks like it's faster for sines but way slower for divisions. In that case there's no point using the chip unless you can make use of the parallel processing it offers.

Based on that (and assuming the other trig functions are fast) if you had a trig-intensive application that properly pipelined the actions of asking for data and using the data it might be good. Also if you had some constants loaded into these "registers" that would save a few operations I assume.

I think I've seen the Picaxe guys talking about this, it would certainly make sense for them.

______
Rob
Logged

Rob Gray aka the GRAYnomad www.robgray.com


Hi Nick,

Good work (again !),

Q: The chip also has a 12bit ADC, did you get that working?

That would make the chip interesting, as one could connect e.g. a thermistor and do something like this (in Arduino-ish code, based upon http://arduino.cc/playground/ComponentLib/Thermistor2):
Code:
double Thermister() {
  int rawADC = analogRead(1);        // raw ADC reading
  double Temp = log((40960000 / rawADC) - 10000);
  Temp = 1 / (0.001129148 + (0.000234125 + (0.0000000876741 * Temp * Temp)) * Temp);
  Temp = Temp - 273.15;              // Convert Kelvin to Celsius
  Temp = (Temp * 9.0) / 5.0 + 32.0;  // Convert Celsius to Fahrenheit
  return Temp;
}

The "only thing" to do is to start the conversion-routine and read back the value of the temp register..
 
Q: is it possible to store code on the Fpu and reuse it ...(my implicit assumption above)
--update --
A: It also provides Flash memory and EEPROM for storing user-defined functions (from the man. webpage)



« Last Edit: April 14, 2012, 04:46:09 am by robtillaart » Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)


FYI: they used a dsPIC30F3012 (18-pin, 16-bit, 30 MIPS) for that FPU chip, at least in the past. You could take a PIC32MX2xx in DIP28 (or any newer PIC24 or dsPIC33), add the Microchip math libraries (i.e. those supplied with the C30 and C32 compilers) and an SPI slave, and you will be faster :)
« Last Edit: April 14, 2012, 06:03:36 am by pito » Logged



Quote
Q: is it possible to store code on the Fpu and reuse it ...(my implicit assumption above)

I believe so, not that I have done it. That would be one of the major advantages - you could store quite a complex calculation, feed in a couple of parameters, and have it work it out.

Of course, you could do a similar thing with a second Atmega328, I suppose it depends on your application.
Logged



I also have the um-fpu V3.1 and tried timing it against the Arduino code and was getting similar odd results.
I changed the divider in the fpu reset function to DIV2 and it compiles and works for me. I also modified Nick's FPU code slightly to this:
Code:
    Fpu.write(SELECTA, RESULT); // 4
    Fpu.write(FWRITEA); // 5
    Fpu.writeFloat(foo);
    Fpu.write(FWRITE0); // 5
    Fpu.writeFloat(bar);
    Fpu.write(FDIV0); // 20
    Fpu.wait();
The number in the comment is the execution time in microseconds for that instruction as specified in Appendix B of the datasheet.
The write instructions don't wait for the result to complete and if you write more than 256 instructions you might get weird results. The Fpu.wait() waits until the instructions have executed. This slows it down even more but is a more realistic comparison.
This fpu code writes the first value into the A register, the second value into register zero and then divides register A by register zero.
With the DIV2 clock, the code is faster but not enough to overtake the Arduino.
The FPU takes 130780 uS vs. 42620 uS with the built-in code.

I also timed that code when using a multiply instead of divide in both loops and got 124384 uS for the FPU but the built-in code was 21124 uS. This makes no sense. The FPU should be able to do a multiply much faster than it can do a divide and the divide has to be faster than what the Arduino can do in software.
The fpu instructions are 12 bytes long. Sending those 12 bytes to the fpu with DIV2 (clock is at 8MHz) should take about 12 microseconds and the execution time of the actual fpu instructions should add another 34 microseconds. So once through the loop should take about 46 microseconds and a thousand times around should take about 46 milliseconds.
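That estimate can be written out as a small calculation. This is a sketch of the arithmetic only; the helper names are invented, and the model ignores any gaps between bytes or time spent in library calls.

```cpp
#include <cassert>

// Time to clock `bytes` bytes over SPI at `spiHz`, in microseconds
// (ignores any idle time between bytes).
inline double transferMicros(unsigned bytes, double spiHz) {
  assert(spiHz > 0.0);
  return bytes * 8.0 / spiHz * 1e6;
}

// Estimated time for `iterations` passes of the loop, in milliseconds:
// SPI transfer time plus FPU execution time per pass.
inline double loopMillis(unsigned bytes, double spiHz, double execMicros,
                         unsigned iterations) {
  return (transferMicros(bytes, spiHz) + execMicros) * iterations / 1000.0;
}
```

With 12 bytes at 8 MHz and 34 microseconds of execution time (4 + 5 + 5 + 20 from the comments above), `loopMillis(12, 8e6, 34.0, 1000)` comes to roughly 46 ms, matching the figure in the text.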

Something is really wrong!

Pete
Logged


Ah yes, good point about the wait. Now I feel a bit silly. Of course, without waiting you are pumping stuff out to the chip faster than it can do it. So with the last test (exponentiation) the results now are:

Code:
110016 uS for maths chip.

36528 uS for inbuilt.

For calculating sines:

Code:
5436 uS for maths chip.

3572 uS for inbuilt.

For division:

Code:
144212 uS for maths chip.

42620 uS for inbuilt.

This is with the standard SPI, not the DIV2 one.

Even the above tests are a bit unfair because I added the Fpu.wait() but didn't read the results back. So that would have added a bit more time to get the result back into the main chip.

Quote
Something is really wrong!

To be fair to the chip manufacturers I would be pleased to hear in what way my tests are not correct. On the face of it though, you aren't saving any time (on these particular tests) using it.

I can't believe that the normal Uno can do 1000 sines in 3572 uS. That's a sine calculation in 3.572 uS! (57 clock cycles). Despite my use of volatile they must be being optimized away. I'll look into that some more.
Logged



Oops. The sine test was only 30 iterations. (0.0 to 3.0 by 0.1 intervals). So that's 30 sine calculations in 119 uS each. That's 1905 clock cycles. I suppose that's believable.
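The correction can be checked with a short host-side sketch. It counts the loop passes rather than assuming them, because repeatedly adding 0.1 to a float is inexact and the count could come out as 30 or 31. The function names and the 16 MHz clock are assumptions for illustration.

```cpp
#include <cassert>

// Count the passes of the sine test loop instead of assuming a number.
inline int sineLoopIterations() {
  int n = 0;
  for (float rad = 0.0f; rad < 3.0f; rad += 0.1f) {
    ++n;
  }
  return n;
}

// Average CPU cycles per pass for a loop that took totalMicros in total.
inline double cyclesPerIteration(unsigned long totalMicros, int iterations,
                                 double cpuMHz) {
  assert(iterations > 0);
  return static_cast<double>(totalMicros) / iterations * cpuMHz;
}
```

Taking 3572 uS over 30 passes at 16 MHz gives about 1905 cycles per sine, as stated above. If the loop actually ran 31 times the per-sine figure would be slightly lower.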
« Last Edit: April 14, 2012, 05:08:23 pm by Nick Gammon » Logged



Attached is the MCHP document with the number of cycles for floating-point math, which is most probably (99.9% :) ) what is included in the FPU chip you are investigating.
I.e. the uM-FPU V3.1 chip (dsPIC30F3012) doing a 32-bit floating-point function calculation at 29 MIPS takes:
SIN ~2238cycles = ~77usecs
MUL ~109cycles = ~3.76usecs
DIV ~361cycles = ~12.45usecs
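Those conversions, and how they compare with a measured loop time, can be sketched as follows. The helper names are invented, and the comparison assumes the cycle counts above really apply to this chip, which is not confirmed.

```cpp
#include <cassert>

// Convert an instruction-cycle count to microseconds at a given MIPS rate.
inline double cyclesToMicros(double cycles, double mips) {
  assert(mips > 0.0);
  return cycles / mips;
}

// Fraction of a measured per-pass time accounted for by pure computation.
inline double computeShare(double computeMicros, double measuredMicros) {
  assert(measuredMicros > 0.0);
  return computeMicros / measuredMicros;
}
```

For example, the division test earlier in the thread took 144212 uS for the 999 passes of its loop, so `computeShare(cyclesToMicros(361, 29), 144212.0 / 999)` comes out at under a tenth. On these assumptions most of the measured time would be communication and waiting rather than the divide itself.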

* dspic30float.pdf (479.07 KB - downloaded 42 times.)
« Last Edit: April 15, 2012, 07:43:43 am by pito » Logged


Quote
Oops. The sine test was only 30 iterations. (0.0 to 3.0 by 0.1 intervals). So that's 30 sine calculations in 119 uS each. That's 1905 clock cycles. I suppose that's believable.
But the Arduino loop was also 30 iterations, so the performance ratio still holds.
Logged

Rob Tillaart

