
Topic: learning the ARM CORTEX M3 (Read 2802 times)

Suppose that I wanted to learn ARM Cortex-M3 programming in detail... is using a development board the right answer? Something like the Arduino Due or the mbed NXP LPC1768 board (http://www.pololu.com/catalog/product/2150)?

This is because I would really like to go deeper into ARM technology. Any thoughts are welcome.

Graynomad

Most dev boards should allow you to "learn the ARM CORTEX M3". I don't know if you can get down and dirty with the mbed, but you certainly can with the Due, with any of the LPCxpresso boards and, I would think, with the Teensy 3.

The trouble is that 99.9% of programming on an "ARM" is not actually programming the ARM itself; it's programming the peripherals, and those are different for every chip vendor. Typically you get control of the CPU after the system init code has run. This code is supplied by the vendor, and while you can play with it, there's normally no reason to do so.
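As a rough, host-runnable picture of what that vendor init code does before you get control, here is a C sketch that simulates it. The assumptions are loud ones: on a real part the `.data` initial values live in flash and the section boundaries come from linker-script symbols, whereas here plain arrays stand in for them, and `reset_handler`, `system_init` and `app_main` are hypothetical names (CMSIS-style startup files usually call a `SystemInit` function, but each vendor's file differs).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for linker-defined regions (hypothetical; real startup code
   uses symbols exported by the vendor's linker script). */
static const uint32_t flash_data_image[3] = {1, 2, 3}; /* .data initial values in flash */
static uint32_t ram_data[3];                           /* .data copy in RAM */
static uint32_t ram_bss[4] = {9, 9, 9, 9};             /* simulated power-up garbage in .bss */

static int clock_configured = 0;

/* Hypothetical vendor hook: real code would set up PLLs, flash wait states, etc. */
static void system_init(void) { clock_configured = 1; }

/* Hypothetical application entry point: the first code "you" write. */
static int app_main(void) {
    assert(clock_configured);   /* clocks were configured before we got here */
    return (int)(ram_data[0] + ram_data[1] + ram_data[2] + ram_bss[0]);
}

/* Simulated reset handler: the work done before main() ever runs. */
int reset_handler(void) {
    memcpy(ram_data, flash_data_image, sizeof ram_data); /* copy .data from flash */
    memset(ram_bss, 0, sizeof ram_bss);                  /* zero .bss */
    system_init();                                       /* vendor system/clock setup */
    return app_main();                                   /* hand control to the application */
}
```

The point of the sketch is the ordering: by the time application code runs, initialized globals already hold their values, zero-initialized globals are zero, and the clock tree is set up.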

So what is your end goal? What is it you really want to learn?

_____
Rob
Rob Gray aka the GRAYnomad www.robgray.com

The truth is... I have read these books about the Cortex-M3:

http://www.amazon.com/Definitive-Guide-Cortex-M3-Second-Edition/dp/185617963X
http://www.amazon.com/ARM-System-Chip-Architecture-Edition/dp/0201675196
http://books.google.com.eg/books?id=HKKUkDQE17QC&redir_esc=y

I mean, I have read most of them. Now I am trying to implement in practice everything I have learned.

Graynomad

Then I would buy an LPCxpresso or a Due and get stuck in.

I'm not sure what practical things you can learn by digging deep into the CPU; certainly very few people need that knowledge, I would say. But knock yourself out :)



westfw

Here's a quiz:
1) What are the differences between the CM0, CM3, CM4, and ARM9 ARM architectures?
2) What are the differences between the ARM instruction set and the Thumb instruction set?
3) Compare a simple register-based IO port, an "intelligent" IO port (with set/clear/toggle registers), and a register-based IO port with bit-banding. Write an implementation of Arduino's "digitalWrite" and "digitalRead" functions using each model. What other mechanisms have ARM processor vendors provided for fast bit-twiddling?
4) What aspects of the ARM are most likely to make writing cycle-accurate code difficult?
5) Compare the advantages and disadvantages of link-time vs. compile-time definition of peripheral addresses.
6) Which features are constant across all ARM CMx chips? Which are most likely to vary wildly? How do the various ARM microcontroller (CMx) vendors differentiate their products?
7) Who currently sells the most CMx chips by number? Who sells the most by revenue ($$)?
8) Why did TI discontinue ("Not Recommended for New Designs") the entire Stellaris line of CM3 chips?
9) Why did Silicon Labs pay $170 million to acquire Energy Micro?
10) What aspects of an ARM CMx chip/system add complexity and cost to a hardware design?
11) What aspects of an ARM CMx chip/system contribute to bloated binary executable size? Which are avoidable, and are they worth avoiding?
12) What is CMSIS? How does it relate to ASF?
13) Several vendors provide extensive peripheral libraries for their ARM chips. Is there any commonality?

Lab: Build an ARM development environment from source code, for chips from two different vendors.

:-)
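For question 3 of the quiz, here is a minimal host-side C sketch of the three GPIO models. The "registers" are plain variables so it runs on a PC, and all register and function names (`odr`, `port_pins`, `write_simple`, `write_setclr`, `bitband_alias`, ...) are hypothetical. The bit-band part uses the address mapping ARM documents for the Cortex-M3 peripheral region (alias base 0x42000000 for the 1 MB region at 0x40000000); bit-banding is an optional feature, so check that your particular chip implements it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated registers: plain variables so this runs on a PC. */
static volatile uint32_t odr;        /* model 1: output data register      */
static volatile uint32_t port_pins;  /* model 2: pin state behind SET/CLR  */

/* Model 1: simple register-based port. Changing one pin is a
   read-modify-write, which is not atomic if an ISR also writes the port. */
void write_simple(unsigned pin, int value) {
    if (value) odr |= (1u << pin);
    else       odr &= ~(1u << pin);
}
int read_simple(unsigned pin) { return (int)((odr >> pin) & 1u); }

/* Model 2: "intelligent" port. The hardware applies a write of 1-bits to
   the SET, CLEAR or TOGGLE register only to those pins, so software needs
   a single store. These helpers emulate what the hardware would do. */
static void set_reg(uint32_t mask)    { port_pins |= mask; }
static void clear_reg(uint32_t mask)  { port_pins &= ~mask; }
static void toggle_reg(uint32_t mask) { port_pins ^= mask; }

void write_setclr(unsigned pin, int value) {
    if (value) set_reg(1u << pin);    /* one store to SET   */
    else       clear_reg(1u << pin);  /* one store to CLEAR */
}
void toggle_setclr(unsigned pin) { toggle_reg(1u << pin); }
int read_setclr(unsigned pin) { return (int)((port_pins >> pin) & 1u); }

/* Model 3: bit-banding. Every bit in the peripheral region has its own
   32-bit alias word; storing 0 or 1 there clears or sets just that bit. */
#define PERIPH_BASE    0x40000000u
#define PERIPH_BB_BASE 0x42000000u

uint32_t bitband_alias(uint32_t reg_addr, unsigned bit) {
    assert(bit < 32u);
    assert(reg_addr >= PERIPH_BASE && reg_addr < PERIPH_BASE + 0x100000u);
    return PERIPH_BB_BASE + (reg_addr - PERIPH_BASE) * 32u + bit * 4u;
}
/* On real hardware (ODR_ADDR being a hypothetical register address):
     *(volatile uint32_t *)bitband_alias(ODR_ADDR, pin) = value ? 1u : 0u; */
```

The trade-off the quiz is after shows up directly: model 1 needs a read-modify-write, while models 2 and 3 turn a single-pin change into one store.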

pito

#6
Jun 18, 2013, 12:54 pm Last Edit: Jun 18, 2013, 06:46 pm by pito
FYI: I am toying with the M3 and have done a small floating-point comparison. I was quite surprised, though.
PS: Can somebody try it with a Due and IDE 1.5.2?
Loop:
Code:
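void setup() {
  Serial.begin(115200);
  // micros() returns unsigned long; volatile keeps the compiler from
  // optimizing the timed loop away
  volatile unsigned long t1, t2;
  volatile long i;
  volatile double temp;

  t1 = micros();
  for (i = 1; i <= 10000; i++) {
    temp = sqrt(i);
    // temp = sin(i);
    // temp = i + 945.242e6;
    // temp = i * 945.242e6;
    // temp = i / 945.242e6;
  }
  t2 = micros();

  // 11320 us is the empty 10000-iteration loop time from the original post;
  // it depends on the board and compiler, so re-measure it on yours.
  long elapsed = (long)(t2 - t1) - 11320;
  Serial.println(elapsed / 10000.0);  // average usec per operation
}

void loop() { }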

hiduino

#7
Jun 18, 2013, 10:32 pm Last Edit: Jun 18, 2013, 11:24 pm by hiduino
I got this:
Code:

Due @84MHz IDE 1.5.2
op      double usec
empty     0.16
add       2.93
mul       2.49
div      11.9
sin      52.08
sqrt     25.91

Maple @72MHz IDE 0.0.12
op      double usec
empty     0.2
add       2.87
mul       2.52
div      13.32
sin      53.3
sqrt     24

uC32 @80MHz MPIDE 0023
op      double usec
empty     0.11
add       1.54
mul       1.76
div       2.87
sin       1.57
sqrt     13.78

UnoR3 @16MHz IDE 1.5.2
op      single usec   double usec
empty     2.14          4.28
add      12.72         25.44
mul      12.35         24.7
div      33.81         67.62
sin     126.77        253.54
sqrt     48.77         97.54

UnoR3 @16MHz IDE 1.0.4
op      single usec   double usec
empty     2.14          4.28
add      12.72         25.44
mul      12.35         24.7
div      33.81         67.62
sin     122.91        245.82
sqrt     34.28         68.56

1284P @16MHz IDE 1.0.4
op      single usec   double usec
empty     2.14          4.28
add      12.72         25.44
mul      12.35         24.7
div      33.8          67.6
sin     122.9         245.8
sqrt     34.28         68.56

1284P @20MHz IDE 1.0.4
op      single usec   double usec
empty     1.6           3.2
add       9.54         19.08
mul       9.26         18.52
div      25.36         50.72
sin      92.18        184.36
sqrt     25.71         51.42




AWOL

Quote
UnoR3   @16MHz   IDE 1.5.2
double   usec

Not really a fair comparison: 32-bit float vs. 64-bit double (avr-gcc implements double as 32 bits, so on the Uno a "double" is really a float).
"Pete, it's a fool looks for logic in the chambers of the human heart." Ulysses Everett McGill.
Do not send technical questions via personal messaging - they will be ignored.

pito

#9
Jun 18, 2013, 10:42 pm Last Edit: Jun 18, 2013, 10:44 pm by pito
My post above has a double-vs-double comparison. The @1MHz results are the interesting ones.
Microchip's C32 is the best compiler, IMHO.
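The "@1MHz" view is simple arithmetic: time in usec multiplied by the clock in MHz gives CPU cycles, which is also how many usec the same code would take at 1 MHz. A small C sketch applying it to some of hiduino's sin() numbers, assuming each board's core really runs at the quoted clock (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* usec measured at clock_mhz -> CPU cycles (= usec the code would take at 1 MHz). */
double usec_to_cycles(double usec, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return usec * clock_mhz;
}

struct bench { const char *board; double clock_mhz; double sin_usec; };

/* sin() times copied from hiduino's results. The AVR rows (UnoR3 IDE 1.5.2,
   1284P @20MHz) use the "single" column, since AVR double is 32-bit. */
static const struct bench sin_results[] = {
    {"Due",   84.0,  52.08},
    {"Maple", 72.0,  53.3},
    {"uC32",  80.0,   1.57},
    {"UnoR3", 16.0, 126.77},
    {"1284P", 20.0,  92.18},
};

void print_sin_cycles(void) {
    size_t n = sizeof sin_results / sizeof sin_results[0];
    for (size_t i = 0; i < n; i++)
        printf("%-6s sin(): %8.1f cycles\n", sin_results[i].board,
               usec_to_cycles(sin_results[i].sin_usec, sin_results[i].clock_mhz));
}
```

For example, the Due's 52.08 usec at 84 MHz works out to roughly 4375 cycles per sin().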

afremont


Quote
Above in my post there is double vs double comparison. Interesting are the @1MHz results.
The best compiler is the microchip's C32, imho..


That PIC32 is really fast.
Experience, it's what you get when you were expecting something else.

pito

Quote
That PIC32 is really fast.

The PIC32MX is not much faster than the Cortex-M3, but it seems the gcc compilers are really slow at floating point. Microchip's C30/C32 compilers are optimized for their own chips.

Quote
pic32mx is not much faster than cortex-m3, but it seems the gcc compilers are really slow with floating point. Microchip's C30/32 compilers are optimized for their chips.

Also, the PIC32MX uses a 5-stage pipeline while the M3 uses a 3-stage pipeline, which may make the M3 a little slower than the PIC.

westfw

Quote
The best compiler is the microchip's C32, imho..

The best floating-point library seems to be in the Microchip compiler.

This probably means that Microchip is providing an optimized-for-MIPS floating-point library, while Atmel/Arduino is providing one based on the generic gcc soft-float functions. (I'm not quite sure how to check this, though.) (AVR also has highly optimized floating-point functions.) TI also recently updated their math library for the MSP430 and wound up with significantly improved performance (http://www.ti.com/tool/mspmathlib). This is a somewhat sad commentary on the gcc soft-float code, I think, although I suspect that's part of the GNU "encouragement" of vendors to provide "contributions" to the effort.
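To make "soft-float" concrete: on a core without an FPU (AVR, Cortex-M3, PIC32MX), every float operation becomes a call into a library routine built from integer instructions, so the quality of that routine is what the benchmarks above are really measuring. Below is a deliberately simplified single-precision multiply written for illustration only. It is not gcc's libgcc code: it truncates instead of rounding and ignores zero, subnormals, infinities, NaN and overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* IEEE-754 single-precision multiply using only integer operations.
   Simplified: inputs must be normal numbers, the result is truncated
   (real libraries round to nearest even), and overflow/underflow,
   zero, infinity and NaN are not handled. */
float soft_fmul(float a, float b) {
    uint32_t ua = float_bits(a), ub = float_bits(b);
    int32_t ea = (int32_t)((ua >> 23) & 0xFFu);     /* biased exponents */
    int32_t eb = (int32_t)((ub >> 23) & 0xFFu);
    assert(ea != 0 && ea != 0xFF && eb != 0 && eb != 0xFF);

    uint32_t sign = (ua ^ ub) & 0x80000000u;
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;     /* restore implicit leading 1 */
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = ma * mb;                        /* lies in [2^46, 2^48) */
    int32_t e = ea + eb - 127;                      /* remove one of the two biases */

    if (prod & (1ULL << 47)) { prod >>= 24; e += 1; } /* significand product in [2, 4) */
    else                     { prod >>= 23; }         /* significand product in [1, 2) */

    return bits_float(sign | ((uint32_t)e << 23) | ((uint32_t)prod & 0x7FFFFFu));
}
```

Even this stripped-down version needs a 24x24-bit multiply, exponent arithmetic and normalization; a production routine adds rounding and all the special cases, which is where implementations differ in speed.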

pito

#14
Jun 19, 2013, 01:42 pm Last Edit: Jun 19, 2013, 02:14 pm by pito
I've tried sin() and cos() with CORDIC (32 iterations, 32-bit fixed-point results, Cortex-M3 @72MHz), and it does BOTH sin() and cos() in ~1000 MCU cycles (13.9 usec per calculation).
Precision is ~7 digits (1000.0*cos(theta) and 1000.0*sin(theta) are printed here):
Code:

Cordic Cos       Cos()      Cordic Sin      Sin()
1000.000007 : 1000.000000 ... -0.000001 : 0.000000 cycles= 984  elpsd= 13.667E-06
999.506560 : 999.506560 ... 31.410758 : 31.410759 cycles= 1001  elpsd= 13.903E-06
998.026730 : 998.026728 ... 62.790518 : 62.790520 cycles= 1001  elpsd= 13.903E-06
995.561964 : 995.561965 ... 94.108311 : 94.108313 cycles= 1001  elpsd= 13.903E-06
992.114699 : 992.114701 ... 125.333234 : 125.333234 cycles= 1001  elpsd= 13.903E-06
987.688337 : 987.688341 ... 156.434465 : 156.434465 cycles= 1001  elpsd= 13.903E-06
982.287251 : 982.287251 ... 187.381320 : 187.381315 cycles= 1001  elpsd= 13.903E-06
975.916764 : 975.916762 ... 218.143241 : 218.143241 cycles= 1001  elpsd= 13.903E-06
968.583161 : 968.583161 ... 248.689890 : 248.689887 cycles= 1001  elpsd= 13.903E-06
960.293690 : 960.293686 ... 278.991110 : 278.991106 cycles= 1001  elpsd= 13.903E-06
951.056517 : 951.056516 ... 309.016999 : 309.016994 cycles= 1001  elpsd= 13.903E-06
940.880770 : 940.880769 ... 338.737918 : 338.737920 cycles= 1001  elpsd= 13.903E-06

At 1MHz that would be 1000 usec (single-precision sin or cos), which is now closer to Microchip's 439 usec @1MHz (though Microchip's result is double precision).

http://www.dcs.gla.ac.uk/~jhw/cordic/
