Learning the ARM Cortex-M3

Suppose that I wanted to learn ARM Cortex-M3 programming in detail... is using a development board the right answer? Something like the Arduino Due or the mbed NXP LPC1768 board (http://www.pololu.com/catalog/product/2150)?

This is because I really would like to go deeper into ARM technology. Any thoughts are welcome.

Most dev boards should allow you to "learn the ARM Cortex-M3". I don't know if you can get down and dirty with the mbed, but you certainly can with the Due, and I would think with any of the LPCXpresso boards and the Teensy 3 as well.

The trouble is that 99.9% of programming on an "ARM" is not actually programming the ARM itself; it's programming the peripherals, and they are different for every chip vendor. Typically you get control of the CPU after the system init code has run. This code is supplied by the vendor, and while you can play with it, there's normally no reason to do so.

So what is your end goal? What is it you really want to learn?


Rob

The truth is, I have read these books about the Cortex-M3:

http://books.google.com.eg/books?id=HKKUkDQE17QC&redir_esc=y

I mean, I have read most of them. Now I am trying to put everything I have learned into practice.

Then I would buy an LPCXpresso or a Due and get stuck in.

I'm not sure what practical things you can learn by digging deep into the CPU. Very few people need that knowledge, I would say, but knock yourself out :slight_smile:


Rob

will do, thanks :slight_smile:

Here's a quiz:

  1. What are the differences between the CM0, CM3, CM4, and ARM9 ARM architectures?
  2. What are the differences between the ARM instruction set and the Thumb instruction set?
  3. Compare a simple register-based IO port, an "intelligent" IO port (with set/clear/toggle registers), and a register-based IO port with bit-banding. Write an implementation of Arduino's "digitalWrite" and "digitalRead" functions using each model. What other mechanisms have ARM processor vendors provided for fast bit-twiddling?
  4. What aspects of the ARM are most likely to make writing cycle-accurate code difficult?
  5. Compare the advantages and disadvantages of link-time vs compile-time definition of peripheral addresses.
  6. Which features are constant across all ARM CMx chips? Which are most likely to vary wildly? How do the various ARM microcontroller (CMx) vendors differentiate their products?
  7. Who currently sells the most (number) of CMx chips? Who sells the most ($$)?
  8. Why did TI discontinue ("Not Recommended for New Designs") the entire Stellaris line of CM3 chips?
  9. Why did SI Labs pay $170 million to acquire Energy Micro?
  10. What aspects of an ARM CMx chip/system add complexity and cost to a hardware design?
  11. What aspects of an ARM CMx chip/system contribute to bloated binary executable size? Which are avoidable, and is it worth avoiding?
  12. What is CMSIS? How does it relate to ASF?
  13. Several vendors provide extensive peripheral libraries for their ARM chips. Is there any commonality?

Lab: Build an ARM development environment from source code, for chips from two different vendors.

:slight_smile:
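For anyone attempting question 3, the address arithmetic behind bit-banding can be tried out on a PC before touching hardware. The following is only a sketch: it assumes the standard Cortex-M3 memory map (1 MB bit-band regions at 0x20000000 and 0x40000000, with alias regions 0x02000000 above each), and bit-banding is an optional feature, so check whether your particular chip implements it. The function names and the register address 0x40020014 are made up for illustration.

```cpp
#include <cstdint>

// Offset from a bit-band region base to its alias region (standard Cortex-M3 map).
constexpr uint32_t kAliasOffset = 0x02000000u;

// True if addr lies in the 1 MB SRAM or peripheral bit-band region.
constexpr bool in_bitband_region(uint32_t addr) {
    return (addr & 0xFFF00000u) == 0x20000000u ||
           (addr & 0xFFF00000u) == 0x40000000u;
}

// Alias word address for one bit of the byte/word at addr.
// Each bit in the region maps to one 32-bit word in the alias region:
//   alias = region_base + 0x02000000 + byte_offset * 32 + bit * 4
// Returns 0 if addr is outside a bit-band region or bit is out of range.
constexpr uint32_t bitband_alias(uint32_t addr, uint32_t bit) {
    return (in_bitband_region(addr) && bit < 32u)
        ? (addr & 0xF0000000u) + kAliasOffset
              + (addr & 0x000FFFFFu) * 32u + bit * 4u
        : 0u;
}

// Worked examples, checked at compile time.
static_assert(bitband_alias(0x20000000u, 0) == 0x22000000u, "first SRAM bit");
static_assert(bitband_alias(0x40020014u, 5) == 0x42400294u,
              "bit 5 of a hypothetical peripheral register");
static_assert(bitband_alias(0x08000000u, 0) == 0u, "flash is not bit-banded");

// On a real target (not on a PC), setting that single bit would then be one store:
//   *reinterpret_cast<volatile uint32_t*>(bitband_alias(reg_addr, bit)) = 1;
```

The point of the alias word is that a single store sets or clears one bit without a read-modify-write sequence in software, which is what makes it interesting for a digitalWrite-style function.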

FYI: I am toying with the M3 and have done a small comparison. I was quite surprised, though.. :astonished:
PS: Can somebody try this with a Due and IDE 1.5.2?
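Loop:

void setup() {
  Serial.begin(115200);
  // volatile keeps the compiler from optimizing the loop and its result away
  volatile long int t1, t2, i;
  volatile double temp;

  t1 = micros();
  for (i = 1; i <= 10000; i++) {
    temp = sqrt(i);               // enable one operation at a time
    // temp = sin(i);
    // temp = i + 945.242e6;
    // temp = i * 945.242e6;
    // temp = i / 945.242e6;
  }
  t2 = micros();

  // 11320 is the time (usec) of an empty 10000x loop; it is board-specific,
  // so measure it again with the loop body commented out on your own board
  t2 = t2 - t1 - 11320;
  Serial.println(t2 / 10000.0);   // average usec per operation
}

void loop() { }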

m3vsatmel.bmp (631 KB)

I got this:

Due @84MHz, IDE 1.5.2
op      double (usec)
empty   0.16
add     2.93
mul     2.49
div     11.9
sin     52.08
sqrt    25.91

Maple @72MHz, IDE 0.0.12
op      double (usec)
empty   0.2
add     2.87
mul     2.52
div     13.32
sin     53.3
sqrt    24

uC32 @80MHz, MPIDE 0023
op      double (usec)
empty   0.11
add     1.54
mul     1.76
div     2.87
sin     1.57
sqrt    13.78

UnoR3 @16MHz, IDE 1.5.2
op      single (usec)   double (usec)
empty   2.14            4.28
add     12.72           25.44
mul     12.35           24.7
div     33.81           67.62
sin     126.77          253.54
sqrt    48.77           97.54

UnoR3 @16MHz, IDE 1.0.4
op      single (usec)   double (usec)
empty   2.14            4.28
add     12.72           25.44
mul     12.35           24.7
div     33.81           67.62
sin     122.91          245.82
sqrt    34.28           68.56

1284P @16MHz, IDE 1.0.4
op      single (usec)   double (usec)
empty   2.14            4.28
add     12.72           25.44
mul     12.35           24.7
div     33.8            67.6
sin     122.9           245.8
sqrt    34.28           68.56

1284P @20MHz, IDE 1.0.4
op      single (usec)   double (usec)
empty   1.6             3.2
add     9.54            19.08
mul     9.26            18.52
div     25.36           50.72
sin     92.18           184.36
sqrt    25.71           51.42

UnoR3 @16MHz IDE 1.5.2
double usec

Not really a fair comparison - 32 bit float vs. 64 bit double.

My post above includes a double vs. double comparison. The @1MHz results are the interesting ones.
The best compiler is Microchip's C32, imho.
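For readers without the attached image, the "@1MHz" figure is just the measured time scaled by the clock frequency, which is numerically the same as CPU cycles per operation. Here is a small sketch of that arithmetic using the sqrt rows from the tables earlier in the thread; the function and struct names are mine, not from any library.

```cpp
#include <cstdio>

// Time the same operation would take at 1 MHz, in usec.
// Numerically equal to the number of CPU cycles per operation.
constexpr double usec_at_1mhz(double usec, double clock_mhz) {
    return usec * clock_mhz;
}

struct BenchRow {
    const char* board;
    double clock_mhz;
    double sqrt_usec;   // double-precision sqrt, from the posted tables
};

// Prints the clock-normalized sqrt results for the three 32-bit boards.
void print_sqrt_comparison() {
    const BenchRow rows[] = {
        {"Due",   84.0, 25.91},
        {"Maple", 72.0, 24.0},
        {"uC32",  80.0, 13.78},
    };
    for (const BenchRow& r : rows) {
        std::printf("%-6s %8.1f usec @1MHz\n",
                    r.board, usec_at_1mhz(r.sqrt_usec, r.clock_mhz));
    }
}
```

Normalizing this way takes the clock speed out of the comparison, so what remains is mostly the difference between cores and math libraries.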

That PIC32 is really fast.

The PIC32MX is not much faster than the Cortex-M3, but it seems the gcc compilers are really slow with floating point. Microchip's C30/C32 compilers are optimized for their chips.

Also, the PIC32MX uses a 5-stage pipeline, while the M3 uses a 3-stage pipeline, making it a little slower than the PIC.

The best compiler is the microchip's C32, imho..

The best floating point library seems to be in the Microchip compiler.

This probably means that Microchip is providing an optimized-for-MIPS floating point library, while Atmel/Arduino is providing a floating-point library based on the gcc generic soft-float functions. (I'm not quite sure how to check this, though.) (AVR also has highly optimized floating point functions.) TI also recently updated their math library for the MSP430 and wound up with significantly improved performance (MSPMATHLIB, on TI.com). This is a somewhat sad commentary on the gcc soft-float code, I think, although I think that's part of the GNU "encouragement" of vendors to provide "contributions" to the effort.

I've tried sin() and cos() with CORDIC (32 iterations, 32-bit fixed-point results, Cortex-M3 @72MHz) and it does BOTH sin() and cos() in ~1000 MCU cycles (13.9 usec per calculation).
Precision is ~7 digits (here 1000.0*cos(theta) and 1000.0*sin(theta) are printed out):

Cordic Cos       Cos()      Cordic Sin      Sin()
1000.000007 : 1000.000000 ... -0.000001 : 0.000000 cycles= 984  elpsd= 13.667E-06
999.506560 : 999.506560 ... 31.410758 : 31.410759 cycles= 1001  elpsd= 13.903E-06
998.026730 : 998.026728 ... 62.790518 : 62.790520 cycles= 1001  elpsd= 13.903E-06
995.561964 : 995.561965 ... 94.108311 : 94.108313 cycles= 1001  elpsd= 13.903E-06
992.114699 : 992.114701 ... 125.333234 : 125.333234 cycles= 1001  elpsd= 13.903E-06
987.688337 : 987.688341 ... 156.434465 : 156.434465 cycles= 1001  elpsd= 13.903E-06
982.287251 : 982.287251 ... 187.381320 : 187.381315 cycles= 1001  elpsd= 13.903E-06
975.916764 : 975.916762 ... 218.143241 : 218.143241 cycles= 1001  elpsd= 13.903E-06
968.583161 : 968.583161 ... 248.689890 : 248.689887 cycles= 1001  elpsd= 13.903E-06
960.293690 : 960.293686 ... 278.991110 : 278.991106 cycles= 1001  elpsd= 13.903E-06
951.056517 : 951.056516 ... 309.016999 : 309.016994 cycles= 1001  elpsd= 13.903E-06
940.880770 : 940.880769 ... 338.737918 : 338.737920 cycles= 1001  elpsd= 13.903E-06

That would be ~1000 usec @1MHz (single-precision sin or cos), now closer to Microchip's 439 usec @1MHz (but Microchip does it in double).

http://www.dcs.gla.ac.uk/~jhw/cordic/
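The code behind those numbers was not posted, so here is a generic rotation-mode CORDIC sketch for anyone who wants to experiment. It is not the poster's implementation: it assumes a Q2.30 fixed-point format and 30 iterations (rather than 32), builds its tables with the host math library on first use (on a microcontroller you would store them as a precomputed const array), and all names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kFracBits = 30;     // Q2.30: value = raw / 2^30, range [-2, 2)
constexpr int kIterations = 30;

// Conversions between double and Q2.30.
int32_t to_fixed(double v) {
    return static_cast<int32_t>(std::lround(std::ldexp(v, kFracBits)));
}
double to_double(int32_t raw) {
    return std::ldexp(static_cast<double>(raw), -kFracBits);
}

// atan(2^-i) table and the inverse CORDIC gain, both in Q2.30.
struct CordicTables {
    int32_t atan_q30[kIterations];
    int32_t inv_gain_q30;
    CordicTables() {
        double inv_gain = 1.0;
        for (int i = 0; i < kIterations; ++i) {
            const double t = std::ldexp(1.0, -i);        // 2^-i
            atan_q30[i] = to_fixed(std::atan(t));
            inv_gain /= std::sqrt(1.0 + t * t);          // 1 / prod sqrt(1 + 2^-2i)
        }
        inv_gain_q30 = to_fixed(inv_gain);               // about 0.60725
    }
};

struct SinCos {
    int32_t sin;   // Q2.30
    int32_t cos;   // Q2.30
};

// Computes sin and cos together using only shifts and adds.
// theta_q30 is in radians (Q2.30) and must lie in [-pi/2, pi/2].
SinCos cordic_sincos(int32_t theta_q30) {
    static const CordicTables tables;
    assert(theta_q30 >= to_fixed(-1.5708) && theta_q30 <= to_fixed(1.5708));

    int32_t x = tables.inv_gain_q30;   // start pre-scaled so the result has unit length
    int32_t y = 0;
    int32_t z = theta_q30;             // remaining angle to rotate through
    for (int i = 0; i < kIterations; ++i) {
        // Assumes >> on negative values is an arithmetic shift (true for gcc/clang).
        const int32_t dx = x >> i;
        const int32_t dy = y >> i;
        if (z >= 0) { x -= dy; y += dx; z -= tables.atan_q30[i]; }
        else        { x += dy; y -= dx; z += tables.atan_q30[i]; }
    }
    return SinCos{y, x};
}
```

Calling cordic_sincos(to_fixed(0.5)) and converting both fields back with to_double() gives values you can compare against sin(0.5) and cos(0.5) from the standard library. Angles outside [-pi/2, pi/2] need to be folded into that range first, using the usual quadrant symmetries.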

If you are going to be doing a lot of floating point, you should be using a chip that has actual floating point instructions in it. Unfortunately, the Due does not use such a chip. Systems like Raspberry Pi, Beaglebone Black, etc. tend to have floating point. So you have to weigh whether floating point dominates your system so much that it would be better to switch to such a system, or whether you use just a little floating point and software-emulated floating point is acceptable.

In addition to standard floating point in the ARM chip, these systems, which are typically designed for Linux, often have GPUs (graphics processing units) to which you can push more of the calculations. It depends on how dedicated you are to getting the most performance.

Systems like Raspberry Pi, Beaglebone Black, etc. tend to have floating point.

STM32F4xx and F3xx (CM4) have got it as well.

westfw:
Here's a quiz:

All interesting questions, especially:

  1. Why did TI discontinue ("Not Recommended for New Designs") the entire Stellaris line of CM3 chips?
  2. Why did SI Labs pay $170 million to acquire Energy Micro?

I Googled around a little and found that the Stellaris CM3 line had bugs and used an old 250nm production process. Energy Micro will give SI Labs low-power and wireless functionality. That was just a quick look, so I would be glad to hear other input.

westfw, I could answer all those questions but I think it's only fair to let someone else have a go :slight_smile:


Rob

Sounds like youse guys could teach a class. Probably couldn't afford you though :wink: