Topic: Help Choosing Upgrade from ATMEGA328 on UNO for Production Version of PCB
Dallas, TX USA


Quote
As far as digitalWrite() goes, there are several implementations that are MUCH faster when the values are constants.

Absolutely. However the digitalWrite code allows for variables, so this isn't the same.

Sure, requiring pure constants isn't the same,
but the method used in fm's library DOES allow variables and normal constructors that
are processed at runtime. It does not depend on compile-time constants to get its speed.
While it isn't nearly as fast as what can be done with pure constants,
it is portable across processors and still WAY faster than using digitalWrite().
With respect to code size, it should be close to a wash compared with using digitalWrite().
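A minimal sketch of the runtime approach described above, for illustration only: this is not fm's actual library code, and the class FastPin, the helpers portFor()/maskFor() and the simPORTx variables are hypothetical stand-ins so it can run off-target. The pin number can be an ordinary variable; the pin-to-register lookup is done once in the constructor, so each write is a single read-modify-write instead of the lookups digitalWrite() repeats on every call. The pin mapping assumes an ATmega328P Uno.

```cpp
#include <cassert>
#include <cstdint>

// Simulated 8-bit output registers standing in for AVR PORTB/PORTC/PORTD
// (hypothetical; on real hardware these are the volatile I/O registers).
volatile uint8_t simPORTB = 0, simPORTC = 0, simPORTD = 0;

// Resolve an Uno pin number to its output register (hypothetical helper).
static volatile uint8_t* portFor(uint8_t pin) {
  assert(pin < 20);
  if (pin < 8)  return &simPORTD;  // pins 0-7   -> PORTD
  if (pin < 14) return &simPORTB;  // pins 8-13  -> PORTB
  return &simPORTC;                // pins 14-19 (A0-A5) -> PORTC
}

// Bit mask of the pin within its port (hypothetical helper).
static uint8_t maskFor(uint8_t pin) {
  if (pin < 8)  return uint8_t(1u << pin);
  if (pin < 14) return uint8_t(1u << (pin - 8));
  return uint8_t(1u << (pin - 14));
}

// The pin may be a runtime variable: the lookup cost is paid once, here,
// rather than on every write.
class FastPin {
 public:
  explicit FastPin(uint8_t pin) : port_(portFor(pin)), mask_(maskFor(pin)) {}

  void write(bool high) {
    if (high) *port_ = uint8_t(*port_ | mask_);
    else      *port_ = uint8_t(*port_ & uint8_t(~mask_));
  }

 private:
  volatile uint8_t* port_;  // cached register address
  uint8_t mask_;            // cached bit mask
};
```

On real hardware, as digitalWrite() does, you would also want interrupts disabled around the read-modify-write if an ISR can touch the same port.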

The code Paul wrote for Teensy still uses the same functions and arguments
as digitalWrite()/digitalRead(), but takes advantage of compile-time capabilities to get faster AND smaller code,
even when not all the arguments are constants.

The irony of all this is that the majority of sketch code I've looked at actually does use
constants, at least for the pins. And in the case of Teensy, that will yield smaller and faster code
even if the value used for setting the pin is a variable.
The code supplied in the IDE doesn't take advantage of compile-time knowledge as much
as it should.
For example, the compiled code is never going to run on different processors with different
pin mappings.
While the pin number may not be constant, the pin mappings are, and the way the code
is currently written, it does not take advantage of that knowledge.
The code could be refactored to reduce the number of table lookups,
since the pin mappings are known at compile time.
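As a sketch of what "known at compile time" can buy, assuming the ATmega328P Uno mapping (pins 0-7 on PORTD, 8-13 on PORTB, 14-19/A0-A5 on PORTC), the mapping can be written as constexpr functions instead of flash lookup tables. The names pinToPort(), pinToBit() and pinToMask() are hypothetical, not part of the Arduino core. With a constant pin the compiler evaluates them entirely at build time, as the static_asserts show; with a variable pin they still work at run time.

```cpp
#include <cstdint>

// Which 8-bit port a pin belongs to on an ATmega328P-based Uno.
enum class Port : uint8_t { B, C, D };

// Pins 0-7 are PD0-PD7, 8-13 are PB0-PB5, 14-19 (A0-A5) are PC0-PC5.
constexpr Port pinToPort(uint8_t pin) {
  return pin < 8 ? Port::D : (pin < 14 ? Port::B : Port::C);
}

// Bit position of the pin within its port.
constexpr uint8_t pinToBit(uint8_t pin) {
  return pin < 8 ? pin : (pin < 14 ? uint8_t(pin - 8) : uint8_t(pin - 14));
}

constexpr uint8_t pinToMask(uint8_t pin) {
  return uint8_t(1u << pinToBit(pin));
}

// With a constant pin number the whole mapping folds away at compile time.
static_assert(pinToPort(13) == Port::B, "pin 13 is on PORTB");
static_assert(pinToMask(13) == 0x20, "pin 13 is PB5");
static_assert(pinToPort(3) == Port::D && pinToMask(3) == 0x08, "pin 3 is PD3");
```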

And if the core code simply took advantage of constant arguments,
the code would get MUCH faster (50x for digitalWrite()) and smaller, particularly
if all the calls to digitalWrite() use pin constants, since the port/pin tables would
no longer be linked in.



You need to provide proof of code that is faster and smaller. Not just faster or smaller.

I know the original poster wanted smaller, but your original comment was:
Quote
I don't know that the Arduino libraries are, per se, egregiously slow and inefficient.

I think you have conceded that the Arduino code includes portions that are egregiously slow.
On the faster and smaller, there are examples where the code can get smaller and still be faster.
I have shown a simple example: changing the retarded unsigned int used in HardwareSerial
for the buffer head/tail indexes to uint8_t makes the code faster and smaller.
(This simple optimization should have been done years ago)


There are many of these kinds of simple optimizations out there in the core code and Libraries.

--- bill

Global Moderator
Melbourne, Australia

Quote
Have a look at the HardwareSerial code.
It retardedly uses unsigned int for the head/tail indexes.
The reason I say that this is "retarded" is that even though the code declares them as volatile, the code will break if the values get larger than a single byte, because other parts of the code do not properly deal with atomicity.
Changing these to uint8_t makes the code faster and saves a few hundred bytes.
Nothing is lost by making this change, since the code won't work right anyway if the buffers require indexes larger than 8 bits.
(I make this change to every single Arduino release.)

I've thrown my oar into the existing bug report here:

https://github.com/arduino/Arduino/issues/1078
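For anyone wondering why the index width matters: on an 8-bit AVR, loading or storing a uint8_t is a single instruction, but a 16-bit unsigned int is accessed one byte at a time, so an ISR can update the head index between the two byte reads and the main code sees a torn value unless interrupts are disabled around the access. Below is a minimal host-runnable sketch of a single-producer/single-consumer ring buffer with uint8_t indexes. It is a simplified illustration in the spirit of the HardwareSerial buffer, not the core's actual code, and RingBuffer is a hypothetical name. Making the size a uint8_t template parameter also caps the buffer at 255 bytes, matching the index width.

```cpp
#include <cassert>
#include <cstdint>

// Single-producer (e.g. RX interrupt) / single-consumer (e.g. read()) ring buffer.
// Each index is written by only one side, and each access is a single byte.
template <uint8_t Size>
class RingBuffer {
  static_assert(Size >= 2, "need room for at least one element");

 public:
  // Producer side. Returns false when full (one slot is always kept empty
  // so that head == tail can unambiguously mean "empty").
  bool put(uint8_t c) {
    uint8_t next = uint8_t((head_ + 1) % Size);
    if (next == tail_) return false;
    buf_[head_] = c;
    head_ = next;  // single-byte store: cannot be torn by an interrupt on AVR
    return true;
  }

  // Consumer side. Returns -1 when empty, like Serial.read().
  int get() {
    if (head_ == tail_) return -1;
    uint8_t c = buf_[tail_];
    tail_ = uint8_t((tail_ + 1) % Size);
    return c;
  }

  // Number of bytes waiting.
  uint8_t available() const {
    return uint8_t((Size + head_ - tail_) % Size);
  }

 private:
  uint8_t buf_[Size] = {};
  volatile uint8_t head_ = 0;  // written only by the producer
  volatile uint8_t tail_ = 0;  // written only by the consumer
};
```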

http://www.gammon.com.au/electronics

Please post technical questions on the forum - not to me by personal message. Thanks a lot.

Global Moderator
Melbourne, Australia

As for digitalWriteFast, here is some test code:

Code:
void setup ()
  {
  digitalWrite (1, HIGH);
  digitalWrite (2, HIGH);
  digitalWrite (3, HIGH);
  digitalWrite (4, HIGH);
  digitalWrite (5, HIGH);
  digitalWrite (6, HIGH);
  digitalWrite (7, HIGH);
  digitalWrite (8, HIGH);
  digitalWrite (9, HIGH);
  }  // end of setup

void loop () { }

Size:

Code:
Binary sketch size: 796 bytes (of a 32,256 byte maximum)



Now with digitalWriteFast:

Code:
#include <digitalWriteFast.h>

void setup ()
  {
  digitalWriteFast (1, HIGH);
  digitalWriteFast (2, HIGH);
  digitalWriteFast (3, HIGH);
  digitalWriteFast (4, HIGH);
  digitalWriteFast (5, HIGH);
  digitalWriteFast (6, HIGH);
  digitalWriteFast (7, HIGH);
  digitalWriteFast (8, HIGH);
  digitalWriteFast (9, HIGH);
  }  // end of setup

void loop () { }

Size:

Code:
Binary sketch size: 478 bytes (of a 32,256 byte maximum)

Just saved 318 bytes!

Global Moderator
Melbourne, Australia

Using a variable for the pin, digitalWriteFast resulted in a slight code size increase (8 bytes) compared with plain digitalWrite, for example in this loop:

Code:
void setup ()
  {
  for (int i = 1; i < 10; i++)
    digitalWrite (i, HIGH);
  }  // end of setup

void loop () { }

Still, you often use constants, so I'd go along with Bill on this one.

Dallas, TX USA

The comparison is not a good one, since the code is not really the same.
The second example has a loop. It has only 1 call, but it uses
an int, which is inefficient on the AVR, instead of a uint8_t.
Without looking at the assembler it is hard to say what caused the bump up.

One thing to keep in mind when using optimized digital I/O routines is that
the savings will be greater on larger AVRs, since they have more pins and therefore larger data tables.
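To put rough numbers on that: in the stock AVR core, as far as I know, each digital pin has one byte in each of three PROGMEM lookup tables (port, bit mask and timer), all of which digitalWrite() pulls in when the pin is not a constant. Treating that three-table layout as an assumption, and ignoring the small per-port register tables, the per-pin table cost is simple arithmetic; pinTableBytes() is a hypothetical name.

```cpp
#include <cstddef>

// Assumption: three one-byte-per-pin tables (port, bit mask, timer).
constexpr std::size_t kPerPinTables = 3;

// Approximate flash used by the per-pin tables for a given pin count.
constexpr std::size_t pinTableBytes(std::size_t numDigitalPins) {
  return kPerPinTables * numDigitalPins;
}

static_assert(pinTableBytes(20) == 60, "Uno (ATmega328P): 20 digital pins");
static_assert(pinTableBytes(70) == 210, "Mega 2560: 70 digital pins");
```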


With Teensyduino, you get that kind of optimization for free, without having to use
different API calls, because the core code handles all this "magic" behind the scenes: it does a bunch of pre-processor
magic to check for constants, then uses the smaller/faster code when possible.
When that's not possible you are no worse off, and in some cases, like when just the value parameter
is a variable, you still get the good optimization.
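A minimal sketch of that kind of constant detection, assuming GCC or Clang (both provide `__builtin_constant_p`). This is not Teensyduino's actual code: fastWrite(), slowWrite(), the tables and the simPORTx registers are hypothetical, and only Uno-style pins 0-13 are handled. The caller uses one function with the same arguments either way; a literal pin number (with optimization on) takes the direct-register path, and anything else falls back to the table-driven path.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for two AVR output registers (hypothetical).
volatile uint8_t simPORTB = 0;
volatile uint8_t simPORTD = 0;

// Runtime lookup tables for pins 0-13 (0-7 on PORTD, 8-13 on PORTB).
volatile uint8_t* const kPinPort[14] = {
    &simPORTD, &simPORTD, &simPORTD, &simPORTD, &simPORTD, &simPORTD, &simPORTD,
    &simPORTD, &simPORTB, &simPORTB, &simPORTB, &simPORTB, &simPORTB, &simPORTB};
const uint8_t kPinMask[14] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
                              0x01, 0x02, 0x04, 0x08, 0x10, 0x20};

// Generic path, like digitalWrite(): table lookups on every call.
void slowWrite(uint8_t pin, uint8_t val) {
  if (pin >= 14) return;  // this sketch silently ignores unknown pins
  volatile uint8_t* port = kPinPort[pin];
  uint8_t mask = kPinMask[pin];
  *port = val ? uint8_t(*port | mask) : uint8_t(*port & ~mask);
}

// Same signature. With optimization on, __builtin_constant_p(pin) typically
// evaluates to 1 once a literal argument is inlined, leaving one
// read-modify-write of a known register and no table access. Otherwise
// (variable pin, or -O0) it falls back to slowWrite(), so the caller is
// never worse off than with the generic path.
static inline __attribute__((always_inline)) void fastWrite(uint8_t pin, uint8_t val) {
  if (__builtin_constant_p(pin) && pin < 14) {
    volatile uint8_t& port = (pin < 8) ? simPORTD : simPORTB;
    const uint8_t mask = uint8_t(1u << (pin < 8 ? pin : pin - 8));
    port = val ? uint8_t(port | mask) : uint8_t(port & ~mask);
  } else {
    slowWrite(pin, val);
  }
}
```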



--- bill
« Last Edit: July 25, 2013, 05:14:30 pm by bperrybap »

Dallas, TX USA

Back on topic....

epinc,
I tend to agree with both Nick and mrburnette
that Arduino is not really the best platform for production products.
It wasn't really designed for small code; its main design goal was to be easy.
I also agree that making a change this late is very costly in terms
of time, particularly if you change processor architectures
(AVR to ARM or PIC).

The first thing I'd do is look over a real listing output
to see exactly where all the code space is going. You may
find some surprises and get lucky and be able to free up
some space that was not needed.

Without looking at a link map or listing output, it is impossible to know
how much space can be optimized and where to start looking.
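One way to get a quick per-function breakdown is GNU binutils' `avr-nm -S --size-sort` run on the sketch's .elf file (the IDE's verbose compile output shows where it is built), while `avr-objdump -d` gives the full disassembly. The sketch below is a hypothetical helper that reads nm output in that format (address, size, type, name, with address and size in hex) and lists the code symbols largest first; Symbol and codeSymbolsBySize() are made-up names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One symbol from `avr-nm -S` output, e.g. "00000118 000000a4 T digitalWrite".
struct Symbol {
  std::string name;
  char type;
  unsigned long size;  // bytes
};

// Collect the code (text section, 't' or 'T') symbols, largest first.
// Lines with fewer than four fields (e.g. undefined "U" symbols, which
// have no address or size) are skipped.
std::vector<Symbol> codeSymbolsBySize(std::istream& in) {
  std::vector<Symbol> out;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string addr, size, type, name;
    if (!(fields >> addr >> size >> type >> name)) continue;
    if (type.size() != 1 || (type[0] != 't' && type[0] != 'T')) continue;
    out.push_back({name, type[0], std::stoul(size, nullptr, 16)});
  }
  std::stable_sort(out.begin(), out.end(),
                   [](const Symbol& a, const Symbol& b) { return a.size > b.size; });
  return out;
}
```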

If you really want something bigger/faster,
then I'd take a long hard look at your desired schedule and cost goals to see
how much of a hit you can take.
If it isn't much, then either you are done and will have to rely on future
optimizations to squeeze in new features, or you are limited to other AVR
processors.

--- bill
« Last Edit: July 25, 2013, 05:15:48 pm by bperrybap »

Global Moderator
Melbourne, Australia

Quote
The second example has a loop. It has only 1 call, but it uses
an int, which is inefficient on the AVR, instead of a uint8_t.
Without looking at the assembler it is hard to say what caused the bump up.

Changing int to byte made no difference in code size. The compiler optimized the loop counter into a register.

As for the 8 bytes:

Code:
void setup ()
  {
  for (byte i = 1; i < 10; i++)
    digitalWrite (i, HIGH);
  }  // end of setup

void loop () { }

Generates:

Code:
00000102 <setup>:
 102: 1f 93        push r17
 104: 11 e0        ldi r17, 0x01 ; 1
 106: 81 2f        mov r24, r17
 108: 61 e0        ldi r22, 0x01 ; 1
 10a: 0e 94 8c 00 call 0x118 ; 0x118 <digitalWrite>
 10e: 1f 5f        subi r17, 0xFF ; 255
 110: 1a 30        cpi r17, 0x0A ; 10
 112: c9 f7        brne .-14      ; 0x106 <setup+0x4>
 114: 1f 91        pop r17
 116: 08 95        ret

00000118 <digitalWrite>:

However:

Code:
#include <digitalWriteFast.h>

void setup ()
  {
  for (byte i = 1; i < 10; i++)
    digitalWriteFast (i, HIGH);   
  }  // end of setup

void loop () { }

Generates:

Code:
00000102 <setup>:
 102: cf 93        push r28
 104: df 93        push r29
 106: c1 e0        ldi r28, 0x01 ; 1
 108: d0 e0        ldi r29, 0x00 ; 0
 10a: 8c 2f        mov r24, r28
 10c: 61 e0        ldi r22, 0x01 ; 1
 10e: 0e 94 90 00 call 0x120 ; 0x120 <digitalWrite>
 112: 21 96        adiw r28, 0x01 ; 1
 114: ca 30        cpi r28, 0x0A ; 10
 116: d1 05        cpc r29, r1
 118: c1 f7        brne .-16      ; 0x10a <setup+0x8>
 11a: df 91        pop r29
 11c: cf 91        pop r28
 11e: 08 95        ret

00000120 <digitalWrite>:

Global Moderator
Melbourne, Australia

I still don't see why you would change processors this late in the process "just in case" you want more space after they have been deployed. How would that be useful?

Dallas, TX USA

Quote
I still don't see why you would change processors this late in the process "just in case" you want more space after they have been deployed. How would that be useful?
I thought the goal was to have some space left available in flash before
the first set of boards is deployed, to ensure that there is space for a future firmware update.
It is an attempt to "futureproof" the hardware to allow for future s/w updates that have additional features.
It makes sense, as everybody likes "just in case" options,
but this late it will cost a schedule hit and increase the cost of the board,
particularly if the board has to get larger or change process types (through-hole to surface mount, etc.).

The schedule hit could be extended further if the libraries used are not already available
for the new processor.

--- bill

Anaheim CA.

A really quick solution would be to put a 644 on board and connect the common pins from the 328 to the 644. DIP, of course... When it comes time for code updates, just remove the old, smaller 328s and plug in new, pre-programmed 644s.

Doc

--> WA7EMS <--
“The solution of every problem is another problem.” -Johann Wolfgang von Goethe
I do answer technical questions PM'd to me with whatever is in my clipboard

Global Moderator
Boston area, metrowest

That's how I got started with the bigger chips:

http://forum.arduino.cc/index.php?topic=56567.0


[Attached photo: Hacked_Uno.jpg]

Designing & building electrical circuits for over 25 years. Check out the ATMega1284P based Bobuino and other '328P & '1284P creations & offerings at  www.crossroadsfencing.com/BobuinoRev17.
Arduino for Teens available at Amazon.com.

SF Bay Area (USA)

Quote
On the faster and smaller
I would tend to doubt that anyone would make much of a dent in a program that is "almost 32k" by switching to a different set of libraries.
The difference between 400 bytes and 900 bytes for a simple digitalWrite(fast) test is impressive, but the difference between 32000 and 31500 bytes is pretty meaningless...

Quote
you need to read about software/hardware development life-cycles!
The original poster wasn't very clear on how "early" a prototype their "prototype" was.  An Arduino Uno where the code barely fits, is a FINE place to do a "proof of concept" type prototype.  And they've successfully discovered that the m328 is probably not-quite big enough for their app.  Perfect. 

The "production PCB" phrase was probably ill-advised.  If you haven't picked a suitable CPU yet, you're not ready to do a "production" PCB; you're still THINKING about doing your PROTOTYPE CUSTOM PCB.  You need to figure out which CPU you're going to use first.   The best way to do that is probably to try out some of the other CPUs that are readily available.  The MEGA (ATmega2560) is the high-end (and official) AVR Arduino at the moment; your code should port to it relatively easily.  You can consider other AVR Arduino boards using the ATmega644 or ATmega1284 chips.  This will require stepping out of "official" territory, though.   And all of these run at the same speed as the Uno.

Arduino Due is the official Arduino ARM board, running significantly faster than the AVR Arduinos, if it turns out you need more speed.   There are lots of different ARM chips, but only this one is "officially supported."  Despite having a common CPU architecture, ARM chips are NOT all the same, and using a different ARM chip could mean a lot of work adjusting for differences in peripherals and libraries.

ChipKit, using the Microchip PIC32 chips, is another option that supports the Arduino libraries (more or less).  I believe that it supports a significant variety of PIC32 chips at this point, including some that are available in 28-pin DIP packages, with more memory than the 28-pin DIP AVRs.

All of those are solutions that you can try to implement very quickly and cheaply (much cheaper than making your own PCB, regardless of chip!)

Porting your code to one or more alternative chips is likely to be very educational, and you should end up with nice portable code that doesn't make you nervous about needing to port it somewhere else, eventually.  If you're lucky.

Atlanta, USA
AKA: Ray Burne

Quote
A really quick solution would be to put a 644 on board and connect the common pins from the 328 to the 644. Dip of course...

Arg... The OP has been testing with a 328P-PU.  Changing the uC would require a full complement of both hardware and software tests!  This is a schedule breaker.

I've worked in Research in a School of EE and we were so anal that we insisted on parts from the same manufacturing "lot" for some of our research projects!  It is rumored that Intel has a master plan for all of its fab facilities... same paint color, same paint manufacturer, same everything!  When changes are made, they are made on all facilities.

Reference: http://www.geek.com/chips/intels-manufacturing-prowess-skips-some-45nm-cpus-goes-straight-to-32nm-807881/
Quote
Intel uses a “copy exactly” technique for duplicating successes. When a particular development fab is able to reach a level of yield, production capacity, quality, cost structure, etc., everything about that fab is copied into the other fabs. This includes even such things as humidity, temperature, air pressure, everything. For all practical purposes, the conditions inside of one Intel fab facility are identical to those in all the others.

Anaheim CA.

@Mr Burnette: Changing anything at this point:
Quote
Arg... Op has been testing with a 328P-PU.  Changing the uC would require a full complement of both hardware and software tests!  This is a schedule breaker.
IS more or less a deal/schedule breaker. Even software optimizations still require full qualification, unless "Oops" is part of the common vocabulary.
My comment about the 644 was only half a joke. IMO it's the easiest way to get code tested with some possibility of growth later, and as pointed out, it works with the Arduino IDE.
That is certainly less work than employing a PIC or an ST Micro part... There was a Discovery kit sold for just over $17 with 256K of flash, 32K of SRAM and 8K of EEPROM. It's a 32-bit ARM Cortex-M3, a low-power device, and quite impressive for its features and price.
Much better to stay with what you have done and make what allowances for "Feature Bloat" you can at this late stage in the design.

Doc

Global Moderator
Melbourne, Australia

Quote
The difference between 400 bytes and 900 bytes for a simple digitalWrite(fast) test is impressive, but the difference between 32000 and 31500 bytes is pretty meaningless...

Unless they plan to add new features halfway through the production cycle, a spare 500 bytes is just what you need if you find a subtle bug that can be fixed with 5 lines of code (clearing a register flag, for example).