Alternatives to ATMega328 with more SRAM?

I am working on a project that is running up against the SRAM limit of the Arduino Nanos I'm prototyping with. I've already gone through various optimizations to minimize that usage, but it's clear that accomplishing my end goal will require a chip with more SRAM than the 2KB in the ATMega328; 4KB would certainly be enough for everything I want to do with it.

I'm going to be building my own boards and had been planning to use the ATMega328 (DIP package) on them. Is there an alternative chip with more RAM and the same pin layout? If not, can someone recommend a DIP-packaged alternative that's reasonably similar to the 328 in functionality?

As you're using a Nano, not really. You could try posting your code! We may have some good ideas that you've not tried, or you could look for a bigger chip on a breakout.

Mark

Try the ATmega644p or 1284p. They're a much larger package, but you get a ton of I/O, flash, and RAM for the trouble.

Agree that you should post your code first though. Someone may be able to suggest a more efficient way of doing something.

holmes4:
As you're using a Nano, not really. You could try posting your code! We may have some good ideas that you've not tried, or you could look for a bigger chip on a breakout.

The Nano is just for prototyping; I'm in the process of designing a PCB for the finished products.

The Arduino is driving a WS2801-based LED strand. I have laser-cut wood triangles with holes near each corner that the LEDs fit into, and the triangles are then connected to each other with angled connectors. The memory problem mainly comes from the need for the geometry to be stored. So for each triangle:

9B - 3x 3B: RGB value
3B - 3x 1B: index of neighbors to each side
9B - 3x3x 1B: each vertex of the triangle can have up to three neighbors
4B - Miscellaneous state info for animations
=
27B Total

So that is the primary space usage. Other stuff comes to around 400B, a lot of which isn't directly from my code but from various libraries I'm using that I'd rather not rewrite for optimization. With that, it comes out to a max of around 55 triangles altogether, and I'm ultimately looking to drive around 100 from a single board.
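
As a rough sketch of the budget described above: the struct below mirrors the listed fields, with hypothetical field names that are not taken from the project. The fields as listed add up to 25 bytes while the post quotes 27B per triangle, so the helper takes the per-triangle size as a parameter rather than assuming either figure. It ignores stack and heap headroom, which is presumably part of why the practical limit (around 55) is lower than the plain division suggests. It assumes a C++11 toolchain for `constexpr`.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical per-triangle record; names are illustrative only.
struct Triangle {
  uint8_t rgb[3][3];             // 3 LEDs x 3 bytes (R, G, B)
  uint8_t edgeNeighbor[3];       // index of the neighbor across each side
  uint8_t vertexNeighbor[3][3];  // up to three neighbors per vertex
  uint8_t state[4];              // miscellaneous animation state
};

// How many triangles fit once the fixed overhead is subtracted.
// Does not reserve anything for the stack.
constexpr size_t maxTriangles(size_t sramBytes, size_t overheadBytes,
                              size_t bytesPerTriangle) {
  return (sramBytes - overheadBytes) / bytesPerTriangle;
}
```

With the quoted figures, `maxTriangles(2048, 400, 27)` comes to 61 and `maxTriangles(4096, 400, 27)` to 136, so a 4KB part would cover the 100-triangle target with room left for the stack.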

random_vamp:
The memory problem mainly comes from the need for the geometry to be stored.

Does this data all need to change at runtime? If not, you don't need to hold it all in RAM.

Do you really need to store information about the neighbors? Or can it be calculated?

As the last two posts hinted, information that does not change while the program is running (presumably, the neighbor configuration) can be put into program memory instead of SRAM.
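
A minimal sketch of what that could look like, assuming the neighbor table is fixed at compile time. The table contents, the `NO_NEIGHBOR` marker, and the `edgeNeighbor()` accessor are made up for illustration (three triangles in a row); only `PROGMEM` and `pgm_read_byte` come from avr-libc. The `#else` branch is a host-side fallback so the same code compiles off-target.

```cpp
#include <assert.h>
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host fallback: flash and RAM share one address space off-target.
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

const uint8_t NO_NEIGHBOR = 0xFF;   // hypothetical "no triangle here" marker
const uint8_t kTriangleCount = 3;

// Hypothetical fixed geometry, stored in flash instead of SRAM.
const uint8_t edgeNeighbors[kTriangleCount][3] PROGMEM = {
  {1, NO_NEIGHBOR, NO_NEIGHBOR},
  {0, 2, NO_NEIGHBOR},
  {1, NO_NEIGHBOR, NO_NEIGHBOR},
};

// Read one table entry from flash.
uint8_t edgeNeighbor(uint8_t triangle, uint8_t side) {
  assert(triangle < kTriangleCount && side < 3);
  return pgm_read_byte(&edgeNeighbors[triangle][side]);
}
```

The RGB values and animation state still change at runtime, so they would stay in SRAM; only the constant part moves.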

Is there anything (other than the additional complexity) that would prevent you from using multiple Arduino Nano boards, daisy-chained together (perhaps in-line with the strip's serial connections)?

jremington:
As the last two posts hinted, information that does not change while the program is running (presumably, the neighbor configuration) can be put into program memory instead of SRAM.

The geometry probably could be put into program memory (I assume you mean what's referenced here - http://www.arduino.cc/en/Reference/PROGMEM), but it is constantly being accessed, so I'm hesitant to make that switch, as I believe there is a performance hit for pulling stuff out of program memory.

(I didn't post the code earlier as there's a lot of it, but you're welcome to look at it on GitHub: aphelps/Triangle-Lights, "Code for Adam's triangle lights")

cr0sh:
Is there anything (other than the additional complexity) that would prevent you from using multiple Arduino Nano boards, daisy-chained together (perhaps in-line with the strip's serial connections)?

I'm sure it would be possible, but ultimately that would cost more than just using one of the larger-RAM ATMega chips someone mentioned above. I can change my PCB to use a larger footprint easily enough.

I suspect it won't work for a variety of reasons, but the ARM-based Teensy 3.0 (Teensy USB Development Board) does use a DIP-28 package like the Nano. It has 16K of SRAM, 128K of flash memory, and 2K of EEPROM. Because it is an ARM, it is 3.3v instead of 5v. It is programmed using a variant of the 1.0.x Arduino IDE and library set. I know some people have asked about doing development on Teensys and then moving to their own chips, but I didn't bother to save the links.

I would first try moving stuff to PROGMEM, and perhaps optimizing things, for example by encoding multiple values into a single byte rather than using separate fields. Yeah, it might be slower, but you may have enough spare cycles that CPU time is not a critical factor.
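
To illustrate the single-byte encoding idea, here is a small sketch that packs two 4-bit values into one byte. Which fields would actually fit in 4 bits depends on the project; the function names are hypothetical, and `constexpr` assumes a C++11 toolchain.

```cpp
#include <stdint.h>

// Pack two values in the range 0..15 into one byte (high nibble, low nibble).
constexpr uint8_t packNibbles(uint8_t high, uint8_t low) {
  return static_cast<uint8_t>(((high & 0x0F) << 4) | (low & 0x0F));
}

// Recover the value stored in the upper four bits.
constexpr uint8_t unpackHigh(uint8_t packed) {
  return static_cast<uint8_t>(packed >> 4);
}

// Recover the value stored in the lower four bits.
constexpr uint8_t unpackLow(uint8_t packed) {
  return static_cast<uint8_t>(packed & 0x0F);
}
```

The cost is a shift and a mask on every access, in exchange for halving the storage of those two fields.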

Another option (though I agree with trying program memory first) is to put it on an SD card and only access parts of it at a time.

I have '1284P cards if you want to update your prototype with 16K SRAM, 4K EEPROM, 128K Flash.
http://www.crossroadsfencing.com/BobuinoRev17/
Onboard USB/Serial via FTDI module pictured, can be offboard with FTDI Basic/equivalent.

You could also try an Arduino Mega 2560 with a RAM expansion. There are two slightly different types:
QuadRAM 512KB - 16 x 32KB banks
SRAM expansion shield 512KB - 8 x 64KB banks

I own the second one and it works fine (but the shield does not have the R3 pinout and comes unassembled).

K5CZ:
You could also try an Arduino Mega 2560 with a RAM expansion. There are two slightly different types:
QuadRAM 512KB - 16 x 32KB banks
SRAM expansion shield 512KB - 8 x 64KB banks

I own the second one and it works fine (but the shield does not have the R3 pinout and comes unassembled).

I don't think something like this is necessary; I really only need ~4K RAM for the current implementation. However, that second one looks interesting for other purposes. Is there any particular reason those are limited to ATMegas (I mean, other than the shield-based arrangement)?

The 2560 brings out the internal address/data bus for the expansion shield.
Other chips may not.

There was a concern voiced about the speed of PROGMEM access. From what I understand, access requires one extra clock cycle per byte (62.5 nanoseconds @ 16MHz). Usually, this is not enough of a difference to matter. An extra 27 clock cycles to read 27 bytes is an extra ~1.7 microseconds if I'm calculating it right. So how tight are your timing requirements? I'm looking at the WS2801 datasheet and it appears to be a synchronous (clocked) protocol, so unless you have to maintain a high refresh rate, the WS2801 won't care if the clocks come in a little more slowly.

If you have to send a burst of data faster than PROGMEM can handle directly, you can fill a small RAM buffer from PROGMEM during a quiet period and then blast it out from RAM when the right instant arrives.
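
A minimal sketch of that staging approach, assuming the burst data is constant and lives in flash. The buffer size, the sample bytes, and the `stageBurst()` name are hypothetical; `memcpy_P` is the avr-libc call for copying from program memory into RAM, and the `#else` branch is a host-side fallback so the code compiles off-target.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host fallback: a plain memcpy stands in for the flash-to-RAM copy.
#define PROGMEM
#define memcpy_P(dest, src, n) memcpy((dest), (src), (n))
#endif

const uint8_t kBurstBytes = 9;

// Hypothetical constant data kept in flash (three RGB triples).
const uint8_t burstInFlash[kBurstBytes] PROGMEM = {
  255, 0, 0,   0, 255, 0,   0, 0, 255,
};

// Small RAM buffer that the time-critical code reads from.
uint8_t burstBuffer[kBurstBytes];

// Call during a quiet period: copy `count` bytes from flash into RAM.
void stageBurst(uint8_t count) {
  assert(count <= kBurstBytes);
  memcpy_P(burstBuffer, burstInFlash, count);
}
```

The time-critical loop then only touches `burstBuffer`, so it runs at normal RAM speed.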

Also, if you're building custom PCBs for Atmega328 chips anyway, slap a 20MHz crystal on your board. Free 25% speed boost to help counter the PROGMEM speed drag!

tylernt:
There was a concern voiced about the speed of PROGMEM access. From what I understand, access requires one extra clock cycle per byte (62.5 nanoseconds @ 16MHz). Usually, this is not enough of a difference to matter. An extra 27 clock cycles to read 27 bytes is an extra ~1.7 microseconds if I'm calculating it right. So how tight are your timing requirements? I'm looking at the WS2801 datasheet and it appears to be a synchronous (clocked) protocol, so unless you have to maintain a high refresh rate, the WS2801 won't care if the clocks come in a little more slowly.

I can't really say what the timing requirements are going to be, as I'm still building this out. At a guess, this will be good enough if it can refresh the entire LED array at ~30-50Hz (say 20ms per refresh). The worst-case scenario for an animation would require fetching the entire geometry from PROGMEM, so if your calculations are accurate that'd be around 170µs for 100 triangles. That's probably fine. And the geometry is actually only around half of those 27B, so it would be a good bit less than that.

tylernt:
Also, if you're building custom PCBs for Atmega328 chips anyway, slap a 20MHz crystal on your board. Free 25% speed boost to help counter the PROGMEM speed drag!

That's another good point. It would muck with everything involving timing, though; is there an easy way to set the clock speed used by millis()/etc? A quick Google search implies this may involve changing the bootloader, but I can probably handle that.

random_vamp:
is there an easy way to set the clock speed used by millis()/etc?

In theory it just needs you to create a new board definition with the appropriate CPU frequency. However, Nick Gammon reported recently that the arithmetic used to calculate micros() isn't very accurate unless the frequency is a power of two MHz. He calculated that at 20MHz the error was about 10%. If you can live with that bug, everything else should (!) just work, once you have defined the boards entry correctly.

tylernt:
There was a concern voiced about the speed of PROGMEM access. From what I understand, access requires one extra clock cycle per byte (62.5 nanoseconds @ 16MHz).

Reading PROGMEM is essentially done via a pointer so a minimum of two register loads is typically necessary. In my experience, reading from PROGMEM also interferes a little with optimization. For time budgeting I'd go with an 8 cycle hit per byte.
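
For time budgeting, the two per-byte estimates in this thread (1 extra cycle and 8 extra cycles) can be compared with a small helper. This is only arithmetic on the figures quoted above (27 bytes per triangle, 100 triangles, 16MHz); the function name is hypothetical and `constexpr` assumes a C++11 toolchain.

```cpp
#include <stdint.h>

// Extra time, in nanoseconds, spent reading `bytes` bytes from flash,
// given an assumed penalty of `cyclesPerByte` CPU cycles per byte.
// The result is truncated to whole nanoseconds.
constexpr uint32_t extraNanos(uint32_t bytes, uint32_t cyclesPerByte,
                              uint32_t cpuHz) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(bytes) * cyclesPerByte * 1000000000ULL) / cpuHz);
}
```

For the full geometry (2700 bytes) at 16MHz, `extraNanos(2700, 1, 16000000)` gives 168750 ns (about 0.17ms) and `extraNanos(2700, 8, 16000000)` gives 1350000 ns (1.35ms). Even the pessimistic figure is under 7% of a 20ms refresh period.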