
Topic: Fast! SPI 23LC1024 RAM-Bank+Photos+Code (Was: SPI EEPROM speeds better?)

Koepi

Further down you will find a 640 kByte RAM bank, test sketches, and a library for an SPI RAM bank with module switching.

Dear Arduino Pros :)

I'm new to Arduino and have plenty of Nanos, RTCs, LCDs, SD card shields, ... on their way to me.
I'll start with some blinking, PWM, clock and direct 4-digit 7-segment LED projects first to get to grips with Arduino and direct µC coding.

But in the long run I want to build a standalone timelapse camera (starting off from Mr_arduino's/drkblog's work), powered by a single 18650 Li-ion battery, so I can set it up and forget it until next year and end up with a nice all-seasons movie.

The issue I run into is that the ATmega328 has far too little RAM to hold a full 640x480x2 image (614,400 bytes), so I would have to write it block-wise to the SD card, which is pretty slow. (I found figures of less than 250 kByte/s in some forum posts here, and that would be under ideal conditions. For EEPROM I only found timing measurements which are a bit abstract for the given problem, and which make a comparison very hard ;) ) Thus I ordered several W25Q16BV SPI EEPROMs, SO8-to-DIP sockets and level shifters, and want to use them for fast temporary storage. Once the image is flushed into the EEPROM, I have plenty of time to slowly copy the data over to the SD card (maybe even doing some tiny post-processing like converting the RAW data into a properly written BMP).

According to the data sheet, the EEPROM even supports byte-wise writing, though I'm uncertain whether I get any speed advantage from the EEPROM at all. Since it is written block-wise too, it will have some delays every now and then while it writes out the data. I wanted to abuse it as a RAM replacement, hoping that a little jitter due to write cycles won't matter much. For one version of the camera (the one with FIFO), the data has to be fetched from the data source at a ~1 MHz clock, one byte per tick. The other version offers the data in a less time-critical way, though I have to cycle through the frame again for every block whenever I interrupt reading in order to write to the EEPROM (possibly producing "jitter" in the image, as leaves will move back and forth and so on).

I see plenty of threads dealing with SPI EEPROM, SPI SD card and so on here in the subforum, but no comparison or usage scenario which resembles mine. So I thought I'd ask you guys, who have a whole lot more experience in this arena, whether my idea is fundamentally flawed or whether it may make sense.

Thank you for your time, input and help!

groundfungus

#1
Mar 11, 2014, 02:45 pm Last Edit: Mar 11, 2014, 02:48 pm by groundfungus Reason: 1
May I suggest the Microchip 23LCV1024 SPI SRAM chips.  They come in an 8 pin DIP package, 128K bytes per chip, SPI interface, have 2.5-5.5V supply, are very fast (20MHz), no limit to writes,  and, with a 1.5-3V button cell added, non-volatile.

Koepi

#2
Mar 11, 2014, 03:23 pm Last Edit: Mar 11, 2014, 03:30 pm by Koepi Reason: 1
Thank you for your reply, groundfungus.

Do you mean using five of the 23LCV1024 SPI RAMs for a total of 640 kByte (the image uses 640 pixels x 480 pixels x 2 bytes)? Won't I run out of pins when using one SD card on SPI, five SPI RAMs and then the camera module?
I found the module here and here; the problem is, they only sell to companies. It is quite expensive, but available to end users and already in a nice DIP package here; for 5 pieces it is already 2,50€ cheaper here at Voelkner.
Much higher costs compared to SPI EEPROM; is it really worth it?
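
As a quick sanity check on the chip count asked about above, the arithmetic can be sketched in C. The function names are made up for illustration; the 131072-byte capacity is the 128K bytes per chip quoted earlier in the thread.

```c
#include <stdint.h>

/* Bytes in one uncompressed frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Number of equally sized chips needed to hold total_bytes (rounds up). */
uint32_t chips_needed(uint32_t total_bytes, uint32_t chip_bytes)
{
    return (total_bytes + chip_bytes - 1u) / chip_bytes;
}
```

For a 640 x 480 frame at 2 bytes per pixel this gives 614400 bytes, which needs five 128 KiB chips and leaves 40960 bytes unused.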

pito

#3
Mar 11, 2014, 03:35 pm Last Edit: Mar 11, 2014, 04:04 pm by pito Reason: 1
Quote
SPI EEPROM speeds better than SDcard

SPI speeds are limited by the Arduino SPI clock frequency (8 MHz with a 16 MHz main clock). Another issue is that SD cards have "write latencies" which decrease the average write speed even more.
You may try the 8MB external RAM (http://forum.arduino.cc/index.php?topic=220918.0) - the code is there, so you can work out how fast it would be with your application (you do not need the module for tests).
You may put 13 pictures on it fast and then move the pictures to the SD card (at lower speed). 1 µs per byte would be doable with the ramdisk itself (12 MB/s max theoretical speed); the ATmega code would require fine-tuning, however.
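
To make the 8 MHz limit concrete, here is a small sketch of how the AVR hardware SPI clock follows from the divider bits. The bit layout (SPR1:SPR0 in the two lowest bits of SPCR, SPI2X in bit 0 of SPSR) is taken from the ATmega328 datasheet; the function names are hypothetical, and the code only does the arithmetic on a PC rather than touching real registers.

```c
#include <stdint.h>

#define SPI2X_BIT 0u  /* double-speed bit in SPSR */

/* SCK divider selected by SPR1:SPR0 (SPCR bits 1..0) and SPI2X (SPSR bit 0). */
uint16_t spi_divider(uint8_t spcr, uint8_t spsr)
{
    static const uint16_t base[4] = { 4, 16, 64, 128 };
    uint16_t div = base[spcr & 0x03u];
    if (spsr & (1u << SPI2X_BIT)) {
        div /= 2u;  /* SPI2X halves the divider */
    }
    return div;
}

/* Resulting SCK frequency for a given system clock. */
uint32_t spi_clock_hz(uint32_t f_cpu_hz, uint8_t spcr, uint8_t spsr)
{
    return f_cpu_hz / spi_divider(spcr, spsr);
}

/* Upper bound on payload rate: one byte per 8 SCK cycles, no software overhead. */
uint32_t spi_max_bytes_per_s(uint32_t sck_hz)
{
    return sck_hz / 8u;
}
```

With a 16 MHz clock the fastest setting is a divider of 2, i.e. 8 MHz SCK and at most 1,000,000 bytes per second, which is where the "1 µs per byte" figure sits before any loop overhead is added.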

fat16lib

Writing flash takes lots of time.  Your EEPROM part takes 0.7 ms typical and 3 ms max to program a 256-byte page.  You need to program 2400 pages, so the transfer from the camera to RAM, the transfer from RAM to the EEPROM buffer and programming the EEPROM will take a number of seconds at the 8 MHz SPI speed.
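
The page count and the programming time alone can be worked out from the figures just quoted. This is only a sketch with made-up function names; it ignores the SPI transfers and any erase time.

```c
#include <stdint.h>

/* Number of pages needed for total_bytes (rounds up). */
uint32_t pages_needed(uint32_t total_bytes, uint32_t page_bytes)
{
    return (total_bytes + page_bytes - 1u) / page_bytes;
}

/* Total page-programming time in microseconds. */
uint32_t program_time_us(uint32_t pages, uint32_t page_time_us)
{
    return pages * page_time_us;
}
```

For a 614400-byte frame and 256-byte pages this is 2400 pages, i.e. 1.68 s at the typical 0.7 ms per page and 7.2 s at the 3 ms maximum.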

An SD card will also take a long time, maybe a little faster than the EEPROM.  The SD card can have occasional longer delays while wear leveling and flash erase operations occur.  The speeds posted for SD cards are for the SD library in single block mode which is slow and the SD.h library uses 4 MHz SPI.  The SdFat library uses 8 MHz SPI and is a little faster.

You can use a 3 to 8 decoder with the 23LCV1024 and reduce chip select to three pins.  You will still need nearly two seconds since SPI is programmed I/O so you never achieve the 8 MHz SPI rate.

SRAM chips are probably the best answer on a 328 Arduino.  With more pins you could use pito's Ramdisk http://forum.arduino.cc/index.php?topic=220918.msg1606560#msg1606560.

pito

#5
Mar 11, 2014, 04:12 pm Last Edit: Mar 11, 2014, 04:41 pm by pito Reason: 1
You may try to feed the data from the camera (8bit) directly to the ramdisk, without passing it through the atmega. You have to generate /WR pulses, the ramdisk has got an address auto-increment feature. So you set the starting address (see the function loadadr() ), and then you will start to generate /WR pulses (active low pulses, 1us period, ~100ns duration for example), and the ramdisk will load in byte-wise everything it sees on the data bus (the data capture at /WR rising edge) while clocking with /WR pulses.. It works as a standard RAM with 8bit data bus, but no address lines are needed, fortunately..  :)

Koepi

#6
Mar 11, 2014, 07:00 pm Last Edit: Mar 12, 2014, 08:13 am by Koepi Reason: 1
A search for that module came up with a source in Slovakia, 20€ + >20€ shipment. That's by far too much for this kind of "cheap'o project". Though I really like that RAM - it's just beyond my price point. :)

It's good to know that my EEPROM idea seems to be pure BS. With a fast SD card, I get the same or better timings. Thus I'll try to get hold of one or a few 23LCV1024 so I can read out that memory in as few portions as possible.

Edit: Though I thought I would get a more sustainable data rate, since after copying the image over to the SD card (1 image per five minutes / per hour / per day, no real-time stuff) I could send the EEPROM the clear command, which erases it completely and thus by definition offers clean pages for fast writing. I don't think I can do that with an SD card, which is why my idea was to use a widely controllable EEPROM in the first place. I don't care about wearing out the memory, as writing 100 000 times (or with erase counted, 50 000 times) to it will take several years.

Thanks for your help, it really helps me understanding ATmega better! :)


Koepi

#7
Mar 12, 2014, 07:13 pm Last Edit: Mar 27, 2014, 02:46 pm by Koepi Reason: 1
I'm building an old-school RAM bank now. Parts needed: 5 x 23LC1024 SPI RAM, 1 x 74HC138 3-to-8 decoder/demultiplexer (to save pins on the Arduino).
640 kByte cost less than 20€, including shipping. Each 74HC138 can manage up to 8 RAM modules, so 1 MByte of RAM is possible without cascading. It should be possible to switch between the modules in so little time that a continuous data stream of 1 MByte/s can be written. By cascading the 20-euro-cent 74HC138, even more RAM can be used.

My EEPROMs are useless now, but I hope to have found the solution to my problem :)

/* NB: The wiring of the 74HC138 was completely new to me. Pin 8 to GND, pin 16 to Vcc, Y0-Y7 to SS/CS on the RAMs. A0, A1, A2 to GPIOs on the Arduino for selecting the RAM module. But I think E1 and E2 need to go to GND and E3 to Vcc so that the chip becomes active at all (E1, E2, E3 are for cascading). */
/* Attached a Fritzing sketch. As there is no 23LC1024/SPI part, I took a 24LC1025/I2C as the symbol. I think I will stack the chips and bend away the pins I need to wire individually (CS and HOLD, I think). */

/* The sketches were useless. A friend was so kind as to shake a nice building plan out of his CAD program which made real sense. A hundred-plus solder points later I finally have a RAM bank. Looks good so far, connecting a Li-ion battery to GND/+ doesn't fry the chips. Now only an Arduino has to arrive, finally, so I can at least test the RAM. :) It could look nicer, as I'm really bad at manual work like soldering. */
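
The decoder wiring described in this post can be checked against a tiny software model of the 74HC138 function table. This is only a sketch: the function name is invented, and it models the logic levels, not the timing.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Model of a 74HC138 3-to-8 line decoder.
 * addr: A0..A2 as bits 0..2.
 * e1_n, e2_n: level on the two active-low enable inputs.
 * e3: level on the active-high enable input.
 * Returns Y0..Y7 as bits 0..7. Outputs are active low, so "nothing
 * selected" is 0xFF and a selected chip sees a 0 on its CS line.
 */
uint8_t hc138_outputs(uint8_t addr, bool e1_n, bool e2_n, bool e3)
{
    bool enabled = !e1_n && !e2_n && e3;
    if (!enabled) {
        return 0xFF;
    }
    return (uint8_t)~(1u << (addr & 0x07u));
}
```

With E1 and E2 tied to GND and E3 to Vcc, as suggested above, exactly one output is low at any time. In a five-chip bank the addresses 5 to 7 point at unconnected outputs, so selecting one of them is one possible way to release all RAM chips between transfers; whether the attached library does this is not stated in the thread.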

Koepi

#8
Mar 19, 2014, 11:19 pm Last Edit: Mar 30, 2014, 09:13 pm by Koepi Reason: 1
Dear all,

here the solution I successfully tested! (Today finally my first Arduino arrived :D )
During debugging it seems to be really slow (~8kByte/sec when writing and reading afterwards, possibly due to much chip-select-overhead and the serial console).

I took sources I found in the forum for SPI RAM, and more specifically for the 23LC1024 RAM. I adapted the code to work with my RAM bank and the pin-saving 74HC138 NXP multiplex chip. Works alright, but slow.

Find attached the modified library and my test sketch.

Edit 20.03.2014 08:00h: Some cleanup helps improve the speed. The first test sketch switched between read and write mode on every byte, slowing the code down to a crawl (at 4 MHz SPI: 8 kByte read+write per second, 80 seconds runtime for 640 kByte).
I split the test so that it first writes the whole bank using 256-byte buffers and then reads the buffers back and compares them byte by byte. This leads to less than 6 seconds runtime for writing and reading 640 kByte (at 8 MHz SPI). So I'm getting close to 256 kByte/s for read or write, even while doing something like a compare. :)
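
Writing the bank buffer by buffer implies translating a position in the 640 kByte bank into a chip index plus an address inside that chip. A minimal sketch of that mapping follows; the type and function names are hypothetical and not taken from the attached library.

```c
#include <stdint.h>

#define CHIP_BYTES 131072UL  /* one 23LC1024 holds 128 KiB */

typedef struct {
    uint8_t  module;  /* index of the chip in the bank */
    uint32_t offset;  /* address inside that chip */
} bank_addr_t;

/* Splits a linear bank address into chip index and in-chip address. */
bank_addr_t bank_split(uint32_t linear)
{
    bank_addr_t a;
    a.module = (uint8_t)(linear / CHIP_BYTES);
    a.offset = (uint32_t)(linear % CHIP_BYTES);
    return a;
}

/* Largest chunk, up to want bytes, that starts at linear and stays in one chip. */
uint32_t bank_chunk(uint32_t linear, uint32_t want)
{
    uint32_t room = (uint32_t)(CHIP_BYTES - (linear % CHIP_BYTES));
    return want < room ? want : room;
}
```

Because 256 divides 131072 evenly, 256-byte buffers that start at multiples of 256 never straddle two chips; bank_chunk() only matters for unaligned transfers.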

Output of the fast test sketch (now updated with timer output):
Code:
Ram Tests Begin. Milliseconds: 0
Writing buffers to Module 0: ......... Done. 131072 Bytes written. Milliseconds: 375
Writing buffers to Module 1: ......... Done. 131072 Bytes written. Milliseconds: 763
Writing buffers to Module 2: ......... Done. 131072 Bytes written. Milliseconds: 1152
Writing buffers to Module 3: ......... Done. 131072 Bytes written. Milliseconds: 1541
Writing buffers to Module 4: ......... Done. 131072 Bytes written. Milliseconds: 1929
Reading buffers from Module 0: ......... Done. 131072 Bytes read. Milliseconds: 2445
Reading buffers from Module 1: ......... Done. 131072 Bytes read. Milliseconds: 2957
Reading buffers from Module 2: ......... Done. 131072 Bytes read. Milliseconds: 3469
Reading buffers from Module 3: ......... Done. 131072 Bytes read. Milliseconds: 3981
Reading buffers from Module 4: ......... Done. 131072 Bytes read. Milliseconds: 4493
RAM-module 0 is  OK!
RAM-module 1 is  OK!
RAM-module 2 is  OK!
RAM-module 3 is  OK!
RAM-module 4 is  OK!
Ram Tests Finished. Milliseconds: 4623



The attached files on Post #9 are:
SpiRAMBank-lib.zip: The modified library. MUX1-3 in the #defines of SpiRAM.h should be adapted to the pins where you connect the A0-A2 pins of your 74HC138 for selecting the proper chip.
SpiRAM_Test.zip: The cleaned version of the test sketch, writing and reading byte-for-byte (8 MHz SPI).

Koepi

Now the final solution; no one seems to have read the sources ;) The function table of the 74HC138 was correct - just my assignment of the MUX1-3 pins in the switch statement was plain wrong. I had initially attached the cables to the ATmega2560 the wrong way round, which is why that code worked. Now that A0-A2 are wired in the correct order, the code had to be corrected as well. Everything works again. :D
Attached is the corrected and slightly cleaned-up code, which works on the ATmega328 as well as on the ATmega2560.


The problem turned out to be real: on the ATmega2560, too, chip 4 is detected as defective. I have to rebuild the RAM bank properly. Sorry for bothering you with that.

Weird problem: All works great on the ATmega2560.
Since my first Nanos arrived, I wanted to shrink down everything.
First I used pins A0-A2 for interfacing the multiplex 74HC138.
It seems everything is working right, but memory chip 4 is reported as defective. Looking at the returned buffers, it spits out only random 0 or 255. This is unchanged with DIV128 on SPI (125 kHz at a 16 MHz clock).
I checked all pins and their connections to the bus interfaces electrically, after resoldering every pin just to be sure. I also switched to pins D2-D4 for the multiplexer, still with the same result: chip 4 defective.
So I switched from Arduino 1.0.5 to 1.5.6r2 and edited platform.txt to remove the compiler optimizations. Still no change.
Then I removed the second buffer in my sources and read back into the write buffer, comparing the values directly with the loop counter. I only check one module at a time (which means changing the chip value and recompiling/re-uploading the sketch) instead of all in a row.
Now it works, every single chip shows up OK. Going back to analog pins A0-A2 shows module 4 as broken again. On D2 to D4 everything works as expected, even when I switch compiler optimizations on again. Still, when running through a loop that checks all 5 chips, module 4 is shown as defective.

Can anybody suggest what is going on there, apart from my bad coding style? I fail to see where a stack overrun / memory overflow of the SRAM could happen in this simple code. 2k SRAM should be enough for these few variables and the 256-byte buffer.

Edit: Ok, weird. Removing the check-calls from setup() and looping through the checks once in the main loop solves this issue. Find a Nano version of the ram_test attached.
Edit2: Weird: after trying A0-A2 again, only chip 5 showed up OK; back on D2-D4, it is chip 4 again which gets reported as defective. I'm lost.

Koepi

I took some vacation and thus found some time to do a little more "sophisticated" coding. I dug up everything I could find about AVR hardware SPI and the 23LC1024 data sheet, also read up about direct port manipulation and some more "plain C" instead of Arduino coding.

First I replaced all digitalWrite()s with port manipulation. This gave a little speedup of a few milliseconds.
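
For the three decoder select lines, replacing digitalWrite() with port manipulation comes down to a masked read-modify-write on one port register. A minimal sketch, assuming the select lines sit on three adjacent bits of one port starting at bit 2 (as PD2-PD4 would on an ATmega328); the macro and function names are hypothetical, and the author's actual code is in the attachment. The logic is kept as a pure function so it can be tried on a PC.

```c
#include <stdint.h>

#define MUX_SHIFT 2u                    /* assumption: decoder A0 on port bit 2 */
#define MUX_MASK  (0x07u << MUX_SHIFT)  /* three adjacent select bits */

/* Returns the new port value: select bits replaced, all other bits untouched. */
uint8_t mux_apply(uint8_t port_value, uint8_t module)
{
    uint8_t keep   = (uint8_t)(port_value & (uint8_t)~MUX_MASK);
    uint8_t select = (uint8_t)((uint8_t)(module << MUX_SHIFT) & MUX_MASK);
    return (uint8_t)(keep | select);
}
```

On the AVR this would be used as something like PORTD = mux_apply(PORTD, module); which updates all three lines in a single write instead of three separate digitalWrite() calls.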

Now I wanted to get rid of the Arduino SPI library and the demo SPI_Ram library from this subforum. I handcrafted the SPI interface in plain C and added plenty of comments for learning purposes, in case someone is interested in using faster SPI (yes, it's 8 MHz already). Find the standalone sketch attached to this post. Any feedback is welcome.
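
Independent of which SPI code is used, every 23LC1024 access starts with a one-byte instruction followed by a 24-bit address, most significant byte first. The sketch below builds that four-byte header; the READ (0x03) and WRITE (0x02) opcodes are from the Microchip datasheet, while the function names are invented here and this is not the code from the attachment.

```c
#include <stdint.h>

#define SRAM_CMD_WRITE 0x02u  /* WRITE instruction, per the 23LC1024 datasheet */
#define SRAM_CMD_READ  0x03u  /* READ instruction, per the 23LC1024 datasheet */

/* Packs instruction and address into one word; the top byte is sent first. */
uint32_t sram_header(uint8_t cmd, uint32_t addr)
{
    /* A 128 KiB part only decodes the lowest 17 address bits. */
    return ((uint32_t)cmd << 24) | (uint32_t)(addr & 0x1FFFFUL);
}

/* Byte number index (0..3) of the header in transmission order. */
uint8_t sram_header_byte(uint32_t header, uint8_t index)
{
    return (uint8_t)(header >> (8u * (3u - index)));
}
```

After the four header bytes, the data bytes follow while chip select stays low, so the header cost is paid once per buffer rather than once per byte.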

Code:
SPI Ram Tests Begin. Milliseconds: 0
Writing buffers to Module 0: ......... Done. 128 kBytes written. ms: 284 -> 461 kByte/s.
Writing buffers to Module 1: ......... Done. 128 kBytes written. ms: 285 -> 459 kByte/s.
Writing buffers to Module 2: ......... Done. 128 kBytes written. ms: 285 -> 459 kByte/s.
Writing buffers to Module 3: ......... Done. 128 kBytes written. ms: 285 -> 459 kByte/s.
Writing buffers to Module 4: ......... Done. 128 kBytes written. ms: 286 -> 458 kByte/s.
Reading buffers from Module 0: ......... Done. 128 kBytes read and compared. ms: 310 -> 422 kByte/s.
Reading buffers from Module 1: ......... Done. 128 kBytes read and compared. ms: 311 -> 421 kByte/s.
Reading buffers from Module 2: ......... Done. 128 kBytes read and compared. ms: 311 -> 421 kByte/s.
Reading buffers from Module 3: ......... Done. 128 kBytes read and compared. ms: 311 -> 421 kByte/s.
Reading buffers from Module 4: ......... Done. 128 kBytes read and compared. ms: 311 -> 421 kByte/s.
RAM-module 0 is  OK!
RAM-module 1 is  OK!
RAM-module 2 is  OK!
RAM-module 3 is  OK!
RAM-module 4 is  OK!
Ram Tests Finished. Milliseconds: 2996


As you can see, the improvement is bigger than 25%: the run took more than 4 seconds with the Arduino libraries and now takes less than 3 seconds. I hope I can read out my camera module fast enough with that, taking into consideration that serial debug output slows down the AVR a lot.
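
The numbers in the two logs can be reproduced with a few lines of integer arithmetic. This is a sketch with made-up function names; it assumes the kByte/s column is simply bytes divided by milliseconds, truncated, which matches the printed values.

```c
#include <stdint.h>

/* Throughput in kByte/s (1 kByte = 1000 bytes), truncated. ms must be non-zero. */
uint32_t kbyte_per_s(uint32_t bytes, uint32_t ms)
{
    return bytes / ms;
}

/* Runtime reduction in tenths of a percent, truncated. */
uint32_t reduction_permille(uint32_t before_ms, uint32_t after_ms)
{
    return (before_ms - after_ms) * 1000u / before_ms;
}
```

131072 bytes in 284 ms gives 461 kByte/s, and going from 4623 ms to 2996 ms total runtime is a reduction of about 35%.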

Koepi

Even if I might be talking to myself, I want to give some insight into the progress made so far.

As I had already removed plenty of Arduino stuff from the sources, I went on to real plain C coding with avr-gcc, Eclipse, the AVR plug-in for Eclipse, ... on Mac OS X. As Arduino is mostly a wrapper around avr-libc functions, I hoped for more speed improvements, and to learn something along the way. (Also, I ordered an STM32F0 Discovery board just in case I fail with the OV7670 on Arduino ;) ). So I'm all set up for switching between platforms already.

Loading the code above into the plain C environment resulted in plenty of errors and warnings. Some things to consider:
- Arduino offers timing functions like millis(). In plain C, you have to code that yourself via timer and interrupt functions. Plenty of samples are available on the net, but samples for the ATmega328 are rare and/or not the best solution for the problem in my eyes (using the 16-bit timer1 is like using a whole plane for transporting a single letter).
- Serial console output. You need something like HyperTerminal (goSerial works quite nicely on Mac OS X; set the line length to 120 chars in the options). Needless to say, you need to code that yourself as well - there is no Serial.begin() and so on.
- String output is different. Either use a uart_puts() function or redirect stdout and use the mighty (s)printf.
- Find the console port yourself and enter it into the GUI. (ls /dev/cu.* helps with that ;) )
- Variable types are proper C now. There is no boolean, it's bool.
- Compiler optimizations set for the project aren't necessarily applied to your own source files. I was wondering about the bad speed (>7 s) of my RAM bank sample and finally found out that -O0 was set for main.cpp. -Os (size optimization) led to 5.2 seconds runtime, -O2 and -O3 both to 3.029 seconds.
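
For the first point in the list, a home-made millis() usually boils down to a timer in compare-match (CTC) mode that fires once per millisecond. The sketch below only computes the compare value, so it can be checked on a PC; the function name is hypothetical, and the 16 MHz clock and prescaler values in the usage note are assumptions rather than settings taken from the attached code.

```c
#include <stdint.h>

#define CTC_NO_EXACT_VALUE 0xFFFFFFFFu

/*
 * Compare value for a CTC-mode timer that should fire tick_hz times per
 * second. Returns CTC_NO_EXACT_VALUE if the division is not exact, so the
 * caller can try another prescaler. prescaler and tick_hz must be non-zero.
 */
uint32_t ctc_compare_value(uint32_t f_cpu_hz, uint32_t prescaler, uint32_t tick_hz)
{
    uint32_t counts_per_s = f_cpu_hz / prescaler;
    if (f_cpu_hz % prescaler != 0u || counts_per_s % tick_hz != 0u) {
        return CTC_NO_EXACT_VALUE;
    }
    return counts_per_s / tick_hz - 1u;
}
```

At 16 MHz with a prescaler of 64, a 1 kHz tick needs a compare value of 249, which fits into an 8-bit timer, so the 16-bit timer1 can stay free. With a prescaler of 8 the value would be 1999, which needs the 16-bit timer.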

Code:
SPI Ram Tests Begin. Milliseconds: 0
Writing to Module 0: ......... Done. 128 kBytes written. ms: 287 -> 456 kByte/s.
Writing to Module 1: ......... Done. 128 kBytes written. ms: 288 -> 455 kByte/s.
Writing to Module 2: ......... Done. 128 kBytes written. ms: 287 -> 456 kByte/s.
Writing to Module 3: ......... Done. 128 kBytes written. ms: 287 -> 456 kByte/s.
Writing to Module 4: ......... Done. 128 kBytes written. ms: 287 -> 456 kByte/s.
Reading from Module 0: ......... Done. 128 kBytes read and compared. ms: 313 -> 418 kByte/s.
Reading from Module 1: ......... Done. 128 kBytes read and compared. ms: 313 -> 418 kByte/s.
Reading from Module 2: ......... Done. 128 kBytes read and compared. ms: 313 -> 418 kByte/s.
Reading from Module 3: ......... Done. 128 kBytes read and compared. ms: 313 -> 418 kByte/s.
Reading from Module 4: ......... Done. 128 kBytes read and compared. ms: 313 -> 418 kByte/s.
RAM-module 0 is  OK!
RAM-module 1 is  OK!
RAM-module 2 is  OK!
RAM-module 3 is  OK!
RAM-module 4 is  OK!
Ram Tests Finished. Milliseconds: 3029


And there is the problem - it's a tiny bit slower now (2.996 seconds with the optimized Arduino version). The reason for this may be my less-than-optimal serial communication, amongst other things.

Anyhow, this might be useful for others digging into the same stuff, so I'm posting the code here.

Koepi

The STM32F0 was a bit disappointing: a new architecture for me as a beginner, with a steep learning curve. But as I had already switched to plain C without the Arduino libraries on the little Arduinos, the first hard part, setting up a proper build environment, was already solved. After plenty of hours I managed to get SPI to work on it. I ran into an issue with 8-bit versus 16-bit SPI transfers. Once 8-bit data was sent out properly, the RAM bank was accessible!

Code:
Ram-Tests begin. HCLK is 48 MHz.
........Writing RAM chip 0 in 665 ms -> 197 kByte/s.
........Writing RAM chip 1 in 666 ms -> 196 kByte/s.
........Writing RAM chip 2 in 665 ms -> 197 kByte/s.
........Writing RAM chip 3 in 666 ms -> 196 kByte/s.
........Writing RAM chip 4 in 665 ms -> 197 kByte/s.
........Reading and comparing RAM chip 0 in 690 ms -> 189 kByte/s. OK!
........Reading and comparing RAM chip 1 in 690 ms -> 189 kByte/s. OK!
........Reading and comparing RAM chip 2 in 690 ms -> 189 kByte/s. OK!
........Reading and comparing RAM chip 3 in 690 ms -> 189 kByte/s. OK!
........Reading and comparing RAM chip 4 in 690 ms -> 189 kByte/s. OK!


Not quite the speeds that I expected, even though SPI is running at 18 MHz. I also ordered very cheap STM32F103 development boards (like these: http://www.ebay.de/itm/NEW-ARM-Cortex-M3-STM32F103C8T6-STM32-Minimum-System-Development-Board-/371036921289). For less than 6 euros it is the ideal centerpiece for my time-lapse camera; the size is impressive too - it's just twice as wide as the Arduino Nano, but offers plenty of IO pins. This one was a bit harder to get running, as there is no documentation at all for the board. It needs to be flashed by setting the Boot0 pin to the second position and booting that way; access then happens via UART / serial console. Also, compared to its smaller brother STM32F0, the register/GPIO access is slightly different, so the code had to be adapted. First result ...

Code:
Ram-Tests begin. HCLK is 72 MHz, PCLK2 72 MHz, System 72 MHz.
........Writing RAM chip 0 in 257 ms -> 510 kByte/s.
........Writing RAM chip 1 in 258 ms -> 508 kByte/s.
........Writing RAM chip 2 in 258 ms -> 508 kByte/s.
........Writing RAM chip 3 in 258 ms -> 508 kByte/s.
........Writing RAM chip 4 in 258 ms -> 508 kByte/s.
........Reading and comparing RAM chip 0 in 274 ms -> 478 kByte/s. OK!
........Reading and comparing RAM chip 1 in 274 ms -> 478 kByte/s. OK!
........Reading and comparing RAM chip 2 in 274 ms -> 478 kByte/s. OK!
........Reading and comparing RAM chip 3 in 275 ms -> 476 kByte/s. OK!
........Reading and comparing RAM chip 4 in 274 ms -> 478 kByte/s. OK!


Finally! A little faster than on the Arduino Nano/Pro Mini/MEGA2560!
In the meantime some more material arrived: proper pin headers for PCBs and, a few days ago, a Victor VC97 - it has a continuity tester with buzzer and a frequency meter up to several MHz, and is quite precise compared to the 5-€ cheapo multimeter I used before. So I properly resoldered the RAM bank to get rid of those thick cables, and used enameled copper wire for the CS lines from the mux. Maybe my cables are now too long for SPI at the highest speed (18 MHz). Anyhow, the RAM bank is now quite clean and also has some marks showing where to put which pin. (I think I'll need a sixth and maybe seventh connector on the RAM bank PCB for attaching more SPI devices - in the end I want to write to an SD card, so that would be helpful.)

In the project settings I found that I wasn't using any compiler optimizations yet. Changed to medium optimization (-O2 instead of -O0) ...

Code:
Ram-Tests begin. HCLK is 72 MHz, PCLK2 72 MHz, System 72 MHz.
........Writing RAM chip 0 in 137 ms -> 956 kByte/s.
........Writing RAM chip 1 in 137 ms -> 956 kByte/s.
........Writing RAM chip 2 in 138 ms -> 949 kByte/s.
........Writing RAM chip 3 in 137 ms -> 956 kByte/s.
........Writing RAM chip 4 in 137 ms -> 956 kByte/s.
........Reading and comparing RAM chip 0 in 151 ms -> 868 kByte/s. OK!
........Reading and comparing RAM chip 1 in 150 ms -> 873 kByte/s. OK!
........Reading and comparing RAM chip 2 in 150 ms -> 873 kByte/s. OK!
........Reading and comparing RAM chip 3 in 151 ms -> 868 kByte/s. OK!
........Reading and comparing RAM chip 4 in 150 ms -> 873 kByte/s. OK!


That looks great!

Now I'm starting with the camera module. Wiring that bugger up is messy work :) I already supply the XCLK pin with 9 MHz (easy to achieve on STM32). The module is alive and drives VSYNC, HREF and PCLK. I already had a first success sending an I2C "Hello" and getting an acknowledgement, but I'm currently stuck there; further responses don't happen yet.

This post is meant as inspiration: changing platform for a project isn't expensive, you just need to learn more (which is why I started µC coding in the first place). :)
