SSD1306 OLED slow

I recently bought an SSD1306 128x64 pixel OLED from eBay, and after using it for a little while I have discovered it is a bit slow.

The way it seems to work, it has no onboard support for characters and columns/rows; instead, when you want to display a character you need to draw it pixel by pixel. I have timed this, and the "drawing" operation for around 30 characters (1.5 lines) takes about 12ms, whereas the displaying operation takes another 40ms, for a total of about 50ms.

This is a huge amount of time for simply trying to display a line of text.

This may be because this is a pixel display which can display much more than simple characters.

A partial solution would be to dedicate an Atmega328p IC to this display, thereby freeing the Arduino for other things. Or I can dedicate an Arduino Pro Mini or Nano to this display; eBay prices are almost as cheap as the raw Atmega328 chip!

But there must be an OLED display that has some GPU on board. Are you aware of any?

How are you connected, SPI or I2C? Software SPI?
What library code are you using?

With hardware SPI and an OLED text-only library it takes 1.2ms to write 30 characters.

It is an SSD1306 Monochrome 0.96" OLED display with 128x64 pixels : it is a copy of Adafruit's one as sold by the hundreds on ebay.

The Arduino is a Nano at 5V with Atmega328p at 16MHz.

The module uses I2C to communicate with the Arduino. On the Arduino module it uses pins A4 (SDA) and A5 (SCL).

The Arduino code is hacked out from the Adafruit libraries, namely the Adafruit_SSD1306 and the Adafruit_GFX. Those libraries use the Wire library to talk to the module over the I2C bus. The Wire library appears to have no communication speed functions.

I have discovered a globally defined TWBR variable (I suppose it stands for "Two Wire Bit Rate", but I am not sure). I have set it to 0 and it seems to have shaved a few ms off.

It would appear, as I said earlier, that the SSD1306 module has no onboard buffer for the screen. The buffer is simply 128 x 64 pixels and it is an area of memory that resides on the Arduino module.

When you want to display something on the screen, first you modify your own local buffer, then you send the whole buffer, all the screen bits, rather than sending just the bits that have changed.

The Adafruit library does it this way, and I presume they would have done it differently if there were a better way.

My timings are:

Write a full line of text : 6.5 ms. Remember this simply manipulates the local buffers. Printing a single character means plotting the right pixels (typically 6 x 8).

Display the screen buffer : 28.5 ms. This is a dumb function, it simply sends the whole buffer memory across the I2C and does not know or care what was changed. But maybe it cannot be done in another way.

To recap, it would take 6.5 + 28.5 = 35ms to simply write a single line of text on the screen. And during those 35ms the Arduino CPU is busy.
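As a rough cross-check of the display time, here is a minimal sketch that estimates how long the raw I2C traffic takes. The function name, the 16 data bytes per transaction, and the 400 kHz clock are assumptions for illustration, not values taken from the library; start/stop conditions and software overhead are ignored, so the result is only a lower bound.

```cpp
#include <cstddef>

// Rough lower bound on the time (in ms) needed to push payloadBytes over I2C.
// Assumptions: every transaction carries one address byte and one control
// byte in addition to its data, and every byte costs 9 clock cycles
// (8 data bits + ACK). Start/stop conditions and CPU overhead are ignored.
double i2cTransferMs(std::size_t payloadBytes,
                     std::size_t dataPerTransaction,
                     double sclHz) {
    // Number of transactions, rounded up.
    std::size_t transactions =
        (payloadBytes + dataPerTransaction - 1) / dataPerTransaction;
    // Payload plus two overhead bytes per transaction.
    std::size_t bytesOnWire = payloadBytes + 2 * transactions;
    double bits = 9.0 * static_cast<double>(bytesOnWire);
    return bits / sclHz * 1000.0;
}
```

A full frame is 128 x 64 / 8 = 1024 bytes. Under these assumptions `i2cTransferMs(1024, 16, 400000.0)` evaluates to 25.92, which suggests that most of the display time is the bus transfer itself rather than CPU work.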

The code spends

I have the same display: 0.96" monochrome 128x64 OLED with SSD1306 controller.

The 1306 does have an on board memory buffer but it does not allow reading of the buffer when the interface is I2C or SPI; only with a parallel interface can you read the on board memory.

If you are only writing text to the display then there is a more efficient approach than the Adafruit library. Instead of setting one pixel at a time in RAM and then sending an entire 1K buffer to the display it is possible to write 8 pixels at a time directly to the display itself. Thus in 5 one-byte writes to the display you can transmit a character. This is the approach of the text-only library I am using. It does not employ a 1K RAM buffer.
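To illustrate the "8 pixels at a time" idea, here is a small host-side sketch. It assumes a typical 5x7 font layout where each glyph is stored as 5 bytes, one per pixel column, with the least significant bit at the top; the glyph data and the helper name `glyphRow` are illustrative and not taken from the text-only library.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Illustrative glyph for 'A' in a typical 5x7 font. Each byte is one vertical
// strip of 8 pixels, least significant bit at the top (an assumption about
// the font layout).
constexpr std::array<std::uint8_t, 5> kGlyphA = {0x7C, 0x12, 0x11, 0x12, 0x7C};

// Render one pixel row (0 = top) of a glyph as text: '#' lit, '.' dark.
std::string glyphRow(const std::array<std::uint8_t, 5>& glyph, int row) {
    std::string out;
    for (std::uint8_t column : glyph) {
        out += ((column >> row) & 1) ? '#' : '.';
    }
    return out;
}
```

Calling `glyphRow(kGlyphA, row)` for rows 0 to 7 draws the letter line by line. On the display, the same 5 bytes written to consecutive columns of one 8-pixel-high row produce the character without touching individual pixels.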

The time I quoted for writing 30 characters (1.2ms) was using an SPI interface. I2C is slower. I did not test writing 30 characters in I2C but I would guess that it would be about 10X slower, or a little over 10ms to write 30 characters using I2C.

That sounds like exactly what I need - what is the "text only library" you are using ?

Here it is.

I2C 12Nov with examples.zip (12.6 KB)

Thank you. I have just tested it and it seems that a "clear" takes 40ms and a full line of text also takes 40ms.

I am not sure how the I2C works - is there a way to set up the speed? I did find a global variable "TWBR" but am not sure how to use it.

OK: changing everywhere TWBR to 0 (from 12 or from unchanged) resulted in "clear" in 20ms and full line of text in 10ms.

Much better now. Next step to either understand what the TWBR does or to get an SPI OLED!

TWBR sets the clock rate for the I2C. It should be 400 KHz for the 1306, which is TWBR = 12.

Thanks for pointing out the error with TWBR. It must have already been 12 when I tested it.

The SPI branch of the code has a similar fault in that it only sets the clock and mode at initialization instead of with each access. This can cause a problem if another SPI device sets the clock or mode to something different.

I am not sure that TWBR = 12 gives the best speed. I cannot be bothered to get the oscilloscope out or read the manuals right now, but I have tested with values from 0 onwards and it seems 0 is the quickest and 12 is the slowest.

I have modified my code, in case you are interested, so that it does not use the reset pin, as it does not exist on the I2C versions of the OLED display and the code blindly assumes that a digital pin of the Arduino will always be tied to some Reset pin. I also added a class variable _twbr which is passed on during construction to select different TWBRs.

The SPI library allows you to select the comms speed. Maybe because it is 1-on-1. The TWI / I2C as I understand it would allow a chain of devices on the same bus, thus the speed is selected by whom? The slowest device on the bus maybe?

TWBR set to 0 is faster but it exceeds the max specification of 400 KHz for the 1306. If it works and you're happy that's great but know that you are overclocking the board. That's assuming your prescaler is set to 1.

From the Atmel data sheet:

SCL = CPU_clock / (16 + 2 * TWBR * prescaler)

The prescaler is set in another CPU register, TWSR.
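To put numbers on that, here is a minimal sketch that reads the data sheet formula as CPU clock divided by (16 + 2 x TWBR x prescaler). The function name `sclHz` is made up for this example, and a 16 MHz CPU clock with a prescaler of 1 is assumed.

```cpp
#include <cstdint>

// I2C clock in Hz for a given CPU clock, TWBR value and prescaler.
// Integer division, so results are truncated to whole Hz.
constexpr std::uint32_t sclHz(std::uint32_t cpuHz,
                              std::uint8_t twbr,
                              std::uint8_t prescaler) {
    return cpuHz / (16UL + 2UL * twbr * prescaler);
}

// Checked at compile time for a 16 MHz CPU clock and a prescaler of 1.
static_assert(sclHz(16000000UL, 72, 1) == 100000UL, "TWBR = 72 gives 100 kHz");
static_assert(sclHz(16000000UL, 12, 1) == 400000UL, "TWBR = 12 gives 400 kHz");
static_assert(sclHz(16000000UL, 0, 1) == 1000000UL, "TWBR = 0 gives 1 MHz");
```

So under these assumptions TWBR = 0 clocks the bus at 1 MHz, two and a half times the 400 kHz limit quoted for the 1306.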

I don't know what the rule is for I2C clock speed. With SPI each library is responsible for setting the clock and mode before access. Many libraries do not do this however. But with I2C I would think that setting the clock higher than the slowest device would be a problem. Again, I really don't know.

That line in the code I sent you where it is set to 12 (400 kHz) was taken from the Adafruit library code. I didn't really think about it when I added it, I just copied it blindly. I only tried my board jumpered for I2C out of curiosity and reconfigured it for SPI shortly afterward.

That's interesting about the reset pin on your board. On mine the reset must go low even when it is jumpered for I2C. I couldn't find anything in the 1306 data sheet that was clear about this but without a reset pulse my display does not work.

ok the TWBR is something I can look at later.

In the meantime I made another discovery. It seems that once you position the cursor you can send bytes (columns of pixels) and the SSD1306 increases its own internal cursor. That way you can send a bunch of characters without repositioning the cursor each time.

Yet another discovery is that you can send a bunch of bytes to the SSD1306, about 30 in one go; that's about 5 characters.

I therefore wrote a different "write(char *)" which splits the string into bunches of 5 characters and uses "senddata(byte *, len)", which I also wrote.

Result: line of text now in 3ms!!!!

I have also written similar functions to erase characters/ portions of lines since it is just a bunch of 0s.

It is all very quick now.

akis_t:
It seems that once you position the cursor you can send bytes (columns of pixels) and the SSD1306 increases its own internal cursor. That way you can send a bunch of characters without repositioning the cursor each time.

That's probably something I should have mentioned.

akis_t:
Yet another discovery is that you can send a bunch of bytes to the SSD1306, about 30 in one go; that's about 5 characters.

I therefore wrote a different "write(char *)" which splits the string into bunches of 5 characters and uses "senddata(byte *, len)", which I also wrote.

Even though I plan to use the SPI interface with this display I'd be interested to see what you did to improve the I2C throughput. I only read the words "I2C", "SPI" and "Arduino" about four months ago and I am still on an exponential learning curve. Could you post the code you wrote or send it to me?

Yes of course. The original "senddata(byte)" function probably spends 90% of its time setting up (C code and I2C handshaking) and 10% of its time sending the one and only byte.

This strikes me as very inefficient. Once you have paid the C penalties and the I2C fixed costs (handshaking), would it not be better to send as many bytes as you can?

This code below is test code, not meant for public consumption. I have also changed the meaning of "col_" to mean character columns rather than pixel columns. I found it very inconsistent otherwise and interfaces ought to be consistent.

So we change the write(const char* s) function and a gotcha is that the Print class does NOT call it (virtually) so we need to call it directly. Did not dig deeper to see what else I was missing, ideally, since we have derived from the Print class we should not have to mix xx.print("hello") and xx.write("hello") - yet another inconsistency but what the hell.

Here is an extract.

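size_t SSD1306_text::write(const char* s)
{
  size_t n = strlen(s);   // characters still to send
#if 1
  size_t cn = 0;          // characters consumed so far
  while (n > 0)
  {
    // hardcoding 5 chars (30 bytes) max per transfer before the SSD1306 coughs up
    int num_chars = min(5, n);
    // clip to what is left on the current row (col_ counts character columns)
    if (col_ + num_chars >= SSD1306_LCDWIDTH/6)
    {
      num_chars = SSD1306_LCDWIDTH/6 - col_;
      if (num_chars == 0) break;
    }

    byte byte_buf[30]; // max 5 chars x 6 bytes
    for (int i = 0; i < num_chars; i++, cn++)
    {
      byte ch = s[cn];
      ch -= 32;                        // font table index is the character code minus 32
      uint8_t *base = font + 5 * ch;   // 5 column bytes per glyph
      for (int j = 0; j < 5; j++)
        byte_buf[i*6+j] = pgm_read_byte(base + j);
      byte_buf[i*6+5] = 0;             // blank column between characters
    }

    sendData(byte_buf, num_chars*6);   // one transfer for the whole chunk
    n -= num_chars;
    col_ += num_chars;
  }
  return cn;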

and the sendData

void SSD1306_text::sendData(uint8_t *pbytes, uint8_t len)
{
#if I2C
  TWBR = _twbr;                   // I2C bit rate chosen at construction
  uint8_t control = 0x40;         // Co = 0, D/C = 1
  Wire.beginTransmission(_address);
  Wire.write(control);
  Wire.write(pbytes, len);        // whole chunk in a single transmission
  Wire.endTransmission();
#else

akis_t:
The original "senddata(byte)" function probably spends 90% of its time setting up (C code and I2C handshaking) and 10% of its time sending the one and only byte.

Okay, that's what I thought you were doing. An improvement by a factor of 3X is quite nice. I wonder if the SPI branch would also benefit from this approach.

For comparison, the original code, as I found it, took 178ms to clear the screen.

akis_t:
This code below is test code, not meant for public consumption. I have also changed the meaning of "col_" to mean character columns rather than pixel columns. I found it very inconsistent otherwise and interfaces ought to be consistent.

It was originally the way you changed it back to. I made the column bit-wise because it allowed me to place characters on the screen more precisely. It is perfectly consistent with the way the hardware works, but I understand your point of view.

jboyton:
I wonder if the SPI branch would also benefit from this approach.

I optimized the code for SPI and the time to write a line of text went from 0.82 ms to 0.35 ms, better than twice as fast as before. A full screen of text takes less than 3 ms.

My main concern is over the time it takes to output larger characters since I use 10X14 size quite a bit and occasionally 15x21 size. They are quite slow in comparison. I've had it in the back of my mind to see how much I could speed that up for a while. You've inspired me to do it sooner rather than later.

EDIT: I was able to speed up the larger, scaled characters by a factor of nearly 6. And along the way I reduced the code size, fixed a bug with the scaled character spacing, corrected the SPI configuration code and added the features to allow the text to be either top or bottom justified within its row and proportional spacing for certain punctuation characters.

I think this text-only approach is quite valuable since it uses so little memory compared to a full-blown graphics library. Ideally it should be extended to other displays and made public.

As I need as much screen space as possible in the smallest possible package size, I re-wrote the whole thing to cater only for the "standard" 5X7 fonts, single spacing, 8 rows of 21 columns.

The functions I ended up using are

Print(row, col, char *, ...) like printf
Erase(row)

With 32K code size and 2K RAM, it is all about memory footprint.
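A printf-style call like the Print(row, col, char *, ...) above could be built on vsnprintf with a small fixed buffer. The sketch below is only one possible shape, not the poster's code: `formatRow` and `kColumns` are hypothetical names, and the 21-column limit comes from the 8 x 21 layout mentioned above.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

constexpr std::size_t kColumns = 21;  // characters per row on a 128-pixel-wide display

// Hypothetical helper: format into a row-sized buffer and return how many
// characters would actually be shown. Longer output is truncated to one row.
std::size_t formatRow(char (&out)[kColumns + 1], const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int wanted = std::vsnprintf(out, sizeof(out), fmt, args);
    va_end(args);
    if (wanted < 0) {          // formatting error: show nothing
        out[0] = '\0';
        return 0;
    }
    std::size_t n = static_cast<std::size_t>(wanted);
    return n > kColumns ? kColumns : n;
}
```

The fixed 22-byte buffer keeps the RAM cost predictable, which matters on a chip with 2K of RAM; the formatted text would then be handed to the row/column write routine.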

What did you shrink it down to?

How large is the following sketch (or equivalent with your API)?
How much memory does it use?

void setup() {
  display.init();
  display.clear();
  display.write("Hello world.");
}

void loop(){}

EDIT:

I jumpered my board to I2C again to experiment. It wouldn't run with TWBR = 0 but would with it set to 1, at least some of the time. At that speed it generated a line of text (21 characters) in 3.6ms using your code. With TWBR=12 it took 4.8ms. With my latest code, sending 1 character at a time, it took 6.5ms with TWBR=12.

By the way, the 5 characters or 30 bytes "in one go" that you found works is due to the size of the Wire TX buffer (32). If you try to send more than 32 bytes the method returns an error.
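The 5-character limit can be worked out from the buffer size. The sketch below is a host-side illustration under stated assumptions: a 32-byte transmit buffer, one byte of it taken by the 0x40 control byte, and 6 bytes per character. The function name `chunkSizes` is invented for the example.

```cpp
#include <cstddef>
#include <vector>

// Split totalBytes of display data into transfers that fit a transmit buffer,
// keeping every transfer a whole number of characters.
// Assumption: one buffer slot per transfer is used by the control byte.
std::vector<std::size_t> chunkSizes(std::size_t totalBytes,
                                    std::size_t bytesPerChar,
                                    std::size_t txBufferSize) {
    std::vector<std::size_t> sizes;
    if (txBufferSize < 2 || bytesPerChar == 0) return sizes;
    std::size_t maxData = txBufferSize - 1;                       // room left for data
    std::size_t chunk = (maxData / bytesPerChar) * bytesPerChar;  // whole characters only
    if (chunk == 0) return sizes;
    while (totalBytes > 0) {
        std::size_t n = totalBytes < chunk ? totalBytes : chunk;
        sizes.push_back(n);
        totalBytes -= n;
    }
    return sizes;
}
```

For a full 21-character line (126 bytes), `chunkSizes(126, 6, 32)` yields four transfers of 30 bytes and one of 6, which is the 5-characters-at-a-time pattern described earlier in the thread.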

One thing I noticed while doing these tests is that the Wire library has a significant footprint. It increased the size of my code by over 1200 bytes and also required an additional 210 bytes of SRAM. If I were stuck using an I2C board I would consider optimizing the Wire library.

@ jboyton: Nice job. Would it be possible to port the library to the Pro Mini as well? (I do not have those skills ...)

I once took up jboyton's idea and created a new library: U8x8. U8x8 is part of U8g2, the successor of U8glib. You can just install U8g2 from the Arduino IDE. Examples for U8x8 are included, here is HelloWorld.ino:

#include <Arduino.h>
#include <SPI.h>
#include <U8x8lib.h>

U8X8_SSD1306_128X64_NONAME_4W_HW_SPI u8x8(/* cs=*/ 10, /* dc=*/ 9, /* reset=*/ 8);
void setup(void) {
  u8x8.begin();
}
void loop(void) {
  u8x8.setFont(u8x8_font_chroma48medium8_r);
  u8x8.drawString(0,1,"Hello World!");
}

Fonts for this API are listed here:

Reference Manual is here:

U8x8 should support all current Arduino Boards and also supports many more displays.

Oliver