A fast PCD8544 library (Nokia 5110)

*The code download is at the bottom.
Version 1.2 with considerable speed optimization and bug fixes is now available.

So I got my LCD a few days ago and started tweaking.
The first thing I noticed is that most libraries use the shiftOut() function, which is very slow.
Since I am a speed freak, I wanted to make something that will actually be fast.
And here is the result:

(The times displayed are in microseconds)

The library uses the Arduino's hardware SPI bus. It assumes your Arduino is running at 16MHz and uses the default SPI clock divider of 4, for an effective SPI clock of 4MHz. This is the maximum speed the LCD supports.
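The 4MHz figure also puts a floor under any full-screen update. The sketch below just does that arithmetic, assuming the 16MHz clock and divider of 4 stated above (the constant names are made up for illustration; the Arduino core exposes the CPU clock as F_CPU). It ignores the per-byte overhead of SPI.transfer() and the pin toggling, so real timings will be higher.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions taken from the text: a 16 MHz AVR and the default SPI divider of 4.
constexpr uint32_t kCpuHz = 16000000UL;
constexpr uint32_t kSpiDivider = 4;
constexpr uint32_t kSpiHz = kCpuHz / kSpiDivider;  // 4 MHz SPI clock

// SPI shifts one bit per clock, so one byte takes 8 clock periods of 250 ns.
constexpr uint32_t kNsPerByte = 8UL * (1000000000UL / kSpiHz);  // 2000 ns

// Lower bound on the time needed to clock `bytes` bytes out of the SPI port,
// ignoring software overhead between bytes.
constexpr uint32_t wireTimeUs(uint32_t bytes) {
    return bytes * kNsPerByte / 1000UL;
}

// A full 84x48 screen is 504 bytes, so a full refresh needs at least ~1 ms on the wire.
static_assert(wireTimeUs(504) == 1008, "504 bytes at 4 MHz");
```

Any time measured above roughly 1ms for a full-screen update is spent outside the SPI shift itself.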

Schematic:
The library is configured to run on the ATmega168/328 (Uno, Mini, Nano, Duemilanove), but it can be reconfigured to run on any AVR-based Arduino board.
By default it uses Digital Pins 8 - 13 (PORT B).


Schematic explained:
This LCD is a 3.3V device, and the backlight requires 3.3V as well. My LCD came pre-soldered to a breakout board with backlight resistors, so all I had to do was connect the LED pin to ground. Some LCDs may require 3.3V on their LED pin instead.

I use a CD4050 as a level shifter, since my Arduino is a 5V device. The datasheet claims that the LCD is 5V tolerant on the signal lines, but I'm not taking any chances.
The CD4050's output voltage is the same as the supply voltage it is given, so by powering it from the Arduino's 3.3V pin we get nice, clean 3.3V signals to the LCD.

Code:
There are two versions of this library.
One version uses a frame buffer. It takes 504 bytes of SRAM on your device, but it gives you additional flexibility and speed: it lets you set an individual pixel on or off, draw a rectangle (empty or filled; empty is the default), and draw horizontal or vertical lines (an attempt to draw a line at an angle will be ignored).
The second version doesn't use a frame buffer, so its memory footprint is much smaller. The lack of a frame buffer carries a small performance penalty and only allows printing text and/or bitmap data onto the screen. Nevertheless, it is only a little bit slower, so if you don't need the setPixel, writeRect and writeLine functions, save your RAM and use the non-frame-buffered version.
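The 504 bytes come from one bit per pixel: 84 x 48 / 8. Here is a host-side sketch of how setting a pixel in such a buffer can work, assuming the buffer mirrors the controller's RAM (bank by bank, each byte a vertical strip of 8 pixels with bit 0 at the top). The names are hypothetical, not the library's, and the assert() stands in for the bounds checking the library deliberately leaves out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; a host-side sketch, not the library's code.
constexpr uint8_t kWidth = 84;
constexpr uint8_t kHeight = 48;
constexpr uint16_t kBufLen = kWidth * kHeight / 8;  // 504 bytes: one bit per pixel

uint8_t frameBuffer[kBufLen];

// Bank b (0..5) holds pixel rows 8*b .. 8*b+7; within a byte, bit 0 is the top row.
void setPixelSketch(uint8_t x, uint8_t y, uint8_t value) {
    assert(x < kWidth && y < kHeight);  // stands in for the missing bounds check
    uint16_t index = (y / 8) * kWidth + x;
    uint8_t mask = 1 << (y % 8);
    if (value)
        frameBuffer[index] |= mask;   // any non-zero value sets the pixel
    else
        frameBuffer[index] &= ~mask;  // 0 clears it
}
```

Pixel (5, 10) lands in bank 1 (byte 84 + 5) as bit 2, and the bottom-right pixel (83, 47) is bit 7 of the last byte.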

The code may not be very pretty, as it's a rough beta version, but it's probably more than what I need for my plans for this LCD so I decided to share it with you.
It does not contain any protection against going out of the frame buffer (or the physical screen), so make sure you handle the screen borders properly.

What you need to know is this:
You can call the print() function just as you would with the LiquidCrystal library or Serial. It only supports ASCII characters 0x20 - 0x7F (32 - 127).
Call begin() in your setup() function.
When using the frame-buffered library, call renderAll() or renderString() after you finish all your writing. A call to print() returns the number of bytes printed, so you can use that value to render just the string, which is much quicker than rendering the whole screen.
The setPixel method takes x, y and value parameters (0 clears the pixel; anything else sets it).
writeLine takes x1, y1, x2, y2. As I mentioned, it only works for horizontal or vertical lines.
writeRect takes x, y, width, height, fill (true/false, default is false).
These last 3 methods assume the screen is an 84x48 pixel display.
gotoXY is used for printing text; it treats the LCD as a display with 84 columns and 6 rows (banks). Refer to the controller's datasheet for more information.
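In the controller's basic instruction set, a column/bank position maps onto two one-byte commands: "Set X address" (1xxxxxxx, X = 0..83) and "Set Y address" (01000yyy, Y = 0..5). The helper below is hypothetical and only illustrates that mapping plus the matching offset into a 504-byte frame buffer. The thread does not show the exact bytes the library's gotoXY sends.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kCols = 84;  // X: 0..83, one pixel column each
constexpr uint8_t kBanks = 6;  // Y: 0..5, one bank of 8 pixel rows each

struct GotoCommands {
    uint8_t setX;        // PCD8544 "Set X address" command byte
    uint8_t setY;        // PCD8544 "Set Y address" command byte
    uint16_t bufferPos;  // matching offset into a 504-byte frame buffer
};

// Hypothetical helper: returns false (leaving `out` untouched) for out-of-range input.
bool gotoXYSketch(uint8_t x, uint8_t y, GotoCommands &out) {
    if (x >= kCols || y >= kBanks) return false;
    out.setX = 0x80 | x;
    out.setY = 0x40 | y;
    out.bufferPos = y * kCols + x;
    return true;
}
```

For example, column 10 of bank 2 gives the command bytes 0x8A and 0x42 and buffer offset 178.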

Methods:

	PCD8544_SPI	lcd; // Declares a non-FrameBuffered instance.
	PCD8544_SPI_FB	lcd; // Declares a FrameBuffered instance.

	// Call a render method after any print/write methods are called.
	// For best performance, aggregate all writes before calling a render method.
	void renderAll();
	void renderString(uint8_t x, uint8_t y, uint16_t length);
	
	void setPixel(uint8_t x, uint8_t y, uint8_t value);
	
	// WriteLine currently only supports horizontal and vertical lines.
	void writeLine(uint8_t x1, uint8_t y1, uint8_t x2, uint8_t y2);
	void writeRect(uint8_t x, uint8_t y, uint8_t width, uint8_t height, bool fill = false);
	// Methods above are only available in the FrameBuffer version.

	void begin();
	void clear();
	void gotoXY(uint8_t x, uint8_t y);
	virtual size_t write(uint8_t data);
	void writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height);
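To see why rendering only a printed string is cheaper than renderAll(), count the buffer bytes the text covers. This sketch assumes each character takes 6 columns (5 glyph columns plus one blank spacer), which is how the write() code quoted later in this thread lays them out, and that renderString() sends only those columns. The names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 5 glyph columns + 1 blank spacer per character, as in write().
constexpr uint16_t kColumnsPerChar = 6;
constexpr uint16_t kBufLen = 504;  // full 84x48 frame buffer, one byte per 8 pixels

// Bytes of frame buffer covered by `chars` characters printed on one line.
constexpr uint16_t bytesForChars(uint16_t chars) {
    return chars * kColumnsPerChar;
}

static_assert(bytesForChars(84) == kBufLen, "84 characters fill the whole screen");
```

So printing "Hello" (print() returns 5) touches about 30 bytes instead of 504, and 14 characters fill one 84-column bank.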

PCD8544_SPI.zip (8.68 KB)

TheCoolest,

This library is just what I need. The Adafruit library is too heavy and slow, and the old hardware SPI libraries do not compile on 1.5.2 beta. Your library is light and works. But I have a few issues.
Could you please help with bitmaps?
First, offsets: I assume the second and third arguments of the bitmap function are the X and Y position offsets. They do affect the position, but a single-unit change throws it a few pixels away, and large values (like 5) make it stick to the upper left corner. Pretty erratic.
Second, the size. As an example I'm using a simple 8x8 bitmap, and I set the fourth and fifth arguments to 8.

B11111111,
B00000001,
B00000010,
B00000100,
B00001000,
B00010000,
B00100000,
B11111111,

It appears not as a Z but as an N. OK, it is transposed, but there is always trailing garbage below it.

Could you please explain a bit more how to create bitmap sprites and what exactly the arguments of the bitmap function do?

Thank you for any help!

Each pixel on the display is represented by a single bit in the PCD8544's RAM. Each byte in the RAM correlates to a column of 8 pixels.
The X coordinate works on a per-pixel basis, and accepts values between 0 and 83.
The Y coordinate, on the other hand, accepts values of 0 - 5. As the screen is 48 pixels high, there are 6 "rows of bytes" (banks) in the controller's RAM, so a bitmap can only be placed on a per-bank basis.
This is a limitation of the current code in this version, but it keeps the code small, quick and simple to write. I'm not sure whether I will be developing it any further, so feel free to make your own contribution to it if you want.
X and Y values outside that range will make the library ignore the gotoXY call, so it might write your bitmap at the current position on the screen.
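The "N instead of Z" effect follows from this layout: the bytes in the question were written one per pixel row (MSB = leftmost pixel), but the controller reads each byte as one pixel column with bit 0 at the top. A minimal conversion sketch for an 8x8 glyph (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Converts an 8x8 glyph drawn row by row (one byte per row, MSB = leftmost pixel)
// into column-by-column bytes (one byte per column, bit 0 = top pixel).
void rowsToColumns(const uint8_t rows[8], uint8_t cols[8]) {
    for (uint8_t c = 0; c < 8; c++) {
        uint8_t column = 0;
        for (uint8_t r = 0; r < 8; r++) {
            // Pixel (c, r) is bit (7 - c) of row r; it becomes bit r of column c.
            if (rows[r] & (0x80 >> c))
                column |= 1 << r;
        }
        cols[c] = column;
    }
}
```

Fed the Z from the question (0xFF, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0xFF), this produces 0x81, 0x81, 0xC1, 0xA1, 0x91, 0x89, 0x85, 0x83. Assuming the height argument also counts banks, that glyph would be drawn with width 8 and height 1.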

I'm all up for minimizing code!

So the X position is in pixels, and the Y position is in units of 8 pixels.
What about the dimensions: do they follow the same convention, or is the height still in pixels?

Best regards!

Oh, trial and error... Just did it.

I just tested it and found that the height is also in units of 8 pixels, just like the Y position.
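That also explains the trailing garbage: writeBitmap() copies `width` bytes per bank for `height` banks, so an 8x8 glyph passed with height 8 makes it read 64 bytes from an 8-byte array, and the extra 56 come from whatever follows it in memory. A small sketch of the size arithmetic (helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Number of 8-pixel banks needed for a bitmap `pixelHeight` pixels tall.
constexpr uint8_t banksFor(uint8_t pixelHeight) {
    return (pixelHeight + 7) / 8;
}

// Bytes writeBitmap() reads from the source array: `width` bytes per bank.
constexpr uint16_t bitmapBytes(uint8_t width, uint8_t heightInBanks) {
    return static_cast<uint16_t>(width) * heightInBanks;
}
```

For instance, an 18-pixel-tall image needs 3 banks, which is how the benchmark's "25x3 (25x18)" bitmap ends up as 75 bytes.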

Great library!
Thank you!

Great, thanks for the feedback. I got this screen to play around with, I was thinking about making a simple arduino-scope.
I measured the time it took the Adafruit library to write 84 chars (fill the screen) and it took about 100ms, with this library and using hardware SPI it takes just over 3ms, massive difference in performance if you need to make some time consuming computations. This is why I decided to write it.

Thanks for the library ... it's The Coolest 8) I like code that is like my women, fast

Since I am a speed freak, I wanted to make something that will actually be fast.

A quick look at the code

void PCD8544_SPI_FB::writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);
	uint16_t pos = this->m_Position;
	for (uint8_t y = 0; y < height; y++)
	{
		memcpy(this->m_Buffer + pos, bitmap + (y*width), width);
		pos += LCD_X;
	}
}

could be squeezed a bit by bringing some math out of the loop.

void PCD8544_SPI_FB::writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);

	// Walk a destination and a source pointer instead of recomputing offsets each row.
	uint8_t *pos = this->m_Buffer + this->m_Position;
	const uint8_t *maxY = bitmap + height * width;

	for (const uint8_t *src = bitmap; src < maxY; src += width)
	{
		memcpy(pos, src, width);
		pos += LCD_X;
	}
}
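Before timing it, it's worth checking that hoisting the math doesn't change the result. In this host-side sketch, a plain 504-byte array stands in for m_Buffer and both loop forms are written as free functions (the function names are made up). It copies a 25x3-bank bitmap both ways and compares the buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint8_t LCD_X = 84;       // columns, as in the library
constexpr uint16_t BUF_LEN = 504;   // 84 columns x 6 banks

// Original form: recompute the source offset from the row index each time.
void blitIndexed(uint8_t *buf, uint16_t pos, const uint8_t *bitmap,
                 uint8_t width, uint8_t height) {
    for (uint8_t row = 0; row < height; row++) {
        std::memcpy(buf + pos, bitmap + row * width, width);
        pos += LCD_X;
    }
}

// Hoisted form: advance one source and one destination pointer per bank.
void blitHoisted(uint8_t *buf, uint16_t pos, const uint8_t *bitmap,
                 uint8_t width, uint8_t height) {
    uint8_t *dst = buf + pos;
    const uint8_t *end = bitmap + height * width;
    for (const uint8_t *src = bitmap; src < end; src += width) {
        std::memcpy(dst, src, width);
        dst += LCD_X;
    }
}
```

Any speed difference between the two still has to be measured on the AVR itself, e.g. with the benchmark sketch.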

can you time it?

That's a good idea; sadly, I don't have it connected right now. But the benchmark sample sketch that comes with the library lets you time it yourself.
Run the sketch with the original code, then run it again with the revised version and see how many µs you shaved off the time of drawing a bitmap.

I don't have such a display nearby, but here are some numbers from dry runs.

Original timing (reference, from a slightly modified sketch that outputs to Serial):

The time it took draw a rect and 3 lines: 2024
The time it took to print 84 chars is:    3484
The time it took to draw a 25x3 (25x18) bitmap is: 1720
The time it took to run setPixel on all 4032 pixels and render it:    18660

(all tweaks are in PCD8544_SPI_FB)

tweak: replaced % with if

size_t PCD8544_SPI_FB::write(uint8_t data)
{
	// Non-ASCII characters are not supported.
	if (data < 0x20 || data > 0x7F) return 0;

	memcpy_P(this->m_Buffer + this->m_Position, ASCII[data - 0x20], 5);
	this->m_Buffer[this->m_Position+5] = 0x00;
	this->m_Position += 6;
	if (this->m_Position >= BufLen) this->m_Position -= BufLen;
	//this->m_Position %= BufLen;
	return 1;
}
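The two cursor-advance forms are interchangeable here because the position is always below BufLen and the step (6) is smaller than BufLen, so at most one wrap is needed. The compare-and-subtract form avoids a division: AVR has no hardware divide instruction, so a 16-bit % compiles to a call to a library division routine. A host-side check (names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t BUFLEN = 504;
constexpr uint16_t CHAR_WIDTH = 6;  // 5 glyph columns + 1 spacer, as in write()

// Cursor advance using the modulo operator.
uint16_t advanceModulo(uint16_t pos) {
    return (pos + CHAR_WIDTH) % BUFLEN;
}

// Cursor advance using a compare and subtract, as in the tweak above.
// Valid only while pos < BUFLEN and CHAR_WIDTH < BUFLEN (one wrap at most).
uint16_t advanceSubtract(uint16_t pos) {
    pos += CHAR_WIDTH;
    if (pos >= BUFLEN) pos -= BUFLEN;
    return pos;
}
```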

tweak: get math out of the loop

void PCD8544_SPI_FB::writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);

	uint8_t *pos = this->m_Buffer + this->m_Position;
	const uint8_t *maxY = bitmap + height * width;

	// Compare the pointers themselves, not the bytes they point to.
	for (const uint8_t *row = bitmap; row < maxY; row += width)
	{
		memcpy(pos, row, width);
		pos += LCD_X;
	}
}

tweak: reverse loop

void PCD8544_SPI_FB::writeLcd(uint8_t dataOrCommand, const uint8_t *data, uint16_t count)
{
	PORTB = (PORTB & ~0x05) | dataOrCommand;
	// for (uint16_t i = 0; i < count; i++)
		// SPI.transfer(data[i]);
	for (uint16_t i = count; i > 0; i--)
		SPI.transfer(data[count-i]);
	PORTB |= 0x4;
}

after 3 optimizations

The time it took draw a rect and 3 lines: 1992 (-32)
The time it took to print 84 chars is:    2436 (-1048)
The time it took to draw a 25x3 (25x18) bitmap is: 1636 (-84)
The time it took to run setPixel on all 4032 pixels and render it:    18628 (-32)

Conclusion:
printing characters is substantially faster (~30%);
the others improve only slightly (0-5%).

Disclaimer: I cannot confirm that the optimizations work on an actual display, as I didn't have one to test with.

Wow! That's impressive! Is subtraction on AVR so much faster than addition?
I haven't really studied the AVR architecture or any fine-optimization techniques, but 30% is a very substantial increase in performance.

Thanks a lot for your time in looking at the code and helping out with the optimizations.

Arthur

  1. No, it is the comparison with zero that is faster than a comparison with a non-zero constant; and % is just expensive.

  2. You're welcome.
    There is little room left to optimize (you can check this by commenting out the lowest-level functions).
    line() is now a call to the rectangle code; there might be some gain in making it dedicated.
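The reversed loop in writeLcd() changes only the loop test, not the order of the bytes sent: the counter runs down to zero while data[count - i] still walks from index 0 to count - 1. A host-side check, with an output array standing in for SPI.transfer() (function names are illustrative). Whether the zero comparison is measurably cheaper has to be confirmed on the AVR; the author's later test attributes most of the gain to removing %.

```cpp
#include <cassert>
#include <cstdint>

// Up-counting loop: the end test compares i against `count` each iteration.
void sendForward(const uint8_t *data, uint16_t count, uint8_t *out) {
    for (uint16_t i = 0; i < count; i++)
        out[i] = data[i];  // stands in for SPI.transfer(data[i])
}

// Down-counting loop from the tweak: the end test is a comparison with zero,
// and data[count - i] still visits indices 0 .. count-1 in order.
void sendReverseCounter(const uint8_t *data, uint16_t count, uint8_t *out) {
    uint16_t n = 0;
    for (uint16_t i = count; i > 0; i--)
        out[n++] = data[count - i];
}
```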

The code looks quite good: a good layered design, clear function and variable names, and very few comments.

some remarks:

  • Point of attention: there is a begin(), an init() and a clear(), which sounds like one too many.
    ==> merge begin() and init() into one. Some of the constants in init() could be parameters for begin() (or #define them).

  • swap could be inlined

  • Remove all the testing of x and y (if (x >= LCD_X || y >= LCD_Y) return;), or change the signature and return FAIL/SUCCESS.
    Right now the user doesn't know whether a call did anything when it returns.

  • In write(), instead of if (data < 0x20 || data > 0x7F) return 0; you could also map non-printable data to a space, which might keep the layout intact. (Design choice.)

  • In clear(), this->m_Position = 0; is not needed, as it is set in gotoXY().

  • BufLen is a #define, so BUFLEN should be used for a more consistent style.

  • rectangle code could use some explaining.

Just my 2 cents,
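The "map non-printable data to a space" suggestion could look roughly like this: unsupported bytes select the space glyph, so the cursor still advances one cell and the rest of the line stays aligned. The helper is hypothetical; note it changes write()'s behaviour, since such a byte would then count as printed instead of returning 0.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kFirstChar = 0x20;  // ' ', first entry of the ASCII font table
constexpr uint8_t kLastChar = 0x7F;   // last supported character

// Hypothetical helper: returns the font-table row for `c`, mapping anything
// outside 0x20..0x7F to the space glyph instead of rejecting it.
uint8_t glyphIndex(uint8_t c) {
    if (c < kFirstChar || c > kLastChar) c = ' ';
    return c - kFirstChar;
}
```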

Thanks a lot for the feedback, I'll take it to my attention and fix up the code and reupload it when it is ready.

Hi, this library looks great. Could you alter it so I can change the non-SPI pins, and tie the chip select of the 5110 to ground so I can use that pin for something else? I can't figure out how to do it.

Thank you. Since this thread got bumped I thought I'd mention that I started working on the changes robtillaart suggested. I hope to get the code working tomorrow and post an updated version.
I will add an option to change the port/pin mapping for the LCD control pins (DC, CE, RST), but they will all have to be on a single port. For example, they will have to be mapped to pins D2 to D7 (D0 and D1 too, but on the Arduino they are usually used for serial communication), A0 to A5, or D8 to D10.
When I post an updated version, open the PCD8544_SPI.h file for instructions.
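The single-port restriction fits the way writeLcd() drives the control pins: PORTB = (PORTB & ~0x05) | dataOrCommand selects the LCD (CE low) and sets DC in one read-modify-write, and PORTB |= 0x4 deselects it. The masks suggest DC on bit 0 (D8) and CE on bit 2 (D10). This host-side sketch models that with a plain byte instead of the PORTB register; the bit assignments and function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions on one port (PB0 = D8, PB2 = D10 on an ATmega328).
constexpr uint8_t DC_BIT = 0;  // data/command: 1 = data, 0 = command
constexpr uint8_t CE_BIT = 2;  // chip enable, active low

// One read-modify-write: clear CE (select the LCD) and set DC to the requested
// mode, leaving every other pin on the port unchanged.
uint8_t beginTransfer(uint8_t port, bool isData) {
    const uint8_t mask = (1 << DC_BIT) | (1 << CE_BIT);
    return (port & ~mask) | (isData ? (1 << DC_BIT) : 0);
}

// Raise CE again to deselect the LCD.
uint8_t endTransfer(uint8_t port) {
    return port | (1 << CE_BIT);
}
```

With pins spread over two ports, the same update would need two separate register writes.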

Thanks a lot for your reply! I am looking forward to it.

@ bumsbert:
I've uploaded the new version, it's in the OP (Post #1). Please read the header file to see how to use different pins on your Arduino.

@ robtillaart:
Thanks again for your valuable feedback. I've taken most of your suggestions into consideration and implemented the optimizations you suggested.
Everything seems to work great.

robtillaart:

  1. No, it is the comparison with zero that is faster than a comparison with a non-zero constant; and % is just expensive.
    I've made a comparison, and using an 'if' instead of % is indeed the reason for the huge difference in speed.

  2. You're welcome.
    There is little room left to optimize (you can check this by commenting out the lowest-level functions).
    line() is now a call to the rectangle code; there might be some gain in making it dedicated.
    I decided to leave line() as is for now, when I implement the ability to make diagonal lines, I may revisit the idea of adding dedicated code for straight lines.

The code looks quite good: a good layered design, clear function and variable names, and very few comments.

some remarks:

  • Point of attention: there is a begin(), an init() and a clear(), which sounds like one too many.
    ==> merge begin() and init() into one. Some of the constants in init() could be parameters for begin() (or #define them).
    I changed it: now you can use the 'simple' begin(), which acts just like begin() in the previous version but also lets you select whether the display is inverted.
    The second begin() lets you set inversion and custom Vop, temperature coefficient and bias values.

  • swap could be inlined
    Done, forgot about that.

  • Remove all the testing of x and y (if (x >= LCD_X || y >= LCD_Y) return;), or change the signature and return FAIL/SUCCESS.
    Right now the user doesn't know whether a call did anything when it returns.
    Made this change as well, will return 1 (PCD8544_SUCCESS) when the function succeeds and 0 (PCD8544_ERROR) if it fails.
    I have not applied the change to setPixel() because it slowed it down noticeably.

  • In write(), instead of if (data < 0x20 || data > 0x7F) return 0; you could also map non-printable data to a space, which might keep the layout intact. (Design choice.)
    Do you mean allow for user defined characters? I've left this out for now, I may implement it in a future version.

  • In clear(), this->m_Position = 0; is not needed, as it is set in gotoXY().
    Oops. Fixed.

  • BufLen is a #define, so BUFLEN should be used for a more consistent style.
    Changed.

  • rectangle code could use some explaining.
    Done.

Just my 2 cents,
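For reference, the values the second begin() exposes are all PCD8544 extended-instruction commands: Function Set with H = 1 (0x21), Set Vop (0x80 | Vop), Temperature Control (0x04 | TC) and Bias System (0x10 | BS), then back to the basic set (0x20) and Display Control (0x0C normal, 0x0D inverse). The builder below is hypothetical, and the values in the usage line are illustrative, not the library's defaults.

```cpp
#include <cassert>
#include <cstdint>

// Command encodings from the PCD8544 datasheet.
constexpr uint8_t FUNCTION_SET = 0x20;  // | 0x01 selects the extended instruction set
constexpr uint8_t SET_VOP = 0x80;       // | Vop (0..127), extended set
constexpr uint8_t TEMP_COEF = 0x04;     // | TC (0..3), extended set
constexpr uint8_t BIAS = 0x10;          // | BS (0..7), extended set
constexpr uint8_t DISPLAY_CTRL = 0x08;  // | 0x04 normal, | 0x05 inverse, basic set

// Hypothetical helper: fills `out` with a 6-byte init command sequence.
// Each value is masked to the width of its field.
void buildInitSequence(uint8_t vop, uint8_t tempCoef, uint8_t bias, bool inverse,
                       uint8_t out[6]) {
    out[0] = FUNCTION_SET | 0x01;            // switch to extended instructions
    out[1] = SET_VOP | (vop & 0x7F);         // contrast (operating voltage)
    out[2] = TEMP_COEF | (tempCoef & 0x03);  // temperature coefficient
    out[3] = BIAS | (bias & 0x07);           // bias system
    out[4] = FUNCTION_SET;                   // back to basic instructions
    out[5] = DISPLAY_CTRL | (inverse ? 0x05 : 0x04);
}
```

Calling buildInitSequence(0x31, 0, 4, false, seq) yields 0x21, 0xB1, 0x04, 0x14, 0x20, 0x0C.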

OK, motivated choices, I like that !

Can you post the new timings you get? (as my measurements might be biased)

Sure, these are the results:

The time it took draw a rect and 3 lines: 1960
The time it took to print 84 chars is:    2316
The time it took to draw a 25x3 (25x18) bitmap is: 1560
The time it took to run setPixel on all 4032 pixels and render it:    18252

Thanks, even faster than my timings :)

Assuming commands and data never exceed 255 bytes, one could use an 8-bit count instead of a 16-bit one (or could they?).

void PCD8544_SPI_FB::writeLcd(uint8_t dataOrCommand, const uint8_t *data, uint8_t count)
{
	PORTB = (PORTB & ~0x05) | dataOrCommand;

	for (uint8_t i = count; i > 0; i--)
		SPI.transfer(data[count-i]);

	PORTB |= 0x4;
}
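One thing to check before narrowing the type: the frame-buffered version holds 504 bytes. If renderAll() passes the whole buffer to writeLcd() in one call (its body isn't shown in this thread), a uint8_t count would silently wrap to 248 and only part of the screen would be sent. A sketch of the wrap and of a chunked wrapper that keeps the 8-bit count; writeChunked and its `send` callback are hypothetical stand-ins for writeLcd().

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t BUFLEN = 504;  // full frame buffer

// What an 8-bit count parameter would receive for the full buffer: 504 - 256.
constexpr uint8_t truncatedCount = static_cast<uint8_t>(BUFLEN);  // 248

// Hypothetical wrapper: send a long buffer through an 8-bit-count writer in
// chunks of at most 255 bytes. `send(data, n)` stands in for writeLcd().
template <typename Sender>
void writeChunked(const uint8_t *data, uint16_t count, Sender send) {
    while (count > 0) {
        uint8_t chunk = count > 255 ? 255 : static_cast<uint8_t>(count);
        send(data, chunk);
        data += chunk;
        count -= chunk;
    }
}
```

Sending the full 504-byte buffer this way takes two calls (255 + 249 bytes).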