A fast PCD8544 library (Nokia 5110)

Each pixel on the display is represented by a single bit in the PCD8544's RAM. Each byte in the RAM correlates to a column of 8 pixels.
The X coordinate works on a per-pixel basis, and accepts values between 0 and 83.
The Y coordinate, on the other hand, accepts values from 0 to 5. As the screen is 48 pixels high, there are 6 "rows of bytes" in the controller's RAM, so a bitmap can only be positioned on a per-row basis.
This is a limitation of the current code in this version, but it makes it small and quick and simple to write. I'm not sure whether I will be developing it any further, so feel free to make your own contribution to it if you want.
X and Y values outside that range make the library ignore the gotoXY call, so it might write your bitmap at the current position on the screen instead.
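
As a rough illustration of the addressing described above, the following host-side sketch maps a pixel coordinate to a byte index and a bit mask in an 84 x 6 byte buffer. `PixelAddress` and `pixelAddress` are hypothetical names, not part of the library, and the sketch assumes that bit 0 of each byte is the top pixel of its 8-pixel column.

```cpp
#include <cstdint>

constexpr uint8_t LCD_WIDTH = 84;  // columns, X accepts 0..83

struct PixelAddress {
    uint16_t index;  // byte offset into the 504-byte display RAM / buffer
    uint8_t  mask;   // bit inside that byte
};

// Hypothetical helper: px is 0..83, py is 0..47 (in pixels, not rows).
PixelAddress pixelAddress(uint8_t px, uint8_t py)
{
    uint8_t row = py / 8;  // which of the 6 "rows of bytes"
    return { static_cast<uint16_t>(row * LCD_WIDTH + px),
             static_cast<uint8_t>(1u << (py % 8)) };  // assumed: bit 0 = top pixel
}
```

For example, pixel (10, 9) lands in row 1, so its byte index is 84 + 10 = 94 and its mask is 0x02.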

I'm all up for minimizing code!

So, the X position is in pixels and the Y position is in units of 8 pixels.
What about the dimensions, do they follow the same convention as above? Or is Y still in pixels?

Best regards!

Oh, trial and error... Just did it.

I just tested it and found that the Y dimension is in units of 8 pixels, just like the position.

Great library!
Thank you!

Great, thanks for the feedback. I got this screen to play around with; I was thinking about making a simple Arduino scope.
I measured the time it took the Adafruit library to write 84 chars (fill the screen): about 100ms. With this library and hardware SPI it takes just over 3ms, a massive difference in performance if you need to make some time-consuming computations. This is why I decided to write it.

Thanks for the library ... it's The Coolest 8) I like code that is like my women, fast

Since I am a speed freak, I wanted to make something that will actually be fast.

A quick look at the code

void PCD8544_SPI_FB::writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);
	uint16_t pos = this->m_Position;
	for (uint8_t y = 0; y < height; y++)
	{
		memcpy(this->m_Buffer + pos, bitmap + (y*width), width);
		pos += LCD_X;
	}
}

could be squeezed a bit by bringing some math out of the loop.

void PCD8544_SPI_FB::writeBitmap(const uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);

	uint8_t *pos = this->m_Buffer + this->m_Position;
	const uint8_t *maxY = bitmap + height * width;

	for (const uint8_t *y = bitmap; y < maxY; y += width)
	{
		memcpy(pos, y, width);
		pos += LCD_X;
	}
}

can you time it?

That's a good idea, sadly I don't have it connected right now. But if you look at the benchmark sample sketch that comes with the library you can time it yourself.
Run the sketch with the original code, then run it again with the revised version, and see how many microseconds you shaved off the time of drawing a bitmap.

I don't have such a display nearby, but here are some numbers from a few dry runs.

Original Timing (reference on slightly modified sketch so it outputs to serial)

The time it took draw a rect and 3 lines: 2024
The time it took to print 84 chars is:    3484
The time it took to draw a 25x3 (25x18) bitmap is: 1720
The time it took to run setPixel on all 4032 pixels and render it:    18660

(all tweaks in PCD8544_SPI_FB)

tweak: replaced % with if

size_t PCD8544_SPI_FB::write(uint8_t data)
{
	// Non-ASCII characters are not supported.
	if (data < 0x20 || data > 0x7F) return 0;

	memcpy_P(this->m_Buffer + this->m_Position, ASCII[data - 0x20], 5);
	this->m_Buffer[this->m_Position+5] = 0x00;
	this->m_Position += 6;
	if (this->m_Position >= BufLen) this->m_Position -= BufLen;
	//this->m_Position %= BufLen;
	return 1;
}
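
The two ways of wrapping the buffer position can be compared on a host machine. This is only a sketch: `BUF_LEN`, `CHAR_STEP`, `wrapWithModulo` and `wrapWithIf` are illustrative names. The results are the same only because the position is below the buffer length before the step and the step (6) is smaller than the buffer length. The AVR has no hardware divide instruction, so `%` on a 16-bit value turns into a software division routine, while the `if` needs only a compare and a subtract.

```cpp
#include <cstdint>

constexpr uint16_t BUF_LEN   = 504;  // 84 columns * 6 rows
constexpr uint8_t  CHAR_STEP = 6;    // 5 glyph columns + 1 spacing column

// Wrap using the modulo operator (a division on AVR).
uint16_t wrapWithModulo(uint16_t pos)
{
    return (pos + CHAR_STEP) % BUF_LEN;
}

// Wrap using a compare and a conditional subtract.
// Valid only while pos < BUF_LEN on entry and CHAR_STEP < BUF_LEN.
uint16_t wrapWithIf(uint16_t pos)
{
    pos += CHAR_STEP;
    if (pos >= BUF_LEN) pos -= BUF_LEN;
    return pos;
}
```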

tweak: get math out of the loop

void PCD8544_SPI_FB::writeBitmap(uint8_t *bitmap, uint8_t x, uint8_t y, uint8_t width, uint8_t height)
{
	if (x >= LCD_X || y >= LCD_Y) return;
	this->gotoXY(x, y);

	uint8_t *pos = this->m_Buffer + this->m_Position;
	uint8_t *maxY = bitmap + height * width;

	// Compare the pointers themselves, not the bytes they point to.
	for (uint8_t *y = bitmap; y < maxY; y += width)
	{
		memcpy(pos, y, width);
		pos += LCD_X;
	}
}
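
A host-side sketch of the same idea, which can be checked without a display: the end pointer is computed once and the loop compares pointers, so the only work per row is the copy and two additions. `copyBitmap` is a hypothetical free function, not the library's method, and it assumes the caller has already checked that the bitmap fits in the buffer.

```cpp
#include <cstdint>
#include <cstring>

constexpr uint8_t LCD_X = 84;  // bytes per display row

// dst points at the first target byte inside the frame buffer.
// width is in pixels (bytes per bitmap row), height is in rows of 8 pixels.
void copyBitmap(uint8_t *dst, const uint8_t *bitmap, uint8_t width, uint8_t height)
{
    // One past the last source byte; computed once, outside the loop.
    const uint8_t *end = bitmap + static_cast<uint16_t>(height) * width;

    for (const uint8_t *src = bitmap; src < end; src += width)
    {
        std::memcpy(dst, src, width);
        dst += LCD_X;  // same column, next display row
    }
}
```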

tweak: reverse loop

void PCD8544_SPI_FB::writeLcd(uint8_t dataOrCommand, const uint8_t *data, uint16_t count)
{
	PORTB = (PORTB & ~0x05) | dataOrCommand;
	// for (uint16_t i = 0; i < count; i++)
		// SPI.transfer(data[i]);
	for (uint16_t i = count; i > 0; i--)
		SPI.transfer(data[count-i]);
	PORTB |= 0x4;
}
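
Another way to write a count-down loop is to decrement the count itself and advance the data pointer, which avoids the `count-i` index. The sketch below uses a `std::vector` as a stand-in for the SPI bus so it can run on a host; `mockTransfer` and `sendAll` are made-up names. Whether this is any faster than the version above depends on the compiler and has not been measured here.

```cpp
#include <cstdint>
#include <vector>

std::vector<uint8_t> sentBytes;  // stand-in for the SPI bus

void mockTransfer(uint8_t b) { sentBytes.push_back(b); }

void sendAll(const uint8_t *data, uint16_t count)
{
    while (count--)             // loop condition is a test against zero
        mockTransfer(*data++);  // bytes still go out in ascending order
}
```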

after 3 optimizations

The time it took draw a rect and 3 lines: 1992 (-32)
The time it took to print 84 chars is:    2436 (-1048)
The time it took to draw a 25x3 (25x18) bitmap is: 1636 (-84)
The time it took to run setPixel on all 4032 pixels and render it:    18628 (-32)

Conclusion:
the printing of characters is substantially faster (~30%);
the others improve by only a small percentage (0-5%)

Disclaimer: I cannot confirm the optimizations work on actual display as I didn't have one to test.

Wow! That's impressive! Is subtraction on AVR so much faster than addition?
I haven't really studied the AVR architecture or any fine-optimization techniques, but 30% is a very substantial increase in performance.

Thanks a lot for your time in looking at the code and helping out with the optimizations.

  • Arthur

  1. No, it is comparing with zero that is faster than comparing with a nonzero constant; and % is just expensive.

  2. You're welcome,
    There is little room to optimize (you can check this by commenting out the lowest level functions).
    The line() is now a call to rectangle, there might be some gain making it dedicated.

The code looks quite good, good layered design, clear function and variable names and very little comments.

some remarks:

  • A point of attention is that there is a begin(), an init() and a clear(), which sounds like one too many
    ==> merge begin() and init() into one. Some of the constants in init() could be parameters for begin(); // #define them.

  • swap could be inlined

  • remove all the testing of x and y if (x >= LCD_X || y >= LCD_Y) return; or change signature and return FAIL/SUCCESS.
    now the user just doesn't know if a call did something when it returns.

  • from write(): if (data < 0x20 || data > 0x7F) return 0; you could also map non-printable data to a space, which might preserve some layout. (design choice)

  • clear() this->m_Position = 0; is not needed as it is set in gotoXY()

  • BufLen is a #define ==> BUFLEN should be used, more consistent style

  • rectangle code could use some explaining.

just my 2 cents ,
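
The remark about write() could be sketched as follows. `toPrintable` is a hypothetical helper, not something in the library; it keeps the same 0x20 to 0x7F range as the check in write() and substitutes a space for anything outside it, so an unsupported byte still occupies one character cell.

```cpp
#include <cstdint>

// Hypothetical helper: replace unsupported bytes with a space (0x20)
// instead of dropping them, so the text layout stays aligned.
uint8_t toPrintable(uint8_t data)
{
    return (data < 0x20 || data > 0x7F) ? 0x20 : data;
}
```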

Thanks a lot for the feedback, I'll take it into account, fix up the code and re-upload it when it is ready.

Hi, this library looks great. Could you alter it so I can change the non-SPI pins and tie the chip select of the 5110 to ground, so I can use that pin for something else? I can't figure out how to do it.

Thank you. Since this thread got bumped I thought I'd mention that I started working on the changes robtillaart suggested. I hope to get the code working tomorrow and post an updated version.
I will add an option to change the port/pin mapping for the LCD control pins (DC, CE, RST), but they will all have to be on a single port. For example, they will have to be mapped to pins D2 to D7 (D0 and D1 too, but on the Arduino they are usually used for serial communication), to A0 to A5, or to D8 to D10.
When I post an updated version, open the PCD8544_SPI.H file for instructions.
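
Keeping the control pins on one port presumably lets the code change CE and DC with a single register write, as the `PORTB = (PORTB & ~0x05) | dataOrCommand;` line in writeLcd does. The host-side sketch below shows that masking on a plain byte. The bit positions are assumptions read off the 0x05 and 0x4 constants in the posted code, and `selectChip` and `releaseChip` are made-up names.

```cpp
#include <cstdint>

// Assumed bit positions within one 8-bit port; adjust to the actual wiring.
constexpr uint8_t PIN_DC     = 0x01;
constexpr uint8_t PIN_CE     = 0x04;
constexpr uint8_t PINS_CE_DC = PIN_CE | PIN_DC;

// New port value with CE pulled low and DC set for data or cleared for command.
// All other bits of the port are left untouched.
uint8_t selectChip(uint8_t port, bool isData)
{
    uint8_t dc = isData ? PIN_DC : 0;
    return static_cast<uint8_t>((port & ~PINS_CE_DC) | dc);
}

// New port value with CE driven high again.
uint8_t releaseChip(uint8_t port)
{
    return static_cast<uint8_t>(port | PIN_CE);
}
```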

Thanks a lot for your reply! I am looking forward to this.

@ bumsbert:
I've uploaded the new version, it's in the OP (Post #1). Please read the header file to see how to use different pins on your Arduino.

@ robtillaart:
Thanks again for your valuable feedback. I've taken most of your suggestions into consideration and implemented the optimizations you suggested.
Everything seems to work great.

robtillaart:

  1. No, it is comparing with zero that is faster than comparing with a nonzero constant; and % is just expensive.
    I've made a comparison, and using an 'if' instead of % is indeed the reason for the huge difference in speed.

  2. You're welcome,
    There is little room to optimize (you can check this by commenting out the lowest level functions).
    The line() is now a call to rectangle, there might be some gain making it dedicated.
    I decided to leave line() as is for now, when I implement the ability to make diagonal lines, I may revisit the idea of adding dedicated code for straight lines.

The code looks quite good, good layered design, clear function and variable names and very little comments.

some remarks:

  • A point of attention is that there is a begin(), an init() and a clear(), which sounds like one too many
    ==> merge begin() and init() into one. Some of the constants in init() could be parameters for begin(); // #define them.
    I changed it, now you can use the 'simple' begin() which acts just as the begin in the previous version, but it also lets you select whether the display will be inverted or not.
    The second begin() enables you to define inversion and custom Vop, Temperature coefficient and Bias values.

  • swap could be inlined
    Done, forgot about that.

  • remove all the testing of x and y if (x >= LCD_X || y >= LCD_Y) return; or change signature and return FAIL/SUCCESS.
    now the user just doesn't know if a call did something when it returns.
    Made this change as well, will return 1 (PCD8544_SUCCESS) when the function succeeds and 0 (PCD8544_ERROR) if it fails.
    I have not applied the change to setPixel() because it slowed it down noticeably.

  • from write(): if (data < 0x20 || data > 0x7F) return 0; you could also map non-printable data to a space, which might preserve some layout. (design choice)
    Do you mean allow for user defined characters? I've left this out for now, I may implement it in a future version.

  • clear() this->m_Position = 0; is not needed as it is set in gotoXY()
    Oops. Fixed.

  • BufLen is a #define ==> BUFLEN should be used, more consistent style
    Changed.

  • rectangle code could use some explaining.
    Done.

just my 2 cents ,
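
The status-code change described above might look roughly like this. It is a standalone sketch, not the library's code: `Cursor` is a made-up stand-in for the class state, and `LCD_Y = 6` is an assumption based on Y accepting 0 to 5. Only the `PCD8544_SUCCESS` and `PCD8544_ERROR` names and values come from the post.

```cpp
#include <cstdint>

constexpr uint8_t PCD8544_SUCCESS = 1;
constexpr uint8_t PCD8544_ERROR   = 0;
constexpr uint8_t LCD_X = 84;  // columns
constexpr uint8_t LCD_Y = 6;   // assumed: rows of 8 pixels

struct Cursor { uint16_t position = 0; };  // hypothetical stand-in for m_Position

// Returns PCD8544_ERROR and leaves the cursor untouched when out of range.
uint8_t gotoXY(Cursor &c, uint8_t x, uint8_t y)
{
    if (x >= LCD_X || y >= LCD_Y) return PCD8544_ERROR;
    c.position = static_cast<uint16_t>(y) * LCD_X + x;
    return PCD8544_SUCCESS;
}
```

A caller can then test the return value instead of guessing whether the call did anything.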

OK, motivated choices, I like that !

Can you post the new timings you get? (as my measurements might be biased)

Sure, these are the results:

The time it took draw a rect and 3 lines: 1960
The time it took to print 84 chars is:    2316
The time it took to draw a 25x3 (25x18) bitmap is: 1560
The time it took to run setPixel on all 4032 pixels and render it:    18252

Thanks, even faster than my timings :)

assuming commands and data never exceed 255 chars, one could use an 8-bit count instead of a 16-bit one. (or can they?)

void PCD8544_SPI_FB::writeLcd(uint8_t dataOrCommand, const uint8_t *data, uint8_t count)
{
	PORTB = (PORTB & ~0x05) | dataOrCommand;

	for (uint8_t i = count; i > 0; i--)
		SPI.transfer(data[count-i]);

	PORTB |= 0x4;
}
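
An 8-bit count is only safe if no caller ever passes more than 255 bytes. Assuming the frame-buffer variant sends the whole 504-byte buffer through writeLcd in one call (an assumption; the render code is not shown in this thread), the length would be silently truncated, as this small check illustrates. `BUF_LEN` and `truncatedCount` are illustrative names.

```cpp
#include <cstdint>

constexpr uint16_t BUF_LEN = 504;  // 84 * 6 bytes for a full frame

// What an 8-bit `count` parameter would receive if passed the frame length.
constexpr uint8_t truncatedCount = static_cast<uint8_t>(BUF_LEN);

// 504 mod 256 = 248, so fewer than half of the bytes would be sent.
static_assert(truncatedCount == 248, "504 does not fit in 8 bits");
```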

What do you think of the way I decided to manage pin mappings? Any interesting ideas how to improve on that and make it easier to use for less advanced users?

clear() is inefficient, as it makes 504 calls and sets the PORT twice per call (1008 times in total),
try this:

void PCD8544_SPI::clear()
{
	PCD8544_PORT = (PCD8544_PORT & ~PINS_CE_DC) | PCD8544_DATA;
	for (uint16_t i = BUF_LEN; i > 0; i--) SPI.transfer(0x00);
	PCD8544_PORT |= PIN_CE;
	this->gotoXY(0, 0);
}
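
The difference between the two approaches can be counted on a host with a mock. `MockLcd` and its members are invented for this sketch and only count operations; they say nothing about actual timing on the AVR, which still has to be measured with the benchmark sketch.

```cpp
#include <cstdint>

constexpr uint16_t BUF_LEN = 504;

struct MockLcd {
    uint32_t portWrites = 0;  // times the control port register was written
    uint32_t spiBytes   = 0;  // bytes sent over SPI

    void select()          { ++portWrites; }  // CE low, DC set
    void release()         { ++portWrites; }  // CE high
    void transfer(uint8_t) { ++spiBytes; }

    // One chip-select window per byte, as in the clear() being criticised.
    void clearPerByte()
    {
        for (uint16_t i = 0; i < BUF_LEN; i++) { select(); transfer(0x00); release(); }
    }

    // One chip-select window for the whole buffer, as in the suggestion.
    void clearBulk()
    {
        select();
        for (uint16_t i = BUF_LEN; i > 0; i--) transfer(0x00);
        release();
    }
};
```

Both variants send the same 504 bytes; the bulk version writes the port 2 times instead of 1008.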

slightly more code but I expect quite some performance gain, please verify.

I haven't dived into the pin mappings yet.