Function optimisation not behaving as expected. Help please?

Hello all, I am having a hard time finding the best way of optimising a function. Some of the things I have tried do not behave as expected, and I am struggling to understand why.

This is the fastest performing version so far :-

      if (_UTFT->_transparent==0) {
        if (!(_UTFT->orient == LANDSCAPE && _UTFT->display_model == CPLD)) {
          setXY(X, Y, X + Image_Width - 1, Y + Image_Height - 1, 1);
          CD_DATA;
          for (x_1 = 0; x_1 < Image_Height; x_1++) {
            _SPIread(databyte, (im2));
            for (y_1 = 0; y_1 < im2; y_1+=2) {
              write16((databyte[y_1 ] << 8 | databyte[y_1 + 1]));
            }
          }
        }
        else { // CPLD PORTRAIT
          for (y_1 = 0; y_1 < Image_Height; y_1++) {
            _SPIread(databyte, im2);
            setXY(X, Y + y_1, X + Image_Width, Y + y_1, 1);
            CD_DATA;
            for(x_1=im2-2;x_1>0;x_1-=2) {
              write16((databyte[x_1 ] << 8 | databyte[x_1 + 1]));
            }
          }
        }
      }
      else
      { // Transparent
        for (y_1 = 0; y_1 < Image_Height; y_1++) {
          _SPIread(databyte, im2);
          for (x_1 = 0; x_1 < im2; x_1+=2) {
            if (((databyte[x_1] << 8) + databyte[x_1 + 1]) != 0) {
              setXY(x_1 + X, y_1 + Y, x_1 + X, y_1 + Y, 0);
              WriteData((databyte[x_1] << 8 | databyte[x_1 + 1]));
            }
          }
        }
      }
      if (result < flash1) sbi(P_F_CS, B_F_CS);
      else if (result >= flash1 && (result < (flash1 + flash2))) sbi(P_F_CS2, B_F_CS2);
      else sbi(P_F_CS3, B_F_CS3);

Here are the performance figures for the function shown above for an image size of 320x180

425 frames drawn in 18067.91ms =23.52 FPS
425 frames drawn in 18067.94ms =23.52 FPS
425 frames drawn in 18067.94ms =23.52 FPS
425 frames drawn in 18067.94ms =23.52 FPS
425 frames drawn in 18067.94ms =23.52 FPS

None of the images I wish to draw in the current test use transparency, so you might assume that removing all of the transparency conditions, as shown here, would speed things up. Instead it slows things down dramatically. Why?

        if (!(_UTFT->orient == LANDSCAPE && _UTFT->display_model == CPLD)) {
          setXY(X, Y, X + Image_Width - 1, Y + Image_Height - 1, 1);
          CD_DATA;
          for (x_1 = 0; x_1 < Image_Height; x_1++) {
            _SPIread(databyte, (im2));
            for (y_1 = 0; y_1 < im2; y_1+=2) {
              write16((databyte[y_1 ] << 8 | databyte[y_1 + 1]));
            }
          }
        }
        else { // CPLD PORTRAIT
          for (y_1 = 0; y_1 < Image_Height; y_1++) {
            _SPIread(databyte, im2);
            setXY(X, Y + y_1, X + Image_Width, Y + y_1, 1);
            CD_DATA;
            for(x_1=im2-2;x_1>0;x_1-=2) {
              write16((databyte[x_1 ] << 8 | databyte[x_1 + 1]));
            }
          }
        }
      if (result < flash1) sbi(P_F_CS, B_F_CS);
      else if (result >= flash1 && (result < (flash1 + flash2))) sbi(P_F_CS2, B_F_CS2);
      else sbi(P_F_CS3, B_F_CS3);

Here are the performance figures for the function shown above for an image size of 320x180

425 frames drawn in 22707.50ms =18.72 FPS
425 frames drawn in 22707.48ms =18.72 FPS
425 frames drawn in 22707.48ms =18.72 FPS
425 frames drawn in 22707.48ms =18.72 FPS
425 frames drawn in 22707.48ms =18.72 FPS

Regards,

Graham

That's not really a dramatic slowdown.

You'll have to look at the assembler generated and compare the two to get an insight. Typically, rearranging code means the assignment of variables to registers/stack locations will change, and that can affect the speed of inner loops using those variables.

MarkT:
That's not really a dramatic slowdown.

26% is pretty dramatic........
But, thank you, I was beginning to come to the conclusion I would need to dig around in the assembler output.
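
For reference, the 26% is the increase in total drawing time between the two runs above; measured as a drop in frame rate the same change comes out smaller. A minimal sketch of the arithmetic (the helper names are made up for illustration):

```cpp
// Hypothetical helpers: relative change between two measurements, in percent.
double percentIncrease(double before, double after) {
  return (after - before) / before * 100.0;
}

double percentDecrease(double before, double after) {
  return (before - after) / before * 100.0;
}

// Using the figures posted above:
//   percentIncrease(18067.94, 22707.48)  -> time for 425 frames, about 25.7% longer
//   percentDecrease(23.52, 18.72)        -> frame rate, about 20.4% lower
```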

So you have three options to actually write data.
Can you annotate the original code to explain the purpose of each?
Actually, some comments in general would be nice.
What does CD_DATA (a macro?) do?
It is likely that the display formatting (landscape / portrait and rotation) adds some more code to process.
Sorry I cannot be of more help; my USB video to LCD is still not working, I am getting empty packets from USB.
Jim

Was USB to LCD something I was involved with? It is not ringing any bells....

ghlawrence2000:
Was USB to LCD something I was involved with? It is not ringing any bells....

No, that was just a comment.

Out of curiosity - are you going for full streaming video or "just" retrieving data from SD?

Well, funny you should ask.

It started a few weeks ago, when a forum user was quite pleased with herself that she was able to get 4FPS out of the DUE, when in fact I have a clip on YouTube, around 2 years old, showing 'video' at 5.7FPS. I was just curious what I could achieve now with the improvements I have made to my UTFT_GHL library. This uses SPI Flash as storage, and the absolute best to date is the 23.5FPS above, which is taken from 425 stills.

Thanks for the note.
May I ask what forum you are referring to?
I just rebuilt my debugging scheme and ended up with the same result.
I am getting data from the USB pipe (video camera), but it is missing "the header" info.
I sure would like to talk to someone about it, since the author of the article I have been using is no longer around.
This USB stuff is very interesting. The piece parts in the article are OK, but the overall flow is where I am getting lost.
Back to the books, I guess.
Jim

This seems like what you are after? It is beyond my interests but may help you...

Regards,

Graham

Well, at a guess there are a whole bunch of transparent frames (or whatever), so naturally when you don't skip them it takes longer.

You could keep a count of how many do and don't match and print that to serial to confirm.
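
A rough sketch of such a counter, assuming each row arrives as big-endian 16-bit pixels in a byte buffer like databyte in the code above. PixelTally and tallyRow are made-up names for illustration, not part of UTFT_GHL:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tally of pixels the transparent branch would draw
// (non-zero) versus skip (zero).
struct PixelTally {
  uint32_t drawn = 0;
  uint32_t skipped = 0;
};

// row holds two bytes per pixel, high byte first, as read from flash.
void tallyRow(const uint8_t *row, size_t byteCount, PixelTally &tally) {
  for (size_t i = 0; i + 1 < byteCount; i += 2) {
    uint16_t pixel = static_cast<uint16_t>((row[i] << 8) | row[i + 1]);
    if (pixel != 0) {
      ++tally.drawn;    // the transparent branch would plot this pixel
    } else {
      ++tally.skipped;  // the transparent branch would skip this pixel
    }
  }
}
```

In the sketch you would call something like tallyRow(databyte, im2, tally) after each _SPIread and print tally.drawn and tally.skipped with Serial.println once the frame is finished, so the printing does not distort the timing.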

If you want to speed up your code, you would write out chunks of data rather than one word at a time. Also, I bet write16() is pulling apart the word into its separate bytes to write them. What a waste, when you already have the bytes as bytes in databyte. If the bytes in databyte are in the right order, you can probably just emit the whole row in one go.