Pages: 1 [2]   Go Down
Author Topic: Byte mirroring  (Read 2363 times)
0 Members and 1 Guest are viewing this topic.
Brebu
Offline Offline
Sr. Member
****
Karma: 1
Posts: 251
New to Arduino
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

this is the function to draw the char
Code:
uint8_t S65Display::drawChar(uint8_t x, uint8_t y, char c, uint8_t size, uint16_t color, uint16_t bg_color)
{
uint8_t ret;
uint8_t data, mask;
uint8_t i, j, width, height;
const prog_uint8_t *ptr;
i  = (uint8_t)c;
ptr= &font[(i-FONT_START)*(8*FONT_HEIGHT/8)];
width  = FONT_WIDTH;
height = FONT_HEIGHT;
if(size <= 1)
{
ret = x+width;
if(ret > S65_WIDTH) {return S65_WIDTH+1;}
setArea(x, y, (x+width-1), (y+height-1));
s65_drawStart();
#ifdef S65_L2F50
for(; height!=0; height--)
{
data = (pgm_read_byte(ptr) * 0x0202020202ULL & 0x010884422010ULL) % 1023;ptr+=1;//mirror bits
for(mask=(1<<(width-1)); mask!=0; mask>>=1)
{
if(data & mask)
s65_draw(color);
else
s65_draw(bg_color);
}
}
#else
for(; height!=0; height--)
{
data = pgm_read_byte(ptr); ptr+=1;
for(mask=(1<<(width-1)); mask!=0; mask>>=1)
{
if(data & mask)
s65_draw(color);
else
s65_draw(bg_color);
}
}
#endif
s65_drawStop();
}
else
{
ret = x+(width*size);
if(ret > S65_WIDTH) {return S65_WIDTH+1;}
s65_setArea(x, y, (x+(width*size)-1), (y+(height*size)-1));
s65_drawStart();
for(; height!=0; height--)
{
data = pgm_read_byte(ptr); ptr+=1;
for(i=size; i!=0; i--)
{
for(mask=(1<<(width-1)); mask!=0; mask>>=1)
{
if(data & mask)
{
for(j=size; j!=0; j--)
{
s65_draw(color);
}
}
else
{
for(j=size; j!=0; j--)
{
s65_draw(bg_color);
}
}
}
}
}
s65_drawStop();
}
return ret;
}
Logged

Phoenix, Arizona USA
Offline Offline
Faraday Member
**
Karma: 36
Posts: 5519
Where's the beer?
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
likely someone in the past 40-odd years (or greater if you want to include the pre-personal computing era) has already run into the issue, and a solution already exists;
many interesting algorithms were invented/designed in the 60ies - most memorable Quicksort in 1960(!)


Definitely - some of the best stuff in computing, both hardware and software, was developed during that decade; many of the things we think of as "advanced" in a PC were actually available on mainframes of the era. The advance is in the miniaturization, and comoditization of these developments (which certainly shouldn't be discounted).

 smiley
Logged

I will not respond to Arduino help PM's from random forum users; if you have such a question, start a new topic thread.

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12465
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@lefty
CHAR-bit is #bits in a char so it must be 8. eg  #define CHAR_BIT 8

(tested for the first 100 values not tested for other datatypes
Code:
void setup()
{
  Serial.begin(115200);
  Serial.println("I am Arduino");
}

void loop()
{
  for (unsigned int i=0; i < 65535; i++)
  {
    unsigned int v = i;
    Serial.print(v, BIN);
    Serial.print(' ');
    unsigned int s = sizeof(v) * 8;   // bits/byte; must be power of 2
    unsigned int mask = ~0;         
    while ((s >>= 1) > 0)
    {
      mask ^= (mask << s);
      v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    Serial.println(v, BIN);
  }
}
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12465
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@Cr0sh
Quote
Where's the beer?
http://maps.google.nl/maps?f=q&source=s_q&hl=nl&geocode=&q=beer+England&aq=&sll=52.469397,5.509644&sspn=3.681428,13.392334&ie=UTF8&hq=&hnear=Beer,+Seaton,+Verenigd+Koninkrijk&ll=50.709504,-3.094368&spn=0.119576,0.41851&z=12
smiley
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

UK
Offline Offline
Faraday Member
**
Karma: 16
Posts: 2883
Gorm deficient
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

A very lovely village, but a long way from Sandwich
Logged

Per Arduino ad Astra

Ontario
Offline Offline
God Member
*****
Karma: 20
Posts: 835
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

 
Code:
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

This requires 3 long multiplications plus some 32-bit logical ops. On AVR I would stand either of my solutions up against this.  The optimal solution is likely a lookup table which would do the job in something like 4 instructions.
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12465
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;

This requires 3 long multiplications plus some 32-bit logical ops. On AVR I would stand either of my solutions up against this.  The optimal solution is likely a lookup table which would do the job in something like 4 instructions.

You are probably right, optimal allways depends on context, that might be size or speed or something in between.

- a lookuptable would take 256 bytes and addressing it is (beginaddress + index) so that is optimized for speed. For the Arduino 256 bytes is 13% of the RAM (or store it in prog_mem). Which version is optimized for size I don't know, I think the loop enrolled version is a good compromise of speed and size.
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Brebu
Offline Offline
Sr. Member
****
Karma: 1
Posts: 251
New to Arduino
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Once again thank you all for the help provided

I have found an easier way to resolve my problem by generating an rotated + mirrored font file

this is a great toll
http://www.min.at/prinz/software/pixelfont/
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12465
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

A small test for different methods, (IDE 21, Atmel 328, windows 7)

Code:
void setup()
{
  Serial.begin(115200);
}

void loop()
{
  unsigned long start = micros();
  for (int i=0; i<256; i++)
  {
    volatile uint8_t b = i;
    b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
    b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 2758
Time: 6408  usec (512 flips) => 12.5 usec =  200 clockcycles / flip


Code:
void setup()
{
  Serial.begin(115200);
}

void loop()
{
  unsigned long start = micros();
  for (int i=0; i<256; i++)
  {
    volatile uint8_t b = i;
    b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
    b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 6872
Time: 151804 usec (512 flips)  => 296.5 usec per flip = 4744 clockcycles / flip

Although the formula is shorter the use of ULL datatype in combination with modulo makes this far bigger and slower



Code:
void setup()
{
  Serial.begin(115200);
}

// big lookup table
uint8_t x[256] = { 0,128,64,192,32,160,96,224,16,144,80,208,48,176,112,240,8,136,72,200,40,168,104,232,24,152,88,216,56,184,120,248,4,132,68,196,36,164,100,228,20,148,84,212,52,180,116,244,12,140,76,204,44,172,108,236,28,156,92,220,60,188,124,252,2,130,66,194,34,162,98,226,18,146,82,210,50,178,114,242,10,138,74,202,42,170,106,234,26,154,90,218,58,186,122,250,6,134,70,198,38,166,102,230,22,150,86,214,54,182,118,246,14,142,78,206,46,174,110,238,30,158,94,222,62,190,126,254,1,129,65,193,33,161,97,225,17,145,81,209,49,177,113,241,9,137,73,201,41,169,105,233,25,153,89,217,57,185,121,249,5,133,69,197,37,165,101,229,21,149,85,213,53,181,117,245,13,141,77,205,45,173,109,237,29,157,93,221,61,189,125,253,3,131,67,195,35,163,99,227,19,147,83,211,51,179,115,243,11,139,75,203,43,171,107,235,27,155,91,219,59,187,123,251,7,135,71,199,39,167,103,231,23,151,87,215,55,183,119,247,15,143,79,207,47,175,111,239,31,159,95,223,63,191,127,255};


void loop()
{
  unsigned long start = micros();
  for (int i=0; i<256; i++)
  {
    volatile uint8_t b = i;
    b = x[b];
    b = x[b];
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 2804
Time:  436 usec (512 flips)  => 0.85 usec per flip = 13.6 clockcycles / flip

Really fast as expected, but not the smallest footprint so far

Code:
void setup()
{
  Serial.begin(115200);
}

// small lookup table - 4bit level iso 8 bit
// notice the pattern between the first 8 and the latter 8 in the table
uint8_t xx[16] = { 0,8,4,12,  2,10,6,14,  1,9,5,13,  3,11,7,15};

void loop()
{
  unsigned long start = micros();
  for (int i=0; i<256; i++)
  {
    volatile uint8_t b = i;
    b = xx[b & 0x0F] << 4 | xx[b>>4];
    b = xx[b & 0x0F] << 4 | xx[b>>4];
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 2624
Time:  1468 usec (512 flips)  => 2.86 usec per flip = 46 clockcycles / flip

Three times slower than the fastest, but still second fastest.
With 2624 the smallest in footprint so far.


Code:
void setup()
{
  Serial.begin(115200);
}

void loop()
{
  unsigned long start = micros();
  for (int j=0; j<256; j++)
  {
    volatile uint8_t value = j;
    volatile uint8_t ret = 0;
    for (int i = 0; i < 8; i++)
      if ((value & (uint8_t)(1 << i)) != 0) ret += (uint8_t)(1 << (7 - i));
    value = ret;
    for (int i = 0; i < 8; i++)
      if ((value & (uint8_t)(1 << i)) != 0) ret += (uint8_t)(1 << (7 - i));
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 2676
Time:  14008 usec (512 flips)  => 27.4 usec per flip = 438 clockcycles / flip

Good footprint, but relative slow.


Code:
void setup()
{
  Serial.begin(115200);
}

void loop()
{
  unsigned long start = micros();
  for (int j=0; j<256; j++)
  {
    volatile uint8_t in = j;
    uint8_t out = 0;
    if (in & 0x01) out |= 0x80;
    if (in & 0x02) out |= 0x40;
    if (in & 0x04) out |= 0x20;
    if (in & 0x08) out |= 0x10;
    if (in & 0x10) out |= 0x08;
    if (in & 0x20) out |= 0x04;
    if (in & 0x40) out |= 0x02;
    if (in & 0x80) out |= 0x01;
    in = out;
    out = 0;
    if (in & 0x01) out |= 0x80;
    if (in & 0x02) out |= 0x40;
    if (in & 0x04) out |= 0x20;
    if (in & 0x08) out |= 0x10;
    if (in & 0x10) out |= 0x08;
    if (in & 0x20) out |= 0x04;
    if (in & 0x40) out |= 0x02;
    if (in & 0x80) out |= 0x01;
  }
  Serial.println(micros() - start);
  while (1);
}
Size : 2596
Time:  996 usec (512 flips)  => 1.95 usec per flip = 32 clockcycles / flip

Smallest footprint, second in performance

So the 256 byte lookup table is fastest but the enrolled loop is very fast too and has a 200byte smaller footprint.

All disclaimers apply smiley

Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Ontario
Offline Offline
God Member
*****
Karma: 20
Posts: 835
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

You need a control in which you calculate the computational complexity of the loop overhead and then factor that out of the other measurements.  Your figure of 13.6 clockcycles/flip for the table lookup includes the overhead of incrementing and comparing the loop variable, the conditional branch at the top and the branch back from the bottom.  That's, what? 4 instructions -- comparable to the loop contents.

And the table lookup should be to progmem -- making it at least one instruction longer.  The way you have it, it takes 256 bytes of ROM *and* another 256 bytes of RAM.  It is probably the worst approach since RAM is so scarce.

Anyhow, nice work and thank you for backing up my claims!
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12465
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

You're welcome, was kinda fun

You are 100% right that these tests are only indicative as overhead is included, but that said, they are indicative enough  smiley

Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Offline Offline
Newbie
*
Karma: 0
Posts: 1
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

You could try this
byte data = ???;
byte tmp = 0;      // temporary variable to hold mirror of data
for ( int i =0; i <8; i++ ) bitWrite( tmp, 7-i, bitRead( data, i ) );

data = tmp;
Logged

Pages: 1 [2]   Go Up
Jump to: