Please help me reduce code size (and speed it up too)

Hi everybody!

I'm programming a scientific RPN-calculator for an ATTINY85 and a QYF-TM1638-board (8 digit LED display with 16 buttons).

So far the prototype works surprisingly well and is rich in features.

Now I ran out of memory (code size > 8k):
The attached code compiles to 7642 bytes, and I want to add at least trigonometric functions (which need approximately 1500 bytes) and, if possible, hyperbolic functions, some screensaver/sleep functions and other mathematical functions (e.g. converting between polar and rectangular coordinates).

I have spent many hours avoiding libraries and following many "code size reduction hints", but I don't know what really costs bytes. So far I see only a few ways to save code, and I can't judge whether they are worth pursuing:

  • Reprogram EEPROM.h (I found no hints on how to code this)
  • Using a few flag variables (instead of boolean bytes) and manipulating their bits (&|~).
  • Reducing the number of mathematical functions used (e.g. exp(0.5*log(x)) instead of sqrt(x))
  • Using hand-written Taylor series instead of library math functions

I would really appreciate any idea or hint to save bytes.
As the code is "slow-moving" on the ATTINY I would appreciate hints for speeding up too.

Thanks in advance for any help and regards
deetee

sarc.ino (23.2 KB)

Reprogram EEPROM.h

For what purpose?

Using a few flag variables (instead of boolean bytes) and manipulating their bits (&|~).

That may reduce SRAM requirements, at the cost of more flash space.
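
For readers who have not used bit flags before, here is a minimal sketch of the idea. The flag names (FLAG_SHIFT, FLAG_DEG, FLAG_ERROR) and the helper functions are hypothetical and not taken from sarc.ino. One byte of SRAM holds up to eight flags, but every access becomes a mask operation, which is where the extra flash mentioned above would come from.

```cpp
#include <stdint.h>

// Hypothetical flag names for illustration; not taken from sarc.ino.
const uint8_t FLAG_SHIFT = 1 << 0;  // shift key pending
const uint8_t FLAG_DEG   = 1 << 1;  // angle mode is degrees
const uint8_t FLAG_ERROR = 1 << 2;  // last operation failed

static uint8_t flags = 0;  // one byte of SRAM holds up to eight flags

// Set, clear and test a single flag with |, & and ~.
inline void setFlag(uint8_t f)   { flags |= f; }
inline void clearFlag(uint8_t f) { flags &= (uint8_t)~f; }
inline bool testFlag(uint8_t f)  { return (flags & f) != 0; }
```

Whether this is a net win depends on how often the flags are touched, so it is worth comparing the compiled size before and after.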

Reducing the number of mathematical functions used (e.g. exp(0.5*log(x)) instead of sqrt(x))

I instantly recognize what sqrt() does. I have to think about whether exp(0.5*log(x)) does the same thing. YMMV.
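
For what it's worth, the two expressions do agree for positive x, because log(sqrt(x)) = 0.5*log(x). A small sketch, assuming a hypothetical wrapper name (sqrtViaExpLog) and an arbitrary choice to return 0 for non-positive input:

```cpp
#include <math.h>

// sqrt(x) rewritten as exp(0.5 * log(x)); only meaningful for x > 0.
// sqrtViaExpLog is a hypothetical name, not part of any library.
float sqrtViaExpLog(float x) {
  if (x <= 0.0f) return 0.0f;  // assumption: treat non-positive input as 0
  return (float)exp(0.5 * log(x));
}
```

This can only save flash if exp() and log() are linked in anyway, and it will usually be slower and slightly less accurate than sqrt().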

Using hand-written Taylor series instead of library math functions

You think you are a better programmer than the person that wrote sqrt() or exp() or sin() or cos()?

You might try switching from the built-in digitalWrite and shiftOut to Mikael Patel's GPIO library. It is much faster and smaller. It has a derived SRPIO class that implements I/O shift registers. The declaration would look like this:

#include <GPIO.h>
#include <SRPIO.h>

GPIO<BOARD::D9>    STROBE; // Strobe pin // Arduino/Genuino Micro
const BOARD::pin_t CLOCK  = BOARD::D8; // Clock pin
const BOARD::pin_t DATA   = BOARD::D7; // Data pin
//#define DATA   2 // 5 green    // ATTINY85
//#define CLOCK  1 // 6 yellow   // HW-pins: 8 7 6 5  SW-pins: 3 2 1 0
//#define STROBE 0 // 7 orange   //          1 2 3 4           4 5 6 7

SRPIO<LSBFIRST, DATA, CLOCK> display;  // Assumes DATA pin has an external pull-up resistor

... and the usage looks like this:

void cmd(byte val) { // Send command to shield
  STROBE.low();
    display.write( val );
  STROBE.high();
}

This saves more than 800 bytes of program space (see attached).

As the code is "slow-moving" on the ATTINY I would appreciate hints for speeding up too.

You should get rid of the delay in getKey. Instead, use a polling technique like that used by the Bounce2 library. Save a millis timestamp when the keys change, then compare that timestamp later to see if the keys stabilize. To understand the concept, read these:
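
The timestamp idea can be sketched roughly like this. It is not the Bounce2 code, and all names (KeyDebouncer, DEBOUNCE_MS, update) are hypothetical. The caller passes in the raw key code and the current millis() value on every pass through loop(), so nothing ever blocks.

```cpp
#include <stdint.h>

// Non-blocking debounce: a raw reading must stay unchanged for
// DEBOUNCE_MS before it is accepted as the stable key.
// All names are hypothetical; getKey in the sketch is structured differently.
const uint16_t DEBOUNCE_MS = 20;  // assumed settle time

struct KeyDebouncer {
  uint8_t  lastRaw;    // most recent raw reading
  uint8_t  stable;     // last accepted (debounced) reading
  uint32_t changedAt;  // timestamp of the last raw change

  // Returns true exactly once when a new stable value has been accepted.
  bool update(uint8_t raw, uint32_t now) {
    if (raw != lastRaw) {  // reading changed: restart the timer
      lastRaw = raw;
      changedAt = now;
      return false;
    }
    if (raw != stable && (uint32_t)(now - changedAt) >= DEBOUNCE_MS) {
      stable = raw;        // unchanged long enough: accept it
      return true;
    }
    return false;
  }
};
```

In a sketch this would be used as something like `if (keys.update(rawKey, millis())) { handle(keys.stable); }`, where rawKey and handle stand in for whatever the program already has.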

Cheers,
/dev

deetee.ino (23 KB)

Hello -dev!

Thanks for your fast and very professional response.

You gave me new hope of mastering my challenge, even if your approach is completely new territory for me and will keep me busy for the next few weekends :)

Regards
deetee

You could put seldom-used constants and error message strings in EEPROM. Retrieving data from EEPROM is a lot slower than from SRAM or flash, and frequent writes will wear out the cells, but reading has little or no effect.

@outsider:
Thanks for this hint. I have planned some error messages, and storing their text in EEPROM (as I did with the physical constants) is a good idea.

@dev:
I tried your GPIO hint and it really does seem to save a few hundred bytes. Unfortunately the I/O (printbuf and getbuttons) does not work at all. I found two possible reasons:

  • I had to install delay_basic.h (from avr-libc-master) to get GPIO compiled.
  • A comment says that SRPIO assumes a pull-up resistor on the DATA pin. I am not very familiar with this. Do I have to connect the DATA pin to Vcc through a resistor (e.g. 10k)?

What do you think? Do you see other reasons why the I/O does not work?

TIA
deetee

deetee:

  • I had to install delay_basic.h (from avr-libc-master) to get GPIO compiled.

Hmm... The build for an UNO seemed to work fine, but the ATtiny x5 build did not. It probably doesn't matter how you got it to compile, because _delay_loop_2 is only used by the pulse method. You don't use that method, so it's not even linked in.

If you want to investigate this further, you'll need to post the exact sketch, the errors you get, and the core you are building with.

  • A comment says that SRPIO assumes a pull-up resistor on the DATA pin. I am not very familiar with this. Do I have to connect the DATA pin to Vcc through a resistor (e.g. 10k)?

Yes. When it is possible for two devices to "transmit" at the same time, you should avoid the possibility for one device to write a HIGH while another device writes a LOW. The SRPIO class only writes a LOW. For transmitting a HIGH, it puts the DATA pin in INPUT mode, assuming that an external resistor "pulls" the wire to a HIGH state. If all devices take that approach (aka open-drain), they will never "fight" over whether the line should be HIGH or LOW. Devices drive LOW, resistor pulls HIGH.
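
To make the "devices drive LOW, resistor pulls HIGH" rule concrete, here is a small model that runs on a PC rather than on the hardware. OpenDrainLine and its methods are hypothetical names for illustration only and are not part of the GPIO library.

```cpp
#include <stdint.h>

// Host-side model of an open-drain line with a pull-up resistor.
// Each device either drives LOW or releases the line (input mode).
// OpenDrainLine is a hypothetical illustration, not library code.
struct OpenDrainLine {
  uint8_t drivingLow;  // bit i set = device i is pulling the line LOW

  void driveLow(uint8_t dev) { drivingLow |= (uint8_t)(1u << dev); }
  void release(uint8_t dev)  { drivingLow &= (uint8_t)~(1u << dev); }

  // The resistor pulls the line HIGH only when no device drives LOW.
  bool level() const { return drivingLow == 0; }
};
```

With the standard Arduino core, "release" roughly corresponds to pinMode(pin, INPUT), and "drive LOW" to pinMode(pin, OUTPUT) with the output latch set LOW. No device ever drives HIGH, so two devices can never short the line.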

If you really don't want to do that, I could describe the mods to the SRPIO class. I wouldn't be able to try it, of course.

My progress so far:

  • GPIO/SRPIO
    I tried /dev's code with a pullup resistor (10k between DATA-pin and Vcc) - but unfortunately without success. See attached code. This method could save me 400 bytes - so that is the most promising way to reduce the code size.

  • main/init/while instead of setup/loop
    Replacing the setup/loop code with the following main/init/while construction saves me 90 bytes. Unfortunately I instantly lose the serial connection after upload and have to reset the Arduino (connect GND and RST) to upload again. Inconvenient, but it works, and in the end I will not need a serial connection for my calculator.

int main(void) {
  init();
  {
    // Setup code
  }
  while(1) {
    // Loop code
  }
}

TIA for any hint/help
deetee

180118_sarc_x_dev.ino (24.3 KB)

  • GPIO/SRPIO
    I tried /dev's code with a pullup resistor (10k between DATA-pin and Vcc) - but unfortunately without success.

Here is a SRPIO2 class that does not require the pullup resistor. It forces the data line to output mode whenever it writes a value, then leaves the data line in input mode. If read is ever called, it is still in input mode. Matching sketch attached.

If this still isn't working, you might try sending each written or read byte to Serial as a HEX value, just to make sure I didn't mess up the calls. I would think the STROBE line code is ok.

If you have a scope or logic analyzer, this would be a good time to take a look. A second arduino could also be used to double check what the sketch is doing.

SRPIO2.h (2.87 KB)

deetee2.ino (23 KB)

Speed up

double pow10(int8_t e) { // Returns 10^e
  double f = 1.0;
  for (byte i = 0; i < abs(e); i++) e >= 0 ? f *= 10 : f /= 10;
  return (f);
}

==> almost 2x faster (similar size)

double pow10(int8_t e)  // Returns 10^e
{
  bool ne = (e < 0);  		// negative exponent
  double f = 1.0;
  for (byte i = abs(e); i > 0; i--) f *= 10 ;
  if (ne) f = 1/f;
  return (f);
}

==> almost 3x faster (but definitely bigger)

double pow103(int8_t e)  // Returns 10^e
{
  bool ne = (e < 0);  // negative exponent
  double f = 1.0;
  byte i = abs(e);
  while ( i >= 3) { i-=3; f *= 1000; }
  while ( i-- > 0) f *= 10 ;
  if (ne) f = 1 / f;
  return (f);
}

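Division is more expensive than multiplication, so divide once at the end instead of once per loop pass.
One multiplication by a big factor (1000) is faster than several by a small one (10).
(Speed-ups tested on an ATmega328.)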

Hi all!

@dev
Thanks for making another library which doesn't need a pullup-resistor. Unfortunately the compiler gives an error
"SRPIO2 does not name a type"
when compiling the line
SRPIO2<LSBFIRST, DATA, CLOCK> display;
(??)

@robtillaart
That is very kind of you to optimize and test my subroutines. I am going to use your second solution, which is fast and short. Unfortunately my program slows down when using menus. Thanks to a hint from /dev, I could speed it up by getting rid of the DEBOUNCE delay in the getkey routine. I think I have to review and rebuild the menu routines (functions, physical constants) too.

Regards
deetee

Hi all!

@dev
I found a typo when linking to SRPIO2.h
Now it compiles - but without success (nothing to see on the display and no reaction when reading a val from keyboard).

Sorry that I am not able to help myself, but I am not familiar with this kind of software (and I don't own things like a logic analyzer).

Regards
deetee

I found a typo when linking to SRPIO2.h
Now it compiles - but without success (nothing to see on the display and no reaction when reading a val from keyboard).

Ok, here's a version of SRPIO that leaves the data pin in the output state. It is possible that SRPIO2 switched to the input state too soon after clocking out the last bit written. Maybe the "hold time" on that last bit was too short.

With this file, change the include statement and the template instantiation line in your sketch to "SRPIO3".

Cheers,
/dev

SRPIO3.h (2.93 KB)

Hello /dev!

Thank you very much for your patience and work - but still no success.

Unfortunately I cannot tell what's wrong. It compiles, but the display remains dark and the value read from the keyboard (val) does not change when I press keys.

Right now I am working on other frontiers (will post the code when finished):

  • Using the EEPROM for storing menus and constants (doesn't work on ATTINY ??)
  • Writing subroutines for the trigonometric functions sin and atan (similar Taylor series); the other trigonometric functions can be calculated from sin or atan.
  • Rewriting menu selection routine.

Unfortunately this slows down the ATTINY dramatically :(

Regards
deetee

Hello /dev!

At least I found out that display.write (the shiftOut substitute) confuses the board completely. It even forgets the port, so I have to reset the Arduino by hand (connect RST to GND).

Setting ports (STROBE.low() and STROBE.high()) seems to work.

Maybe that helps.

I also found out that the regular shiftIn command slows the code down extremely. So your code would help me twice over, if it works.

Regards
deetee

At least I found out that display.write (shiftOut-substitute) confuses the board completely - even forgets the Port, so I have to reset the arduino by hand (connect RST with GND).

Well, that's too bad... After looking at another library, the only differences I can see are

(1) The other library sets the data line to LOW after clocking a byte out. You could do the same thing at the end of the write method:

 void write(uint8_t value)
  {
    if (BITORDER == LSBFIRST) {
         ...
    }
    m_data.low();    // <--- add this line
  }

This library does not do that, so I'm skeptical.

(2) The GPIO library is 10 to 100 times faster, so the pulses are much faster. The length of your wires and how well they are connected could make a difference. This seems likely, because it sounds like the data written is corrupt.

The library from MikesModz also does direct port manipulation, so it is also fast. You could try it, but those files are not for the Arduino IDE. For example, the main program is called "main", not "setup" and "loop". You could try porting it to see if it works or doesn't.

After a quick look at the spec, I'll guess that GPIO is too fast. You'll have to slow down the GPIO library. :-/

You can use _delay_loop_1( n ) to delay for (n * 3 * 1000000)/F_CPU microseconds. That function is declared in <util/delay_basic.h>.

For example, to delay for 1us on a 16MHz (?) system, pass in a 6. Some timings:

  • The data line must be held for 0.1us before toggling the clock (Tsetup)
  • The data line must be held for 0.1us after toggling the clock (Thold)
  • The clock cannot be toggled faster than 0.4us (PWclk)
  • There must be 1us between consecutive bytes (Twait)
  • The strobe must be held for 1us after the last data bit (Tclk-stb).
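
The loop counts for these timings follow from the formula above. A minimal sketch, assuming a hypothetical helper delayLoops (it is not part of avr-libc) that rounds up and ignores call overhead, so the real delay comes out slightly longer than requested:

```cpp
#include <stdint.h>

// Number of _delay_loop_1() iterations needed to wait at least `ns`
// nanoseconds, at 3 CPU cycles per iteration (rounded up).
// delayLoops is a hypothetical helper; it is only meant for short
// delays with ns > 0, because the result must fit in a uint8_t.
constexpr uint8_t delayLoops(uint32_t ns, uint32_t fCpuHz) {
  return (uint8_t)(((uint64_t)ns * fCpuHz + 2999999999ULL) / 3000000000ULL);
}
```

At 16 MHz this gives 6 loops for 1 us, which matches the value mentioned above, and 1 loop for 0.1 us. On the AVR it would be used as `_delay_loop_1(delayLoops(1000, F_CPU));`.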

The SRPIO write method should look like this:

 // Needs #include <util/delay_basic.h> for _delay_loop_1 (3 CPU cycles per count)
 void write(uint8_t value)
  {
    if (BITORDER == LSBFIRST) {
      uint8_t mask = 1;
      do {
        m_data = value & mask;
        _delay_loop_1( 1 ); // guarantee Tsetup
        m_clock.toggle();
        mask <<= 1;
        _delay_loop_1( 2 ); // guarantee PWclk
        m_clock.toggle();
        _delay_loop_1( 1 ); // guarantee Thold
      } while (mask);
    }
    else {
      uint8_t mask = 0x80;
      do {
        m_data = value & mask;
        _delay_loop_1( 1 ); // guarantee Tsetup
        m_clock.toggle();
        mask >>= 1;
        _delay_loop_1( 2 ); // guarantee PWclk (same count as the LSBFIRST branch)
        m_clock.toggle();
        _delay_loop_1( 1 ); // guarantee Thold
      } while (mask);
    }
    _delay_loop_1( 5 ); // guarantee Twait and Tclk-stb
  }

Maybe? Along with short wires/good connections, this might be the problem.

Cheers,
/dev

Hi /dev!

Thanks for all of your work and hints.
Unfortunately I can not get your library to work.
Trying your _delay_loop_1 hint led to a total disaster. The Arduino and the board got so confused that I had to reboot my computer and reset the Arduino several times, and I had to upload another program to get the system back up and stable.

Tuning your high level software is far beyond my programming level. I admire your skills.

I am very close to giving up. :(

On the other hand, I had a sense of achievement in speeding up my ATTINY. I didn't know that it was running at 1 MHz. Burning a bootloader that supports 16 MHz speeds up my code drastically.

So speed now has a lower priority than reducing the code size. :)

Regards
deetee

The arduino and board got extremely confused that I have to reboot my computer and reset the arduino some times - and had to upload some other program to get the system up and in balance.

You know, this sounds like you have a "board" issue, like you're building for the wrong platform.

The GPIO library has conditionals for various MCUs that control how the pin registers should be for that platform. If it uses the wrong MCU definitions, it could be writing values in the Wrong Places. So.

What core are you using?
What board did you select in the IDE?
Did you modify boards.txt to have your own entry?
Show us a schematic.

I started wondering when you mentioned the 1MHz vs 16MHz clock speed. There could be other problems.

I would also suggest posting a minimal sketch that works with digitalWrite/shiftOut. For example, just clear the screen and write a message.

Then try that with GPIO. Just use GPIO for the STROBE line, then try SRPIOx for the DATA/CLOCK lines. Attach the sketch that doesn't work.

Hi /dev!

Thank you for all of your engagement and help.

I think your strategy to find out what's not fitting together is excellent.
But please understand that this seems to be far beyond my capabilities and knowledge.

So for now I will concentrate on "traditional code improvement" and keep your strategy in mind as a joker.
For instance, I found that I can save a lot of bytes by coding the exponential function as a Taylor series, so I now have room for the essential trigonometric functions (my first success!).

Regards
deetee