Execution speed of a function

Hello

I measured the execution speed of a function and came to something surprising, at least to me. The first time the function is executed, it takes longer than the next times. The difference is significant to raise my eyebrow. I tried googling but it's all irrelevant stuff and I'm not sure how exactly to google it.

Below is the speed results. First column is number of clock cycli it took to complete, without prescaler. Second one is not important.


My only guess is that the needed registers get loaded the first time and can be kept there. Not sure though. And I'd like to know why this is.

the main.cpp file is about 400 lines, I've attached it, as i think it's a bit overkill to post that here in tags. There are other files in my project, but I don't think they really matter.

void handle_adc_res(uint8_t adc_ch) {

    #ifdef __FL__
    update_feat_vec(ptr_fv[adc_ch], ptr_buffer[adc_ch]);

    #else
    update_feat_vec(ptr_fv[adc_ch], 
                    carry_mav + adc_ch, 
                    temp_rms + adc_ch, 
                    carry_wfl + adc_ch,
                    ptr_buffer[adc_ch]);
    
    #endif

}

void setup() {

    Serial.begin(9600);
    delay(2000);

    flag.sample_rdy = 0;
    flag.window_rdy = 0;
    flag.fv_finalize = 0;
    counter_overflow = 0;

	cli();

	DDRB = 0xFF;
	PORTB = 0x00;

	init_global_var();
	init_sample_timer();
	init_ADC();
    init_counter_timer();

	enable_ADC();

	sei();

	enable_sample_timer();
    // enable_window_timer();
    
    clr_bit(&PORTB, pin_overflow);

    uint8_t i = 0;

    memset(cnts, 0, smp_lngth * sizeof(uint16_t));

    while (i < smp_lngth) {

        while(!flag.sample_rdy) {}

        flag.sample_rdy = 0;

        enable_counter_timer();

        handle_adc_res(flag.channel_rdy);

        disable_counter_timer();
        cnts[i] = TCNT2 + 0xFF * counter_overflow;
        counter_overflow = 0;

        ++i;

    }


}

void loop() {

    uint8_t i;
    Serial.println(F("speed results"));
    for (i = 0; i < smp_lngth; ++i) {
        Serial.print(cnts[i]);
        Serial.print("\t");
        Serial.println(feature_vector[i]);

    }

    while(1) {}

}

main_test.cpp (7.21 KB)

example_features.cpp (2.09 KB)

What processor is this on?

For anything with caches this is entirely expected.

Did you break micros() with that code?

It looks like you are trying to time the "handle..." function. Isn't that an interrupt handler? So you can't really call it like that.

For timing interrupt handlers, sometimes you just have to resort to writing to a pin and then use an oscilloscope to measure the pulse length on the pin.

Sorry, Arduino Uno!

I don't know if I broke it. At the moment I use all timers. T2 for counting the clockcycli. I don't use micros(). I can disable T0 and T1, don't really need them for this speed measurement as I can directly start the ADC, without needing a fixed sample frequency.

handle_adc_res gets called from the main thread. A flag gets set in the adc interrupt, to know when it's done converting.
EDIT: oh you meant the naming of the function. yeah i'm bad at naming. That function "handles" the adc result that's stored in a buffer

Yeah, i measured it with an oscilloscope, you can still see the pin toggling in my program. But it's an old analog one, and I'd like to be a bit more precise.

Once you allow for the extra time taken to turn the external pin on and off, the scope is the most precise method.

If you don't aabsolutely need T0, then don't stomp on the Arduino code which uses it for millis() and micros(). Then you will have those useful functions available when you need them. You can even use delay() when occasion requires it.

This is useless...

void setup() {
...
    memset(cnts, 0, smp_lngth * sizeof(uint16_t));

Accessing (especially writing) flag.sample_rdy in setup without a critical section creates a race condition. That needs to be fixed before continuing.

MorganS:
Once you allow for the extra time taken to turn the external pin on and off, the scope is the most precise method.

Yeah, I know, but my scope is an old analogue one. I can't do single triggering and when the resolution is around 0.1 ms, it jumps too much.

MorganS:
If you don't aabsolutely need T0, then don't stomp on the Arduino code which uses it for millis() and micros(). Then you will have those useful functions available when you need them. You can even use delay() when occasion requires it.

I don't use Arduino code, so why would I need them? This is an honest informative question. What I get from your posts is that you're trying to push me to using micros()'s and delay()'s. delay()'s to put after each time the function has executed? What's the difference with how I do it now?

[quote author=Coding Badly link=msg=3736525 date=1526594683]
Accessing (especially writing) flag.sample_rdy in setup without a critical section creates a race condition. That needs to be fixed before continuing.[/quote]

Yeah, you're right about the memset, don't need that one. Guess it's a leftover from a previous method I tried.

What do you mean by 'critical section'? Where can the race condition occure?
It already occured to me that overflows in samples can happen. That's why I directly clear the flag after reading it. I also have a TODO in my code to make a copy of the flag and later check it, so the program knows that the previous sample has been dealt with.

A bit more about the program. I have 4 input adc channels with sample frequency Fs. So timer0 generates that sample signal. After period 1/Fs, the interrupt starts the first adc conversion. After the first conversion, the flag gets set, and the second conversion gets started. Meanwhile the sample is used to update the feature vector.
Second sample is ready, flag gets set, third conversion is started, second sample is used to update second part of feature vector.
and so on till the 4 adc channels are read. after that, its pause or do other stuff till the new interrupt of timer0.

Speklap:
What do you mean by 'critical section'?

Speklap:
I don't use Arduino code, so why would I need them? This is an honest informative question. What I get from your posts is that you're trying to push me to using micros()'s and delay()'s. delay()'s to put after each time the function has executed? What's the difference with how I do it now?

You're posting on an Arduino forum. Usually that means you're using Arduino.* If you're not using Arduino then our advice here will be mostly unhelpful.

  • Here, I mean the Arduino core, which provides useful functions such as millis() and delay(). If you're not using the Arduino core then you're just using a processor that has the ability to run the Arduino core or something completely different.

Thanks, interesting read. I will try to rewrite it. The problem is I don't want to miss an interrupt, so the time interrupts are disabled, should be as short as possible as well.

MorganS:
You're posting on an Arduino forum. Usually that means you're using Arduino.* If you're not using Arduino then our advice here will be mostly unhelpful.

  • Here, I mean the Arduino core, which provides useful functions such as millis() and delay(). If you're not using the Arduino core then you're just using a processor that has the ability to run the Arduino core or something completely different.

Oh ok, guess I should also change PORTB usages to digitalwrites and pinmode if I want help here. Better not tell here that I'm using platformio core and sublime text ...
I'm using the arduino because it's a nice piece of hardware, easy to test stuff on and cheap. And because I thought it has a nice and helpfull community. The code that I'm writing only works on the ATmega328P, it is written for the ATmega328P. I want to know more about the insides of a processor and how to use it properly. I won't get that from always using functions that are defined for every arduino board.

Yes, I admit that micros() is a usefull function but I was already using T0 and didn't want to change it. Creating T2 with prescaler 1 and an overflow flag is done rather quickly. Now that I'm rewriting anyways, I'll include the micros() function and ditch the timer counter.

Yes, the first sentence is sarcastic, since I can't stand behind that answer. I thought it would be because it's more precise or better, not because I just need to cause it's part of the Arduino Core. I'm also not using analogueRead, but using the registers, cause I need speed perfomance. analogueRead just takes to long.

The code that I'm writing only works on the ATmega326P, it is written for the ATmega326P.

Have you got a reference for the datasheet?

Haha funny, ofcourse I meant 328P ...
A bit low to call me out on that.

Speklap:
Thanks, interesting read. I will try to rewrite it. The problem is I don't want to miss an interrupt, so the time interrupts are disabled, should be as short as possible as well.

Interrupts are queued. So if it occurs during the time that interrupts are disabled, it will be handled as soon as you re-enable. Note that only one of each type is queued: one timer0, one timer1, one serial, one pin-change....

But a safe copy of an int should only take 2 instructions. Not long at all.

MorganS:
Interrupts are queued. So if it occurs during the time that interrupts are disabled, it will be handled as soon as you re-enable. Note that only one of each type is queued: one timer0, one timer1, one serial, one pin-change....

But a safe copy of an int should only take 2 instructions. Not long at all.

Ok good to know. I just read it also in the link to gammon forum. -the first time I read it diagonally as I didn't have much time.
I have 1 now timer to have a fixed sample rate, now T2. This interrupt starts the first conversion. When complete, the adc results gets put into a buffer, and the next channel conversion starts. (4 channels in total).

The top signal is the toggling of T2. The bottom signal is HIGH when the ADC is busy converting.

I rewrote the program. I disabled the interrupts when changing the flag, and made a new buffer, so only 1 value gets changed in the interrupt handler.
The first thing that I noticed was that the difference between the first and the others was gone.

I have really no idea how this comes. And I'm doubting if it is because I changed my program or because I used micros().
Though I will have to do some more debugging since if I set a pin high during the handling of the adc results (in the main thread), I get no proper output, linked to the output of pin_adc.

const uint8_t length = 12;
unsigned long time_passed[length];

const uint8_t pin_sample = 0;   // toggle after each period
const uint8_t pin_adc = 1;      // during conversion high

struct flag {

    uint8_t window_rdy:1;
    uint8_t fv_finalize:1;

    uint8_t sample_rdy:1;
    uint8_t channel_rdy:2;

};

volatile uint8_t adc_channel;
volatile struct flag flag;

volatile T buffer_sample[N_SENS]; 
T buffer[3 * N_SENS];
T *ptr_buffer[N_SENS];

T feature_vector[N_TOT_FEAT];
T *ptr_fv[N_SENS];
T fv_save[N_TOT_FEAT];
T *ptr_fv_save[N_SENS];

#ifndef __FL__

T2 temp_rms[N_SENS];
Tsmall carry_mav[N_SENS];
Tsmall carry_wfl[N_SENS];

#endif

/// @brief set bit of an 8bit register
/// \param reg
/// \param bit
static inline void set_bit(volatile uint8_t *reg, uint8_t bit) {
    *reg |= (1 << bit);
}

/// @brief clear bit of an 8bit register
/// \param reg
/// \param bit
static inline void clr_bit(volatile uint8_t *reg, uint8_t bit) {
    *reg &= ~(1 << bit);
}

/// @brief toggle bit of an 8bit register
/// \param reg
/// \param bit
static inline void tgl_bit(volatile uint8_t *reg, uint8_t bit) {
    *reg ^= (1 << bit);
}


static inline void arr_set(T *arr, T val, uint8_t n) {

    #ifdef __FL__
    uint8_t i = 0;
    for (; i < n; ++i)
        arr[i] = 0;

    #else
    memset(arr, val, n * sizeof(T));

    #endif

}

void init_global_var() {

    arr_set(feature_vector, 0, N_TOT_FEAT);
    arr_set(fv_save, 0, N_TOT_FEAT);
    arr_set(buffer, 0, 3 * N_SENS);

    adc_channel = 0;

    uint8_t i = 0;
    for (; i < N_SENS; ++i) {

        #ifndef __FL__
    	temp_rms[i] = 0;
    	carry_mav[i] = 0;
    	carry_wfl[i] = 0;
        #endif
        ptr_buffer[i] = buffer + N_SENS * i + 2;
        ptr_fv[i] = feature_vector + N_FEAT * i;
        ptr_fv_save[i] = feature_vector + N_FEAT * i;

    }

}

void init_sample_timer() {

    TCCR2A = 0;                             //T2 control register
    TCCR2A |= (1 << WGM21);
    TCCR2B = 0;

    //OCR0A = f_clk / (f*prescale) - 1
    OCR2A = 250;                            //set compare match register
    TCNT2 = 0;                              //init counter value to 0
    TIMSK2 |= (1 << OCIE2A);                //enable T0 interrupt

}

void enable_sample_timer() {

    TCCR2B |= (1 << CS22)|                  //set prescale                 64
              (0 << CS21)|
              (0 << CS20);

}

void init_ADC() {

    ADMUX = 0;
    ADMUX |= (1 << REFS0);                  //AVcc voltage reference

    ADMUX = 0;
    ADMUX |= (1 << REFS0);                  //AVcc voltage reference

    ADCSRA = 0;
    ADCSRA |= (1 << ADPS2)|                 //ADC prescale = 128 -> 125kHz
              (1 << ADPS1)|
              (1 << ADPS0);
    ADCSRA |= 1 << ADEN;

    uint8_t i = 0;
    for (; i < 20; ++i) {

        ADCSRA |= 1 << ADSC;
        while (ADCSRA & (1 << ADSC)) {};
        ADCL;

    }

    ADCSRA = 0;
    ADCSRA |= (0 << ADATE)|                 //auto trigger disable
              (1 << ADIE)|                  //ADC interrupt enable
              (1 << ADPS2)|                 //ADC prescale = 128 -> 125kHz
              (1 << ADPS1)|
              (1 << ADPS0);

    ADCSRB = 0;

}

void enable_ADC() {

    ADCSRA |= 1 << ADEN;                    //enable ADC

}

void disable_ADC() {

    ADCSRA &= ~(1 << ADEN);                    //enable ADC

}

void start_conversion(uint8_t adc_ch) {

    set_bit(&PORTB, pin_adc);
    ADMUX = (ADMUX & 0xF0)|                 //safety mask
            (adc_ch & 0x0F);                //set ADC channel
    ADCSRA |= 1 << ADSC;

}

void handle_adc_res(uint8_t adc_ch) {

    #ifdef __FL__
    update_feat_vec(ptr_fv[adc_ch], ptr_buffer[adc_ch]);

    #else
    update_feat_vec(ptr_fv[adc_ch], 
                    carry_mav + adc_ch, 
                    temp_rms + adc_ch, 
                    carry_wfl + adc_ch,
                    ptr_buffer[adc_ch]);
    
    #endif

}

void setup() {

    Serial.begin(9600);
    delay(2000);

    cli();
    flag.sample_rdy = 0;
    flag.window_rdy = 0;
    flag.fv_finalize = 0;

	DDRB = 0xFF;
	PORTB = 0x00;

	init_global_var();
	init_sample_timer();
	init_ADC();
	
    sei();

    enable_ADC();  
    enable_sample_timer();

    uint8_t i = 0;
    while (i < length) {

        while(!flag.sample_rdy) {}

        cli();
        flag.window_rdy = 0;
        uint8_t ch = flag.channel_rdy;
        *(ptr_buffer[ch]) = buffer_sample[ch];
        sei();
        
        unsigned long time_start = micros();

        handle_adc_res(ch);

        *(ptr_buffer[ch] - 2) = *(ptr_buffer[ch] - 1);
        *(ptr_buffer[ch] - 1) = *(ptr_buffer[ch]);
        
        time_passed[i] = micros() - time_start;

        ++i;

    }

}

void loop() {

    uint8_t i;
    Serial.println(F("speed results"));
    for (i = 0; i < length; ++i) {
        Serial.print(time_passed[i]);
        Serial.print("\t");
        Serial.println(feature_vector[i]);

    }

    while(1) {}

}

// sets sampling frequency
// starts conversion of first sensor
ISR(TIMER2_COMPA_vect) {

    tgl_bit(&PORTB, pin_sample);
    
    adc_channel = 0;
    start_conversion(adc_channel);

}

// interrupt handler of the adc
// saves sample + starts next sensor conversion
ISR(ADC_vect) {

    clr_bit(&PORTB, pin_adc);

    // read ADCL
    uint8_t lsb = ADCL;
    // read ADCH and save 10 bit value
    buffer_sample[adc_channel] = (ADCH << 8) | lsb;

    flag.channel_rdy = adc_channel;
    flag.sample_rdy = 1;

    if (++adc_channel < N_SENS)
        start_conversion(adc_channel);

}