time to process each line?

I'm doing something very time-sensitive and I want to wait a specific time t_i before executing the next digitalWrite command.

int i = 1;               // pin to pulse, cycles 1..3
unsigned int t_i = 100;  // desired delay in microseconds (example value)

void setup()
{
    for (int p = 1; p <= 3; p++) {
        pinMode(p, OUTPUT);
    }
}

void loop()
{
    digitalWrite(i, HIGH);
    digitalWrite(i, LOW);

    delayMicroseconds(t_i);
    i++;
    if (i == 4)
    {
        i = 1;
    }
}

Each line of code must take some time t_j. What is t_j, so I can subtract it?

It's variable, depending on the statement. You'd have to look at the machine code produced to be sure, and even then you'd be subject to changes from one compiler version to another.

bmarconi:
I'm doing something very time-sensitive and I want to wait a specific time t_i before executing the next digitalWrite command.

void loop()

{
digitalWrite(i,HIGH);
digitalWrite(i,LOW);

delayMicroseconds(t_i);
i++;
if(i==4)
{
   i=1;
}
}

Each line of code must take some time t_j. What is t_j, so I can subtract it?

Each “line” of C/C++ code that’s compiled ultimately gets transformed into machine language, which looks something like this:

[b]
57 81           ldd     r21, Z+7        ; 0x07
24 81           ldd     r18, Z+4        ; 0x04
35 81           ldd     r19, Z+5        ; 0x05
42 17           cp      r20, r18
53 07           cpc     r21, r19
44 f4           brge    .+16            ; 0x1dea <fputc+0x38>
a0 81           ld      r26, Z
b1 81           ldd     r27, Z+1        ; 0x01
9d 01           movw    r18, r26
2f 5f           subi    r18, 0xFF       ; 255
3f 4f           sbci    r19, 0xFF       ; 255
31 83           std     Z+1, r19        ; 0x01
20 83           st      Z, r18
8c 93           st      X, r24[/b]

The AVR processor is quite efficient and executes a lot of instructions in only 1 or 2 clock cycles (a clock cycle is 1/16 MHz, or 62.5 nanoseconds).

However, to know exactly what instructions are generated for something like “digitalWrite (x, y)”, you would have to compile your code, then disassemble it, look up how many clock cycles each instruction uses and add them up.

Plus, the actual code (as well as, of course, how long it takes to run) varies depending on the compiler optimizations. The default optimization setting for the Arduino IDE is “-Os”, which means “optimize the code to be as small as possible without regard to speed”. If you use a different compiler flag such as “-O0” (which means “no optimization”), there will be a lot more machine instructions generated for a particular statement, and each statement will obviously take longer to run.

On top of that, a seemingly simple call like "digitalWrite (x, y)" does quite a few things “under the hood”. Here is the actual C code for digitalWrite:

void digitalWrite (uint8_t pin, uint8_t val)
{
    uint8_t timer = digitalPinToTimer (pin);
    uint8_t bit = digitalPinToBitMask (pin);
    uint8_t port = digitalPinToPort (pin);
    volatile uint8_t *out;

    if (port == NOT_A_PIN) { return; }

    // If the pin that support PWM output, we need to turn it off
    // before doing a digital write.
    if (timer != NOT_ON_TIMER) { turnOffPWM (timer); }

    out = portOutputRegister (port);

    uint8_t oldSREG = SREG;
    cli ();

    if (val == LOW) {
        *out &= ~bit;

    } else {
        *out |= bit;
    }

    SREG = oldSREG;
}

So, as you can see, figuring out how many clock cycles, and therefore the EXACT time any particular code takes to run is quite a large task.

However, if all you need is a precise DELAY, things are a lot easier.

You can use the stock delay code that comes with Arduino and it’s quite accurate. Or, you can use this (which is what I use) shamelessly borrowed from the GLCD project:


[delay.h](http://www.hobbytent.com/other/files/delay.h)

This file REPLACES the stock delay.h in the tools/avr/include/util directory and provides a nice, clean and accurate delay with millisecond, microsecond or nanosecond range, and it accepts variables or static values (that is, you can say _delay_ms(100); or x=100; _delay_ms(x); and both work - and work accurately).

Of course, you will still run into the problem of execution time overhead of other instructions. For example, if you do this:

digitalWrite (x, y);
_delay_ms (1000);
digitalWrite (x, y);

it will take slightly more than 1.0 seconds because of the time the digital writes require.

A better thing to do is something like this (pseudo code):

t = start_time
do something
do something else
while current_time < t + 1000 wait();

This will take exactly 1 second to run regardless of how long the inside instructions take (as long as they finish within that second), because you are watching ELAPSED time.

Lastly, be aware that, incredibly, the Arduino uses a ceramic resonator as a clock source, which is rather inaccurate as well as temperature sensitive. For more precise timing, I suggest replacing it with a real crystal (you’ll notice that things that NEED precision, such as the serial interface, use a crystal, which proves my point).

Hope this helps.

Snippets.

If you are serious, you will tell us just what it is you need to do, and exactly how much time is involved.

Krupski:
The AVR processor is quite efficient and executes a lot of instructions in only 1 or 2 clock cycles (a clock cycle is 1/16 MHz or 62.5 microseconds).

I think that one instruction takes less than 1 microsecond. Probably you meant 62.5 nanoseconds?

If you program in assembler then you'll know exactly how fast your program is. But then, assembler programming is harder than programming in C++. If you use interrupts, the timing can be different, because an interrupt can arrive at any point while your program is running.

LMI:
I think that one instruction takes less than 1 microsecond. Probably you meant 62.5 nanoseconds?

If you program in assembler then you'll know exactly how fast your program is. But then, assembler programming is harder than programming in C++. If you use interrupts, those timing can be different. Because an interrupt can arrive whenever your program is running.

Yes, my mistake. One clock cycle for a 16 MHz AVR is 62.5 NANO seconds, not microseconds. My bad.

I corrected the text in my post as well. Thanks for pointing it out.

LMI:
If you program in assembler then you'll know exactly how fast your program is.

I've done a lot of assembler programming in Motorola 6809 and 68HC11 and to a lesser extent Intel x86 and many times had the need for a PRECISE delay or execution time and actually did take the data sheet, looked up how many clock cycles each instruction used and added them up.

I should learn AVR assembler, but it seems so foreign to what I'm used to (Motorola and Intel). Oh well, ASM is ASM... if I can learn one, I can learn any one I guess......

Krupski:
I should learn AVR assembler, but it seems so foreign to what I'm used to (Motorola and Intel). Oh well, ASM is ASM... if I can learn one, I can learn any one I guess......

Amen to that.

The 6809 was so neat!

Paul__B:
Amen to that.

The 6809 was so neat!

The 6809 is my all time favorite processor. I love the PCR (program counter relative) instructions, the indirect addressing ability, the ability to push or pull any or all registers with a single instruction (the operand bits represent each register), and the auto incrementing / decrementing index registers (like [b]LDD ,X++[/b], which loads the 16 bit D register from memory pointed to by X and also increments X by 2; a single + increments by 1, for 8 bit reads or writes).

I did a lot of code (drivers) for OS-9 Level II which, if you don't know, was (is?) a Unix-like multitasking, multiuser operating system that runs on the 6809 and was used with the Radio Shack Color Computer III.

If you were ever "into" the Color Computer, you may have heard of RGB-DOS and the hard drives for the COCO RS-DOS and OS-9? RGB-DOS is me... I wrote it... as well as the OS9-LII hard disk drivers and the Dallas SmartWatch real time clock drivers for OS-9.

Well, I never got into OS-9, but I actually spent an inordinate amount of time over the years 1983 to 1986 or so completely disassembling (using “Dynamite”) and re-writing FLEX09, including the CoCo version, to compact and debug the code, along with many of the associated utilities (such as making the date functions Y2000 compliant).

Enthusiasm for OS-9 was limited by not having the 6829 MMU and lots of RAM.

Sadly, my CPU board faulted and I have not repaired it, and the odd format (single-density track 0) of the FLEX disks has prevented me from salvaging them in the time since, so FWIW, that work is essentially lost. I also completed the conversion of a BASIC00 into BASIC09 which I do have somewhere here on four 2K EPROMs, and a printout of that source in a box somewhere.

Krupski:
I've done a lot of assembler programming in Motorola 6809 and 68HC11 and to a lesser extent Intel x86 and many times had the need for a PRECISE delay or execution time and actually did take the data sheet, looked up how many clock cycles each instruction used and added them up.

I should learn AVR assembler, but it seems so foreign to what I'm used to (Motorola and Intel). Oh well, ASM is ASM... if I can learn one, I can learn any one I guess......

I have done my share with Intel 8086/8088 Microsoft Masm. Fine processor and assembler but awful compiler. It was said to have quirks mode always on.

I think that the assembler language is not important with modern CPUs because most people use C or something like it and don't care if assembler is cryptic or not.

LMI:
I think that the assembler language is not important with modern CPUs because most people use C or something like it and don't care if assembler is cryptic or not.

Except Steve.

LMI:
I think that the assembler language is not important with modern CPUs because most people use C or something like it and don't care if assembler is cryptic or not.

Assembler is not at all cryptic once you've used it for a few minutes. For example, here's the code you would use to copy 256 bytes from address 0x2000 to 0xE000 in 6809 assembler:

;; HOW IT'S CALLED
;; somewhere else, call the subroutine
      bsr    mv      ; call mv (relative branch +127 to -128 bytes)
      lbsr   mv      ; call mv (relative branch, +32767 to -32768 bytes)
      jsr    mv      ; call mv by its absolute address (non-relocatable)
;; code executes after "mv" is done


;; ACTUAL CODE
;; mv - code to copy 256 bytes from X to Y
mv    ldb    #0      ; 0 wraps to 255 on decrement, giving 256 passes
      ldx    #$2000  ; source address
      ldy    #$e000  ; destination address
cp    lda    ,x+     ; get a source byte into a, auto-increment pointer
      sta    ,y+     ; write a to destination, auto-increment pointer
      decb           ; decrement b
      bne    cp      ; unsigned test if b is not equal to 0 then goto "cp"
      rts            ; return to caller

Note: "X" and "Y" are 16 bit registers, usually used as pointers. "A" and "B" are 8 bit registers, and can also be used as one 16 bit register called "D" which consists of A (hi byte) and B (lo byte).

There is also the "CC" (condition code) register which indicates negative, zero, overflow, interrupts on or off, etc., "SP" which is the stack pointer, "U" which is a "user" register (16 bit, usable just like X and Y), the "PC" (program counter) which holds the address of the current opcode or operand being executed, and "DP" which is the "direct page" register.

Direct page is nice because you can set it to point to any 256 byte block of memory; then all accesses to it only require 8 bits of the address (which makes each access both smaller and faster).

For example, if you need to have the fastest possible access to internal SRAM, you set DP to $00 (which is where SRAM is mapped). Then, all accesses to addresses $0000 to $00FF are twice as fast.

Also, SRAM and ports can be relocated anywhere in the address space if desired. And I/O ports are addressed the same way as memory (there is no separate I/O address space or special I/O opcodes).

For example, if PORT A is located at address $1000, then you simply write to address $1000 and the bits appear on Port A. Each port and each bit of each port can be set as an input or output (like AVR), but there is no built in pullup facility. If you want the equivalent of "pinMode (X, INPUT_PULLUP)", you have to wire in an actual resistor.

Notice that hex values are denoted with a dollar sign rather than "0x" or "h". In Motorola-speak, hex 4000 is $4000, not 0x4000 or 4000h.

The 6809 supports relative addressing, so it's simple and easy to write code that executes anywhere in the memory space. If you assemble code starting at, say, address $2000, then load it into $C000, it still runs (if you write it as PCR or "Program Counter Relative").

There are different addressing modes. For example:

      ldx    $8000    ; load X with the 16 bit data at address $8000 and $8001
      ldx    #$8000   ; load X with the immediate (actual) number "$8000"
      ldx    [$8000]  ; load X with the 16 bit data located at the address pointed to by $8000-$8001  


$4000  12  ; $1234 is stored at address $4000
$4001  34
....
....
$8000  40  ; $4000 is stored at address $8000
$8001  00

So if the hex value "$4000" is stored at location $8000-8001 (big endian), and the value "$1234" is stored at location $4000-4001, then the code above does this:

(1) X == $4000
(2) X == $8000
(3) X == $1234

The last one needs to be looked at a few times for the concept to "sink in". :grin:

You can also arbitrarily get data from a pointer with an offset added. For example, if, starting at address $6000 you have a sequence of 0,1,2,3,4..... etc... stored (like this):

[b]$6000    00
$6001    01
$6002    02
....
$60FC    252
$60FD    253
$60FE    254
$60FF    255[/b]

Then point "X" to $6000 by saying "[b]ldx #$6000[/b]"

the code "[b]  lda  34,x[/b]" loads the value stored at $6022 into "A" (which is 34 decimal). Likewise, hex can be used: "[b]  lda $34,x[/b]" loads "A" with the value stored at $6034 which is hex 34 (decimal 52).

If no prefix is used with a number, it's considered to be decimal. For example "50" is decimal 50, while "$50" is hex 50 (decimal 80).

Wonderful processor... my favorite one actually.

Those old CPUs were meant to be programmed in assembler, so they were built for it, and the assembly language was kept simple. And in those days you could not build a big, complicated CPU and keep it affordable.

I remember 6809.

There is a web site about ARM assembler, I think. But the DUE is an Atmel version of the ARM architecture; I wonder if it is different.

I have some sample ARM assembler here: GitHub - WestfW/Minimal-ARM: Minimalist ARM Cortex Microcontroller development, in assembler.
(Blink, and HelloWorld on “Maple Mini” class hardware.)

here’s the code you would use to copy 256 bytes from address 0x2000 to 0xE000 in 6809 assembler

And you get to write it again for AVR, and again for AVR using the gnu assembler, and again for ARM (at least 2 different assemblers, again), and again for x86, and so on and on and on. Whereas if you had trustworthy C compilers, you could write it once and be done, and the code would run just about as fast. Here’s the code that avr-gcc produces:

  uint8_t i=0;
  uint8_t *x=(uint8_t*)0x2000, *y=(uint8_t*)0x4000;
  do {
    *x++ = *y++;
  } while (--i != 0);

;becomes

     ldi     r26, 0x00
     ldi     r27, 0x20
     ldi     r30, 0x00
     ldi     r31, 0x40
lp:  ld      r24, Z+
     st      X+, r24
     cp      r30, r1
     ldi     r24, 0x41
     cpc     r31, r24
     brne    lp

It’s slightly interesting in that it compares against one of the final addresses instead of using a separate counter register (which incidentally leaves a register free that might be useful elsewhere). I’m not sure it’s quite how I would have written it if I had wanted the fastest code possible, but it’s an example of the sort of optimization that a compiler will routinely make (having that register free saves code/cycles over in this other place) and that is rarely done by assembly programmers.

westfw:
....but it's an example of the sort of optimization that a compiler will routinely make (having that register free saves code/cycles over in this other place) that are rarely done by assembly programmers.

After one has been programming a particular processor in ASM, they end up seeing (or learning from others) little "one byte here, one byte there" optimizations.

For example, a subroutine may save some of the registers on the stack (the ones it will use) in order to be able to return them to their original values at the end. Now on a 6809 processor, the PULS (pull from stack) opcode has only one byte after it for an operand... each bit of that corresponds to a register. (Same with PSHS push to the stack).

Since the program counter is one of the registers, it, too, can be pulled from the stack.

So if a subroutine did something like this:

pshs   a,b,x,y   ; save a, b, x and y
...              ; move some bytes
...              ; compare some bytes
...              ; yada-yada
puls   a,b,x,y   ; restore regs
rts              ; return to caller

... a "clever" programmer would "optimize" one byte by adding the program counter to the PULS instruction and doing away with the rts, like this:

pshs a, b, x, y
....
puls a, b, x, y, pc

Same result, saves 1 byte (and is a few clock cycles faster).

There are other "tricks" like that. A neat one is the $8C "skip two bytes" trick. Let's say you have some code like this:

[b]newline  lda  #$0a ; newline
return   lda  #$0d ; c/r
ding     lda  #$07 ; bell
         jsr  print ; print the char[/b]

Now you want to be able to print any of those characters by calling the appropriate label. Well, if you call "newline", the A register gets loaded with 0A, then the next line loads it with 0D, then 07. Obviously you need a jump of some sort between each one to avoid that problem. But a jump is 3 bytes, a short branch is 2 bytes, can we do it with only one? Yes! Simply put $8C between each one like this:

[b]newline  lda  #$0a ; newline
         fcb  $8C
return   lda  #$0d ; c/r
         fcb  $8C
ding     lda  #$07 ; bell
         ; don't need it here
         jsr  print ; print the char[/b]

The $8C is "compare X immediate", so the code above ends up being:
load a with 0A
compare X to $860D
compare X to $8607
jsr print

(the $86 is lda and the 0D or 07 is the operand).

So comparing X does nothing that we care about at this point, and it skips two bytes for one $8C.

There's also a "skip 1" which is a $21 (BRN = branch never).

I wonder if compilers do things like that for optimizations?

ARM also has a push/pop multiple, and it’s very common to see a subroutine end with a “pop multiple” that includes the PC. (actually, ARM lacks the usual stack-based “call” instruction, instead using an instruction that puts the return address in a register. So a subroutine usually starts by pushing that register (and any others needed), and ends by popping to the PC instead. Short stub functions don’t use the stack at all.)

One of the differences between modern computers and those of old (like the 6809) is the number of registers. When you only have one or two “accumulators”, the optimization decision tree is a lot smaller than when you have 12 or more truly general purpose registers.