Microsecond resolution output

Hi all,

I am working on a project that sends and receives data from a Nintendo 64 controller. The controller is a 3.3V device with a single bidirectional Signal wire. A logical 1 is LOW for 1us, then HIGH for 3us. A logical 0 is LOW for 3us, then HIGH for 1us. The line is HIGH while idle. Signals start on a falling edge.
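The encoding described above can be written down as a small table of durations. This is only a minimal sketch in portable C++ so it can be checked off-target; the names `Pulse`, `encodeBit` and `bitPeriodMicros` are made up for illustration and do not come from any existing library.

```cpp
// Durations, in microseconds, of the two phases of one bit on the wire.
struct Pulse {
    unsigned lowMicros;   // time the line is held LOW after the falling edge
    unsigned highMicros;  // time the line is HIGH before the next bit starts
};

// Logical 1: LOW 1 us, HIGH 3 us. Logical 0: LOW 3 us, HIGH 1 us.
constexpr Pulse encodeBit(bool bit) {
    return bit ? Pulse{1, 3} : Pulse{3, 1};
}

// Total time one bit occupies on the wire.
constexpr unsigned bitPeriodMicros(bool bit) {
    return encodeBit(bit).lowMicros + encodeBit(bit).highMicros;
}
```

Both bit values occupy 4 us in total, so every path through a sender has the same overall cycle budget and only the position of the rising edge differs.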

I have working code for an Arduino Uno, which is a 16MHz, 5V device. It uses the open-drain/open-collector technique to send either LOW or floating output, which is pulled up to 3.3V inside the controller.

I am trying to rewrite the functionality for an Arduino Pro Mini, which is an 8MHz, 3.3V device.

The author of the Uno code has used a combination of asm volatile commands and goto statements to ensure that there are 16 cycles per microsecond of desired output. This usually requires a number of "nop" commands to pad out the timing.

On the Pro Mini, I will have only 8 cycles per microsecond. I understand that I will need to use PORTD directly because digitalWrite will take too long.

My question: How can I examine my code/assembly output to ensure that there are 8 cycles per microsecond in the time-sensitive sections?

It would help if you posted your current code, but one I found on the web already uses PORTD.

How can I examine my code/assembly output to ensure that there are 8 cycles per microsecond in the time-sensitive sections?

Er, well, you look at the datasheet for the processor and count cycles. If you are going from 16 MHz to 8 MHz then I presume this part is easy:

                // nop block 2
                // we'll wait only 2us to sync up with both conditions
                // at the bottom of the if statement
                asm volatile ("nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              );

Just halve the number of NOPs. (Although I only count 30 there, which at 1 cycle each is 1.875 µs at 16 MHz, slightly less than 2 µs.) If you have a logic analyzer, it can help you time your code.
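The arithmetic behind counting NOPs can be sketched as below. This is a host-side illustration only, and the helper names `cyclesToNanos` and `cyclesPerMicros` are assumptions made for the example, not Arduino functions.

```cpp
#include <cstdint>

// Duration of `cycles` CPU cycles in nanoseconds, for a clock given in MHz.
// One cycle lasts 1000 / MHz nanoseconds: 62.5 ns at 16 MHz, 125 ns at 8 MHz.
constexpr std::uint32_t cyclesToNanos(std::uint32_t cycles, std::uint32_t clockMHz) {
    return cycles * 1000u / clockMHz;
}

// Number of CPU cycles that fit in `micros` microseconds.
constexpr std::uint32_t cyclesPerMicros(std::uint32_t micros, std::uint32_t clockMHz) {
    return micros * clockMHz;
}
```

With these, `cyclesToNanos(30, 16)` and `cyclesToNanos(15, 8)` both come to 1875 ns, and a whole 4 us bit at 8 MHz is `cyclesPerMicros(4, 8)`, i.e. 32 cycles. The instructions around the NOPs (port writes, tests, jumps) spend cycles from the same budget.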

Judging by the fact that this guy's comments don't match his code, it may not be all that critical:

                // remain low for 3us, then go high for 1us
                // nop block 3
                asm volatile ("nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\nnop\nnop\nnop\nnop\n"  
                              "nop\n");

I don't have one of those controllers to test with, but I am wondering if you could just do it with SPI. Let the hardware do the work for you.

Thanks for the response. It looks like you found the code I was using.

There's a certain amount of tolerance in the signal: the start bit is LOW for 1us, the end bit is HIGH for 1us, and there's 2us in the middle for the actual data.
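To illustrate that tolerance, here is a rough model of the line level during one bit. It is a sketch under the timing stated above, and `lineLevelAt` is a hypothetical helper that exists only for this example.

```cpp
// Level of the line `offsetNanos` after the falling edge that starts a bit.
// true = HIGH, false = LOW. Meaningful for offsets in [0, 4000).
constexpr bool lineLevelAt(bool bit, unsigned offsetNanos) {
    // A 1 returns HIGH after 1 us, a 0 only after 3 us.
    return offsetNanos >= (bit ? 1000u : 3000u);
}
```

Anywhere between 1000 ns and 3000 ns after the falling edge, the level equals the bit value, so in this model a receiver sampling near 2000 ns has about 1 us of slack in either direction.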

Here's the code that I'm looking at:

            // Starting a bit, set the line low
            asm volatile (";Setting line to low");
            N64_LOW; // 1 op, 2 cycles

            asm volatile (";branching");
            if (*buffer >> 7) {
                asm volatile (";Bit is a 1");
                // 1 bit
                // remain low for 1us, then go high for 3us
                // nop block 1
                asm volatile ("nop\nnop\nnop\nnop\nnop\n");
                
                asm volatile (";Setting line to high");
                N64_HIGH;

I don't know how many cycles the *buffer >> 7 line will use. There are only 5 nop cycles, and I'm worried that the existing code will use more than 8 cycles to do its logic, stretching the start bit out for too long.

I'll have a look into SPI to see if that's a feasible alternative.

I've ordered one from eBay for the challenge of making it work. :)

I don't know how many cycles the *buffer >> 7 line will use.

Run your .elf file through the disassembler (avr-objdump -d from the AVR toolchain is one option).

Then you can count cycles by comparing to the datasheet.

Isn't this sort of timing the sort of thing the built-in timers would be good at?

...R

After reading this:

http://afermiano.com/index.php/n64-controller-protocol

I withdraw my remarks about SPI, that doesn't look appropriate.

However, I think you can probably forget about really tight timing loops. What you need when receiving is an interrupt on the falling edge that starts each bit, and then wait 3 µs and sample the incoming bit value.

Sending should be pretty trivial if you can check your results with a logic analyzer or scope, because you just have a loop and have to get it to iterate at the right rate.

In fact I would be prepared to bet that, since it is self-clocking, you could tolerate a bit of jitter. What I mean by that is, since each bit starts on a high->low transition, a few hundred nanoseconds either way probably won't matter.

Robin2:
Isn't this sort of timing the sort of thing the built-in timers would be good at?

...R

I was thinking that too, but I'll reserve a decision until I've tried it. The timers would have to generate an interrupt and an interrupt takes a microsecond or two to be serviced.

Glad you've ordered a controller to help get this working!! :)

The code I'm looking at does indeed disable interrupts for the time-sensitive stuff:

    // don't want interrupts getting in the way
    noInterrupts();
    // send those 3 bytes
    N64_send(command, 1);
    // read in data and dump it to N64_raw_dump
    N64_get();
    // end of time sensitive code
    interrupts();

By the way, my end goal with this project is to create an adapter that turns the N64 Controller into a Bluetooth Human Interface Device that can be used on Android and iOS emulators :)

Isn't there some mode in which the timer directly sets a pin?

Now that I have an oscilloscope gathering dust I may try it myself later today.

...R

Yes, there is, absolutely. But to set a pin corresponding to an 8 or 16-bit value, I don't think so. PWM for example would set up a (say) 25% duty cycle. But to vary that from one bit to the next is the tricky bit.

Just as a reference, this is related:

There I output varying data (VGA pixels) at a fast rate, and using tight timing loops. The timers are used to trigger off a sequence (like a scan line) but the individual pixels are done with carefully calculated tight loops.

I can't believe that this project would be any harder than that.

That code output a different bit every 125 ns.

I have written a draft version of N64_send() that runs at 8MHz. All 4 control paths should take 32 cycles, i.e. 4us, to execute. Unlike the existing function, my version does not devour the input buffer.

I have made some assumptions about the number of cycles used in IF statements. If my assumptions are incorrect, I will have to save a few cycles somehow.

void N64_send(const unsigned char* buffer, unsigned int length)
{
  int shift = 8;
  int counter = 0;
  
  //Four control paths: 
  //bit is 0, not last bit in byte
  //bit is 1, not last bit in byte
  //bit is 0, last bit in byte
  //bit is 1, last bit in byte
  //Each control path should take exactly 32 cycles ie 4 microseconds @ 8Mhz
  byte_loop:
  {
   bit_loop:
   {
      N64_LOW; //two cycles
      asm volatile("nop\nnop\n");     
      --shift;
 
      if(buffer[counter] >> shift) //how many cycles? assuming 3
      {
        N64_HIGH; //two cycles
        asm volatile("nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n");                    
      }     
      else
      {
        asm volatile("nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n"
                     "nop\nnop\nnop\nnop\n");
        N64_HIGH;                     
      }     
      
     if(shift)//how many cycles? assuming 1
     {
       asm volatile("nop\nnop\nnop\nnop\n");
       goto bit_loop;
     }      
     
     ++counter;
     
     if(counter < length) //how many cycles? assuming 2
     {
       shift = 8;
       goto byte_loop;
     }
     
     asm volatile("nop\nnop\n");
     
     //Stop bit
     N64_LOW;
     asm volatile("nop\nnop\nnop\nnop\nnop\n"
                  "nop\nnop\n");
                  
     N64_HIGH;
     asm volatile("nop\nnop\nnop\nnop\nnop\nnop\n"
                  "nop\nnop\nnop\nnop\nnop\nnop\n"
                  "nop\nnop\nnop\nnop\nnop\nnop\n"
                  "nop\nnop\nnop\nnop\nnop\nnop\n");                  
    } 
  }
}

Your assumptions about clock cycles are way out. I've disassembled your code and added the source back in so you can see where:

00000100 <_Z8N64_sendPKhj>:
 100:   20 e2           ldi     r18, 0x20       ; 32   (1)
 102:   23 b9           out     0x03, r18       ; 3   (1)    // toggle D13
 104:   fc 01           movw    r30, r24   (1)
 106:   27 e0           ldi     r18, 0x07       ; 7   (1)
 108:   30 e0           ldi     r19, 0x00       ; 0   (1)
 10a:   40 e0           ldi     r20, 0x00       ; 0   (1)
 10c:   50 e0           ldi     r21, 0x00       ; 0   (1)

byte_loop:
bit_loop:

 N64_LOW; //two cycles

 10e:   52 9a           sbi     0x0a, 2 ; 10   (2)

 asm volatile("nop\nnop\nnop\nnop\nnop");

 110:   00 00           nop   (1)
 112:   00 00           nop   (1)
 114:   00 00           nop   (1)
 116:   00 00           nop   (1)
 118:   00 00           nop   (1)

 if(*current_byte >> shift) //how many cycles? assuming 2

 11a:   80 81           ld      r24, Z   (2)
 11c:   90 e0           ldi     r25, 0x00       ; 0   (1)
 11e:   02 2e           mov     r0, r18   (1)
 120:   02 c0           rjmp    .+4             ; 0x126 <_Z8N64_sendPKhj+0x26>   (2)
 122:   95 95           asr     r25   (1)
 124:   87 95           ror     r24   (1)
 126:   0a 94           dec     r0   (1)
 128:   e2 f7           brpl    .-8             ; 0x122 <_Z8N64_sendPKhj+0x22>   (1/2)


 12a:   89 2b           or      r24, r25   (1)
 12c:   91 f0           breq    .+36            ; 0x152 <_Z8N64_sendPKhj+0x52>   (1/2)

  N64_HIGH;

 12e:   52 98           cbi     0x0a, 2 ; 10   (2)

  asm volatile("nop\nnop\nnop\nnop\n"
               "nop\nnop\nnop\nnop\n"
               "nop\nnop\nnop\nnop\n"
               "nop\nnop\nnop\nnop\n");                    

 130:   00 00           nop   (1)
 132:   00 00           nop   (1)
 134:   00 00           nop   (1)
 136:   00 00           nop   (1)
 138:   00 00           nop   (1)
 13a:   00 00           nop   (1)
 13c:   00 00           nop   (1)
 13e:   00 00           nop   (1)
 140:   00 00           nop   (1)
 142:   00 00           nop   (1)
 144:   00 00           nop   (1)
 146:   00 00           nop   (1)
 148:   00 00           nop   (1)
 14a:   00 00           nop   (1)
 14c:   00 00           nop   (1)
 14e:   00 00           nop   (1)

 150:   11 c0           rjmp    .+34            ; 0x174 <_Z8N64_sendPKhj+0x74>   (2)

else

 asm volatile("nop\nnop\nnop\nnop\n"
              "nop\nnop\nnop\nnop\n"
              "nop\nnop\nnop\nnop\n"
              "nop\nnop\nnop\nnop\n");

 152:   00 00           nop   (1)
 154:   00 00           nop   (1)
 156:   00 00           nop   (1)
 158:   00 00           nop   (1)
 15a:   00 00           nop   (1)
 15c:   00 00           nop   (1)
 15e:   00 00           nop   (1)
 160:   00 00           nop   (1)
 162:   00 00           nop   (1)
 164:   00 00           nop   (1)
 166:   00 00           nop   (1)
 168:   00 00           nop   (1)
 16a:   00 00           nop   (1)
 16c:   00 00           nop   (1)
 16e:   00 00           nop   (1)
 170:   00 00           nop   (1)

N64_HIGH;

 172:   52 98           cbi     0x0a, 2 ; 10   (2)

 // end of if --

 --shift;

 174:   21 50           subi    r18, 0x01       ; 1   (1)
 176:   30 40           sbci    r19, 0x00       ; 0   (1)

 if(shift >= 0)//how many cycles? assuming 1

 178:   37 fd           sbrc    r19, 7   (1/2/3)
 17a:   05 c0           rjmp    .+10            ; 0x186 <_Z8N64_sendPKhj+0x86>   (2)

 asm volatile("nop\nnop\nnop\nnop\n");

 17c:   00 00           nop   (1)
 17e:   00 00           nop   (1)
 180:   00 00           nop   (1)
 182:   00 00           nop   (1)

 goto bit_loop;

 184:   c4 cf           rjmp    .-120           ; 0x10e <_Z8N64_sendPKhj+0xe>   (2)

 ++counter;

 186:   4f 5f           subi    r20, 0xFF       ; 255   (1)
 188:   5f 4f           sbci    r21, 0xFF       ; 255   (1)

 if(counter < length) //how many cycles? assuming 1

 18a:   46 17           cp      r20, r22   (1)
 18c:   57 07           cpc     r21, r23   (1)
 18e:   20 f4           brcc    .+8             ; 0x198 <_Z8N64_sendPKhj+0x98>   (1/2)

 ++current_byte;

 190:   31 96           adiw    r30, 0x01       ; 1   (2)

 shift = 7;

 192:   27 e0           ldi     r18, 0x07       ; 7   (1)
 194:   30 e0           ldi     r19, 0x00       ; 0   (1)

 goto byte_loop;

 196:   bb cf           rjmp    .-138           ; 0x10e <_Z8N64_sendPKhj+0xe>   (2)

 N64_LOW;

 198:   52 9a           sbi     0x0a, 2 ; 10   (2)

 asm volatile("nop\nnop\nnop\nnop\nnop\n"
              "nop\nnop\n");

 19a:   00 00           nop   (1)
 19c:   00 00           nop   (1)
 19e:   00 00           nop   (1)
 1a0:   00 00           nop   (1)
 1a2:   00 00           nop   (1)
 1a4:   00 00           nop   (1)
 1a6:   00 00           nop   (1)

 N64_HIGH;

 1a8:   52 98           cbi     0x0a, 2 ; 10   (2)

  asm volatile("nop\nnop\nnop\nnop\nnop\nnop\n"
                "nop\nnop\nnop\nnop\nnop\nnop\n"
                "nop\nnop\nnop\nnop\nnop\nnop\n"
                "nop\nnop\nnop\nnop\nnop\nnop\n");                  

 1aa:   00 00           nop   (1)
 1ac:   00 00           nop   (1)
 1ae:   00 00           nop   (1)
 1b0:   00 00           nop   (1)
 1b2:   00 00           nop   (1)
 1b4:   00 00           nop   (1)
 1b6:   00 00           nop   (1)
 1b8:   00 00           nop   (1)
 1ba:   00 00           nop   (1)
 1bc:   00 00           nop   (1)
 1be:   00 00           nop   (1)
 1c0:   00 00           nop   (1)
 1c2:   00 00           nop   (1)
 1c4:   00 00           nop   (1)
 1c6:   00 00           nop   (1)
 1c8:   00 00           nop   (1)
 1ca:   00 00           nop   (1)
 1cc:   00 00           nop   (1)
 1ce:   00 00           nop   (1)
 1d0:   00 00           nop   (1)
 1d2:   00 00           nop   (1)
 1d4:   00 00           nop   (1)
 1d6:   00 00           nop   (1)
 1d8:   00 00           nop   (1)

// return from function
 1da:   08 95           ret   (4)

I added an extra line to toggle D13 because I was initially not getting a result, but that won't affect the results.

You are using int where you can probably use byte. That generates a lot more code.

Just as one example, shifting something right actually involves a loop, so the timing would vary depending on the number of bits:

 122:   95 95           asr     r25   (1)
 124:   87 95           ror     r24   (1)
 126:   0a 94           dec     r0   (1)
 128:   e2 f7           brpl    .-8             ; 0x122 <_Z8N64_sendPKhj+0x22>   (1/2)

I timed about 4.3 µs per bit, and that was running at 16 MHz!
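One common way to avoid the variable-count shift is to walk a single-bit mask instead. The sketch below is portable C++ for illustration, not a drop-in replacement; `bitAt` and `nextMask` are hypothetical names. On the target, shifting a byte-sized mask right by exactly one position should not need a loop, but that is an assumption to confirm in the disassembly.

```cpp
#include <cstdint>

// True if the bit selected by the single-bit `mask` is set in `value`.
// Unlike `value >> shift`, this looks at one bit only: 0x80 >> 6 is nonzero
// even though bit 6 of 0x80 is clear.
constexpr bool bitAt(std::uint8_t value, std::uint8_t mask) {
    return (value & mask) != 0;
}

// Next mask in MSB-first order: 0x80, 0x40, ..., 0x01, then back to 0x80
// for the first bit of the following byte.
constexpr std::uint8_t nextMask(std::uint8_t mask) {
    return static_cast<std::uint8_t>(mask == 0x01 ? 0x80 : mask >> 1);
}
```

A sender would start each byte with a mask of 0x80, test `bitAt(byte, mask)`, and advance with `nextMask(mask)`. Keeping the mask and counters in byte-sized variables also avoids the 16-bit arithmetic visible in the listing. The two paths through `nextMask` may still differ by a cycle, so each path has to be counted.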

I've never looked at disassembled source before, but I'm assuming the number in brackets is the number of clock cycles. If so, what do 1/2 and 1/2/3 represent?

The first problem is obviously the bit shift. I assumed that this operation happened in constant time -- but apparently not.

How did you time the code to get the 4.3µs figure? I'm not familiar with this technique.

what does 1/2 represent?

One or two cycles depending on the result of the operation.
For example if a branch is taken or not.

LogicalUnit:
I've never looked at disassembled source before, but I'm assuming the number in brackets is the number of clock cycles.

Yes it is.

If so, what do 1/2 and 1/2/3 represent?

What Mike said. For example with BRPL, one cycle if false, two cycles if true. In other words, branching takes longer, probably because the next instruction is pre-fetched by the processor and it now has to discard it.
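As a worked example, the cycle counts in the listing can be added up for the shift sequence at addresses 0x11a to 0x128. This is a sketch based only on the bracketed numbers in that listing; `shiftSequenceCycles` is a name invented here for illustration.

```cpp
// Cycles spent in the variable-shift sequence for a shift count n
// (the value copied from r18 into r0 before the loop).
//   setup:      ld(2) + ldi(1) + mov(1) + rjmp(2)            = 6
//   each shift: dec(1) + brpl taken(2) + asr(1) + ror(1)     = 5
//   exit:       dec(1) + brpl not taken(1)                   = 2
constexpr unsigned shiftSequenceCycles(unsigned n) {
    return 6 + 5 * n + 2;
}
```

By this count the test takes 8 cycles for a shift of 0 and 43 cycles for a shift of 7, against the 3 cycles assumed in the draft. At 8 MHz, 43 cycles is more than 5 µs, which is longer than the whole 4 µs bit.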

How did you time the code to get the 4.3µs figure? I'm not familiar with this technique.

With my logic analyzer.

Sounds like I need to buy an oscilloscope and/or logic analyser. Can you make a recommendation? :D

http://www.robotshop.com/ca/en/saleae-16-channel-100mhz-logic-analyzer.html
They have new versions:

EDIT:
You can download the software and try it out without buying the hardware.