
Topic: Arduino/ATmega328 C64 Emulator

janost

I'm at work at the moment but will put something out tonight.

It uses the UART in SPI mode, like Nick's VGA code, so it's very similar.
The TX pin is the video output from the video shifter and D2 is sync.

You lose the TX/RX serial port on the Arduino, but I don't think that matters for this application.

My code still outputs a square box for every character, since I still have to solve loading the video shifter in 16 cycles.
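
For reference, the 16-cycle figure falls out of the USART clocking. Below is a minimal sketch of the arithmetic in plain C (it runs on a PC, not on the AVR), assuming a 16 MHz Arduino and UBRR0 = 0. The function names are made up for illustration; the bit-rate formula is the one the ATmega328 datasheet gives for synchronous master mode.

```c
#include <assert.h>
#include <stdint.h>

/* USART in master SPI mode: bit rate = f_osc / (2 * (UBRR + 1)). */
static uint32_t mspim_bit_rate_hz(uint32_t f_cpu_hz, uint16_t ubrr)
{
    return f_cpu_hz / (2u * ((uint32_t)ubrr + 1u));
}

/* CPU cycles available to fetch and load the next 8-pixel byte. */
static uint32_t cycles_per_byte(uint32_t f_cpu_hz, uint16_t ubrr)
{
    uint32_t bit_rate = mspim_bit_rate_hz(f_cpu_hz, ubrr);

    assert(bit_rate > 0);
    return 8u * (f_cpu_hz / bit_rate);
}
```

With those assumptions, `mspim_bit_rate_hz(16000000, 0)` gives an 8 MHz pixel clock and `cycles_per_byte(16000000, 0)` gives 16, so every pixel byte has to be fetched and written within 16 CPU cycles to keep the shifter fed.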

fungus


Quote

I'm at work at the moment but will put something out tonight.


No hurry. This is a medium-term project I'm thinking about.

Quote

It uses the UART in SPI mode, like Nick's VGA code, so it's very similar.
The TX pin is the video output from the video shifter and D2 is sync.

You lose the TX/RX serial port on the Arduino, but I don't think that matters for this application.


It makes program development a real pain though.

Can it use the real SPI port instead?

Quote

My code still outputs a square box for every character, since I still have to solve loading the video shifter in 16 cycles.


That might be important...  :)
No, I don't answer questions sent in private messages (but I do accept thank-you notes...)

janost






You only lose it while the sketch is running.
It still uploads from the Arduino IDE, even with the video resistors connected.
There is just no serial debug output.

The SPI interface runs with 9 bits, not 8, so you get a white pixel gap after every byte.
The USART in SPI mode (MSPIM) has no 9th-bit problem: just 8 pixels per byte, back to back :)
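
To make the difference concrete, here is a small PC-side model of the two shifters (not AVR code). It assumes, as described above, that the hardware SPI spends one extra bit time per byte and that the idle level shows up as a white pixel. The function name and the gap parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serialize bytes MSB first into one pixel per array element (0 or 1).
 * gap_bits idle pixels at level gap_level follow every byte:
 * 0 models the USART in MSPIM mode, 1 models the 9-bit-time SPI case.
 * Returns the number of pixels written, or 0 if out is too small. */
static size_t shift_out(const uint8_t *bytes, size_t n, unsigned gap_bits,
                        uint8_t gap_level, uint8_t *out, size_t out_cap)
{
    size_t needed = n * (8u + gap_bits);
    size_t pos = 0;

    assert(bytes != NULL && out != NULL);
    if (needed > out_cap)
        return 0;

    for (size_t i = 0; i < n; i++) {
        for (int bit = 7; bit >= 0; bit--)
            out[pos++] = (uint8_t)((bytes[i] >> bit) & 1u);
        for (unsigned g = 0; g < gap_bits; g++)
            out[pos++] = gap_level;
    }
    return pos;
}
```

In this model a 40-byte line takes 320 pixel times with no gap and 360 with a one-bit gap, and every ninth pixel in the second case is the fixed gap level rather than font data.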

fungus


Quote

You only lose it while the sketch is running.
It still uploads from the Arduino IDE, even with the video resistors connected.


OK, it was the upload I was worried about. I can debug over the I2C interface.

Quote

The SPI interface runs with 9 bits, not 8, so you get a white pixel gap after every byte.
The USART in SPI mode (MSPIM) has no 9th-bit problem: just 8 pixels per byte, back to back :)


Oh, I forgot about clock cycle 9.

(And I didn't know the USART SPI doesn't do it.)

janost

OK, here is the ISR.
It fires 15748 times/sec.

Remember, this is just a proof of concept and not optimized.

Code:

ISR(TIMER0_COMPA_vect) {  // Timer0 compare match, fires once per scanline
  // Lines without picture content (18..39 and 240..261)
  if (((scanline > 17) && (scanline < 40)) || (scanline > 239)) {
    //DDRD=DDRD|0x02;
    UCSR0B = _BV(TXEN0);
    PORTD = 0;    // Hsync
    UDR0 = 0x00;  // Load first byte
    for (byte x = 0; x < 3; x++) {
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      UDR0 = 0x00;                           // send pixel byte
    }
    while ((UCSR0A & _BV(UDRE0)) == 0) {}
    if (border == 0) UCSR0B = 0;
    PORTD = 4;
  }

  // Vertical sync lines (0..17)
  if (scanline < 18) {
    //UCSR0B = 0;
    UCSR0B = _BV(TXEN0);
    PORTD = 4;    // Vsync
    UDR0 = 0x00;  // Load first byte
    for (byte x = 0; x < 3; x++) {
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      UDR0 = 0x00;                           // send pixel byte
    }
    PORTD = 0;
    UCSR0B = 0;
    videoptr = 0;
    row = 0;
  }

  // Visible picture lines (40..239)
  if ((scanline > 39) && (scanline < 240)) {
    UCSR0B = _BV(TXEN0);
    PORTD = 0;    // Hsync
    UDR0 = 0x00;  // Load first byte
    for (byte x = 0; x < 3; x++) {
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      UDR0 = 0x00;                           // send pixel byte
    }

    while ((UCSR0A & _BV(UDRE0)) == 0) {}
    // send colorburst
    UDR0 = B10110010;
    PORTD = 0;
    while ((UCSR0A & _BV(UDRE0)) == 0) {}
    PORTD = 4;

    for (byte x = 0; x < 3; x++) {
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      UDR0 = 0x00;                           // send pixel byte
    }
    for (byte x = 0; x < 8; x++) {
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      UDR0 = border;                         // send border byte
    }

    for (byte x = 0; x < 40; x++) {
      //charcode=videomem[videoptr++];
      while ((UCSR0A & _BV(UDRE0)) == 0) {}  // wait for transmitter ready
      // placeholder: the same charROM byte goes out for every column
      UDR0 = pgm_read_byte(&charROM[row]);
    }

    while ((UCSR0A & _BV(UDRE0)) == 0) {}
    UDR0 = border;  // Front porch

    if (border == 0) UCSR0B = 0;

    videoptr = ((scanline - 64) & 0xF8) * 40;
    row = scanline & 0x07;
  }

  scanline++;
  if (scanline > 261) scanline = 0;
}
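
A quick sanity check of the timing, as a sketch in plain C that runs on a PC. The thread does not show the timer setup, so the prescaler of 8 and the compare value of 126 are assumptions, chosen because they reproduce the 15748 interrupts per second quoted above on a 16 MHz part. The byte counts are read off the visible-line branch of the ISR; all names are hypothetical.

```c
#include <assert.h>

#define F_CPU_HZ        16000000UL  /* assumed: standard 16 MHz Arduino */
#define TIMER_PRESCALE  8UL         /* assumed Timer0 prescaler */
#define TIMER_TOP       126UL       /* assumed OCR0A value in CTC mode */
#define LINES_PER_FRAME 262UL       /* scanline counts 0..261 in the ISR */
#define CYCLES_PER_BYTE 16UL        /* 8 pixels at 2 CPU cycles each */

/* CPU cycles between two compare-match interrupts. */
static unsigned long cycles_per_line(void)
{
    return TIMER_PRESCALE * (TIMER_TOP + 1UL);
}

/* Interrupts (scanlines) per second, rounded to the nearest integer. */
static unsigned long line_rate_hz(void)
{
    unsigned long c = cycles_per_line();

    assert(c > 0);
    return (F_CPU_HZ + c / 2UL) / c;
}

/* Bytes written to UDR0 on a visible line, in the order the ISR sends them. */
static unsigned long bytes_per_visible_line(void)
{
    return 1UL + 3UL   /* sync: first byte plus a loop of 3 */
         + 1UL         /* colorburst */
         + 3UL         /* blank bytes after the burst */
         + 8UL         /* border bytes */
         + 40UL        /* character cells */
         + 1UL;        /* front porch */
}
```

Under these assumptions a line is 1016 CPU cycles, the line rate rounds to 15748 Hz, and 262 lines give roughly 60 frames per second. A visible line sends 57 bytes, which is 912 cycles of shifting, so the ISR occupies most of each visible line.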

nickgammon


Quote

Because the pixel clock is 8 MHz, I'm now struggling with loading a video shift byte in just 16 clock cycles.


I got a character out of program memory for VGA output in 17 cycles, if that helps:

http://www.gammon.com.au/forum/?id=11608

Didn't need assembly either, although I looked at it to make sure optimal code was generated.
Please post technical questions on the forum, not by personal message. Thanks!

More info: http://www.gammon.com.au/electronics

fungus

OK, I see. The scan lines are started on an interrupt. They send a complete line of video, then return.

The problem is to read a byte from a charmap, look up a byte of data from the character ROM based on that, and output it to the USART, all in 16 clock cycles.

I think the AVR can do that in ten cycles and the loop adds three more, so you still need three NOPs to pad it to 16!

Coercing the compiler into doing it is another matter. You might have to resort to inline assembly language.
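
The per-byte work described here can be sketched in portable C as follows. This is a PC-side illustration, not the code from the thread: it assumes the font is stored one table per glyph row, so that a single indexed read finds the pixel byte, and it writes to an output buffer where the AVR would write to the USART data register. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build one scanline of pixel bytes from a row of character codes.
 * font_row points at the font slice for the current glyph row, so
 * font_row[c] is the 8-pixel pattern of character c on this scanline. */
static void render_scanline(const uint8_t *font_row, const uint8_t *chars,
                            uint8_t *out, size_t columns)
{
    assert(font_row != NULL && chars != NULL && out != NULL);
    while (columns--)
        *out++ = font_row[*chars++];  /* fetch code, look up, emit byte */
}
```

On the AVR the same three steps (load the code, look up the font byte in flash, store to the data register) plus the loop overhead are what has to fit in the 16-cycle budget.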

nickgammon

Quote

The USART in SPI mode (MSPIM) has no 9th-bit problem: just 8 pixels per byte, back to back


I think I used MSPIM mode. I got a bit distracted and went on to something else before I solved the 17th clock cycle issue. It might be possible.

fungus


Quote

you still need three NOPs to pad it to 16!


...but you can use those three cycles if you want to invert the char when the top bit is set.


nickgammon


Quote

Coercing the compiler into doing it is another matter. You might have to resort to inline assembly language.


This line of C generates the 17-cycle output loop:

Code:

// blit pixel data to screen    
 while (i--)
   UDR0 = pgm_read_byte (linePtr + (* messagePtr++));


You might be able to optimize away one cycle, and in any case 16 cycles is the absolute minimum, of course. My testing at some point revealed the 17th cycle was necessary or the hardware threw away some output.

fungus

#25
Nov 06, 2013, 09:05 pm
Incidentally, this code:

Code:

(1)    f0e: e7 fd       sbrc r30, 7
(1)    f10: f0 95       com r31


Is exactly what you need to invert the data if the top bit is set, thus proving it can be done.


nickgammon

That looks like it cut it down to 16 cycles. And it shows you don't need to muck around with assembler to achieve it. :)

Code:

     ee8:       8d 91           ld      r24, X+   (2)
     eea:       f9 01           movw    r30, r18   (1)
     eec:       e8 0f           add     r30, r24   (1)
     eee:       f1 1d           adc     r31, r1   (1)
     ef0:       e8 59           subi    r30, 0x98       ; 152   (1)
     ef2:       ff 4f           sbci    r31, 0xFF       ; 255   (1)
     ef4:       e4 91           lpm     r30, Z+   (3)
     ef6:       e0 93 c6 00     sts     0x00C6, r30   (2)
     efa:       a4 17           cp      r26, r20   (1)
     efc:       b5 07           cpc     r27, r21   (1)
     efe:       a1 f7           brne    .-24            ; 0xee8 <_Z13doOneScanLinev+0x64>   (1/2)


Changed line:

Code:

  register byte * messagePtr = (byte *) & (message [messageLine] [0] );


One of my other attempts (using the main SPI hardware) had a couple of clock cycles up its sleeve:

Code:

  // pre-load pointer for speed
  const register byte * linePtr = &screen_font [ (vLine >> 1) & 0x07 ] [0];
  register char * messagePtr =  & (message [messageLine] [0] );

  // how many pixels to send
  register byte i = horizontalBytes;

  // turn transmitter on
  SPSR = _BV (SPI2X);
  SPCR = _BV (SPE) | _BV (MSTR);

  // blit pixel data to screen   
  while (i--)
    {
    SPDR = pgm_read_byte (linePtr + (* messagePtr++));
    nop; nop;
    }


And believe me, I wouldn't have thrown in NOPs if they weren't necessary. :)


fungus


Quote

That looks like it cut it down to 16 cycles. And it shows you don't need to muck around with assembler to achieve it. :)


Should have saved two...

I notice the end of your loop has changed from this:

Code:

(1)    f20: 81 50        subi r24, 0x01 ; 1
(2)    f22: 98 f7        brcc .-26      ; 0xf0a


To this:

Code:

     efa:       a4 17           cp      r26, r20   (1)
     efc:       b5 07           cpc     r27, r21   (1)
     efe:       a1 f7           brne    .-24            ; 0xee8 <_Z13doOneScanLinev+0x64>   (1/2)


Which takes one cycle longer...


Quote

And it shows you don't need to muck around with assembler to achieve it. :)


Yes, you can often massage the compiler's output if you look at the disassembly and fiddle.

I worry about all the different versions of the compiler out there though. Will they all do the same?


nickgammon

Compare to:

Code:

     ee4:       ed 91           ld      r30, X+   (2)
     ee6:       ff 27           eor     r31, r31   (1)
     ee8:       e7 fd           sbrc    r30, 7   (1/2/3)
     eea:       f0 95           com     r31   (1)
     eec:       e2 0f           add     r30, r18   (1)
     eee:       f3 1f           adc     r31, r19   (1)
     ef0:       e8 59           subi    r30, 0x98       ; 152   (1)
     ef2:       ff 4f           sbci    r31, 0xFF       ; 255   (1)
     ef4:       e4 91           lpm     r30, Z+   (3)
     ef6:       e0 93 c6 00     sts     0x00C6, r30   (2)
     efa:       81 50           subi    r24, 0x01       ; 1   (1)
     efc:       98 f7           brcc    .-26            ; 0xee4 <_Z13doOneScanLinev+0x60>   (1/2)


There are other differences. Assuming the last branch takes two cycles and the sbrc takes one, that adds up to 17.
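
The two totals can be checked by adding up the per-instruction cycle counts from the listings. A small sketch in plain C; the array and function names are made up, and the counts are the ones annotated in the disassembly, with the loop branch taken and the sbrc not skipping.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-instruction cycle counts for one pass through a loop. */
static unsigned sum_cycles(const unsigned char *cycles, size_t n)
{
    unsigned total = 0;

    assert(cycles != NULL);
    for (size_t i = 0; i < n; i++)
        total += cycles[i];
    return total;
}

/* Listing ending in cp/cpc/brne:
 * ld, movw, add, adc, subi, sbci, lpm, sts, cp, cpc, brne (taken) */
static const unsigned char loop_brne[] = { 2, 1, 1, 1, 1, 1, 3, 2, 1, 1, 2 };

/* Listing ending in subi/brcc:
 * ld, eor, sbrc (no skip), com, add, adc, subi, sbci, lpm, sts, subi,
 * brcc (taken) */
static const unsigned char loop_brcc[] = { 2, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2 };
```

The first listing sums to 16 and the second to 17. When the sbrc does skip, it costs two cycles and the com is not executed, so the second total is 17 either way.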

janost

I tried your last code example, Nick, and it works, sort of.
The shifter loads in 16 cycles, so I have 320x200 pixels.

But something else happened with the Vsync that needs to be resolved, as it's not steady anymore.

Also, it does not solve the XOR-ing, so it needs a full font of 256 characters.
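
One way such a full font could be produced, sketched in plain C for a PC-side build step: it assumes a row-major font (8 rows of N glyph bytes each, stored flat) and the C64 convention that codes 128..255 are the reverse-video copies of 0..127. The function name and layout are assumptions for illustration, not the font used in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLYPH_ROWS 8

/* Expand a row-major 128-glyph font (GLYPH_ROWS * 128 bytes) into a
 * 256-glyph one (GLYPH_ROWS * 256 bytes). Entries 128..255 of each row
 * are the inverted copies of entries 0..127. */
static void expand_font(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t row = 0; row < GLYPH_ROWS; row++) {
        for (size_t c = 0; c < 128; c++) {
            uint8_t pixels = src[row * 128 + c];

            dst[row * 256 + c]       = pixels;
            dst[row * 256 + c + 128] = (uint8_t)~pixels;
        }
    }
}
```

The expanded table costs 2048 bytes instead of 1024, but the output loop then needs no inversion step at all.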

