Hi,
I am trying to write a Timer ISR that needs to be as efficient as possible: the shorter it is, the faster I can clock the timer (and the faster the better).
I am attempting to drive a 4.3" LCD from an Arduino Mega, generating the timing signals with its PWM timers. I know there are better processors to use, and better ways of doing it, but I am just curious how well I can make it work!
The interrupt needs to occur once for every pixel-clock pulse in order to read 24-bit data from a parallel SRAM and then output it to three 8-bit ports (R, G, B). It also needs to determine whether the clock occurred inside the vertical or horizontal porch (VPorch/HPorch), as no data is needed there.
To read the data, it has to output the correct address to the SRAM, which it does using PORTL (address bits 2-9) and PH0 (address bit 10). It also uses PE3-4 (address bits 0 and 1) to select the byte for a given colour (0 = Red, 1 = Green, 3 = Blue), as the SRAM is only 8 bits wide.
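To make the address layout concrete, here is a minimal host-side sketch, assuming the mapping described above: the pixel index within a line supplies SRAM address bits 2-10 (low 8 bits on PORTL, bit 8 on PH0) and the colour select supplies bits 0-1 on PE3/PE4. The function names are hypothetical. Because the ISR sets PE3 and then PE4, the colour codes run 0, 1, 3, so slot 2 of each pixel is never used; 9 pixel bits cover up to 512 pixels, enough for a 480-pixel line.

```c
#include <stdint.h>

/* Hypothetical model of the SRAM address layout described above.
   PE3 carries address bit 0 and PE4 carries address bit 1. */
enum colour { COLOUR_RED = 0, COLOUR_GREEN = 1, COLOUR_BLUE = 3 };

/* Full 11-bit SRAM address for a pixel index (0-511) and a colour. */
static uint16_t sram_address(uint16_t pixel, enum colour c)
{
    return (uint16_t)(((pixel & 0x1FFu) << 2) | ((unsigned)c & 0x3u));
}

/* Value held on PORTL: pixel bits 0-7, i.e. SRAM address bits 2-9. */
static uint8_t portl_value(uint16_t pixel)
{
    return (uint8_t)(pixel & 0xFFu);
}

/* State of PH0: pixel bit 8, i.e. SRAM address bit 10. */
static uint8_t ph0_value(uint16_t pixel)
{
    return (uint8_t)((pixel >> 8) & 0x1u);
}

/* PORTE bits for a colour code: bit 0 goes to PE3, bit 1 goes to PE4. */
static uint8_t porte_colour_bits(enum colour c)
{
    unsigned code = (unsigned)c;
    return (uint8_t)(((code & 0x1u) << 3) | (((code >> 1) & 0x1u) << 4));
}
```

Recombining the three port values gives back the same address, which is a quick way to check the wiring plan.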
So far I have managed to push the pixel clock to just over 222 kHz by carefully choosing which ports to use, so that sbi and cbi can be used: those instructions only reach the low I/O registers, whereas the higher ports (PORTH, PORTJ, PORTK and PORTL) need a longer lds/modify/sts sequence. I have also sacrificed a couple of boolean variables and instead store those flags in two of the ATmega1280's pins not used by the Arduino IDE (PG3 and PG4), which is, amusingly, much more efficient!
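The port choice comes down to the AVR address map: SBI, CBI, SBIS and SBIC only encode I/O addresses 0x00-0x1F (data-space 0x20-0x3F), IN and OUT reach I/O 0x00-0x3F (data-space 0x20-0x5F), and the extended I/O area from data-space 0x60 upward needs LDS/STS. The small checker below is a sketch that encodes those ranges; the example addresses come from the listing in this post (PORTE at I/O 0x0E, PORTG at 0x14, TIFR3 at 0x18, PORTH at 0x102, PORTL at 0x10B).

```c
#include <stdbool.h>
#include <stdint.h>

/* Classify an AVR data-space address by which instructions can reach it.
   For the first 64 I/O registers: I/O address = data-space address - 0x20. */

/* SBI, CBI, SBIS, SBIC: single-bit access, I/O 0x00-0x1F only. */
static bool bit_instr_reachable(uint16_t data_addr)
{
    return data_addr >= 0x20 && data_addr <= 0x3F;
}

/* IN, OUT: whole-register access, I/O 0x00-0x3F. */
static bool in_out_reachable(uint16_t data_addr)
{
    return data_addr >= 0x20 && data_addr <= 0x5F;
}

/* Extended I/O (e.g. PORTH..PORTL on the ATmega1280): a single-bit
   update becomes LDS + ORI/ANDI + STS. */
static bool needs_lds_sts(uint16_t data_addr)
{
    return data_addr >= 0x60;
}
```

This is why `mask(T3Port, T3Mask)` compiles to one `sbi`, while the PH0 update in the listing becomes `lds`/`ori`/`sts`. The same reasoning applies to flag storage: PORTG (data-space 0x34) is bit-addressable, and so is the general-purpose register GPIOR0 (data-space 0x3E), which could hold such flags without using up pins.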
If anyone has suggestions on how this code could be made more efficient, I'd welcome them. I haven't tested it with the actual LCD yet, but I am using a logic analyser to verify that the output matches the screen's datasheet. At the current pixel clock I estimate about 1.35 FPS, which is on the low side but still rather impressive for a 480x272 px screen. At the moment, the DCLK timer has a period of 72 CPU clock cycles (16 MHz / 72 ≈ 222 kHz).
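The frame-rate estimate can be reproduced from the timer period. This sketch assumes the Mega's 16 MHz system clock; the horizontal and vertical totals (active area plus porches and sync) are parameters because the panel's exact porch values aren't given here. With no blanking at all it gives an upper bound of about 1.70 FPS for 480x272, and the 1.35 FPS estimate corresponds to roughly 164,600 DCLKs per frame once blanking is included.

```c
#include <stdint.h>

/* Pixel clock from a timer that fires once every cycles_per_dclk CPU cycles. */
static double dclk_hz(double f_cpu_hz, uint32_t cycles_per_dclk)
{
    return f_cpu_hz / (double)cycles_per_dclk;
}

/* Frames per second when a frame needs h_total x v_total DCLKs.
   h_total and v_total should include porches and sync widths from the
   panel datasheet; passing the active size alone gives an upper bound. */
static double frames_per_second(double f_dclk_hz, uint32_t h_total, uint32_t v_total)
{
    return f_dclk_hz / ((double)h_total * (double)v_total);
}
```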
Below are the C code and the compiler output (assembly).
C code (I have omitted large parts, including the timer setup. Basically, Timer2 generates DCLK, Timer3 generates DataEn (DE) and Timer1 generates HSync. The DE and HSync timers are set to use an external clock source, and the Timer2 ISR writes to their external clock input pins (T3 and T1), so it provides their clock):
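The timer chaining can be modelled on a PC to check the counts before touching hardware. This is a rough, simplified sketch, not the AVR timer logic: each call to the hypothetical `ext_clock_pulse()` stands for one rising edge the Timer2 ISR drives onto T3 (PE6), and the flag mirrors OCF3B being set when the count reaches OCR3B. On the chip itself, clocking Timer3 from the T3 pin on rising edges means CS32:0 = 0b111 in TCCR3B, and the datasheet describes a synchronization delay of 2.5 to 3.5 system clock cycles between the pin edge and the counter update, so it is worth confirming on the logic analyser that the OCF3B check in the ISR catches the flag on the intended DCLK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a 16-bit timer clocked from an external pin:
   no prescaler, no synchronization delay, plain counting. */
typedef struct {
    uint16_t tcnt;  /* counter value, like TCNT3 */
    uint16_t ocrb;  /* compare value, like OCR3B */
    bool     ocfb;  /* compare-match flag, like OCF3B */
} ext_timer;

/* One rising edge on the timer's external clock pin. */
static void ext_clock_pulse(ext_timer *t)
{
    t->tcnt++;                  /* wraps at 0xFFFF like a 16-bit counter */
    if (t->tcnt == t->ocrb) {
        t->ocfb = true;         /* hardware sets the flag; software clears it */
    }
}

/* Mirrors setBit(TIFR3, OCF3B): writing 1 to the flag clears it. */
static void clear_flag(ext_timer *t)
{
    t->ocfb = false;
}
```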
#include <avr/io.h>
#include <avr/interrupt.h>

#define DCLKPort PORTH
#define DCLKDir DDRH
#define DCLKMask 0b01000000
#define DCLKUnMask 0b10111111
#define T3Port PORTE
#define T3Dir DDRE
#define T3Bit 6
#define T3Mask 0b01000000
#define T3UnMask 0b10111111
#define DEPort PORTE
#define DEDir DDRE
#define DEMask 0b00100000
#define DEUnMask 0b11011111
#define T4Port PORTH
#define T4Dir DDRH
#define T4Bit 7
#define T4Mask 0b10000000
#define T4UnMask 0b01111111
#define VSyncPort PORTH
#define VSyncDir DDRH
#define VSyncMask 0b00100000
#define VSyncUnMask 0b11011111
#define T1Port PORTD
#define T1Dir DDRD
#define T1Bit 6
#define T1Mask 0b01000000
#define T1UnMask 0b10111111
#define HSyncPort PORTB
#define HSyncDir DDRB
#define HSyncMask 0b10000000
#define HSyncUnMask 0b01111111
ISR(TIMER2_OVF_vect) { // DCLK rising edge
    //---- Route T2 to T3 and T1 ------------
    mask(T3Port, T3Mask); // rising edge increments the DE timer
    mask(T1Port, T1Mask); // rising edge increments the HSync timer
    //---------------------------------------
    if (TIFR3 & (1 << OCF3B)) {
        // Check for the DE compare match first (it will occur one CPU clock cycle after mask(T3Port,T3Mask)).
        // Handling it here speeds things up, as only one ISR then has to run, not two.
        setBit(TIFR3, OCF3B); // writing 1 clears the flag, so the Timer3 COMPB ISR isn't triggered
        if (PORTG & 0b00010000) {
            mask(PORTG, 0b00001000);
            // This line is in both branches so the compiler can share one copy via rjmp (saves program space).
            unMask(ColourPort, ColourUnMask);
        }
    } else if (PORTG & 0b00001000) { // !HPorch
        // Valid data clock:
        // output the dcount-th pixel onto the RGB pins
        setReg(RedPort, DataPin); // read from the SRAM straight to the red port; saves ~15 instructions compared with going through a variable
        mask(ColourPort, ColourGreen);
        setReg(GreenPort, DataPin);
        mask(ColourPort, ColourBlue);
        setReg(BluePort, DataPin);
        byte dcount = ColLowPort;
        dcount++;
        setReg(ColLowPort, dcount);
        if (dcount == 0) {
            mask(ColHighPort, ColHighMask); // when dcount rolls over to 0, set the high address bit (PH0)
        }
        // Shared with the DE branch above via rjmp (saves program space).
        unMask(ColourPort, ColourUnMask);
    }
    unMask(T3Port, T3UnMask); // return low
    unMask(T1Port, T1UnMask); // return low
}
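To test the ISR's branch structure away from the hardware, its per-DCLK decision can be written as a pure function. This is a hypothetical host-side model, not the code running on the Mega: `pg3` and `pg4` stand for the two PORTG flag bits, `col_low` for PORTL and `col_high` for PH0, and one call to `dclk_step()` corresponds to one TIMER2 overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state the ISR above reads and writes. */
typedef struct {
    bool    de_match;  /* OCF3B was set: DE compare hit on this DCLK */
    bool    pg4;       /* PORTG bit 4 flag */
    bool    pg3;       /* PORTG bit 3 flag: not in HPorch, data is valid */
    uint8_t col_low;   /* PORTL: pixel address bits 0-7 */
    bool    col_high;  /* PH0: pixel address bit 8 */
} isr_state;

/* One DCLK. Returns true if R, G and B were read and output. */
static bool dclk_step(isr_state *s)
{
    if (s->de_match) {
        s->de_match = false;     /* setBit(TIFR3, OCF3B) clears the flag */
        if (s->pg4) {
            s->pg3 = true;       /* mask(PORTG, 0b00001000) */
        }
        return false;            /* no pixel data on this DCLK */
    }
    if (!s->pg3) {
        return false;            /* in a porch: nothing to output */
    }
    s->col_low++;                /* dcount++, wraps 255 -> 0 */
    if (s->col_low == 0) {
        s->col_high = true;      /* roll-over sets the high address bit */
    }
    return true;
}
```

Neither the ISR nor this model ever clears PG3 or PH0, so that presumably happens in the code that was left out (for example at the end of each line).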
Assembler (again, just for the interrupt).
0000014c <__vector_15>:
14c: 1f 92 push r1
14e: 0f 92 push r0
150: 0f b6 in r0, 0x3f ; 63
152: 0f 92 push r0
154: 11 24 eor r1, r1
156: 8f 93 push r24
158: 76 9a sbi 0x0e, 6 ; 14
15a: 5e 9a sbi 0x0b, 6 ; 11
15c: c2 9b sbis 0x18, 2 ; 24
15e: 05 c0 rjmp .+10 ; 0x16a <__vector_15+0x1e>
160: c2 9a sbi 0x18, 2 ; 24
162: a4 9b sbis 0x14, 4 ; 20
164: 1c c0 rjmp .+56 ; 0x19e <__vector_15+0x52>
166: a3 9a sbi 0x14, 3 ; 20
168: 17 c0 rjmp .+46 ; 0x198 <__vector_15+0x4c>
16a: a3 9b sbis 0x14, 3 ; 20
16c: 18 c0 rjmp .+48 ; 0x19e <__vector_15+0x52>
16e: 8f b1 in r24, 0x0f ; 15
170: 82 b9 out 0x02, r24 ; 2
172: 73 9a sbi 0x0e, 3 ; 14
174: 8f b1 in r24, 0x0f ; 15
176: 88 b9 out 0x08, r24 ; 8
178: 74 9a sbi 0x0e, 4 ; 14
17a: 8f b1 in r24, 0x0f ; 15
17c: 80 93 08 01 sts 0x0108, r24
180: 80 91 0b 01 lds r24, 0x010B
184: 8f 5f subi r24, 0xFF ; 255
186: 80 93 0b 01 sts 0x010B, r24
18a: 88 23 and r24, r24
18c: 29 f4 brne .+10 ; 0x198 <__vector_15+0x4c>
18e: 80 91 02 01 lds r24, 0x0102
192: 81 60 ori r24, 0x01 ; 1
194: 80 93 02 01 sts 0x0102, r24
198: 8e b1 in r24, 0x0e ; 14
19a: 87 7e andi r24, 0xE7 ; 231
19c: 8e b9 out 0x0e, r24 ; 14
19e: 76 98 cbi 0x0e, 6 ; 14
1a0: 5e 98 cbi 0x0b, 6 ; 11
1a2: 8f 91 pop r24
1a4: 0f 90 pop r0
1a6: 0f be out 0x3f, r0 ; 63
1a8: 0f 90 pop r0
1aa: 1f 90 pop r1
1ac: 18 95 reti
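The helper macros and the colour/address port names were left out of the C code, but they can be inferred from the listing. The definitions below are a reconstruction under that assumption, not the original ones: `mask` compiles to `sbi` (an OR), `unMask` to `cbi` or `in`/`andi`/`out` (an AND with an inverted mask), `setBit` to `sbi`, and `setReg` to a plain load/store. The port names follow the addresses in the disassembly: PINF (I/O 0x0F) is the data input, PORTA, PORTC and PORTK (0x02, 0x08, 0x108) take red, green and blue, PORTL (0x10B) and PH0 (0x102) carry the address, and PE3/PE4 select the colour. The AVR-specific part is guarded so the macros can also be exercised with ordinary variables on a PC.

```c
#include <stdint.h>

/* Reconstructed from the listing (assumption, not the original definitions). */
#define mask(reg, bits)    ((reg) |= (bits))   /* OR in the bits to set */
#define unMask(reg, keep)  ((reg) &= (keep))   /* keep = inverted mask, e.g. T3UnMask */
#define setBit(reg, bit)   ((reg) |= (uint8_t)(1u << (bit)))
#define setReg(reg, value) ((reg) = (value))

#ifdef __AVR__
#include <avr/io.h>
#define DataPin      PINF        /* in r24, 0x0f */
#define RedPort      PORTA       /* out 0x02, r24 */
#define GreenPort    PORTC       /* out 0x08, r24 */
#define BluePort     PORTK       /* sts 0x0108, r24 */
#define ColLowPort   PORTL       /* lds/sts 0x010B: address bits 2-9 */
#define ColHighPort  PORTH       /* lds/ori/sts 0x0102 */
#define ColHighMask  0b00000001  /* PH0: address bit 10 */
#define ColourPort   PORTE       /* sbi/in/out 0x0e */
#define ColourGreen  0b00001000  /* sbi 0x0e, 3 -> PE3 */
#define ColourBlue   0b00010000  /* sbi 0x0e, 4 -> PE4 */
#define ColourUnMask 0b11100111  /* andi 0xE7: clear PE3/PE4, back to red */
#endif
```

With a single-bit constant on a low I/O port, avr-gcc turns `mask`/`unMask` into one `sbi`/`cbi`, as the listing shows; `unMask(ColourPort, ColourUnMask)` clears two bits at once, so it becomes the three-instruction `in`/`andi`/`out` sequence instead.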