timing problem and assembly problem

Hello. To make the story short, I am making a program that needs exact timing to run on a scale of 53us or less.
So I am writing interrupt code to get that to work. My current problem is simple, though.
First, this function does not compile for some reason:

g_addr and prg_read are global variables.
I don't know if I should change them to volatile or not.

uint32_t g_addr;
uint32_t prg_read;

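void program_memory(){
    asm volatile (
        "out %2, %C1" "\n\t"        // RAMPZ = bits 16..23 of the address
        "movw r30, %1" "\n\t"       // Z (r31:r30) = low 16 bits of the address
        "elpm %0, Z+" "\n\t"        // read one byte from flash at RAMPZ:Z
        : "=r" (prg_read)           // output operand %0 (only the low byte is loaded)
        : "r" (g_addr),             // input operand %1
          "I" (_SFR_IO_ADDR(RAMPZ)) // input operand %2: I/O address of RAMPZ
        : "r30", "r31"              // clobbered registers
    );
}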

Then I need to know the exact number of clock cycles taken by the "if" condition
and by the "for" and "while" loops on each iteration.
Then I want to know how long, in processor clocks, this instruction takes:

thank you :slight_smile:

Oh, and that value "RAMPZ": I don't know what it stands for. I think it's some device marker or something.
But I can't get its value, and the compiler only ignores it when I define it to nothing:

#define RAMPZ

uint32_t g_addr;
uint32_t prg_read;

...are stored in SRAM. Your code has no instructions to load the values from SRAM. They are four bytes each. Your code should have eight machine instructions to load the value of both variables.

Working backwards from that information you should be able to figure out why your code will not compile.

Oh, and that value "RAMPZ": I don't know what it stands for...

Try the Great and Wise Gorcle...

Hello. To make the story short, I am making a program that needs exact timing to run on a scale of 53us or less.

You've cut the story too short. Why do you want to do this? 50 µs sounds a long time for an interrupt routine.

Have you heard of compilers? They can generate very good code these days.

        "out %2, %C1" "\n\t"        
        "movw r30, %1" "\n\t"       
        "elpm %0, Z+" "\n\t"        
        : "=r" (prg_read)           
        : "r" (g_addr),           
          "I" (_SFR_IO_ADDR(RAMPZ)) 
        : "r30", "r31"              

How are you planning to maintain that? Do you think the equivalent in C would be slower?

Thank you, "Code Badly".
Nick, I am programming a PAL video generator. The length of a real video line is 53us + 12us = about 64 to 65us. The interrupt would probably trigger again every 500us or 1ms. C can't give the exact timing without knowing the exact timing of each instruction.
In 53us I am attempting to fit a line resolution of 240 pixels, with 230 pixels vertically.
A frame buffer in SRAM would not give such a resolution. That's why I am planning to use a black-and-white tiled video mode: the BW tiles would all be stored in program memory, and we would only have a byte array in SRAM, where every byte points to a tile in program memory.
They would probably be 8x8 or 5x6 1-bit tiles.
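The tile scheme described above can be sketched in plain C. This is only an illustration that runs on a PC: the names `tiles`, `tile_map`, `MAP_COLS`, `MAP_ROWS` and `tile_row_bits` are made up for the example, 8x8 tiles are assumed, and on the AVR the `tiles` table would be placed in flash rather than in an ordinary array.

```c
#include <stdint.h>
#include <assert.h>

#define TILE_H   8    /* assumed 8x8 tiles, 1 bit per pixel */
#define MAP_COLS 30   /* hypothetical: 240 pixels / 8 pixels per tile */
#define MAP_ROWS 29   /* hypothetical: enough rows for about 230 lines */

/* Hypothetical tile set. On the AVR this would live in program memory. */
const uint8_t tiles[2][TILE_H] = {
    { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },  /* tile 0: blank    */
    { 0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00 },  /* tile 1: letter A */
};

/* One byte per screen cell, kept in SRAM: 30 * 29 = 870 bytes. */
uint8_t tile_map[MAP_ROWS][MAP_COLS];

/* Return the 8 pixels to output for cell column `col` on scan line `line`. */
uint8_t tile_row_bits(uint16_t line, uint8_t col)
{
    uint8_t tile = tile_map[line / TILE_H][col];  /* which tile is in this cell */
    return tiles[tile][line % TILE_H];            /* which row of that tile     */
}
```

With these assumed sizes the map costs 870 bytes of SRAM, against 240 * 230 / 8 = 6900 bytes for a full 1-bit frame buffer.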

At this point I managed to get one SRAM tile to show up repeatedly, and I managed to correctly get a resolution of 200/210 pixels (not a limit) with the letter A placed over and over again at random places on the screen.
But the timing is messed up, and when I try to implement reading from program memory the resolution drops and the timing sometimes gets so messed up that the screen image disappears.

I have a lengthy post about that sort of thing here.

I got 160 pixels width and 480 pixels height using that code. There is no assembler except for NOP in a couple of places to stretch out the timing. Also further down that page another user wrote a Pong game with no assembler apart from NOP.

Also see the post about the Toorum's Quest game, which does use assembler. He also managed to generate music and have moving sprites.

Thank you very much, Nick. :slight_smile:
I managed to get past the problem.
I did read your code on the VGA signal before.
I am trying to generate a PAL signal, though. I am also not going to use assembly for the most part; I am only using it for the “nop” one-cycle delay and for reading from program memory.
I figured out that the pgmspace function “pgm_read_byte_far” only uses 7 or 8 clocks when using its assembly source with global vars :slight_smile: so I am using that.
At the moment I have managed to get 240x240 pixels PAL using only an Arduino Uno clocked at 16 MHz. A serial shift register would raise the resolution to 320x250 pixels. If we clocked the ATmega328 at 20 MHz or more we could also get more lines, but I can't use that since I want this one to only use an Uno and a couple of resistors.
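To put those numbers in cycles, here is a small arithmetic sketch in C. The helper names are invented for the example, and it assumes the whole 53us active part of the line is available for pixels, which a real routine will not quite reach.

```c
#include <stdint.h>
#include <assert.h>

/* CPU cycles available in `active_us` microseconds at `f_cpu_hz`. */
uint32_t cycles_in_us(uint32_t f_cpu_hz, uint32_t active_us)
{
    return (f_cpu_hz / 1000000UL) * active_us;
}

/* Cycles per pixel, times 100, to keep two decimals without floats. */
uint32_t cycles_per_pixel_x100(uint32_t f_cpu_hz, uint32_t active_us,
                               uint32_t pixels)
{
    return cycles_in_us(f_cpu_hz, active_us) * 100UL / pixels;
}
```

At 16 MHz, 53us is 848 cycles, so 240 pixels leaves about 3.5 cycles per pixel and 320 pixels about 2.6. At 20 MHz the same 240 pixels would get about 4.4 cycles each.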

Yeah, I did check that guy. The AD725 is a good gadget to have. Expensive, though :slight_smile:

Reading a byte from program memory only takes one more cycle than from RAM.

LDS: 2 cycles, LPM: 3 cycles. Nothing like 7 or 8.

There are nearly always a few instructions (at least two) to initialize the Z register which makes LPM more expensive than three clocks.

But I agree. 7 / 8 seems high.

And, reading blocks instead of single bytes makes the initialization overhead much less significant.

The version of LPM that has Rd,Z+ as the argument could step through an array quickly with only having to load Z once. This would apply in situations where you pull graphics (tiles) out of an array.

(edit) Which is what you said. :slight_smile:
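As a rough illustration of that point, the cycle counts can be tallied in C. This assumes Z is set up with two LDI instructions (1 cycle each) and that LPM takes 3 cycles; the function names are invented for the example, and loading Z from a pointer held in a variable would cost more than this.

```c
#include <stdint.h>
#include <assert.h>

enum { LDI_CYCLES = 1, LPM_CYCLES = 3 };

/* Reload both halves of Z before every single LPM. */
uint32_t cycles_reload_each(uint32_t n_bytes)
{
    return n_bytes * (2 * LDI_CYCLES + LPM_CYCLES);
}

/* Load Z once, then step through the array with LPM Rd, Z+. */
uint32_t cycles_post_increment(uint32_t n_bytes)
{
    return 2 * LDI_CYCLES + n_bytes * LPM_CYCLES;
}
```

For the 8 rows of one tile that is 40 cycles when Z is reloaded every time, against 26 cycles with the post-increment form.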

Hey "Nick Gammon" and "Code Badly" :grin:
I managed to get the resolution up to 240/400 pixels 8)
though that does not leave much time to read tiles from program memory.
The example you see below in the pictures uses a tile of the number five. I can get it to extract (30*50) tiles, but from SRAM, since I can fit that into the time of the color burst, the burst blanking, and the blanking before the screen rendering even starts.
If you want to push the ATmega328 (Uno) to the extreme you can probably get 480/240 pixels out of it,
but you would probably use a shift register to manage the bytes instead of managing them in the burst,
or the tiles would be bigger than 8x8.
This is the current progress. Reading tiles from program memory takes a huge amount of time, though:
by the time I get to read one tile I would waste at least 1us, which is the output time of at least 20 lines on the x axis.
Here is the current RAM tiles result :slight_smile: reading a tile of the number 5. For now it's Uno only.
(Sorry for the buffer error at the beginning of each line. The first byte burst seems to cause that.)

:grin: It works great now.
It's decent. I flipped the tiles, and that removed the first buffer error.
I had to use assembly to get it to work, though :).
The C compiler kept outputting random stuff that broke the timing.
So now we might have stable tiles soon enough :slight_smile:

;D Assembly is awesome, dude.
Not only have I been able to fit many letters on the display,
but I was playing with the digital I/O signal during the 4.7us of the color burst for a while and boom!!
I got fixed color signals.
This is stable, guys; the color is stuck there :smiley:. So this is color PAL text with a 16 MHz crystal and an Uno only. That should not even be possible; I can't even explain this.
I guess I'll be writing a library now so people will be able to write text on the screen and toggle these random colors :slight_smile:.
The whole pressure is on the SRAM, though :confused:. The font array is in RAM because reading from program memory messes up the timing and disables color.

This is the hardware: an Arduino (pin 8 and pin 7) and a couple of resistors (470 ohm and 1 kohm).

And this is the result:


Looking good. BTW if you find that your C code is going slowly you can disassemble it and work out if you can improve it. For example in my VGA output project I re-ordered my fonts such that it avoided a multiply when extracting out the bits for each letter. But you can do that in C, you just have to realize what the compiler has generated.
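One way such a font reordering might look is sketched below in portable C. It assumes the reordering means storing the font by scan line instead of by character, and it uses a made-up 4-glyph font and invented names. It is not the actual VGA project code.

```c
#include <stdint.h>
#include <assert.h>

#define GLYPHS 4   /* hypothetical tiny font */
#define ROWS   8   /* rows per glyph */

/* Glyph-major: row r of glyph c sits at c * ROWS + r. */
uint8_t font_glyph_major[GLYPHS * ROWS];
/* Row-major: row r of glyph c sits at r * GLYPHS + c. */
uint8_t font_row_major[ROWS * GLYPHS];

void build_row_major(void)
{
    for (uint8_t c = 0; c < GLYPHS; c++)
        for (uint8_t r = 0; r < ROWS; r++)
            font_row_major[r * GLYPHS + c] = font_glyph_major[c * ROWS + r];
}

/* Per character: a multiply (or shift) plus an add. */
uint8_t lookup_glyph_major(uint8_t c, uint8_t r)
{
    return font_glyph_major[(uint16_t)c * ROWS + r];
}

/* Per character: one add. line_base is worked out once per scan line. */
uint8_t lookup_row_major(const uint8_t *line_base, uint8_t c)
{
    return line_base[c];
}

/* Fill a test pattern and check that both layouts return the same bits. */
int layouts_agree(void)
{
    for (uint8_t i = 0; i < GLYPHS * ROWS; i++)
        font_glyph_major[i] = (uint8_t)(i * 7 + 1);  /* arbitrary pattern */
    build_row_major();
    for (uint8_t r = 0; r < ROWS; r++) {
        const uint8_t *line_base = &font_row_major[r * GLYPHS];
        for (uint8_t c = 0; c < GLYPHS; c++)
            if (lookup_glyph_major(c, r) != lookup_row_major(line_base, c))
                return 0;
    }
    return 1;
}
```

The row offset is the same for every character on a scan line, so in the row-major layout it can be computed before the time-critical pixel loop starts.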

Wouldn't inline assembly be easier, then?
Your work on the VGA video output is seriously good, by the way. I still can't understand how you reached such a video resolution :slight_smile: (y) Very nice.

At the moment I am having trouble loading Z and reading the contents with elpm, since all I know about the Z register is that it's r30 for the low byte and r31 for the high one.
So each time I need to fetch an address from the tiles array I would need to load r30, then load r31, then
get the value with lpm: already 5 cycles. The enhanced method seems to be able to read the whole array, but I still can't understand how it works. I also still can't understand what the RAMPZ register is.
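For reference, RAMPZ is an I/O register found on AVRs with more than 64 KB of flash (the ATmega2560, for example). ELPM reads from the address formed by RAMPZ on top of the 16-bit Z register. The ATmega328P on the Uno has 32 KB of flash, so it has no RAMPZ and plain LPM with Z alone reaches everything. The split can be sketched in portable C; the struct and function names here are invented for the example.

```c
#include <stdint.h>
#include <assert.h>

/* The three bytes a far flash address is split into. */
typedef struct {
    uint8_t rampz;  /* address bits 16..23, written to RAMPZ */
    uint8_t zh;     /* address bits 8..15, r31               */
    uint8_t zl;     /* address bits 0..7, r30                */
} far_ptr_t;

far_ptr_t split_far_address(uint32_t addr)
{
    far_ptr_t p;
    p.rampz = (uint8_t)(addr >> 16);
    p.zh    = (uint8_t)(addr >> 8);
    p.zl    = (uint8_t)addr;
    return p;
}
```

Any address inside 32 KB of flash gives a RAMPZ byte of zero, which is why the register is not needed on the Uno.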